text analysis

Textable is an open source program for text analysis. It offers a set of basic text-analytic components (e.g. import text from files, segment into words, measure segment diversity, etc.), which the user combines using a visual interface to build custom analytic workflows.

Code license: GNU GPL v3
Last updated: 20 Aug 2017

DiscoverText allows users to import data from a variety of sources (including free and premium Gnip Twitter feeds, plain text, Word, Excel, public YouTube comments, blogs/wikis, PDF, etc.) and then view, search, filter, deduplicate, code, and machine-classify the data. It is a collaborative, web-based platform widely used by academics.

Code license: Closed source
Last updated: 24 Feb 2017

AroniSmartIntelligence™ is an application that performs text analytics on RSS articles, reviews, feedback, chat data, or other unstructured texts organized into sub-folders. The output can then be fed into other advanced statistical analytics or data mining modules available in AroniSmartIntelligence™, including regression analysis, econometrics, segmentation, and Bayesian models.

Code license: Closed source
Last updated: 18 Mar 2016

TAToo is an embeddable Flash widget that displays TAPoR analytics for the page on which it resides.

Code license: Apache License
Last updated: 23 Feb 2016

The TAPoR Portal is an online environment where users can keep track of texts they want to study (uploaded or available online), learn about and try different tools, and run tools on texts.

Last updated: 23 Feb 2016

A graphical user interface tool for Latent Dirichlet Allocation topic modeling.

Last updated: 17 Feb 2016

The MONK workbench provides 525 works of American literature from the 18th and 19th centuries, and 37 plays and 5 works of poetry by William Shakespeare, along with tools to enable literary research through the discovery, exploration, and visualization of patterns.

Users affiliated with CIC (Big Ten) schools can access a larger data set that includes about a thousand works of British literature from the 16th through the 19th century, provided by The Text Creation Partnership (EEBO and ECCO) and ProQuest (Chadwyck-Healey Nineteenth-Century Fiction).

Last updated: 12 Aug 2015

This product can filter or format text-based content. It also includes a document and link organiser with search capabilities, so it might more accurately be called a text management system. Given the large number of documents stored on a typical computer and the many online links you might use, it helps you navigate that material more easily. Although its feature set is now well developed, inexperienced users should still find it relatively easy to use; it is not intended only for expert users.

Code license: GNU GPL v3
Last updated: 15 Jun 2015

"Linguistic Inquiry and Word Count (LIWC) is a text analysis software program...LIWC is able to calculate the degree to which people use different categories of words across a wide array of texts." Free and limited web analysis available.

Last updated: 23 May 2015
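
The general idea behind LIWC's output, the percentage of a text's words that fall into each word category, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-category dictionary below is hypothetical and tiny, the function names are made up for this example, and LIWC's own dictionaries and scoring are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy dictionary mapping each category to a set of words.
# LIWC's real dictionaries are much larger and are NOT reproduced here.
CATEGORIES = {
    "pronouns": {"i", "me", "we", "you", "she", "he", "they"},
    "positive": {"happy", "good", "love", "nice"},
    "negative": {"sad", "bad", "hate", "angry"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(text, categories=CATEGORIES):
    """Return the percentage of tokens that fall into each category."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in categories}
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    return {name: 100.0 * counts[name] / len(tokens) for name in categories}


if __name__ == "__main__":
    print(category_percentages("I love you but I hate the rain"))
```

A word may belong to several categories, so the percentages need not sum to 100.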

Whatizit can ingest up to 500,000 terms pasted into the input box and execute any of the pre-defined text analysis pipelines.

Last updated: 23 May 2015

The Macro-Etymological Analyzer is a web app for text analysis that will look up every word of your text in the Etymological Wordnet, and generate statistics about the macro-etymology of your text, organized by language family. For instance, it can analyze a novel and tell you the proportions of words of Anglo-Saxon origin, or of Afroasiatic origin. First-generation and second-generation language ancestor data is included, and the output is highly granular, allowing the scholar to see the origins of individual words, and statistics about each ancestor language.

Code license: GNU GPL v3
Last updated: 20 May 2015

Lexos is an online tool that enables you to "scrub" (clean) one or more texts, cut texts into chunks of various sizes, manage chunks and chunk sets, and choose from a suite of analysis tools for investigating those texts. Functionality includes building dendrograms, graphing rolling averages of word frequencies or of ratios of words or letters, and experimenting with visualizations of word frequencies, including word clouds and bubble visualizations.

Code license: Open source
Last updated: 17 May 2015
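
A rolling average of word frequency can be understood as a window of fixed size sliding through the text, recording the share of a target word in each window. The sketch below illustrates that general idea only; it is not Lexos's code, and the function names, the tokenization rule, and the one-token step are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rolling_frequency(tokens, target, window):
    """Proportion of `target` in each window of `window` consecutive tokens.

    The window slides forward one token at a time, so the result has
    len(tokens) - window + 1 values.
    """
    if not 1 <= window <= len(tokens):
        raise ValueError("window must be between 1 and len(tokens)")
    hits = [1 if token == target else 0 for token in tokens]
    current = sum(hits[:window])
    averages = [current / window]
    for i in range(window, len(hits)):
        # Slide the window: add the incoming token, drop the outgoing one.
        current += hits[i] - hits[i - window]
        averages.append(current / window)
    return averages


if __name__ == "__main__":
    tokens = tokenize("the cat and the dog and the bird")
    print(rolling_frequency(tokens, "the", 4))
```

Updating a running total instead of recounting each window keeps the cost linear in the length of the text, which matters for novel-length inputs.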

AntWordProfiler is free software for analyzing word frequency.

Last updated: 9 May 2015

Juxta is an open-source cross-platform desktop tool for comparing and collating multiple witnesses to a single textual work. The software allows you to set any of the witnesses as the base text, to add or remove witness texts, to switch the base text at will, and to annotate Juxta-revealed comparisons and save the results. New in version 1.6.5 is the ability to upload your comparison sets to a free online workspace called Juxta Commons where you can analyze your data privately or choose to share visualizations of your work with anyone on the web.

Code license: Open source, Creative Commons
Last updated: 4 May 2015

Text analysis software aimed at beginners in qualitative research, using live visualizations as its interface. Quirkos supports standard code-and-retrieve operations, searches and queries on the data, and can visualize connections between topics and themes.

Find more information at http://www.quirkos.com/qualitative-data-analysis-software.html

Code license: Closed source
Last updated: 3 May 2015

A software tool for performing concordance analysis (the analysis of a set of words within their immediate context) on a body of text. The tool performs a full concordance, reading and analysing every word in a text. It was initially written for the analysis of English texts but has since been extended to cater for other Western languages. Limited support is also provided for text in East Asian scripts, such as Chinese and Korean.

Features:

Code license: Closed source
Last updated: 11 Feb 2015
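
The concordance idea described above, showing each occurrence of a word with its immediate context (often called keyword in context, or KWIC), can be sketched as follows. This is a generic illustration, not this tool's implementation; the function name, the regex tokenization, and the fixed number of context words on each side are assumptions.

```python
import re


def kwic(text, keyword, width=5):
    """Return (left context, keyword, right context) for each occurrence.

    Matching is case-insensitive; `width` is the number of words kept
    on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    text = "It was the best of times, it was the worst of times"
    for left, word, right in kwic(text, "times", width=2):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>12} [{word}] {right}")
```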

CATMA (Computer Aided Textual Markup & Analysis) is a free, open source markup and analysis tool from the University of Hamburg's Department of Languages, Literature and Media. It incorporates three interactive modules: (1) The tagger enables flexible and individual textual markup and markup editing. (2) The analyzer incorporates a query language and predefined functions. It also includes a query builder that allows users to construct queries from combinations of pre-defined questions while allowing for manual modification for more specific questions.

Code license: GNU GPL v3
Last updated: 29 Dec 2014

MONK is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study.

Last updated: 29 Dec 2014

HyperPo is a user-friendly text exploration and analysis program that allows users to import texts or use texts available online (in English or French), and provides frequency lists of characters, words and series of words, color-coding to indicate repetition, KWIC, co-occurrence and distribution lists, and the ability to simultaneously compare data from multiple texts.

Last updated: 29 Dec 2014
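
Frequency lists of words and of series of words (n-grams), like those HyperPo produces, reduce to counting every run of n consecutive tokens. The sketch below is a generic illustration under simple assumptions (lowercasing, `\w+` tokenization); the function names are hypothetical and this is not HyperPo's code.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngram_counts(tokens, n=1):
    """Count every run of n consecutive tokens (n=1 gives plain word counts)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    tokens = tokenize("To be, or not to be")
    for gram, count in ngram_counts(tokens, 2).most_common(3):
        print(" ".join(gram), count)
```

Comparing several texts, as HyperPo allows, amounts to building one such counter per text and looking up the same n-grams in each.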

A text analytics and data extraction framework that provides data and semantic analytics in a suite of business applications.

Last updated: 29 Dec 2014

Basis provides natural language processing technology for the analysis of unstructured multilingual text.

Last updated: 29 Dec 2014

The Information processor consists of two main programs: the analyst server and the query (or knowledge) processor. The analyst program can be called from the command line, from an HTML form, or through a TCP/IP socket protocol. The query processor can be accessed from any browser using HTML commands. Together, they analyze text and allow the user to search it.

Code license: Closed source
Last updated: 29 Dec 2014

The Versioning Machine displays multiple versions of a text encoded according to the TEI Guidelines and allows for comparisons of annotations and introductory materials. It is a tool for textual editors, allowing them "to immediately see the consequences of their editorial decisions." This tool does not appear to have been updated since 2011.

Last updated: 29 Dec 2014

A CollateX-based text collation client. CollateX, which runs on a separate server, is a powerful, fully automatic, baseless text collation engine for multiple witnesses. A second collation engine, ncritic, provides a slightly different form of baseless text collation, and the two engines complement each other nicely. Users can supply input as files or even URLs, and can output the result as GraphML, TEI, JSON, HTML, or SVG. Fuzzy matching is optional.

Last updated: 29 Dec 2014

LATtice lets you explore and compare texts across entire corpora, but also lets you “drill down” to the level of individual LATs (language action types) to ask exactly which rhetorical categories make texts similar or different.

Last updated: 29 Dec 2014

Bookworm enables you to graphically explore lexical trends in repositories of digitized texts.

Code license: Open source
Last updated: 29 Dec 2014

Voyant Tools is a web-based reading and analysis environment for digital texts.

Code license: Open source
Last updated: 29 Dec 2014

Meld is a visual diff and merge tool targeted at developers. Meld helps you compare files, directories, and version controlled projects. It provides two- and three-way comparison of both files and directories, and has support for many popular version control systems.

Code license: Open source, GNU GPL v2
Last updated: 29 Dec 2014
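
The two-way file comparison that Meld presents visually can be illustrated with Python's standard-library `difflib`, which produces a unified diff of two texts. This is a swapped-in technique for illustration only, not how Meld is implemented; the function name and default file labels are hypothetical.

```python
import difflib


def two_way_diff(old_text, new_text, old_name="a.txt", new_name="b.txt"):
    """Return a unified diff between two texts as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


if __name__ == "__main__":
    before = "one\ntwo\nthree\n"
    after = "one\n2\nthree\nfour\n"
    print("".join(two_way_diff(before, after)), end="")
```

Lines prefixed with `-` exist only in the first text, lines prefixed with `+` only in the second, and lines starting with a space are shared context. Three-way comparison, which Meld also offers, is not covered by this sketch.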

Kaleidoscope is a tool for spotting differences in images and text, and it now supports merging files and folders, too. Kaleidoscope integrates directly with Git, Subversion, Mercurial, and Bazaar, so it fits into existing version-control workflows.

Last updated: 29 Dec 2014

The Tesserae project aims to provide a flexible and robust web interface for exploring intertextual parallels.

Last updated: 29 Dec 2014

TVE is an interactive Java tool for exploring the effect of window size on three common linguistic measures: type-token ratio, proportion of hapax legomena, and average word length. In addition, TVE can cluster the text fragments according to a user-given set of words by applying principal component analysis (PCA).

Last updated: 29 Dec 2014
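
The three measures TVE explores can be computed per window in a few lines, which makes the effect of window size easy to see by calling the function with different sizes. This is a sketch under assumptions: the function names are hypothetical, the windows are taken as consecutive and non-overlapping, and the hapax proportion is divided by tokens (some definitions divide by types). TVE's exact computations and its PCA step are not reproduced.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def window_measures(tokens, size):
    """Compute three measures for each full, non-overlapping window.

    A trailing fragment shorter than `size` is ignored.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    results = []
    for start in range(0, len(tokens) - size + 1, size):
        window = tokens[start:start + size]
        counts = Counter(window)
        hapaxes = sum(1 for c in counts.values() if c == 1)
        results.append({
            # Type-token ratio: distinct words / total words.
            "ttr": len(counts) / size,
            # Hapax legomena (words occurring once) as a share of tokens.
            "hapax_ratio": hapaxes / size,
            # Mean length of the words, in characters.
            "avg_word_length": sum(len(w) for w in window) / size,
        })
    return results


if __name__ == "__main__":
    tokens = tokenize("A rose is a rose is a rose")
    for size in (2, 4):
        print(size, window_measures(tokens, size))
```

In this example the type-token ratio is 1.0 for every window of size 2 but 0.75 for windows of size 4, a small instance of the window-size effect TVE is designed to explore.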

Annotation Studio is an open source, web-based annotation application that integrates a powerful set of textual interpretation tools behind an intuitive and easy-to-use interface. Users can upload their own texts, and annotate with styled text, video, images, and weblinks. To date, the project has been used with great success in disciplines such as Writing, Literature, Foreign Languages, Anthropology, Film and Media Studies, and others at institutions including Harvard, Yale, Stanford, MIT, Barnard College, and Washington University.

Code license: Open source, GNU GPL, GNU GPL v2
Last updated: 29 Dec 2014