Content Analysis

Gephi is graphing software that provides a way to explore data through visualization and network analysis.

Code license: Open source, GNU GPL v3
Last updated: 15 Feb 2017

MyIndicators (http://myindicators.net/) is a digital, easy-to-use tool that allows researchers, educators, students, or anyone else to build their own tailored indicators (e.g. goals, strategies, parameters, surveys, questions, calorie intake, alcohol consumption, or quantified-self measures such as training, mood tracking, or sleep quality).

Code license: Closed source
Last updated: 2 Sep 2016

TXM

TXM is a free and open-source cross-platform Unicode, XML & TEI based text analysis software, supporting Windows, Mac OS X and Linux. It is also available as a J2EE standard compliant portal software (GWT based) for online access with access control built in (see a demo portal: http://portal.textometrie.org/demo).

Code license: Open source, GNU GPL v3
Last updated: 29 Jun 2016

TAToo is an embeddable Flash widget that displays TAPOR analytics for the page on which it resides.

Code license: Apache License
Last updated: 23 Feb 2016

Philomine is an extension to the Philologic text retrieval engine that supports a variety of machine learning, text mining, and document clustering tasks.

Code license: Open source, GNU GPL
Last updated: 22 Feb 2016

A graphical user interface tool for Latent Dirichlet Allocation topic modeling.

Last updated: 17 Feb 2016

Superfastmatch is designed to find exact duplicates of text strings between documents.

Code license: Open source, GNU GPL
Last updated: 1 Dec 2015

Unlock Text is a powerful geoparser that can search text hosted on the web in txt or html format for references to locations. These locations are then returned ready for use in your results page, web map or any other application.

The Unlock Text API provides access to two parsers, the Edinburgh Geoparser from the Edinburgh Language Technology Group and the CLAVIN parser.

Code license: Open source
Last updated: 19 Nov 2015

CulturalAnalytics is an R package containing functions for statistical analysis and plotting of image properties, including statistics such as the standard deviation and mean in the RGB and HSV color spaces, image entropy and histograms in greyscale (intensity) and color, and for plotting color clouds and image scatter charts.

Code license: Open source, GNU GPL
Last updated: 12 Nov 2015

corpkit is a tool for doing corpus linguistics.

It does a lot of the usual things, like parsing, concordancing and keywording, but also extends their potential significantly: you can concordance by searching for combinations of lexical and grammatical features, and can do keywording of lemmas, of subcorpora compared to corpora, or of words in certain positions within clauses.

Corpus interrogations can be quickly edited and visualised in complex ways, or saved and loaded within projects, or exported to formats that can be handled by other tools.

Code license: MIT License
Last updated: 30 Oct 2015

NVivo is commercial software for qualitative analysis of unstructured data in a range of formats and from diverse sources. It enables users to collect, organize, and analyze content from interviews, focus group discussions, surveys, audio, social media, videos, and webpages.

Code license: Closed source
Last updated: 30 Oct 2015

Aimed at the TEI editing community and intended to be run inside oXygen, the Data Dictionary Generator (DDG) generates profiles of every element and attribute appearing in a TEI file. Each entry includes a definition from the TEI Guidelines, a local, project-specific definition (if provided), and a brief snapshot of how the element or attribute is actually being used. By making it easy to compare these three things, the DDG aims to help project editors reflect on current practice within their projects and quickly create stronger encoding guidelines for their collaborators.

Last updated: 2 Oct 2015

nodegoat is a web-based data management, analysis & visualisation environment.

Using nodegoat, you can define, create, update, query, and manage any number of datasets through a graphical user interface. Your custom data model autoconfigures the backbone of nodegoat's core functionalities.

Code license: Closed source
Last updated: 17 Aug 2015

Bibliopedia will perform advanced data-mining and cross-referencing of scholarly literature to create a humanities-centered collaboratory. As a prototype, it will search resources including JSTOR and Library of Congress for metadata about scholarly articles and books that mention the famed medieval travel narrative The Travels of Sir John Mandeville, examine the articles and books for citations, then save the results in a publicly accessible database.

Code license: Open source
Last updated: 2 Jul 2015

This product can filter or format text-based content. It also includes a document and link organiser and search capabilities, so it might more correctly be termed a text management system. If you store a large number of documents on your computer and use many online links, the application helps you navigate them more easily. Although the feature set is now well developed, an inexperienced user should still be able to use it relatively easily; it is not intended only for experts.

Code license: GNU GPL v3
Last updated: 15 Jun 2015

A text-mining system for scientific literature. Textpresso's two major elements are (1) access to full text, so that entire articles can be searched, and (2) introduction of categories of biological concepts and classes that relate to objects (e.g., association, regulation, etc.) or describe one (e.g., methods, etc).

Code license: Open source
Last updated: 28 May 2015

AnSWR supports qualitative analysis of word-based data. This entails a set of methods for organizing, displaying, processing, summarizing, and interpreting information.

Last updated 9/23/2005.

Only available for Windows 2000 and Windows XP.

Last updated: 24 May 2015

Weft QDA is a free and open-source tool for the analysis of textual data. You may import documents from plain text or PDF, apply character-level coding, category and document memos, retrieve coded text, apply simple coding statistics, apply free-text search, and export to HTML and CSV formats.

Last updated: 23 May 2015

HyperRESEARCH enables users to code and retrieve, build theories, and conduct analyses of their data. Users may work with text, graphics, audio, and video sources.

Last updated: 23 May 2015

Qualrus is an innovative qualitative data analysis tool that helps you manage unstructured data. Additionally, Qualrus learns your coding trends, provides a visual semantic network display, and gives advice and technical support.

Last updated: 22 May 2015

The Macro-Etymological Analyzer is a web app for text analysis that will look up every word of your text in the Etymological Wordnet, and generate statistics about the macro-etymology of your text, organized by language family. For instance, it can analyze a novel and tell you the proportions of words of Anglo-Saxon origin, or of Afroasiatic origin. First-generation and second-generation language ancestor data is included, and the output is highly granular, allowing the scholar to see the origins of individual words, and statistics about each ancestor language.

Code license: GNU GPL v3
Last updated: 20 May 2015

A website that explains statistical concepts and provides a web-based environment for performing the related calculations. Tools include a graph maker, distribution generators, t-tests and procedures, and correlation and regression tests. All tools are written in JavaScript and run within the browser.

Code license: Closed source
Last updated: 14 May 2015

AntWordProfiler is free software for analyzing word frequency.

Last updated: 9 May 2015

Cross-platform app for analyzing text, video, and spreadsheet data (analyzing qualitative, quantitative, and mixed methods research)

Last updated: 2 May 2015

ANTHROPAC is a menu-driven DOS program for collecting and analyzing data on cultural domains. The program assists with the collection and analysis of structured qualitative and quantitative data, and provides analytical and multivariate tools.

Last updated: 2 May 2015

Leximancer is text analysis software that can create topic and concept based network visualizations and includes a sentiment analyzer.

Last updated: 2 May 2015

ScraperWiki is an online tool that makes the process of data scraping simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code.

Code license: GPL
Last updated: 1 May 2015

This package allows users to train topic models in MALLET and load results directly into R.

Code license: Open source, MIT License
Last updated: 25 Mar 2015

TAMS Analyzer is a program that works with TAMS to let you assign ethnographic codes to passages of a text just by selecting the relevant text and double-clicking the name of the code in a list. It then allows you to extract, analyze, and save coded information.

Code license: Open source, GNU GPL
Last updated: 24 Mar 2015

AntConc is free concordance software. It is multi-platform and easy to deploy and use.

AntConc is part of a suite of related tools for text processing and analysis, including applications for parallel corpus analysis, word profiling, PDF to text conversion, text structure analysis, detecting and converting character encodings, Japanese and Chinese segmentation and tokenization, word class tagging, and spelling variant analysis. The developer is currently drafting a more explicit licence for the use of the software.

Last updated: 11 Feb 2015

MONK is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study.

Last updated: 29 Dec 2014

The Visual Understanding Environment (VUE) is concept mapping software that can integrate with multiple repositories to pull in, organize, and analyze data. It offers multiple features for advanced management of digital resources for teaching, learning, and research.

Last updated: 29 Dec 2014

The main programs that comprise the Information processor are called the analyst server and the query or knowledge processor. The analyst program can be called from a command line, from an HTML form, or through a TCP/IP socket protocol. The query processor can be accessed with any browser using HTML commands. It analyzes text and allows the user to search it.

Code license: Closed source
Last updated: 29 Dec 2014

Software for creating data dashboards. Many of the sample galleries portray corporate financial data.

Last updated: 29 Dec 2014

Pliny is a scholarly note-taking and annotation tool. It may be used with both digital (web pages, images, PDF files) and non-digital (books, printed articles) materials, run as a desktop application on the user's computer. Pliny is useful for taking and managing annotations and notes while reading, as well as subsequently developing and presenting an interpretation.

Last updated: 29 Dec 2014

Voyant Tools is a web-based reading and analysis environment for digital texts.

Code license: Open source
Last updated: 29 Dec 2014

Korbo is a powerful aggregation platform for gathering Linked Data objects relevant to your area of research into single workspaces or “baskets”.

Korbo is targeted primarily at developers who want to build applications on top of its API and make full use of the linked cultural data from sources such as Europeana, FreeBase and DBPedia.

Korbo is currently in the early stages of development, but you can already try out a demo version of the platform.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

Umigon is a free tool for sentiment analysis on Twitter.

Main features:

  1. Export to Excel and CSV
  2. Distinction between sentiments ("I hate war" will be classified as negative sentiment) and negative factuals ("war has been declared" will be classified as neutral)
  3. Connects to Twitter or allows free text input



The developer of Umigon can be reached on Twitter.

Code license: Apache License
Last updated: 29 Dec 2014