Text

What kind of data should the tool work with?

HEURIST (http://HeuristNetwork.org) is an extremely flexible, end-user oriented, web-based data management system designed specifically for Humanities data. Developed since 2005, it has been in active use across many projects since 2009. It is available both as a free web service for researchers (hosted at the University of Sydney Data Centre) and for installation on a physical or virtual server (Open Source on GitHub).

Researchers can design, create, manage, analyse, visualise and publish their own richly-structured databases through a simple web interface, without the need for a programmer. Quite complex databases can be built in a few hours by borrowing structures and vocabularies published by other users. Databases can be designed and built incrementally, as existing data are not affected by changes in structure. Databases created by Heurist are stored in MySQL with a repeatable structure facilitating independent access by other software.

Advanced features include record linking, graph structure, drill-down facet searches, rule-based queries, custom reports, linked map-timelines, network visualisation, normalised spreadsheet import, crosstabulation, XML feeds and XSLT transforms. The team provides initial email and Skype assistance for project setup at no cost, and special customisations at modest cost.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 13 Oct 2017

DiscoverText allows users to import data from a variety of sources (including free and premium Gnip Twitter feeds, plain text, Word, Excel, public YouTube comments, blogs/wikis, PDF, etc.), to view, search, filter, deduplicate, code and machine classify the data. This is a collaborative, web-based platform widely used by academics.

Code license: Closed source
Last updated: 24 Feb 2017

Gephi is graphing software that provides a way to explore data through visualization and network analysis.

Code license: Open source, GNU GPL v3
Last updated: 15 Feb 2017
Code license: Creative Commons
Last updated: 10 Jan 2017

Recogito is an online platform for collaborative document annotation.

Recogito provides a personal workspace where you can upload, collect and organize your source materials - texts and images - and collaborate in their annotation and interpretation. Recogito enables you to make your work more visible on the Web more easily, and to expose the results of your research as Open Data.

Code license: Open source, Apache License
Last updated: 21 Dec 2016

Jotform allows users to create web forms (for surveys, etc.) using a drag-and-drop interface.

Code license: Closed source
Last updated: 10 Aug 2016

EPPT allows users to encode image-based scholarly editions without having to know XML syntax. It automates or semi-automates repeating attributes, and provides templates to reduce errors and accelerate the encoding process.

Last updated: 9 Aug 2016

Scripto is an open-source tool for community transcription of documents, images, and multimedia files. Registered users are permitted to view digital files and transcribe them with an easy-to-use toolbar. The tool includes a versioning history and editorial controls to make public contributions more manageable, and supports the transcription of a wide range of file types.

Code license: Open source
Last updated: 11 Jul 2016

Recollection is a platform developed by Zepheira for the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP), allowing users to create and share embeddable interfaces to digital cultural heritage collections. The Library of Congress released its latest version of Recollection as Viewshare, built to increase the ease of finding, using, and sharing the project's software.

Code license: Open source, MIT License
Last updated: 6 Jul 2016

TXM

TXM is a free and open-source cross-platform Unicode, XML & TEI based text analysis software, supporting Windows, Mac OS X and Linux. It is also available as a J2EE standard compliant portal software (GWT based) for online access with access control built in (see a demo portal: http://portal.textometrie.org/demo).

Code license: Open source, GNU GPL v3
Last updated: 29 Jun 2016

IBM AeroText is an information extraction system for developing knowledge-based content analysis applications.

Last updated: 15 Jun 2016

The DataTank is an open source tool that publishes data stored in text-based files (e.g., CSV, XML, JSON) or in binary structures (e.g., SHP files, relational databases). The DataTank reads data from these structures and publishes them to the web using a URI as an identifier, providing these data in any format a user wants regardless of the original data structure. The DataTank requires a server with Apache2 or Nginx, mod_rewrite enabled, PHP 5.4 or higher, Git, and any database supported by Laravel 4.

Last updated: 7 Jun 2016

Quadrigram describes itself as a "visual programming environment" for living data. It is a web-based tool for data visualization that allows the user to customize and publish interactive visualizations with a range of data types. Visualization possibilities range from basic charts and graphs (e.g., pie chart, bar graph), to more sophisticated visualizations for exploring complex datasets (e.g., networks, geo-data, zoomable tree map, quadrification, stacked flow).

Code license: Closed source
Last updated: 22 May 2016

The HathiTrust Research Center (HTRC) provides research access to the public domain text of the HathiTrust Digital Library. The HTRC is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Library, to help meet the technical challenges of dealing with massive amounts of digital text that researchers face by developing cutting-edge software tools and cyberinfrastructure to enable advanced computational access to the growing digital record of human knowledge.

Last updated: 22 May 2016

An optical character recognition (OCR) engine for creating editable and searchable electronic files from scanned paper documents, PDFs and digital photographs.
Features:

  • Recognition of Digital Camera and Mobile Phone Camera Images
  • Comprehensive Language Support
  • Complete Integration with Popular Office Applications
  • PDF conversion, archiving and security
Code license: Closed source
Last updated: 17 May 2016

Part-of-Speech (POS) tagging software for English. POS tagging, also known as wordclass tagging, is the classification of words into one or more categories based upon their definition, relationship with other words, or other context. CLAWS (Constituent Likelihood Automatic Word-tagging System) uses several methods to identify parts of speech, most notably Hidden Markov models (HMMs), which involve counting examples of co-occurrence of words and wordclasses in training data and making a table of the probabilities of certain sequences of words.
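
The counting step described above can be illustrated with a minimal sketch. This is not CLAWS code: the function name `train_tag_tables`, the `<START>` marker, the tag labels and the three training sentences are all hypothetical, chosen only to show how co-occurrence counts become probability tables.

```python
from collections import Counter, defaultdict


def train_tag_tables(tagged_sentences):
    """Count tag-to-tag and tag-to-word co-occurrences, then normalise
    the counts into probability tables (hypothetical helper, not CLAWS)."""
    transitions = defaultdict(Counter)  # previous tag -> next tag counts
    emissions = defaultdict(Counter)    # tag -> word counts
    for sentence in tagged_sentences:
        previous = "<START>"            # assumed sentence-start marker
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word.lower()] += 1
            previous = tag

    def normalise(table):
        # Turn each row of raw counts into relative frequencies.
        return {
            key: {item: n / sum(counts.values()) for item, n in counts.items()}
            for key, counts in table.items()
        }

    return normalise(transitions), normalise(emissions)


# Toy training data with invented tags.
training = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
trans_p, emit_p = train_tag_tables(training)
print(trans_p["DET"])   # probability of each tag following DET
print(emit_p["NOUN"])   # probability of each word given NOUN
```

A full tagger would combine these two tables to score candidate tag sequences for an unseen sentence; that decoding step is left out here.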

Code license: Closed source
Last updated: 3 May 2016

DM is an environment for the study and annotation of images and texts. It is a suite of tools, enabling scholars to gather and organize the evidence necessary to support arguments based in digitized resources. DM enables users to mark fragments of interest in manuscripts, print materials, photographs, etc. and provide commentary on these resources and the relationships among them.

Last updated: 1 May 2016

Text 2 Mind Map is a web-based tool for mind mapping. Very basic interface and functionality. Requires users to structure information in a linear text outline, which it returns as a diagram.

Code license: Closed source
Last updated: 22 Mar 2016

AroniSmartIntelligence™ is an application that performs text analytics on RSS articles, reviews, feedback, chat data or other unstructured texts organized into sub-folders. The output may be further input into other advanced statistical analytics or data mining modules available in AroniSmartIntelligence™, including regression analysis, econometrics, segmentation and Bayesian models.

Code license: Closed source
Last updated: 18 Mar 2016

CloudConvert supports the conversion between more than 200 different audio, video, document, ebook, archive, image, spreadsheet and presentation formats.

The CloudConvert API offers the full functionality of CloudConvert and makes it possible to use the conversion services in your own applications.

Code license: Closed source
Last updated: 10 Mar 2016

Overview is a tool for analyzing large sets of documents. It includes a sophisticated search engine, word clouds, entity detection, and topic-based document clustering. If that’s not good enough, you can write your own plugins using the API. It is open source and you can run it on your own computer.

It was originally designed for investigative journalists, but it’s now also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.

Code license: Open source
Last updated: 9 Mar 2016

A software application that enables a user to search, manipulate and publish large SGML/XML documents. Anastasia was developed within an academic context to enable the manipulation of a single large mark-up document or a set of documents. It utilises two methods to interpret the structure of a mark-up document: first, it uses pattern-matching algorithms to process a hierarchical tree, similar to other XML software applications; second, it interprets the document structure as a series of sequential 'events' which must be processed.

Code license: Open source, GNU GPL
Last updated: 23 Feb 2016

TAToo is an embeddable Flash widget that displays TAPOR analytics for the page on which it resides.

Code license: Apache License
Last updated: 23 Feb 2016

Philomine is an extension to the Philologic text retrieval engine that supports a variety of machine learning, text mining, and document clustering tasks.

Code license: Open source, GNU GPL
Last updated: 22 Feb 2016

PhiloLine is an add-on for the Philologic text retrieval engine that provides a sequence alignment algorithm for humanities text analysis designed to identify "similar passages" in large collections of texts.

Code license: Open source, GNU GPL
Last updated: 22 Feb 2016

A graphical user interface tool for Latent Dirichlet Allocation topic modeling.

Last updated: 17 Feb 2016

Sigil is a free, open source, multi-platform e-book editor, designed for editing books in EPUB format.

  • Full UTF-16 support and full EPUB 2 spec support
  • Multiple views: code view (complete control over directly editing EPUB syntax), book view (WYSIWYG), and preview view
  • Table of contents generator, metadata editor, multi-language user interface, spell checking tool, EPUB compliance validator, support for find and replace
Code license: Open source, GNU GPL v3
Last updated: 3 Feb 2016

Combined with the Leptonica Image Processing Library, Tesseract can read a wide variety of image formats and convert them to text in over 40 languages.

This code is a raw OCR engine. It has no output formatting and no UI. It can detect fixed pitch vs proportional text. Nevertheless in 1995 this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows. Training code is included in the open source release.

The core developer on the project is Ray Smith (theraysmith).

Code license: Open source, Apache License
Last updated: 27 Jan 2016

A Python-based XML web publishing framework which enables dynamic pipelining of XSLT transformations. Data is processed by an XML pipeline composed of several WSGI applications and middleware components.

Features:

  • Apache Cocoon Sitemap 1.0 compatible
  • WSGI modularity
  • URI pattern matching
Code license: Open source, GNU GPL
Last updated: 26 Jan 2016

Google Docs is an online environment for editing and sharing documents, spreadsheets, presentations, forms, drawings, and tables. Google Docs documents can be public or private, or shared with anyone with a Google account, e-mailed, or downloaded in various formats, including conversions to PDF and other formats not identical to the original or to the proprietary format used at creation. Designated people with whom items are shared can be given permission to comment or edit the files, thus providing a quick way to collaborate on creating and editing documents and presentations.

Code license: Closed source
Last updated: 26 Jan 2016

Figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. Users of the site maintain full control over the management of their research whilst benefiting from global access, version control and secure backups in the cloud.

Code license: Closed source
Last updated: 29 Dec 2015

A free iOS app for text analysis. Textal allows you to analyze documents, tweet streams, and webpages. Create clickable text clouds based on the source data that you choose. It comes pre-loaded with a large number of public domain texts. Text clouds are easily shareable via Twitter and email.

Last updated: 18 Dec 2015

Superfastmatch is designed to find exact duplicates of text strings between documents.
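
One common way to find exact duplicate strings between documents is to compare overlapping word windows ("shingles"). The sketch below illustrates that general idea only; it is an assumption for illustration and does not reproduce Superfastmatch's own algorithm or API. The function names and the window size of four words are hypothetical.

```python
import re


def shingles(text, size):
    """Return the set of overlapping word windows of length `size`."""
    tokens = re.findall(r"\w+", text.lower())
    return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def shared_passages(doc_a, doc_b, size=4):
    """List the word windows that appear verbatim in both documents."""
    return sorted(shingles(doc_a, size) & shingles(doc_b, size))


doc_a = "We hold these truths to be self evident"
doc_b = "They said we hold these truths to be obvious"
for passage in shared_passages(doc_a, doc_b):
    print(passage)
```

Larger window sizes reduce accidental matches on common phrases, at the cost of missing short shared strings.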

Code license: Open source, GNU GPL
Last updated: 1 Dec 2015

Unlock Text is a powerful geoparser that can search text hosted on the web in txt or html format for references to locations. These locations are then returned ready for use in your results page, web map or any other application.

The Unlock Text API provides access to two parsers, the Edinburgh Geoparser from the Edinburgh Language Technology Group and the CLAVIN parser.

Code license: Open source
Last updated: 19 Nov 2015

corpkit is a tool for doing corpus linguistics.

It does a lot of the usual things, like parsing, concordancing and keywording, but also extends their potential significantly: you can concordance by searching for combinations of lexical and grammatical features, and can do keywording of lemmas, of subcorpora compared to corpora, or of words in certain positions within clauses.

Corpus interrogations can be quickly edited and visualised in complex ways, or saved and loaded within projects, or exported to formats that can be handled by other tools.

Code license: MIT License
Last updated: 30 Oct 2015

NVivo is commercial software for qualitative analysis of unstructured data, in a range of formats and from diverse sources. Enables users to collect, organize, and analyze content from interviews, focus group discussions, surveys, audio, social media, videos, and webpages.

Code license: Closed source
Last updated: 30 Oct 2015

Aimed at the TEI editing community and intended to be run inside oXygen, the Data Dictionary Generator (DDG) generates profiles of every element and attribute appearing in a TEI file. Each entry includes a definition from the TEI Guidelines, a local, project-specific definition (if provided), and a brief snapshot of how the element or attribute is actually being used. By making it easy to compare these three things, the DDG aims to help project editors reflect on current practice within their projects and quickly create stronger encoding guidelines for their collaborators.

Last updated: 2 Oct 2015

A cross-platform XML editor that may be used to create and validate XML documents and associated schema. It fully supports XSL (both XSLT and FO), DTD, Schema (Relax NG and W3C), Database, XQuery and CSS. oXygen XML Editor works with all XML-based technologies, including XML databases, XProc pipelines, and web services, and comes with ready-to-use DITA, DocBook, TEI, and XHTML support.

Frequently updated and supported, and with a very large set of features, this software tool has proved popular with digital humanists.

Code license: Closed source
Last updated: 10 Sep 2015

Joomla is an open source content management system (CMS), enabling users to build websites and applications.

Code license: Open source, GNU GPL v2
Last updated: 8 Sep 2015

Captivate is software for recording audio and video of a user's screen. Users can import PowerPoint slides and add rich media, simulations, and quizzes, and publish them to learning management systems that support the SCORM standard.

Last updated: 8 Sep 2015

CONTENTdm is digital collection management software that allows for the upload, description, management, and access of digital collections. CONTENTdm is mostly used by libraries, archives, museums, government agencies, universities, corporations, historical societies, and other organizations that wish to host a digital collection.

  • Collection storage, management, and delivery to users across the web, in any format (e.g., local history archives, newspapers, books, maps, slide libraries, audio, video)
Code license: Closed source
Last updated: 7 Sep 2015

Stanza allows you to read books on your iPhone, iPod Touch and iPad. Stanza supports HTML, PDF, Microsoft Word, and Rich Text Format reading, as well as all the major eBook standards. It's also a very open reader: developers can add new formats through the program's API, meaning it will always be able to handle the latest releases.

Last updated: 22 Aug 2015

DEVONthink is a database that helps users organize, manage and collaborate on digital files, including Office files, links, e-mails, research data and PDFs.

Code license: Closed source
Last updated: 10 Aug 2015

DH Press (originally called diPH) is a toolkit conceived as an easy-to-use WordPress plugin which allows potentially every kind of user to visualise and mash up historic and geographic information, documents and various types of multimedia content to develop digital humanities projects.

Code license: Open source
Last updated: 10 Aug 2015

Plone is a powerful, flexible, open source Content Management System (CMS) built on top of Zope application server and CMF.
Features:

  • Flexible and adaptable workflow
  • Customisable
  • Free add-ons
  • Versioning, history and reverting content
  • Support for multiple mark up formats
  • Multilingual content management
  • RSS feed support
  • WebDAV and FTP support
  • WYSIWYG
  • Integrates with Active Directory, Salesforce, LDAP, SQL, Web Services and Oracle
Code license: Open source, GNU GPL, GNU GPL v2
Last updated: 7 Aug 2015

CiteULike is a free service to help you to store, organise and share the scholarly papers you are reading. When you see a paper on the web that interests you, you can click one button and have it added to your personal library. CiteULike automatically extracts the citation details, so there's no need to type them in yourself. It all works from within your web browser so you can access it from any computer with an Internet connection. CiteULike supports annotation and rating of items, and upload of attachments (e.g. PDF file). (Attachments are only accessible privately by individual users).

Code license: GNU GPL
Last updated: 5 Aug 2015

VisualEyes is a web-based authoring tool developed at the University of Virginia to weave images, maps, charts, video, and data into highly interactive and compelling dynamic visualizations.

Code license: Open source
Last updated: 3 Aug 2015

Omeka is a content management system designed for the display of library, museum, archives, and scholarly collections and exhibitions.

Code license: Open source, GNU GPL
Last updated: 2 Aug 2015

Xendo is an online research tool that provides unified search across cloud-based storage (such as Dropbox, Evernote, Google Drive, OneDrive), email (such as Gmail, Office 365) and other services such as Slack, Trello and Asana (25 integrations to date). Xendo offers advanced search capabilities such as proximity searching (looking for a term or phrase within a number of words of a second term or phrase). Xendo uses OCR (Optical Character Recognition) to make scanned documents searchable. For some services, Xendo offers a content preview to speed research.
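
Proximity searching as described above can be sketched in a few lines. This is a generic illustration, not Xendo's implementation: the function name `proximity_match` is hypothetical, and the sketch is simplified to single-word terms rather than phrases.

```python
import re


def proximity_match(text, first, second, max_distance):
    """Return True if `first` and `second` occur within `max_distance`
    words of each other (single-word terms only, as a simplification)."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


note = "The scanned invoice was filed under the budget folder"
print(proximity_match(note, "invoice", "budget", 5))
print(proximity_match(note, "invoice", "budget", 3))
```

In the example, "invoice" and "budget" are five words apart, so the first call matches and the second does not.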

Last updated: 2 Aug 2015

Citavi's core features are reference management, knowledge organization, and task planning.

Code license: Closed source
Last updated: 2 Aug 2015

Perl is a high-level, general-purpose, interpreted, dynamic programming language. Originally developed for text manipulation, it is now used for a wide range of tasks including graphics programming, system administration, network programming, applications that require database access and CGI programming on the Web.

Features:

  • Borrows features from C, shell scripting (sh), AWK, and sed
  • Powerful text processing facilities
  • Flexibility and adaptability
  • Support for multiple programming paradigms
Code license: Open source, GNU GPL
Last updated: 2 Aug 2015

Zenodo builds and operates a simple and innovative service that enables researchers, scientists, EU projects and institutions to share, preserve and showcase multidisciplinary research results (data and publications) that are not part of the existing institutional or subject-based repositories of the research communities.

Code license: GNU GPL
Last updated: 2 Aug 2015

Editors' Notes is an open-source, web-based tool for recording, organizing, preserving, and opening access to research notes, built with the needs of documentary editing projects, archives, and library special collections in mind.

Code license: Open source
Last updated: 8 Jul 2015

Cirilo is an application developed for content preservation and data curation in FEDORA-based repository systems. Content preservation and data curation include object creation and management, versioning, normalization and standards, and the choice of data formats. The client offers functionalities which are especially prone to be used as tools for mass operations on FEDORA objects, such as ingest or replacement processes.

Code license: Open source
Last updated: 8 Jul 2015

Microsoft OneNote is a digital notebook that allows you to gather notes and information in a central environment, and search across your shared notebooks to better manage information and work with others. OneNote used to be available as paid software, but is now free across platforms.

Code license: Closed source
Last updated: 5 Jul 2015

The ‘Stylo’ package provides easy-to-use implementations of various established analyses in the field of computational stylistics, including non-traditional authorship attribution, genre recognition, style development (“stylochronometry”), etc. The package includes a number of explanatory methods provided by the function stylo() (multidimensional scaling, principal component analysis, cluster analysis, bootstrap consensus trees).

Last updated: 16 Jun 2015

The Open Science Framework (OSF) is a free, open source tool designed to help researchers manage the entire research workflow: planning, execution, reporting, archiving and discovery. It is part collaboration software and part version control system. The OSF can be used to manage individual projects or large collaborative ones. Privacy and sharing settings allow for fine-grained control over access to files and materials stored on the platform - share privately with collaborators or publicly with the community at large.

Code license: Apache License
Last updated: 14 Jun 2015

Coggle is a web-based tool for non-linear structuring and visualization of information. Easy to create visually appealing diagrams with little to no technical expertise. Supports Markdown and LaTeX formatting (use LaTeX via the \\( \\) or \\[ \\] escape sequences). Users can add images by dragging and dropping them in the browser, view change history for each diagram and revert to previous states, and download their work as PDFs or images. Also enables real-time collaboration with others.

Code license: Closed source
Last updated: 9 Jun 2015

A text-mining system for scientific literature. Textpresso's two major elements are (1) access to full text, so that entire articles can be searched, and (2) introduction of categories of biological concepts and classes that relate to objects (e.g., association, regulation, etc.) or describe one (e.g., methods, etc).

Code license: Open source
Last updated: 28 May 2015

OxGarage is a web, and RESTful, service to manage the transformation of documents between a variety of formats. The majority of transformations use the Text Encoding Initiative format as a pivot format.

OxGarage is based on the Enrich Garage Engine developed by Poznan Supercomputing and Networking Center and Oxford University Computing Services for the ENRICH project.

See the conversion matrix for details.

Code license: Open source
Last updated: 27 May 2015

Tesla is a virtual research environment for text engineering - a framework you can use to create experiments in corpus linguistics, and to develop new algorithms for natural language processing. Tesla is a client-server application, which can be used by individual researchers as well as by workgroups.

Last updated: 24 May 2015

Developed at Indiana University, Event Structure Analysis is made up of three components: Ethno, prerequisite analysis, and composition analysis. Ethno is an on-line Java program that helps you analyze sequential events; prerequisite analysis produces a diagram showing how events are connected; composition analysis involves coding agent, action, object, and other characteristics of each event.

Last updated: 24 May 2015

AnSWR supports qualitative analysis of word-based data. This entails a set of methods for organizing, displaying, processing, summarizing, and interpreting information.

The software itself was last updated 9/23/2005.

Only available for Windows 2000 and Windows XP.

Last updated: 24 May 2015

Find searches that correlate with real-world data: Google Correlate finds search patterns which correspond with real-world trends.

Last updated: 24 May 2015

Whatizit can ingest up to 500,000 terms pasted into the input box and execute any of the pre-defined text analysis pipelines.

Last updated: 23 May 2015

Weft QDA is a free and open-source tool for the analysis of textual data. You may import documents from plain text or PDF, apply character-level coding, category and document memos, retrieve coded text, apply simple coding statistics, apply free-text search, and export to HTML and CSV formats.

Last updated: 23 May 2015

HyperRESEARCH enables users to code and retrieve, build theories, and conduct analyses of their data. Users may work with text, graphics, audio and video sources.

Last updated: 23 May 2015

WordSmith allows users to develop concordances, find keywords, and develop word lists from plain text files.
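
The two basic outputs mentioned above, a concordance and a word list, can be approximated with a short sketch. This is not WordSmith's code or file format; the function names, the tuple layout and the context width are hypothetical choices for illustration.

```python
import re
from collections import Counter


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit,
    using `width` words of context on each side."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def word_list(text):
    """Return words with their frequencies, most frequent first."""
    return Counter(re.findall(r"\w+", text.lower())).most_common()


sample = "The whale surfaced. Then the whale dived again."
for left, hit, right in concordance(sample, "whale", width=2):
    print(f"{left:>10} [{hit}] {right}")
print(word_list(sample)[:2])
```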

Last updated: 22 May 2015

Qualrus is an innovative qualitative data analysis tool that helps you manage unstructured data. Additionally, Qualrus learns your coding trends, provides a visual semantic network display, and gives advice and technical support.

Last updated: 22 May 2015

The Macro-Etymological Analyzer is a web app for text analysis that will look up every word of your text in the Etymological Wordnet, and generate statistics about the macro-etymology of your text, organized by language family. For instance, it can analyze a novel and tell you the proportions of words of Anglo-Saxon origin, or of Afroasiatic origin. First-generation and second-generation language ancestor data is included, and the output is highly granular, allowing the scholar to see the origins of individual words, and statistics about each ancestor language.

Code license: GNU GPL v3
Last updated: 20 May 2015

Diction analyzes texts for language indicating certainty, activity, optimism, realism, and commonality.

Last updated: 19 May 2015

Cucumber lets software development teams describe how software should behave in plain text. The text is written in a business-readable domain-specific language and serves as documentation, automated tests and development-aid - all rolled into one format.

Last updated: 19 May 2015

Lexos is an online tool that enables you to "scrub" (clean) your text(s), cut a text(s) into various size chunks, manage chunks and chunk sets, and choose from a suite of analysis tools for investigating those texts. Functionality includes building dendrograms, making graphs of rolling averages of word frequencies or ratios of words or letters, and playing with visualizations of word frequencies including word clouds and bubble visualizations.
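
A rolling average of a word's frequency, one of the analyses mentioned above, can be sketched as follows. This is a generic illustration and not Lexos code: the function name `rolling_frequency` and the token-based window are assumptions.

```python
import re


def rolling_frequency(text, word, window):
    """Return the share of tokens equal to `word` in each sliding
    window of `window` consecutive tokens."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [1 if token == word.lower() else 0 for token in tokens]
    if window <= 0 or window > len(hits):
        return []
    current = sum(hits[:window])
    averages = [current / window]
    for i in range(window, len(hits)):
        # Slide the window: add the new token, drop the oldest one.
        current += hits[i] - hits[i - window]
        averages.append(current / window)
    return averages


print(rolling_frequency("a b a a b b a b", "a", window=4))
```

Plotting the returned list against token position shows where in a text a word clusters.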

Code license: Open source
Last updated: 17 May 2015

RSiena is a package for the R language that enables the statistical analysis of network data, including longitudinal network data, longitudinal data of networks and behavior, and cross-sectional network data. It provides the same functionality available in SIENA (Simulation Investigation for Empirical Network Analysis), Windows software which is no longer maintained.

Code license: Open source, GNU GPL v2
Last updated: 13 May 2015

AntWordProfiler is free software for analyzing word frequency.

Last updated: 9 May 2015

Tumblr is a blogging/microblogging platform with a focus on data sharing between individual blogs. Users can create and disseminate data in a visual or HTML editor, using standard Tumblr posting formats: text, photo, quote, link, chat, audio, and video.

Built-in customization tools allow users to manipulate the appearance of their blog with little knowledge of web development. Tumblr also provides a CSS/HTML customization panel for more advanced users, including theme documentation and the ability to upload/alter theme asset files (e.g., CSS, JS).

Code license: Open source
Last updated: 9 May 2015

TypePad is a commercial, fully hosted blogging platform. Provides a library of customizable blog designs.

Code license: Closed source
Last updated: 9 May 2015

Greenstone is a suite of software for building and distributing digital library collections. It also allows users to publish to the internet or CD-ROM. Software interface and documentation available in English, French, Spanish, Russian, and Kazakh.

Code license: Open source, GNU GPL
Last updated: 8 May 2015

Juxta is an open-source cross-platform desktop tool for comparing and collating multiple witnesses to a single textual work. The software allows you to set any of the witnesses as the base text, to add or remove witness texts, to switch the base text at will, and to annotate Juxta-revealed comparisons and save the results. New in version 1.6.5 is the ability to upload your comparison sets to a free online workspace called Juxta Commons where you can analyze your data privately or choose to share visualizations of your work with anyone on the web.

Code license: Open source, Creative Commons
Last updated: 4 May 2015

Quirkos is text analysis software aimed at beginners in qualitative research, using live visualizations as the interface. It supports standard code-and-retrieve operations, searches and queries on the data, and can visualize connections between topics and themes.

Find more information at http://www.quirkos.com/qualitative-data-analysis-software.html

Code license: Closed source
Last updated: 3 May 2015

LiveJournal is a community publishing platform, with features characteristic of both blogging and social networking platforms. The site is longstanding, originally established in 1999 as a blogging platform and online community built around personal journals. Today it comprises more than 50 million journals, with topical focuses such as politics, entertainment, fashion, literature, and design.

Code license: Open source
Last updated: 2 May 2015

Cross-platform app for analyzing text, video, and spreadsheet data (analyzing qualitative, quantitative, and mixed methods research)

Last updated: 2 May 2015

Linguistic Inquiry and Word Count is a text analysis software program that calculates the degree to which people use different categories of words across a wide array of texts.

Last updated: 2 May 2015

Evernote is note-taking software in the cloud, with options for private and shared notebooks. Users can take text notes, and upload files to attach them to notes. Evernote has built-in OCR for images with printed or handwritten text. A premium account allows access to notebooks offline, as well as more storage and embedded PDF search.

Code license: Closed source
Last updated: 2 May 2015

ANTHROPAC is a menu-driven DOS program for collecting and analyzing data on cultural domains. The program assists with the collection and analysis of structured qualitative and quantitative data, and provides analytical and multivariate tools.

Last updated: 2 May 2015

Leximancer is text analysis software that can create topic and concept based network visualizations and includes a sentiment analyzer.

Last updated: 2 May 2015

Sophie is an electronic tool for authoring, collaborating, reading, and publishing rich media documents in networked environments. Built in Java it runs on a variety of platforms.

It supports neither the EPUB nor the MOBI format, instead using its own internal format.

Development of the project seems to have stalled.

Last updated: 1 May 2015

SearchTeam is a collaborative search engine that allows individuals and groups to curate search results in a public or shared SearchSpace.

Code license: Closed source
Last updated: 1 May 2015

After creating a free account, users can submit requests for mining and analyzing JSTOR content. By submitting a query, a user will receive a random sample of 1,000 of JSTOR's 4.6 million documents; more documents can be received by contacting JSTOR directly. Users can choose to receive the following results:

  • Citations Only (all requests come with citations by default)
  • Word Counts
  • Bigrams
  • Trigrams
  • Quadgrams
  • Key Terms
  • References
Last updated: 29 Apr 2015
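
For readers unfamiliar with the result types listed above, bigrams, trigrams and quadgrams are counts of contiguous two-, three- and four-word sequences. The following is a minimal Python sketch of that idea only; it is not JSTOR's implementation, and the function name `ngram_counts`, the tokenization rule and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter
import re

def ngram_counts(text, n):
    """Count contiguous n-word sequences in lowercased text (hypothetical helper)."""
    # Simple tokenization assumption: runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Zip n staggered copies of the word list to form the n-grams.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

sample = "the cat sat on the mat and the cat slept"
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
print(bigrams.most_common(1))  # [('the cat', 2)]
```

Passing `n=4` to the same function would give quadgram counts.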

Importing, transforming, storing and indexing data should be easy.

Catmandu provides a suite of Perl modules to ease the import, storage, retrieval, export and transformation of metadata records. Combine Catmandu modules with web application frameworks such as PSGI/Plack, document stores such as MongoDB and full text indexes such as Solr to create a rapid development environment for digital library services such as institutional repositories and search engines.

Code license: GNU GPL v3
Last updated: 22 Apr 2015

Scrivener is software for writing that includes virtual index cards, outlining, version control, import/export options, and scriptwriting features, and provides a management system for notes and documents plus support for document metadata.

It allows documents to be assembled from sub-documents, and exports to ebook formats (EPUB and Kindle/MOBI), TeX and LaTeX, as well as ODF, PDF and Microsoft Word.

A Linux version is in beta, and an iOS version is reportedly under development.

Code license: Closed source
Last updated: 18 Aug 2015

Commentpress is a theme and plugin for WordPress that enables granular public commenting on texts.

Code license: Open source, GNU GPL
Last updated: 6 Apr 2015

CollateX is Java software for collating textual sources, for example, to produce a critical apparatus. As of January 2012 the project was at an early stage of development and lacked thorough documentation.

Code license: GNU GPL v3
Last updated: 25 Mar 2015

Bitext provides multilingual semantic technologies in the field of Text Analytics via API with services like Entity Extraction, Concept Extraction, Sentiment Analysis, and Text Categorisation.

Last updated: 25 Mar 2015

JGAAP is software designed for textual analysis, text categorization, and authorship attribution.

Last updated: 25 Mar 2015

TAMS Analyzer is a program that works with TAMS to let you assign ethnographic codes to passages of a text just by selecting the relevant text and double clicking the name of the code on a list. It then allows you to extract, analyze, and save coded information.

Code license: Open source, GNU GPL
Last updated: 24 Mar 2015

"TextSTAT is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet) and it produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as you want from a particular website and puts them in a TextSTAT-corpus. The new news-reader, too, puts news messages in a TextSTAT-readable corpus file.
TextSTAT reads MS Word and OpenOffice files. No conversion needed, just add the files to your corpus..."

Last updated: 24 Mar 2015

Oracle Database is a powerful and extensive relational database management system (RDBMS). There are restrictions on the free version of the software.
Features:

  • Supports symmetric multiprocessing (SMP)
  • Stores data logically in the form of tablespaces and physically in the form of datafiles
  • Transportable tablespaces
  • Advanced Queuing (AQ)
  • 64-bit database
  • Data Mining Option
Code license: Closed source
Last updated: 22 Mar 2015

TiddlyWiki is a reusable personal web notebook. It allows anyone to create personal hypertext documents that can be published on the Web, and also search and tag content. The developers write, "TiddlyWiki is designed to be non-linear, structuring content with stories, tags, hyperlinks, and other features. You can organise and retrieve your notes in ways that conform to your personal thought patterns, rather than feel chained to one preset organisational structure. You can use TiddlyWiki as a single file that you view and edit through any web browser, whether you are online or offline."

Code license: Open source, BSD
Last updated: 22 Mar 2015

A French-developed Java application that displays the lexical relations of a word in a 3D environment.

Last updated: 22 Mar 2015

Freedity can create an RSS feed from any web page, with the number of feeds and update interval varying based on the tier of the subscription.

Last updated: 5 Mar 2015

VARD 2 is an interactive piece of software produced in Java designed to assist users of historical corpora in dealing with spelling variation, particularly in Early Modern English texts. The tool is intended to be a pre-processor to other corpus linguistic methods such as keyword analysis, collocations and annotation (e.g. POS and semantic tagging), the aim being to improve the accuracy of these tools.

Last updated: 19 Feb 2015

CorpusSearch 2 allows users to construct and search syntactically annotated corpora, including finding and counting lexical and syntactic patterns, correcting systemic errors, and coding linguistic features.

The software is released under Mozilla Public License 1.1 (MPL 1.1).

Code license: Open source
Last updated: 11 Feb 2015

A software tool for performing concordance (the analysis of a set of words within their immediate context) on a body of text. The tool performs full concordance, reading and analysing each and every word in a text. It was initially written for the analysis of English texts, but has since been extended to cater for other Western languages. Limited support is also provided for text in East Asian scripts, such as Chinese and Korean.

Code license: Closed source
Last updated: 11 Feb 2015
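
As an illustration of what a concordance involves, the sketch below builds a keyword-in-context (KWIC) listing: every occurrence of a word shown with a few words of context on either side. It is a minimal Python example written under stated assumptions, not the tool's own method; the function name `kwic`, the `width` parameter and the sample text are hypothetical.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each occurrence of keyword."""
    # Tokenization assumption: words are runs of word characters.
    words = re.findall(r"\w+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

sample = "The whale surfaced. We saw the whale dive, and the Whale was gone."
lines = kwic(sample, "whale", width=2)
for left, word, right in lines:
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12} | {word} | {right}")
```

Matching is case-insensitive here, and context is cut off at the start and end of the text rather than padded.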

AntConc is free concordance software. It is multi-platform and easy to deploy and use.

AntConc is part of a suite of related tools for text processing and analysis, including applications for parallel corpus analysis, word profiling, PDF to text conversion, text structure analysis, detecting and converting character encodings, Japanese and Chinese segmenter and tokenizer, wordclass tagger, and spelling variant analysis. The developer is currently drafting a more explicit licence for the use of the software.

Last updated: 11 Feb 2015

WriteLaTeX is a free service that lets users create, edit and share their scientific ideas easily online using LaTeX, a comprehensive and powerful tool for scientific writing. Users can start projects with quality LaTeX templates for journals, CVs, resumes, papers, presentations, assignments, letters, project reports, and more.

Code license: Closed source
Last updated: 30 Jan 2015

CATMA (Computer Aided Textual Markup & Analysis) is a free, open source markup and analysis tool from the University of Hamburg's Department of Languages, Literature and Media. It incorporates three interactive modules: (1) The tagger enables flexible and individual textual markup and markup editing. (2) The analyzer incorporates a query language and predefined functions. It also includes a query builder that allows users to construct queries from combinations of pre-defined questions while allowing for manual modification for more specific questions.

Code license: GNU GPL v3
Last updated: 29 Dec 2014

960 Grid System is a CSS template that comes with corresponding Acorn, Fireworks, Flash, InDesign, GIMP, Inkscape, Illustrator, OmniGraffle, Photoshop, QuarkXPress, Visio, Exp Design, and printable templates to facilitate different stages of the web design process.

Code license: Open source, GNU GPL, MIT License
Last updated: 29 Dec 2014

A simple word cloud generator with customizable font and color options. Word clouds are generated by pasting text into a box, or by entering the URL of any blog, blog feed, or any other web page that has an Atom or RSS feed.

Code license: Closed source
Last updated: 29 Dec 2014

MONK is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study.

Last updated: 29 Dec 2014

The Visual Understanding Environment (VUE) is concept mapping software that can integrate with multiple repositories to pull in, organize, and analyze data. It offers multiple features for advanced management of digital resources for teaching, learning, and research.

Last updated: 29 Dec 2014

Integrated Content Environment (ICE) was an open source project of the Learning Resources Development (LRD) unit at the University of Southern Queensland. The content management system allowed users to convert content authored in Microsoft Word or OpenOffice.org Writer into self-contained course websites using the IMS format.

The ICE authoring environment enabled:

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

Calibre is a free and open source ebook library management application, including options for syncing to devices and converting between a large number of formats. Calibre also has a built-in e-book editor for EPUB and AZW3 formats.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 29 Dec 2014

QuarkXPress desktop publishing software is commonly used to create page layouts for a variety of print publications such as books, newspapers, magazines, posters and brochures. Similar in function to InDesign, the main differences are Quark's unique features for exporting documents as interactive webpages as well as its widespread use by printers, typesetters and page designers.

Code license: Closed source
Last updated: 29 Dec 2014

The main programs that comprise the Information processor are called the analyst server and query or knowledge processor. The analyst program can be called from a command line, from an html form, or through a TCP/IP socket protocol. The query processor can be accessed with any browser using HTML commands. It analyzes text and allows the user to search it.

Code license: Closed source
Last updated: 29 Dec 2014

Exhibit 3.0 is a publishing framework for large scale data-rich interactive Web pages. The beta version is scalable up to 100k items.

Last updated: 29 Dec 2014

Blogger is simple blog publishing software owned by Google.

Code license: Closed source
Last updated: 29 Dec 2014

"The Virtual Lightbox for Museums and Archives (VLMA) is an educational tool for collecting and reusing in a structured fashion the online contents of museums and archives with visual components. With VLMA, you can browse and search collections, construct personal collections, export these collections to xml or Impress presentation format, annotate them, and share your collections with other VLMA users."

Code license: Open source
Last updated: 29 Dec 2014

A text editor designed for use by software developers and web designers to edit, search, and manipulate text. BBEdit provides native support for several programming and scripting languages. Third party custom modules are available, created by users, to handle languages that are not supported in the native application.

Code license: Closed source
Last updated: 29 Dec 2014

CHET-C, or Chapel Hill Electronic Text-Converter, is a browser-based software tool designed to convert digital texts that employ standard epigraphic conventions such as the Leiden sigla into EpiDoc-compliant XML files.

The tool can be accessed online at http://www.stoa.org/projects/epidoc/stable/chetc-js/chetc.html. Fragments of epigraphic text using standard sigla (e.g. Leiden convention markup) are pasted into the tool and EpiDoc-compliant XML is generated.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

Fedora (Flexible Extensible Digital Object Repository Architecture) was originally developed by researchers at Cornell University as an architecture to store, manage, and access digital content in the form of digital objects. Fedora defines a set of abstractions for expressing digital objects, asserting relationships among digital objects, and linking behaviors to digital objects.

Code license: Open source, Apache License
Last updated: 29 Dec 2014

An integrated citing and note-taking platform for individual or group projects, supporting MLA, APA, Chicago / Turabian and the most common Bluebook forms. It prompts for analysis of source types and is unique in offering teaching support and personal help on any citation. An instructor / librarian view allows the teacher to comment on work in progress, providing just-in-time feedback in context. It archives copies of web pages and PDFs, which can be annotated. A dashboard provides long-term access to a portfolio of work.

Code license: Closed source
Last updated: 29 Dec 2014

MediaWiki is a free software open source wiki package written in PHP, originally for use on Wikipedia and other Wikimedia Foundation projects. It is designed to be run on a large server farm for a website that gets millions of hits per day.

Code license: Open source, GNU GPL, GNU GPL v2
Last updated: 29 Dec 2014

WriteRoom is an alternative to Microsoft Word that removes distractions on your computer while you're writing. WriteRoom is a full-screen writing environment that has certain functions like word count and autosave. WriteRoom for iOS is synced with Dropbox and your iPhone/iPad/iPod touch.

Code license: Closed source
Last updated: 29 Dec 2014

Journler is a daily notebook and entry based information manager. Scholars, teachers, students, writers, and everyday users may use this on a daily basis to integrate their notebook content to other sources of media such as audio and video.

The site has not been updated since 201. It looks like Journler is now available as open source, though the option to purchase is still displayed.

Code license: Open source
Last updated: 29 Dec 2014

Co-ment is a text annotation and collaborative writing tool. Co-ment provides a friendly graphic user interface for text annotation, collaboration and writing texts online.

Code license: GNU Affero GPL v.3
Last updated: 29 Dec 2014

eLaborate is an online work environment in which scholars can upload scans, transcribe and annotate text, and publish the results as an online text edition which is freely available to all users.

Code license: GNU GPL v3
Last updated: 29 Dec 2014

Jarnal is an open-source application for notetaking, sketching, keeping a journal, making a presentation, annotating a document - including pdf - or collaborating using a stylus, mouse or keyboard. It is similar to Microsoft Windows Journal and to the earlier Mimeo whiteboarding and Palm notepad applications.

Code license: GPL
Last updated: 29 Dec 2014

Pliny is a scholarly note-taking and annotation tool. It may be used with both digital (web pages, images, PDF files) and non-digital (books, printed articles) materials, run as a desktop application on the user's computer. Pliny is useful for taking and managing annotations and notes while reading, as well as subsequently developing and presenting an interpretation.

Last updated: 29 Dec 2014

Project Pad is a web-based system for media annotation and collaboration for teaching and learning and scholarly applications. Project Pad provides tools for browsing and working with audio, video, and images from digital repositories. The user may organize and annotate excerpts within their own "online notebook." Available as a standalone web application or set of Sakai tools.

Code license: Open source, GPL
Last updated: 29 Dec 2014

The Annotator allows you to analyze any block of text created by other authors. You may use virtual markers to highlight important passages, questions, thoughts, or add comments.

Last updated: 29 Dec 2014

Mnemomap is a Flash-based interactive search engine that generates a visual "Atomic-Tree", sends your queries to a Query List, and delivers the search results. The Atomic-Tree allows you to improve your query mid-search. The Query List allows you to customize your search query.

Last updated: 29 Dec 2014

Silobreaker is a search engine that aggregates the news from numerous sources and presents the contents in various visualization formats.

Last updated: 29 Dec 2014

Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing also has evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.

Last updated: 29 Dec 2014

LATtice lets you explore and compare texts across entire corpora but also allows you to “drill down” to the level of individual LATs (language action types) to ask exactly what rhetorical categories make texts similar or different.

Last updated: 29 Dec 2014

TEI Boilerplate is a lightweight solution for publishing styled TEI (Text Encoding Initiative) P5 content directly in modern browsers. With TEI Boilerplate, TEI XML files can be served directly to the web without server-side processing or translation to HTML.

Last updated: 29 Dec 2014

A simple and easy tool for creating EPUB, MOBI, and other ebook formats.

Code license: Closed source
Last updated: 29 Dec 2014

Bookworm enables you to graphically explore lexical trends in repositories of digitized texts.

Code license: Open source
Last updated: 29 Dec 2014

Trello is a web-based project management and collaboration tool that allows users to organize projects in a dashboard view, containing one or more project-oriented boards. The dashboard provides a real-time overview of what is being worked on, who is working on what, and overall progress toward project milestones. Useful for organized task management, delegation, communication, and collaboration across teams.

Code license: Closed source
Last updated: 29 Dec 2014

Jekyll is a simple, blog aware, static site generator. It takes a template directory containing raw text files in various formats, runs it through Textile or Markdown and Liquid converters, and creates a complete, static ready-to-publish website suitable for serving with your favorite web server. Jekyll also happens to be the engine behind GitHub Pages, which means you can use Jekyll to host your project’s page, blog, or website from GitHub’s servers for free.

Code license: Open source, MIT License
Last updated: 29 Dec 2014

Pandoc can convert documents in reStructuredText, textile, HTML, or LaTeX formats to a variety of other formats including XHTML, PDF, EPUB, docx, odt, and more.

Code license: Open source
Last updated: 29 Dec 2014

Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).

Code license: Open source, BSD
Last updated: 29 Dec 2014

DSpace is the software of choice for academic, non-profit, and commercial organizations building open digital repositories. It is free and easy to install "out of the box" and completely customizable to fit the needs of any organization.

DSpace preserves and enables easy and open access to all types of digital content including text, images, moving images, mpegs and data sets. DSpace has an active community of developers and is used by thousands of institutions worldwide.

Last updated: 29 Dec 2014

A web application used to build and maintain an archetypal, invisible website format that combines text, image, movie and sound.

Last updated: 29 Dec 2014

WikiPack is a web based personal information organizer and Markdown editor that uses Dropbox for synced storage. Using plain text Markdown files and WikiWords, WikiPack gives information context and links entries together by turning your Markdown pages into a private, password protected wiki. The easy to use Markdown language lets you create and edit your wiki pages without having to learn complex wiki syntax.

Code license: Closed source
Last updated: 29 Dec 2014

Voyant Tools is a web-based reading and analysis environment for digital texts.

Code license: Open source
Last updated: 29 Dec 2014

Available as a web-based service and as an app for iOS, Mac, PC, and Android, Google Drive allows users to create, store, edit, and share files across all their devices. Online and offline file access available. Requires a Google account for use, but allows files from Drive to be shared with non-Google users.

Last updated: 1 Sep 2016

This online tool can be used for a wide variety of annotation tasks, including visualization and collaboration.

brat is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and "interpreted" by a computer. brat also supports the annotation of n-ary associations that can link together any number of other annotations participating in specific roles. brat also implements a number of features relying on natural language processing techniques to support human annotation efforts.

Last updated: 29 Dec 2014

Web-based discussion tool (not a full-fledged learning management system, but you can link to Piazza from your LMS, including Blackboard, Moodle, and Coursera) that allows students to ask questions and interact with instructors and other students in a public space. A wiki-style format enables collaboration in a single space and features a LaTeX editor, highlighted syntax and code blocking. Questions and posts needing immediate action are highlighted and instructors endorse answers to keep the class on track. Anonymous posting encourages every student to participate.

Code license: Closed source
Last updated: 29 Dec 2014

Korbo is a powerful aggregation platform for gathering Linked Data objects relevant to your area of research into single workspaces or “baskets”.

Korbo is targeted primarily at developers who want to build applications on top of its API and make full use of the linked cultural data from sources such as Europeana, FreeBase and DBPedia.

Korbo is currently in the early stages of development, but you can already try out a demo version of the platform.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

Insync extends Google Drive's web functionality to your desktop by integrating with Windows, Mac and Linux platforms. Insync allows for built-in sharing without a browser, multiple account support, on-demand shared file syncing, desktop notifications and more.

Code license: Closed source
Last updated: 29 Dec 2014

A Web-based image annotation tool, built specifically for integration with existing Web pages or portal environments. Annotorious is also available as a plugin to the Annotator Web annotation system by the Open Knowledge Foundation.

Code license: MIT License
Last updated: 29 Dec 2014

Meld is a visual diff and merge tool targeted at developers. Meld helps you compare files, directories, and version controlled projects. It provides two- and three-way comparison of both files and directories, and has support for many popular version control systems.

Code license: Open source, GNU GPL v2
Last updated: 29 Dec 2014

Kaleidoscope is one of the world's best tools for spotting differences in images and text, and now it supports merging of files and folders, too. Kaleidoscope integrates directly with Git, Subversion, Mercurial, and Bazaar to fit perfectly in your workflow.

Last updated: 29 Dec 2014

Participad is a WordPress plugin that allows multiple people to edit the same WP content at the same time. Powered by Etherpad Lite, Participad gives you: notepads for collaborative notetaking; synchronous authoring of any content in the WordPress Dashboard; front-end editing. You can download it from the WordPress plugin repository.

Participad has three modules:

Code license: Open source, GNU GPL v3
Last updated: 29 Dec 2014

The DocScanner app uses a device's built-in camera to scan documents. Features include image optimization, OCR, document type recognition (document, business card, receipt, etc.), autosorting, and ability to upload documents to Evernote, Dropbox, and Google Drive.

Code license: Closed source
Last updated: 29 Dec 2014

nanoc is a Ruby-based "static site generator": it works as a tool that runs on your local computer and compiles documents written in formats such as Markdown, Textile, Haml… into a static web site consisting of simple HTML files, ready for uploading to any web server.

Code license: MIT License
Last updated: 29 Dec 2014

Project management software for sharing files, messages, and task management, including options for daily update emails, and real time document editing.

Code license: Closed source
Last updated: 29 Dec 2014

LitBlitz is a free beta Chrome extension that aims to improve how students and researchers manage their notes for literature reviews, assignment research and more by simplifying PDF management and allowing capture and annotation of document snippets.

LitBlitz v1.0 is currently available as a Google Chrome extension.

LitBlitz, while still available on the Google Chrome store, no longer appears to be under development, and the company URL redirects to a Japanese-language web page.

Last updated: 29 Dec 2014

Digitate is a free application designed for use on the iOS platform, specifically on iPad devices. The application allows scholars and enthusiasts with an interest in the visual and material elements of a cultural artefact to make notes and annotations directly on an image of such an artefact.

For example, a literary scholar might use it to annotate the material or bibliographic elements of a rare text or first edition, while an art historian might do the same on an image of a painting.

Code license: Open source, Creative Commons
Last updated: 29 Dec 2014

From the website: NodeXL is a free, open-source template for Microsoft® Excel® 2007 and 2010 that makes it easy to explore network graphs. With NodeXL, you can enter a network edge list in a worksheet, click a button and see your graph, all in the familiar environment of the Excel window. (http://nodexl.codeplex.com/)

Last updated: 29 Dec 2014

Nomenklatura is a reference data recon server. It is a service that allows users to define and manage lists of canonical entities (e.g. person or organization names) and aliases that connect to one of the canonical entities. This helps to clean up messy data in which a single entity may be referred to by many names. It includes a user interface, an API, and a reconciliation endpoint for OpenRefine for matching data from data sets with the canonical entries.

Code license: Open source
Last updated: 29 Dec 2014

NodeBox is an application for creating 2D graphics and visualizations. It provides a visual and process-based editor for an underlying Python-based analysis and visualisation package. It is developer-described as a generative design app and this really taps into the serendipitous nature of the environment. The user constructs models and can tweak them in real time via the interface and see the resulting changes to the output.
It has been described as being "similar to Processing, but without all the interactivity".

Last updated: 29 Dec 2014

Writefull is a light-weight app that uses data from Google Books (5+ million books) and the Web to improve your writing. It compares small sections of your text to a large data set of writing found online and in Google Books. All you need to do is select a chunk of your text in your browser or text editing software, activate the Writefull popover, and choose one of its five options:

1) check the number of results (how often the chunk appears in Google Books or the Web);

Code license: Closed source
Last updated: 29 Dec 2014

Annotation Studio is an open source, web-based annotation application that integrates a powerful set of textual interpretation tools behind an intuitive and easy-to-use interface. Users can upload their own texts, and annotate with styled text, video, images, and weblinks. To date, the project has been used with great success in disciplines such as Writing, Literature, Foreign Languages, Anthropology, Film and Media Studies, and others at institutions including Harvard, Yale, Stanford, MIT, Barnard College, and Washington University.

Code license: Open source, GNU GPL, GNU GPL v2
Last updated: 29 Dec 2014

NowComment makes it easy to have rich, engaging discussions of online documents no matter how large (or small) your class or collaboration group. It's fast, powerful, and feature-rich: you can sort comments, skim summaries, create assignments, hide comments, highlight with multiple colors and meanings, and much more. Integrates to any LMS via LTI. Used in universities and K12 schools for the past 6 years.

Code license: Closed source
Last updated: 29 Dec 2014

BLLIP Parser (or Charniak-Johnson parser) is a statistical natural language parser for analyzing text to determine its grammatical structure. Grammatical structures are provided in Penn Treebank format.

Code license: Apache License
Last updated: 29 Dec 2014

Umigon is a free tool for sentiment analysis on Twitter.

Main features:

  1. Export to Excel and csv
  2. Distinction between sentiments ("I hate war" will be classified as negative sentiment) and negative factuals ("war has been declared" will be classified as neutral)
  3. Connects to Twitter or allows free text input

The developer of Umigon can be reached on Twitter.

Code license: Apache License
Last updated: 29 Dec 2014

Annotating documents with highlights and notes can quickly clutter the page. Annotations simplifies adding and managing notes to texts while keeping the documents clear and readable.

Features

  • Highlight text with colours, assign custom keywords or add notes
  • Auto-completion to match existing keywords as you type
  • Organise and filter annotations by collections, type, keywords or matching search criteria
  • Create relationships between different annotations
Last updated: 29 Dec 2014

Ghost is a free, open source publishing platform. Also available as a hosted service for a monthly subscription cost.

Code license: Open source, MIT License
Last updated: 29 Dec 2014