What kind of data should the tool work with?

HEURIST is an extremely flexible, end-user-oriented, web-based data management system designed specifically for Humanities data. In development since 2005, it has been in active use across many projects since 2009. It is available both as a free web service for researchers (hosted at the University of Sydney Data Centre) and for installation on a physical or virtual server (open source on GitHub).

Researchers can design, create, manage, analyse, visualise and publish their own richly-structured databases through a simple web interface, without the need for a programmer. Quite complex databases can be built in a few hours by borrowing structures and vocabularies published by other users. Databases can be designed and built incrementally, as existing data are not affected by changes in structure. Databases created by Heurist are stored in MySQL with a repeatable structure, facilitating independent access by other software.

Advanced features include record linking, graph structure, drill-down facet searches, rule-based queries, custom reports, linked map-timelines, network visualisation, normalised spreadsheet import, crosstabulation, XML feeds, and XSLT transforms. The team provides initial email and Skype assistance for project setup at no cost, and special customisations at modest cost.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 16 May 2018

This is a Windows program for generating and searching a KWIC concordance of a document ("KWIC" = "Keywords in Context"). A KWIC concordance is a list of the different words occurring in the document, with each instance of each word shown in context (that is, within a phrase). Word frequency is shown. Context size is user-definable, anything from 3 to 19 words long. The software acts on text files and on MS Word docx files, skipping over "stop" words. The concordance can be displayed alphabetically or by frequency, and can be written to a file.
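The concordance described above can be illustrated with a minimal Python sketch. This is not the program's own code: the tokenizer, the `STOP_WORDS` list, and the function names `kwic` and `by_frequency` are all assumptions made for illustration, and the context window is simply centred on the keyword.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the actual program's list is not documented here.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is"}


def kwic(text, context_size=5, stop_words=STOP_WORDS):
    """Map each non-stop word to the context windows in which it occurs."""
    # Simple tokenizer: lowercase runs of word characters (an assumption).
    words = re.findall(r"\w+", text.lower())
    half = context_size // 2
    concordance = defaultdict(list)
    for i, word in enumerate(words):
        if word in stop_words:
            continue  # stop words get no entry, but still appear in contexts
        start = max(0, i - half)
        window = words[start:i + half + 1]
        concordance[word].append(" ".join(window))
    return dict(concordance)


def by_frequency(concordance):
    """Order entries by descending frequency, then alphabetically."""
    return sorted(concordance.items(), key=lambda kv: (-len(kv[1]), kv[0]))


sample = "The cat sat on the mat. The cat saw the dog."
result = kwic(sample, context_size=3)
for word, contexts in by_frequency(result):
    print(word, len(contexts), contexts)
```

Sorting the dictionary keys instead of calling `by_frequency` would give the alphabetical display.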

Code license: Closed source
Last updated: 3 Feb 2017

BASE (Bielefeld Academic Search Engine) is a search engine for academic open access web resources that searches materials stored in OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) enabled repositories.
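To illustrate what harvesting from an OAI-PMH repository involves, the sketch below builds a `ListRecords` request URL. The `verb`, `metadataPrefix`, `set`, and `from` parameters and the `oai_dc` metadata format come from the OAI-PMH protocol itself; the repository address and the helper name `list_records_url` are hypothetical, and nothing here reflects how BASE is actually implemented.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access is made)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # optional: restrict to one set
    if from_date:
        params["from"] = from_date  # optional: selective harvesting by date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
url = list_records_url("https://repository.example.org/oai", from_date="2016-01-01")
print(url)
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned `resumptionToken` until the list is complete.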

Last updated: 19 Apr 2016

Beautiful Soup is a library, written in the Python programming language, for pulling specific pieces of data out of HTML and XML files. It is especially suitable when working with data files that aren't well-formed, or are otherwise difficult to parse.

Saves programmers hours or days of work on quick-turnaround screen scraping projects.

Last updated: 19 Apr 2016

Superfastmatch is designed to find exact duplicates of text strings between documents.
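One simple way to find exact shared passages between two documents is to compare their sets of word n-grams. The sketch below shows that general idea only; it is not Superfastmatch's own algorithm, and the function names `shingles` and `shared_passages` and the window length `n` are assumptions chosen for illustration.

```python
def shingles(text, n):
    """Return the set of all n-word sequences in the text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(doc_a, doc_b, n=4):
    """Return the n-word sequences that occur verbatim in both documents."""
    return sorted(shingles(doc_a, n) & shingles(doc_b, n))


a = "the quick brown fox jumps over the lazy dog"
b = "yesterday the quick brown fox jumps again"
print(shared_passages(a, b))
```

Overlapping matches, such as the two four-word sequences found here, could be merged afterwards into one longer duplicated passage.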

Code license: Open source, GNU GPL
Last updated: 1 Dec 2015

Voyeur is a web-based text analysis environment where users can apply a wide variety of tools to any text they import.

Last updated: 3 Nov 2015

The MONK workbench provides 525 works of American literature from the 18th and 19th centuries, and 37 plays and 5 works of poetry by William Shakespeare, along with tools to enable literary research through the discovery, exploration, and visualization of patterns.

Users affiliated with CIC (Big Ten) schools can access a larger data set that includes about a thousand works of British literature from the 16th through the 19th century, provided by The Text Creation Partnership (EEBO and ECCO) and ProQuest (Chadwyck-Healey Nineteenth-Century Fiction).

Last updated: 12 Aug 2015

Philologic is a full-text search, retrieval and analysis tool with support for TEI-Lite XML/SGML, Unicode encoding, plaintext, Dublin Core/HTML, and DocBook.

Code license: GNU GPL, Open source
Last updated: 9 Aug 2015

iBoogie is a clustering search engine that puts documents with similar content or with related topics into the same group. Each group is assigned a label based on the content of the documents. The results are presented to the user in a hierarchy of topics (clusters) for browsing.

Last updated: 3 Aug 2015

SearchTeam is a collaborative search engine that allows individuals and groups to curate search results in a public or shared SearchSpace.

Code license: Closed source
Last updated: 1 May 2015

CorpusSearch 2 allows users to construct and search syntactically annotated corpora, including finding and counting lexical and syntactic patterns, correcting systemic errors, and coding linguistic features.

The software is released under the Mozilla Public License 1.1 (MPL 1.1).

Code license: Open source
Last updated: 11 Feb 2015

HyperPo is a user-friendly text exploration and analysis program that allows users to import texts or use texts available online (in English or French), and provides frequency lists of characters, words and series of words, color-coding to indicate repetition, KWIC, co-occurrence and distribution lists, and the ability to simultaneously compare data from multiple texts.

Last updated: 29 Dec 2014

Lextec offers a range of services and software for full-text indexing, search, and retrieval, and for automatic classification, routing, and filtering of electronic text according to user-defined profiles.

Last updated: 29 Dec 2014

Google Scholar searches books and scholarly articles (and optionally patents, legal opinions, and legal journals).

Last updated: 29 Dec 2014

Gathers smart search results based on user feedback. It is similar to Google Alerts, but users can give a thumbs up or down to improve the search results.

Last updated: 29 Dec 2014