
Textable is an open source program for text analysis. It offers a set of basic text-analytic components (e.g. import text from files, segment into words, measure segment diversity, etc.), which the user combines using a visual interface to build custom analytic workflows.
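Textable workflows are built visually rather than written as code. Still, the idea behind two of its components, segmenting a text into words and measuring the diversity of the resulting segments, fits in a few lines of Python. This is a minimal illustration, not Textable's implementation. It assumes a simple regular-expression tokenizer and uses the type-token ratio as the diversity measure.

```python
import re


def segment_words(text: str) -> list[str]:
    """Split a text into lowercase word segments (runs of letters, digits, '_' or ')."""
    return re.findall(r"[\w']+", text.lower())


def type_token_ratio(segments: list[str]) -> float:
    """Distinct segments divided by total segments; 0.0 for an empty list."""
    if not segments:
        return 0.0
    return len(set(segments)) / len(segments)


words = segment_words("The cat sat on the mat.")
print(words)                    # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(type_token_ratio(words))  # 5 distinct types out of 6 tokens
```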

Code license: GNU GPL v3
Last updated: 20 Aug 2017

EPPT allows users to encode image-based scholarly editions without having to know XML syntax. It fully or partially automates the entry of repeated attributes and provides templates that reduce errors and speed up the encoding process.

Last updated: 9 Aug 2016

A tool to convert normal and scanned PDFs and images to Word, Excel, PPT, Keynote, Pages, text and other formats on Mac.

  • Convert PDF to Word (.doc), Excel (.xlsx) and other common Office formats
  • Convert PDF to Pages and Keynote
  • Convert PDF to graphics files
  • Convert scanned PDFs with accurate OCR
  • Convert multilingual PDF files
  • Support password-restricted PDF files
Code license: Closed source
Last updated: 29 May 2016

Part-of-Speech (POS) tagging software for English. POS tagging, also known as wordclass tagging, is the classification of words into one or more categories based on their definition, their relationship with other words, or other context. CLAWS (Constituent Likelihood Automatic Word-tagging System) uses several methods to identify parts of speech, most notably Hidden Markov models (HMMs), which involve counting co-occurrences of words and wordclasses in training data and building a table of the probabilities of certain sequences of words.
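The Viterbi algorithm illustrates this HMM approach. It finds the most probable tag sequence given tag-to-tag transition probabilities and tag-to-word emission probabilities. This is a toy sketch, not CLAWS itself. The three-tag set and every probability below are invented for the example. A real tagger estimates them from an annotated training corpus and uses a much larger tagset.

```python
import math

# Toy model (invented numbers): a real tagger estimates these from a tagged corpus.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {  # TRANS[a][b] = P(next tag b | current tag a)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {  # EMIT[t][w] = P(word w | tag t)
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"walk": 0.6, "runs": 0.4},
}


def viterbi(words, tags=TAGS, start=START, trans=TRANS, emit=EMIT, unseen=1e-6):
    """Most probable tag sequence for `words` under a first-order HMM.

    Works in log space to avoid underflow; a word that a tag never emits in
    the model gets the small probability `unseen`.
    """
    # best[i][t] = (log prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (math.log(start[t]) + math.log(emit[t].get(words[0], unseen)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][p][0] + math.log(trans[p][t]), p) for p in tags)
            column[t] = (score + math.log(emit[t].get(words[i], unseen)), prev)
        best.append(column)
    # Trace back from the highest-scoring tag at the last word.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "walk"]))         # ['DET', 'NOUN']
print(viterbi(["the", "dog", "walk"]))  # ['DET', 'NOUN', 'VERB']
```

The ambiguous word "walk" is tagged NOUN after a determiner but VERB after a noun. The tag sequence probabilities, not the word alone, decide.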


Code license: Closed source
Last updated: 3 May 2016

Smallpdf is a free, simple online tool to compress, merge, split and convert PDF documents. It can be useful for compressing research papers, merging several documents or extracting graphs and images from PDF files.

Code license: Closed source
Last updated: 29 Mar 2016

CloudConvert supports the conversion between more than 200 different audio, video, document, ebook, archive, image, spreadsheet and presentation formats.

The CloudConvert API offers the full functionality of CloudConvert and makes it possible to use the conversion services in your own applications.

Code license: Closed source
Last updated: 10 Mar 2016

Overview is a tool for analyzing large sets of documents. It includes a sophisticated search engine, word clouds, entity detection, and topic-based document clustering. If these are not enough, you can write your own plugins using the API. It is open source and you can run it on your own computer.

It was originally designed for investigative journalists, but it’s now also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.
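Document search engines commonly rank documents by how strongly they match a query under TF-IDF weighting. The sketch below illustrates that general technique only. It is not Overview's code or plugin API. It assumes naive whitespace tokenization and a smoothed IDF formula, and the example documents are invented.

```python
import math
from collections import Counter


def tfidf_rank(docs: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Rank documents by the summed TF-IDF weight of the query terms they contain."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    ranking = []
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
                score += tf * idf
        ranking.append((name, score))
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


docs = {
    "memo": "the mayor signed the contract",
    "email": "the contract was cancelled",
    "report": "weather report for today",
}
print([name for name, _ in tfidf_rank(docs, "mayor contract")])  # ['memo', 'email', 'report']
```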

Overview is built to do several types of tasks:

Code license: Open source
Last updated: 9 Mar 2016

Audacity is a free, easy-to-use and multilingual audio editor and recorder. Basic features, as listed on their website, include:

  • Record live audio.
  • Record computer playback on any Windows Vista or later machine.
  • Convert tapes and records into digital recordings or CDs.
  • Edit WAV, AIFF, FLAC, MP2, MP3 or Ogg Vorbis sound files.
  • Cut, copy, splice or mix sounds together.
  • Change the speed or pitch of a recording.
Code license: Open source, GNU GPL
Last updated: 24 Feb 2016

Combined with the Leptonica Image Processing Library, Tesseract can read a wide variety of image formats and convert them to text in over 40 languages.

This code is a raw OCR engine. It has no output formatting and no UI, though it can distinguish fixed-pitch from proportional text. Despite this minimal interface, in 1995 the engine ranked among the top three for character accuracy, and it compiles and runs on both Linux and Windows. Training code is included in the open source release.

The core developer on the project is Ray Smith (theraysmith).

Code license: Open source, Apache License
Last updated: 27 Jan 2016

Google Docs is an online environment for editing and sharing documents, spreadsheets, presentations, forms, drawings and tables. Documents can be public or private, shared with anyone who has a Google account, e-mailed, or downloaded in various formats, including PDF and other formats that differ from the original proprietary format used to create them. People with whom items are shared can be given permission to comment on or edit the files, providing a quick way to collaborate on creating and editing documents and presentations.

Code license: Closed source
Last updated: 26 Jan 2016

TwapperKeeper is now called Hootsuite Archives and can be accessed from within Hootsuite.

Code license: Closed source
Last updated: 13 Dec 2015

CulturalAnalytics is an R package containing functions for statistical analysis and plotting of image properties, including statistics such as the standard deviation and mean in the RGB and HSV color spaces, image entropy and histograms in greyscale (intensity) and color, and for plotting color clouds and image scatter charts.
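CulturalAnalytics itself is an R package. The Python sketch below only illustrates the kind of statistics it reports: per-channel mean and standard deviation, mean HSV values, and greyscale entropy. It does not reproduce the package's functions. It assumes the image has already been decoded into a list of 8-bit (r, g, b) tuples. It also takes greyscale intensity as the plain average of the three channels, which is only one of several possible conventions.

```python
import colorsys
import math
from collections import Counter
from statistics import mean, pstdev


def image_stats(pixels: list[tuple[int, int, int]]) -> dict[str, float]:
    """Summary statistics for a non-empty list of 8-bit RGB pixels."""
    stats = {}
    for i, channel in enumerate("RGB"):
        values = [p[i] for p in pixels]
        stats[f"mean_{channel}"] = mean(values)
        stats[f"sd_{channel}"] = pstdev(values)  # population standard deviation
    # colorsys expects floats in [0, 1].
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    for i, channel in enumerate("HSV"):
        # Naive arithmetic mean; hue is circular, so mean_H is only a rough summary.
        stats[f"mean_{channel}"] = mean(v[i] for v in hsv)
    # Shannon entropy (bits) of the greyscale histogram, grey = average of R, G, B.
    grey = Counter((r + g + b) // 3 for r, g, b in pixels)
    total = len(pixels)
    stats["entropy_grey"] = -sum((c / total) * math.log2(c / total) for c in grey.values())
    return stats


pixels = [(0, 0, 0), (255, 255, 255)]  # a two-pixel black-and-white "image"
stats = image_stats(pixels)
print(stats["mean_R"], stats["sd_R"], stats["entropy_grey"])  # 127.5 127.5 1.0
```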

Code license: Open source, GNU GPL
Last updated: 12 Nov 2015

Map Warper is a tool for digitally aligning ("rectifying") historical maps to match today's precise maps. It is used publicly by the NYPL to crowdsource georectification of their own library of digitised historical maps.
In the general version developed by Tim Waters, user-supplied maps can be georectified for subsequent use in your own mapping projects.
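Rectifying a scan from control points amounts to fitting a transform from pixel coordinates to map coordinates. The simplest such transform is a first-order polynomial (affine) fit by least squares. The sketch below shows that idea in plain Python. It is not Map Warper's implementation, and the control points are invented.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):  # back substitution
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(control_points):
    """Least-squares affine fit from ((px, py), (lon, lat)) pairs.

    Needs at least 3 non-collinear points. Returns ((a, b, c), (d, e, f)) with
    lon = a*px + b*py + c and lat = d*px + e*py + f.
    """
    # Normal equations (A^T A) x = A^T y, where each row of A is [px, py, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    at_lon, at_lat = [0.0] * 3, [0.0] * 3
    for (px, py), (lon, lat) in control_points:
        row = (px, py, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            at_lon[i] += row[i] * lon
            at_lat[i] += row[i] * lat
    return solve3(ata, at_lon), solve3(ata, at_lat)


def apply_affine(coeffs, px, py):
    """Map a pixel position to (lon, lat) with fitted coefficients."""
    (a, b, c), (d, e, f) = coeffs
    return a * px + b * py + c, d * px + e * py + f


# Invented control points: a 1000x1000 px scan covering one degree of lon and lat.
points = [((0, 0), (-74.0, 40.8)), ((1000, 0), (-73.0, 40.8)),
          ((0, 1000), (-74.0, 39.8)), ((1000, 1000), (-73.0, 39.8))]
coeffs = fit_affine(points)
print(apply_affine(coeffs, 500, 500))  # approximately (-73.5, 40.3)
```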

Code license: Open source
Last updated: 16 Jul 2015

VoxcribeCC is desktop speech recognition software that its developers describe as the most accurate speaker-independent and topic-independent technology available. It is used for media (audio/video) transcription and video captioning.

Code license: Closed source
Last updated: 16 Jun 2015

OxGarage is a web service, with a RESTful interface, for transforming documents between a variety of formats. Most transformations use the Text Encoding Initiative (TEI) format as a pivot format.
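A pivot format means each format needs only one converter into the pivot and one out of it. As a result, n formats need 2n converters instead of n(n-1) direct ones. The sketch below shows that routing with two toy formats. The Python list of paragraphs standing in for TEI, and the converter functions, are hypothetical simplifications, not OxGarage's transformations or REST API.

```python
import html
import re
from typing import Callable

# Stand-in for the pivot format: a document as a list of paragraph strings.
# (OxGarage pivots through TEI XML; this toy representation is an assumption.)
Pivot = list[str]

TO_PIVOT: dict[str, Callable[[str], Pivot]] = {
    "text": lambda s: [p.strip() for p in s.split("\n\n") if p.strip()],
    "html": lambda s: [html.unescape(p) for p in re.findall(r"<p>(.*?)</p>", s, re.S)],
}
FROM_PIVOT: dict[str, Callable[[Pivot], str]] = {
    "text": lambda paras: "\n\n".join(paras),
    "html": lambda paras: "".join(f"<p>{html.escape(p)}</p>" for p in paras),
}


def convert(doc: str, source: str, target: str) -> str:
    """Route every conversion through the pivot: source -> pivot -> target."""
    return FROM_PIVOT[target](TO_PIVOT[source](doc))


print(convert("First para.\n\nSecond & last.", "text", "html"))
# <p>First para.</p><p>Second &amp; last.</p>
```

Adding a new format here means writing one entry in each table. Every existing format can then convert to and from it.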

OxGarage is based on the Enrich Garage Engine developed by Poznan Supercomputing and Networking Center and Oxford University Computing Services for the ENRICH project.

See the conversion matrix for details.

Code license: Open source
Last updated: 27 May 2015

Importing, transforming, storing and indexing data should be easy.

Catmandu provides a suite of Perl modules to ease the import, storage, retrieval, export and transformation of metadata records. Combine Catmandu modules with web application frameworks such as PSGI/Plack, document stores such as MongoDB and full text indexes such as Solr to create a rapid development environment for digital library services such as institutional repositories and search engines.
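Catmandu itself is a set of Perl modules. The following is only a Python illustration of the same import, transform and export pattern for metadata records, not Catmandu's API. The importer, fix and exporter functions and the field names are hypothetical.

```python
import csv
import io
import json
from typing import Callable, Iterable, Iterator

Record = dict[str, str]
Fix = Callable[[Record], Record]


def import_csv(text: str) -> Iterator[Record]:
    """Importer: yield one record per CSV row, keyed by the header line."""
    yield from csv.DictReader(io.StringIO(text))


def rename_field(old: str, new: str) -> Fix:
    """Build a fix that renames a field if it is present."""
    def fix(record: Record) -> Record:
        record = dict(record)
        if old in record:
            record[new] = record.pop(old)
        return record
    return fix


def trim_all(record: Record) -> Record:
    """Fix: strip surrounding whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def export_jsonl(records: Iterable[Record]) -> str:
    """Exporter: one JSON object per line, keys sorted for stable output."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def run(text: str, fixes: list[Fix]) -> str:
    """Import, apply each fix in order to every record, then export."""
    def fixed() -> Iterator[Record]:
        for record in import_csv(text):
            for fix in fixes:
                record = fix(record)
            yield record
    return export_jsonl(fixed())


data = "title,year\n Moby Dick ,1851\n"
print(run(data, [trim_all, rename_field("title", "dc_title")]))
# {"dc_title": "Moby Dick", "year": "1851"}
```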

Code license: GNU GPL v3
Last updated: 22 Apr 2015

Praat is software for the phonetic analysis of speech, with support for articulatory synthesis and speech synthesis.
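One classic phonetic measurement is the fundamental frequency (pitch) of a voiced sound. The sketch below estimates it from the autocorrelation of a single frame. It is a much simplified illustration, not Praat's pitch algorithm. It assumes a clean, periodic frame spanning at least two periods of the lowest pitch searched, and the default search range of 75 to 500 Hz is an assumption.

```python
import math


def estimate_f0(samples: list[float], rate: int,
                fmin: float = 75.0, fmax: float = 500.0) -> float:
    """Estimate the fundamental frequency (Hz) of a voiced frame by autocorrelation."""
    avg = sum(samples) / len(samples)
    x = [s - avg for s in samples]  # remove any DC offset
    min_lag = int(rate / fmax)
    max_lag = min(int(rate / fmin), len(x) - 1)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised sum: it shrinks as the lag grows, which favours the
        # shortest period over its multiples (helps avoid octave errors).
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return rate / best_lag


rate = 16000
frame = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]  # 50 ms of 200 Hz
print(estimate_f0(frame, rate))  # 200.0
```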

Code license: GNU GPL v2
Last updated: 19 Feb 2015

VARD 2 is an interactive Java program designed to help users of historical corpora deal with spelling variation, particularly in Early Modern English texts. The tool is intended as a pre-processor for other corpus-linguistic methods such as keyword analysis, collocations and annotation (e.g. POS and semantic tagging), with the aim of improving the accuracy of these tools.
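Here is a simplified sketch of spelling normalisation. It looks the word up in a list of known historical variants. Otherwise, it suggests the closest word in a modern lexicon by Levenshtein edit distance. VARD 2 itself is interactive and considerably more sophisticated. The two word lists here are tiny hypothetical examples, and the distance threshold of 2 is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical resources: known historical variants and a (tiny) modern lexicon.
KNOWN_VARIANTS = {"vnto": "unto", "doe": "do", "haue": "have"}
MODERN_LEXICON = {"unto", "do", "have", "love", "which", "been"}


def normalise(word: str, max_distance: int = 2) -> str:
    """Return a modern spelling (lowercased), or the word unchanged if nothing is close."""
    w = word.lower()
    if w in MODERN_LEXICON:
        return w
    if w in KNOWN_VARIANTS:
        return KNOWN_VARIANTS[w]
    # Closest lexicon entry; ties are broken alphabetically.
    best = min(MODERN_LEXICON, key=lambda m: (levenshtein(w, m), m))
    return best if levenshtein(w, best) <= max_distance else word


for w in ["Haue", "loue", "whiche", "xyzzy"]:
    print(w, "->", normalise(w))  # have, love, which, xyzzy (unchanged)
```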

Last updated: 19 Feb 2015

AGTK is a suite of software components for building tools that annotate linguistic signals: time-series data documenting any kind of linguistic behavior (e.g. audio, video). Its internal data structures are based on annotation graphs, a formal framework for representing linguistic annotations of time-series data.
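The annotation-graph idea can be sketched as nodes joined by labelled arcs. Each node may be anchored to a time offset in the signal, and each arc belongs to an annotation tier. The class and method names below are hypothetical and are not AGTK's API, and the example times are invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Arc:
    start: str  # id of the start node
    end: str    # id of the end node
    tier: str   # annotation layer, e.g. "word" or "syllable"
    label: str


@dataclass
class AnnotationGraph:
    # node id -> time offset in seconds, or None for an unanchored node
    nodes: dict[str, Optional[float]] = field(default_factory=dict)
    arcs: list[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.nodes[node_id] = time

    def add_arc(self, start: str, end: str, tier: str, label: str) -> None:
        self.arcs.append(Arc(start, end, tier, label))

    def overlapping(self, t0: float, t1: float, tier: str) -> list[str]:
        """Labels on `tier` whose span overlaps [t0, t1).

        Arcs touching an unanchored node are skipped in this simplified query.
        """
        labels = []
        for arc in self.arcs:
            s, e = self.nodes[arc.start], self.nodes[arc.end]
            if arc.tier == tier and s is not None and e is not None and s < t1 and e > t0:
                labels.append(arc.label)
        return labels


g = AnnotationGraph()
for node_id, t in [("n0", 0.0), ("n1", 0.35), ("n2", 0.8)]:
    g.add_node(node_id, t)
g.add_node("m")  # unanchored: the syllable boundary time is not known
g.add_arc("n0", "n1", "word", "hello")
g.add_arc("n1", "n2", "word", "world")
g.add_arc("n0", "m", "syllable", "hel")  # syllables share the word's start and end nodes
g.add_arc("m", "n1", "syllable", "lo")
print(g.overlapping(0.3, 0.5, "word"))  # ['hello', 'world']
```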

Code license: Open source
Last updated: 11 Feb 2015

Calibre is a free and open source ebook library management application, including options for syncing to devices and converting between a large number of formats. Calibre also has a built-in e-book editor for EPUB and AZW3 formats.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 29 Dec 2014

CHET-C, or Chapel Hill Electronic Text-Converter, is a browser-based tool designed to convert digital texts that use standard epigraphic conventions, such as the Leiden sigla, into EpiDoc-compliant XML files.

The tool can be accessed online. Fragments of epigraphic text using standard sigla (e.g. Leiden convention markup) are pasted into the tool, and EpiDoc-compliant XML is generated.
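Two common Leiden sigla illustrate the kind of mapping CHET-C performs. Square brackets mark text lost on the object and restored by the editor, encoded in EpiDoc as `<supplied reason="lost">`. Parentheses mark the editor's expansion of an abbreviation, encoded as `<expan>` with `<abbr>` and `<ex>`. This regex-based sketch is not CHET-C's code. It handles only these two sigla and assumes brackets are neither nested nor split across lines.

```python
import re
from xml.sax.saxutils import escape


def leiden_to_epidoc(text: str) -> str:
    """Convert two Leiden sigla to EpiDoc-style TEI elements (a tiny subset)."""
    out = escape(text)  # escape &, < and > (angle-bracket sigla are not handled here)
    # a(bc): abbreviation "a", expanded by the editor with "bc".
    out = re.sub(r"(\w+)\((\w+)\)", r"<expan><abbr>\1</abbr><ex>\2</ex></expan>", out)
    # [abc]: text lost on the object and restored by the editor.
    out = re.sub(r"\[([^\[\]]+)\]", r'<supplied reason="lost">\1</supplied>', out)
    return out


print(leiden_to_epidoc("imp(eratori) [Caes]ari"))
```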

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

HandBrake is an open-source, GPL-licensed, multiplatform, multithreaded video transcoder.

Code license: Open source
Last updated: 29 Dec 2014

Insync extends Google Drive's web functionality to your desktop by integrating with Windows, Mac and Linux platforms. Insync allows for built-in sharing without a browser, multiple account support, on-demand shared file syncing, desktop notifications and more.

Code license: Closed source
Last updated: 29 Dec 2014

A media converter for Mac that converts video or rips DVDs to 160+ formats, burns to DVD, downloads online video, and shares to YouTube.

  • Convert video/audio files and rip DVD to 160+ formats including 4K Ultra HD
  • Burn video to DVD disc, DVD folder and ISO files
  • Batch download streaming videos from 50+ popular Websites
  • Share to YouTube with a pop-up login window instantly
Code license: Closed source
Last updated: 29 Dec 2014