Capture

HEURIST is an extremely flexible data management system designed specifically for Humanities data - see http://HeuristNetwork.org. It is available as a free web service for researchers (hosted at the University of Sydney Data Centre) or for local installation (Open Source). Any confident researcher can design, create, manage, analyse, visualise and publish their own richly-structured database(s) through a simple web interface, without programmers or consultants. Quite complex databases can be built in a few hours through borrowing structures and vocabularies published by other users. Databases can be designed and built incrementally, as existing data are not affected by changes in structure. Advanced features include record linking, drilldown facet searches, rule-based queries, custom reports, linked map-timelines, network visualisation, normalised spreadsheet import, crosstabulation, XML feeds, XSLT transforms.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 10 Mar 2017

Sifter provides search and retrieve access to every undeleted Tweet in the history of Twitter. Users can submit three historical Twitter estimate requests per day using a variety of Gnip PowerTrack rules. When the query is done, Sifter generates an email estimating the approximate number of tweets responsive to the query and the cost to get access to the data via DiscoverText.

Code license: Closed source
Last updated: 24 Feb 2017

Gephi is graphing software that provides a way to explore data through visualization and network analysis.

Code license: Open source, GNU GPL v3
Last updated: 15 Feb 2017

Recogito is an online platform for collaborative document annotation.

Recogito provides a personal workspace where you can upload, collect and organize your source materials - texts and images - and collaborate in their annotation and interpretation. Recogito enables you to make your work more visible on the Web more easily, and to expose the results of your research as Open Data.

Code license: Open source, Apache License
Last updated: 21 Dec 2016

Jotform allows users to create web forms (for surveys, etc.) using a drag-and-drop interface.

Code license: Closed source
Last updated: 10 Aug 2016

The FAIMS Mobile Platform (http://www.fedarch.org) is an open source, generalised system for digital data collection on Android. It works offline and helps record free text, multimedia, structured or spatial data, with ample opportunity to capture metadata and certainty components for the recorded data. It needs to be customised (via an XML definition document) for particular field/lab workflows. As a server-client system, it supports simultaneous operation by multiple users.

Code license: Open source
Last updated: 28 Jun 2016

Roambi Flow is an iOS online publishing service. Roambi allows you to transform Excel data and other publications to visualizations compatible with mobile, and also send them to the iPhone or iPad.

Code license: Closed source
Last updated: 18 May 2016

An OCR (optical character recognition) engine for creating editable and searchable electronic files from scanned paper documents, PDFs and digital photographs.
Features:

  • Recognition of Digital Camera and Mobile Phone Camera Images
  • Comprehensive Language Support
  • Complete Integration with Popular Office Applications
  • PDF conversion, archiving and security
Code license: Closed source
Last updated: 17 May 2016

DM is an environment for the study and annotation of images and texts. It is a suite of tools, enabling scholars to gather and organize the evidence necessary to support arguments based in digitized resources. DM enables users to mark fragments of interest in manuscripts, print materials, photographs, etc. and provide commentary on these resources and the relationships among them.

Last updated: 1 May 2016

Online Digital Asset Management (DAM) and collaboration platform for business use. Offers user management, custom branding and password-protected folders.

Code license: Closed source
Last updated: 27 Mar 2016

Bookworm is a tool that visualizes language usage trends in repositories of digitized texts in a simple and powerful way. It is a tool for culturomic exploration through the observation of chronological trends for words and phrases in large digitized collections of textual documents with metadata facets.

Code license: Open source
Last updated: 11 Mar 2016

Overview is a tool for analyzing large sets of documents. It includes a sophisticated search engine, word clouds, entity detection, and topic-based document clustering. If that’s not good enough, you can write your own plugins using the API. It is open source and you can run it on your own computer.

It was originally designed for investigative journalists, but it’s now also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.

Overview is built to do several types of tasks:

Code license: Open source
Last updated: 9 Mar 2016

Audacity is a free, easy-to-use and multilingual audio editor and recorder. Basic features, as listed on their website, include:

  • Record live audio.
  • Record computer playback on any Windows Vista or later machine.
  • Convert tapes and records into digital recordings or CDs.
  • Edit WAV, AIFF, FLAC, MP2, MP3 or Ogg Vorbis sound files.
  • Cut, copy, splice or mix sounds together.
  • Change the speed or pitch of a recording.
Code license: Open source, GNU GPL
Last updated: 24 Feb 2016

The LC Newspaper Viewer is an open-source web application that understands how to model newspaper data created according to a set of technical guidelines, with the goal of publishing an online archive like Chronicling America.

Code license: Open source
Last updated: 23 Feb 2016

Combined with the Leptonica Image Processing Library, Tesseract can read a wide variety of image formats and convert them to text in over 40 languages.

This code is a raw OCR engine. It has no output formatting and no UI. It can detect fixed pitch vs proportional text. Nevertheless, in 1995 this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows. Training code is included in the open source release.

The core developer on the project is Ray Smith (theraysmith).

Code license: Open source, Apache License
Last updated: 27 Jan 2016

import.io is a free web-based platform that puts the power of the machine-readable web in users' hands. Using its tools, users can create an API or crawl an entire website in a fraction of the time of traditional methods, with no coding required. Its scalable platform allows users to process thousands of queries at once and get real-time data in any format they choose. It also offers an easy-to-use client library to make exporting, integrating and using data as simple as extracting it.

Code license: Closed source
Last updated: 15 Jan 2016

Figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. Users of the site maintain full control over the management of their research whilst benefiting from global access, version control and secure backups in the cloud.

Code license: Closed source
Last updated: 29 Dec 2015

A free iOS app for text analysis. Textal allows you to analyze documents, tweet streams, and webpages. Create clickable text clouds based on the source data that you choose. It comes pre-loaded with a large number of public domain texts. Text clouds are easily shareable via Twitter and email.

Last updated: 18 Dec 2015

TwapperKeeper is now called Hootsuite Archives and can be accessed from within Hootsuite.

Code license: Closed source
Last updated: 13 Dec 2015

CulturalAnalytics is an R package containing functions for statistical analysis and plotting of image properties, including statistics such as the standard deviation and mean in the RGB and HSV color spaces, image entropy and histograms in greyscale (intensity) and color, and for plotting color clouds and image scatter charts.

Code license: Open source, GNU GPL
Last updated: 12 Nov 2015
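
To make the statistics named in the CulturalAnalytics entry concrete, here is a minimal Python sketch of per-channel mean and standard deviation in RGB and HSV, plus greyscale entropy. It is not the R package's API: the function names `channel_stats` and `grey_entropy`, the pixel-tuple input, and the unweighted greyscale average are all assumptions made for illustration.

```python
import colorsys
import math
from collections import Counter
from statistics import mean, pstdev


def channel_stats(pixels):
    """Return {channel: (mean, std)} for R, G, B, H, S, V, all scaled to 0..1.

    `pixels` is a list of (r, g, b) tuples with 8-bit values (hypothetical input format).
    """
    rgb = [(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hsv = [colorsys.rgb_to_hsv(*p) for p in rgb]
    stats = {}
    for names, rows in (("RGB", rgb), ("HSV", hsv)):
        for i, name in enumerate(names):
            values = [row[i] for row in rows]
            # Population standard deviation over all pixels of the image.
            stats[name] = (mean(values), pstdev(values))
    return stats


def grey_entropy(pixels):
    """Shannon entropy (in bits) of the greyscale intensity histogram."""
    # Assumption: intensity is the plain integer average of the three channels.
    grey = [(r + g + b) // 3 for r, g, b in pixels]
    counts = Counter(grey)
    n = len(grey)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For a real image, the pixel list could come from any image library; only the statistics themselves are sketched here.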

NVivo is commercial software for qualitative analysis of unstructured data, in a range of formats and from diverse sources. Enables users to collect, organize, and analyze content from interviews, focus group discussions, surveys, audio, social media, videos, and webpages.

Code license: Closed source
Last updated: 30 Oct 2015

Jing allows you to take screenshots, record screencasts, and instantly share information. Jing is the low end of a suite of screen capture products. SnagIt provides a few extra features (such as saving videos in formats other than SWF) for a small fee. Camtasia is at the high end, with full video editing capabilities.

Last updated: 5 Oct 2015

Simple screencasting and image capture tool. SnagIt is part of TechSmith's family of screen capture and video editing products. Jing offers fewer features, but is a free alternative. Camtasia is the most fully featured of the products, but also the most expensive.

Code license: Closed source
Last updated: 2 Oct 2015

Camtasia is Mac/Windows software for recording screencasts and editing video. Videos can be sent directly to YouTube or integrated with Google Drive. Camtasia is the high end of a suite of screen capture products. SnagIt is a cheaper alternative with fewer features. Jing, the most basic of the TechSmith screen capture products, is free.

Code license: Closed source
Last updated: 8 Sep 2015

Snapzen is a browser tool that is used to collaborate with others about the information on any web page - right from your browser.

Discuss information on web pages with your colleagues, friends or family. It is easy to collaborate with others because they see exactly what you see on the web pages.

If you still use copy and paste, screenshot tools, email or chat to discuss web pages, Snapzen will show you a better way.

Code license: Closed source
Last updated: 15 Jun 2015

SylvaDB is a graph database management system. It allows users with no knowledge of graph theory to model, collect, query, and analyze data in a network structure. SylvaDB provides tools for easy schema creation and modelling, automatic creation of data-entry forms, collaborative features, a visual query editor, global and local search, report and chart generation, network metrics, and visualization tools.

Code license: GNU Affero GPL v.3
Last updated: 9 Jun 2015

A text-mining system for scientific literature. Textpresso's two major elements are (1) access to full text, so that entire articles can be searched, and (2) introduction of categories of biological concepts and classes that relate to objects (e.g., association, regulation, etc.) or describe one (e.g., methods, etc).

Code license: Open source
Last updated: 28 May 2015

140kit provides a management layer for tweet collection and analysis.

Raw data cannot be passed through to the users, but any analytical process can be run across your dataset, and the data is held for as long as the user wants. When new analytical processes are created, they can be run on existing sets of data. 140kit does not claim any control of the analysis; however, it retains ownership of the data collected.

Last updated: 24 May 2015

Whatizit can ingest up to 500,000 terms pasted into the input box and execute any of the pre-defined text analysis pipelines.

Last updated: 23 May 2015

A free (under the GNU General Public License) toolkit for the development of document image recognition systems.

Features:

  • Custom dictionaries may be created to assist with analysis of specific record types
  • Extensible functionality
  • Optical character recognition (OCR) toolkit plugin
Code license: Open source, GNU GPL
Last updated: 22 May 2015

Scrapy is an open source programming library for web crawling and web page text extraction, written in Python. You can make calls to Scrapy code from within your own scripts and applications to automate the task of extracting data from websites.

You would typically use Scrapy to automate the task of visiting one or more web pages on a website to which you have access. Alternatively, you could use it to invoke web-based Application Programming Interfaces (APIs).

Code license: Open source
Last updated: 22 May 2015
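
To illustrate the kind of page text and link extraction that a crawling library automates, here is a small sketch using only Python's standard library. It deliberately does not use Scrapy's own API; the class name `TextAndLinks` and the helper `extract` are hypothetical, and a real crawl would also need fetching, scheduling, and politeness rules that are omitted here.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text and link targets from one HTML document."""

    def __init__(self):
        super().__init__()
        self.text = []    # visible text fragments, in document order
        self.links = []   # href values of <a> tags, in document order
        self._skip = 0    # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def extract(html):
    """Return (visible_text, links) for an HTML string."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links
```

The links returned by `extract` are what a crawler would queue for its next visits, which is the loop a library such as Scrapy manages for you.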

Users can upload photos and organize them into albums, and they can search photos that have been posted in public albums and filter the results by license (any Creative Commons license, licenses that allow commercial use, licenses that allow remixing).

Last updated: 18 May 2015

Lynks provides an easy-to-use, in-browser tool that helps you create your own networks. Lynks is an initiative of the Centre for Innovation, part of Leiden University (Campus The Hague). The software was developed in 2014 in co-creation, with expertise from Dr. Eelke Heemskerk of the University of Amsterdam. Development was supported by financial contributions from the European Union Fund for Regional Development (EFRO) and the Municipality of The Hague.

Code license: Closed source
Last updated: 12 May 2015

MDID is software for teaching and learning with digital images, with tools for discovering, aggregating, and presenting digital media in a variety of learning spaces.

Code license: Open source, GNU GPL
Last updated: 8 May 2015

Heritrix is a web crawler used by the Internet Archive, which provides a web-based user interface after initial configuration on a Linux machine. Also used by the Library of Congress, Heritrix captures metadata in the Web ARChive (WARC) format.

Code license: Open source, Apache License
Last updated: 6 May 2015

SiteSucker is OS X and iOS software that can download an entire website, including images and videos.

Last updated: 6 May 2015

HTTrack provides an easy-to-use interface for downloading websites (including HTML, images, and other files) or updating a copy of a previously downloaded site.

Code license: Open source, GNU GPL
Last updated: 6 May 2015

FromThePage is free software that allows volunteers to transcribe handwritten documents on-line. It's easy to index and annotate subjects within a text using a simple, wiki-like mark-up. Users can discuss difficult writing or obscure words within a page to refine their transcription. The resulting text is hosted on the web, making documents easy to read and search.

Code license: Open source, GNU Affero GPL
Last updated: 2 May 2015

Evernote is note-taking software in the cloud, with options for private and shared notebooks. Users can take text notes, and upload files to attach them to notes. Evernote has built-in OCR for images with printed or handwritten text. A premium account allows access to notebooks offline, as well as more storage and embedded PDF search.

Code license: Closed source
Last updated: 2 May 2015

SearchTeam is a collaborative search engine that allows individuals and groups to curate search results in a public or shared SearchSpace.

Code license: Closed source
Last updated: 1 May 2015

ScraperWiki is an online tool that makes the process of data scraping simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code.

Code license: GPL
Last updated: 1 May 2015

PDFMiner is a Python tool for extracting information from PDFs (not only text, but also information about fonts, encoding, and layout).

Code license: MIT License
Last updated: 1 May 2015

After creating a free account, users can submit requests for mining and analyzing JSTOR content. By submitting a query, a user will receive a random sample of 1,000 of JSTOR's 4.6 million documents; more documents can be received by contacting JSTOR directly. Users can choose to receive the following results:

  • Citations Only (all requests come with citations by default)
  • Word Counts
  • Bigrams
  • Trigrams
  • Quadgrams
  • Key Terms
  • References
Last updated: 29 Apr 2015
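
The n-gram result types listed above (bigrams, trigrams, quadgrams) are counts of runs of two, three, or four consecutive words. A minimal Python sketch of that counting follows; the function name `ngram_counts` and the lowercase, whitespace-based tokenization are assumptions for illustration and are not a description of how JSTOR processes its documents.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count runs of n consecutive words in `text`.

    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    tokens = text.lower().split()
    # Zipping n progressively shifted copies of the token list yields each n-gram.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)
```

Calling `ngram_counts(text, 1)` gives plain word counts, and `n` values of 2, 3, and 4 give bigrams, trigrams, and quadgrams respectively.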

Academia.edu is a social platform that allows academics to share research papers, gray literature, reviews and other scholarly materials. The site provides user statistics on the number and geographic origin of profile and document views. Academic affiliation is displayed in a tree-like format, grouped by universities and departments.

Code license: Closed source
Last updated: 21 Apr 2015

Bitext provides multilingual semantic technologies in the field of Text Analytics via API, with services such as Entity Extraction, Concept Extraction, Sentiment Analysis, and Text Categorisation.

Last updated: 25 Mar 2015

Extensive set of tools to allow collaborative transcription of manuscript pages in TEI-compliant XML.

Features of T-PEN through version 1.2 [from project blog]

Zoom Tool in Transcription User Interface: Holding CTRL+SHIFT will result in a magnified image of the current line being transcribed.

Last updated: 17 Mar 2015

Photoshop Express allows simple web-based image editing and cloud storage (2 GB free via Adobe Revel), as well as video storage and streaming, slideshow templates, and a photo gallery. Features include online galleries and slideshows, exporting and searching images, and privacy settings. Android, Windows and iOS (including iPad) apps are available.

Code license: Closed source
Last updated: 29 Dec 2014

Calibre is a free and open source ebook library management application, including options for syncing to devices and converting between a large number of formats. Calibre also has a built-in e-book editor for EPUB and AZW3 formats.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 29 Dec 2014

eXist-db is an open source database management system that stores XML data according to the XML data model and features efficient, index-based XQuery processing.

Code license: Open source, GNU GPL, GNU LGPL
Last updated: 29 Dec 2014

"The Virtual Lightbox for Museums and Archives (VLMA) is an educational tool for collecting and reusing in a structured fashion the online contents of museums and archives with visual components. With VLMA, you can browse and search collections, construct personal collections, export these collections to xml or Impress presentation format, annotate them, and share your collections with other VLMA users."

Code license: Open source
Last updated: 29 Dec 2014

LibLime Koha is a web-based, open source integrated library system (ILS) that has also been used for virtual library systems (e.g. recreating historic libraries). LibLime Koha offers libraries circulation policies, patron management modules, parent-child relationship for patron records, club and service management features, in-depth "holds" support, single click batch import "undo" option, EzProxy compatibility, self-checkout interface and more.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

CamStudio is free and open source screencasting software that saves the video as AVI files, though a Flash converter is included.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

Capture Fox is a Firefox plugin that allows users to record their voice and their screen.

Last updated: 29 Dec 2014

TipCam is screen recording software for Windows that allows you to record images and audio on your screen and upload them to YouTube. The latest version was released in 2008 and is likely no longer supported.

Code license: Closed source
Last updated: 29 Dec 2014

Wink is tutorial and presentation creation software that allows you to create tutorials on how to use software by capturing screenshots, mouse movements, and accompanying audio.

Last updated: 29 Dec 2014

ScreenFlick enables large-resolution recording, allowing you to make videos of screencasts with audio.

Code license: Closed source
Last updated: 29 Dec 2014

Screenr is a free web-based screen recording program that allows you to create and share screencasts on the web. You can record on a Mac or PC, and the recordings play everywhere, including iPhones. Very easy to use.

Last updated: 29 Dec 2014

ScreenFlow is screen recording software for the Mac that allows you to record, edit and share audio and video on your computer.

Code license: Closed source
Last updated: 29 Dec 2014

Snapz Pro X allows you to record anything on your computer screen. You can save audio or video as a QuickTime® movie or screenshot that can be shared.

Last updated: 29 Dec 2014

Qiqqa is research management software that allows you to organize large numbers of papers, find new papers to read and new information about papers you already have, review materials, and create annotation reports. Qiqqa also has several PDF tools that allow you to convert PDFs to text and use a clipboard function to cut and paste text into your document.

Code license: Closed source
Last updated: 29 Dec 2014

MediaWiki is a free software open source wiki package written in PHP, originally for use on Wikipedia and other Wikimedia Foundation projects. It is designed to be run on a large server farm for a website that gets millions of hits per day.

Code license: Open source, GNU GPL, GNU GPL v2
Last updated: 29 Dec 2014

Dragon Dictation is a voice recognition application that allows you to speak and instantly see your text content from email messages to blog posts on your iPad, iPhone, or iPod Touch.

Code license: Closed source
Last updated: 21 Feb 2017

Express Scribe is professional audio player software for PC or Mac that assists in the transcription of audio recordings.

Code license: Closed source
Last updated: 29 Dec 2014

eLaborate is an online work environment in which scholars can upload scans, transcribe and annotate text, and publish the results as an online text edition which is freely available to all users.

Code license: GNU GPL v3
Last updated: 29 Dec 2014

Mnemomap is a Flash-based interactive search engine that generates a visual "Atomic-Tree", sends your queries to a Query List, and delivers the search results. The Atomic-Tree allows you to improve your query mid-search. The Query List allows you to customize your search query.

Last updated: 29 Dec 2014

Silobreaker is a search engine that aggregates the news from numerous sources and presents the contents in various visualization formats.

Last updated: 29 Dec 2014

Archive-It is a subscription web archiving service from the Internet Archive that helps organizations to harvest, build, and preserve collections of digital content. Through its user-friendly web application, Archive-It partners can collect, catalog, and manage their collections of archived content, with 24/7 access and full text search available for their own use as well as that of their patrons. Content is hosted and stored at the Internet Archive data centers.

Last updated: 29 Dec 2014

HandBrake is an open-source, GPL-licensed, multiplatform, multithreaded video transcoder.

Code license: Open source
Last updated: 29 Dec 2014

OpenETD is an open source, web-based software application for managing the submission, approval, and distribution of electronic theses and dissertations (ETDs).

Code license: Open source, GNU GPL v3
Last updated: 29 Dec 2014

SiteCrawler is a website downloading application that allows users to capture entire sites or selected portions of sites like image galleries.

Code license: Closed source
Last updated: 29 Dec 2014

Search Flickr for photos, sort according to license types. Contains commercial as well as Creative Commons licensed photos.

Code license: Open source
Last updated: 29 Dec 2014

Insync extends Google Drive's web functionality to your desktop by integrating with Windows, Mac and Linux platforms. Insync allows for built-in sharing without a browser, multiple account support, on-demand shared file syncing, desktop notifications and more.

Code license: Closed source
Last updated: 29 Dec 2014

Mac and Windows tool for taking multiple screenshots, annotating them, and combining them into a single document.

Code license: Closed source
Last updated: 29 Dec 2014

Manage and publish your existing journal, or lead the Open Access movement in your field by starting a new journal. Scholastica makes it easy to collaborate on a journal and publish scholarship at the click of a button.

Code license: Closed source
Last updated: 29 Dec 2014

The DocScanner app uses a device's built-in camera to scan documents. Features include image optimization, OCR, document type recognition (document, business card, receipt, etc.), autosorting, and ability to upload documents to Evernote, Dropbox, and Google Drive.

Code license: Closed source
Last updated: 29 Dec 2014

Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.
Circos is ideal for creating publication-quality infographics and illustrations with a high data-to-ink ratio, richly layered data and pleasant symmetries. You have fine control over each element in the figure to tailor its focus points and detail to your audience.

Code license: GPL
Last updated: 29 Dec 2014