Gephi is graphing software that provides a way to explore data through visualization and network analysis.
Recogito is an online platform for collaborative document annotation.
Recogito provides a personal workspace where you can upload, collect and organize your source materials (texts and images) and collaborate in their annotation and interpretation. Recogito makes it easier to make your work visible on the Web and to expose the results of your research as Open Data.
Jotform allows users to create web forms (for surveys, etc.) using a drag-and-drop interface.
The FAIMS Mobile Platform (http://www.fedarch.org) is an open source, generalised system for digital data collection on Android. It works offline and helps record free text, multimedia, structured or spatial data, with ample opportunity to capture metadata and certainty components for the recorded data. It needs to be customised (via an XML definition document) for particular field or lab workflows. As a server-client system, it supports simultaneous operation by multiple users.
Roambi Flow is an iOS online publishing service. Roambi allows you to transform Excel data and other publications to visualizations compatible with mobile, and also send them to the iPhone or iPad.
An optical character recognition (OCR) engine for creating editable and searchable electronic files from scanned paper documents, PDFs and digital photographs.
- Recognition of Digital Camera and Mobile Phone Camera Images
- Comprehensive Language Support
- Complete Integration with Popular Office Applications
- PDF conversion, archiving and security
DM is an environment for the study and annotation of images and texts. It is a suite of tools, enabling scholars to gather and organize the evidence necessary to support arguments based in digitized resources. DM enables users to mark fragments of interest in manuscripts, print materials, photographs, etc. and provide commentary on these resources and the relationships among them.
Online Digital Asset Management (DAM) and collaboration platform for business use. Offers user management, custom branding and password-protected folders.
Bookworm is a tool that visualizes language usage trends in repositories of digitized texts in a simple and powerful way. It is a tool for culturomic exploration through the observation of chronological trends for words and phrases in large digitized collections of textual documents with metadata facets.
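The kind of chronological trend described above can be illustrated with a minimal Python sketch. This is not Bookworm's implementation or API: the `corpus` records, the `year` metadata facet and the `yearly_frequency` helper are hypothetical names chosen for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus: each document carries text plus a "year" metadata facet.
corpus = [
    {"year": 1900, "text": "The telegraph office sent a telegraph."},
    {"year": 1900, "text": "A letter arrived by post."},
    {"year": 1950, "text": "The telephone replaced the telegraph."},
]

def yearly_frequency(docs, word):
    """Return {year: share of all tokens in that year that equal `word`}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for doc in docs:
        # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
        tokens = re.findall(r"[a-z]+", doc["text"].lower())
        totals[doc["year"]] += len(tokens)
        hits[doc["year"]] += tokens.count(word.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals)}

trend = yearly_frequency(corpus, "telegraph")
```

Normalizing by the total token count per year keeps years with more digitized text from dominating the trend line.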
Overview is a tool for analyzing large sets of documents. It includes a sophisticated search engine, word clouds, entity detection, and topic-based document clustering. If that’s not good enough, you can write your own plugins using the API. It is open source and you can run it on your own computer.
It was originally designed for investigative journalists, but it’s now also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.
Overview is built to do several types of tasks:
Audacity is a free, easy-to-use and multilingual audio editor and recorder. Basic features, as listed on their website, include:
- Record live audio.
- Record computer playback on any Windows Vista or later machine.
- Convert tapes and records into digital recordings or CDs.
- Edit WAV, AIFF, FLAC, MP2, MP3 or Ogg Vorbis sound files.
- Cut, copy, splice or mix sounds together.
- Change the speed or pitch of a recording.
The LC Newspaper Viewer is an open-source web application that understands how to model newspaper data created according to a set of technical guidelines, with the goal of publishing an online archive like Chronicling America.
HEURIST is a database management system designed specifically for Humanities data. Any confident researcher can design, create, manage, analyse and publish their own richly-structured database(s) through a simple web interface, without programmers or consultants. Databases can be designed and built incrementally, as existing data are not affected by changes in structure.
Combined with the Leptonica Image Processing Library, Tesseract can read a wide variety of image formats and convert them to text in over 40 languages.
This code is a raw OCR engine. It has no output formatting and no UI. It can detect fixed pitch vs proportional text. Nevertheless, in 1995 this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows. Training code is included in the open source release.
The core developer on the project is Ray Smith (theraysmith).
import.io is a free web-based platform that puts the power of the machine-readable web in users' hands. Using its tools, users can create an API or crawl an entire website in a fraction of the time of traditional methods, with no coding required. Its scalable platform allows users to process thousands of queries at once and get real-time data in any format they choose. It also offers an easy-to-use client library to make exporting, integrating and using data as simple as extracting it.
Figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. Users of the site maintain full control over the management of their research whilst benefiting from global access, version control and secure backups in the cloud.
A free iOS app for text analysis. Textal allows you to analyze documents, tweet streams, and webpages. Create clickable text clouds based on the source data that you choose. It comes pre-loaded with a large number of public domain texts. Text clouds are easily shareable via Twitter and email.
TwapperKeeper is now called Hootsuite Archives and can be accessed from within Hootsuite.
CulturalAnalytics is an R package containing functions for statistical analysis and plotting of image properties, including statistics such as the standard deviation and mean in the RGB and HSV color spaces, image entropy and histograms in greyscale (intensity) and color, and for plotting color clouds and image scatter charts.
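The statistics named above can be sketched in a few lines. The example below is written in Python using only the standard library; it is not the CulturalAnalytics package or its R API. The `pixels` list, the helper names and the choice of a plain channel average as greyscale intensity are assumptions made for illustration.

```python
import colorsys
import math
import statistics
from collections import Counter

# Hypothetical 2x2 "image" as a flat list of 8-bit (R, G, B) pixels.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]

def channel_stats(values):
    """Mean and population standard deviation of one channel."""
    return statistics.fmean(values), statistics.pstdev(values)

def grey_entropy(rgb_pixels):
    """Shannon entropy (in bits) of the greyscale intensity histogram."""
    # Intensity is the plain channel average here, an assumption for simplicity.
    grey = [round(sum(p) / 3) for p in rgb_pixels]
    counts = Counter(grey)
    n = len(grey)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

red_mean, red_sd = channel_stats([p[0] for p in pixels])

# colorsys expects floats in [0, 1] and returns (hue, saturation, value).
hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
sat_mean, sat_sd = channel_stats([s for _, s, _ in hsv])

entropy = grey_entropy(pixels)
```

Computing one such feature vector per image is what makes it possible to place a whole collection on a scatter chart, for example mean saturation against entropy.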
NVivo is commercial software for qualitative analysis of unstructured data, in a range of formats and from diverse sources. Enables users to collect, organize, and analyze content from interviews, focus group discussions, surveys, audio, social media, videos, and webpages.
Jing allows you to take screenshots, record screencasts, and instantly share information. Jing is the low end of a suite of screen capture products. SnagIt provides a few extra features (such as saving videos in formats other than SWF) for a small fee. Camtasia is at the high end, with full video editing capabilities.
Simple screencasting and image capture tool. SnagIt is part of TechSmith's family of screen capture and video editing products. Jing offers fewer features, but is a free alternative. Camtasia is the most fully featured of the products, but also the most expensive.
Camtasia is Mac/Windows software for recording screencasts and editing video. Videos can be sent directly to YouTube or integrated with Google Drive. Camtasia is the high end of a suite of screen capture products. SnagIt is a cheaper alternative with fewer features. Jing, the most basic of the TechSmith screen capture products, is free.
Snapzen is a browser tool that is used to collaborate with others about the information on any web page - right from your browser.
Discuss information on web pages with your colleagues, friends or family. It is easy to collaborate with others because they see exactly what you see on the web pages.
If you still use copy and paste, screenshot tools, email or chat to discuss web pages, Snapzen will show you a better way.
SylvaDB is a graph database management system. It allows users with no knowledge of graph theory to model, collect, query, and analyze data in a network structure. SylvaDB provides tools for easy schema creation and modelling, automatic creation of data-entry forms, collaborative features, a visual query editor, global and local search, report and chart generation, network metrics, and visualization tools.
A text-mining system for scientific literature. Textpresso's two major elements are (1) access to full text, so that entire articles can be searched, and (2) introduction of categories of biological concepts and classes that relate to objects (e.g., association, regulation, etc.) or describe one (e.g., methods, etc).
140kit provides a management layer for tweet collection and analysis.
Raw data cannot be passed through to users, but any analytical process can be run across your dataset, and the data is held for as long as the user wants. When new analytical processes are created, they can be run on existing sets of data. 140kit does not claim any control of the analysis; however, it retains ownership of the data collected.
Whatizit can ingest up to 500,000 terms pasted into the input box and execute any of the pre-defined text analysis pipelines.
A free (under the GNU General Public License) toolkit for the development of document image recognition systems.
- Custom dictionaries may be created to assist with analysis of specific record types
- Extensible functionality
- Optical character recognition (OCR) toolkit plugin
Scrapy is an open source programming library for web crawling and web page text extraction, written in Python. You can make calls to Scrapy code from within your own scripts and applications to automate the task of extracting data from websites.
You would typically use Scrapy to automate the task of visiting one or more web pages, on a website to which you have access. You could alternately use it to invoke web-based Application Programming Interfaces (APIs).
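The extraction step described above can be illustrated without Scrapy itself. The sketch below is not Scrapy's API: it uses only Python's standard `html.parser` module on an inline HTML string, and the `LinkAndTextExtractor` class and the sample page are hypothetical. A real crawl would also fetch pages over HTTP and follow the collected links, which is the work Scrapy automates.

```python
from html.parser import HTMLParser

class LinkAndTextExtractor(HTMLParser):
    """Collect link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or style content.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

# Hypothetical page content; a crawler would download this instead.
page = """<html><head><style>p {color: red}</style></head>
<body><h1>Tools</h1><p>See the <a href="/about">about page</a>.</p></body></html>"""

extractor = LinkAndTextExtractor()
extractor.feed(page)
extractor.close()
text = " ".join(extractor.text_parts)
```

After the run, `extractor.links` holds the link targets a crawler would visit next, and `text` holds the page's visible text with stylesheet content excluded.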
Users can upload photos and organize them into albums, and they can search photos that have been posted in public albums and filter the results by license (any Creative Commons license, licenses that allow commercial use, licenses that allow remixing).
Lynks provides an easy-to-use, in-browser tool that helps you to create your own networks. Lynks is an initiative by the Centre for Innovation, part of Leiden University (Campus The Hague). The software was developed in 2014 in co-creation with Dr. Eelke Heemskerk of the University of Amsterdam. Its development was supported by financial contributions from the European Union Fund for Regional Development (EFRO) and the Municipality of The Hague.
MDID is software for teaching and learning with digital images, with tools for discovering, aggregating, and presenting digital media in a variety of learning spaces.
Heritrix is the web crawler used by the Internet Archive, which provides a web-based user interface after initial configuration on a Linux machine. Also used by the Library of Congress, Heritrix captures metadata in the Web ARChive (WARC) format.
SiteSucker is OS X and iOS software that can download an entire website, including images and videos.
HTTrack provides an easy-to-use interface for downloading websites (including HTML, images, and other files) or updating a copy of a previously downloaded site.
FromThePage is free software that allows volunteers to transcribe handwritten documents on-line. It's easy to index and annotate subjects within a text using a simple, wiki-like mark-up. Users can discuss difficult writing or obscure words within a page to refine their transcription. The resulting text is hosted on the web, making documents easy to read and search.
Evernote is note-taking software in the cloud, with options for private and shared notebooks. Users can take text notes, and upload files to attach them to notes. Evernote has built-in OCR for images with printed or handwritten text. A premium account allows access to notebooks offline, as well as more storage and embedded PDF search.
SearchTeam is a collaborative search engine that allows individuals and groups to curate search results in a public or shared SearchSpace.
ScraperWiki is an online tool that makes the process of data scraping simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code.
PDFMiner is a Python tool for extracting information from PDFs (not only text, but also information about fonts, encoding, and layout).
After creating a free account, users can submit requests for mining and analyzing JSTOR content. By submitting a query, a user will receive a random sample of 1,000 of JSTOR's 4.6 million documents; more documents can be received by contacting JSTOR directly. Users can choose to receive the following results:
- Citations Only (all requests come with citations by default)
- Word Counts
- Key Terms
Academia.edu is a social platform that allows academics to share research papers, gray literature, reviews and other scholarly materials. The site provides user statistics on the number and geographic origin of profile and document views. Academic affiliation is displayed in a tree-like format, grouped by universities and departments.
Bitext provides multilingual semantic technologies in the field of Text Analytics via API, with services like Entity Extraction, Concept Extraction, Sentiment Analysis, and Text Categorisation.
Extensive set of tools to allow collaborative transcription of manuscript pages in TEI-compliant XML.
Features of T-PEN through version 1.2 [from project blog]
Zoom Tool in Transcription User Interface: Holding CTRL+SHIFT will result in a magnified image of the current line being transcribed.
Photoshop Express allows simple web-based image editing and cloud storage (2 GB free via Adobe Revel), as well as video storage and streaming, slideshow templates, and a photo gallery. Features include online galleries and slideshows, exporting and searching images, and privacy settings. Android, Windows and iOS (including iPad) apps are available.
Calibre is a free and open source ebook library management application, including options for syncing to devices and converting between a large number of formats. Calibre also has a built-in e-book editor for EPUB and AZW3 formats.
eXist-db is an open source database management system that stores XML data according to the XML data model and features efficient, index-based XQuery processing.
"The Virtual Lightbox for Museums and Archives (VLMA) is an educational tool for collecting and reusing in a structured fashion the online contents of museums and archives with visual components. With VLMA, you can browse and search collections, construct personal collections, export these collections to xml or Impress presentation format, annotate them, and share your collections with other VLMA users."
LibLime Koha is a web-based, open source integrated library system (ILS) that has also been used for virtual library systems (e.g. recreating historic libraries). LibLime Koha offers libraries circulation policies, patron management modules, parent-child relationship for patron records, club and service management features, in-depth "holds" support, single click batch import "undo" option, EzProxy compatibility, self-checkout interface and more.
Capture Fox is a Firefox plugin that allows users to record their voice and their screen.
CamStudio is free and open source screencasting software that saves the video as AVI files, though a Flash converter is included.
Wink is tutorial and presentation creation software that allows you to create tutorials on how to use software by capturing screenshots, mouse movements, and accompanying audio.
ScreenFlick enables large-resolution recording, allowing you to make videos of screencasts with audio.
Screenr is a free web-based screen recording program that allows you to create and share screencasts on the web. You can record on a Mac or PC, and the recordings play everywhere, including iPhones. Very easy to use.
ScreenFlow is screen recording software for the Mac that allows you to record, edit and share audio and video on your computer.
Snapz Pro X allows you to record anything on your computer screen. You can save audio or video as a QuickTime® movie or screenshot that can be shared.
TipCam is screen recording software for Windows that allows you to record images and audio on your screen and upload them to YouTube. The latest version was released in 2008, so it is likely no longer supported.
Qiqqa is research management software that allows you to organize large numbers of papers; find new papers to read and new information about papers you already have; review materials and create annotation reports. Qiqqa has several PDF tools that also allow you to convert PDFs to text, and use a clipboard function to cut and paste text into your document.
MediaWiki is a free, open source wiki package written in PHP, originally for use on Wikipedia and other Wikimedia Foundation projects. It is designed to be run on a large server farm for a website that gets millions of hits per day.
Dragon Dictation is a voice recognition application that allows you to speak and instantly see your text content from email messages to blog posts on your iPad, iPhone, or iPod Touch.
Express Scribe is professional audio player software for PC or Mac that assists in the transcription of audio recordings.
eLaborate is an online work environment in which scholars can upload scans, transcribe and annotate text, and publish the results as an online text edition which is freely available to all users.
Mnemomap is a Flash-based interactive search engine that generates a visual "Atomic-Tree", sends your queries to a Query List, and delivers the search results. The Atomic-Tree allows you to improve your query mid-search. The Query List allows you to customize your search query.
Silobreaker is a search engine that aggregates the news from numerous sources and presents the contents in various visualization formats.
Archive-It is a subscription web archiving service from the Internet Archive that helps organizations to harvest, build, and preserve collections of digital content. Through its user-friendly web application, Archive-It partners can collect, catalog, and manage their collections of archived content, with 24/7 access and full text search available for their own use as well as for their patrons. Content is hosted and stored at the Internet Archive data centers.
HandBrake is an open-source, GPL-licensed, multiplatform, multithreaded video transcoder.
OpenETD is an open source, web-based software application for managing the submission, approval, and distribution of electronic theses and dissertations (ETDs).
SiteCrawler is a website downloading application that allows users to capture entire sites or selected portions of sites like image galleries.
Search Flickr for photos, sort according to license types. Contains commercial as well as Creative Commons licensed photos.
Insync extends Google Drive's web functionality to your desktop by integrating with Windows, Mac and Linux platforms. Insync allows for built-in sharing without a browser, multiple account support, on-demand shared file syncing, desktop notifications and more.
Mac and Windows tool for taking multiple screenshots, annotating them, and combining them into a single document.
Manage and publish your existing journal, or lead the Open Access movement in your field by starting a new journal. Scholastica makes it easy to collaborate on a journal and publish scholarship at the click of a button.
The DocScanner app uses a device's built-in camera to scan documents. Features include image optimization, OCR, document type recognition (document, business card, receipt, etc.), autosorting, and ability to upload documents to Evernote, Dropbox, and Google Drive.
Circos is a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.
Circos is ideal for creating publication-quality infographics and illustrations with a high data-to-ink ratio, richly layered data and pleasant symmetries. You have fine control over each element in the figure to tailor its focus points and detail to your audience.