DH Answers

What kind of data should the tool work with?

Google Maps is a web mapping service application that includes street maps, satellite images, street view perspectives, as well as web functions such as routing and geocoding. The API can be used outside of the normal Google Maps interface for other projects.

Last updated: 7 Jun 2016
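
As a rough illustration of using the Google Maps API outside the normal interface, the sketch below builds a request URL for the Google Geocoding web service without sending it. The helper name `build_geocode_url`, the sample address, and the placeholder key are assumptions made for illustration; a real request needs a valid API key and is subject to Google's terms of use.

```python
from urllib.parse import urlencode

# Endpoint of the Google Geocoding web service (JSON output).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Return the URL that asks the service to geocode one address.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    query = urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


# Placeholder values; replace YOUR_API_KEY with a real key before use.
url = build_geocode_url("1 Main St, Springfield", "YOUR_API_KEY")
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is a JSON document describing the matched locations.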

BatchGeo is an online service that maps address data as points. The cut-and-paste interface makes it easy to convert a spreadsheet of street addresses into a map that can be embedded or downloaded as a KML file. A limited number of addresses can be mapped for free; larger files require a subscription.

Code license: Closed source
Last updated: 7 Jun 2016
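
To give a sense of what a KML file of mapped points contains, here is a minimal sketch that writes one point Placemark per row using Python's standard library. It is not BatchGeo's own code or output; the function name `rows_to_kml` and the sample rows are made up, and the rows are assumed to be already geocoded (a service like BatchGeo performs that step itself).

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def rows_to_kml(rows):
    """Build a KML document with one point Placemark per (name, lat, lon) row."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in rows:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as longitude,latitude (in that order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Made-up sample rows: (label, latitude, longitude).
rows = [("Sample address A", 40.0, -75.0), ("Sample address B", 41.5, -74.25)]
print(rows_to_kml(rows))
```

A file with this structure can be opened in tools that read KML, such as Google Earth.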

Beautiful Soup is a library, written in the Python programming language, for pulling specific pieces of data out of HTML and XML files. It is especially suitable when working with data files that aren't well-formed, or are otherwise difficult to parse.

It can save programmers hours or days of work on quick-turnaround screen-scraping projects.

Last updated: 19 Apr 2016
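
The kind of task Beautiful Soup addresses, pulling specific pieces out of markup that is not well-formed, can be sketched with Python's standard library alone. Note that the example below uses the built-in `html.parser` module as a stand-in and does not show Beautiful Soup's own interface, which is higher-level; the class name `LinkCollector` and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Deliberately malformed: the <p>, <b>, and first <a> tags are never closed.
messy_html = '<p>See <a href="/one">one<b> and <a href="/two">two</a>'

collector = LinkCollector()
collector.feed(messy_html)
print(collector.links)  # ['/one', '/two']
```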

The LC Newspaper Viewer is an open-source web application that understands how to model newspaper data created according to a set of technical guidelines, with the goal of publishing an online archive like Chronicling America.

Code license: Open source
Last updated: 23 Feb 2016

capella-scan can "OCR" music scores from PDF or common image formats and output the results in MusicXML for use with common music editing software.

Last updated: 23 Feb 2016

epub-tools is a collection of Python tools for generating and managing epub documents from Word, RTF, DocBook, TEI and FictionBook.

Code license: BSD
Last updated: 26 Jan 2016

Voyeur is a web-based text analysis environment where users can apply a wide variety of tools to any text they import.

Last updated: 3 Nov 2015

Zoho provides a drag-and-drop interface for creating database-driven applications, such as forms.

Code license: Closed source
Last updated: 3 Nov 2015

Open Journal Systems (OJS) is a journal management and publishing system. The Public Knowledge Project (the sponsor of OJS) is a multi-university initiative that develops free, open source software and conducts research to improve the quality and reach of scholarly publishing.

Code license: GNU GPL
Last updated: 10 Aug 2015

DEVONthink is a database that helps users organize, manage and collaborate on digital files, including Office files, links, e-mails, research data and PDFs.

Code license: Closed source
Last updated: 10 Aug 2015

Philologic is a full-text search, retrieval and analysis tool with support for TEI-Lite XML/SGML, Unicode encoding, plaintext, Dublin Core/HTML, and DocBook.

Code license: GNU GPL, Open source
Last updated: 9 Aug 2015

Microsoft SharePoint is an environment for sharing documents with collaborators, using granular permissions. SharePoint can be tightly integrated with Microsoft Office (e.g. Office documents can be saved directly to SharePoint, and some SharePoint installations allow web-based editing using the cloud-hosted Office 365). SharePoint is commonly used to host collaborative workspaces, data management systems, wikis and blogs.
Features:

  • Extensive integration with Microsoft Office System programs
Last updated: 14 Jul 2015

RStudio is an integrated development environment (IDE) for R. It is available in both open source and commercial versions, and can run either on your desktop or through a browser connected to RStudio Server. Features include syntax highlighting, code completion, smart indentation, and an interactive debugger.

Code license: Open source
Last updated: 14 Jul 2015

SharpEye is music scanning/"OCR" software that can convert an image of a score into an editable format such as MusicXML.

Last updated: 28 May 2015

OxGarage is a web-based, RESTful service that manages the transformation of documents between a variety of formats. The majority of transformations use the Text Encoding Initiative (TEI) format as a pivot format.

OxGarage is based on the Enrich Garage Engine developed by Poznan Supercomputing and Networking Center and Oxford University Computing Services for the ENRICH project.

See the conversion matrix for details.

Code license: Open source
Last updated: 27 May 2015

PDFtoMusic Pro converts PDFs created by other music notation programs to MusicXML scores.

Last updated: 22 May 2015

Anthologize is a WordPress plugin that allows users to outline, order, and edit content into a single volume that can be exported as PDF, TEI or epub.

Last updated: 22 May 2015

Scripto is an engine for crowdsourcing the transcription of content that can be integrated with a custom transcription GUI and existing CMS.

Last updated: 21 May 2015

Heritrix is the web crawler used by the Internet Archive and the Library of Congress. After initial configuration on a Linux machine, it provides a web-based user interface. Heritrix captures metadata in the Web ARChive (WARC) format.

Code license: Open source, Apache License
Last updated: 6 May 2015

SiteSucker is OS X and iOS software that can download an entire website, including images and videos.

Last updated: 6 May 2015

HTTrack provides an easy-to-use interface for downloading websites-- including HTML, images, and other files-- or updating a copy of a previously-downloaded site.

Code license: Open source, GNU GPL
Last updated: 6 May 2015

Afloat is a utility that adds new window management functionality to OS X, including keeping a window on top or turning a window into an overlay on the screen. This can be particularly useful in full-screen view (e.g. during a presentation) when you want to keep another small window, such as a Twitter client, visible somewhere on the screen.

Last updated: 5 May 2015

Confluence is enterprise wiki software, available either for installation on a local server or via cloud hosting. Open source projects can request a free license. Confluence integrates with other Atlassian products like JIRA.

Last updated: 5 May 2015

Juxta is an open-source cross-platform desktop tool for comparing and collating multiple witnesses to a single textual work. The software allows you to set any of the witnesses as the base text, to add or remove witness texts, to switch the base text at will, and to annotate Juxta-revealed comparisons and save the results. New in version 1.6.5 is the ability to upload your comparison sets to a free online workspace called Juxta Commons where you can analyze your data privately or choose to share visualizations of your work with anyone on the web.

Code license: Open source, Creative Commons
Last updated: 4 May 2015
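
The idea of collating witnesses against a base text can be illustrated with a small word-level comparison. This is only a sketch using Python's `difflib`, not Juxta's actual collation algorithm; the function name `collate` and the two sample witnesses are invented for the example.

```python
import difflib


def collate(base, witness):
    """List word-level differences between a base text and one witness.

    Each result is (operation, words in base, words in witness), where the
    operation is 'replace', 'delete', or 'insert'.
    """
    base_words, witness_words = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(a=base_words, b=witness_words, autojunk=False)
    variants = []
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op != "equal":
            variants.append(
                (op, " ".join(base_words[a1:a2]), " ".join(witness_words[b1:b2]))
            )
    return variants


base = "the quick brown fox jumps"
witness = "the quick red fox leaps"
for variant in collate(base, witness):
    print(variant)
# ('replace', 'brown', 'red')
# ('replace', 'jumps', 'leaps')
```

Switching the base text amounts to calling `collate` with the arguments swapped, which reverses the direction of each reported variant.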

Evernote is note-taking software in the cloud, with options for private and shared notebooks. Users can take text notes, and upload files to attach them to notes. Evernote has built-in OCR for images with printed or handwritten text. A premium account allows access to notebooks offline, as well as more storage and embedded PDF search.

Code license: Closed source
Last updated: 2 May 2015

Leximancer is text analysis software that can create topic and concept based network visualizations and includes a sentiment analyzer.

Last updated: 2 May 2015

Netvibes offers a free personal web dashboard for following feeds, friends and using the provided apps. A premium account includes functionality for analytics, tagging, curation, alerts, sentiment analysis, and search.

Last updated: 2 May 2015

Global Translator automatically translates WordPress sites into a variety of user-chosen languages, using one of four translation engines (Google Translation Engine, Babel Fish, Promt, FreeTranslations).

Code license: Open source
Last updated: 1 May 2015

Sophie is an electronic tool for authoring, collaborating, reading, and publishing rich media documents in networked environments. Built in Java, it runs on a variety of platforms.

It does not support either the epub or mobi formats, instead using its own internal format.

Development of the project seems to have stalled.

Last updated: 1 May 2015

Scrivener is software for writing that includes virtual index cards, outlining, version control, import/export options, and scriptwriting features, and provides a management system for notes and documents plus support for document metadata.

It supports compiling documents from sub-documents and exporting to ebook formats (epub and Kindle/mobi), TeX and LaTeX, ODF, PDF, and Microsoft Word.

A Linux version is in beta, and an iOS version is reportedly under development.

Code license: Closed source
Last updated: 18 Aug 2015

R

R is a free software environment for statistical computing and graphics. R can be run from the command line, or using any of the many graphical user interfaces available on a variety of platforms; these are listed as separate tools.

Code license: GPL
Last updated: 29 Jan 2015

Navicat for MySQL is an interface for working with MySQL databases. It supports importing data from CSV or Excel, exporting, reporting, querying, developing scripts, and general database exploration.

Navicat for MySQL can also be used with MariaDB databases.

Last updated: 29 Dec 2014

SEASR provides an environment for developing data flows that ingest data, process it through a series of transformations and analytics, and send the data to a results viewer.

Last updated: 29 Dec 2014

MONK is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study.

Last updated: 29 Dec 2014

The Visual Understanding Environment (VUE) is concept mapping software that can integrate with multiple repositories to pull in, organize, and analyze data. It offers multiple features for advanced management of digital resources for teaching, learning, and research.

Last updated: 29 Dec 2014

GitHub is a web-based repository service that offers the distributed revision control and source code management (SCM) functionality of Git, along with a graphical user interface and desktop and mobile integration. It also provides collaboration tools such as access control, wikis, task management, code review, bug tracking, and feature requests. It offers free accounts, often used to host open source software projects, and private (paid) repositories.

Code license: Closed source
Last updated: 29 Dec 2014

Image Map Tool allows you to upload an image (or specify the URL of an image found online) and turn it into a clickable image map.

Last updated: 29 Dec 2014

Integrated Content Environment (ICE) was an open source project of the Learning Resources Development (LRD) unit at the University of Southern Queensland. The content management system allowed users to convert content authored in Microsoft Word or OpenOffice.org Writer into self-contained course websites using the IMS format.

The ICE authoring environment enabled:

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

HyperPo is a user-friendly text exploration and analysis program that allows users to import texts or use texts available online (in English or French), and provides frequency lists of characters, words and series of words, color-coding to indicate repetition, KWIC, co-occurrence and distribution lists, and the ability to simultaneously compare data from multiple texts.

Last updated: 29 Dec 2014
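
Two of the outputs mentioned for HyperPo, word frequency lists and KWIC (keyword in context) views, are simple enough to sketch in a few lines of Python. The code below is an illustration of those techniques only, not HyperPo's implementation; the function names, the tokenization rule, and the sample sentence are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=2):
    """Return each occurrence of keyword with `width` tokens of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


text = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(text)

# Frequency list: the most common words with their counts.
print(Counter(tokens).most_common(3))

# KWIC view for one keyword.
for left, word, right in kwic(tokens, "sat"):
    print(f"{left:>10} [{word}] {right}")
```

Running the same two functions over several texts and comparing the results side by side gives a rough version of the multi-text comparison described above.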

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Code license: CPL, Open source
Last updated: 29 Dec 2014

Calibre is a free and open source ebook library management application, including options for syncing to devices and converting between a large number of formats. Calibre also has a built-in e-book editor for EPUB and AZW3 formats.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 29 Dec 2014

Text Fixer allows users to copy and paste a Word document into a box and convert it to clean HTML.

Last updated: 29 Dec 2014

Twapper Keeper lets users create an archive of tweets based on hashtag, keyword, or person, for them to review online.

Last updated: 29 Dec 2014

PhotoScore takes an image of a music score-- including handwritten scores-- and outputs it in an editable format, including MusicXML.

Last updated: 29 Dec 2014

eXist-db is an open source database management system that stores XML data according to the XML data model and features efficient, index-based XQuery processing.

Code license: Open source, GNU GPL, GNU LGPL
Last updated: 29 Dec 2014