Gathering

HEURIST (http://HeuristNetwork.org) is an extremely flexible, end-user oriented, web-based data management system designed specifically for Humanities data. Developed since 2005, it has been in active use across many projects since 2009. It is available both as a free web service for researchers (hosted at the University of Sydney Data Centre) and for installation on a physical or virtual server (Open Source on GitHub).

Researchers can design, create, manage, analyse, visualise and publish their own richly-structured databases through a simple web interface, without the need for a programmer. Quite complex databases can be built in a few hours by borrowing structures and vocabularies published by other users. Databases can be designed and built incrementally, as existing data are not affected by changes in structure. Databases created by Heurist are stored in MySQL with a repeatable structure, facilitating independent access by other software.

Advanced features include record linking, graph structure, drill-down facet searches, rule-based queries, custom reports, linked map-timelines, network visualisation, normalised spreadsheet import, crosstabulation, XML feeds, and XSLT transforms. The team provides initial email and Skype assistance for project setup at no cost, and special customisations at modest cost.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 15 Sep 2017

Textable is an open source program for text analysis. It offers a set of basic text-analytic components (e.g. import text from files, segment into words, measure segment diversity, etc.), which the user combines using a visual interface to build custom analytic workflows.

Code license: GNU GPL v3
Last updated: 20 Aug 2017

DiscoverText allows users to import data from a variety of sources (including free and premium Gnip Twitter feeds, plain text, Word, Excel, public YouTube comments, blogs/wikis, PDF, etc.), to view, search, filter, deduplicate, code and machine classify the data. This is a collaborative, web-based platform widely used by academics.

Code license: Closed source
Last updated: 24 Feb 2017

Yahoo Pipes allows users to combine, filter, translate, and geocode data from RSS feeds, JSON, KML, or other similar formats, and power widgets/badges using that data.

Last updated: 18 Jan 2017
Last updated: 2 Sep 2016

Jotform allows users to create web forms (for surveys, etc.) using a drag-and-drop interface.

Code license: Closed source
Last updated: 10 Aug 2016

ZeeMaps quickly maps point data on Google base maps in two ways:
1) The user uploads a .csv file of data points and their locations.
2) A group of users all add their own data location points to the map, on their own time from their own devices.

Each point can include text, video, image, or audio annotations.

Basic functionality is free; larger uploads and large numbers of maps require a paid subscription.

Code license: Closed source
Last updated: 7 Jun 2016

Crowdmap allows the investigator to set up a Web map around a particular topic and invite multiple users (participants, research subjects, collaborators, multiple assistants) to contribute information to the map on their own time and from their own device.

For $10/month, users can buy fee-based services including private maps and custom branding.

Code license: GNU LGPL
Last updated: 7 Jun 2016

Visualizes a series of events across both time and space. Allows the researcher to create an interactive timeline and map that are linked together. Users of the timeline can press "play" to watch the timeline scroll forward and the map zoom from place to place as they highlight each event (and the researcher's attached images and text) in turn. Users can also pause the progress of history, move forward or back at their own pace, and zoom in or out of either the map or timeline to examine areas of interest.

Compare to: StoryMap JS, MapStory, Odyssey.js

Code license: Closed source
Last updated: 7 Sep 2016

Overview is a tool for analyzing large sets of documents. It includes a sophisticated search engine, word clouds, entity detection, and topic-based document clustering. If that’s not good enough, you can write your own plugins using the API. It is open source and you can run it on your own computer.

It was originally designed for investigative journalists, but it’s now also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.

Overview is built to do several types of tasks:

Code license: Open source
Last updated: 9 Mar 2016

import.io is a free web-based platform that puts the power of the machine-readable web in users' hands. Using its tools, users can create an API or crawl an entire website in a fraction of the time of traditional methods, with no coding required. Its highly efficient and scalable platform allows users to process thousands of queries at once and get real-time data in any format they choose. It also offers an easy-to-use client library to make exporting, integrating and using data as simple as extracting it.

Code license: Closed source
Last updated: 15 Jan 2016

Figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. All file formats can be published, including videos and datasets that are often demoted to the supplemental materials section in current publishing models. Users of the site maintain full control over the management of their research whilst benefiting from global access, version control and secure backups in the cloud.

Code license: Closed source
Last updated: 29 Dec 2015

TwapperKeeper is now called Hootsuite Archives and can be accessed from within Hootsuite.

Code license: Closed source
Last updated: 13 Dec 2015

CulturalAnalytics is an R package containing functions for statistical analysis and plotting of image properties, including statistics such as the standard deviation and mean in the RGB and HSV color spaces, image entropy and histograms in greyscale (intensity) and color, and for plotting color clouds and image scatter charts.
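
The statistics named above can be illustrated with a minimal sketch. It is written in Python using only the standard library and does not reproduce the R package's own functions or names. The four-pixel image, the helper names, and the choice of a plain channel average for greyscale intensity are all assumptions made for illustration.

```python
import colorsys
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical 2x2 image as (R, G, B) tuples with channels in 0..255.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]


def channel_stats(values):
    """Return (mean, population standard deviation) for one channel."""
    return mean(values), pstdev(values)


def to_hsv(pixel):
    """Convert an 8-bit RGB pixel to HSV with each component in 0..1."""
    r, g, b = (c / 255 for c in pixel)
    return colorsys.rgb_to_hsv(r, g, b)


def grey_entropy(image):
    """Shannon entropy (in bits) of the greyscale intensity histogram."""
    # Assumption: intensity is the plain average of the three channels.
    grey = [round(sum(p) / 3) for p in image]
    counts = Counter(grey)
    total = len(grey)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# One (mean, standard deviation) pair per channel, in R, G, B order.
rgb_stats = [channel_stats([p[i] for p in pixels]) for i in range(3)]

# The same statistics in HSV space, in H, S, V order.
hsv_pixels = [to_hsv(p) for p in pixels]
hsv_stats = [channel_stats([p[i] for p in hsv_pixels]) for i in range(3)]

entropy = grey_entropy(pixels)
```

A real analysis would read pixel data from image files and would typically plot these values per image, as the package's scatter charts do.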

Code license: Open source, GNU GPL
Last updated: 12 Nov 2015

Zoho provides a drag-and-drop interface for creating database-driven applications, such as forms.

Code license: Closed source
Last updated: 3 Nov 2015

NVivo is commercial software for qualitative analysis of unstructured data, in a range of formats and from diverse sources. Enables users to collect, organize, and analyze content from interviews, focus group discussions, surveys, audio, social media, videos, and webpages.

Code license: Closed source
Last updated: 30 Oct 2015

nodegoat is a web-based data management, analysis & visualisation environment.

Using nodegoat, you can define, create, update, query, and manage any number of datasets by use of a graphical user interface. Your custom data model autoconfigures the backbone of nodegoat's core functionalities.

Code license: Closed source
Last updated: 17 Aug 2015

Editors' Notes is an open-source, web-based tool for recording, organizing, preserving, and opening access to research notes, built with the needs of documentary editing projects, archives, and library special collections in mind.

Code license: Open source
Last updated: 8 Jul 2015

Paperpile is a web-based commercial reference management software with special emphasis on integration with Google Docs and Google Scholar. It imports data from academic publisher websites and from databases such as PubMed, Google Scholar, Google Books, and arXiv. Paperpile can retrieve and store publication PDF files to the user's Google Drive account.

Code license: Closed source
Last updated: 8 Jul 2015

Snapzen is a browser tool that is used to collaborate with others about the information on any web page - right from your browser.

Discuss information on web pages with your colleagues, friends or family. It is easy to collaborate with others because they see exactly what you see on the web pages.

If you still use copy and paste, screenshot tools, email or chat to discuss web pages, Snapzen will show you a better way.

Code license: Closed source
Last updated: 15 Jun 2015

WebClust is a meta search engine that clusters documents into meaningful groups. WebClust presents search results in a horizontal topical arrangement, in addition to a single vertical list. WebClust's data mining technique is meant to make sense of large amounts of textual information extracted from the web, including digital libraries.

Last updated: 14 Jun 2015

The Open Science Framework (OSF) is a free, open source tool designed to help researchers manage the entire research workflow: planning, execution, reporting, archiving and discovery. It is part collaboration software and part version control system. The OSF can be used to manage individual projects or large collaborative ones. Privacy and sharing settings allow for fine-grained control over access to files and materials stored on the platform - share privately with collaborators or publicly with the community at large.

Code license: Apache License
Last updated: 14 Jun 2015

SylvaDB is a graph database management system. It allows users with no knowledge of graph theory to model, collect, query, and analyze data in a network structure. SylvaDB provides tools for easy creation of schemas and modelling, automatic form creation for data input, collaborative features, a visual query editor, global and local search, report and chart generation, network metrics, and visualization tools.

Code license: GNU Affero GPL v.3
Last updated: 9 Jun 2015

A text-mining system for scientific literature. Textpresso's two major elements are (1) access to full text, so that entire articles can be searched, and (2) introduction of categories of biological concepts and classes that relate to objects (e.g., association, regulation, etc.) or describe one (e.g., methods, etc).

Code license: Open source
Last updated: 28 May 2015

140kit provides a management layer for tweet collection and analysis.

Raw data cannot be passed through to the users, but any analytical process can be run across the user's dataset, and the data is held for as long as the user wants. When new analytical processes are created, they can be run on existing sets of data. 140kit does not claim any control of the analysis; however, it retains ownership of the data collected.

Last updated: 24 May 2015

Scrapy is an open source programming library for web crawling and web page text extraction, written in Python. You can make calls to Scrapy code from within your own scripts and applications to automate the task of extracting data from websites.

You would typically use Scrapy to automate the task of visiting one or more web pages, on a website to which you have access. You could alternately use it to invoke web-based Application Programming Interfaces (APIs).

Code license: Open source
Last updated: 22 May 2015

AntWordProfiler is free software for analyzing word frequency.

Last updated: 9 May 2015

MDID is software for teaching and learning with digital images, with tools for discovering, aggregating, and presenting digital media in a variety of learning spaces.

Code license: Open source, GNU GPL
Last updated: 8 May 2015

HTTrack provides an easy-to-use interface for downloading websites (including HTML, images, and other files) or updating a copy of a previously downloaded site.

Code license: Open source, GNU GPL
Last updated: 6 May 2015

Evernote is note-taking software in the cloud, with options for private and shared notebooks. Users can take text notes, and upload files to attach them to notes. Evernote has built-in OCR for images with printed or handwritten text. A premium account allows access to notebooks offline, as well as more storage and embedded PDF search.

Code license: Closed source
Last updated: 2 May 2015

SearchTeam is a collaborative search engine that allows individuals and groups to curate search results in a public or shared SearchSpace.

Code license: Closed source
Last updated: 1 May 2015

ScraperWiki is an online tool that makes the process of data scraping simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code.

Code license: GPL
Last updated: 1 May 2015

After creating a free account, users can submit requests for mining and analyzing JSTOR content. By submitting a query, a user will receive a random sample of 1,000 of JSTOR's 4.6 million documents; more documents can be received by contacting JSTOR directly. Users can choose to receive the following results:

  • Citations Only (all requests come with citations by default)
  • Word Counts
  • Bigrams
  • Trigrams
  • Quadgrams
  • Key Terms
  • References
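
The word-count and n-gram result types listed above can be illustrated with a minimal Python sketch. It is not JSTOR's implementation: the sample sentence, the `ngrams` helper, and the lowercase whitespace tokenization are assumptions made for illustration, and the service's own tokenization rules may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-token tuples found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text; a real request would cover full documents.
text = "the quick brown fox jumps over the quick brown dog"

# Assumption: lowercase and split on whitespace.
tokens = text.lower().split()

word_counts = Counter(tokens)           # Word Counts
bigrams = Counter(ngrams(tokens, 2))    # Bigrams: two consecutive words
trigrams = Counter(ngrams(tokens, 3))   # Trigrams: three consecutive words
quadgrams = Counter(ngrams(tokens, 4))  # Quadgrams: four consecutive words
```

Calling `bigrams.most_common(5)` on the result lists the most frequent word pairs, which is the usual starting point for working with this kind of output.
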
Last updated: 29 Apr 2015

STACK is an extensible social media research toolkit designed to collect, process, and store data from online social networks. The toolkit is an ongoing project of the Syracuse University iSchool, and currently supports the Twitter Streaming API. Collection from Facebook public pages and the Twitter search API is under development. The toolkit architecture is modular and supports extension. Basic Linux / Mac command line skills are needed.

To learn more: https://github.com/bitslabsyr/stack

Code license: Open source
Last updated: 21 Apr 2015

Academia.edu is a social platform that allows academics to share research papers, gray literature, reviews and other scholarly materials. The site provides user statistics on the number and geographic origin of profile and document views. Academic affiliation is displayed in a tree-like format, grouped by universities and departments.

Code license: Closed source
Last updated: 21 Apr 2015

Bitext provides multilingual semantic technologies in the field of Text Analytics via API, with services like Entity Extraction, Concept Extraction, Sentiment Analysis, and Text Categorisation.

Last updated: 25 Mar 2015

Content curation and topic discovery website based primarily on publishers the user follows through social media.

Code license: Open source
Last updated: 30 Jan 2015

Weka provides machine learning algorithms in Java for data mining and predictive modeling tasks. These algorithms can either be incorporated into other Java code or called from the Weka Workbench, a GUI environment.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

Open Harvester Systems is a free metadata indexing system that allows users to create a searchable index of the metadata from Open Archives Initiative (OAI)-compliant archives, such as sites using Open Journal Systems (OJS) or Open Conference Systems (OCS). It can harvest OAI metadata in a variety of schemas (including unqualified DC, the PKP (Open Journal Systems/Open Conference Systems) Dublin Core extension, MODS, and MARCXML).
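
As a rough illustration of what harvesting unqualified Dublin Core over OAI-PMH involves, the Python sketch below builds a `ListRecords` request URL and indexes the titles in a response. It is not code from Open Harvester Systems: the endpoint `https://example.org/oai`, the sample response, and the helper names are hypothetical. Only the `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def index_titles(xml_text):
    """Map each record identifier to its list of dc:title values."""
    root = ET.fromstring(xml_text)
    index = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in record.iterfind(".//dc:title", NS)]
        index[identifier] = titles
    return index


# Hypothetical response, trimmed to a single record for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://example.org/oai")
index = index_titles(SAMPLE)
```

A full harvester would also fetch the URL, follow `resumptionToken` values across pages of results, and store the index persistently; those steps are left out here.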

Code license: GNU GPL
Last updated: 29 Dec 2014

Calibre is a free and open source ebook library management application, including options for syncing to devices and converting between a large number of formats. Calibre also has a built-in e-book editor for EPUB and AZW3 formats.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 29 Dec 2014

eXist-db is an open source database management system that stores XML data according to the XML data model and features efficient, index-based XQuery processing.

Code license: Open source, GNU GPL, GNU LGPL
Last updated: 29 Dec 2014

"The Virtual Lightbox for Museums and Archives (VLMA) is an educational tool for collecting and reusing in a structured fashion the online contents of museums and archives with visual components. With VLMA, you can browse and search collections, construct personal collections, export these collections to xml or Impress presentation format, annotate them, and share your collections with other VLMA users."

Code license: Open source
Last updated: 29 Dec 2014

GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

LibLime Koha is a web-based, open source integrated library system (ILS) that has also been used for virtual library systems (e.g. recreating historic libraries). LibLime Koha offers libraries circulation policies, patron management modules, parent-child relationship for patron records, club and service management features, in-depth "holds" support, single click batch import "undo" option, EzProxy compatibility, self-checkout interface and more.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

DownThemAll is a Firefox plugin that allows users to download all the links or images contained in a webpage.

Last updated: 29 Dec 2014

Qiqqa is research management software that allows you to organize large numbers of papers, find new papers to read and new information about papers you already have, review materials, and create annotation reports. Qiqqa also has several PDF tools that allow you to convert PDFs to text and use a clipboard function to cut and paste text into your document.

Code license: Closed source
Last updated: 29 Dec 2014

MediaWiki is a free software open source wiki package written in PHP, originally for use on Wikipedia and other Wikimedia Foundation projects. It is designed to be run on a large server farm for a website that gets millions of hits per day.

Code license: Open source, GNU GPL, GNU GPL v2
Last updated: 29 Dec 2014

Archive-It is a subscription web archiving service from the Internet Archive that helps organizations harvest, build, and preserve collections of digital content. Through its user-friendly web application, Archive-It partners can collect, catalog, and manage their collections of archived content, with 24/7 access and full-text search available to them and their patrons. Content is hosted and stored at the Internet Archive data centers.

Last updated: 29 Dec 2014

OpenETD is an open source, web-based software application for managing the submission, approval, and distribution of electronic theses and dissertations (ETDs).

Code license: Open source, GNU GPL v3
Last updated: 29 Dec 2014

SiteCrawler is a website downloading application that allows users to capture entire sites or selected portions of sites like image galleries.

Code license: Closed source
Last updated: 29 Dec 2014

WikiPack is a web based personal information organizer and Markdown editor that uses Dropbox for synced storage. Using plain text Markdown files and WikiWords, WikiPack gives information context and links entries together by turning your Markdown pages into a private, password protected wiki. The easy to use Markdown language lets you create and edit your wiki pages without having to learn complex wiki syntax.

Code license: Closed source
Last updated: 29 Dec 2014

Pocket was founded in 2007 by Nate Weiner to help people save interesting articles, videos and more from the web for later enjoyment. Once saved to Pocket, the list of content is visible on any device — phone, tablet or computer. It can be viewed while waiting in line, on the couch, during commutes or travel — even offline.

Code license: Closed source
Last updated: 29 Dec 2014

News and RSS reader designed for iOS and Android mobile devices. Has been replaced by Google Play Newsstand (https://play.google.com/store/newsstand?hl=en)

Last updated: 29 Dec 2014

Manage and publish your existing journal, or lead the Open Access movement in your field by starting a new journal. Scholastica makes it easy to collaborate on a journal and publish scholarship at the click of a button.

Code license: Closed source
Last updated: 29 Dec 2014

Artifex Press is a publishing and technology company that digitally publishes catalogues raisonnés: comprehensive, annotated documentation of all the known artworks by an artist. They have developed a proprietary, patented software platform and a dedicated publishing program in order to create digital catalogues raisonnés. They offer both their own digital catalogues raisonnés and the ability to license the software to produce your own projects.

Code license: Closed source
Last updated: 29 Dec 2014

ProProfs Poll Software offers instructors, educators and organizations advanced options for creating effective online polls in a matter of minutes. With ProProfs, anyone can create different kinds of polls using multiple-choice, checkbox and essay question types. Users can create text-based polls, image-based polls and even polls with a combination of text, images and videos. A set of advanced customization features allows users to create polls using different themes, add comment sections, shuffle answers and even add an expiry date to the polls.

Last updated: 29 Dec 2014