Data collection

There is an unlimited number of videos, PDFs, etc. that can be used for education, training, instruction, or professional development.

Finding and curating them into playlists, integrating them with existing workflows, and sharing them with others is time-consuming, inefficient, and often limited by ‘vendor lock-in.’

Media Share is a productivity tool that saves time, requires no training to use, and does not limit how or where content can be used.

Code license: Closed source
Last updated: 25 Oct 2017

HEURIST (http://HeuristNetwork.org) is an extremely flexible, end-user oriented, web-based data management system designed specifically for Humanities data. Developed since 2005, it has been in active use across many projects since 2009. It is available both as a free web service for researchers (hosted at the University of Sydney Data Centre) and for installation on a physical or virtual server (open source on GitHub).

Researchers can design, create, manage, analyse, visualise and publish their own richly-structured databases through a simple web interface, without the need for a programmer. Quite complex databases can be built in a few hours by borrowing structures and vocabularies published by other users. Databases can be designed and built incrementally, as existing data are not affected by changes in structure. Databases created by Heurist are stored in MySQL with a repeatable structure, facilitating independent access by other software.

Advanced features include record linking, graph structure, drill-down facet searches, rule-based queries, custom reports, linked map-timelines, network visualisation, normalised spreadsheet import, crosstabulation, XML feeds, and XSLT transforms. The team provides initial email and Skype assistance for project setup at no cost, and special customisations at modest cost.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 13 Oct 2017

Yahoo Pipes allows users to combine, filter, translate, and geocode data from RSS feeds, JSON, KML, or other similar formats, and power widgets/badges using that data.

Last updated: 18 Jan 2017

MyIndicators (http://myindicators.net/) is a digital, easy-to-use tool that allows researchers, educators, students, or anyone else to build their own tailored indicators (e.g. goals, strategies, parameters, surveys, questions, calorie intake, alcohol consumption, or quantified-self measures such as training, mood tracking, or sleep quality).

Code license: Closed source
Last updated: 2 Sep 2016

Jotform allows users to create web forms (for surveys, etc.) using a drag-and-drop interface.

Code license: Closed source
Last updated: 10 Aug 2016

Geospatial Data Abstraction Library (GDAL) is a translator library for vector and raster geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation.

Code license: Open source, MIT License
Last updated: 7 Jun 2016

ERDAS Imagine is a suite of geospatial data authoring software. The suite contains a raster graphics editor and remote sensing application that performs advanced remote sensing analysis and spatial modelling to create new information. ERDAS IMAGINE can also visualize results in 2D, 3D, video, and on cartographic quality map compositions. It is primarily designed for geospatial raster data processing and the creation of digital images for mapping use in GIS or CAD software.

Features:

  • Image Analysis, Remote Sensing
Code license: Closed source
Last updated: 7 Jun 2016

ArcGIS is a suite of software that comprises Desktop GIS, Server GIS, Mobile GIS, and ArcGIS Online. ArcGIS is a platform for building a complete geographic information system (GIS) that lets you easily create, edit, and analyse geographic knowledge on the desktop; publish data, maps, globes and models to a GIS server and/or share them online; and use them on the desktop, on the Web, or in the field.

Features:

  • View and query maps
  • Manipulate shapefiles and geodatabases
Code license: Closed source
Last updated: 7 Jun 2016

A software application that is used for analysing and visualising multi-volume seismic data.

Features:

  • Visualization and analysis of 2D and 3D seismic data in a single survey
  • 2D and 3D horizon tracking including auto-tracking, plane-by-plane, line and manual tracking
  • On-the-fly calculation and visualization of various attributes and filters
  • Plug-in architecture
Code license: Open source, GNU GPL
Last updated: 7 Jun 2016

A statistical package that may be used to compare quantified assemblages of broken and incomplete objects, such as ceramics, glass and bones. Pie-Slice uses the Estimated Vessel Equivalent (EVE) as its base form of measurement, in which each measurable fragment is scored as a fraction of a complete vessel. It also trials the use of a new statistical transformation, the pseudo-count transformation, which converts EVEs into Pottery Information Equivalents (PIEs). The latter enables assemblages to be compared using techniques such as log-linear and correspondence analyses.

Last updated: 7 Jun 2016
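
The EVE measurement described above can be illustrated with a minimal Python sketch. It is not Pie-Slice's own code: the function name total_eves, the (vessel_type, fraction) input format, and the sample values are assumptions made for illustration. It only sums fragment fractions per vessel type and does not attempt the pseudo-count transformation.

```python
from collections import defaultdict


def total_eves(fragments):
    """Sum Estimated Vessel Equivalents (EVEs) per vessel type.

    `fragments` is an iterable of (vessel_type, fraction) pairs, where
    `fraction` is the share of a complete vessel that the fragment
    represents (hypothetical input format, chosen for this sketch).
    """
    totals = defaultdict(float)
    for vessel_type, fraction in fragments:
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be in (0, 1], got %r" % (fraction,))
        totals[vessel_type] += fraction
    return dict(totals)


# Made-up assemblage: three jar fragments and two bowl fragments.
assemblage = [
    ("jar", 0.25),
    ("jar", 0.5),
    ("bowl", 0.125),
    ("jar", 0.25),
    ("bowl", 0.125),
]
eves = total_eves(assemblage)
```

Here the three jar fragments add up to one vessel equivalent, although they may come from three different jars; that is the sense in which EVEs quantify an assemblage rather than count objects.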

The Altmetric Explorer is a powerful web app that allows you to track the conversations around scientific articles online. Altmetric collects and analyzes hundreds of thousands of postings about tens of thousands of articles and datasets each month. It makes this data available to end users through an intuitive user interface and to developers through an API.

Code license: Closed source
Last updated: 23 Mar 2016

Publish or Perish is a software program that retrieves and analyzes academic citations. It uses Google Scholar to obtain the raw citations, then analyzes these and presents the statistics.

Last updated: 24 Feb 2016
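
As an illustration of the kind of statistic that can be derived from raw citation counts, the sketch below computes an h-index in Python. The h-index is chosen here as a familiar example and is not named in the description above; the function name and the sample counts are hypothetical.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Made-up citation counts for five papers.
sample_counts = [10, 8, 5, 4, 3]
sample_h = h_index(sample_counts)
```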

The TAPoR Portal is an online environment where users can keep track of texts they want to study (uploaded or available online), learn about and try different tools, and run tools on texts.

Last updated: 23 Feb 2016

The Entity Authority Tool Set (EATS) is a web application for recording, editing, using and displaying authority information about entities. It is designed to allow multiple authorities to each maintain their own independent data, while operating on a common base so that information about the same entity is all in one place. EATS also comes with client tools for automatically looking up entities in a text by name and adding appropriate TEI markup.
Features:

  • A web API for importing and exporting entity data
Code license: Open source, GNU GPL
Last updated: 26 Jan 2016
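
The idea of looking up entities in a text by name and adding TEI markup can be sketched as follows. This is a simplified stand-in and not the EATS client: the in-memory authority dictionary, the tag_entities function, and the "#ent…" keys are all assumptions, and a real lookup would query the EATS web API instead. The sketch wraps each known name in a TEI name element whose ref attribute carries the entity key.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def tag_entities(text, authority):
    """Wrap known entity names in TEI <name ref="..."> elements.

    `authority` maps a name string to an entity key (hypothetical format).
    Longer names are tried first so that "John Mandeville" wins over "John".
    """
    if not authority:
        return escape(text)
    names = sorted(authority, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches so the result stays valid XML.
        pieces.append(escape(text[last:match.start()]))
        name = match.group(1)
        pieces.append(
            "<name ref=%s>%s</name>" % (quoteattr(authority[name]), escape(name))
        )
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


# Made-up authority records and sentence.
authority = {"John Mandeville": "#ent1", "Sydney": "#ent2"}
tagged = tag_entities("John Mandeville never visited Sydney & beyond.", authority)
```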

Specify is a database platform for museum and herbarium research data. It manages species and specimen information for computerizing biological collections, tracking museum specimen transactions, linking images to specimen records and publishing catalog data to the Internet. Specify is written in Java for Windows, Mac OS X, and Linux computers and uses the relational data manager, MySQL, as its data engine. Specify, Java, and MySQL are free and open-source.

Code license: Open source, GNU GPL, GNU GPL v2
Last updated: 10 Jan 2016

Zoho provides a drag-and-drop interface for creating database-driven applications, such as forms.

Code license: Closed source
Last updated: 3 Nov 2015

The MONK workbench provides 525 works of American literature from the 18th and 19th centuries, and 37 plays and 5 works of poetry by William Shakespeare, along with tools to enable literary research through the discovery, exploration, and visualization of patterns.

Users affiliated with CIC (Big Ten) schools can access a larger data set that includes about a thousand works of British literature from the 16th through the 19th century, provided by The Text Creation Partnership (EEBO and ECCO) and ProQuest (Chadwyck-Healey Nineteenth-Century Fiction).

Last updated: 12 Aug 2015

An online survey tool written in PHP that uses a MySQL, MSSQL, or Postgres database. The multilingual site offers a demo, a feature list, and documentation.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 7 Aug 2015

KORA is a digital repository that allows institutions to ingest, manage, and deliver digital objects and metadata.

Code license: Open source
Last updated: 5 Aug 2015

Omeka is a content management system designed for the display of library, museum, archives, and scholarly collections and exhibitions.

Code license: Open source, GNU GPL
Last updated: 2 Aug 2015

Bibliopedia will perform advanced data-mining and cross-referencing of scholarly literature to create a humanities-centered collaboratory. As a prototype, it will search resources including JSTOR and Library of Congress for metadata about scholarly articles and books that mention the famed medieval travel narrative The Travels of Sir John Mandeville, examine the articles and books for citations, then save the results in a publicly accessible database.

Code license: Open source
Last updated: 2 Jul 2015

The Open Science Framework (OSF) is a free, open source tool designed to help researchers manage the entire research workflow: planning, execution, reporting, archiving and discovery. It is part collaboration software and part version control system. The OSF can be used to manage individual projects or large collaborative ones. Privacy and sharing settings allow for fine-grained control over access to files and materials stored on the platform - share privately with collaborators or publicly with the community at large.

Code license: Apache License
Last updated: 14 Jun 2015

Zotero is a free tool that collects, manages and cites research sources. It lives in your web browser, where you do your work, and it's easy to use. It can be installed as a Firefox extension, used with the Chrome and Safari browsers, or run as a standalone tool. It allows you to attach PDFs, notes and images to your citations, organise them into easily searchable collections for different projects, and format citations (for example in OpenOffice) using any of over 2800 citation styles.

Code license: GNU Affero GPL
Last updated: 24 May 2015

TextGrid is a virtual research environment (VRE) for the humanities, providing integrated access to specialized tools, services and content, and serving as a long-term archive for research data in the humanities.

Last updated: 22 May 2015

Scrapy is an open source programming library for web crawling and web page text extraction, written in Python. You can make calls to Scrapy code from within your own scripts and applications to automate the task of extracting data from websites.

You would typically use Scrapy to automate the task of visiting one or more web pages on a website to which you have access. Alternatively, you could use it to invoke web-based Application Programming Interfaces (APIs).

Code license: Open source
Last updated: 22 May 2015
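
To make the extraction step concrete, here is a minimal Python sketch of pulling links out of a fetched page. It deliberately uses only the standard library (html.parser and urllib.parse) as a stand-in, so it does not show Scrapy's own API; the class name, function name, and sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content; a crawler would download this instead.
SAMPLE_HTML = (
    '<html><body><a href="/about">About</a> '
    '<a href="https://example.org/data.csv">Data</a>'
    '<a name="top">no href</a></body></html>'
)
links = extract_links(SAMPLE_HTML, "https://example.com/index.html")
```

A crawling library adds what this sketch leaves out: scheduling requests, following the extracted links, and respecting site policies.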

Silk is a platform for sites that contain collections of information. It's like the Tumblr for websites that have structured content–like software reviews, information about designers, a site with UN datasets, and more.

Last updated: 22 May 2015

Scripto is an engine for crowdsourcing the transcription of content that can be integrated with a custom transcription GUI and existing CMS.

Last updated: 21 May 2015

Projects allows researchers to organise and manage all their research outputs in a safe, simple and structured way. It’s designed to help academics, at any stage of their career, keep track and stay on top of all their results. It’s a light, useful and slick application that integrates into a researcher’s existing workflow to help them work more efficiently and ensure they have more time for making discoveries.

Last updated: 19 May 2015

"Collex allows users to collect, annotate, and tag online objects and to repurpose them in illustrated, interlinked essays or exhibits."

Last updated: 9 May 2015

Greenstone is a suite of software for building and distributing digital library collections. It also allows users to publish to the internet or CD-ROM. Software interface and documentation available in English, French, Spanish, Russian, and Kazakh.

Code license: Open source, GNU GPL
Last updated: 8 May 2015

CoCoCo is an application for collecting, cataloging, and assessing the quality of user-submitted text or uploaded-file contributions.

Last updated: 8 May 2015

Heritrix is the web crawler used by the Internet Archive, which provides a web-based user interface after initial configuration on a Linux machine. Also used by the Library of Congress, Heritrix captures metadata in the Web ARChive (WARC) format.

Code license: Open source, Apache License
Last updated: 6 May 2015

SiteSucker is OSX and iOS software that can download an entire website, including images and videos.

Last updated: 6 May 2015

HTTrack provides an easy-to-use interface for downloading websites (including HTML, images, and other files) or updating a copy of a previously downloaded site.

Code license: Open source, GNU GPL
Last updated: 6 May 2015

A digital repository software package that may be used to accept, manage and publish digital objects. It is widely used in academia as a system to manage academic research papers, electronic theses and other distinct digital resources. EPrints offers an extensible plug-in architecture, enabling data processing activities to be tailored to the requirements of the institution.

Code license: Open source, GNU GPL
Last updated: 1 May 2015

ScraperWiki is an online tool that makes the process of data scraping simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code.

Code license: GPL
Last updated: 1 May 2015

Wiggio is a free service that allows users to create groups, host virtual meetings and conference calls, manage events, create to-do lists, poll members, send messages, and upload and manage folders. You can connect with your Facebook account or create a new, free account with Wiggio. The previously available Wiggio apps are no longer supported.

Code license: Closed source
Last updated: 22 Mar 2015

Freedity can create an RSS feed from any web page, with the number of feeds and update interval varying based on the tier of the subscription.

Last updated: 5 Mar 2015

R

R is a free software environment for statistical computing and graphics. R can be run from the command line, or using any of the many graphical user interfaces available on a variety of platforms; these are listed as separate tools.

Code license: GPL
Last updated: 29 Jan 2015

Open Harvester Systems is a free metadata indexing system that allows users to create a searchable index of the metadata from Open Archives Initiative (OAI)-compliant archives, such as sites using Open Journal Systems (OJS) or Open Conference Systems (OCS). It can harvest OAI metadata in a variety of schemas (including unqualified DC, the PKP (Open Journal Systems/Open Conference Systems) Dublin Core extension, MODS, and MARCXML).

Code license: GNU GPL
Last updated: 29 Dec 2014
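
For readers unfamiliar with harvested OAI metadata, the following Python sketch parses a small OAI-PMH response carrying unqualified Dublin Core. It is not Open Harvester Systems code: the sample XML, the identifier, and the parse_records function are made up for illustration, and a real harvester would fetch the response from an archive's OAI endpoint. The three namespace URIs are the standard OAI-PMH, oai_dc, and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up ListRecords response with a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record with its identifier, titles, and creators."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


harvested = parse_records(SAMPLE_RESPONSE)
```

The resulting dictionaries are the sort of flat metadata that a searchable index could then be built over.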

Calibre is a free and open source ebook library management application, including options for syncing to devices and converting between a large number of formats. Calibre also has a built-in e-book editor for EPUB and AZW3 formats.

Code license: Open source, GNU GPL, GNU GPL v3
Last updated: 29 Dec 2014

A text analytics and data extraction framework: data and semantic analytics in a suite of business applications.

Last updated: 29 Dec 2014

Twapper Keeper lets users create an archive of tweets based on hashtag, keyword, or person, for them to review online.

Last updated: 29 Dec 2014

Survey Monkey is a web-based survey creation and distribution site, with free and paid plans that allow users to create surveys and collect responses through a link, email, Facebook, or by embedding them in a website or blog. Survey Monkey also allows for the collection and analysis of data.

Code license: Closed source
Last updated: 29 Dec 2014

"The Virtual Lightbox for Museums and Archives (VLMA) is an educational tool for collecting and reusing in a structured fashion the online contents of museums and archives with visual components. With VLMA, you can browse and search collections, construct personal collections, export these collections to xml or Impress presentation format, annotate them, and share your collections with other VLMA users."

Code license: Open source
Last updated: 29 Dec 2014

The Blog Analysis Toolkit (BAT) is a free, Web-based system for capturing, archiving and sharing blog posts. Blog posts are acquired via RSS feeds, and stored in a database where they can be accessed and shared by other researchers. Free registration is required.

Last updated: 29 Dec 2014
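
The capture-and-store workflow described above (posts acquired via RSS, then kept in a database) can be sketched in Python with the standard library. This is an assumption-laden illustration and not BAT's implementation: the posts table, the archive_feed function, and the sample feed are hypothetical, and the feed is an inline string where a real tool would download it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed with two posts.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example Blog</title>
<item><title>First post</title><link>https://example.org/1</link>
<pubDate>Mon, 01 Dec 2014 09:00:00 GMT</pubDate></item>
<item><title>Second post</title><link>https://example.org/2</link>
<pubDate>Tue, 02 Dec 2014 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def archive_feed(conn, feed_xml):
    """Store each feed item once, keyed by its link; return the items seen."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(link TEXT PRIMARY KEY, title TEXT, published TEXT)"
    )
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: missing <channel>")
    rows = [
        (item.findtext("link"), item.findtext("title"), item.findtext("pubDate"))
        for item in channel.iter("item")
    ]
    # INSERT OR IGNORE keeps re-harvesting the same feed from duplicating posts.
    conn.executemany("INSERT OR IGNORE INTO posts VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
archive_feed(conn, SAMPLE_FEED)
archive_feed(conn, SAMPLE_FEED)  # second pass adds nothing new
stored_titles = [row[0] for row in conn.execute("SELECT title FROM posts ORDER BY link")]
```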

Online spreadsheets with collaborative editing, connected to data sources.

Last updated: 29 Dec 2014

Formspring allows users to create and answer questionnaires either within the web interface or using an iOS app.

Last updated: 29 Dec 2014

ScrapBook is a Firefox extension, which helps you to save Web pages and easily manage collections. Major features are:
* Save Web page
* Save snippet of Web page
* Save Web site
* Organize the collection in the same way as Bookmarks
* Full text search and quick filtering search of the collection
* Editing of the collected Web page
* Text/HTML edit feature resembling Opera's Notes

Last updated: 29 Dec 2014

Zoomerang is online survey software; paid plans include analysis tools. Zoomerang is now part of Survey Monkey.

Last updated: 29 Dec 2014

The Bamboo Content Interoperability Hub (CI Hub) is an effort to largely automate the time-consuming process of downloading and compiling data from different repositories and archives and standardizing some of the format differences.

Last updated: 29 Dec 2014

DownThemAll is a Firefox plugin that allows users to download all the links or images contained in a webpage.

Last updated: 29 Dec 2014

GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

Adobe Bridge is a media management application used for organizing, browsing, locating, and viewing creative assets. It was provided as a part of the Adobe Creative Suite, beginning with CS2, and is now in version CS5.

Features:

  • Tightly integrated with other Adobe suite software (except for the standalone version of Adobe Acrobat 8)
  • Extensible through use of JavaScript
Code license: Closed source
Last updated: 29 Dec 2014

Pattern is a Python web mining module with tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).

Code license: BSD, Open source
Last updated: 29 Dec 2014
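
Two of the text-analysis metrics listed above, tf-idf and cosine similarity, can be illustrated with a short pure-Python sketch. It does not use or reproduce Pattern's API: the function names, the whitespace tokenizer, and the sample documents are assumptions, and the weighting shown (term frequency times log of inverse document frequency) is one common variant among several.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up three-document corpus.
docs = [
    "medieval travel narrative",
    "medieval travel writing",
    "seismic data analysis",
]
vectors = tf_idf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

The first two documents share terms and so score above zero, while the third shares none with the first and scores exactly zero.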

Archive-It is a subscription web archiving service from the Internet Archive that helps organizations to harvest, build, and preserve collections of digital content. Through our user-friendly web application, Archive-It partners can collect, catalog, and manage their collections of archived content, with 24/7 access and full-text search available for their own use as well as that of their patrons. Content is hosted and stored at the Internet Archive data centers.

Last updated: 29 Dec 2014

Google Scholar Citations lets you track citations to your publications, check who is citing your publications, graph your citations over time, compute citation metrics, and view publications by colleagues.

Last updated: 29 Dec 2014

All Our Ideas is a research project that seeks to develop a new form of social data collection by combining the best features of quantitative and qualitative methods. Using the power of the web, we are creating a data collection tool that has the scale, speed, and quantification of a survey while still allowing for new information to "bubble up" from respondents as happens in interviews, participant observation, and focus groups.

Code license: Open source, BSD
Last updated: 29 Dec 2014

The Dataverse Network is an application to publish, share, reference, extract and analyze research data. It facilitates making data available to others, and allows others' work to be replicated. Researchers and data authors get credit, publishers and distributors get credit, and affiliated institutions get credit.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Korbo is a powerful aggregation platform for gathering Linked Data objects relevant to your area of research into single workspaces or “baskets”.

Korbo is targeted primarily at developers who want to build applications on top of its API and make full use of the linked cultural data from sources such as Europeana, FreeBase and DBPedia.

Korbo is currently in the early stages of development, but you can already try out a demo version of the platform.

Code license: Open source, GNU GPL
Last updated: 29 Dec 2014

The Observer XT is the professional and user-friendly event logging software for the collection, analysis, and presentation of observational data.

Last updated: 29 Dec 2014

LitBlitz is a free beta Chrome extension that aims to improve how students and researchers manage their notes for literature reviews, assignment research and more by simplifying PDF management and allowing capture and annotation of document snippets.

LitBlitz v1.0 is currently available as a Google Chrome extension.

LitBlitz, while still available on the Google Chrome store, no longer appears to be under development, and the company URL redirects to a Japanese-language web page.

Last updated: 29 Dec 2014

LimeService is the hosted version of the GNU-licensed LimeSurvey. It is a service platform for preparing, running, and evaluating online surveys. Besides basic free usage, you always get the full feature set, with no monthly fees or subscription plans.

I've used it before and found it to be pretty robust.

Last updated: 29 Dec 2014

Artifex Press is a publishing and technology company that digitally publishes catalogues raisonnés: comprehensive, annotated documentation of all of the known artworks by an artist. They have developed a proprietary, patented software platform and a dedicated publishing program in order to create digital catalogues raisonnés. They offer both their own digital catalogues raisonnés and the ability to license the software to produce your own projects.

Code license: Closed source
Last updated: 29 Dec 2014

Zapier provides a means to create on-the-fly data connections between applications which may not have open APIs. Zapier works with a wide range of popular applications; a list of current ones is available at: https://zapier.com/zapbook/apps/.

Last updated: 29 Dec 2014
Code license: GNU Affero GPL v.3
Last updated: 29 Dec 2014