Apache License

Recogito is an online platform for collaborative document annotation.

Recogito provides a personal workspace where you can upload, collect, and organize your source materials (texts and images) and collaborate on annotating and interpreting them. It helps you make your work more visible on the Web and expose the results of your research as Open Data.

Code license: Open source, Apache License
Last updated: 21 Dec 2016

ANNIS is an open source, cross-platform (Linux, Mac, Windows), browser-based search and visualization architecture for complex multi-layer linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, was originally designed to provide access to the data of the SFB 632 - “Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts”. It has since been extended to a large number of projects annotating a variety of phenomena.

Code license: Open source, Apache License
Last updated: 16 Sep 2016

ColorBrewer is a web tool for selecting color schemes for thematic maps, most often choropleth maps. It includes 35 basic schemes with different numbers of classes, for over 250 possible versions. Each scheme has CMYK, RGB, Hex, Lab, and AV3 (HSV) specs for the colors. The software simply lists the color specs for a scheme you find useful, so that you can recreate those colors in the mapping software you are using.

Code license: Open source, Apache License
Last updated: 7 Jun 2016

TAToo is an embeddable Flash widget that displays TAPOR analytics for the page on which it resides.

Code license: Apache License
Last updated: 23 Feb 2016

Combined with the Leptonica Image Processing Library, Tesseract can read a wide variety of image formats and convert them to text in over 40 languages.

This code is a raw OCR engine. It has no output formatting and no UI. It can detect fixed pitch vs proportional text. Nevertheless, in 1995 this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows. Training code is included in the open source release.

The core developer on the project is Ray Smith (theraysmith).

Code license: Open source, Apache License
Last updated: 27 Jan 2016

The Open Science Framework (OSF) is a free, open source tool designed to help researchers manage the entire research workflow: planning, execution, reporting, archiving and discovery. It is part collaboration software and part version control system. The OSF can be used to manage individual projects or large collaborative ones. Privacy and sharing settings allow for fine-grained control over access to files and materials stored on the platform - share privately with collaborators or publicly with the community at large.

Code license: Apache License
Last updated: 14 Jun 2015

The Natural Language Toolkit (NLTK) is an open source Python library for text analysis and natural language processing. NLTK can tokenize strings (create a list of words from a set of characters), identify parts of speech, and perform operations based on a word's context.
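To make "tokenizing" and "operations based on a word's context" concrete, here is a minimal sketch that uses only the Python standard library. It does not call NLTK; the function names `tokenize` and `context_windows` are hypothetical, and the regular expression is a deliberate simplification of what a real tokenizer does.

```python
import re

def tokenize(text):
    """Split a string into word and punctuation tokens (simplified rule)."""
    return re.findall(r"\w+|[^\w\s]+", text)

def context_windows(tokens, target, width=2):
    """Return (left, token, right) for each occurrence of `target`."""
    windows = []
    for i, token in enumerate(tokens):
        if token.lower() == target.lower():
            left = tokens[max(0, i - width):i]      # up to `width` tokens before
            right = tokens[i + 1:i + 1 + width]     # up to `width` tokens after
            windows.append((left, token, right))
    return windows

tokens = tokenize("The cat sat on the mat. The dog sat too.")
print(tokens)
print(context_windows(tokens, "sat"))
```

Part-of-speech tagging, which the description also mentions, needs a trained model and is not attempted in this sketch.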

Last updated: 28 May 2015

Heritrix is a web crawler used by the Internet Archive. It provides a web-based user interface after initial configuration on a Linux machine. Also used by the Library of Congress, Heritrix captures metadata in the Web ARChive (WARC) format.

Code license: Open source, Apache License
Last updated: 6 May 2015

cue.language is a Java library that has tokenizing (words/sentences/ngram), string counting, language guessing, and stop word detection capabilities.
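The capabilities listed above (tokenizing, n-grams, counting, stop word detection) can be illustrated with a short sketch. This is standard-library Python rather than Java, and it does not reproduce cue.language's API; the function names and the tiny `STOP_WORDS` set are assumptions made for the example.

```python
from collections import Counter
import re

# Hypothetical, tiny stop word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on"}

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_content_words(text):
    """Count word frequencies, skipping stop words."""
    return Counter(w for w in words(text) if w not in STOP_WORDS)

sample = "The cat and the dog sat on the mat"
print(ngrams(words(sample), 2)[:3])
print(count_content_words(sample).most_common(3))
```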

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Apache Lucene is a Java-based high-performance text search engine library.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Solr is an open source enterprise search platform from the Apache Lucene project. It operates as a standalone full-text search server within an appropriate servlet container, such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.
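Because the API is plain HTTP, a query is just a URL. The sketch below builds one with the Python standard library and sends no request. The host, port, core name `mycore`, and field name `title` are assumptions for illustration; `q`, `rows`, and `wt` are standard Solr query parameters, and `build_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def build_select_url(base_url, core, query, rows=10):
    """Build a URL for a Solr select request that asks for JSON output."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"

# Assumed local installation and core name; adjust to your own setup.
url = build_select_url("http://localhost:8983/solr", "mycore", "title:hamlet")
print(url)
```

Any HTTP client in any language can then fetch this URL and parse the JSON response.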

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Apache Subversion Version System (SVN) is an open source version control system. Access to and revision of objects are carefully controlled to prevent unauthorized access and alteration. Developers use SVN to maintain current and historical versions of files such as source code, web pages, and documentation.

Code license: Apache License
Last updated: 29 Dec 2014

Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version 1.0.

Features:

  • Conversion between structured markup formats
  • Stylesheet validation

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Fedora (Flexible Extensible Digital Object Repository Architecture) was originally developed by researchers at Cornell University as an architecture to store, manage, and access digital content in the form of digital objects. Fedora defines a set of abstractions for expressing digital objects, asserting relationships among digital objects, and linking behaviors to digital objects.

Code license: Open source, Apache License
Last updated: 29 Dec 2014

OpenOffice is an open-source office software suite for word processing, spreadsheets, presentations, graphics and databases.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Blacklight is an open source Ruby on Rails gem that provides a discovery interface for any Solr index. Blacklight provides a default user interface which is customizable via the standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous data, allowing different information displays for different types of objects, and features faceted browsing, relevance-based searching, bookmarkable items, permanent URLs for every item, and user tagging of items.

Last updated: 29 Dec 2014

The Dataverse Network is an application to publish, share, reference, extract and analyze research data. It facilitates making data available to others, and allows others' work to be replicated. Researchers and data authors get credit, publishers and distributors get credit, affiliated institutions get credit.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

(from web page)

Historical musical pieces make their way to us through multiple documents, and it often happens that multiple sources introduce differences and variants in the music. meiView is an experimental web application designed to display 15th–16th century music and provide a dynamic mechanism for the user to select which variant they want to see.

meiView is open source software licensed under Apache 2.0. See the source code on GitHub.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

BLLIP Parser (or Charniak-Johnson parser) is a statistical natural language parser for analyzing text to determine its grammatical structure. Grammatical structures are provided in Penn Treebank format.
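Penn Treebank format writes a parse as nested brackets, with each bracket holding a label followed by its children. The sketch below reads such a string into nested tuples using only the Python standard library. The example sentence and tree are hand-written for illustration, not actual BLLIP output, and `parse_tree` and `leaves` are hypothetical helper names. It assumes every bracket carries a label.

```python
import re

def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                              # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1                              # skip ")"
        return (label, children)

    return read_node()

def leaves(node):
    """Collect the words at the leaves of a parsed tree, left to right."""
    _, children = node
    result = []
    for child in children:
        if isinstance(child, tuple):
            result.extend(leaves(child))
        else:
            result.append(child)
    return result

tree = parse_tree("(S (NP (DT The) (NN cat)) (VP (VBD sat)))")
print(tree[0])
print(leaves(tree))
```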

Code license: Apache License
Last updated: 29 Dec 2014

Umigon is a free tool for sentiment analysis on Twitter.

Main features:

  1. Export to Excel and csv
  2. Distinction between sentiments ("I hate war" will be classified as negative sentiment) and negative factuals ("war has been declared" will be classified as neutral)
  3. Connects to Twitter or allows free text input

The developer of Umigon can be reached on Twitter.

Code license: Apache License
Last updated: 29 Dec 2014