Apache License

Recogito is an online platform for collaborative document annotation.

Recogito provides a personal workspace where you can upload, collect and organize your source materials (texts and images) and collaborate on their annotation and interpretation. Recogito makes it easier to give your work more visibility on the Web and to expose the results of your research as Open Data.

Code license: Open source, Apache License
Last updated: 21 Dec 2016

ANNIS is an open source, cross-platform (Linux, Mac, Windows), web browser-based search and visualization architecture for complex multi-layer linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, was originally designed to provide access to the data of the SFB 632 - “Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts”. It has since been extended to a large number of projects annotating a variety of phenomena.

Code license: Open source, Apache License
Last updated: 16 Sep 2016

ColorBrewer is a web tool for selecting color schemes for thematic maps, most often choropleth maps. It includes 35 basic schemes with different numbers of classes, for over 250 possible versions. Each scheme has CMYK, RGB, Hex, Lab, and AV3 (HSV) specs for the colors. The software simply lists the color specs for a scheme you find useful so that you can recreate those colors in the mapping software you are using.

Code license: Open source, Apache License
Last updated: 7 Jun 2016

TAToo is an embeddable Flash widget that displays TAPOR analytics for the page on which it resides.

Code license: Apache License
Last updated: 23 Feb 2016

Combined with Leptonica, the image processing library, Tesseract can read a wide variety of image formats and convert them to text in more than 40 languages.

This code is a simple OCR engine. It has no output formatting and no user interface. It can distinguish fixed-pitch from proportional text. Even so, in 1995 this engine was among the top 3 in terms of character accuracy, and it runs on both Linux and Windows. The source code is included in the open source release.

Code license: Open source, Apache License
Last updated: 27 Jan 2016

Open Science Framework (OSF) is a free, open source tool designed to let researchers manage the entire research workflow: planning, execution, reporting, archiving, and discovery. It is part collaboration software and part version control system. OSF can be used to manage individual projects or larger collaborative projects.

Code license: Apache License
Last updated: 14 Jun 2015

The Natural Language Toolkit (NLTK) is an open source Python library for text analysis and natural language processing. NLTK can tokenize strings (turn a string of characters into a list of words), identify parts of speech, and perform operations based on the context of a word.
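To make the tokenization step mentioned above concrete, here is a minimal Python sketch that turns a string into a list of word and punctuation tokens. It deliberately uses only the standard library `re` module as a stand-in; it is not NLTK's tokenizer, and the `tokenize` function and its pattern are illustrative assumptions.

```python
import re

# Assumption: a "word" is a run of letters, digits, or underscores, and every
# other non-space character (such as punctuation) is a token of its own.
# This is a simplified stand-in, not the rule set NLTK uses.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split a string into a list of word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Heritrix captures metadata, in WARC."))
```

A real project would more likely call NLTK's own tokenizers, which handle contractions, abbreviations, and language-specific rules that this pattern ignores.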

Last updated: 28 May 2015

Heritrix is a web crawler used by the Internet Archive that offers a web-based user interface after initial setup on a Linux machine. Also used by the Library of Congress, Heritrix captures metadata in the Web ARChive (WARC) format.

Code license: Open source, Apache License
Last updated: 6 May 2015

cue.language is a Java library that has tokenizing (words/sentences/ngram), string counting, language guessing, and stop word detection capabilities.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Solr is an open source enterprise search platform from the Apache Lucene project. It operates as a standalone full-text search server within an appropriate servlet container, such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.
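As a rough illustration of the REST-like HTTP API mentioned above, the following Python sketch builds a query URL for Solr's `/select` request handler. The host, port, and core name (`mycore`) are placeholder assumptions, and `build_select_url` is a hypothetical helper, not part of any Solr client library. The sketch only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Build a Solr /select URL that asks for a JSON response.

    base_url and core are assumptions about a particular installation,
    for example "http://localhost:8983/solr" and "mycore".
    """
    params = {
        "q": query,    # the search query, e.g. "title:annotation"
        "rows": rows,  # maximum number of documents to return
        "wt": "json",  # response writer: request JSON output
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local installation and core name.
    print(build_select_url("http://localhost:8983/solr", "mycore", "title:annotation", rows=5))
```

Because the API is plain HTTP, the resulting URL could be fetched with any HTTP client in any language, which is the portability the description refers to.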

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Apache Lucene is a Java-based high-performance text search engine library.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Apache Subversion (SVN) is an open source version control system. Access to objects and revisions of them are carefully controlled to prevent unauthorized access and alteration. Developers use SVN to maintain current and historical versions of files such as source code, web pages, and documentation.

Code license: Apache License
Last updated: 29 Dec 2014

Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version 1.0.

Features:

  • Conversion between structured markup formats
  • Stylesheet validation

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Fedora (Flexible Extensible Digital Object Repository Architecture) was originally developed by researchers at Cornell University as an architecture to store, manage, and access digital content in the form of digital objects. Fedora defines a set of abstractions for expressing digital objects, asserting relationships among digital objects, and linking behaviors to digital objects.

Code license: Open source, Apache License
Last updated: 29 Dec 2014

OpenOffice is an open-source office software suite for word processing, spreadsheets, presentations, graphics and databases.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

Blacklight is an open source Ruby on Rails gem that provides a discovery interface for any Solr index. Blacklight provides a default user interface that is customizable via the standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous data, allowing different information displays for different types of objects, and features faceted browsing, relevance-based searching, bookmarkable items, permanent URLs for every item, and user tagging of items.

Last updated: 29 Dec 2014

The Dataverse Network is an application to publish, share, reference, extract and analyze research data. It makes data available to others and allows them to replicate others' work. Researchers and data authors get credit, publishers and distributors get credit, and affiliated institutions get credit.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

(from web page)

Historical musical pieces reach us through multiple documents, and the different sources often introduce differences and variants in the music. meiView is an experimental web application designed to display 15th- and 16th-century music and to provide a dynamic mechanism for the user to select which variant they want to see.

meiView is open source software licensed under Apache 2.0. See the source code on GitHub.

Code license: Apache License, Open source
Last updated: 29 Dec 2014

BLLIP Parser (or Charniak-Johnson parser) is a statistical natural language parser for analyzing text to determine its grammatical structure. Grammatical structures are provided in Penn Treebank format.

Code license: Apache License
Last updated: 29 Dec 2014

Umigon is a free tool for sentiment analysis on Twitter.

Main features:

  1. Export to Excel and csv
  2. Distinction between sentiments ("I hate war" will be classified as negative sentiment) and negative factuals ("war has been declared" will be classified as neutral)
  3. Connects to Twitter or allows free text input



The developer of Umigon can be reached on Twitter.

Code license: Apache License
Last updated: 29 Dec 2014