Freebase "is an open, Creative Commons Attribution (aka CC-BY) licensed collection of structured data," and a "platform for accessing and manipulating that data" via API. Almost 40 million entities and assertions about those entities are stored within a graph database. The database was built by pulling in open data and relies on community contribution to stay updated. Freebase is part of the semantic web and emits Linked Open Data (via RDF) for all its entities.
HEURIST (http://HeuristNetwork.org) is an extremely flexible, end-user oriented, web-based data management system designed specifically for Humanities data. Developed since 2005, it has been in active use across many projects since 2009. It is available both as a free web service for researchers (hosted at the University of Sydney Data Centre) and for installation on a physical or virtual server (open source on GitHub).
Researchers can design, create, manage, analyse, visualise and publish their own richly-structured databases through a simple web interface, without the need for a programmer. Quite complex databases can be built in a few hours by borrowing structures and vocabularies published by other users. Databases can be designed and built incrementally, as existing data are not affected by changes in structure. Databases created by Heurist are stored in MySQL with a repeatable structure facilitating independent access by other software.
Advanced features include record linking, graph structure, drill-down facet searches, rule-based queries, custom reports, linked map-timelines, network visualisation, normalised spreadsheet import, crosstabulation, XML feeds, and XSLT transforms. The team provides initial email and Skype assistance for project setup at no cost, and special customisations at modest cost.
Gephi is graphing software that provides a way to explore data through visualization and network analysis.
ANNIS is an open source, cross platform (Linux, Mac, Windows), web browser-based search and visualization architecture for complex multi-layer linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, was originally designed to provide access to the data of the SFB 632 - “Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts”. It has since then been extended to a large number of projects annotating a variety of phenomena.
A set of dataset management and statistical plugins for Microsoft Excel 2007, 2010 & 2013 including single and multiple linear regression, polynomial regression, and scatter plot with fit, fit confidence and prediction bands. The program can be used to generate visualizations and data reports. Requires Microsoft Excel.
The base SPSS Statistics package includes support for descriptive statistics, bivariate statistics, correlation, prediction for numerical outcomes, and prediction for identifying groups.
TXM is a free and open-source cross-platform Unicode, XML & TEI based text analysis software, supporting Windows, Mac OS X and Linux. It is also available as a J2EE standard compliant portal software (GWT based) for online access with access control built in (see a demo portal: http://portal.textometrie.org/demo).
ERDAS IMAGINE is a suite of geospatial data authoring software. The suite contains a raster graphics editor and remote sensing application that performs advanced remote sensing analysis and spatial modelling to create new information. ERDAS IMAGINE can also visualize results in 2D, 3D, video, and on cartographic quality map compositions. It is primarily designed for geospatial raster data processing and the creation of digital images for mapping use in GIS or CAD software.
- Image Analysis, Remote Sensing
GeoParser is a text analysis tool that may be used to identify and tag references to geographic locations in a text resource. It uses Natural Language Processing to analyse the composition of a resource and identify words that match its geographic database. The approach is useful for processing names that may refer to one of several locations (e.g. Belfast in Ireland, New Zealand and Canada) and for distinguishing names that may be confused with other common words (e.g. Reading in Berkshire and reading as an activity).
A software application that is used for analysing and visualising multi-volume seismic data.
- Visualization and analysis of 2D and 3D seismic data in a single survey
- 2D and 3D horizon tracking including auto-tracking, plane-by-plane, line and manual tracking
- On-the-fly calculation and visualization of various attributes and filters
- Plug-in architecture
A statistical package that may be used to compare quantified assemblages of broken and incomplete objects, such as ceramics, glass and bones. Pie-Slice uses Estimated Vessel Equivalent (EVE) as a base form of measurement, in which each measurable fraction is scored as a fraction of a complete vessel. It also trials the use of a new statistical transformation - the pseudo-count transformation - which converts EVEs into Pottery Information Equivalents (PIEs). The latter enables assemblages to be compared using techniques such as log-linear and correspondence analyses.
Now called TerraSurveyor, it is a software application for the transfer, assembly and enhancement of geophysical data obtained from gradiometers, resistivity meters and other monitoring instruments. It supports Geoplot, GSSI Profiler, Surfer (ASCII and binary) and Scintrex input formats.
Weave (Web-based Analysis and Visualization Environment) is a visualization platform designed to enable visualization of any available data by anyone for any purpose. Weave is an application development platform supporting multiple levels of user proficiency — novice to advanced — as well as the ability to integrate, disseminate and visualize data at “nested” levels of geography.
Viewshare is a free web application for creating interfaces and visualizations of cultural heritage collections. It can create interactive maps, timelines, facets, tag clouds, histograms, and image galleries. The intended users of Viewshare are individuals managing and creating access to digital collections of cultural heritage materials. Viewshare is offered as software as a service (SaaS); email email@example.com to request a free account.
The Science of Science (Sci2) Tool is a modular toolset supporting temporal, geospatial, topical, and network analysis and visualization of datasets at the micro (individual), meso (local), and macro (global) levels. Users of the tool can:
- Access science datasets online or load their own
- Perform different types of analysis with the most effective algorithms available
- Use different visualizations to interactively explore and understand specific datasets
- Share datasets and algorithms across scientific boundaries
Easy-to-use web-based software for creating infographics and data visualization, including a platform to share your work and discover works by others.
The DataTank is an open source tool that publishes data stored in text-based files (e.g., CSV, XML, JSON) or in binary structures (e.g., SHP files, relational databases). The DataTank reads data from these structures and publishes them to the web using a URI as an identifier, providing these data in any format a user wants regardless of the original data structure. The DataTank requires a server with Apache2 or Nginx, mod_rewrite enabled, PHP 5.4 or higher, Git, and any database supported by Laravel 4.
Quadrigram describes itself as a "visual programming environment" for living data. It is a web-based tool for data visualization that allows the user to customize and publish interactive visualizations with a range of data types. Visualization possibilities range from basic charts and graphs (e.g., pie chart, bar graph), to more sophisticated visualizations for exploring complex datasets (e.g., networks, geo-data, zoomable tree map, quadrification, stacked flow).
TokenX is a web-based environment for visualizing, analyzing and playing with texts. Options include word clouds, highlighting words, keywords in context, replacing words with blocks, highlighting punctuation and non-words, counting words in context and decontextualized, and substituting words. A number of sample files are provided, or users can point TokenX to any XML file online.
Publish or Perish is a software program that retrieves and analyzes academic citations. It uses Google Scholar to obtain the raw citations, then analyzes these and presents the statistics.
The TAPoR Portal is an online environment where users can keep track of texts they want to study (uploaded or available online), learn about and try different tools, and run tools on texts.
A graphical user interface tool for Latent Dirichlet Allocation topic modeling.
Exploratree is a web-based library and editing application for "interactive thinking guides," which are templates useful for mind mapping, brainstorming, planning, and visualization. It was originally developed for use in the classroom, to help students refine and focus their ideas and manage plans to further their investigation. Thinking guides can be edited, printed, and downloaded directly from the browser.
TwapperKeeper is now called Hootsuite Archives and can be accessed from within Hootsuite.
CulturalAnalytics is an R package containing functions for statistical analysis and plotting of image properties, including statistics such as the standard deviation and mean in the RGB and HSV color spaces, image entropy and histograms in greyscale (intensity) and color, and for plotting color clouds and image scatter charts.
SwiftRiver is free and open source web-based software for real-time filtering, curation, and qualitative analysis of social media data (Twitter, etc.).
Voyeur is a web-based text analysis environment where users can apply a wide variety of tools to any text they import.
Mapline (previously Topo.ly) is an online service, with free and paid tiers, for capturing and geocoding spatial data from spreadsheets and creating point, territory and heat maps. The free tier is fairly generous for limited use; a paid plan is needed only to map substantial datasets. It is intuitive, easy to use and produces high quality interactive maps. The free service has only minimal map customization options and does not include the visual analysis that comes with the paid options.
Dataplot is free, public-domain software for statistical analysis and non-linear modeling. It was developed by the National Institute of Standards and Technology in the United States. It performs "scientific, engineering, statistical, mathematical, and graphical analysis" through the use of "an interactive, command-driven language/system with English-like syntax." It will function on Unix, Linux, Mac OS X, and Windows XP/Vista/7 systems.
The MONK workbench provides 525 works of American literature from the 18th and 19th centuries, and 37 plays and 5 works of poetry by William Shakespeare, along with tools to enable literary research through the discovery, exploration, and visualization of patterns.
Users affiliated with CIC (Big Ten) schools can access a larger data set that includes about a thousand works of British literature from the 16th through the 19th century, provided by The Text Creation Partnership (EEBO and ECCO) and ProQuest (Chadwyck-Healey Nineteenth-Century Fiction).
Philologic is a full-text search, retrieval and analysis tool with support for TEI-Lite XML/SGML, Unicode encoding, plaintext, Dublin Core/HTML, and DocBook.
VisualEyes is a web-based authoring tool developed at the University of Virginia to weave images, maps, charts, video, and data into highly interactive and compelling dynamic visualizations.
RStudio is an integrated development environment (IDE) for R. It is available in both open source and commercial versions, and can run either on your desktop or through a browser connected to RStudio Server. Features include syntax highlighting, code completion, smart indentation, and an interactive debugger.
Microsoft Excel is spreadsheet software with calculation, graphing tools, and pivot table options for analyzing data. A cloud-hosted version is available as part of Office 365.
TimeRime is a web-based tool allowing people to create, view, and compare interactive timelines.
Bibliopedia will perform advanced data-mining and cross-referencing of scholarly literature to create a humanities-centered collaboratory. As a prototype, it will search resources including JSTOR and Library of Congress for metadata about scholarly articles and books that mention the famed medieval travel narrative The Travels of Sir John Mandeville, examine the articles and books for citations, then save the results in a publicly accessible database.
A statistical natural language parser for analyzing text to determine its grammatical structure.
140kit provides a management layer for tweet collection and analysis.
Raw data cannot be passed through to users, but any analytical process can be run across a user's dataset, and the data is held for as long as the user wants. When new analytical processes are created, they can be run on existing sets of data. 140kit does not claim any control of the analysis; however, it retains ownership of the data collected.
Developed at Indiana University, Event Structure Analysis is made up of three components: Ethno, prerequisite analysis, and composition analysis. Ethno is an on-line Java program that helps you analyze sequential events; prerequisite analysis produces a diagram showing how events are connected; composition analysis involves coding agent, action, object, and other characteristics of each event.
Dispute Finder is a Firefox plugin that allows individuals to tag text on websites as controversial, and view which passages other people have marked.
AnSWR supports qualitative analysis of word-based data. This entails a set of methods for organizing, displaying, processing, summarizing, and interpreting information.
Last updated 9/23/2005.
Only available for Windows 2000 and Windows XP.
Find searches that correlate with real-world data: Google Correlate finds search patterns which correspond with real-world trends.
GRETL is a cross-platform software package for econometric analysis, written in C.
Weft QDA is a free and open-source tool for the analysis of textual data. You may import documents from plain text or PDF, apply character-level coding, category and document memos, retrieve coded text, apply simple coding statistics, apply free-text search, and export to HTML and CSV formats.
HyperRESEARCH enables users to code and retrieve, build theories, and conduct analyses of their data. Users may work with text, graphics, audio and video sources.
Qualrus is an innovative qualitative data analysis tool that helps you manage unstructured data. Additionally, Qualrus learns your coding trends, provides a visual semantic network display, and gives advice and technical support.
Silk is a platform for sites that contain collections of information. It's like the Tumblr for websites that have structured content, such as software reviews, information about designers, a site with UN datasets, and more.
The Text-Image Linking Environment (TILE) is a web-based tool for creating and editing image-based electronic editions and digital archives of humanities texts. It allows the user to import and export transcript lines and images of text, as well as mark up the image, and includes a semi-automated line recognizer.
Minitab provides tools for statistical analysis and visualization. It includes tools for creating graphics, and working with variance, regression, reliability, sample size, time series, forecasting, equivalence tests, tables, simulations, and distributions.
MicrOsiris is a statistical and data management package for Windows. This freeware has been derived from OSIRIS IV, a statistics and data management package developed at the University of Michigan. It can import up to 10,000 variables from SPSS, SAS, Stata, UNESCO IDAMS, and Excel.
Lexos is an online tool that enables you to "scrub" (clean) your text(s), cut a text(s) into various size chunks, manage chunks and chunk sets, and choose from a suite of analysis tools for investigating those texts. Functionality includes building dendrograms, making graphs of rolling averages of word frequencies or ratios of words or letters, and playing with visualizations of word frequencies including word clouds and bubble visualizations.
Statistical Lab is a graphical user interface designed to make statistical analysis easier to understand. This interactive tool will connect and display data frames, frequency tables, random numbers or matrixes. Statistical Lab uses R to run calculations, conduct analyses and perform multiple simulations and manipulations.
Project Quincy allows users to trace the development of social networks and institutions over time and space using information about people, places and organizations. It is a Django application with a MySQL database that can be installed on a web server.
SAS Analytics is an environment for predictive and descriptive modeling, data mining, text analytics, forecasting, optimization, simulation, experimental design, and other statistical functions.
The Sample Size Calculator is a simple online tool for calculating sample size according to different variables.
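The entry above does not say which formula the calculator uses. As a hedged illustration only, the sketch below assumes Cochran's formula for estimating a proportion, with an optional finite population correction; the function name `sample_size` and its parameters are hypothetical and are not taken from the tool.

```python
import math

def sample_size(margin, confidence_z=1.96, proportion=0.5, population=None):
    """Estimate the sample size needed to measure a proportion.

    Assumes Cochran's formula n0 = z^2 * p * (1 - p) / e^2, which may differ
    from what any particular online calculator implements.
    """
    n0 = (confidence_z ** 2) * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    # Round up: a fractional respondent is not possible.
    return math.ceil(n0)

print(sample_size(0.05))                   # 385
print(sample_size(0.05, population=1000))  # 278
```

The defaults (z = 1.96 for 95% confidence, p = 0.5 as the most conservative proportion) are common conventions, chosen here as assumptions.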
RSiena is a package for the R language that enables the statistical analysis of network data, including longitudinal network data, longitudinal data of networks and behavior, and cross-sectional network data. It provides the same functionality available in SIENA (Simulation Investigation for Empirical Network Analysis), Windows software which is no longer maintained.
StatCrunch is web-based statistical analysis and data-sharing software.
HUBzero is a web publication platform and content management system designed to facilitate collaboration on research and learning. In addition to standard blog and discussion features, HUBzero's most distinctive traits are a built-in environment that can run interactive software that scholars have developed within the browser, a tool development area, and the ability to share data and documents privately between members of the hub.
bubbl.us is a web-based mind mapping tool, useful for organizing ideas, brainstorming, analyzing relationships, and visualizing data. Simple interface, with basic and easily understood functionality. Free to try without creating an account. Also available as an iOS application for iPad.
Cytoscape is a platform for complex network analysis, visualization, and annotation.
Graphviz is open source software for graph visualization, representing structural information as diagrams of abstract graphs and networks. The package includes web and interactive graphical interfaces, and auxiliary tools, libraries, and language bindings.
Cross-platform app for analyzing text, video, and spreadsheet data (analyzing qualitative, quantitative, and mixed methods research)
Linguistic Inquiry and Word Count is a text analysis software program that calculates the degree to which people use different categories of words across a wide array of texts.
ANTHROPAC is a menu-driven DOS program for collecting and analyzing data on cultural domains. The program assists with the collection and analysis of structured qualitative and quantitative data, and provides analytical and multivariate tools.
Leximancer is text analysis software that can create topic and concept based network visualizations and includes a sentiment analyzer.
Netvibes offers a free personal web dashboard for following feeds, friends and using the provided apps. A premium account includes functionality for analytics, tagging, curation, alerts, sentiment analysis, and search.
PDFMiner is a Python tool for extracting information from PDFs (not only text, but also information about fonts, encoding, and layout.)
OmniGraffle is a comprehensive diagramming and drawing application. Drag and drop to create wireframes, flow charts, network diagrams, UI mockups, family trees, office layouts, etc. Upgrading to OmniGraffle Pro adds Visio support, shared layers, presentation mode, object-geometry controls, AppleScript and Actions support and more.
Rwui allows you to convert an R script to a web page with an interface where users can run the script(s) even if they don't know R.
Data Desk implements traditional statistical techniques using a simple graphic display interface for data exploration. The program focuses specifically on the visual exploration of data.
MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
CATMA (Computer Aided Textual Markup & Analysis) is a free, open source markup and analysis tool from the University of Hamburg's Department of Languages, Literature and Media. It incorporates three interactive modules: (1) The tagger enables flexible and individual textual markup and markup editing. (2) The analyzer incorporates a query language and predefined functions. It also includes a query builder that allows users to construct queries from combinations of pre-defined questions while allowing for manual modification for more specific questions.
SEASR provides an environment for developing data flows that ingest data, process it through a series of transformations and analytics, and send the data to a results viewer.
A simple word cloud generator with customizable font and color options. Word clouds are generated by pasting text into a box, or by entering the URL of any blog, blog feed, or any other web page that has an Atom or RSS feed.
cue.language is a Java library that has tokenizing (words/sentences/ngram), string counting, language guessing, and stop word detection capabilities.
The Visual Understanding Environment (VUE) is concept mapping software that can integrate with multiple repositories to pull in, organize, and analyze data. Multiple features for advanced management of digital resources for teaching, learning, and research.
HyperPo is a user-friendly text exploration and analysis program that allows users to import texts or use texts available online (in English or French), and provides frequency lists of characters, words and series of words, color-coding to indicate repetition, KWIC, co-occurrence and distribution lists, and the ability to simultaneously compare data from multiple texts.
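Several entries in this list mention keyword-in-context (KWIC) displays. The following is a minimal sketch of the general technique, not HyperPo's or TokenX's implementation; the function name `kwic` and the `width` parameter are hypothetical.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Take up to `width` tokens on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

tokens = "It was the best of times it was the worst of times".split()
for left, kw, right in kwic(tokens, "times", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12} [{kw}] {right}")
```

Aligning the keyword in a fixed column is what makes recurring patterns around a word easy to scan.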
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Text analytics and data extraction framework: data and semantic analytics in a suite of business applications.
MAXQDA is a tool for qualitative data analysis, evaluation, and text analysis. You can export part or all of your data into reports in Word, Excel, XML, or image formats. The MAXQDA Multimedia Browser enables you to code audio and video files directly without having to create a transcript. You can code your information however you like for easy retrieval and organization.
"In the WordHoard environment, texts are annotated or tagged by morphological, lexical, prosodic, and narratological criteria. They are mediated through a 'digital page' or user interface that lets scholarly but non-technical users explore the greatly increased query potential of textual data kept in such a form."
Software for creating data dashboards. Many of the sample galleries portray corporate financial data.
A software application that enables relational databases to be created, managed and queried. The database management system enables multiple users to access a database through an appropriate interface. As an open source tool, MySQL underpins a number of free software projects, such as WordPress, phpBB and other software built on a LAMP infrastructure. Although widely used, there are a number of performance issues that limit its use in some environments. For example, it is unable to use multiple CPU cores to process a single query, potentially limiting its use as a data warehouse.
Pattern is a Python web mining module with tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).
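To illustrate what the tf-idf and cosine similarity metrics named above measure, here is a small sketch in plain Python. It deliberately does not use Pattern's own API; the function names `tfidf_vectors` and `cosine` are hypothetical, and the weighting (relative term frequency times log inverse document frequency) is one common variant among several. Each document is assumed to be a non-empty token list.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [d.split() for d in ("web mining with python",
                            "text mining with python",
                            "network graph layout")]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # between 0 and 1: three shared terms
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared terms
```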
Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing also has evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.
Open source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting. Components for machine learning. Add-ons for bioinformatics and text mining. Packed with features for data analytics.
Prism is a tool for crowdsourcing interpretation. Welcome to our experiment in crowd-sourcing and visualizing many readings of a common set of texts.
ReDBox is a metadata registry application for describing research data.
An online text analysis tool that provides detailed statistics about your text, including analysis of word groups, keyword density, and the prominence of words or expressions.
Bookworm enables you to graphically explore lexical trends in repositories of digitized texts.
The Durationator is a web-based tool which seeks to make the past usable one query at a time by providing legal information regarding the copyright term of any given cultural work.
The Dataverse Network is an application to publish, share, reference, extract and analyze research data. It facilitates making data available to others, and allows others' work to be replicated. Researchers and data authors get credit, publishers and distributors get credit, affiliated institutions get credit.
GPS Visualizer is a free, easy-to-use online utility that creates maps and profiles from GPS data.
This online tool can be used for a wide variety of annotation tasks, including visualization and collaboration.
brat is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and "interpreted" by a computer. brat also supports the annotation of n-ary associations that can link together any number of other annotations participating in specific roles. brat also implements a number of features relying on natural language processing techniques to support human annotation efforts.
QDA Miner is an easy-to-use mixed-methods qualitative data analysis software package for coding, annotating, retrieving and analyzing small and large collections of documents and images. QDA Miner may be used to analyze interview or focus-group transcripts, legal documents, journal articles, even entire books, as well as drawings, photographs, paintings, and other types of visual documents.
WordStat is a text analysis module for QDA Miner or SimStat. WordStat combines a dictionary-based approach to content analysis with various exploratory text mining methods. WordStat can apply existing categorization dictionaries to a new text corpus. It may also be used in the development and validation of new categorization dictionaries.
The Observer XT is the professional and user-friendly event logging software for the collection, analysis, and presentation of observational data.
The term "lexomics" was originally coined to describe the computer-assisted detection of "words" (short sequences of bases) in genomes. When applied to literature as we do here, lexomics is the analysis of the frequency, distribution, and arrangement of words in large-scale patterns. The current lexomics tools are:
- scrubber -- strips tags, removes stop words, applies lemma lists, and prepares texts for diviText
- diviText -- cuts texts into chunks in one of three ways, counts words, and exports the results
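The scrubber and diviText steps listed above can be sketched roughly as follows. This is an illustrative approximation, not the lexomics tools' code: the stop list is a tiny hypothetical one, lemma lists are omitted, and cutting by a fixed number of words is only an assumed example of a chunking method.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "and", "of"}  # hypothetical, deliberately tiny stop list

def scrub(text, stop_words=STOP_WORDS):
    """Strip markup tags, lowercase, and drop stop words; returns a token list."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    tokens = re.findall(r"[a-z']+", no_tags.lower())
    return [t for t in tokens if t not in stop_words]

def chunk_counts(tokens, chunk_size):
    """Cut tokens into consecutive chunks of chunk_size words and count each chunk."""
    return [Counter(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]

text = "<p>The king and the queen</p> <p>The king of the hall</p>"
tokens = scrub(text)
print(tokens)                   # ['king', 'queen', 'king', 'hall']
print(chunk_counts(tokens, 2))  # one Counter per two-word chunk
```

The per-chunk counts are the kind of table that downstream analyses, such as clustering chunks by word frequency, would start from.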
The purpose of ATLAS.ti is to help researchers uncover and systematically analyze complex phenomena hidden in text and multimedia data. The program provides tools that let the user locate, code, and annotate findings in primary data material, to weigh and evaluate their importance, and to visualize complex relations between them.
Meld is a visual diff and merge tool targeted at developers. Meld helps you compare files, directories, and version controlled projects. It provides two- and three-way comparison of both files and directories, and has support for many popular version control systems.
Kaleidoscope is one of the world's best tools for spotting differences in images and text, and now it supports merging of files and folders, too. Kaleidoscope integrates directly with Git, Subversion, Mercurial, and Bazaar to fit perfectly in your workflow.
The Tesserae project aims to provide a flexible and robust web interface for exploring intertextual parallels.
Textexture is a tool for visualizing any text as a network. The resulting graph can be used to get a quick visual summary of the text, read the most relevant excerpts (by clicking on the nodes), and find similar texts.
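One common way to turn a text into a network is to treat words as nodes and link words that occur near each other. The sketch below shows that general idea under stated assumptions; it is not Textexture's actual algorithm, and the function name `cooccurrence_edges` and the `window` parameter are hypothetical.

```python
from collections import Counter

def cooccurrence_edges(tokens, window=2):
    """Count undirected edges between distinct words within `window` tokens."""
    edges = Counter()
    for i, word in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens.
        for other in tokens[i + 1:i + window]:
            if other != word:
                # Sort the pair so (a, b) and (b, a) are the same edge.
                edges[tuple(sorted((word, other)))] += 1
    return edges

tokens = "network text network graph".split()
print(cooccurrence_edges(tokens))
```

The edge weights could then be passed to a graph tool such as Gephi or Graphviz for layout and display.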
TVE is an interactive Java tool for exploring the effect of window size on three common linguistic measures: type-token ratio, proportion of hapax legomena, and average word length. In addition, TVE can cluster the text fragments according to a user-given set of words by applying principal component analysis (PCA).
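The three measures named above are simple to compute, which makes the effect of window size easy to try out. The following Python sketch is only an illustration and is unrelated to TVE's Java code; the function name `window_measures` is hypothetical, windows are assumed to be non-overlapping, and the hapax proportion is taken per token (per type is another convention).

```python
from collections import Counter

def window_measures(tokens, window_size):
    """Return (type-token ratio, hapax proportion, mean word length) per window."""
    results = []
    # Non-overlapping windows; a trailing partial window is dropped.
    for start in range(0, len(tokens) - window_size + 1, window_size):
        window = tokens[start:start + window_size]
        counts = Counter(window)
        types = len(counts)
        # Hapax legomena: words occurring exactly once in the window.
        hapaxes = sum(1 for c in counts.values() if c == 1)
        ttr = types / window_size
        hapax_prop = hapaxes / window_size
        mean_len = sum(len(t) for t in window) / window_size
        results.append((ttr, hapax_prop, mean_len))
    return results

tokens = "the cat sat on the mat and the dog sat too".split()
for size in (4, 5):
    print(size, window_measures(tokens, size))
```

Even on this toy sentence, changing the window size changes the type-token ratio, which is the sensitivity the tool is meant to explore.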
Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.
Circos is ideal for creating publication-quality infographics and illustrations with a high data-to-ink ratio, richly layered data and pleasant symmetries. You have fine control over each element in the figure to tailor its focus points and detail to your audience.
LimeService is the hosted version of the GNU-licensed LimeSurvey. It is a survey service platform for preparing, running and evaluating online surveys. Beyond basic free usage, you always get the full feature set, with no monthly fees or subscription plans.
I've used it before and found it to be pretty robust.
From the website: NodeXL is a free, open-source template for Microsoft® Excel® 2007 and 2010 that makes it easy to explore network graphs. With NodeXL, you can enter a network edge list in a worksheet, click a button and see your graph, all in the familiar environment of the Excel window. (http://nodexl.codeplex.com/)
Ptolemaic is a computer application for music visualization and analysis written in the Java programming language. The software is designed to aid in the analysis of all types of Western music using established analytical techniques, including tonal functional analysis (Harrison 1994), pitch-class set analysis (Forte 1973), hierarchical linear analysis (Schenker 1935, Jones 2002), tonal pitch-space analysis on the Tonnetz (Riemann 1915), and transformation analysis (Lewin 1987).
The Juxta family of software (Juxta, Juxta WS, and Juxta Commons) allows you to compare and collate versions of the same textual work. Juxta Commons is an online space powered by the open-source Juxta Web Service that lets you collate sets of two or more texts and share online visualizations of the differences between them.
Statwing is an easy-to-use, web-based tool for data analysis and visualization. Upload data, select variables of interest, and Statwing automatically selects statistical tests and visualizations, then distills the results into plain English sentences (as well as traditional statistical output for those so inclined).
A free trial is available, as well as multiple pricing plans.
NodeBox is an application for creating 2D graphics and visualizations. It provides a visual and process-based editor for an underlying Python-based analysis and visualisation package. Its developers describe it as a generative design app, which taps into the serendipitous nature of the environment. The user constructs models, can tweak them in real time via the interface, and can see the resulting changes to the output.
It has been described as being "similar to Processing, but without all the interactivity".