text analysis, regular expressions, complexity measures
Textable is an open-source program for text analysis. It offers a set of basic text-analytic components (e.g. importing text from files, segmenting it into words, measuring segment diversity), which the user combines in a visual interface to build custom analytic workflows.
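The kind of workflow described above (segment a text into words, then measure how diverse the segments are) can be sketched in plain Python. This is not Textable's own interface, which is visual; the function names `segment_words` and `type_token_ratio` are hypothetical, and the type-token ratio is just one simple diversity measure chosen for illustration.

```python
import re
from collections import Counter


def segment_words(text):
    """Split text into lowercase word segments with a regular expression."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(segments):
    """Distinct segments divided by total segments (0.0 for empty input)."""
    if not segments:
        return 0.0
    return len(set(segments)) / len(segments)


sample = "The cat sat on the mat. The dog sat too."
words = segment_words(sample)   # word segments, in order of appearance
counts = Counter(words)         # frequency of each distinct segment
ttr = type_token_ratio(words)   # closer to 1.0 means more varied vocabulary
```

A ratio near 1.0 means almost every word is new, while a low ratio means heavy repetition. The measure is sensitive to text length, so it is most meaningful when comparing texts of similar size.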
Kartograph is a pair of free and open-source libraries for presenting data with a spatial component on web pages or in print. The first, Kartograph.py, is a Python library that builds lightweight vector graphic maps from either shapefiles or PostGIS tables. These graphics files can be styled with a cascading style sheet during creation, or later with the second library, Kartograph.js. The vector files can also be edited and enhanced in a vector graphics program such as Adobe Illustrator.
Bokeh is an interactive visualization library for Python, designed for large datasets and built on modern web technologies. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity over large data to thin clients.
Data collection, graph networks, web crawler
Pattern is a Python web mining module with tools for data retrieval (Google, Twitter, and Wikipedia APIs, a web spider, an HTML DOM parser), text analysis (a rule-based shallow parser, a WordNet interface, syntactic and semantic n-gram search, and tf-idf, cosine similarity, and LSA metrics), and data visualization (graph networks).
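To make the tf-idf and cosine similarity metrics mentioned above concrete, here is a minimal sketch using only the Python standard library. It does not use Pattern's own API; the helper names `tf_idf_vectors` and `cosine_similarity` are hypothetical, and the weighting shown (relative term frequency times the log of inverse document frequency) is one common variant among several.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn a list of token lists into sparse tf-idf vectors (dicts)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {term: (count / total) * math.log(n / df[term])
             for term, count in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply today".split(),
]
vecs = tf_idf_vectors(docs)
sim_related = cosine_similarity(vecs[0], vecs[1])    # documents share terms
sim_unrelated = cosine_similarity(vecs[0], vecs[2])  # no terms in common
```

Terms that occur in every document get a weight of zero under this variant, so similarity is driven by the words that distinguish documents from one another.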