TXM is free, open-source, cross-platform text analysis software based on Unicode, XML and TEI. It runs on Windows, Mac OS X and Linux. It is also available as a J2EE-compliant, GWT-based portal for online access with built-in access control (see the demo portal: http://portal.textometrie.org/demo).
It offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical tools (factorial analysis, clustering, cooccurrence analysis, etc.) based on R packages (http://www.r-project.org).
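To illustrate what a frequency list and a concordance are, here is a minimal Python sketch. It is not TXM or CQP code: the function names (tokenize, frequency_list, concordance), the whitespace tokenization and the sample sentence are all assumptions made for illustration, and a real engine such as CQP works on an indexed, annotated corpus instead.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical, naive tokenizer: lower-case and split on whitespace.
    return text.lower().split()


def frequency_list(tokens):
    # Word forms sorted by decreasing frequency.
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=2):
    # Keyword-in-context lines: (left context, keyword, right context).
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Illustrative sample text, not taken from any TXM corpus.
tokens = tokenize("the cat sat on the mat and the dog sat too")
top_word = frequency_list(tokens)[0]
kwic = concordance(tokens, "sat", width=2)
```

Each concordance line keeps the keyword aligned between its left and right context, which is the usual layout for reading occurrences side by side.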
It can analyze three types of textual corpora with various source formats:
- Written texts (possibly aligned to facsimile images): system clipboard content, TXT (raw text), XML, XML-TEI formats
- Speech transcriptions (synchronized to audio or video): Word/Writer/TXT based, XML-TRS (from Transcriber software) formats
- Parallel corpora (several languages per corpus): XML-TMX format
It lemmatizes and POS-tags all texts on the fly during import, using the TreeTagger software.
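As a rough idea of what the tagging step produces, the sketch below parses TreeTagger-style output in Python. It assumes the tagger was run so that it prints one token per line as tab-separated word form, POS tag and lemma; the function name parse_treetagger and the sample lines are hypothetical, and this is not how TXM itself reads the tagger output.

```python
def parse_treetagger(output):
    # Assumed layout: "word<TAB>pos<TAB>lemma", one token per line.
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            # Skip blank lines or anything that is not a 3-column token line.
            continue
        word, pos, lemma = parts
        rows.append({"word": word, "pos": pos, "lemma": lemma})
    return rows


# Illustrative sample in the assumed format, not real TXM output.
sample = "The\tDT\tthe\ncats\tNNS\tcat\nsleep\tVVP\tsleep\n"
rows = parse_treetagger(sample)
lemmas = [row["lemma"] for row in rows]
```

With each token carrying its word form, POS tag and lemma, queries and frequency lists can then be built on any of the three properties.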