TXM

TXM is free, open-source, cross-platform text analysis software based on Unicode, XML and TEI, running on Windows, Mac OS X and Linux. It is also available as J2EE-compliant portal software (GWT-based) for online access, with built-in access control (see the demo portal: http://portal.textometrie.org/demo). It offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full-text search engine (http://cwb.sourceforge.net) and a range of statistical tools (factorial analysis, clustering, cooccurrence analysis, etc.) based on R packages (http://www.r-project.org). It can analyze three types of textual corpora in various source formats:

  • Written texts (possibly aligned to facsimile images): system clipboard content, TXT (raw text), XML, XML-TEI formats
  • Speech transcriptions (synchronized to audio or video): Word/Writer/TXT based, XML-TRS (from Transcriber software) formats
  • Parallel corpora (several languages per corpus): XML-TMX format
During import, it lemmatizes and POS-tags all texts on the fly using the TreeTagger software.
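
For readers unfamiliar with the analysis tools named in the description, the following is a minimal Python sketch of what a frequency list and a keyword-in-context concordance compute. It is only an illustration and does not use TXM, CQP or TreeTagger: the function names (tokenize, frequency_list, concordance), the naive regex tokenizer and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): lowercase runs of
    # word characters. Real tools use far more careful tokenization.
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens):
    # Word forms sorted by descending count.
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=3):
    # Keyword-in-context: for each hit, return up to `width` tokens of
    # left context, the keyword itself, and up to `width` tokens of right context.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, not taken from any TXM corpus.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

print(frequency_list(tokens)[:3])
for left, kw, right in concordance(tokens, "sat"):
    print(f"{left:>20} [{kw}] {right}")
```

A dedicated engine such as CQP works on an indexed corpus and can query annotations such as lemma and part of speech, which this sketch does not attempt.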
