
Introduction to Text Analysis: Analysis Methods and Tools

Types of Text Analysis

Basic Text Summaries and Analyses

  • Word frequency (lists of words and their frequencies)
    (See also: Ted Underwood, “Word counts are amazing”)
  • Collocation (words commonly appearing near each other)
  • Concordance (the contexts of a given word or set of words)
  • N-grams (common two-, three-, or longer word sequences)
  • Entity recognition (identifying names, places, time periods, etc.)
  • Dictionary tagging (locating a specific set of words in the texts)
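To make the summaries above concrete, here is a minimal sketch in Python using only the standard library. It is not how any of the tools listed below work internally. The tokenizer (lowercase, keep runs of letters and apostrophes) and the function names `tokenize`, `word_frequencies`, `ngrams`, `concordance`, `collocates`, and `dictionary_tag` are illustrative assumptions. Entity recognition is left out because it usually relies on trained language models rather than a few lines of code.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    # Word frequency: each distinct word mapped to its count.
    return Counter(tokens)

def ngrams(tokens, n):
    # N-grams: every contiguous run of n tokens, as a tuple.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def concordance(tokens, keyword, width=3):
    # Concordance (keyword in context): `width` tokens on each side of each hit.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocates(tokens, keyword, window=2):
    # Collocation: count the words within `window` tokens of each keyword hit.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[j] for j in range(start, end) if j != i)
    return counts

def dictionary_tag(tokens, word_list):
    # Dictionary tagging: positions of tokens that appear in a chosen word list.
    return [(i, tok) for i, tok in enumerate(tokens) if tok in word_list]

tokens = tokenize("The cat sat on the mat. The dog sat on the cat.")
print(word_frequencies(tokens).most_common(1))        # [('the', 4)]
print(Counter(ngrams(tokens, 2))[("sat", "on")])      # 2
print(concordance(tokens, "cat", width=2))            # ['the [cat] sat on', 'on the [cat]']
print(collocates(tokens, "sat", window=1))
print(dictionary_tag(tokens, {"cat", "dog"}))         # [(1, 'cat'), (7, 'dog'), (11, 'cat')]
```

Because this tokenizer discards punctuation, n-grams and collocation windows run across sentence boundaries (the bigram "mat the" is counted, for example). Real tools typically handle sentence splitting, stop words, and stemming more carefully.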

High-level Goals for Text Analysis

(From Underwood, T. (2012). Where to start with text mining.)

  • Document categorization
    • Information retrieval (e.g., search engines)
    • Supervised classification (e.g., guessing genres)
    • Unsupervised clustering (e.g., alternative “genres”)
  • Corpora comparison (e.g., political speeches)
  • Language use over time (e.g., Google Books Ngram Viewer)
  • Detecting clusters of document features (i.e., topic modeling)
  • Entity recognition/extraction (e.g., geoparsing)
  • Visualization
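As one illustration of corpora comparison, a common approach is to rank words by how unevenly they are distributed between two corpora using Dunning's log-likelihood statistic, G² = 2 Σ O ln(O / E), where O is a word's observed count in each corpus and E is the count expected if both corpora used it at the same rate. The minimal sketch below is an assumption-laden example rather than the method of any particular tool: it expects already-tokenized input, and the function names `log_likelihood` and `compare_corpora` are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, total1, total2):
    # G2 for one word: a, b are its counts; total1, total2 are corpus sizes.
    expected1 = total1 * (a + b) / (total1 + total2)
    expected2 = total2 * (a + b) / (total1 + total2)
    g2 = 0.0
    # A zero observed count contributes 0 (the limit of x*ln(x) as x -> 0).
    if a > 0:
        g2 += a * math.log(a / expected1)
    if b > 0:
        g2 += b * math.log(b / expected2)
    return 2 * g2

def compare_corpora(tokens1, tokens2):
    # Rank every word by G2 and note which corpus uses it relatively more.
    freq1, freq2 = Counter(tokens1), Counter(tokens2)
    n1, n2 = len(tokens1), len(tokens2)
    results = []
    for word in set(freq1) | set(freq2):
        a, b = freq1[word], freq2[word]
        if a / n1 > b / n2:
            direction = "corpus 1"
        elif a / n1 < b / n2:
            direction = "corpus 2"
        else:
            direction = "neither"
        results.append((word, round(log_likelihood(a, b, n1, n2), 3), direction))
    # Highest G2 first; ties broken alphabetically so output is deterministic.
    return sorted(results, key=lambda r: (-r[1], r[0]))

speeches_a = ["tax"] * 4 + ["jobs"] * 4
speeches_b = ["jobs"] * 4 + ["war"] * 4
for row in compare_corpora(speeches_a, speeches_b):
    print(row)
```

Here "tax" and "war" score highest because each appears in only one corpus, while "jobs" scores 0 because both corpora use it at the same rate. With real data, a G² threshold (for example, a chi-square critical value) is often applied before treating a difference as meaningful.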

Tools with Their Analysis Methods

Web Tools

  • Voyant Tools – word frequencies, concordance, word clouds, visualizations
  • TAPoRware – various data cleaning, annotating, and summarizing tools in a web interface
  • Netlytic – word frequencies, concordance, dictionary tagging, network analysis
  • Wmatrix – frequency profiles, concordances, frequency-list comparison, n-grams and c-grams, collocations
  • Natural Language Processor & Analyzer – word frequencies, collocations, concordance, tokenizer, etc.
  • ManyEyes – interactive text visualizations (network diagram, word tree, phrase net, tag cloud, word cloud)
  • Overview – automatic topic tagging and visualization
  • MONK Workbench – corpus selection from library holdings, frequencies and corpus comparisons, supervised classification
  • LIWC – the web version outputs a few linguistic dimensions; the full version can be licensed for about $100

Downloadable Applications (no programming required)

Other Lists of Tools