Web Scraping
Introduction to Text Analysis
"Text analysis" is a broad term covering various processes by which text and natural language documents can be modified so that they can be organized and described.
This guide collects resources for several phases of the text analysis process, including text collection, text parsing and cleaning, text summary and analysis methods, and text visualization.
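As a rough illustration of the cleaning and summary phases named above, the following minimal sketch lowercases a text, splits it into word tokens, drops a few common words, and counts what remains. The sample text, the stopword list, and the function names are all assumptions made for illustration; they do not come from any tool listed in this guide.

```python
import re
from collections import Counter

# Hypothetical sample standing in for text gathered during the collection phase.
SAMPLE_TEXT = "The cat sat on the mat. The mat was flat!"

# A tiny illustrative stopword list; real projects typically use a fuller one.
STOPWORDS = {"the", "on", "was", "a", "an", "of", "and"}


def clean_and_tokenize(text):
    """Parsing/cleaning: lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(text, top_n=3):
    """Summary: return the top_n most frequent non-stopword tokens with their counts."""
    tokens = [t for t in clean_and_tokenize(text) if t not in STOPWORDS]
    return Counter(tokens).most_common(top_n)


if __name__ == "__main__":
    print(summarize(SAMPLE_TEXT))
```

The frequency list produced by a step like this is also the usual input to simple visualizations such as bar charts or word clouds.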
Overviews/summaries
Possible Sources of Text
- Native digital text
  - HTML
  - RSS feeds
  - Sample specific services:
    - Tutorials for data collection from various services
- Digitized
  - Internet Archive
  - Project Gutenberg
  - Google Books
  - Hathi Trust
  - JSTOR Data for Research* (with Early Journal Content bundle)
  - PubMed Open Access Subset
  - Monk Workbench*
  - Document Cloud*
  - Open American National Corpus (collection of American English from various sources)
  - WordHoard* (tagged literary texts)
* Also has some processing/analysis capabilities.
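For the native digital sources listed above (HTML and RSS feeds), text collection usually means pulling readable text out of markup. The sketch below uses only the Python standard library and works on small inline samples rather than live pages. The sample documents, the class name `TextExtractor`, and the helper names are hypothetical; a real project would also need to fetch the pages and respect each service's terms of use.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def rss_item_titles(rss_xml):
    """Return the <title> text of every <item> in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


# Hypothetical samples standing in for a downloaded page and feed.
SAMPLE_HTML = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
SAMPLE_RSS = """<rss version="2.0"><channel><title>Feed</title>
<item><title>First post</title></item>
<item><title>Second post</title></item>
</channel></rss>"""

if __name__ == "__main__":
    print(html_to_text(SAMPLE_HTML))
    print(rss_item_titles(SAMPLE_RSS))
```

Digitized collections such as those listed above often offer plain-text downloads, in which case this markup-stripping step may not be needed.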