Unsupervised Learning
Clustering of documents
Topic modeling is a method for unsupervised classification of documents. Like clustering on numeric data, it finds natural groups of items even when we’re not sure what we’re looking for.
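Unlike hard clustering, a topic model treats each document as a mixture of topics and each topic as a mixture of words. This allows documents to “overlap” each other in terms of content, rather than being separated into discrete groups, in a way that mirrors typical use of natural language.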
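The "overlap" idea can be made concrete with a small sketch. The topic names, words, and probabilities below are made up for illustration and do not come from the book. The sketch only shows how a document that belongs partly to two topics gets word probabilities that blend both.

```python
# Hypothetical per-topic word probabilities (each row sums to 1).
topic_word = {
    "finance":  {"bank": 0.5, "market": 0.4, "vote": 0.1},
    "politics": {"bank": 0.1, "market": 0.1, "vote": 0.8},
}

# Hypothetical per-document topic proportions (each row sums to 1).
# doc_a is mostly finance; doc_b is an even blend of both topics.
doc_topic = {
    "doc_a": {"finance": 0.9, "politics": 0.1},
    "doc_b": {"finance": 0.5, "politics": 0.5},
}


def word_probs(doc):
    """P(word | doc) = sum over topics of P(topic | doc) * P(word | topic)."""
    mix = doc_topic[doc]
    vocab = next(iter(topic_word.values())).keys()
    return {w: sum(mix[t] * topic_word[t][w] for t in mix) for w in vocab}


for name in doc_topic:
    print(name, word_probs(name))
```

A hard clustering would force doc_b into one group. Here it keeps a share of both topics, and its word probabilities sit between the two topic profiles.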
The example in the book fits a topic model to Associated Press articles from around 1988.
The two topics found can be interpreted as financial news and politics.
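To see what fitting a two-topic model involves, here is a minimal sketch of one common approach, latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is not the book's code or data: the function name `fit_lda`, the hyperparameters `alpha` and `eta`, and the four-document toy corpus are all assumptions made for illustration.

```python
import random


def fit_lda(docs, n_topics=2, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]           # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                          # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + eta)
                    / (topic_total[t] + n_words * eta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: per-document topic mix and per-topic word mix.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    beta = [
        [(c + eta) / (topic_total[k] + n_words * eta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return vocab, theta, beta


# Made-up toy corpus, only to exercise the function.
toy_docs = [
    "bank market stock bank rate".split(),
    "market stock rate bank".split(),
    "vote party election vote".split(),
    "party election vote market".split(),
]
vocab, theta, beta = fit_lda(toy_docs, n_topics=2)
for k, row in enumerate(beta):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print("topic", k, [w for _, w in top])
```

The model returns only numbered topics with word probabilities. Labels such as "financial news" or "politics" come from a person reading each topic's most probable words.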