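The document-term matrix (DTM) is one of the most common formats for storing text data: each row is a document, each column is a term, and each cell holds the number of times that term appears in that document.
The DTM is based on the bag-of-words model, which ignores word order and treats every distinct term as a feature. Because any single document uses only a small fraction of the full vocabulary, most cells are zero, so DTMs are usually very sparse matrices.
Many popular R packages (for example tm, whose DocumentTermMatrix class tidytext works with) and Python packages use the DTM format.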
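A minimal base-R sketch of the bag-of-words idea behind the DTM. The three-document corpus is hypothetical, and splitting on whitespace is a simplifying assumption (real pipelines usually also strip punctuation and stop words). The matrix is built with table(), and the share of zero cells shows the sparsity described above.

```r
# Hypothetical toy corpus of three short documents
docs <- c(d1 = "the cat sat on the mat",
          d2 = "the dog sat",
          d3 = "a cat and a dog")

# Bag of words: lowercase each document and split it into word tokens (order is discarded later)
tokens <- lapply(docs, function(x) strsplit(tolower(x), "\\s+")[[1]])

# Long format: one row per (document, word) occurrence
long <- data.frame(
  document = rep(names(tokens), lengths(tokens)),
  term     = unlist(tokens, use.names = FALSE)
)

# Cross-tabulate: rows are documents, columns are terms, cells are counts
dtm <- table(long$document, long$term)
print(dtm)

# Fraction of cells that are zero, i.e. how sparse the DTM is
sparsity <- mean(dtm == 0)
```

Even in this tiny example half of the 24 cells are zero; with a realistic vocabulary of thousands of terms the proportion is far higher, which is why packages store DTMs in sparse formats.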
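Converting between tidytext and DTM
> tidy()      # DTM -> tidy data frame (one row per non-zero document-term pair)
> cast_dtm()  # tidy data frame -> DTM (cast_dfm() and cast_sparse() produce other matrix types)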
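A minimal sketch of the round trip, assuming the dplyr, tidytext and tm packages are installed (cast_dtm() returns a tm DocumentTermMatrix, so tm must be available). The word counts below are hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical word counts in tidy form: one row per document-term pair
word_counts <- tibble(
  document = c("d1", "d1", "d2", "d2", "d3"),
  term     = c("cat", "sat", "dog", "sat", "cat"),
  count    = c(1, 1, 2, 1, 3)
)

# tidy -> DTM: cast_dtm() takes the document, term and value columns
dtm <- word_counts %>% cast_dtm(document, term, count)
print(dtm)

# DTM -> tidy: tidy() returns one row per non-zero cell (document, term, count)
tidy_again <- tidy(dtm)
print(tidy_again)
```

The DTM is 3-by-3 (three documents, three distinct terms), but tidy() returns only the five non-zero cells, so the tidy format stays compact even when the matrix is sparse.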
5.3.1 Example: mining financial articles. Note: this example does not run!