Deep Learning for text and sequences

Prof. Eric A. Suess

Introduction

Chapters 10 and 11 discuss applying Deep Learning to Text Processing and Time Series. The main topics are:

  • Preprocessing text data into useful representations
  • Working with recurrent neural networks (RNNs)
  • Using 1D convnets for sequence processing

Applications of these algorithms include the following:

  • Document classification and timeseries classification, such as identifying the topic of an article or the author of a book
  • Timeseries comparisons, such as estimating how closely related two documents or two stock tickers are
  • Sequence-to-sequence learning, such as decoding an English sentence into French
  • Sentiment analysis, such as classifying the sentiment of tweets or movie reviews as positive or negative
  • Timeseries forecasting, such as predicting the future weather at a certain location, given recent weather data

Text Processing

  • Bag-of-words model

  • Vectorizing text

  • One-hot encoding - text_tokenizer, fit_text_tokenizer (see the sketch after this list)

  • Token embedding - layer_embedding

  • n-grams
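
A minimal sketch of these steps with the keras R package, using a tiny made-up corpus: text_tokenizer() and fit_text_tokenizer() build the vocabulary, texts_to_matrix() produces a one-hot (bag-of-words) encoding, and layer_embedding() learns dense token embeddings. All sizes here are hypothetical.

```r
library(keras)

# Hypothetical toy corpus; in practice these are the raw documents.
texts <- c("The movie was great", "The movie was terrible")

# Build the vocabulary (keep the 1,000 most frequent words).
tokenizer <- text_tokenizer(num_words = 1000) %>%
  fit_text_tokenizer(texts)

# Integer word indices and a one-hot (bag-of-words) matrix.
sequences <- texts_to_sequences(tokenizer, texts)
one_hot   <- texts_to_matrix(tokenizer, texts, mode = "binary")

# Token embedding: map integer indices to dense, trainable vectors.
padded <- pad_sequences(sequences, maxlen = 10)
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 1000, output_dim = 8, input_length = 10) %>%
  layer_flatten() %>%
  layer_dense(units = 1, activation = "sigmoid")
```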

Text Processing

“When you instantiate an embedding layer, its weights (its internal dictionary of token vectors) are initially random, just as with any other layer. During training, these word vectors are gradually adjusted via backpropagation, structuring the space into something the downstream model can exploit. Once fully trained, the embedding space will show a lot of structure—a kind of structure specialized for the specific problem for which you’re training your model.”

Pretrained

  • word2vec
  • GloVe
  • BERT

Start with words and embed them in a pretrained vector space.

Example

The IMDB movie review dataset is examined again, this time using pretrained GloVe word embeddings.
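
A minimal sketch with the keras R package, assuming the GloVe vectors (glove.6B.100d) have been downloaded; the placeholder embedding_matrix below stands in for the matrix built from that file, and all sizes are hypothetical.

```r
library(keras)

max_words     <- 10000   # vocabulary size kept by the tokenizer
embedding_dim <- 100     # glove.6B.100d
maxlen        <- 100     # review length after pad_sequences()

# Placeholder standing in for the real GloVe matrix (max_words x embedding_dim);
# in practice it is filled by parsing glove.6B.100d.txt.
embedding_matrix <- matrix(rnorm(max_words * embedding_dim), nrow = max_words)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = max_words, output_dim = embedding_dim,
                  input_length = maxlen) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

# Load the pretrained GloVe vectors into the embedding layer and freeze it
# so that training does not update them.
get_layer(model, index = 1) %>%
  set_weights(list(embedding_matrix)) %>%
  freeze_weights()
```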

Recurrent NNs

So far, the NNs we have seen have no memory: each input is processed independently of all the others.

“Easy enough: in summary, an RNN is a for loop that reuses quantities computed during the previous iteration of the loop, nothing more.”
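
To make the "for loop" picture concrete, here is a minimal sketch of a forward pass through a simple RNN in base R; the sizes and weights are made up, and tanh stands in for the activation.

```r
# Made-up sizes and random weights.
timesteps <- 100; input_features <- 32; output_features <- 64

inputs  <- matrix(rnorm(timesteps * input_features), nrow = timesteps)
state_t <- rep(0, output_features)                       # initial state: all zeros

W <- matrix(rnorm(output_features * input_features),  nrow = output_features)
U <- matrix(rnorm(output_features * output_features), nrow = output_features)
b <- rnorm(output_features)

outputs <- matrix(0, nrow = timesteps, ncol = output_features)
for (t in 1:timesteps) {
  input_t  <- inputs[t, ]
  # The output depends on the current input AND on the previous state ...
  output_t <- tanh(as.vector(W %*% input_t + U %*% state_t + b))
  outputs[t, ] <- output_t
  state_t <- output_t                                    # ... which is reused next iteration
}
```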

layer_simple_rnn
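
In Keras for R, layer_simple_rnn wraps this loop up as a layer. A minimal sketch of a small model built around it (vocabulary size and dimensions are hypothetical, e.g. for IMDB sentiment classification):

```r
library(keras)

# Embedding layer feeding a simple RNN for binary classification.
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
```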

LSTM and GRU

  • designed to mitigate the vanishing-gradient problem

  • carry track - carries information across many timesteps

LSTM and GRU

“Just keep in mind what the LSTM cell is meant to do: allow past information to be reinjected at a later time, thus fighting the vanishing-gradient problem.”

“LSTM is not good at sentiment analysis; it is much better at more complex problems like question answering and machine translation.”
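
In code, layer_lstm() (or layer_gru()) is a drop-in replacement for layer_simple_rnn(); a minimal sketch with hypothetical dimensions:

```r
library(keras)

# Same architecture as before, with the simple RNN swapped for an LSTM.
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  layer_lstm(units = 32) %>%          # or layer_gru(units = 32) for a GRU
  layer_dense(units = 1, activation = "sigmoid")
```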

Time Series Forecasting

  • Normalize the data (center and scale each feature) - see the sketch after this list

  • Generator functions that yield batches of samples and targets

  • GRU - Gated Recurrent Unit (layer_gru)
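
A minimal sketch of these preprocessing steps, with simulated data standing in for a real timeseries matrix; the sizes, the choice of target column, and the name sequence_generator are all hypothetical:

```r
# Simulated data standing in for a matrix of timeseries measurements,
# one row per timestep, one column per feature.
data <- matrix(rnorm(300000 * 14), ncol = 14)

# Normalize each feature using statistics from the training portion only.
train_data <- data[1:200000, ]
col_mean <- apply(train_data, 2, mean)
col_std  <- apply(train_data, 2, sd)
data <- scale(data, center = col_mean, scale = col_std)

# A generator (a closure) that yields batches of (samples, targets):
# each sample is `lookback` consecutive timesteps, and the target is the
# value of column 2 `delay` timesteps after the end of the sample.
sequence_generator <- function(data, lookback, delay, batch_size = 128) {
  i <- lookback
  function() {
    if (i + batch_size >= nrow(data) - delay) i <<- lookback   # wrap around
    rows <- i:(i + batch_size - 1)
    i <<- i + batch_size
    samples <- array(0, dim = c(length(rows), lookback, ncol(data)))
    targets <- numeric(length(rows))
    for (j in seq_along(rows)) {
      indices <- (rows[j] - lookback + 1):rows[j]
      samples[j, , ] <- data[indices, ]
      targets[j] <- data[rows[j] + delay, 2]
    }
    list(samples, targets)
  }
}

gen <- sequence_generator(data, lookback = 240, delay = 144)
batch <- gen()   # batch[[1]]: samples, batch[[2]]: targets
```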

Recall

“Recall the description of the universal machine-learning workflow: it’s generally a good idea to increase the capacity of your network until overfitting becomes the primary obstacle (assuming you’ve already taken basic steps to mitigate overfitting, such as using dropout). As long as you aren’t overfitting too badly, you’re likely under capacity.”

Time Series Forecasting

Bi-directional RNN
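
A minimal sketch of a bidirectional recurrent model using the keras R package's bidirectional() wrapper (hypothetical dimensions):

```r
library(keras)

# Wrapping a recurrent layer with bidirectional() processes the sequence
# in both directions and merges the two sets of representations.
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  bidirectional(layer_lstm(units = 32)) %>%
  layer_dense(units = 1, activation = "sigmoid")
```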

“There are two important concepts we won’t cover in detail here: recurrent attention and sequence masking.”