Chapters 10 and 11 discuss applying deep learning to text processing and time series.
Bag-of-words model
Vectorizing text
One-hot encoding - text_tokenizer, fit_text_tokenizer
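A minimal sketch of word-level one-hot encoding with text_tokenizer / fit_text_tokenizer; the two sample sentences are made up for illustration:

```r
library(keras)

# Toy sentences (made up for illustration)
samples <- c("The cat sat on the mat.", "The dog ate my homework.")

# Tokenizer that keeps the 1,000 most common words
tokenizer <- text_tokenizer(num_words = 1000) %>%
  fit_text_tokenizer(samples)

# Integer sequences: one integer index per word
sequences <- texts_to_sequences(tokenizer, samples)

# One-hot (binary) document-term matrix
one_hot_results <- texts_to_matrix(tokenizer, samples, mode = "binary")

# The learned word-to-index mapping
word_index <- tokenizer$word_index
```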
Token embedding - layer_embedding
n-grams
“When you instantiate an embedding layer, its weights (its internal dictionary of token vectors) are initially random, just as with any other layer. During training, these word vectors are gradually adjusted via backpropagation, structuring the space into something the downstream model can exploit. Once fully trained, the embedding space will show a lot of structure—a kind of structure specialized for the specific problem for which you’re training your model.”
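A sketch of learning such an embedding jointly with a classifier on the IMDB reviews; the 20-word cutoff and the 8-dimensional embedding are illustrative choices:

```r
library(keras)

max_features <- 10000   # vocabulary size
maxlen <- 20            # cut reviews after 20 words (illustrative)

imdb <- dataset_imdb(num_words = max_features)
c(c(x_train, y_train), c(x_test, y_test)) %<-% imdb

# Pad/truncate the integer sequences to a common length
x_train <- pad_sequences(x_train, maxlen = maxlen)
x_test  <- pad_sequences(x_test,  maxlen = maxlen)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = max_features, output_dim = 8,
                  input_length = maxlen) %>%
  layer_flatten() %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(optimizer = "rmsprop", loss = "binary_crossentropy",
                  metrics = "acc")

history <- model %>% fit(x_train, y_train, epochs = 10,
                         batch_size = 32, validation_split = 0.2)
```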
Alternatively, start with words and embed them in a pretrained vector space
The IMDB movie review dataset is examined again, this time using pretrained GloVe word embeddings
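A sketch of loading pretrained GloVe vectors into the embedding layer and freezing it. The file glove.6B.100d.txt (from the standard GloVe download), the 100-dimensional vectors, a `word_index` from a tokenizer fit on the review texts, and a `model` whose first layer is a matching layer_embedding are all assumptions here:

```r
library(keras)

max_words <- 10000        # top words to keep
embedding_dim <- 100      # glove.6B.100d vectors

# Parse the GloVe file (path is an assumption; adjust to your download)
lines <- readLines("glove.6B.100d.txt")
embeddings_index <- new.env(hash = TRUE)
for (line in lines) {
  values <- strsplit(line, " ")[[1]]
  word <- values[[1]]
  embeddings_index[[word]] <- as.double(values[-1])
}

# Build the embedding matrix for the tokenizer's word index
embedding_matrix <- array(0, c(max_words, embedding_dim))
for (word in names(word_index)) {
  index <- word_index[[word]]
  if (index < max_words) {
    vector <- embeddings_index[[word]]
    if (!is.null(vector))
      embedding_matrix[index + 1, ] <- vector  # shift by 1: row 1 is the placeholder index
  }
}

# Load the matrix into the embedding layer and freeze it so training
# doesn't destroy the pretrained structure
get_layer(model, index = 1) %>%
  set_weights(list(embedding_matrix)) %>%
  freeze_weights()
```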
So far, the NNs considered have no memory: each input is processed independently, with no state kept between inputs.
“Easy enough: in summary, an RNN is a for loop that reuses quantities computed during the previous iteration of the loop, nothing more.”
layer_simple_rnn
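A sketch of that idea, first as a plain for loop with random weights, then as a Keras layer on the padded IMDB sequences; the sizes are illustrative:

```r
library(keras)

# The for-loop view of an RNN: the previous step's output is reused as state
timesteps <- 100; input_features <- 32; output_features <- 64
inputs <- matrix(rnorm(timesteps * input_features), nrow = timesteps)
W <- matrix(rnorm(output_features * input_features), nrow = output_features)
U <- matrix(rnorm(output_features * output_features), nrow = output_features)
b <- rnorm(output_features)

state_t <- rep(0, output_features)
for (t in seq_len(timesteps)) {
  output_t <- tanh(W %*% inputs[t, ] + U %*% state_t + b)
  state_t <- output_t   # this output becomes the next iteration's state
}

# The same idea as a Keras layer
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
```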
LSTM (layer_lstm) - addresses the vanishing-gradient problem
carry track - a parallel dataflow that carries information across timesteps
“Just keep in mind what the LSTM cell is meant to do: allow past information to be reinjected at a later time, thus fighting the vanishing-gradient problem.”
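A minimal layer_lstm sketch for the IMDB task, assuming the padded integer sequences x_train / y_train prepared earlier; units and training settings are illustrative:

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  layer_lstm(units = 32) %>%        # the carry track is handled internally
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(optimizer = "rmsprop", loss = "binary_crossentropy",
                  metrics = "acc")

history <- model %>% fit(x_train, y_train, epochs = 10,
                         batch_size = 128, validation_split = 0.2)
```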
LSTM isn't especially helpful for sentiment analysis; it shines on more complex problems such as question answering and machine translation.
Normalize the data (center and scale each feature using training-set statistics)
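A sketch of the normalization step, assuming a numeric matrix `data` in which (as in the weather example) the first 200,000 rows serve as training data:

```r
# Center and scale each column using statistics from the training rows only
train_data <- data[1:200000, ]
mean <- apply(train_data, 2, mean)
std  <- apply(train_data, 2, sd)
data <- scale(data, center = mean, scale = std)
```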
generator functions
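A sketch of such a generator, closely following the book's weather example: it yields batches of `lookback` timesteps of input and the temperature `delay` timesteps ahead (column 2 of `data` is assumed to hold the temperature):

```r
generator <- function(data, lookback, delay, min_index, max_index,
                      shuffle = FALSE, batch_size = 128, step = 6) {
  if (is.null(max_index)) max_index <- nrow(data) - delay - 1
  i <- min_index + lookback
  function() {
    if (shuffle) {
      rows <- sample(c((min_index + lookback):max_index), size = batch_size)
    } else {
      if (i + batch_size >= max_index) i <<- min_index + lookback
      rows <- c(i:min(i + batch_size - 1, max_index))
      i <<- i + length(rows)
    }
    # One batch: samples of shape (batch, lookback/step, features), scalar targets
    samples <- array(0, dim = c(length(rows), lookback / step, ncol(data)))
    targets <- array(0, dim = c(length(rows)))
    for (j in 1:length(rows)) {
      indices <- seq(rows[[j]] - lookback, rows[[j]] - 1,
                     length.out = dim(samples)[[2]])
      samples[j, , ] <- data[indices, ]
      targets[[j]] <- data[rows[[j]] + delay, 2]   # temperature column
    }
    list(samples, targets)
  }
}
```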
GRU (gated recurrent unit) - layer_gru
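A sketch of a layer_gru regression model fed from generators like the one above; `train_gen`, `val_gen`, `val_steps`, and `data` are assumed from the earlier steps, and the step counts are illustrative:

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_gru(units = 32, input_shape = list(NULL, ncol(data))) %>%
  layer_dense(units = 1)    # regression: predict the temperature

model %>% compile(optimizer = optimizer_rmsprop(), loss = "mae")

history <- model %>% fit_generator(
  train_gen, steps_per_epoch = 500, epochs = 20,
  validation_data = val_gen, validation_steps = val_steps
)
```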
“Recall the description of the universal machine-learning workflow: it’s generally a good idea to increase the capacity of your network until overfitting becomes the primary obstacle (assuming you’ve already taken basic steps to mitigate overfitting, such as using dropout). As long as you aren’t overfitting too badly, you’re likely under capacity.”
Bi-directional RNN
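A sketch of wrapping an LSTM in bidirectional(), which processes the sequence both forward and backward, for the IMDB task; sizes are illustrative:

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  bidirectional(layer_lstm(units = 32)) %>%   # one LSTM per direction, outputs merged
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(optimizer = "rmsprop", loss = "binary_crossentropy",
                  metrics = "acc")
```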
“There are two important concepts we won’t cover in detail here: recurrent attention and sequence masking.”