Deep Learning for text and sequences

Prof. Eric A. Suess

Introduction

Chapters 10 and 11 discuss applying Deep Learning to Text Processing and Time Series. The main topics are:

  • Preprocessing text data into useful representations
  • Working with recurrent neural networks (RNNs)
  • Using 1D convnets for sequence processing

Applications of these algorithms include the following:

  • Document classification and timeseries classification, such as identifying the topic of an article or the author of a book
  • Timeseries comparisons, such as estimating how closely related two documents or two stock tickers are
  • Sequence-to-sequence learning, such as decoding an English sentence into French
  • Sentiment analysis, such as classifying the sentiment of tweets or movie reviews as positive or negative
  • Timeseries forecasting, such as predicting the future weather at a certain location, given recent weather data

Text Processing

  • Bag-of-words model

  • Vectorizing text

  • One-hot encoding - text_tokenizer, fit_text_tokenizer (see the sketch after this list)

  • Token embedding - layer_embedding

  • n-grams
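
A minimal sketch of these steps with the keras R package, using a tiny made-up corpus: text_tokenizer() and fit_text_tokenizer() build the vocabulary, texts_to_matrix() produces a one-hot (bag-of-words) encoding, and layer_embedding() learns dense token embeddings. All sizes here are hypothetical.

```r
library(keras)

# Hypothetical toy corpus; in practice these are the raw documents.
texts <- c("The movie was great", "The movie was terrible")

# Build the vocabulary (keep the 1,000 most frequent words).
tokenizer <- text_tokenizer(num_words = 1000) %>%
  fit_text_tokenizer(texts)

# Integer word indices and a one-hot (bag-of-words) matrix.
sequences <- texts_to_sequences(tokenizer, texts)
one_hot   <- texts_to_matrix(tokenizer, texts, mode = "binary")

# Token embedding: map integer indices to dense, trainable vectors.
padded <- pad_sequences(sequences, maxlen = 10)
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 1000, output_dim = 8, input_length = 10) %>%
  layer_flatten() %>%
  layer_dense(units = 1, activation = "sigmoid")
```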

Text Processing

“When you instantiate an embedding layer, its weights (its internal dictionary of token vectors) are initially random, just as with any other layer. During training, these word vectors are gradually adjusted via backpropagation, structuring the space into something the downstream model can exploit. Once fully trained, the embedding space will show a lot of structure—a kind of structure specialized for the specific problem for which you’re training your model.”

Pretrained

  • word2vec
  • GloVe
  • BERT

Start with words and embed them in a pretrained vector space.

Example

The IMDB movie review dataset is examined again, this time using pretrained GloVe word embeddings.
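
A minimal sketch with the keras R package, assuming the GloVe vectors (glove.6B.100d) have been downloaded; the placeholder embedding_matrix below stands in for the matrix built from that file, and all sizes are hypothetical.

```r
library(keras)

max_words     <- 10000   # vocabulary size kept by the tokenizer
embedding_dim <- 100     # glove.6B.100d
maxlen        <- 100     # review length after pad_sequences()

# Placeholder standing in for the real GloVe matrix (max_words x embedding_dim);
# in practice it is filled by parsing glove.6B.100d.txt.
embedding_matrix <- matrix(rnorm(max_words * embedding_dim), nrow = max_words)

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = max_words, output_dim = embedding_dim,
                  input_length = maxlen) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

# Load the pretrained GloVe vectors into the embedding layer and freeze it
# so that training does not update them.
get_layer(model, index = 1) %>%
  set_weights(list(embedding_matrix)) %>%
  freeze_weights()
```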

Recurrent NNs

So far, the NNs we have seen have no memory: each input is processed independently of all the others.

“Easy enough: in summary, an RNN is a for loop that reuses quantities computed during the previous iteration of the loop, nothing more.”
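
To make the "for loop" picture concrete, here is a minimal sketch of a forward pass through a simple RNN in base R; the sizes and weights are made up, and tanh stands in for the activation.

```r
# Made-up sizes and random weights.
timesteps <- 100; input_features <- 32; output_features <- 64

inputs  <- matrix(rnorm(timesteps * input_features), nrow = timesteps)
state_t <- rep(0, output_features)                       # initial state: all zeros

W <- matrix(rnorm(output_features * input_features),  nrow = output_features)
U <- matrix(rnorm(output_features * output_features), nrow = output_features)
b <- rnorm(output_features)

outputs <- matrix(0, nrow = timesteps, ncol = output_features)
for (t in 1:timesteps) {
  input_t  <- inputs[t, ]
  # The output depends on the current input AND on the previous state ...
  output_t <- tanh(as.vector(W %*% input_t + U %*% state_t + b))
  outputs[t, ] <- output_t
  state_t <- output_t                                    # ... which is reused next iteration
}
```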

layer_simple_rnn
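
In Keras for R, layer_simple_rnn wraps this loop up as a layer. A minimal sketch of a small model built around it (vocabulary size and dimensions are hypothetical, e.g. for IMDB sentiment classification):

```r
library(keras)

# Embedding layer feeding a simple RNN for binary classification.
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
```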

LSTM and GRU

  • designed to mitigate the vanishing-gradient problem

  • carry track - carries information across many timesteps

LSTM and GRU

“Just keep in mind what the LSTM cell is meant to do: allow past information to be reinjected at a later time, thus fighting the vanishing-gradient problem.”

“LSTM is not good at sentiment analysis; it is much better at more complex problems like question answering and machine translation.”
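
In code, layer_lstm() (or layer_gru()) is a drop-in replacement for layer_simple_rnn(); a minimal sketch with hypothetical dimensions:

```r
library(keras)

# Same architecture as before, with the simple RNN swapped for an LSTM.
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  layer_lstm(units = 32) %>%          # or layer_gru(units = 32) for a GRU
  layer_dense(units = 1, activation = "sigmoid")
```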

Time Series Forecasting

  • Normalize the data (center and scale each feature) - see the sketch after this list

  • Generator functions that yield batches of samples and targets

  • GRU - Gated Recurrent Unit (layer_gru)
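
A minimal sketch of these preprocessing steps, with simulated data standing in for a real timeseries matrix; the sizes, the choice of target column, and the name sequence_generator are all hypothetical:

```r
# Simulated data standing in for a matrix of timeseries measurements,
# one row per timestep, one column per feature.
data <- matrix(rnorm(300000 * 14), ncol = 14)

# Normalize each feature using statistics from the training portion only.
train_data <- data[1:200000, ]
col_mean <- apply(train_data, 2, mean)
col_std  <- apply(train_data, 2, sd)
data <- scale(data, center = col_mean, scale = col_std)

# A generator (a closure) that yields batches of (samples, targets):
# each sample is `lookback` consecutive timesteps, and the target is the
# value of column 2 `delay` timesteps after the end of the sample.
sequence_generator <- function(data, lookback, delay, batch_size = 128) {
  i <- lookback
  function() {
    if (i + batch_size >= nrow(data) - delay) i <<- lookback   # wrap around
    rows <- i:(i + batch_size - 1)
    i <<- i + batch_size
    samples <- array(0, dim = c(length(rows), lookback, ncol(data)))
    targets <- numeric(length(rows))
    for (j in seq_along(rows)) {
      indices <- (rows[j] - lookback + 1):rows[j]
      samples[j, , ] <- data[indices, ]
      targets[j] <- data[rows[j] + delay, 2]
    }
    list(samples, targets)
  }
}

gen <- sequence_generator(data, lookback = 240, delay = 144)
batch <- gen()   # batch[[1]]: samples, batch[[2]]: targets
```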

Recall

“Recall the description of the universal machine-learning workflow: it’s generally a good idea to increase the capacity of your network until overfitting becomes the primary obstacle (assuming you’ve already taken basic steps to mitigate overfitting, such as using dropout). As long as you aren’t overfitting too badly, you’re likely under capacity.”

Time Series Forecasting

Bi-directional RNN
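
A minimal sketch of a bidirectional recurrent model using the keras R package's bidirectional() wrapper (hypothetical dimensions):

```r
library(keras)

# Wrapping a recurrent layer with bidirectional() processes the sequence
# in both directions and merges the two sets of representations.
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32) %>%
  bidirectional(layer_lstm(units = 32)) %>%
  layer_dense(units = 1, activation = "sigmoid")
```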

“There are two important concepts we won’t cover in detail here: recurrent attention and sequence masking.”