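---
title: "movie_sentiment"
author: "Prof. Eric A. Suess"
format:
  html:
    embed-resources: true
---

## Classifying movie reviews: A binary classification example

Load the data. `num_words = 10000` keeps only the 10,000 most frequent words in the training data.

Note: The multi-assignment operator (`%<-%`) from the zeallot package is used to unpack the list into a set of distinct variables.

```{r}
library(keras)

# Keep only the 10,000 most frequent words
imdb <- dataset_imdb(num_words = 10000)
c(c(train_data, train_labels), c(test_data, test_labels)) %<-% imdb
```

Examine the data. Each review is a list of word indices, and each label is 0 (negative) or 1 (positive).

```{r}
str(train_data)
str(train_labels)
```

Decode one of these reviews back into words. The indices are offset by 3 because 0, 1, and 2 are reserved for "padding", "start of sequence", and "unknown".

```{r}
word_index <- dataset_imdb_word_index()

# Reverse the mapping: integer index -> word
reverse_word_index <- names(word_index)
names(reverse_word_index) <- as.character(word_index)

# Undo the offset of 3; reserved indices are shown as "?"
decoded_words <- train_data[[1]] %>%
  sapply(function(i) {
    if (i > 3) reverse_word_index[[as.character(i - 3)]]
    else "?"
  })
decoded_review <- paste0(decoded_words, collapse = " ")
cat(decoded_review, "\n")
```

Encode the integer sequences via multi-hot encoding: each review becomes a 10,000-dimensional vector of 0s and 1s, with a 1 in column `j` if word `j` occurs in the review.

```{r}
vectorize_sequences <- function(sequences, dimension = 10000) {
  # One row per review, one column per word index
  results <- array(0, dim = c(length(sequences), dimension))
  for (i in seq_along(sequences)) {
    sequence <- sequences[[i]]
    # Set a 1 in every column whose word occurs in review i
    for (j in sequence)
      results[i, j] <- 1
  }
  results
}
x_train <- vectorize_sequences(train_data)
x_test <- vectorize_sequences(test_data)
```

Check the encoded data and convert the labels to numeric vectors.

```{r}
str(x_train)
y_train <- as.numeric(train_labels)
y_test <- as.numeric(test_labels)
```

Define the model.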
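Each `layer_dense()` call in the model computes `activation(input %*% W + b)` for a learned weight matrix `W` and bias vector `b`. As a rough illustration of that arithmetic (not how keras implements it internally), the base-R sketch below runs a forward pass through the same 16-16-1 shape, assuming a toy input width of 5 features instead of 10,000 and random weights. The helpers `relu()`, `sigmoid()`, and `dense()` are hypothetical names introduced only for this sketch.

```{r}
# Hypothetical helpers for illustration only (not part of keras)
relu <- function(z) pmax(z, 0)             # keeps matrix dimensions
sigmoid <- function(z) 1 / (1 + exp(-z))

# One dense layer: activation(x %*% W + b), adding b to every row
dense <- function(x, W, b, activation) {
  activation(sweep(x %*% W, 2, b, `+`))
}

set.seed(42)
n_features <- 5                            # toy width, assumed instead of 10,000
x <- matrix(rbinom(3 * n_features, 1, 0.5), nrow = 3)  # 3 toy multi-hot rows

# Random weights stand in for the values keras would learn in fit()
W1 <- matrix(rnorm(n_features * 16, sd = 0.1), n_features, 16); b1 <- rep(0, 16)
W2 <- matrix(rnorm(16 * 16, sd = 0.1), 16, 16); b2 <- rep(0, 16)
W3 <- matrix(rnorm(16, sd = 0.1), 16, 1); b3 <- 0

h1 <- dense(x, W1, b1, relu)     # first hidden layer: 3 x 16
h2 <- dense(h1, W2, b2, relu)    # second hidden layer: 3 x 16
p  <- dense(h2, W3, b3, sigmoid) # output: one probability per row, 3 x 1
p
```

Training replaces the random `W1`, `W2`, and `W3` with fitted values; the shape of the computation stays the same, with one output probability per input row.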
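Feedforward neural network with **two** hidden layers of 16 units each (ReLU activation) and a single sigmoid output unit, whose output can be read as the probability that a review is positive.

```{r}
model <- keras_model_sequential() %>%
  layer_dense(16, activation = "relu") %>%
  layer_dense(16, activation = "relu") %>%
  layer_dense(1, activation = "sigmoid")
```

Compile the model with the RMSprop optimizer and binary cross-entropy loss, tracking accuracy.

```{r}
model %>% compile(optimizer = "rmsprop",
                  loss = "binary_crossentropy",
                  metrics = "accuracy")
```

Set aside a validation set: hold out the first 10,000 training reviews and train on the remaining 15,000.

```{r}
x_val <- x_train[seq(10000), ]
partial_x_train <- x_train[-seq(10000), ]
y_val <- y_train[seq(10000)]
partial_y_train <- y_train[-seq(10000)]
```

Train the model for 20 epochs in mini-batches of 512 samples, monitoring loss and accuracy on the validation set.

```{r}
history <- model %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val)
)
```

```{r}
str(history$metrics)
plot(history)
```

Refit a fresh model on the full training set using 4 epochs. The number of epochs is chosen from the validation curves above: beyond about 4 epochs the validation loss tends to rise as the model starts to overfit.

```{r}
model <- keras_model_sequential() %>%
  layer_dense(16, activation = "relu") %>%
  layer_dense(16, activation = "relu") %>%
  layer_dense(1, activation = "sigmoid")

model %>% compile(optimizer = "rmsprop",
                  loss = "binary_crossentropy",
                  metrics = "accuracy")

model %>% fit(x_train, y_train, epochs = 4, batch_size = 512)
```

Evaluate the model on the test set; `results` holds the test loss and accuracy.

```{r}
results <- model %>% evaluate(x_test, y_test)
results
```

Predict on the test set. Each value is the model's estimated probability that the corresponding review is positive.

```{r}
sentiment <- model %>% predict(x_test)
head(sentiment)
```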
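To turn these probabilities into class labels, one common choice is a 0.5 cutoff. The sketch below shows that step and a confusion matrix in base R; the cutoff and the toy `probs` and `truth` vectors are assumptions for illustration, not output from the model above.

```{r}
# Toy predicted probabilities and true labels (made up for illustration)
probs <- c(0.91, 0.12, 0.55, 0.48, 0.03, 0.77)
truth <- c(1, 0, 0, 1, 0, 1)

# Label a review positive (1) when its probability exceeds the assumed 0.5 cutoff
predicted <- as.integer(probs > 0.5)

# Fraction of correct predictions
accuracy <- mean(predicted == truth)
accuracy  # 4 of 6 correct, about 0.667

# Confusion matrix: rows = predicted class, columns = true class
confusion <- table(predicted = predicted, truth = truth)
confusion
```

Applied to the model above, the same idea would be `as.integer(sentiment > 0.5)` compared against `y_test`, which should give an accuracy close to the one reported by `evaluate()`.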