This notebook contains the code samples found in Chapter 8, Section 5 of Deep Learning with R. Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.
In this section, we'll explain how to implement a GAN in Keras, in its barest form – because GANs are advanced, diving deeply into the technical details would be out of scope for this book. The specific implementation is a deep convolutional GAN (DCGAN): a GAN where the generator and discriminator are deep convnets. In particular, it uses a layer_conv_2d_transpose() for image upsampling in the generator.
We will train our GAN on images from CIFAR10, a dataset of 50,000 32x32 RGB images belonging to 10 classes (5,000 images per class). To make things even easier, we will only use images belonging to the class "frog".
Schematically, our GAN looks like this:

- A generator network maps vectors of shape (latent_dim) to images of shape (32, 32, 3).
- A discriminator network maps images of shape (32, 32, 3) to a binary score estimating the probability that the image is real.
- A gan network chains the generator and the discriminator together: gan(x) <- discriminator(generator(x)). Thus this gan network maps latent-space vectors to the discriminator's assessment of the realism of these latent vectors as decoded by the generator.
- We train the discriminator using examples of real and fake images along with "real"/"fake" targets, just as we would train any regular image-classification model.
- To train the generator, we use the gradients of the generator's weights with regard to the loss of the gan model. This means that, at every step, we move the weights of the generator in a direction that will make the discriminator more likely to classify as "real" the images decoded by the generator, i.e. we train the generator to fool the discriminator.

Training GANs and tuning GAN implementations is notoriously difficult. There are a number of known "tricks" that one should keep in mind. Like most things in deep learning, it is more alchemy than science: these tricks are really just heuristics, not theory-backed guidelines. They are backed by some level of intuitive understanding of the phenomenon at hand, and they are known to work well empirically, albeit not necessarily in every context.
Here are a few of the tricks that we leverage in our own implementation of a GAN generator and discriminator below. It is not an exhaustive list of GAN-related tricks; you will find many more across the GAN literature.
- We use tanh as the last activation in the generator, instead of sigmoid, which is more commonly found in other types of models.
- We use layer_activation_leaky_relu() instead of a ReLU activation. It's similar to ReLU, but it relaxes sparsity constraints by allowing small negative activation values.
- Whenever we use a strided layer_conv_2d_transpose() or layer_conv_2d(), in both the generator and the discriminator, we use a kernel size that is divisible by the stride size, to avoid checkerboard artifacts caused by unequal coverage of the pixel space.

First, we develop a generator model, which turns a vector (from the latent space – during training it will be sampled at random) into a candidate image. One of the many issues that commonly arise with GANs is that the generator gets stuck with generated images that look like noise. A possible solution is to use dropout on both the discriminator and generator.
library(keras)
latent_dim <- 32
height <- 32
width <- 32
channels <- 3
generator_input <- layer_input(shape = c(latent_dim))
generator_output <- generator_input %>%
# First, transform the input into a 16x16 128-channel feature map
layer_dense(units = 128 * 16 * 16) %>%
layer_activation_leaky_relu() %>%
layer_reshape(target_shape = c(16, 16, 128)) %>%
# Then, add a convolution layer
layer_conv_2d(filters = 256, kernel_size = 5,
padding = "same") %>%
layer_activation_leaky_relu() %>%
# Upsample to 32x32
layer_conv_2d_transpose(filters = 256, kernel_size = 4,
strides = 2, padding = "same") %>%
layer_activation_leaky_relu() %>%
# Add a few more convolution layers
layer_conv_2d(filters = 256, kernel_size = 5,
padding = "same") %>%
layer_activation_leaky_relu() %>%
layer_conv_2d(filters = 256, kernel_size = 5,
padding = "same") %>%
layer_activation_leaky_relu() %>%
# Produce a 32x32 3-channel feature map (the shape of a CIFAR10 image)
layer_conv_2d(filters = channels, kernel_size = 7,
activation = "tanh", padding = "same")
generator <- keras_model(generator_input, generator_output)
summary(generator)
____________________________________________________________________________________________________________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================================================================================================================
input_9 (InputLayer) (None, 32) 0
____________________________________________________________________________________________________________________________________________________________________________
dense_64 (Dense) (None, 32768) 1081344
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU) (None, 32768) 0
____________________________________________________________________________________________________________________________________________________________________________
reshape_3 (Reshape) (None, 16, 16, 128) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_136 (Conv2D) (None, 16, 16, 256) 819456
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU) (None, 16, 16, 256) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_transpose_3 (Conv2DTranspose) (None, 32, 32, 256) 1048832
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU) (None, 32, 32, 256) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_137 (Conv2D) (None, 32, 32, 256) 1638656
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU) (None, 32, 32, 256) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_138 (Conv2D) (None, 32, 32, 256) 1638656
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU) (None, 32, 32, 256) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_139 (Conv2D) (None, 32, 32, 3) 37635
============================================================================================================================================================================
Total params: 6,264,579
Trainable params: 6,264,579
Non-trainable params: 0
____________________________________________________________________________________________________________________________________________________________________________
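As a quick sanity check (our addition, not part of the original code), we can already push random latent vectors through the untrained generator. The rnorm() sampling and the batch size of 10 below are illustrative choices:

# Sample 10 random points in the latent space (assumed: a standard normal draw)
random_latent_vectors <- matrix(rnorm(10 * latent_dim),
                                nrow = 10, ncol = latent_dim)
# Decode them into candidate images
generated_images <- predict(generator, random_latent_vectors)
dim(generated_images)  # 10 32 32 3 -- pixel values lie in [-1, 1] because of tanh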
Then, we develop a discriminator model, which takes as input a candidate image (real or synthetic) and classifies it into one of two classes: "generated image" or "real image that comes from the training set".
discriminator_input <- layer_input(shape = c(height, width, channels))
discriminator_output <- discriminator_input %>%
layer_conv_2d(filters = 128, kernel_size = 3) %>%
layer_activation_leaky_relu() %>%
layer_conv_2d(filters = 128, kernel_size = 4, strides = 2) %>%
layer_activation_leaky_relu() %>%
layer_conv_2d(filters = 128, kernel_size = 4, strides = 2) %>%
layer_activation_leaky_relu() %>%
layer_conv_2d(filters = 128, kernel_size = 4, strides = 2) %>%
layer_activation_leaky_relu() %>%
layer_flatten() %>%
# One dropout layer - important trick!
layer_dropout(rate = 0.4) %>%
# Classification layer
layer_dense(units = 1, activation = "sigmoid")
discriminator <- keras_model(discriminator_input, discriminator_output)
summary(discriminator)
____________________________________________________________________________________________________________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================================================================================================================
input_10 (InputLayer) (None, 32, 32, 3) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_140 (Conv2D) (None, 30, 30, 128) 3584
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU) (None, 30, 30, 128) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_141 (Conv2D) (None, 14, 14, 128) 262272
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU) (None, 14, 14, 128) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_142 (Conv2D) (None, 6, 6, 128) 262272
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU) (None, 6, 6, 128) 0
____________________________________________________________________________________________________________________________________________________________________________
conv2d_143 (Conv2D) (None, 2, 2, 128) 262272
____________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU) (None, 2, 2, 128) 0
____________________________________________________________________________________________________________________________________________________________________________
flatten_11 (Flatten) (None, 512) 0
____________________________________________________________________________________________________________________________________________________________________________
dropout_5 (Dropout) (None, 512) 0
____________________________________________________________________________________________________________________________________________________________________________
dense_65 (Dense) (None, 1) 513
============================================================================================================================================================================
Total params: 790,913
Trainable params: 790,913
Non-trainable params: 0
____________________________________________________________________________________________________________________________________________________________________________
# To stabilize training, we use learning rate decay
# and gradient clipping (by value) in the optimizer.
discriminator_optimizer <- optimizer_rmsprop(
lr = 0.0008,
clipvalue = 1.0,
decay = 1e-8
)
discriminator %>% compile(
optimizer = discriminator_optimizer,
loss = "binary_crossentropy"
)
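Similarly (a hypothetical check, not from the original text), the compiled discriminator maps any batch of 32x32 RGB images to one probability per image:

# Two random "images" (assumed: uniform noise), just to exercise the model
x <- array(runif(2 * height * width * channels),
           dim = c(2, height, width, channels))
predict(discriminator, x)  # two values in [0, 1], from the sigmoid output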
Finally, we set up the GAN, which chains the generator and the discriminator. This is the model that, when trained, will move the generator in a direction that improves its ability to fool the discriminator. This model turns latent-space points into a classification decision, "fake" or "real", and it is meant to be trained with labels that are always "these are real images". So training gan will update the weights of generator in a way that makes discriminator more likely to predict "real" when looking at fake images. Very importantly, we set the discriminator to be frozen during training (non-trainable): its weights will not be updated when training gan. If the discriminator weights could be updated during this process, then we would be training the discriminator to always predict "real", which is not what we want!
# Set discriminator weights to non-trainable
# (will only apply to the `gan` model)
freeze_weights(discriminator)
gan_input <- layer_input(shape = c(latent_dim))
gan_output <- discriminator(generator(gan_input))
gan <- keras_model(gan_input, gan_output)
gan_optimizer <- optimizer_rmsprop(
lr = 0.0004,
clipvalue = 1.0,
decay = 1e-8
)
gan %>% compile(
optimizer = gan_optimizer,
loss = "binary_crossentropy"
)
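As a quick verification (our addition), summary(gan) should report the generator's parameters (about 6.26 million) as trainable and the discriminator's (about 791 thousand) as non-trainable, since the freeze applies inside gan. Note that discriminator itself was compiled before freezing, so it still learns when trained directly.

summary(gan)  # expect ~6,264,579 trainable and ~790,913 non-trainable parameters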
Now we can begin training. To recapitulate, this is what the training loop looks like schematically. For each epoch, we do the following:

1. Draw random points in the latent space (random noise).
2. Generate images with generator using this random noise.
3. Mix the generated images with real ones.
4. Train discriminator using these mixed images, with corresponding targets: either "real" (for the real images) or "fake" (for the generated images).
5. Draw new random points in the latent space.
6. Train gan using these random vectors, with targets that all say "these are real images." This updates the weights of the generator (only, because the discriminator is frozen inside gan) to move them toward getting the discriminator to predict "these are real images" for generated images: that is, this trains the generator to fool the discriminator.

Let's implement it.
# Loads CIFAR10 data
cifar10 <- dataset_cifar10()
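As mentioned earlier, we keep only the frog images (class index 6 in CIFAR10). A minimal sketch of that selection follows; the %<-% multi-assignment operator comes with the keras package, and the simple [0, 1] rescaling is one reasonable choice rather than a prescribed preprocessing:

# Unpack the training and test splits
c(c(x_train, y_train), c(x_test, y_test)) %<-% cifar10
# Keep only the frog images (CIFAR10 class 6) and rescale pixel values
x_train <- x_train[as.integer(y_train) == 6,,,]
x_train <- x_train / 255

With the data in hand, one iteration of the six-step loop recapped above could look like the following sketch (the batch_size and the label convention, 1 for "fake" and 0 for "real", are illustrative assumptions; a full loop would also slide through x_train, add noise to the labels, and periodically save results):

batch_size <- 20

# Steps 1-2: sample latent points and decode them into fake images
random_latent_vectors <- matrix(rnorm(batch_size * latent_dim),
                                nrow = batch_size, ncol = latent_dim)
generated_images <- predict(generator, random_latent_vectors)

# Step 3: mix the generated images with real ones
real_images <- x_train[1:batch_size,,,]
rows <- nrow(real_images)
combined_images <- array(0, dim = c(rows * 2, dim(real_images)[-1]))
combined_images[1:rows,,,] <- generated_images
combined_images[(rows + 1):(rows * 2),,,] <- real_images

# Step 4: train the discriminator (1 = "fake", 0 = "real" in this convention)
labels <- rbind(matrix(1, nrow = batch_size, ncol = 1),
                matrix(0, nrow = batch_size, ncol = 1))
d_loss <- discriminator %>% train_on_batch(combined_images, labels)

# Steps 5-6: train gan with targets that all claim "these are real images"
random_latent_vectors <- matrix(rnorm(batch_size * latent_dim),
                                nrow = batch_size, ncol = latent_dim)
misleading_targets <- array(0, dim = c(batch_size, 1))
a_loss <- gan %>% train_on_batch(random_latent_vectors, misleading_targets)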