Today we will be discussing some of the ideas of transfer learning: taking pretrained neural networks and applying them to one’s own data, which is usually much smaller than the dataset the pretrained network was originally trained on.
“A pretrained network is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. If this original dataset is large enough and general enough, then the spatial hierarchy of features learned by the pretrained network can effectively act as a generic model of the visual world, and hence its features can prove useful for many different computer-vision problems, even though these new problems may involve completely different classes than those of the original task.”
“Let’s consider a large convnet trained on the ImageNet dataset (1.4 million labeled images and 1,000 different classes). ImageNet contains many animal classes, including different species of cats and dogs, and you can thus expect to perform well on the dogs-versus-cats classification problem.”
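As a concrete starting point, here is a minimal sketch of loading one of these ImageNet-pretrained networks in Keras and running it on a single image. The file name 'cat.jpg' is a hypothetical local image; everything else uses the standard keras.applications API.

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image

# Full VGG16, including the 1,000-class ImageNet classifier on top.
model = VGG16(weights='imagenet')

# 'cat.jpg' is a hypothetical local file; VGG16 expects 224x224 RGB input.
img = image.load_img('cat.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Print the top-3 predicted ImageNet classes with human-readable labels.
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```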
“Feature extraction consists of taking the convolutional base of a previously trained network, running the new data through it, and training a new classifier on top of the output.”
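Concretely, feature extraction with a frozen convolutional base might look like the following sketch, which puts a small densely connected classifier on top of the VGG16 base for a binary dogs-versus-cats problem. The 150 × 150 input size and the classifier dimensions are illustrative assumptions.

```python
from keras import layers, models
from keras.applications import VGG16

# Convolutional base only (include_top=False drops the original classifier).
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
conv_base.trainable = False  # freeze it so the pretrained features stay intact

# New classifier trained on top of the frozen base's output.
model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # binary output: dog vs. cat
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
```

Because the base is frozen, only the weights of the two Dense layers are updated when the model is trained on the new, small dataset.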
“Why only reuse the convolutional base? Could you reuse the densely connected classifier as well? In general, doing so should be avoided.” The representations learned by the densely connected layers are specific to the set of classes the original network was trained on, and they no longer contain any information about where objects are located in the image, so they transfer poorly to new problems.
The VGG16 model, among others, comes prepackaged with Keras. Here’s the list of image-classification models (all pretrained on the ImageNet dataset) that are available as part of Keras: Xception, Inception V3, ResNet50, VGG16, VGG19, and MobileNet.
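All of these can be imported directly from keras.applications and instantiated with the same constructor arguments:

```python
# Each of these ships with Keras, pretrained on ImageNet.
from keras.applications import (
    Xception,
    InceptionV3,
    ResNet50,
    VGG16,
    VGG19,
    MobileNet,
)
```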
As a complement to feature extraction, you can fine-tune the network: unfreeze certain layers near the top of the convolutional base and jointly train them together with the newly added classifier, as in the sketch below.
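A minimal sketch of that unfreezing step, following the common pattern of thawing only VGG16’s last convolutional block (block5) and recompiling with a very low learning rate. The model architecture matches the feature-extraction sketch above; the learning rate is an illustrative choice.

```python
from keras import layers, models, optimizers
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

# Unfreeze only the layers from block5_conv1 onward; earlier layers encode
# more generic features and stay frozen.
conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    layer.trainable = set_trainable

# Recompile with a very low learning rate so the unfrozen pretrained weights
# are only gently adjusted.
model.compile(optimizer=optimizers.RMSprop(lr=1e-5),
              loss='binary_crossentropy',
              metrics=['acc'])
```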
Here’s what you should take away from the exercises in the past two sections: convnets can deliver good results even on small image datasets; feature extraction lets you reuse the convolutional base of a pretrained network as a generic feature encoder for a new classifier; and fine-tuning adapts the most specialized layers of that base to the new problem for an additional accuracy gain.
“It’s often said that deep-learning models are ‘black boxes’: learning representations that are difficult to extract and present in a human-readable form. Although this is partially true for certain types of deep-learning models, it’s definitely not true for convnets. The representations learned by convnets are highly amenable to visualization, in large part because they’re representations of visual concepts.”
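One such visualization is to look at the intermediate activations a trained convnet produces for a given input. In the sketch below, the saved model file 'cats_and_dogs_small_2.h5' and the image 'cat.jpg' are hypothetical placeholders for a previously trained convnet and a test image.

```python
import matplotlib.pyplot as plt
import numpy as np
from keras import models
from keras.models import load_model
from keras.preprocessing import image

# Hypothetical convnet saved after earlier training.
model = load_model('cats_and_dogs_small_2.h5')

# Hypothetical test image, preprocessed to a (1, 150, 150, 3) batch.
img = image.load_img('cat.jpg', target_size=(150, 150))
img_tensor = np.expand_dims(image.img_to_array(img), axis=0) / 255.

# A second model that maps an input image to the activations of the
# first 8 layers of the original model.
layer_outputs = [layer.output for layer in model.layers[:8]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(img_tensor)

# Visualize one channel of the first layer's activation map.
first_layer_activation = activations[0]
plt.matshow(first_layer_activation[0, :, :, 4], cmap='viridis')
plt.show()
```

Plotting channels from progressively deeper layers in the same way makes the next point visible directly: early layers respond to edges and textures, while deeper layers respond to increasingly abstract patterns.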
“We have just evidenced an important universal characteristic of the representations learned by deep neural networks: the features extracted by a layer become increasingly abstract with the depth of the layer. The activations of higher layers carry less and less information about the specific input being seen, and more and more information about the target (in this case, the class of the image: cat or dog). A deep neural network effectively acts as an information distillation pipeline, with raw data going in (in this case, RGB pictures) and being repeatedly transformed so that irrelevant information is filtered out (for example, the specific visual appearance of the image), and useful information is magnified and refined (for example, the class of the image).”