Convolutional neural networks revolutionized computer vision and are now the dominant type of deep learning model used in the field.
Spotlight Blog: "How to easily Detect Objects with Deep Learning on Raspberry Pi" (Medium)
What is the difference between a densely connected layer and a convolution layer?
“In the MNIST example, the first convolution layer takes a feature map of size (28, 28, 1) and outputs a feature map of size (26, 26, 32): it computes 32 filters over its input. Each of these 32 output channels contains a 26 × 26 grid of values, which is a response map of the filter over the input, indicating the response of that filter pattern at different locations in the input (see figure 5.3). That is what the term feature map means: every dimension in the depth axis is a feature (or filter), and the 2D tensor output[:, :, n] is the 2D spatial map of the response of this filter over the input.”
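As a sketch of the shapes the quote describes (assuming the keras R package), a single convolution layer with 32 filters of size 3 × 3 turns a (28, 28, 1) input into a (26, 26, 32) feature map:

```r
library(keras)

# A 3 x 3 convolution with 32 filters; without padding, each spatial
# dimension shrinks by 2 (28 - 3 + 1 = 26)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1))

summary(model)  # output shape: (None, 26, 26, 32)
```

Here `output[:, :, n]` in the quote corresponds to the 26 × 26 response map of the n-th filter.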
Note that the output width and height may differ from the input width and height. They may differ for two reasons: border effects (which can be countered by padding the input feature map) and the use of strides.
The role of max pooling: to aggressively downsample feature maps, much like strided convolutions.
“The reason to use downsampling is to reduce the number of feature-map coefficients to process, as well as to induce spatial-filter hierarchies by making successive convolution layers look at increasingly large windows (in terms of the fraction of the original input they cover).”
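A minimal sketch of this downsampling (again assuming the keras R package): a 2 × 2 max-pooling layer halves each spatial dimension, so the next convolution's 3 × 3 window covers a larger fraction of the original input.

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%  # -> (26, 26, 32)
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%  # -> (13, 13, 32)
  layer_conv_2d(filters = 64, kernel_size = c(3, 3),
                activation = "relu")             # -> (11, 11, 64)
```

After pooling, a 3 × 3 window in the second convolution "sees" roughly a 6 × 6 region of the original input, which is how the spatial-filter hierarchy arises.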
“Having to train an image-classification model using very little data is a common situation, which you’ll likely encounter in practice if you ever do computer vision in a professional context.”
Classifying images as dogs or cats
Strategy to tackle this problem:
Download the data from kaggle dogs-vs-cats.
Note that you will need to change the original download directory to the location where you have downloaded and unzipped the data.
> original_dataset_dir <- "~/Downloads/kaggle_original_data"
Remember that on Windows the path needs forward slashes or doubled backslashes.
“The depth of the feature maps progressively increases in the network (from 32 to 128), whereas the size of the feature maps decreases (from 148 × 148 to 7 × 7). This is a pattern you’ll see in almost all convnets.”
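One way this pattern plays out (a sketch assuming 150 × 150 RGB inputs, as in the dogs-vs-cats setup; the shape comments track each layer's output):

```r
library(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%  # (148, 148, 32)
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%    # (74, 74, 32)
  layer_conv_2d(filters = 64, kernel_size = c(3, 3),
                activation = "relu") %>%           # (72, 72, 64)
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%    # (36, 36, 64)
  layer_conv_2d(filters = 128, kernel_size = c(3, 3),
                activation = "relu") %>%           # (34, 34, 128)
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%    # (17, 17, 128)
  layer_conv_2d(filters = 128, kernel_size = c(3, 3),
                activation = "relu") %>%           # (15, 15, 128)
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%    # (7, 7, 128)
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")   # dog vs. cat
```

Depth grows 32 → 64 → 128 while the spatial size shrinks 148 × 148 → 7 × 7, exactly the pattern described above.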
Convert the .jpg files into floating-point tensors. Keras provides the following functions for this.
> image_data_generator()
> flow_images_from_directory()
> model %>% fit_generator()
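A sketch of how these three functions fit together (the directory path and the step/epoch counts are illustrative assumptions):

```r
library(keras)

# Rescale pixel values from [0, 255] to [0, 1]
datagen <- image_data_generator(rescale = 1/255)

# Read batches of 150 x 150 images from class subdirectories
# (e.g. train/cats and train/dogs); labels come from the folder names
train_generator <- flow_images_from_directory(
  "~/Downloads/cats_and_dogs_small/train",  # hypothetical path
  generator = datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)

# Train by drawing batches from the generator
history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,  # batches drawn per epoch
  epochs = 30
)
```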
Because we are working with a relatively small dataset, overfitting may be a problem.
We have discussed Dropout and Regularization.
In computer vision problems, data augmentation is commonly used to fight overfitting.
“Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images. The goal is that at training time, your model will never see the exact same picture twice. This helps expose the model to more aspects of the data and generalize better.”
“In Keras, this can be done by configuring a number of random transformations to be performed on the images.”
> image_data_generator()
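For example, one plausible augmentation configuration (the specific ranges here are illustrative, not prescribed):

```r
library(keras)

datagen <- image_data_generator(
  rescale = 1/255,
  rotation_range = 40,       # randomly rotate up to 40 degrees
  width_shift_range = 0.2,   # shift horizontally up to 20% of width
  height_shift_range = 0.2,  # shift vertically up to 20% of height
  shear_range = 0.2,         # random shearing transformations
  zoom_range = 0.2,          # random zoom in/out
  horizontal_flip = TRUE,    # randomly flip images horizontally
  fill_mode = "nearest"      # fill pixels created by the transforms
)
```

Each epoch then sees a freshly transformed version of every training image, so the model never sees exactly the same picture twice.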