I finally found a way to identify objects in a picture. The neural network that does this is called YOLO. Here is a blog post about how to use YOLO in R:
Object detection in just 3 lines of R code using Tiny YOLO
I used the devtools install command given in that blog post, and it worked.
devtools::install_github("bnosac/image", subdir = "image.darknet", build_vignettes = TRUE)
library(image.darknet)
yolo_tiny_voc <- image_darknet_model(
  type = "detect",
  model = "tiny-yolo-voc.cfg",
  weights = system.file(package = "image.darknet", "models", "tiny-yolo-voc.weights"),
  labels = system.file(package = "image.darknet", "include", "darknet", "data", "voc.names")
)
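Before loading the model, it can help to confirm that the weights and label files are where `system.file()` is expected to find them. This is only a minimal sketch, assuming the same file locations as in the call above; `weights_path`, `labels_path` and `voc_labels` are names introduced here for illustration.

```r
# Look up the bundled files. system.file() returns "" when a file (or the
# package itself) cannot be found, so nzchar() works as an existence check.
weights_path <- system.file("models", "tiny-yolo-voc.weights",
                            package = "image.darknet")
labels_path  <- system.file("include", "darknet", "data", "voc.names",
                            package = "image.darknet")

if (nzchar(weights_path) && nzchar(labels_path)) {
  # Assumption: voc.names is a plain text file with one class label per line.
  voc_labels <- readLines(labels_path, warn = FALSE)
  cat("Found", length(voc_labels), "class labels\n")
} else {
  message("image.darknet files not found; is the package installed?")
}
```

If either path comes back empty, the install step probably did not complete.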
x <- image_darknet_detect(
  file = "/home/esuess/classes/2018-2019/02 - Spring 2019/Stat654/Final/google_car.png",
  object = yolo_tiny_voc,
  threshold = 0.19
)
layer     filters    size           input                   output
    0 conv     16  3 x 3 / 1   416 x 416 x    3   ->   416 x 416 x   16
    1 max          2 x 2 / 2   416 x 416 x   16   ->   208 x 208 x   16
    2 conv     32  3 x 3 / 1   208 x 208 x   16   ->   208 x 208 x   32
    3 max          2 x 2 / 2   208 x 208 x   32   ->   104 x 104 x   32
    4 conv     64  3 x 3 / 1   104 x 104 x   32   ->   104 x 104 x   64
    5 max          2 x 2 / 2   104 x 104 x   64   ->    52 x  52 x   64
    6 conv    128  3 x 3 / 1    52 x  52 x   64   ->    52 x  52 x  128
    7 max          2 x 2 / 2    52 x  52 x  128   ->    26 x  26 x  128
    8 conv    256  3 x 3 / 1    26 x  26 x  128   ->    26 x  26 x  256
    9 max          2 x 2 / 2    26 x  26 x  256   ->    13 x  13 x  256
   10 conv    512  3 x 3 / 1    13 x  13 x  256   ->    13 x  13 x  512
   11 max          2 x 2 / 1    13 x  13 x  512   ->    13 x  13 x  512
   12 conv   1024  3 x 3 / 1    13 x  13 x  512   ->    13 x  13 x 1024
   13 conv   1024  3 x 3 / 1    13 x  13 x 1024   ->    13 x  13 x 1024
   14 conv    125  1 x 1 / 1    13 x  13 x 1024   ->    13 x  13 x  125
   15 detection
Loading weights from /home/esuess/R/x86_64-pc-linux-gnu-library/3.4/image.darknet/models/tiny-yolo-voc.weights...Done!
/home/esuess/classes/2018-2019/02 - Spring 2019/Stat654/Final/google_car.png: Predicted in 2.440059 seconds.
Boxes: 845 of which 4 above the threshold.
person: 24%
car: 96%
person: 40%
bicycle: 50%
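The numbers in the layer table and the "Boxes: 845" line can be checked with a little arithmetic. The sketch below assumes, as is usual for tiny-yolo-voc but not stated in the output itself, that the network predicts 5 anchor boxes per grid cell over the 20 Pascal VOC classes. All variable names are made up for illustration.

```r
input_size <- 416   # input is 416 x 416 (layer 0 in the table)
n_stride2  <- 5     # max-pool layers 1, 3, 5, 7 and 9 use stride 2

# Each stride-2 max-pool halves the width and height.
grid_size <- input_size / 2^n_stride2

# Assumptions: 5 anchor boxes per cell and 20 VOC classes.
n_anchors <- 5
n_classes <- 20

# One candidate box per anchor per grid cell.
n_boxes <- grid_size^2 * n_anchors

# Each box carries 4 coordinates, 1 objectness score and the class scores.
n_filters <- n_anchors * (n_classes + 5)

cat("grid size:        ", grid_size, "x", grid_size, "\n")
cat("candidate boxes:  ", n_boxes, "\n")
cat("layer 14 filters: ", n_filters, "\n")
```

Under those assumptions the grid is 13 x 13, which gives the 845 candidate boxes reported in the output and the 125 filters of layer 14. The threshold then decides how many of those candidates are kept.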
x <- image_darknet_detect(
  file = "/home/esuess/classes/2018-2019/02 - Spring 2019/Stat654/Final/busax.jpg",
  object = yolo_tiny_voc,
  threshold = 0.25
)
layer     filters    size           input                   output
    0 conv     16  3 x 3 / 1   416 x 416 x    3   ->   416 x 416 x   16
    1 max          2 x 2 / 2   416 x 416 x   16   ->   208 x 208 x   16
    2 conv     32  3 x 3 / 1   208 x 208 x   16   ->   208 x 208 x   32
    3 max          2 x 2 / 2   208 x 208 x   32   ->   104 x 104 x   32
    4 conv     64  3 x 3 / 1   104 x 104 x   32   ->   104 x 104 x   64
    5 max          2 x 2 / 2   104 x 104 x   64   ->    52 x  52 x   64
    6 conv    128  3 x 3 / 1    52 x  52 x   64   ->    52 x  52 x  128
    7 max          2 x 2 / 2    52 x  52 x  128   ->    26 x  26 x  128
    8 conv    256  3 x 3 / 1    26 x  26 x  128   ->    26 x  26 x  256
    9 max          2 x 2 / 2    26 x  26 x  256   ->    13 x  13 x  256
   10 conv    512  3 x 3 / 1    13 x  13 x  256   ->    13 x  13 x  512
   11 max          2 x 2 / 1    13 x  13 x  512   ->    13 x  13 x  512
   12 conv   1024  3 x 3 / 1    13 x  13 x  512   ->    13 x  13 x 1024
   13 conv   1024  3 x 3 / 1    13 x  13 x 1024   ->    13 x  13 x 1024
   14 conv    125  1 x 1 / 1    13 x  13 x 1024   ->    13 x  13 x  125
   15 detection
Loading weights from /home/esuess/R/x86_64-pc-linux-gnu-library/3.4/image.darknet/models/tiny-yolo-voc.weights...Done!
/home/esuess/classes/2018-2019/02 - Spring 2019/Stat654/Final/busax.jpg: Predicted in 2.479083 seconds.
Boxes: 845 of which 4 above the threshold.
train: 73%
bus: 28%
person: 86%
person: 55%
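The detections are printed as plain "label: percent" lines, so one way to work with them further is to paste them into a character vector and parse them. This is only a sketch in base R: the lines are copied by hand from the output above, and the names `printed`, `detections` and `strong` and the 0.5 cutoff are illustrative choices, not part of image.darknet.

```r
# Detection lines copied from the console output of the second image.
printed <- c("train: 73%", "bus: 28%", "person: 86%", "person: 55%")

# Split each line into the label and the percentage.
parts <- strsplit(printed, ": ", fixed = TRUE)

detections <- data.frame(
  label = vapply(parts, `[`, character(1), 1),
  confidence = as.numeric(
    sub("%", "", vapply(parts, `[`, character(1), 2), fixed = TRUE)
  ) / 100,
  stringsAsFactors = FALSE
)

# Keep only detections at or above a stricter, illustrative cutoff.
strong <- detections[detections$confidence >= 0.5, ]

print(strong)
print(table(detections$label))
```

With this cutoff the bus at 28% drops out, while the train and both person detections remain.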