---
title: "Getting Started"
author: "Prof. Eric A. Suess"
format:
  revealjs:
    self-contained: true
---

## Getting Started

In Chapters 2, 3, and 4 the *network architecture*, *compilation*, and *fitting* steps are discussed. The first examples are feedforward neural networks.

In the network architecture step, `layer_dense()` and the *activation function* are used. In the compilation step, the *loss*, the *optimizer*, and the *metric* are specified.

## Tensors

Tensors are multidimensional arrays:

- 1D
- 2D
- 3D
- 4D

## Examples of Tensors

- Vector data - **2D tensors** of shape (samples, features)
- Timeseries data or sequence data - **3D tensors** of shape (samples, timesteps, features)
- Images - **4D tensors** of shape (samples, height, width, channels) or (samples, channels, height, width)
- Video - **5D tensors** of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)

## Array reshaping

R fills arrays **column by column** (column-major order), while TensorFlow reads data **row by row** (row-major order), so `array_reshape()` is used to prepare data for use with keras/tensorflow.

## A geometric interpretation of deep learning

Be sure to read this section. The description of what neural networks try to do is very interesting. See Figure 5.9.

## Gradients

Derivatives in calculus measure the rate of change of a function. In multidimensional space, the vector of partial derivatives is called the gradient. Gradients are used to minimize/maximize functions.

## Stochastic Gradient Descent

Five-step process:

1. Draw a random batch of training samples x and corresponding targets y.
2. Run the network on x to obtain predictions y_pred.
3. Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y.
4. Compute the gradient of the loss with regard to the network's parameters (a backward pass).
5. Move the parameters a little in the opposite direction from the gradient, for example `W = W - (step * gradient)`, thus reducing the loss on the batch a bit.
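## SGD sketch in R

The five steps above can be sketched in plain base R, using mini-batch gradient descent on a simple linear model (all data and variable names here are made up for illustration; keras does this for you internally):

```r
# Toy data: y = X %*% w_true + noise
set.seed(42)
n <- 200; p <- 3
X <- matrix(rnorm(n * p), nrow = n)
w_true <- c(2, -1, 0.5)
y <- X %*% w_true + rnorm(n, sd = 0.1)

w <- rep(0, p)        # parameters to learn
step <- 0.1           # learning rate
batch_size <- 32

for (epoch in 1:100) {
  idx <- sample(n, batch_size)                   # 1. draw a random batch
  y_pred <- X[idx, ] %*% w                       # 2. run the model on x
  err <- y_pred - y[idx]                         # 3. loss is mean(err^2)
  grad <- 2 * t(X[idx, ]) %*% err / batch_size   # 4. gradient of the loss
  w <- w - step * grad                           # 5. W = W - (step * gradient)
}
```

After training, `w` is close to `w_true`; a real network repeats exactly this loop, with backpropagation supplying the gradient in step 4.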
## Backpropagation Algorithm

A neural network is composed of multiple tensor operations. The backpropagation algorithm uses the **Chain Rule** to compute the gradient of the loss with respect to each of the network's parameters.

## Examples in Chapter 4

Try to get the examples from Chapter 4 to run.

1. Binary classification, movie reviews: The data preparation uses the Bag-of-Words model. All words in the reviews form the dictionary, or Bag-of-Words, and each word is given a number value.
2. Multi-class classification, newswires.
3. Regression, predicting house prices.
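## Bag-of-Words encoding sketch

The Bag-of-Words preparation for the movie-review example can be sketched in base R: each review is a sequence of word-index numbers, and it is turned into a binary (multi-hot) vector as long as the dictionary. The toy reviews and the tiny `dimension = 10` dictionary below are illustrative only:

```r
# Turn a list of integer sequences into a binary matrix:
# one row per review, one column per dictionary word.
vectorize_sequences <- function(sequences, dimension = 10) {
  results <- matrix(0, nrow = length(sequences), ncol = dimension)
  for (i in seq_along(sequences))
    results[i, sequences[[i]]] <- 1   # put a 1 at every word index present
  results
}

reviews <- list(c(1, 3, 5), c(2, 3))  # two toy "reviews" of word indices
x <- vectorize_sequences(reviews)
```

The resulting matrix of 0s and 1s is the 2D tensor of shape (samples, features) that feeds the first `layer_dense()`.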