--- title: "Gradient Descent and Back Propogation" author: "Prof. Eric A. Suess" format: revealjs --- ## What is Gradient Descent? Each layer in the neural network computes > output = relu(dot(W, input) + b) The updating of *W* and *b* is the learning of the network. ## Training loop 1. Draw a batch of training samples x and corresponding targets y. 2. Run the network on x (a step called the *forward pass*) to obtain predictions y_pred. 3. Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y. 4. Update all weights of the network in a way that slightly reduces the loss on this batch (a step called the *backward pass*) ## Training loop > y_pred = dot(W, x) > loss_value = loss(y_pred, y) Define > loss_value = f(W) take derivatives or what is called *computing the gradient*. Update in the opposite direction of the gradient. ## What is Stochastic Gradient Descent (SGD)? Training loop based on random samples. - Steps ## What is Stochastic Gradient Descent (SGD)? 1. Draw a batch of training samples x and corresponding targets y. (mini-batches) 2. Run the network on x to obtain predictions y_pred. 3. Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y. 4. Compute the gradient of the loss with regard to the network’s parameters (a backward pass). 5. Move the parameters a little in the opposite direction from the gradient, for example, $W = W - (step * gradient)$. ## What is Stochastic Gradient Descent (SGD)? - Step 5: Thus reducing the loss on the batch a bit. ## What is Back Propogation? If the neural network had three layers $$f(W1, W2, W3) = a(W1, b(W2, c(W3)))$$ then we can use the Chain Rule from Calculus to take derivatives. ## 3Blue1Brown The [3Blue1Brown](https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw) [Neural Networks](https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi) series is very nice for visualizing SGD and Back Propagation. ## Optimizer The optimizer specifies the exact way in which the gradient of the loss will be used to update parameters.