--- title: "Artificial Neural Networks" author: "Prof. Eric A. Suess" date: "March 3, 2021" output: beamer_presentation: default ioslides_presentation: default --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE) ``` ## Introduction Today we will introduce **Artificial Neural Networks** (ANN). Get to know the **terms** involved in thinking about ANNs. ## Introduction The author begins the introduction with **magic**, discussion of the idea of a **black box**, and ends with "there is no need to be intimidated!" ## black box Neural Networks are considered a **black box** process. ANNs are based on complex mathematical systems. But not a **zero node NN** is an alternative representation of the simple linear regression model. $y = mx + b$ $y(x) = w_1 x + w_2 1$ $y(x) = f(w_1 x + w_2 1)$ ## artificial neurons - ANNs are **versatile learners** that can be applied to nearly any learning task: **classification**, **numeric prediction**, and even **unsupervised pattern recognition**. - ANNs are best applied to problems where the **input data** and the **output data** are **well-understood** or at least fairly simple, yet the process that relates the input to the output is **extremely complex**. ## artificial neurons ANNs are designed as **conceptual models** of **human brain activity**. - incoming signals received by a cell's **dendrites** - signal transmitted through the **axon** - **synapse** - see page 207/221 and 208/222. - **activation function** ## artificial neurons An artificial neuron with $n$ input **dendrites**, with **weights** $w$ on the inputs $x$, the **activation function** $f$, and the resulting signal $y$ is the output **axon**. $y(x) = f\left(\sum_{i=1}^n w_i x_i \right)$ ## Activation functions In a biological sense, the **activation function** could be imagined as a process that involves summing the total input signal and determining whether it meets the firing threshold. If so, the neuron passes the signal on. Otherwise, it does nothing. ## Activation functions - **threshold** activation function - **unit step** activation function - **sigmoid** activation function - **differentiable** - **linear** activation function - **Gaussian** activation function - **Radial Basis Function** (RBF) network - **relu** activation function ## Activation functions For many of the activation functions, the range of input values that affect the output signal is relatively **narrow**. The **compression** of the signal results in a saturated signal at the high and low ends of very dynamic inputs. When this occurs, the activation function is called a **squashing function**. The solution to this is to use **standardization/normalization** of the features. ## Network topology The capacity of a neural network to learn is rooted in its **topology**, or the patterns and structures of interconnected neurons. - number of **layers** (see page 212/224) - can the **network travel backward**? (see page 213/227) - **number of nodes** (see page 214/228) ## Number of layers A set of neurons called **input nodes** receive unprocessed signals directly from the input data. Each input node is responsible for processing a single feature in the dataset. The feature's value is transformed by the node's activation function. The signals resulting from the input nodes are received by the **output node**, which uses its own activation function to generate a final prediction. 
## Network topology

The capacity of a neural network to learn is rooted in its **topology**, or the patterns and structures of interconnected neurons.

- number of **layers** (see page 212/224)
- can information **travel backward** through the network? (see page 213/227)
- **number of nodes** (see page 214/228)

## Number of layers

A set of neurons called **input nodes** receives unprocessed signals directly from the input data. Each input node is responsible for processing a single feature in the dataset. The feature's value is transformed by the node's activation function.

The signals resulting from the input nodes are received by the **output node**, which uses its own activation function to generate a final prediction.

- **single-layer** network
- **multilayer** network
- **hidden layers** / **deep learning**

## Direction of information travel

- **feedforward** networks
  - commonly used
- **feedback** networks
  - theoretical, not used in practice

When people talk about applying ANNs, they are most likely talking about using the **multilayer perceptron** (MLP) topology.

## Number of nodes in each layer

The number of **input nodes** is *predetermined* by the number of features in the input data.

The number of **output nodes** is *predetermined* by the number of outcomes to be modeled or the number of class levels in the outcome.

The number of **hidden nodes** is *left to the user to decide* prior to training the model.

**More complex** network topologies with a **greater number** of network connections allow the learning of more complex problems, but they run the risk of **overfitting**.

## Number of nodes in each layer

A **best practice** is to use the **fewest nodes** that result in **adequate performance** on a validation dataset.

It has been proven that a neural network with **at least one hidden layer** of sufficiently many neurons is a **universal function approximator**.

## Training ANNs

**Learning by experience**: the network's connection weights reflect the patterns observed over time.

Training ANNs by adjusting connection weights is very computationally intensive. An efficient method of training an ANN was discovered, called **backpropagation**.

## weights

How does the algorithm determine how much (or whether) a weight should be changed?

It uses **gradient descent**, which relies on the **derivative** of each activation function.

## Modeling the strength of concrete

As an example of the use of ANNs, the author presents an analysis of the **concrete** dataset.

## Software: R neuralnet package

- A nice way to get started learning about ANNs in R is to read the paper in [The R Journal](https://journal.r-project.org/), [neuralnet: Training of Neural Networks](https://journal.r-project.org/archive/2010/RJ-2010-006/index.html).
- I have made an R notebook of the code presented in the paper.
- [neuralnet.Rmd](http://rpubs.com/esuess/neuralnet01)

## Software: R h2o package

- A modern machine learning software company is [h2o.ai](https://www.h2o.ai/).
- There is an R package to install and use the software.

## Software: R tensorflow and keras packages

- Google's Deep Learning software is now available from within R by installing the tensorflow package.
- A very commonly used frontend for tensorflow is keras. There is an R package for keras also.
- [R Tensorflow](https://tensorflow.rstudio.com/)
- [R Keras](https://tensorflow.rstudio.com/keras/)
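## Software: a neuralnet example

Here is a minimal sketch of fitting a small network with the neuralnet package. `infert` is an example dataset shipped with R, and the single hidden layer with 3 nodes is an arbitrary choice for illustration.

```{r neuralnet-sketch, echo=TRUE, eval=FALSE}
library(neuralnet)

# Fit a network with one hidden layer of 3 nodes (arbitrary choice);
# linear.output = FALSE applies the activation function at the output.
nn <- neuralnet(case ~ age + parity + induced + spontaneous,
                data = infert, hidden = 3, linear.output = FALSE)

plot(nn)  # draw the fitted topology and connection weights

# Predicted values for the first few observations
covs <- infert[, c("age", "parity", "induced", "spontaneous")]
head(compute(nn, covs)$net.result)
```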