2024-02-26
Today we will introduce Artificial Neural Networks (ANNs).
Get to know the terms involved in thinking about ANNs.
The author begins the introduction with talk of magic, moves to a discussion of the idea of a black box, and ends with “there is no need to be intimidated!”
Neural networks are considered a black-box process.
ANNs are based on complex mathematical systems.
But note that an ANN with no hidden nodes is an alternative representation of the simple linear regression model.
\(y = mx + b\)
Rewriting the slope \(m\) and intercept \(b\) as weights \(w_1\) and \(w_2\) on the input \(x\) and the constant \(1\):
\(y(x) = w_1 x + w_2 \cdot 1\)
Passing the weighted sum through an activation function \(f\) turns this into a single neuron:
\(y(x) = f(w_1 x + w_2 \cdot 1)\)
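A minimal sketch (assuming NumPy; the toy data is invented for illustration) of this equivalence: a neuron with no hidden layer and an identity activation is just linear regression, so the fitted weights \(w_1\) and \(w_2\) recover the slope and intercept.

```python
import numpy as np

# Toy data generated from y = 2x + 1 plus noise (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + rng.normal(0, 0.1, size=100)

# Design matrix with a constant-1 "bias input", matching y(x) = w1*x + w2*1.
X = np.column_stack([x, np.ones_like(x)])

# Least-squares fit of the two weights; w[0] ~ m, w[1] ~ b.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"w1 (slope) = {w[0]:.3f}, w2 (intercept) = {w[1]:.3f}")
```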
ANNs are versatile learners that can be applied to nearly any learning task: classification, numeric prediction, and even unsupervised pattern recognition.
ANNs are best applied to problems where the input data and the output data are well-understood or at least fairly simple, yet the process that relates the input to the output is extremely complex.
ANNs are designed as conceptual models of human brain activity.
An artificial neuron has \(n\) input dendrites, weights \(w_i\) on the inputs \(x_i\), and an activation function \(f\); the resulting signal \(y\) is sent down the output axon.
\(y(x) = f\left(\sum_{i=1}^n w_i x_i \right)\)
In a biological sense, the activation function could be imagined as a process that involves summing the total input signal and determining whether it meets the firing threshold.
If so, the neuron passes the signal on. Otherwise, it does nothing.
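A minimal sketch of the formula above, assuming a unit-step threshold as the activation function \(f\) (weights and inputs are made-up values): the neuron sums its weighted inputs and fires (outputs 1) only when the total meets the threshold.

```python
import numpy as np

def neuron(x, w, f):
    """y(x) = f(sum_i w_i * x_i): weighted sum passed through activation f."""
    return f(np.dot(w, x))

# Step activation: fire (output 1) when the summed signal reaches threshold 0.
step = lambda z: 1.0 if z >= 0 else 0.0

x = np.array([0.5, -1.0, 2.0])   # input signals (dendrites)
w = np.array([0.8, 0.2, 0.4])    # connection weights (assumed values)
print(neuron(x, w, step))        # output signal (axon): 1.0, neuron fires
```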
For many activation functions, the range of input values that affect the output signal is relatively narrow.
This compression of the signal means that very large or very small inputs saturate the output at its high or low end.
When this occurs, the activation function is called a squashing function.
The solution is to standardize or normalize the features so that the inputs fall within the activation function's sensitive range.
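A short demonstration of squashing (NumPy assumed, input values invented): the logistic sigmoid is effectively flat far from zero, so large raw feature values all map to roughly 0 or 1, while standardized values stay in the sensitive range.

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Large raw inputs land on the flat tails of the sigmoid: outputs are pinned
# near 0 or 1, and differences between them are squashed away.
raw = np.array([-50.0, -10.0, 10.0, 50.0])
print(sigmoid(raw))              # ~[0, 0.00005, 0.99995, 1]

# Standardizing (zero mean, unit variance) keeps inputs in the sensitive range.
z = (raw - raw.mean()) / raw.std()
print(sigmoid(z))                # outputs now vary meaningfully
```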
The capacity of a neural network to learn is rooted in its topology, or the patterns and structures of interconnected neurons.
A set of neurons called input nodes receive unprocessed signals directly from the input data. Each input node is responsible for processing a single feature in the dataset.
The feature’s value is transformed by the node’s activation function. The signals resulting from the input nodes are received by the output node, which uses its own activation function to generate a final prediction.
When people talk about applying ANNs, they are most likely talking about the multilayer perceptron (MLP) topology.
The number of input nodes is predetermined by the number of features in the input data.
The number of output nodes is predetermined by the number of outcomes to be modeled or the number of class levels in the outcome.
The number of hidden nodes is left to the user to decide prior to training the model.
More complex network topologies with a greater number of network connections allow the learning of more complex problems, but run the risk of overfitting.
A best practice is to use the fewest nodes that result in adequate performance on a validation dataset.
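A minimal NumPy sketch of an MLP forward pass under assumed node counts: 3 input nodes (one per feature), a user-chosen hidden layer of 4 nodes, and 1 output node. The weights here are random stand-ins for what training would learn.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

n_inputs, n_hidden, n_outputs = 3, 4, 1   # hidden size chosen by the user

# Randomly initialized connection weights (adjusted during training).
W1 = rng.normal(size=(n_hidden, n_inputs))
W2 = rng.normal(size=(n_outputs, n_hidden))

def forward(x):
    h = sigmoid(W1 @ x)       # hidden layer activations
    return sigmoid(W2 @ h)    # output node's prediction

print(forward(np.array([0.2, -0.5, 1.0])))
```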
It has been proven that a neural network with at least one hidden layer containing sufficiently many neurons is a universal function approximator.
Learning by experience.
The network’s connection weights reflect the patterns observed over time.
Training ANNs by adjusting connection weights is very computationally intensive.
An efficient method of training an ANN was discovered, called backpropagation: errors are propagated backward through the network to adjust the connection weights.
How does the algorithm determine how much (or whether) a weight should be changed? It uses gradient descent, which in turn requires the derivative of each activation function.
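A minimal sketch of the idea on a single neuron (toy data invented, not the full multilayer backpropagation): gradient descent nudges each weight opposite the error gradient, and the chain rule pulls in the activation's derivative, here the sigmoid's \(\sigma(z)(1-\sigma(z))\).

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
t = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy targets (assumed)

w = np.zeros(2)
lr = 0.5
for _ in range(200):
    y = sigmoid(X @ w)
    # Gradient of mean squared error: chain rule brings in sigmoid'(z) = y(1-y).
    grad = X.T @ ((y - t) * y * (1 - y)) / len(t)
    w -= lr * grad                           # step downhill along the gradient
print(w)
```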
As an example of the use of ANNs, the author analyzes the concrete dataset, modeling the compressive strength of concrete from its ingredients.
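A sketch of one way such an analysis could look, assuming scikit-learn and the concrete data saved as concrete.csv with a strength target column (file and column names are hypothetical, not the author's code).

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical file and column names; adjust to the actual concrete dataset.
df = pd.read_csv("concrete.csv")
X, y = df.drop(columns="strength"), df["strength"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize the features (to avoid saturating the squashing function),
# then fit a small MLP with a single hidden layer of 5 nodes.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```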