Perceptrons and Neurons: A Simple NN Model
Learning objectives
- Explain the biological analogy for artificial neurons
- Compute the output of a perceptron given weights and inputs
- Apply the perceptron learning rule
- Identify the XOR limitation and the need for multi-layer networks
- Describe common activation functions: sigmoid, ReLU, tanh
- Understand forward propagation through a multi-layer perceptron
From Biology to Artificial Neurons
The human brain contains roughly 86 billion neurons, each receiving signals through dendrites, processing them in the cell body, and transmitting output through the axon. An artificial neuron mimics this: it receives numerical inputs, computes a weighted sum, applies an activation function, and produces an output.
1. The Perceptron
Model
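A perceptron takes inputs $x_1, \dots, x_n$, multiplies each by a weight $w_i$, adds a bias $b$, and applies a step activation function:
$$\hat{y} = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i + b \ge 0 \\ 0 & \text{otherwise} \end{cases}$$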
The perceptron is a binary classifier: it divides the input space with a hyperplane (a line in 2D) and assigns one class to each side.
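As a minimal sketch of the model above, the following Python function computes a perceptron's output. It assumes the step function returns 1 when the weighted sum plus bias is greater than or equal to zero, and 0 otherwise. The function name `perceptron_output` and the AND weights are illustrative choices, not part of the original text.

```python
def perceptron_output(inputs, weights, bias):
    """Return 1 if the weighted sum plus bias is >= 0, otherwise 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum >= 0 else 0


# Hypothetical hand-picked parameters that make the perceptron act like logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", perceptron_output([x1, x2], and_weights, and_bias))
```

Only the input (1, 1) gives a non-negative weighted sum (1 + 1 - 1.5 = 0.5), so it is the only input classified as 1.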
Perceptron Learning Rule
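Given a training example $(\mathbf{x}, y)$ where $y$ is the true label and $\hat{y}$ is the perceptron's prediction, the weights are updated as:
$$w_i \leftarrow w_i + \eta \, (y - \hat{y}) \, x_i, \qquad b \leftarrow b + \eta \, (y - \hat{y})$$
where $\eta$ is the learning rate (typically 0.01 to 1). If the prediction is correct, $y - \hat{y} = 0$ and no update occurs. If wrong, the weights shift in the direction that would make the prediction closer to correct.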
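The learning rule can be sketched in a few lines of Python. This is an illustrative implementation, not a reference one. It assumes a step activation that outputs 1 when the weighted sum plus bias is at least zero, weights initialised to zero, and the update "add learning rate times (true label minus prediction) times input" for each weight, with the same update (without the input factor) for the bias. The names `predict` and `train_perceptron` are hypothetical.

```python
def predict(inputs, weights, bias):
    """Step-activation perceptron: 1 if w.x + b >= 0, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0


def train_perceptron(examples, learning_rate=1.0, max_epochs=100):
    """Apply the perceptron learning rule until an epoch has no mistakes."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs  # assumption: start from zero weights
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, label in examples:
            error = label - predict(inputs, weights, bias)  # 0 when correct
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # every example classified correctly
            break
    return weights, bias


# Logical AND is linearly separable, so the rule is expected to converge.
and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_examples)
for inputs, label in and_examples:
    print(inputs, "target", label, "predicted", predict(inputs, weights, bias))
```

A learning rate of 1.0 is used here only to keep the arithmetic exact in this toy example; it sits at the upper end of the typical range.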
Geometric Interpretation
The perceptron defines a decision boundary: the set of points where $\mathbf{w} \cdot \mathbf{x} + b = 0$. In 2D with inputs $x_1, x_2$, this is the line $w_1 x_1 + w_2 x_2 + b = 0$. Data on one side is classified as 1, and on the other as 0.
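As a worked example with illustrative values (not taken from the original text), take weights $w_1 = w_2 = 1$ and bias $b = -1.5$, and assume $w_2 \neq 0$ so the boundary can be written as a function of $x_1$:

$$
\begin{aligned}
w_1 x_1 + w_2 x_2 + b = 0 \;&\Longrightarrow\; x_2 = -\frac{w_1}{w_2}\, x_1 - \frac{b}{w_2} \\
x_1 + x_2 - 1.5 = 0 \;&\Longrightarrow\; x_2 = -x_1 + 1.5
\end{aligned}
$$

The point $(1, 1)$ gives $1 + 1 - 1.5 = 0.5 \ge 0$ and falls on the class-1 side, while $(0, 1)$ gives $-0.5 < 0$ and falls on the class-0 side. The weight vector sets the orientation of the line and the bias shifts it away from the origin.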
2. Limitations: The XOR Problem
Linear Separability
A perceptron can only classify data that is linearly separable—data where a single straight line (or hyperplane) can separate the two classes. The logical AND and OR functions are linearly separable, but the XOR function is not:
| $x_1$ | $x_2$ | AND | OR | XOR |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 |
No single line can separate the 1s from the 0s in XOR. This limitation, highlighted by Minsky and Papert (1969), led to reduced interest in neural networks for over a decade.
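A short argument shows why no such line exists. Assume a perceptron with weights $w_1, w_2$ and bias $b$ that outputs 1 exactly when $w_1 x_1 + w_2 x_2 + b \ge 0$ (the threshold convention is an assumption; the argument works the same way with a strict inequality). Reproducing the four XOR rows would require:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b < 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b \ge 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b \ge 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0
\end{aligned}
$$

Adding the second and third conditions gives $w_1 + w_2 + 2b \ge 0$, so $w_1 + w_2 + b \ge -b > 0$, which contradicts the fourth condition. No choice of $w_1, w_2, b$ satisfies all four.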
3. Multi-Layer Perceptron (MLP)
Architecture
The solution to the XOR problem (and other nonlinear problems) is to stack multiple layers of neurons:
- Input layer: receives the raw features (e.g., well-log values)
- Hidden layer(s): intermediate layers that learn nonlinear representations
- Output layer: produces the final prediction
Each neuron in a hidden layer computes $z = \mathbf{w} \cdot \mathbf{x} + b$ followed by a nonlinear activation function $f(z)$.
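To make this concrete, here is a small sketch of a two-layer network that computes XOR. The weights are hand-picked for illustration rather than learned, and a step function stands in for the nonlinear activation. One hidden neuron behaves like OR, the other like AND, and the output neuron fires when OR is true but AND is not. All function names are hypothetical.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum plus bias, followed by the step activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    # Hidden layer: two neurons with hand-picked (not learned) parameters.
    h_or = neuron([x1, x2], [1.0, 1.0], -0.5)   # fires if at least one input is 1
    h_and = neuron([x1, x2], [1.0, 1.0], -1.5)  # fires only if both inputs are 1
    # Output layer: fires when OR is on and AND is off.
    return neuron([h_or, h_and], [1.0, -1.0], -0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", xor_network(x1, x2))
```

The hidden layer maps the four input points into a new space where the two classes can be split by a single line, which is what the output neuron then does.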
4. Activation Functions
Sigmoid
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Output range: $(0, 1)$. Smooth and differentiable. Drawback: gradients become very small for large $|z|$ (vanishing gradient problem).
ReLU (Rectified Linear Unit)
$$\mathrm{ReLU}(z) = \max(0, z)$$
Output range: $[0, \infty)$. Computationally efficient and avoids vanishing gradients for positive values. Most widely used in modern deep learning.
Tanh (Hyperbolic Tangent)
$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
Output range: $(-1, 1)$. Zero-centred, which often helps training converge faster than sigmoid. Still suffers from vanishing gradients at extremes.
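The three activation functions and their derivatives can be sketched with Python's standard `math` module. The derivative helpers (`sigmoid_grad`, `relu_grad`, `tanh_grad`) are illustrative names added here to show the vanishing-gradient behaviour numerically. Setting the ReLU derivative to 0 at exactly zero is a common convention, assumed here.

```python
import math


def sigmoid(z):
    # Note: not hardened against overflow for very large negative z.
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # convention: gradient 0 at z = 0


def tanh(z):
    return math.tanh(z)


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2  # peaks at 1.0 when z = 0


for z in (-10.0, 0.0, 10.0):
    print(f"z={z:6.1f}  sigmoid'={sigmoid_grad(z):.2e}  "
          f"relu'={relu_grad(z):.1f}  tanh'={tanh_grad(z):.2e}")
```

At `z = 10` the sigmoid and tanh derivatives are close to zero, while the ReLU derivative stays at 1 for any positive input.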
5. Forward Propagation
The Computation Flow
For a network with one hidden layer:
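- Hidden layer: $\mathbf{z}^{(1)} = W^{(1)} \mathbf{x} + \mathbf{b}^{(1)}$, then $\mathbf{a}^{(1)} = f(\mathbf{z}^{(1)})$
- Output layer: $\mathbf{z}^{(2)} = W^{(2)} \mathbf{a}^{(1)} + \mathbf{b}^{(2)}$, then $\hat{\mathbf{y}} = f(\mathbf{z}^{(2)})$
The superscript $(l)$ denotes the layer number.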
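The two steps above can be sketched in plain Python for a network with two inputs, two hidden neurons, and one output. The parameter values are hypothetical and chosen only so the arithmetic is easy to follow. Using ReLU in the hidden layer and sigmoid at the output is an assumption made for this example; the text does not prescribe which activations to use.

```python
import math


def affine(weight_rows, inputs, biases):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weight_rows, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def forward(x, W1, b1, W2, b2):
    z1 = affine(W1, x, b1)   # hidden pre-activation
    a1 = relu(z1)            # hidden activation
    z2 = affine(W2, a1, b2)  # output pre-activation
    a2 = sigmoid(z2)         # network output
    return z1, a1, z2, a2


# Hypothetical parameters, for illustration only.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -0.5]
W2 = [[1.0, -2.0]]
b2 = [0.0]

z1, a1, z2, a2 = forward([1.0, 2.0], W1, b1, W2, b2)
print("z1 =", z1, " a1 =", a1, " z2 =", z2, " output =", a2)
```

For the input `[1.0, 2.0]`, the hidden pre-activations are `[-1.0, 1.0]`, ReLU zeroes the first one, and the output neuron applies the sigmoid to `-2.0`.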
6. Backpropagation (Intuition)
How Does the Network Learn?
After a forward pass, we compute the loss (e.g., cross-entropy or MSE). Backpropagation uses the chain rule of calculus to compute how much each weight contributed to the error, then updates all weights simultaneously using gradient descent. The key insight: errors at the output layer propagate backward through the network, allowing each layer to adjust its weights to reduce the overall error.
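To illustrate the chain rule at work, the sketch below uses a deliberately tiny network: one input, one tanh hidden neuron, and one linear output neuron, trained on a single example with a halved squared-error loss. These choices, the parameter values, and all function names are assumptions made for illustration. The analytic gradients are compared against finite differences, and one gradient-descent step is then applied.

```python
import math


def forward(params, x):
    w1, b1, w2, b2 = params
    a1 = math.tanh(w1 * x + b1)  # hidden activation
    y_hat = w2 * a1 + b2         # linear output
    return a1, y_hat


def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(params, x, y):
    """Gradients of the loss w.r.t. [w1, b1, w2, b2] via the chain rule."""
    w1, b1, w2, b2 = params
    a1, y_hat = forward(params, x)
    delta2 = y_hat - y                      # dL/dy_hat at the output
    grad_w2 = delta2 * a1
    grad_b2 = delta2
    delta1 = delta2 * w2 * (1.0 - a1 ** 2)  # error passed back through tanh
    grad_w1 = delta1 * x
    grad_b1 = delta1
    return [grad_w1, grad_b1, grad_w2, grad_b2]


def numerical_gradient(params, x, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss(plus, x, y) - loss(minus, x, y)) / (2 * eps))
    return grads


params = [0.5, -0.1, 0.8, 0.2]  # hypothetical starting values
x, y = 1.5, 1.0                 # a single made-up training example
grads = backprop(params, x, y)
learning_rate = 0.1
new_params = [p - learning_rate * g for p, g in zip(params, grads)]

print("loss before:", loss(params, x, y))
print("loss after: ", loss(new_params, x, y))
```

The output error `delta2` is reused to compute `delta1` for the hidden layer, which is the backward flow of error described above.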
Geoscience Applications
Neural networks are used for well-log facies classification: given a set of well-log measurements (gamma ray, resistivity, density, neutron porosity), classify each depth interval into a rock type (sandstone, shale, limestone, etc.). They are also used for seismic inversion, earthquake detection, and mineral prospectivity mapping.
[Refs: Goodfellow et al., Deep Learning; Haykin, Neural Networks and Learning Machines]
References
- Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning, ch. 6 (feedforward networks, activations). MIT Press.
- LeCun, Y., Bengio, Y., Hinton, G. (2015). Deep learning. Nature 521, 436–444.
- Bishop, C.M. (2006). Pattern Recognition and Machine Learning, ch. 5 (neural networks). Springer.
- Murphy, K.P. (2022). Probabilistic Machine Learning: An Introduction, ch. 13 (neural networks). MIT Press.