Building a Simple Neural Network
Learning objectives
- Use scikit-learn MLPClassifier to build a neural network
- Prepare well-log data with proper feature scaling for NN training
- Train, evaluate, and tune a simple MLP for facies classification
- Interpret model performance with confusion matrices and classification reports
- Visualise training loss curves and diagnose convergence issues
- Perform hyperparameter tuning across architectures, activations, and regularisation
Neural Networks in Python with scikit-learn
scikit-learn's MLPClassifier provides a quick way to build multi-layer perceptrons without needing deep-learning frameworks like TensorFlow or PyTorch. In this lab, we walk through the complete process of preparing well-log data, building an MLP, evaluating its performance, and tuning hyperparameters for a lithofacies classification task.
Step 1: Preparing Well-Log Data
Real well-log data typically comes as a table with columns for each logging tool measurement. For this lab, we simulate data with three common logs: gamma ray (GR), bulk density (RHOB), and neutron porosity (NPHI). Each log measures different physical properties and responds differently to rock types.
- Gamma Ray (GR): measures natural radioactivity. Shales have high GR (~90 API), sandstones moderate (~40 API), limestones low (~20 API).
- Bulk Density (RHOB): measures formation density. Sandstones (~2.30 g/cc), shales (~2.55 g/cc), limestones (~2.70 g/cc).
- Neutron Porosity (NPHI): responds to hydrogen content. Shales show high apparent porosity (~35%), sandstones moderate (~20%), limestones low (~3%).
%%%python import numpy as np import pandas as pd from sklearn.preprocessing import StandardScaler from sklearn.model_selection import train_test_split
Simulated well-log data for facies classification
Features: gamma_ray (API), density (g/cc), neutron_porosity (%)
Labels: 0=shale, 1=sandstone, 2=limestone
np.random.seed(42)
gamma_ray = np.concatenate([ np.random.normal(90, 15, 80), # shale: high GR np.random.normal(40, 10, 70), # sandstone: moderate GR np.random.normal(20, 8, 50) # limestone: low GR ]) density = np.concatenate([ np.random.normal(2.55, 0.05, 80), # shale np.random.normal(2.30, 0.08, 70), # sandstone np.random.normal(2.70, 0.04, 50) # limestone ]) neutron = np.concatenate([ np.random.normal(35, 5, 80), # shale: high NPHI np.random.normal(20, 6, 70), # sandstone: moderate np.random.normal(3, 2, 50) # limestone: low ]) labels = np.array([0]*80 + [1]*70 + [2]*50)
Build the feature matrix and label vector
X = np.column_stack([gamma_ray, density, neutron]) y = labels
print(f"Dataset shape: {X.shape}") print(f"Class distribution: shale={np.sum(y==0)}, " f"sandstone={np.sum(y==1)}, limestone={np.sum(y==2)}") %%%
Step 2: Train/Test Split and Feature Scaling
Neural networks require feature scaling because they use gradient-based optimisation. Features with different scales (e.g., GR in 0-150 API vs density in 2.0-2.8 g/cc) cause some weights to receive disproportionately large gradients, leading to slow or unstable convergence. Always standardise or normalise your features before training an NN.
%%%python
Train/test split with stratification to preserve class ratios
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=42, stratify=y )
print(f"Training set: {X_train.shape[0]} samples") print(f"Test set: {X_test.shape[0]} samples")
Standardise features (zero mean, unit variance)
CRITICAL: fit on training data ONLY, then transform both sets
scaler = StandardScaler() X_train_scaled = scaler.fit_transform(X_train) X_test_scaled = scaler.transform(X_test) # use the SAME scaler
Verify scaling
print(f"\nAfter scaling:") print(f"Train means: {X_train_scaled.mean(axis=0).round(2)}") print(f"Train stds: {X_train_scaled.std(axis=0).round(2)}") %%%
After scaling, each feature has approximately zero mean and unit standard deviation. The test set will not have exactly zero mean because the scaler was fitted on the training data only — this is correct and avoids data leakage.
Step 3: Building and Training the MLP
The MLPClassifier key parameters are:
hidden_layer_sizes: a tuple defining the number of neurons in each hidden layer. E.g., (16, 8) means two hidden layers with 16 and 8 neurons.activation: the activation function — "relu" (default, most common), "tanh", or "logistic" (sigmoid).solver: the optimisation algorithm — "adam" (default, works well for most problems), "sgd" (stochastic gradient descent, more tuneable), or "lbfgs" (good for small datasets).max_iter: maximum number of training epochs. Increase if you see a convergence warning.alpha: L2 regularisation strength (default 0.0001). Increase to reduce overfitting.learning_rate_init: initial learning rate for sgd/adam (default 0.001).
%%%python from sklearn.neural_network import MLPClassifier
Create the MLP: 2 hidden layers with 16 and 8 neurons
mlp = MLPClassifier( hidden_layer_sizes=(16, 8), # two hidden layers activation="relu", # ReLU activation solver="adam", # Adam optimiser max_iter=500, # training epochs random_state=42, verbose=False # set True to see progress )
Train the network
mlp.fit(X_train_scaled, y_train)
Evaluate
train_acc = mlp.score(X_train_scaled, y_train) test_acc = mlp.score(X_test_scaled, y_test) print(f"Train accuracy: {train_acc:.3f}") print(f"Test accuracy: {test_acc:.3f}") print(f"Number of iterations: {mlp.n_iter_}") print(f"Final loss: {mlp.loss_:.4f}") %%%
Step 4: Evaluating with Classification Report and Confusion Matrix
Accuracy alone can be misleading, especially with imbalanced classes. The classification report provides precision, recall, and F1-score for each class. The confusion matrix shows exactly which classes are being confused with each other.
%%%python from sklearn.metrics import confusion_matrix, classification_report import matplotlib.pyplot as plt
y_pred = mlp.predict(X_test_scaled)
Classification report — precision, recall, F1 per class
print(classification_report(y_test, y_pred, target_names=["Shale", "Sandstone", "Limestone"])) %%%
Precision = of all samples predicted as class X, what fraction actually is class X? Recall = of all actual class X samples, what fraction did we correctly predict? F1 = harmonic mean of precision and recall.
%%%python
Confusion matrix visualisation
cm = confusion_matrix(y_test, y_pred) fig, ax = plt.subplots(figsize=(6, 5)) im = ax.imshow(cm, cmap="Blues") plt.colorbar(im) class_names = ["Shale", "Sandstone", "Limestone"] ax.set_xticks([0, 1, 2]) ax.set_xticklabels(class_names) ax.set_yticks([0, 1, 2]) ax.set_yticklabels(class_names) ax.set_xlabel("Predicted") ax.set_ylabel("Actual") ax.set_title("Confusion Matrix — Facies Classification") for i in range(3): for j in range(3): color = "white" if cm[i, j] > cm.max() / 2 else "black" ax.text(j, i, str(cm[i, j]), ha="center", va="center", color=color, fontsize=14) plt.tight_layout() plt.show() %%%
Step 5: Visualising the Training Loss Curve
The loss curve shows how the training loss decreases over epochs. It is crucial for diagnosing convergence problems.
%%%python plt.figure(figsize=(8, 4)) plt.plot(mlp.loss_curve_, linewidth=2) plt.xlabel("Epoch") plt.ylabel("Loss") plt.title("Training Loss Curve") plt.grid(alpha=0.3) plt.tight_layout() plt.show() %%%
How to read the loss curve:
- Smooth decrease to a plateau: good convergence. The model has learned the patterns.
- Loss still decreasing at the end: increase
max_iter— the model needs more training time. - Oscillating wildly: the learning rate is too high. Reduce with
learning_rate_init=0.0005. - Flat from the beginning: the learning rate may be too low, or the model architecture is too simple.
- Sharp drop then flat: the model quickly learned major patterns and is done. Normal for simple problems.
Step 6: Hyperparameter Tuning — Architectures
Systematically compare different network architectures to find the best configuration:
%%%python
Compare different architectures
results = [] for layers in [(8,), (16,), (16, 8), (32, 16), (32, 16, 8), (64, 32)]: mlp = MLPClassifier( hidden_layer_sizes=layers, max_iter=500, random_state=42 ) mlp.fit(X_train_scaled, y_train) train_a = mlp.score(X_train_scaled, y_train) test_a = mlp.score(X_test_scaled, y_test) results.append((str(layers), train_a, test_a)) print(f"Layers {str(layers):>15}: " f"train={train_a:.3f}, test={test_a:.3f}")
Find best architecture by test accuracy
best = max(results, key=lambda r: r[2]) print(f"\nBest architecture: {best[0]} " f"(test acc = {best[2]:.3f})") %%%
General guidelines: start simple and increase complexity. More layers and neurons allow the model to learn more complex patterns, but also increase the risk of overfitting. For well-log data with 3-5 features and a few hundred samples, architectures like (16, 8) or (32, 16) usually work well.
Step 7: Hyperparameter Tuning — Activation and Regularisation
%%%python
Compare activation functions
print("=== Activation Functions ===") for act in ["relu", "tanh", "logistic"]: mlp = MLPClassifier( hidden_layer_sizes=(16, 8), activation=act, max_iter=500, random_state=42 ) mlp.fit(X_train_scaled, y_train) print(f"{act:>10}: test acc = " f"{mlp.score(X_test_scaled, y_test):.3f}")
Compare regularisation strengths
print("\n=== Regularisation (alpha) ===") for alpha in [0.0001, 0.001, 0.01, 0.1]: mlp = MLPClassifier( hidden_layer_sizes=(16, 8), alpha=alpha, max_iter=500, random_state=42 ) mlp.fit(X_train_scaled, y_train) tr = mlp.score(X_train_scaled, y_train) te = mlp.score(X_test_scaled, y_test) print(f"alpha={alpha:.4f}: train={tr:.3f}, test={te:.3f}") %%%
If train accuracy is much higher than test accuracy, increase alpha (more regularisation). If both are low, decrease alpha or use a larger network.
Step 8: Comparing Loss Curves Across Architectures
%%%python fig, ax = plt.subplots(figsize=(10, 5)) for layers in [(8,), (16, 8), (32, 16, 8)]: mlp = MLPClassifier( hidden_layer_sizes=layers, max_iter=500, random_state=42 ) mlp.fit(X_train_scaled, y_train) ax.plot(mlp.loss_curve_, label=str(layers), linewidth=2)
ax.set_xlabel("Epoch") ax.set_ylabel("Loss") ax.set_title("Loss Curves for Different Architectures") ax.legend() ax.grid(alpha=0.3) plt.tight_layout() plt.show() %%%
Larger networks typically converge faster (steeper initial drop) but may overfit if the dataset is small.
Step 9: Making Predictions on New Data
%%%python
Predict the lithology for a new well-log reading
GR=75 API, RHOB=2.50 g/cc, NPHI=30%
new_sample = np.array([[75, 2.50, 30]])
IMPORTANT: apply the same scaler
new_scaled = scaler.transform(new_sample) prediction = mlp.predict(new_scaled) proba = mlp.predict_proba(new_scaled)
class_names = ["Shale", "Sandstone", "Limestone"] print(f"Predicted class: {class_names[prediction[0]]}") print(f"Class probabilities:") for name, p in zip(class_names, proba[0]): print(f" {name}: {p:.3f}") %%%
The predict_proba method returns the probability of each class. This is useful for identifying uncertain predictions (e.g., when no class has probability above 0.5, the model is unsure).
[Library: scikit-learn MLPClassifier]
References
- Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
- Géron, A. (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (3rd ed.), ch. 10 (introduction to artificial neural networks with Keras). O’Reilly.
- Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning, ch. 6 (feedforward networks). MIT Press.