CNN Practice

Chapter 14: Convolutional Networks for Spatial Data

Learning objectives

Read and understand Keras/TensorFlow code for building CNNs
Trace through model.summary() output to verify dimensions
Interpret training output (loss curves, accuracy)
Apply CNN code patterns to geoscience image classification
Implement data augmentation for small datasets
Understand transfer learning concepts

Preparing Image Data for a CNN

Before building a CNN, the image data must be properly prepared. The most critical steps are: (1) reshape images into the expected 4D tensor (batch, height, width, channels), (2) normalize pixel values to the range $[0, 1]$ by dividing by 255, and (3) encode labels appropriately.

%%%python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Suppose we have rock thin section images
# X_raw shape: (1000, 64, 64, 3) - 1000 RGB images, 64x64 pixels
# y_raw: integer labels 0-4 (sandstone, limestone, shale, granite, basalt)

# Step 1: Normalize pixel values to [0, 1]
X_normalized = X_raw.astype("float32") / 255.0

print("Before normalization - min:", X_raw.min(), "max:", X_raw.max())
print("After normalization - min:", X_normalized.min(), "max:", X_normalized.max())
print("Shape:", X_normalized.shape, "dtype:", X_normalized.dtype)

# Step 2: One-hot encode labels for categorical_crossentropy
y_onehot = to_categorical(y_raw, num_classes=5)
print("\nLabel shape:", y_onehot.shape)
print("Example label (class 2):", y_onehot[y_raw == 2][0])

# Step 3: Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X_normalized, y_onehot, test_size=0.2, random_state=42, stratify=y_raw
)
print(f"\nTrain: {X_train.shape[0]}, Test: {X_test.shape[0]}")
%%%
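
The code above assumes that X_raw already has a channel axis. Grayscale images are often stored as a 3D array (batch, height, width), in which case step (1) means adding a trailing channel axis. The following is a minimal sketch on synthetic random data; the array names and sizes are illustrative assumptions, not part of the rock thin section dataset.

```python
import numpy as np

# Hypothetical stand-in data: 10 grayscale images, 64x64, integer pixels 0-255
rng = np.random.default_rng(0)
X_gray = rng.integers(0, 256, size=(10, 64, 64), dtype=np.uint8)

# Add a trailing channel axis: (batch, height, width) -> (batch, height, width, 1)
X_4d = X_gray[..., np.newaxis]

# Scale to [0, 1] as float32
X_scaled = X_4d.astype("float32") / 255.0

print("Before:", X_gray.shape, X_gray.dtype)
print("After: ", X_scaled.shape, X_scaled.dtype)
```

A model for data shaped like this would take input_shape=(64, 64, 1) in its first layer.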

Building a CNN Step by Step with Keras Sequential

The Sequential model lets us stack layers in order. A typical CNN pattern is: Conv2D $\to$ MaxPooling2D (repeated 2-3 times for feature extraction) followed by Flatten $\to$ Dense $\to$ Dropout $\to$ Dense(softmax) for classification.

%%%python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.layers import Dropout, BatchNormalization

# Build the model layer by layer
model = Sequential()

# === BLOCK 1: First convolutional block ===
# Conv2D(filters, kernel_size, activation, input_shape)
# 32 filters, each 3x3, ReLU activation
# input_shape only needed for the FIRST layer
model.add(Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)))
# Output: (62, 62, 32) - spatial dims shrink by (kernel-1)
model.add(MaxPooling2D(pool_size=(2, 2)))
# Output: (31, 31, 32) - spatial dims halved

# === BLOCK 2: Second convolutional block ===
model.add(Conv2D(64, (3, 3), activation="relu"))
# Output: (29, 29, 64)
model.add(MaxPooling2D(pool_size=(2, 2)))
# Output: (14, 14, 64)

# === BLOCK 3: Third convolutional block ===
model.add(Conv2D(128, (3, 3), activation="relu"))
# Output: (12, 12, 128)
model.add(MaxPooling2D(pool_size=(2, 2)))
# Output: (6, 6, 128)

# === CLASSIFICATION HEAD ===
model.add(Flatten())                       # (6, 6, 128) -> (4608,)
model.add(Dense(128, activation="relu"))   # Fully connected
model.add(Dropout(0.5))                    # Prevent overfitting
model.add(Dense(5, activation="softmax"))  # 5 rock type classes

model.summary()
%%%
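
The output shapes in the comments above, and the parameter counts that model.summary() reports, can be checked by hand. The sketch below is plain Python and does not call Keras. It assumes the Keras defaults used above: no padding and stride 1 for the convolutions, and a pooling stride equal to the pool size. The helper names are illustrative.

```python
def conv_out(size, kernel):
    """Output size of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool

def conv_params(filters, kernel, in_channels):
    """Kernel weights plus one bias per filter."""
    return filters * (kernel * kernel * in_channels + 1)

def dense_params(units, inputs):
    """One weight per input plus one bias, for each unit."""
    return units * (inputs + 1)

h = w = 64
channels = 3
total = 0
for filters in (32, 64, 128):
    n = conv_params(filters, 3, channels)
    total += n
    h, w, channels = conv_out(h, 3), conv_out(w, 3), filters
    print(f"Conv2D       -> ({h}, {w}, {channels})  params: {n}")
    h, w = pool_out(h, 2), pool_out(w, 2)
    print(f"MaxPooling2D -> ({h}, {w}, {channels})  params: 0")

flat = h * w * channels
print(f"Flatten      -> ({flat},)")

n_dense = dense_params(128, flat)
n_out = dense_params(5, 128)
total += n_dense + n_out
print(f"Dense(128)   params: {n_dense}")
print(f"Dense(5)     params: {n_out}")
print(f"Total params: {total}")
```

Most of the parameters sit in the first dense layer, which is why the size of the flattened feature map matters so much for model size.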

Understanding Each Layer Type

Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)): Creates 32 learnable filters, each $3 \times 3 \times 3$ (3 for RGB channels). Each filter slides across the image, computing a dot product at each position. ReLU activation introduces non-linearity. Parameters: $32 \times (3 \times 3 \times 3 + 1) = 896$ .
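
To make "slides across the image, computing a dot product at each position" concrete, here is a simplified sketch in plain Python. It handles a single channel and a single filter with no bias, and the filter weights are hand-picked for illustration; a trained Conv2D layer learns its own weights and sums over all input channels.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (2D lists), stride 1, no padding, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(0, s))  # ReLU: negative responses become 0
        out.append(row)
    return out

# 4x4 image with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Hand-picked 3x3 filter that responds to dark-to-bright vertical edges
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # 2x2 output: each spatial dim shrinks by kernel - 1
```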

MaxPooling2D(pool_size=(2, 2)): Takes the maximum value in each $2 \times 2$ window, halving each spatial dimension (rounding down when the size is odd). This makes the output tolerant of small shifts in the input and reduces computation. No learnable parameters.
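
A small plain-Python sketch of $2 \times 2$ max pooling on one feature map. The values are made up; the second call illustrates the tolerance to small shifts: moving the maximum within a window leaves the pooled output unchanged.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2D list; an odd trailing row/column is dropped."""
    out = []
    for i in range(0, len(fmap) - 1, 2):
        row = []
        for j in range(0, len(fmap[0]) - 1, 2):
            row.append(max(fmap[i][j], fmap[i][j + 1],
                           fmap[i + 1][j], fmap[i + 1][j + 1]))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled = max_pool_2x2(fmap)
print(pooled)

# Move the top-left window's maximum (4) up by one pixel
shifted = [row[:] for row in fmap]
shifted[0][0], shifted[1][0] = shifted[1][0], shifted[0][0]
print(max_pool_2x2(shifted) == pooled)
```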

Flatten(): Reshapes the 3D feature maps (height $\times$ width $\times$ channels) into a 1D vector for the dense layers. No learnable parameters.

Dense(128, activation='relu'): A fully connected layer with 128 neurons. Every neuron connects to every input. This is where the model combines spatial features for classification.

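Dropout(0.5): During training, randomly sets 50% of inputs to zero and scales the remaining inputs by $1/(1 - 0.5) = 2$ so that their expected sum is unchanged. This prevents co-adaptation of neurons and reduces overfitting. During inference, the layer passes its inputs through unchanged: all neurons are active and no scaling is applied.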
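
A minimal sketch of inverted dropout, the scheme Keras uses: kept activations are multiplied by $1/(1 - \text{rate})$ during training, and at inference the layer is the identity. This is an illustration in plain Python with a hypothetical helper function, not the Keras implementation.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout on a list of activations."""
    if not training:
        return list(values)  # inference: inputs pass through unchanged
    keep = 1.0 - rate
    # Keep each value with probability `keep` and rescale it; otherwise zero it
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("Fraction zeroed in training:", train_out.count(0.0) / len(train_out))
print("Mean in training: ", sum(train_out) / len(train_out))
print("Mean at inference:", sum(infer_out) / len(infer_out))
```

The training mean stays close to the inference mean, which is the point of rescaling during training.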

Dense(5, activation='softmax'): Output layer for 5-class classification. Softmax converts raw scores (logits) to probabilities that sum to 1. The predicted class is the one with the highest probability.
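
The softmax step can be sketched in a few lines of plain Python. The logits below are made-up numbers for illustration, and subtracting the maximum before exponentiating is a standard trick for numerical stability that does not change the result.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["sandstone", "limestone", "shale", "granite", "basalt"]
logits = [2.0, 1.0, 0.1, -1.0, 0.5]  # hypothetical raw scores for one image

probs = softmax(logits)
predicted = classes[probs.index(max(probs))]

for name, p in zip(classes, probs):
    print(f"{name:10s} {p:.3f}")
print("Sum of probabilities:", round(sum(probs), 6))
print("Predicted class:", predicted)
```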

Compiling the Model

Before training, we must compile the model by specifying the optimizer, loss function, and metrics. These choices depend on the task type.

%%%python
# Compile the model
model.compile(
    optimizer="adam",                 # Adaptive learning rate optimizer
    loss="categorical_crossentropy",  # For one-hot encoded labels
    metrics=["accuracy"]
)

# Alternative: if labels are integers (not one-hot), use:
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

print("Loss function: categorical_crossentropy")
print(" - Use when labels are one-hot encoded: [0, 1, 0, 0, 0]")
print(" - Alternative: sparse_categorical_crossentropy for integer labels: 1")
print("\nOptimizer: Adam")
print(" - Combines momentum + RMSProp")
print(" - Default learning rate: 0.001")
print(" - Usually works well without tuning")
%%%
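
The two loss functions compute the same quantity and differ only in the label format they expect. A plain-Python sketch for a single sample, with made-up predicted probabilities (Keras additionally averages over the batch and clips probabilities away from 0):

```python
import math

def categorical_crossentropy(y_onehot, probs):
    """Cross-entropy for a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(y_onehot, probs))

def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for an integer class label."""
    return -math.log(probs[label])

probs = [0.1, 0.7, 0.1, 0.05, 0.05]  # hypothetical softmax output

loss_onehot = categorical_crossentropy([0, 1, 0, 0, 0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)

print(f"one-hot label [0, 1, 0, 0, 0]: {loss_onehot:.4f}")
print(f"integer label 1:               {loss_sparse:.4f}")
```

Only the probability assigned to the true class enters the loss, so a confident correct prediction gives a loss near 0 and a confident wrong one gives a large loss.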

Training the Model

The model.fit call trains the model. Key parameters control training duration, batch processing, and validation monitoring.

%%%python
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=20,            # Number of full passes through training data
    batch_size=32,        # Process 32 images at a time
    validation_split=0.2  # Reserve 20% of training data for validation
)

# history.history contains training metrics per epoch
print("\nKeys in history:", list(history.history.keys()))
print("Final train accuracy:", history.history["accuracy"][-1])
print("Final val accuracy: ", history.history["val_accuracy"][-1])
%%%
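
The arithmetic behind these parameters is worth working through once. The sketch below assumes the 800 training images produced by the earlier 80/20 split of 1000 images; it only does bookkeeping and does not call Keras.

```python
import math

n_train = 800            # training images after the earlier train/test split
validation_split = 0.2
batch_size = 32
epochs = 20

n_val = round(n_train * validation_split)  # held out for validation
n_fit = n_train - n_val                    # actually used for weight updates

steps_per_epoch = math.ceil(n_fit / batch_size)
total_updates = steps_per_epoch * epochs

print(f"Images used for fitting: {n_fit}")
print(f"Images used for validation: {n_val}")
print(f"Weight updates per epoch: {steps_per_epoch}")
print(f"Weight updates in total: {total_updates}")
```

Note that validation_split takes the last fraction of the arrays as passed in, before any shuffling, so the data should not be sorted by class when this option is used.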

Interpreting Training Output

During training, Keras prints metrics for each epoch. Here is how to read them:

%%%python
# Typical training output:
# Epoch 1/20 - loss: 1.5423 - accuracy: 0.3210 - val_loss: 1.4012 - val_accuracy: 0.3850
# Epoch 5/20 - loss: 0.8124 - accuracy: 0.6540 - val_loss: 0.9235 - val_accuracy: 0.5920
# Epoch 10/20 - loss: 0.4215 - accuracy: 0.8320 - val_loss: 0.6128 - val_accuracy: 0.7450
# Epoch 20/20 - loss: 0.1523 - accuracy: 0.9480 - val_loss: 0.5834 - val_accuracy: 0.7820

# What to look for:
# GOOD: both loss and val_loss decrease; accuracy and val_accuracy increase
# OVERFITTING: train loss decreases but val_loss increases after some epoch
# UNDERFITTING: both losses are high and not decreasing

print("Signs of overfitting:")
print(" - Training accuracy >> validation accuracy (e.g., 0.95 vs 0.78)")
print(" - Training loss << validation loss")
print(" - val_loss starts increasing while train loss continues decreasing")
%%%
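
These rules of thumb can be turned into a rough check on a history dictionary. The function below is a hypothetical helper, and its thresholds are arbitrary illustrative defaults rather than established cutoffs; it is applied to the four epochs shown in the sample output above.

```python
def diagnose(history, gap_tol=0.10, high_loss=1.0):
    """Rough label for a training run based on its per-epoch metrics."""
    val_loss = history["val_loss"]
    val_rising = val_loss[-1] > min(val_loss)  # best val_loss was not the last epoch
    acc_gap = history["accuracy"][-1] - history["val_accuracy"][-1]
    if val_rising or acc_gap > gap_tol:
        return "overfitting"
    if history["loss"][-1] > high_loss:
        return "underfitting"
    return "good"

# Epochs 1, 5, 10 and 20 from the sample output
sample = {
    "loss":         [1.5423, 0.8124, 0.4215, 0.1523],
    "accuracy":     [0.3210, 0.6540, 0.8320, 0.9480],
    "val_loss":     [1.4012, 0.9235, 0.6128, 0.5834],
    "val_accuracy": [0.3850, 0.5920, 0.7450, 0.7820],
}

print("Sample run:", diagnose(sample))
```

In the sample run the validation loss is still falling slowly, but the accuracy gap of about 0.17 is what triggers the overfitting label.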

Plotting Training vs Validation Curves

Plotting the training history is one of the most useful diagnostics in deep learning. The gap between the training and validation curves reveals overfitting or underfitting, and the minimum of the validation loss indicates roughly how many epochs are enough.

%%%python
import matplotlib.pyplot as plt
import numpy as np

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Loss curves
ax1.plot(history.history["loss"], label="Training Loss")
ax1.plot(history.history["val_loss"], label="Validation Loss")
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")
ax1.set_title("Loss Curves")
ax1.legend()
ax1.grid(True, alpha=0.3)

# Accuracy curves
ax2.plot(history.history["accuracy"], label="Training Accuracy")
ax2.plot(history.history["val_accuracy"], label="Validation Accuracy")
ax2.set_xlabel("Epoch")
ax2.set_ylabel("Accuracy")
ax2.set_title("Accuracy Curves")
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.suptitle("CNN Training History", fontsize=14)
plt.tight_layout()
plt.show()

# Find best epoch (lowest val_loss); epochs are reported 1-based
best_epoch = np.argmin(history.history["val_loss"]) + 1
print(f"Best epoch: {best_epoch}")
print(f"Best val_loss: {min(history.history['val_loss']):.4f}")
print(f"Best val_accuracy: {history.history['val_accuracy'][best_epoch - 1]:.4f}")
%%%

Data Augmentation with ImageDataGenerator

In geoscience, labeled image datasets are often small (hundreds, not thousands). Data augmentation artificially increases the effective training set by applying random transformations: rotation, flip, shift, zoom. This is one of the most effective strategies against overfitting with limited data.

%%%python
# Note: ImageDataGenerator is deprecated in recent TensorFlow releases in favor
# of Keras preprocessing layers (e.g. RandomFlip, RandomRotation).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create an augmentation generator
datagen = ImageDataGenerator(
    rotation_range=20,        # Random rotation up to 20 degrees
    width_shift_range=0.1,    # Horizontal shift up to 10%
    height_shift_range=0.1,   # Vertical shift up to 10%
    horizontal_flip=True,     # Random horizontal flip
    vertical_flip=True,       # For microscope images, vertical flip is valid
    zoom_range=0.1,           # Random zoom up to 10%
    fill_mode="nearest"       # Fill new pixels with nearest value
)

# Train with augmented data
# datagen.flow generates augmented batches on the fly
history_aug = model.fit(
    datagen.flow(X_train, y_train, batch_size=32),
    epochs=30,
    validation_data=(X_test, y_test),  # Validation data is NOT augmented
    steps_per_epoch=len(X_train) // 32
)

print("\nNote: validation data should NEVER be augmented.")
print("Augmentation only applies to training to increase diversity.")
%%%
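
What the generator does to each image can be illustrated with exact flips and a 90-degree rotation on a tiny nested-list "image". This is a simplified sketch in plain Python: ImageDataGenerator applies its transformations randomly for every batch and interpolates pixels for arbitrary angles, which is not reproduced here. The label string is a made-up example.

```python
def hflip(img):
    """Mirror left-right: reverse each row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror top-bottom: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]

def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
label = "sandstone"  # hypothetical label for this image

# Every transformed copy keeps the label of the original image
augmented = [(t(img), label) for t in (hflip, vflip, rot90)]

for copy, lab in augmented:
    print(copy, lab)
```

Whether a given transformation is valid depends on the data: flips are harmless for thin sections, but a vertical flip of a seismic section or an oriented core photo would reverse depth.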

Early Stopping to Prevent Overfitting

%%%python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop training when val_loss stops improving
early_stop = EarlyStopping(
    monitor="val_loss",         # Watch validation loss
    patience=5,                 # Wait 5 epochs for improvement
    restore_best_weights=True   # Revert to best weights
)

# Save the best model during training
checkpoint = ModelCheckpoint(
    "best_model.keras",
    monitor="val_loss",
    save_best_only=True
)

history = model.fit(
    X_train, y_train,
    epochs=100,  # Set high; early stopping will cut it short
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop, checkpoint]
)

n_epochs = len(history.history["loss"])
print(f"Training stopped at epoch {n_epochs}")
best_vl = min(history.history["val_loss"])
print(f"Best val_loss: {best_vl:.4f}")
%%%
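
The patience rule is easy to trace by hand. The sketch below is a simplified plain-Python simulation of the stopping logic on a made-up sequence of validation losses; it is not the Keras callback itself and ignores options such as min_delta.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stopped_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation loss seen so far.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: best at epoch 4, then no further improvement
val_losses = [1.40, 0.92, 0.61, 0.58, 0.60, 0.63, 0.59, 0.66, 0.70, 0.75]

stopped, best = simulate_early_stopping(val_losses, patience=5)
print(f"Stopped at epoch {stopped}; best weights come from epoch {best}")
```

With restore_best_weights=True the model ends up with the epoch-4 weights, even though training ran for five more epochs before stopping.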

Transfer Learning Concept

Transfer learning uses a model pre-trained on a large dataset (like ImageNet with millions of images) as a starting point for your task. The early layers of a CNN learn generic features (edges, textures, shapes) that transfer well to new domains, including geoscience images.

%%%python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential

# Load VGG16 pre-trained on ImageNet, without the top classification layers
# Note: the ImageNet weights were trained on inputs prepared with
# tensorflow.keras.applications.vgg16.preprocess_input, not on [0, 1]-scaled pixels.
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(64, 64, 3))

# Freeze the pre-trained layers (do not update their weights)
base_model.trainable = False

# Build new classification head
model_transfer = Sequential([
    base_model,
    GlobalAveragePooling2D(),         # Averages each feature map to one value
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(5, activation="softmax")    # 5 rock types
])

model_transfer.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model_transfer.summary()

print("\nTrainable parameters:", model_transfer.count_params() - base_model.count_params())
print("Frozen parameters: ", base_model.count_params())
print("\nTransfer learning is especially valuable when:")
print(" - Your dataset is small (< 1000 images)")
print(" - Your images share visual features with ImageNet (textures, shapes)")
%%%
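
The frozen and trainable parameter counts printed above can be worked out without downloading the weights. The sketch below assumes the standard VGG16 convolutional base (five blocks of $3 \times 3$ convolutions with "same" padding, each block followed by $2 \times 2$ max pooling) and the classification head defined above.

```python
def conv_params(filters, in_channels, kernel=3):
    """Kernel weights plus one bias per filter."""
    return filters * (kernel * kernel * in_channels + 1)

# (filters, number of conv layers) for each of the five VGG16 blocks
vgg16_blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

channels, size, frozen = 3, 64, 0
for filters, n_convs in vgg16_blocks:
    for _ in range(n_convs):
        frozen += conv_params(filters, channels)
        channels = filters  # "same" padding keeps the spatial size
    size //= 2              # 2x2 max pooling at the end of each block

# Head: GlobalAveragePooling2D -> Dense(128) -> Dropout -> Dense(5)
head = (channels * 128 + 128) + (128 * 5 + 5)

print(f"Base output shape: ({size}, {size}, {channels})")
print(f"Frozen parameters:    {frozen:,}")
print(f"Trainable parameters: {head:,}")
```

Only a few tens of thousands of weights are trained, against roughly 14.7 million frozen ones, which is why this approach can work with a few hundred labeled images.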

Geoscience Applications

Rock Thin Section Classification

Microscope images of rock thin sections (plane-polarized and cross-polarized light) contain rich textural information: grain size, shape, sorting, cementation, and mineral composition. A CNN can learn to classify rock types (sandstone, limestone, shale, granite, basalt) from these images; how closely it matches a trained geologist depends on the size and quality of the labeled dataset.

Seismic Facies Classification

2D patches extracted from seismic sections can be classified into facies (e.g., channel fill, levee, mass transport deposit). The CNN learns to recognize amplitude patterns, reflection geometry, and lateral continuity.
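
Patch extraction is a sliding window over the section. The sketch below uses a tiny synthetic nested-list "section" whose values encode their position; the patch size, stride, and helper name are illustrative assumptions, and real workflows would typically operate on NumPy arrays of amplitudes.

```python
def extract_patches(section, patch, stride):
    """Return (top, left, window) for every full patch in a 2D section."""
    n_rows, n_cols = len(section), len(section[0])
    patches = []
    for top in range(0, n_rows - patch + 1, stride):
        for left in range(0, n_cols - patch + 1, stride):
            window = [row[left:left + patch] for row in section[top:top + patch]]
            patches.append((top, left, window))
    return patches

# Hypothetical 6x8 section: value = 10 * row + column
section = [[10 * r + c for c in range(8)] for r in range(6)]

patches = extract_patches(section, patch=4, stride=2)
print("Number of patches:", len(patches))
for top, left, window in patches:
    print(f"top={top}, left={left}, first row={window[0]}")
```

A stride smaller than the patch size gives overlapping patches, which increases the number of training samples but also means neighboring patches are strongly correlated and should be kept on the same side of a train/test split.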

Core Photo Analysis

Photographs of drill core can be classified by lithology, fracture density, or alteration intensity. Data augmentation is especially important here since core photo datasets are typically small.
