Auto-encoder Practice

Chapter 16: Representation Learning with Auto-encoders

Learning objectives

Build a simple auto-encoder in Keras using the functional API
Trace through auto-encoder architecture and parameter counts
Implement a denoising auto-encoder
Interpret reconstruction results and latent space representations
Apply auto-encoder embeddings for anomaly detection
Compare auto-encoder features with PCA

Building an Auto-encoder with the Keras Functional API

The Keras functional API suits auto-encoders because it lets us define the full model together with separate encoder and decoder sub-models that reuse the same layers, and therefore the same weights. This matters because after training we often want to use the encoder alone (for dimensionality reduction) or the decoder alone (for generation).

%%%python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Define dimensions
input_dim = 100     # e.g., 100 seismic attributes
encoding_dim = 10   # compress to 10 dimensions (10:1 compression)

# === ENCODER ===
# Input -> Dense(64, relu) -> Dense(32, relu) -> Dense(10, relu)
input_layer = Input(shape=(input_dim,))
encoded = Dense(64, activation="relu")(input_layer)
encoded = Dense(32, activation="relu")(encoded)
bottleneck = Dense(encoding_dim, activation="relu")(encoded)

# === DECODER ===
# Dense(32, relu) -> Dense(64, relu) -> Dense(100, sigmoid)
decoded = Dense(32, activation="relu")(bottleneck)
decoded = Dense(64, activation="relu")(decoded)
output_layer = Dense(input_dim, activation="sigmoid")(decoded)

# Full auto-encoder: input -> bottleneck -> output
autoencoder = Model(input_layer, output_layer)

# Separate encoder model (for extracting compressed features)
encoder = Model(input_layer, bottleneck)

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()
%%%

Understanding the Functional API

Input(shape=(input_dim,)): Creates a placeholder tensor for the input. Unlike Sequential, this does not add to a stack — it creates a named tensor node in the computation graph.

Dense(64, activation='relu')(input_layer): Creates a Dense layer AND applies it to input_layer. The double parentheses: first creates the layer, second calls it on the input.

Model(input_layer, output_layer): Defines a model by specifying its input and output tensors. Keras traces the computation graph between them automatically.

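encoder = Model(input_layer, bottleneck): Defines a sub-model from the input to the bottleneck. This encoder is built from the same layer objects as the full auto-encoder, so it shares their weights: training the auto-encoder also updates the encoder, and no separate training step is needed.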
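The two ideas above (create a layer, then call it; two models built from the same layer object share its weights) can be illustrated without Keras. The following is a minimal sketch using a toy class: `ToyDense`, `toy_encoder`, and `toy_autoencoder` are hypothetical names introduced here for illustration and are not part of the Keras API.

```python
import numpy as np


class ToyDense:
    """Toy stand-in for a dense ReLU layer (hypothetical, not the Keras class)."""

    def __init__(self, in_dim, units, seed):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(in_dim, units))
        self.b = np.zeros(units)

    def __call__(self, x):
        # Calling the layer object applies it: ReLU(x @ W + b)
        return np.maximum(x @ self.W + self.b, 0.0)


# First parentheses create the layer, second parentheses call it on an input.
enc_layer = ToyDense(in_dim=8, units=3, seed=0)
dec_layer = ToyDense(in_dim=3, units=8, seed=1)


def toy_encoder(x):
    # Uses the SAME enc_layer object as toy_autoencoder below
    return enc_layer(x)


def toy_autoencoder(x):
    return dec_layer(enc_layer(x))


x = np.ones((2, 8))
codes_before = toy_encoder(x).copy()

# Simulate a training update applied to the full model's first layer
enc_layer.W += 0.5

codes_after = toy_encoder(x)
print("Encoder output changed:", not np.allclose(codes_before, codes_after))
```

Because both functions hold a reference to the same layer object, an update made while "training" the full model is visible through the encoder immediately. Keras sub-models defined on shared tensors behave in the same way.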

Architecture Walkthrough

The auto-encoder has a symmetric structure: Input(100) $\to$ Dense(64, relu) $\to$ Dense(32, relu) $\to$ Dense(10, relu) $\to$ Dense(32, relu) $\to$ Dense(64, relu) $\to$ Dense(100, sigmoid). The bottleneck forces a 10:1 compression, meaning the network must learn to represent 100 features using only 10 latent variables.

%%%python
# Parameter count walkthrough
# Each Dense layer has (inputs * units) weights plus (units) biases
print("Layer-by-layer parameter count:")
print(f"  Input(100) -> Dense(64):  100*64 + 64   = {100*64 + 64}")
print(f"  Dense(64)  -> Dense(32):  64*32 + 32    = {64*32 + 32}")
print(f"  Dense(32)  -> Dense(10):  32*10 + 10    = {32*10 + 10}")
print(f"  Dense(10)  -> Dense(32):  10*32 + 32    = {10*32 + 32}")
print(f"  Dense(32)  -> Dense(64):  32*64 + 64    = {32*64 + 64}")
print(f"  Dense(64)  -> Dense(100): 64*100 + 100  = {64*100 + 100}")
total = (100*64 + 64) + (64*32 + 32) + (32*10 + 10) + (10*32 + 32) + (32*64 + 64) + (64*100 + 100)
print(f"  Total: {total}")

print(f"\nCompression ratio: {input_dim}:{encoding_dim} = {input_dim//encoding_dim}:1")
print(f"Bottleneck captures {encoding_dim} latent features from {input_dim} inputs")
%%%
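The per-layer arithmetic generalizes to any stack of fully connected layers. As a small sketch, the helper below (`dense_param_counts` is a name introduced here, not a Keras function) computes the counts from a list of layer widths, which makes it easy to check other architectures.

```python
def dense_param_counts(layer_sizes):
    """Return per-layer parameter counts for a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first.
    Each Dense layer has (inputs * units) weights plus (units) biases.
    """
    return [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


# Widths of the auto-encoder in this chapter: 100 -> 64 -> 32 -> 10 -> 32 -> 64 -> 100
sizes = [100, 64, 32, 10, 32, 64, 100]
counts = dense_param_counts(sizes)

for n_in, n_out, count in zip(sizes, sizes[1:], counts):
    print(f"Dense({n_in} -> {n_out}): {count}")
print("Total:", sum(counts))
```

The total should agree with the figure reported by `autoencoder.summary()` or `autoencoder.count_params()`.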

Training the Auto-encoder

The auto-encoder is self-supervised: the target is the input itself. The model learns to compress and reconstruct by minimizing the reconstruction error (MSE).
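Written out, and assuming $N$ samples with $d$ features each (symbols introduced here for illustration), the mean squared reconstruction error between the inputs $x$ and their reconstructions $\hat{x}$ is:

$$
\mathcal{L}_{\text{MSE}} = \frac{1}{N d} \sum_{i=1}^{N} \sum_{j=1}^{d} \left( x_{ij} - \hat{x}_{ij} \right)^2
$$

For the model in this chapter, $d = 100$, and during training the average is taken over the samples in each batch.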

%%%python
# Simulated data: normalized well log attributes
np.random.seed(42)
X_train = np.random.rand(5000, input_dim).astype("float32")
X_test = np.random.rand(1000, input_dim).astype("float32")

# Training: input = target (self-supervised)
history = autoencoder.fit(
    X_train, X_train,  # Input = Target!
    epochs=50,
    batch_size=64,
    shuffle=True,
    validation_split=0.2
)

print("\nFinal training loss:", round(history.history["loss"][-1], 5))
print("Final validation loss:", round(history.history["val_loss"][-1], 5))
%%%

Plotting the Reconstruction Loss Curve

The loss curve shows how well the auto-encoder learns to reconstruct its input over training. A steadily decreasing curve that plateaus indicates good convergence. If the validation loss starts to rise, or stalls, while the training loss keeps falling, the model is overfitting.

%%%python
plt.figure(figsize=(10, 5))
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.title("Auto-encoder Reconstruction Loss")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Check for overfitting
train_loss = history.history["loss"][-1]
val_loss = history.history["val_loss"][-1]
ratio = val_loss / train_loss
print(f"Val/Train loss ratio: {ratio:.2f}")
if ratio > 1.5:
    print("WARNING: Possible overfitting (ratio > 1.5)")
else:
    print("OK: No significant overfitting")
%%%
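A single end-of-training ratio can miss the point at which the validation loss started to climb. One complementary check, sketched below under the assumption that a sustained rise after the minimum signals overfitting, is to locate the best validation epoch and count how many epochs have passed since. The function name and the `patience` parameter are illustrative choices, not part of Keras.

```python
def best_epoch_report(val_losses, patience=5):
    """Locate the validation-loss minimum and flag a long run without improvement.

    patience is an illustrative tolerance: the number of epochs after the
    best epoch without a new minimum before overfitting is suspected.
    """
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    epochs_since_best = len(val_losses) - 1 - best_index
    return {
        "best_epoch": best_index + 1,  # 1-based, as in the Keras training log
        "best_val_loss": val_losses[best_index],
        "overfitting_suspected": epochs_since_best >= patience,
    }


# Hand-made validation curve: improves for three epochs, then rises
demo_val_losses = [0.090, 0.085, 0.083, 0.084, 0.086, 0.088, 0.091, 0.095]
report = best_epoch_report(demo_val_losses, patience=5)
print(report)
```

With a real run, `history.history["val_loss"]` can be passed in place of the hand-made list. Keras also provides the `EarlyStopping` callback, which applies the same idea during training rather than after it.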

Using the Encoder for Dimensionality Reduction

After training, the encoder portion maps high-dimensional data to the compact latent space. These latent features can be used for visualization, clustering, or as input to other classifiers.

%%%python
# Extract compressed representations
encoded_train = encoder.predict(X_train)  # shape: (5000, 10)
encoded_test = encoder.predict(X_test)    # shape: (1000, 10)

print("Original shape:  ", X_train.shape)
print("Compressed shape:", encoded_train.shape)
print("Compression ratio:", X_train.shape[1], "->", encoded_train.shape[1])

# Visualize latent space distributions (2 x 5 grid = one panel per latent dimension)
fig, axes = plt.subplots(2, 5, figsize=(18, 6))
for i, ax in enumerate(axes.flat):
    ax.hist(encoded_train[:, i], bins=30, color="steelblue", alpha=0.7)
    ax.set_title(f"Latent dim {i+1}")
    ax.set_ylabel("Count")
plt.suptitle("Distribution of Each Latent Dimension", fontsize=14)
plt.tight_layout()
plt.show()

# Use encoder output as features for classification.
# Left commented out: it assumes y_train holds class labels,
# which the simulated data above does not define.
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.model_selection import cross_val_score
# scores = cross_val_score(RandomForestClassifier(100), encoded_train, y_train, cv=5)
# print(f"RF on auto-encoder features: {scores.mean():.4f}")
%%%
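When reading the latent histograms, keep in mind that the bottleneck uses a ReLU activation, so a latent unit can end up outputting zero for every sample. Such a dimension shows up as a single spike at zero and carries no information. The sketch below flags these dimensions; `inactive_latent_dims` is a helper name introduced here, and the array is a synthetic stand-in for `encoder.predict(X_train)`.

```python
import numpy as np


def inactive_latent_dims(codes, tol=1e-8):
    """Return indices of latent dimensions that are (near) zero for every sample."""
    codes = np.asarray(codes)
    return np.flatnonzero(np.all(np.abs(codes) <= tol, axis=0))


# Synthetic stand-in for encoder output: 6 samples, 4 latent dimensions,
# where the third column never activates.
rng = np.random.default_rng(0)
toy_codes = rng.random((6, 4))
toy_codes[:, 2] = 0.0

dead = inactive_latent_dims(toy_codes)
print("Inactive latent dimensions:", dead.tolist())
```

If several dimensions are inactive, the effective size of the latent space is smaller than `encoding_dim`, which is worth knowing before using the codes as features.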

Evaluating Reconstruction Quality

%%%python
# Get reconstructions
reconstructed = autoencoder.predict(X_test)

# Overall MSE
mse = np.mean((X_test - reconstructed)**2)
print(f"Overall reconstruction MSE: {mse:.6f}")

# Per-sample reconstruction error
sample_errors = np.mean((X_test - reconstructed)**2, axis=1)
print(f"Mean per-sample error: {sample_errors.mean():.6f}")
print(f"Std per-sample error:  {sample_errors.std():.6f}")
print(f"Max per-sample error:  {sample_errors.max():.6f}")

# Compare original vs reconstructed for a single sample
sample_idx = 0
plt.figure(figsize=(14, 4))

plt.subplot(1, 3, 1)
plt.plot(X_test[sample_idx], linewidth=0.8)
plt.title("Original")
plt.ylabel("Value")

plt.subplot(1, 3, 2)
plt.plot(reconstructed[sample_idx], color="red", linewidth=0.8)
plt.title("Reconstructed")

plt.subplot(1, 3, 3)
plt.plot(X_test[sample_idx] - reconstructed[sample_idx], color="gray", linewidth=0.8)
plt.axhline(0, color="black", linewidth=0.5)
plt.title("Residual (Original - Reconstructed)")

plt.suptitle("Auto-encoder Reconstruction Quality")
plt.tight_layout()
plt.show()
%%%
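A reconstruction MSE is easier to interpret next to a trivial baseline. The sketch below, which assumes data generated the same way as the simulated `X_train` and `X_test` (uniform values in [0, 1)), computes the error of always predicting each feature's training mean. The `_demo` arrays are stand-ins created here so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-ins for X_train / X_test from the training section
X_train_demo = rng.random((5000, 100)).astype("float32")
X_test_demo = rng.random((1000, 100)).astype("float32")

# Baseline "model": always predict each feature's training mean
feature_means = X_train_demo.mean(axis=0)
baseline_mse = np.mean((X_test_demo - feature_means) ** 2)

print(f"Mean-prediction baseline MSE: {baseline_mse:.6f}")
print(f"Variance of Uniform(0, 1):    {1 / 12:.6f}")
```

For independent uniform features this baseline sits near 1/12, the variance of a uniform variable on [0, 1]. Because the simulated attributes are independent random numbers with no shared structure, we would not expect the auto-encoder to improve much on this value; on real, correlated well log or seismic attributes the gap should be larger.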

Denoising Auto-encoder

A denoising auto-encoder (DAE) deliberately adds noise to the input but trains the model to reconstruct the clean original. This forces the network to learn robust features rather than simply memorizing the identity function. In geoscience, DAEs are valuable for cleaning noisy well logs and seismic traces.

%%%python
# Add Gaussian noise to training data
noise_factor = 0.3
X_train_noisy = X_train + noise_factor * np.random.normal(
    loc=0.0, scale=1.0, size=X_train.shape
)
X_train_noisy = np.clip(X_train_noisy, 0.0, 1.0)  # keep in [0,1]

# Also create noisy test data
X_test_noisy = X_test + noise_factor * np.random.normal(
    loc=0.0, scale=1.0, size=X_test.shape
)
X_test_noisy = np.clip(X_test_noisy, 0.0, 1.0)

# Visualize clean vs noisy
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
ax1.plot(X_train[0], linewidth=0.8)
ax1.set_title("Clean Signal")
ax2.plot(X_train_noisy[0], linewidth=0.8, color="orange")
ax2.set_title(f"Noisy Signal (noise_factor={noise_factor})")
plt.suptitle("Well Log: Clean vs Noisy")
plt.tight_layout()
plt.show()

# Build a fresh DAE (same architecture)
dae_input = Input(shape=(input_dim,))
dae_enc = Dense(64, activation="relu")(dae_input)
dae_enc = Dense(32, activation="relu")(dae_enc)
dae_bn = Dense(encoding_dim, activation="relu")(dae_enc)
dae_dec = Dense(32, activation="relu")(dae_bn)
dae_dec = Dense(64, activation="relu")(dae_dec)
dae_out = Dense(input_dim, activation="sigmoid")(dae_dec)

denoiser = Model(dae_input, dae_out)
denoiser.compile(optimizer="adam", loss="mse")

# Train: noisy input -> clean target
history_dae = denoiser.fit(
    X_train_noisy, X_train,  # KEY: noisy input, clean target
    epochs=50,
    batch_size=64,
    validation_data=(X_test_noisy, X_test)
)

print("\nDAE trained to reconstruct clean data from noisy input")
%%%

Denoising Results Visualization

%%%python
# Denoise test data
X_denoised = denoiser.predict(X_test_noisy)

# Compare: noisy vs denoised vs original
sample_idx = 5
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

axes[0].plot(X_test_noisy[sample_idx], color="orange", linewidth=0.8)
axes[0].set_title("Noisy Input")
axes[0].set_ylabel("Value")

axes[1].plot(X_denoised[sample_idx], color="green", linewidth=0.8)
axes[1].set_title("Denoised Output")

axes[2].plot(X_test[sample_idx], color="blue", linewidth=0.8)
axes[2].set_title("Original (Ground Truth)")

plt.suptitle("Denoising Auto-encoder: Noisy -> Denoised -> Original")
plt.tight_layout()
plt.show()

# Quantitative comparison
mse_noisy = np.mean((X_test - X_test_noisy)**2)
mse_denoised = np.mean((X_test - X_denoised)**2)
print(f"MSE (noisy vs original):    {mse_noisy:.6f}")
print(f"MSE (denoised vs original): {mse_denoised:.6f}")
print(f"Noise reduction: {(1 - mse_denoised/mse_noisy)*100:.1f}%")
%%%
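One detail helps when reading the noisy-versus-original MSE: it will come out below `noise_factor**2`, because clipping to [0, 1] trims the largest noise excursions. The sketch below demonstrates this on synthetic stand-in data (`X_clean` is generated here and is not the chapter's `X_test`).

```python
import numpy as np

rng = np.random.default_rng(0)
noise_factor = 0.3

X_clean = rng.random((1000, 100))  # stand-in for X_test, values in [0, 1)
noise = noise_factor * rng.normal(size=X_clean.shape)

X_noisy_raw = X_clean + noise
X_noisy_clipped = np.clip(X_noisy_raw, 0.0, 1.0)

# Before clipping, the MSE estimates the noise variance noise_factor**2
mse_raw = np.mean((X_clean - X_noisy_raw) ** 2)
# Clipping can only move a value back toward the clean value, so the MSE shrinks
mse_clipped = np.mean((X_clean - X_noisy_clipped) ** 2)

print(f"noise_factor**2:     {noise_factor**2:.4f}")
print(f"MSE before clipping: {mse_raw:.4f}")
print(f"MSE after clipping:  {mse_clipped:.4f}")
```

The noise-reduction percentage is therefore measured against the clipped noise level, not against the nominal `noise_factor`.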

Anomaly Detection with Auto-encoders

Auto-encoders trained on "normal" data tend to reconstruct normal patterns well (low error) but reconstruct anomalous patterns poorly (high error). The per-sample reconstruction error can therefore serve as an anomaly score.

%%%python
# Compute reconstruction errors for all test samples
reconstructed = autoencoder.predict(X_test)
sample_errors = np.mean((X_test - reconstructed)**2, axis=1)

# Set threshold: mean + 3*std
# (covers about 99.7% of normal data only if the errors are roughly Gaussian)
mean_err = sample_errors.mean()
std_err = sample_errors.std()
threshold = mean_err + 3 * std_err

print(f"Mean reconstruction error: {mean_err:.6f}")
print(f"Std reconstruction error:  {std_err:.6f}")
print(f"Anomaly threshold (mean + 3*std): {threshold:.6f}")

# Flag anomalies
anomalies = sample_errors > threshold
print(f"\nAnomalies detected: {int(anomalies.sum())} / {len(anomalies)}")

# Visualize the error distribution
plt.figure(figsize=(10, 5))
plt.hist(sample_errors, bins=50, color="steelblue", alpha=0.7, edgecolor="black")
plt.axvline(threshold, color="red", linestyle="--", linewidth=2,
            label=f"Threshold = {threshold:.4f}")
plt.xlabel("Reconstruction Error (MSE)")
plt.ylabel("Count")
plt.title("Reconstruction Error Distribution - Anomaly Detection")
plt.legend()
plt.tight_layout()
plt.show()
%%%
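The mean-plus-three-standard-deviations rule corresponds to a fixed coverage only when the errors are close to Gaussian, whereas reconstruction errors are non-negative and often right-skewed. An alternative, sketched below, is to pick the threshold as a quantile of the errors on held-out normal data so that the false alarm rate is set directly. The function name, the `false_alarm_rate` parameter, and the gamma-distributed errors are all illustrative assumptions.

```python
import numpy as np


def quantile_threshold(normal_errors, false_alarm_rate=0.01):
    """Threshold that roughly false_alarm_rate of normal samples will exceed."""
    return float(np.quantile(normal_errors, 1.0 - false_alarm_rate))


# Synthetic, right-skewed stand-in for per-sample errors on held-out normal data
rng = np.random.default_rng(1)
normal_errors = rng.gamma(shape=4.0, scale=0.02, size=10_000)

thr = quantile_threshold(normal_errors, false_alarm_rate=0.01)
flagged_fraction = np.mean(normal_errors > thr)

print(f"Threshold: {thr:.4f}")
print(f"Fraction of normal data flagged: {flagged_fraction:.4f}")
```

In practice the errors passed in would come from normal data that the model did not train on, so that the threshold is not influenced by the anomalies it is meant to detect.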

Comparing Auto-encoder Embeddings with PCA

Both PCA and auto-encoders perform dimensionality reduction, but PCA is limited to linear projections while auto-encoders can learn non-linear mappings. Let us compare the quality of their embeddings.

%%%python
from sklearn.decomposition import PCA

# PCA reduction to same dimensionality
# Fit on the training data (as the auto-encoder was), then transform the test data
pca = PCA(n_components=encoding_dim)
pca.fit(X_train)
X_pca = pca.transform(X_test)

# Auto-encoder reduction
X_ae = encoder.predict(X_test)

print("Embedding comparison:")
print(f"  PCA shape:          {X_pca.shape}")
print(f"  Auto-encoder shape: {X_ae.shape}")

# Reconstruction quality comparison
# PCA reconstruction
X_pca_recon = pca.inverse_transform(X_pca)
mse_pca = np.mean((X_test - X_pca_recon)**2)

# AE reconstruction
X_ae_recon = autoencoder.predict(X_test)
mse_ae = np.mean((X_test - X_ae_recon)**2)

print("\nReconstruction MSE:")
print(f"  PCA:          {mse_pca:.6f}")
print(f"  Auto-encoder: {mse_ae:.6f}")
if mse_ae < mse_pca:
    print("  Auto-encoder achieves better reconstruction (non-linear advantage)")
else:
    print("  PCA achieves better reconstruction (may need more AE training)")

# PCA explained variance (measured on the training data)
print(f"\nPCA explained variance: {sum(pca.explained_variance_ratio_):.2%}")
%%%
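It can help to view PCA itself as a linear encoder and decoder pair. The sketch below builds one with NumPy's SVD on synthetic stand-in data (the array `X` and the value of `k` are chosen here for illustration) and checks that the reconstruction MSE on the fitted data equals the energy in the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 20))  # synthetic stand-in data
k = 5                      # number of components kept

mean = X.mean(axis=0)
Xc = X - mean
# Rows of Vt are the principal directions; s holds the singular values
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

codes = Xc @ Vt[:k].T            # linear "encoder": project onto the top-k directions
X_recon = codes @ Vt[:k] + mean  # linear "decoder": map back and restore the mean

mse_direct = np.mean((X - X_recon) ** 2)
mse_from_spectrum = np.sum(s[k:] ** 2) / X.size

print(f"Reconstruction MSE:             {mse_direct:.6f}")
print(f"From discarded singular values: {mse_from_spectrum:.6f}")
```

On the data it was fitted to, no other rank-k linear projection achieves a lower squared reconstruction error than PCA, so an auto-encoder has to exploit non-linear structure to do better at the same bottleneck size.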

Geoscience Applications

Well Log Denoising

Well log measurements contain noise from borehole conditions, tool calibration, and environmental effects. A denoising auto-encoder trained on clean log intervals can remove noise while preserving the geological signal. This is especially useful for resistivity and acoustic logs that are sensitive to borehole quality.

Seismic Anomaly Detection

Train an auto-encoder on a large volume of "normal" background seismic data. At monitoring time, new seismic traces with high reconstruction error indicate anomalous patterns — possibly injection-induced seismicity, unusual fault slip, or equipment malfunction. The threshold can be calibrated to control the false alarm rate.

Feature Learning for Classification

The encoder bottleneck provides a compact representation of the data that can be fed into traditional classifiers (Random Forest, SVM). This is especially useful when the original feature space is very high-dimensional (e.g., 500+ seismic attributes) and the auto-encoder learns to extract the most informative features.

References

Abadi, M., et al. (2016). TensorFlow: A system for large-scale machine learning. OSDI, 265–283.
Géron, A. (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (3rd ed.), ch. 17 (autoencoders, GANs, and diffusion models). O’Reilly.
Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning, ch. 14 (autoencoders). MIT Press.