Auto-encoder Practice
Learning objectives
- Build a simple auto-encoder in Keras using the functional API
- Trace through auto-encoder architecture and parameter counts
- Implement a denoising auto-encoder
- Interpret reconstruction results and latent space representations
- Apply auto-encoder embeddings for anomaly detection
- Compare auto-encoder features with PCA
Building an Auto-encoder with the Keras Functional API
The Keras functional API is well suited to auto-encoders because it lets us define sub-models, such as a standalone encoder, that reuse the same layers (and therefore the same weights) as the full model. This matters because after training we often want to use the encoder alone (for dimensionality reduction) or the decoder alone (for generation).
%%%python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Define dimensions
input_dim = 100     # e.g., 100 seismic attributes
encoding_dim = 10   # compress to 10 dimensions (10:1 compression)

# === ENCODER ===
# Input -> Dense(64, relu) -> Dense(32, relu) -> Dense(10, relu)
input_layer = Input(shape=(input_dim,))
encoded = Dense(64, activation="relu")(input_layer)
encoded = Dense(32, activation="relu")(encoded)
bottleneck = Dense(encoding_dim, activation="relu")(encoded)

# === DECODER ===
# Dense(32, relu) -> Dense(64, relu) -> Dense(100, sigmoid)
decoded = Dense(32, activation="relu")(bottleneck)
decoded = Dense(64, activation="relu")(decoded)
output_layer = Dense(input_dim, activation="sigmoid")(decoded)

# Full auto-encoder: input -> bottleneck -> output
autoencoder = Model(input_layer, output_layer)

# Separate encoder model (for extracting compressed features)
encoder = Model(input_layer, bottleneck)

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()
%%%
Understanding the Functional API
Input(shape=(input_dim,)): Creates a placeholder tensor for the input. Unlike Sequential, this does not add to a stack — it creates a named tensor node in the computation graph.
Dense(64, activation='relu')(input_layer): Creates a Dense layer and immediately applies it to input_layer. The first pair of parentheses constructs the layer; the second calls it on the input tensor.
Model(input_layer, output_layer): Defines a model by specifying its input and output tensors. Keras traces the computation graph between them automatically.
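encoder = Model(input_layer, bottleneck): Defines a sub-model from the input to the bottleneck. The encoder reuses the same layer objects as the full auto-encoder, so training the auto-encoder also updates the encoder's weights; the encoder does not need to be compiled or trained separately.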
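The text mentions using the decoder alone, but the model above only exposes an encoder sub-model. One possible way to get a standalone decoder is to keep references to the decoder layer objects and call them a second time on a new latent input. The sketch below assumes that approach; the names `dec_1`, `dec_2`, `dec_3`, `latent_input`, `decoder`, and `x_demo` are illustrative and do not appear in the original code.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_dim, encoding_dim = 100, 10

# Encoder path, same layer sizes as the model above.
input_layer = Input(shape=(input_dim,))
x = Dense(64, activation="relu")(input_layer)
x = Dense(32, activation="relu")(x)
bottleneck = Dense(encoding_dim, activation="relu")(x)

# Keep the decoder layers as objects so they can be called twice.
dec_1 = Dense(32, activation="relu")
dec_2 = Dense(64, activation="relu")
dec_3 = Dense(input_dim, activation="sigmoid")

output_layer = dec_3(dec_2(dec_1(bottleneck)))
autoencoder = Model(input_layer, output_layer)
encoder = Model(input_layer, bottleneck)

# Standalone decoder: the same three layers applied to a new latent input.
latent_input = Input(shape=(encoding_dim,))
decoder = Model(latent_input, dec_3(dec_2(dec_1(latent_input))))

# Sanity check on random data (hypothetical input): encoder followed by
# decoder should match the full auto-encoder, since all three models
# share the same layer objects.
x_demo = np.random.rand(4, input_dim).astype("float32")
roundtrip = decoder.predict(encoder.predict(x_demo, verbose=0), verbose=0)
direct = autoencoder.predict(x_demo, verbose=0)
print("Round trip matches full model:", np.allclose(roundtrip, direct, atol=1e-5))
```

Because the three models share layers, training `autoencoder` is enough; `encoder` and `decoder` pick up the trained weights without any copying.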
Architecture Walkthrough
The auto-encoder has a symmetric structure: Input(100) -> Dense(64, relu) -> Dense(32, relu) -> Dense(10, relu) -> Dense(32, relu) -> Dense(64, relu) -> Dense(100, sigmoid). The bottleneck forces a 10:1 compression, meaning the network must learn to represent 100 features using only 10 latent variables. The sigmoid output layer assumes the inputs have been scaled to the range [0, 1].
%%%python
# Parameter count walkthrough
# Each Dense layer has n_in * n_out weights plus n_out biases.
print("Layer-by-layer parameter count:")
print(f"  Input(100) -> Dense(64):  100*64+64  = {100*64+64}")
print(f"  Dense(64)  -> Dense(32):  64*32+32   = {64*32+32}")
print(f"  Dense(32)  -> Dense(10):  32*10+10   = {32*10+10}")
print(f"  Dense(10)  -> Dense(32):  10*32+32   = {10*32+32}")
print(f"  Dense(32)  -> Dense(64):  32*64+64   = {32*64+64}")
print(f"  Dense(64)  -> Dense(100): 64*100+100 = {64*100+100}")
total = (100*64+64) + (64*32+32) + (32*10+10) + (10*32+32) + (32*64+64) + (64*100+100)
print(f"  Total: {total}")

print(f"\nCompression ratio: {input_dim}:{encoding_dim} = {input_dim//encoding_dim}:1")
print(f"Bottleneck captures {encoding_dim} latent features from {input_dim} inputs")
%%%
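The same count can be written once for any stack of Dense layers, which makes it easy to see how much of the model belongs to the encoder alone. This is a small sketch; `dense_param_counts` is a hypothetical helper name, not a Keras function.

```python
def dense_param_counts(layer_sizes):
    """Parameters per Dense layer: n_in * n_out weights plus n_out biases."""
    return [n_in * n_out + n_out
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Layer widths of the auto-encoder, from input to output.
sizes = [100, 64, 32, 10, 32, 64, 100]
counts = dense_param_counts(sizes)

total_params = sum(counts)
encoder_params = sum(counts[:3])   # the first three Dense layers form the encoder

print("Per-layer counts:", counts)
print("Total parameters:", total_params)
print("Encoder-only parameters:", encoder_params)
```

The total should agree with the figure reported by `autoencoder.summary()` or `autoencoder.count_params()` for the model defined earlier.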
Training the Auto-encoder
The auto-encoder is self-supervised: the target is the input itself. The model learns to compress and reconstruct by minimizing the reconstruction error (MSE).
%%%python
# Simulated data: uniform random values in [0, 1) standing in for
# normalized well log attributes. The features are independent noise,
# so expect a relatively high reconstruction error on this toy data.
np.random.seed(42)
X_train = np.random.rand(5000, input_dim).astype("float32")
X_test = np.random.rand(1000, input_dim).astype("float32")

# Training: input = target (self-supervised)
history = autoencoder.fit(
    X_train, X_train,        # Input = Target!
    epochs=50,
    batch_size=64,
    shuffle=True,
    validation_split=0.2
)

print("\nFinal training loss:", round(history.history["loss"][-1], 5))
print("Final validation loss:", round(history.history["val_loss"][-1], 5))
%%%
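To make "input = target" concrete, the following NumPy sketch trains a much simpler model by hand: a linear auto-encoder with no biases or activations, updated by plain gradient descent on the reconstruction MSE. It is a simplification for illustration, not the Keras model above, and the synthetic data (three hidden factors plus small noise) and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_latent = 500, 20, 3

# Synthetic data driven by 3 hidden factors plus small noise (an assumption
# chosen so that a 3-unit bottleneck can reconstruct it well).
factors = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = factors @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))

# Linear encoder and decoder weights, no biases or activations.
W_enc = 0.1 * rng.normal(size=(n_features, n_latent))
W_dec = 0.1 * rng.normal(size=(n_latent, n_features))

learning_rate = 0.05
losses = []
for step in range(1000):
    latent = X @ W_enc            # encode: (n_samples, n_latent)
    X_hat = latent @ W_dec        # decode: (n_samples, n_features)
    error = X_hat - X             # the target is the input itself
    losses.append(float(np.mean(error ** 2)))

    # Gradients of the mean squared error with respect to each weight matrix.
    grad_X_hat = 2.0 * error / error.size
    grad_W_dec = latent.T @ grad_X_hat
    grad_W_enc = X.T @ (grad_X_hat @ W_dec.T)

    W_dec -= learning_rate * grad_W_dec
    W_enc -= learning_rate * grad_W_enc

print(f"MSE at first step: {losses[0]:.4f}")
print(f"MSE at last step:  {losses[-1]:.4f}")
```

No labels appear anywhere in the loop: the only supervision signal is the difference between `X_hat` and `X`, which is what `autoencoder.fit(X_train, X_train, ...)` minimizes with a deeper, non-linear model and the Adam optimizer.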
Plotting the Reconstruction Loss Curve
The loss curve shows how well the auto-encoder learns to reconstruct its input over training. A curve that decreases steadily and then plateaus indicates convergence. If the validation loss starts to rise while the training loss keeps falling, the model is overfitting.
%%%python
plt.figure(figsize=(10, 5))
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.title("Auto-encoder Reconstruction Loss")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Check for overfitting
train_loss = history.history["loss"][-1]
val_loss = history.history["val_loss"][-1]
ratio = val_loss / train_loss
print(f"Val/Train loss ratio: {ratio:.2f}")
if ratio > 1.5:
    print("WARNING: Possible overfitting (ratio > 1.5)")
else:
    print("OK: No significant overfitting")
%%%
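The final-epoch ratio only looks at one point on the curve. A complementary check is to find the epoch where the validation loss was lowest and see whether it failed to improve for several epochs afterwards. The sketch below is plain Python; `find_stop_epoch` and the example loss values are hypothetical and chosen only to illustrate the logic.

```python
def find_stop_epoch(val_losses, patience=5):
    """Return (best_epoch, stop_epoch) for a list of validation losses.

    best_epoch is the index of the lowest validation loss seen before stopping.
    stop_epoch is the first epoch at which `patience` epochs have passed
    without improvement, or the last epoch if that never happens.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up validation curve: improves until epoch 3, then degrades.
example_val = [1.0, 0.8, 0.6, 0.5, 0.52, 0.55, 0.6, 0.7]
best_epoch, stop_epoch = find_stop_epoch(example_val, patience=3)
print("Best epoch:", best_epoch, "| would stop at epoch:", stop_epoch)
```

With a real run, `history.history["val_loss"]` can be passed in place of `example_val`. Keras also provides an `EarlyStopping` callback (with `monitor`, `patience`, and `restore_best_weights` arguments) that applies the same idea during training.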
Using the Encoder for Dimensionality Reduction
After training, the encoder portion maps high-dimensional data to the compact latent space. These latent features can be used for visualization, clustering, or as input to other classifiers.
%%%python
# Extract compressed representations
encoded_train = encoder.predict(X_train)   # shape: (5000, 10)
encoded_test = encoder.predict(X_test)     # shape: (1000, 10)

print("Original shape:  ", X_train.shape)
print("Compressed shape:", encoded_train.shape)
print("Compression ratio:", X_train.shape[1], "->", encoded_train.shape[1])

# Visualize latent space distributions
fig, axes = plt.subplots(2, 5, figsize=(18, 6))
for i, ax in enumerate(axes.flat):
    ax.hist(encoded_train[:, i], bins=30, color="steelblue", alpha=0.7)
    ax.set_title(f"Latent dim {i+1}")
    ax.set_ylabel("Count")
plt.suptitle("Distribution of Each Latent Dimension", fontsize=14)
plt.tight_layout()
plt.show()

# Use encoder output as features for classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assuming y_train has class labels (not defined in this tutorial):
# scores = cross_val_score(RandomForestClassifier(n_estimators=100), encoded_train, y_train, cv=5)
# print(f"RF on auto-encoder features: {scores.mean():.4f}")
%%%
Evaluating Reconstruction Quality
%%%python
# Get reconstructions
reconstructed = autoencoder.predict(X_test)

# Overall MSE
mse = np.mean((X_test - reconstructed)**2)
print(f"Overall reconstruction MSE: {mse:.6f}")

# Per-sample reconstruction error
sample_errors = np.mean((X_test - reconstructed)**2, axis=1)
print(f"Mean per-sample error: {sample_errors.mean():.6f}")
print(f"Std per-sample error:  {sample_errors.std():.6f}")
print(f"Max per-sample error:  {sample_errors.max():.6f}")

# Compare original vs reconstructed for a single sample
sample_idx = 0
plt.figure(figsize=(14, 4))

plt.subplot(1, 3, 1)
plt.plot(X_test[sample_idx], linewidth=0.8)
plt.title("Original")
plt.ylabel("Value")

plt.subplot(1, 3, 2)
plt.plot(reconstructed[sample_idx], color="red", linewidth=0.8)
plt.title("Reconstructed")

plt.subplot(1, 3, 3)
plt.plot(X_test[sample_idx] - reconstructed[sample_idx], color="gray", linewidth=0.8)
plt.axhline(0, color="black", linewidth=0.5)
plt.title("Residual (Original - Reconstructed)")

plt.suptitle("Auto-encoder Reconstruction Quality")
plt.tight_layout()
plt.show()
%%%
Denoising Auto-encoder
A denoising auto-encoder (DAE) deliberately adds noise to the input but trains the model to reconstruct the clean original. This forces the network to learn robust features rather than simply memorizing the identity function. In geoscience, DAEs are valuable for cleaning noisy well logs and seismic traces.
%%%python
# Add Gaussian noise to training data
noise_factor = 0.3
X_train_noisy = X_train + noise_factor * np.random.normal(
    loc=0.0, scale=1.0, size=X_train.shape
)
X_train_noisy = np.clip(X_train_noisy, 0.0, 1.0)   # keep in [0, 1]

# Also create noisy test data
X_test_noisy = X_test + noise_factor * np.random.normal(
    loc=0.0, scale=1.0, size=X_test.shape
)
X_test_noisy = np.clip(X_test_noisy, 0.0, 1.0)

# Visualize clean vs noisy
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
ax1.plot(X_train[0], linewidth=0.8)
ax1.set_title("Clean Signal")
ax2.plot(X_train_noisy[0], linewidth=0.8, color="orange")
ax2.set_title(f"Noisy Signal (noise_factor={noise_factor})")
plt.suptitle("Well Log: Clean vs Noisy")
plt.tight_layout()
plt.show()

# Build a fresh DAE (same architecture)
dae_input = Input(shape=(input_dim,))
dae_enc = Dense(64, activation="relu")(dae_input)
dae_enc = Dense(32, activation="relu")(dae_enc)
dae_bn = Dense(encoding_dim, activation="relu")(dae_enc)
dae_dec = Dense(32, activation="relu")(dae_bn)
dae_dec = Dense(64, activation="relu")(dae_dec)
dae_out = Dense(input_dim, activation="sigmoid")(dae_dec)

denoiser = Model(dae_input, dae_out)
denoiser.compile(optimizer="adam", loss="mse")

# Train: noisy input -> clean target
history_dae = denoiser.fit(
    X_train_noisy, X_train,   # KEY: noisy input, clean target
    epochs=50,
    batch_size=64,
    validation_data=(X_test_noisy, X_test)
)

print("\nDAE trained to reconstruct clean data from noisy input")
%%%
Denoising Results Visualization
%%%python
# Denoise test data
X_denoised = denoiser.predict(X_test_noisy)

# Compare: noisy vs denoised vs original
sample_idx = 5
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

axes[0].plot(X_test_noisy[sample_idx], color="orange", linewidth=0.8)
axes[0].set_title("Noisy Input")
axes[0].set_ylabel("Value")

axes[1].plot(X_denoised[sample_idx], color="green", linewidth=0.8)
axes[1].set_title("Denoised Output")

axes[2].plot(X_test[sample_idx], color="blue", linewidth=0.8)
axes[2].set_title("Original (Ground Truth)")

plt.suptitle("Denoising Auto-encoder: Noisy -> Denoised -> Original")
plt.tight_layout()
plt.show()

# Quantitative comparison
mse_noisy = np.mean((X_test - X_test_noisy)**2)
mse_denoised = np.mean((X_test - X_denoised)**2)
print(f"MSE (noisy vs original):    {mse_noisy:.6f}")
print(f"MSE (denoised vs original): {mse_denoised:.6f}")
print(f"Noise reduction: {(1 - mse_denoised/mse_noisy)*100:.1f}%")
%%%
Anomaly Detection with Auto-encoders
Auto-encoders trained on "normal" data will reconstruct normal patterns well (low error) but fail to reconstruct anomalous patterns (high error). The reconstruction error serves as an anomaly score.
%%%python
# Compute reconstruction errors for all test samples
reconstructed = autoencoder.predict(X_test)
sample_errors = np.mean((X_test - reconstructed)**2, axis=1)

# Set threshold: mean + 3*std.
# If the errors were Gaussian, only about 0.13% of normal samples would
# exceed it. Real error distributions are often right-skewed, so treat
# this as a heuristic starting point.
mean_err = sample_errors.mean()
std_err = sample_errors.std()
threshold = mean_err + 3 * std_err

print(f"Mean reconstruction error: {mean_err:.6f}")
print(f"Std reconstruction error:  {std_err:.6f}")
print(f"Anomaly threshold (mean + 3*std): {threshold:.6f}")

# Flag anomalies
anomalies = sample_errors > threshold
print(f"\nAnomalies detected: {anomalies.sum()} / {len(anomalies)}")

# Visualize the error distribution
plt.figure(figsize=(10, 5))
plt.hist(sample_errors, bins=50, color="steelblue", alpha=0.7, edgecolor="black")
plt.axvline(threshold, color="red", linestyle="--", linewidth=2,
            label=f"Threshold = {threshold:.4f}")
plt.xlabel("Reconstruction Error (MSE)")
plt.ylabel("Count")
plt.title("Reconstruction Error Distribution - Anomaly Detection")
plt.legend()
plt.tight_layout()
plt.show()
%%%
Comparing Auto-encoder Embeddings with PCA
Both PCA and auto-encoders perform dimensionality reduction, but PCA is limited to linear projections while auto-encoders can learn non-linear mappings. Let us compare the quality of their embeddings.
%%%python
from sklearn.decomposition import PCA

# PCA reduction to same dimensionality.
# Fit on the training set so that both methods are evaluated on held-out data.
pca = PCA(n_components=encoding_dim)
pca.fit(X_train)
X_pca = pca.transform(X_test)

# Auto-encoder reduction
X_ae = encoder.predict(X_test)

print("Embedding comparison:")
print(f"  PCA shape:          {X_pca.shape}")
print(f"  Auto-encoder shape: {X_ae.shape}")

# Reconstruction quality comparison
# PCA reconstruction
X_pca_recon = pca.inverse_transform(X_pca)
mse_pca = np.mean((X_test - X_pca_recon)**2)

# AE reconstruction
X_ae_recon = autoencoder.predict(X_test)
mse_ae = np.mean((X_test - X_ae_recon)**2)

print("\nReconstruction MSE:")
print(f"  PCA:          {mse_pca:.6f}")
print(f"  Auto-encoder: {mse_ae:.6f}")
if mse_ae < mse_pca:
    print("  Auto-encoder achieves better reconstruction (non-linear advantage)")
else:
    print("  PCA achieves better reconstruction (may need more AE training)")

# PCA explained variance (measured on the training data)
print(f"\nPCA explained variance: {sum(pca.explained_variance_ratio_):.2%}")
%%%
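The limitation of linear projections is easiest to see on data that lies on a curved, low-dimensional shape. The NumPy sketch below uses points on a unit circle, which are fully described by one number (the angle) but cannot be captured by one linear component. The hand-written angle encoding stands in for the kind of non-linear mapping an auto-encoder could learn; it is an illustrative assumption, not a trained model.

```python
import numpy as np

# Points on a unit circle: 2 features, but only 1 intrinsic degree of freedom.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])

# Linear reduction to 1 dimension: PCA computed via SVD of the centered data.
mean = X.mean(axis=0)
X_centered = X - mean
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
component = Vt[:1]                              # top principal direction
code_linear = X_centered @ component.T          # 1-D linear code
X_linear = code_linear @ component + mean       # linear reconstruction
mse_linear = np.mean((X - X_linear) ** 2)

# Non-linear reduction to 1 dimension: encode each point by its angle.
code_angle = np.arctan2(X[:, 1], X[:, 0])       # 1-D non-linear code
X_nonlinear = np.column_stack([np.cos(code_angle), np.sin(code_angle)])
mse_nonlinear = np.mean((X - X_nonlinear) ** 2)

print(f"1-D linear (PCA) reconstruction MSE: {mse_linear:.4f}")
print(f"1-D non-linear (angle) reconstruction MSE: {mse_nonlinear:.2e}")
```

Both codes use a single number per sample, yet the linear one discards half of the variance (an MSE of about 0.25 here) while the angle code reconstructs the points essentially exactly. On data without such curved structure, including the uniform random data used in this tutorial, the gap between PCA and an auto-encoder can be small or even reversed.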
Geoscience Applications
Well Log Denoising
Well log measurements contain noise from borehole conditions, tool calibration, and environmental effects. A denoising auto-encoder trained on clean log intervals can remove noise while preserving the geological signal. This is especially useful for resistivity and acoustic logs that are sensitive to borehole quality.
Seismic Anomaly Detection
Train an auto-encoder on a large volume of "normal" background seismic data. At monitoring time, new seismic traces with high reconstruction error indicate anomalous patterns — possibly injection-induced seismicity, unusual fault slip, or equipment malfunction. The threshold can be calibrated to control the false alarm rate.
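One way to calibrate the threshold for a target false alarm rate is to take an upper quantile of the reconstruction errors measured on held-out normal data. The sketch below assumes that approach; `calibrate_threshold` is a hypothetical helper, and the gamma-distributed values are a stand-in for real reconstruction errors, chosen only because they are positive and right-skewed.

```python
import numpy as np

def calibrate_threshold(normal_errors, false_alarm_rate=0.01):
    """Threshold such that roughly `false_alarm_rate` of normal errors exceed it."""
    return float(np.quantile(normal_errors, 1.0 - false_alarm_rate))

# Stand-in for per-sample reconstruction errors on normal data (assumption).
rng = np.random.default_rng(0)
normal_errors = rng.gamma(shape=2.0, scale=0.01, size=10_000)

target_rate = 0.01
quantile_threshold = calibrate_threshold(normal_errors, target_rate)
quantile_rate = float(np.mean(normal_errors > quantile_threshold))

# For comparison: the mean + 3*std rule on the same errors.
sigma_threshold = float(normal_errors.mean() + 3.0 * normal_errors.std())
sigma_rate = float(np.mean(normal_errors > sigma_threshold))

print(f"Quantile threshold: {quantile_threshold:.4f} "
      f"(observed false alarm rate {quantile_rate:.4f})")
print(f"Mean + 3*std threshold: {sigma_threshold:.4f} "
      f"(observed false alarm rate {sigma_rate:.4f})")
```

The quantile rule hits the requested rate on the calibration data by construction, whereas the rate produced by mean + 3*std depends on the shape of the error distribution. In practice the calibration errors should come from normal data that was not used for training.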
Feature Learning for Classification
The encoder bottleneck provides a compact representation of the data that can be fed into traditional classifiers (Random Forest, SVM). This is especially useful when the original feature space is very high-dimensional (e.g., 500+ seismic attributes) and the auto-encoder learns to extract the most informative features.