Denoising with CNNs

Part 9, Machine Learning in Processing

Learning objectives

Describe the U-Net architecture commonly used for CNN seismic denoising
Contrast supervised (N2Clean) and self-supervised (N2N) training regimes
Explain the amplitude-preservation trade-off CNN denoisers make
Identify the failure modes that QC must catch

CNN-based denoising is the most widely deployed ML technique in seismic processing. A convolutional network ingests a noisy 2D patch (trace × offset or trace × time) and outputs a "clean" version. Trained on millions of patches with known ground truth, the network learns a signal manifold and projects noisy inputs onto it. On data contaminated by random noise, coherent noise, and near-surface multiples, modern CNN denoisers routinely outperform f-x deconvolution by 3-5 dB in SNR while running 10-100× faster.
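To put the 3-5 dB figure in perspective: a 3 dB gain means the residual noise power is roughly halved, and a 5 dB gain means it drops by a factor of about 3.2. The sketch below computes SNR in decibels for a synthetic trace. The sinusoidal "signal" and the denoiser that leaves half the noise amplitude behind are illustrative assumptions, not a real network.

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR of `estimate` measured against the known `clean` trace, in dB."""
    residual = estimate - clean
    return 10.0 * np.log10(np.sum(clean**2) / np.sum(residual**2))

def noise_power_ratio(gain_db):
    """Fraction of noise power left after an SNR gain of `gain_db`."""
    return 10.0 ** (-gain_db / 10.0)

rng = np.random.default_rng(0)
t = np.arange(500)
clean = np.sin(2.0 * np.pi * t / 50.0)       # assumed toy signal
noise = rng.normal(0.0, 0.5, size=t.size)    # assumed random noise

snr_in = snr_db(clean, clean + noise)
# Hypothetical denoiser that halves the noise amplitude.
snr_out = snr_db(clean, clean + 0.5 * noise)
gain = snr_out - snr_in                      # equals 20*log10(2), about 6 dB

print(f"gain: {gain:.2f} dB")
print(f"noise power left after 3 dB: {noise_power_ratio(3.0):.2f}")
print(f"noise power left after 5 dB: {noise_power_ratio(5.0):.2f}")
```

This definition needs the clean trace, so on field data it can only be evaluated against a proxy such as a high-fold stack.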

1. U-Net, the workhorse architecture

Nearly all production seismic denoisers are based on U-Net: an encoder-decoder CNN with skip connections between corresponding encoder/decoder layers. The architecture:

Encoder: 3-5 blocks, each with (conv → ReLU → conv → ReLU → downsample). Captures progressively coarser features.
Bottleneck: 1-2 conv blocks at the coarsest resolution. Represents global context.
Decoder: mirror of encoder, with upsample and concat from the corresponding encoder layer (skip connection). Reconstructs fine detail.
Final layer: 1×1 conv mapping to the output channels (usually a single channel for grayscale seismic patches).

Typical size: 5-10 million trainable parameters. Patch size: 64×64 or 128×128 samples. Loss: L1 or L2 between the prediction and the clean ground truth.
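As a rough check on the parameter budget, the following sketch counts the weights and biases of a plain U-Net. The layout is an assumption for illustration: 3×3 convolutions, channel width doubling at each encoder level, a 2×2 transposed convolution for upsampling, and no normalisation layers. Production networks differ in these details.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of one k x k convolution."""
    return k * k * c_in * c_out + c_out

def unet_param_count(in_ch=1, out_ch=1, base=32, depth=4):
    """Count parameters of a plain U-Net (assumed layout, see text)."""
    total = 0
    c_in = in_ch
    widths = [base * 2**i for i in range(depth)]
    # Encoder: two 3x3 convs per block, then a parameter-free downsample.
    for c in widths:
        total += conv_params(c_in, c, 3) + conv_params(c, c, 3)
        c_in = c
    # Bottleneck: two 3x3 convs at the coarsest resolution.
    c_mid = 2 * widths[-1]
    total += conv_params(c_in, c_mid, 3) + conv_params(c_mid, c_mid, 3)
    c_in = c_mid
    # Decoder: learned 2x2 upsample, concat with the skip, two 3x3 convs.
    for c in reversed(widths):
        total += conv_params(c_in, c, 2)    # transposed conv upsample
        total += conv_params(2 * c, c, 3)   # input = upsampled + skip
        total += conv_params(c, c, 3)
        c_in = c
    # Final 1x1 conv to the output channels.
    total += conv_params(c_in, out_ch, 1)
    return total

n_params = unet_param_count()
print(f"{n_params / 1e6:.2f} M parameters")
```

With four encoder blocks and a base width of 32 channels this comes to about 7.8 million parameters, inside the 5-10 million range quoted above. Most of the budget sits in the bottleneck and the deepest decoder block, where the channel counts are largest.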

The widget stacks four traces: the clean reference (teal), a noisy input contaminated with both random noise and a coherent low-frequency swell component (yellow), the output of a classical 5-tap moving-average filter (orange), and a simulated CNN output (pink). Each trace is labelled, and the info strip reports all three SNRs. The moving average modestly suppresses high-frequency random noise but does nothing against the coherent low-frequency swell. The CNN-like denoiser "sees" that the swell is not valid signal (because it was not part of its training manifold) and removes it while preserving the reflectors.

Real CNN denoisers are not implemented in-widget (a trained network requires gigabytes of data); this widget shows the outcome of training, which is what a practitioner cares about.
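The behaviour of the classical filter can be reproduced in a few lines. This is a sketch under assumed values (wavelet shape, swell frequency, noise levels), not the widget's own code. It measures how much of each noise component survives a 5-tap moving average.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
t = np.arange(n)

# Assumed clean trace: three reflectors convolved with a Ricker-style wavelet.
tau = np.arange(-20, 21)
a = (np.pi * 0.06 * tau) ** 2
wavelet = (1.0 - 2.0 * a) * np.exp(-a)
reflectivity = np.zeros(n)
reflectivity[[200, 500, 800]] = [1.0, -0.7, 0.5]
clean = np.convolve(reflectivity, wavelet, mode="same")

# Assumed noise: low-frequency swell plus white random noise.
swell = 0.8 * np.sin(2.0 * np.pi * 0.005 * t)
random_noise = rng.normal(0.0, 0.2, size=n)
noisy = clean + swell + random_noise

kernel = np.ones(5) / 5.0

def smooth(x):
    """5-tap moving average, same length as the input."""
    return np.convolve(x, kernel, mode="same")

def rms(x):
    """RMS over the interior samples, avoiding filter edge effects."""
    return np.sqrt(np.mean(x[10:-10] ** 2))

# The filter is linear, so each component can be tracked separately.
swell_kept = rms(smooth(swell)) / rms(swell)
random_kept = rms(smooth(random_noise)) / rms(random_noise)
filtered = smooth(noisy)

print(f"swell amplitude kept:  {swell_kept:.3f}")
print(f"random amplitude kept: {random_kept:.3f}")
```

The random component drops to roughly 1/√5 of its amplitude, while the swell passes almost untouched because its period is far longer than the five-sample window.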

3. Training regimes

N2Clean (supervised). Train on pairs (noisy, clean). Requires clean ground truth, which is rare in real seismic. The clean target usually comes from high-fold stacks, carefully processed "hero" datasets, or synthetic models. Gold standard when available.
N2N (Noise2Noise, self-supervised). Train on pairs (noisy_A, noisy_B), two independent noisy realisations of the same underlying signal. The network cannot learn to reproduce the noise (it differs between A and B), so it converges on the shared signal, provided the noise is zero-mean and independent between the two realisations. Works when two realisations exist (two surveys over the same location, two halves of a single long record). Lehtinen et al. (2018), Mousavi et al. (2018).
N2Self / N2Void (blind-spot). Train on noisy data alone with a blind-spot masking strategy: the network sees a masked pixel's neighbours but not the pixel itself, forcing it to predict from context. Useful when no paired data exists.
GAN-based. An adversarial discriminator pushes the denoiser output toward the distribution of real clean seismic. Higher perceptual quality but sometimes invents plausible-looking but wrong details.
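The Noise2Noise argument can be checked without a neural network. The sketch below swaps the CNN for a 9-tap linear filter fitted by least squares, which is an assumption made purely to keep the example small. It fits the filter once against clean targets and once against a second noisy realisation, then compares both against the clean signal.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, width, sigma = 20000, 9, 0.5
t = np.arange(width)

# Assumed toy signal: short sinusoid windows with random phase.
phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1))
clean = np.sin(2.0 * np.pi * 0.1 * t + phase)

# Two independent, zero-mean noisy realisations of the same signal.
noisy_a = clean + rng.normal(0.0, sigma, clean.shape)
noisy_b = clean + rng.normal(0.0, sigma, clean.shape)

centre = width // 2
# Supervised (N2Clean): target is the clean centre sample.
w_sup, *_ = np.linalg.lstsq(noisy_a, clean[:, centre], rcond=None)
# Noise2Noise: same input, target is the other noisy realisation.
w_n2n, *_ = np.linalg.lstsq(noisy_a, noisy_b[:, centre], rcond=None)

def mse_vs_clean(w):
    """Mean squared error of the filtered output against the clean signal."""
    return np.mean((noisy_a @ w - clean[:, centre]) ** 2)

mse_noisy = np.mean((noisy_a[:, centre] - clean[:, centre]) ** 2)
mse_sup = mse_vs_clean(w_sup)
mse_n2n = mse_vs_clean(w_n2n)

print(f"noisy input: {mse_noisy:.4f}")
print(f"supervised:  {mse_sup:.4f}")
print(f"noise2noise: {mse_n2n:.4f}")
```

With enough training pairs the two filters are nearly identical, because the independent noise in the target averages out of the L2 fit. If the noise were correlated between A and B (for example, the same coherent swell in both), the Noise2Noise filter would learn to keep it.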

4. Amplitude fidelity, the QI caveat

CNN denoisers model a typical seismic signal distribution. An AVO anomaly (high amplitude at a reservoir) is by definition atypical. Over-aggressive CNN denoising can attenuate AVO anomalies as "outliers" from the signal manifold. Several mitigations:

Residual-domain denoising. Apply the CNN to the residual (noisy − classically denoised) and add its output back to the classically denoised data. The residual has limited dynamic range, so any amplitude distortion the CNN introduces is bounded.
Amplitude constraint in the loss. Penalise absolute amplitude changes of reflectors during training, not just pixel-wise error.
Post-hoc amplitude calibration. Scale CNN output by a per-trace RMS-matching factor to restore absolute amplitudes.
Section-by-section QC. Run the constant-R test (§7.2) on the denoised data. If it is no longer flat, the CNN is distorting amplitudes.
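Post-hoc calibration from the list above can be sketched as a per-trace RMS match. The function name, the (traces, samples) array layout, and the idea of measuring RMS inside a window where signal dominates are assumptions for illustration. Matching against a noisy window would bias the scale factor upward, because the reference RMS would include noise power.

```python
import numpy as np

def rms(x):
    """Per-trace RMS for an array shaped (n_traces, n_samples)."""
    return np.sqrt(np.mean(x**2, axis=1, keepdims=True))

def rms_calibrate(denoised, reference, window, eps=1e-12):
    """Scale each denoised trace so its RMS inside `window` matches the
    reference. `window` is a slice of sample indices assumed to be
    signal-dominated (hypothetical choice, e.g. around a strong horizon)."""
    scale = rms(reference[:, window]) / (rms(denoised[:, window]) + eps)
    return denoised * scale, scale

rng = np.random.default_rng(3)
reference = rng.normal(size=(4, 200))   # stand-in for trusted amplitudes
denoised = 0.8 * reference              # denoiser that dims everything by 20%

calibrated, scale = rms_calibrate(denoised, reference, slice(50, 150))
print(scale.ravel())                    # one factor per trace
```

A single scalar per trace restores the overall level but cannot undo a distortion that varies with time or offset, so it does not replace the constant-R test.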

5. Common failure modes

Over-smoothing. The CNN is too aggressive and removes legitimate signal it considers anomalous. Symptoms: output looks beautiful but shallow reservoirs (bright spots) are flattened. Fix: reduce denoising strength or reduce network depth.
Hallucination. The network synthesises plausible-looking but false signal where only noise existed. Symptoms: output shows pristine reflectors in zones where the data had no actual signal. Fix: check uncertainty metrics (Monte Carlo dropout) or switch to a less aggressive architecture.
Domain shift. A model trained on North Sea data is applied to Gulf of Mexico data. Symptoms: inconsistent performance, poor QC on the new survey. Fix: fine-tune with a subset of target-survey data.
Artifacts at patch boundaries. The network processes 64×64 patches; small inconsistencies at patch boundaries become visible as a grid pattern. Fix: overlap patches and blend in the overlap zone.
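The overlap-and-blend fix for patch-boundary artifacts can be sketched as weighted accumulation. The taper shape, stride, and function names below are illustrative assumptions. `denoise_fn` stands in for the trained network and is replaced by an identity function in the demo.

```python
import numpy as np

def patch_starts(n, patch, stride):
    """Start indices covering [0, n), with a last patch flush to the edge."""
    starts = list(range(0, n - patch + 1, stride))
    if starts[-1] != n - patch:
        starts.append(n - patch)
    return starts

def blend_patches(data, denoise_fn, patch=64, stride=32):
    """Apply denoise_fn to overlapping patches and blend with a taper.
    Assumes both dimensions of `data` are at least `patch` samples."""
    n_rows, n_cols = data.shape
    # Separable taper; the small floor keeps edge weights strictly positive.
    w1d = np.hanning(patch) + 1e-3
    weight = np.outer(w1d, w1d)
    out = np.zeros_like(data, dtype=float)
    norm = np.zeros_like(data, dtype=float)
    for r in patch_starts(n_rows, patch, stride):
        for c in patch_starts(n_cols, patch, stride):
            tile = denoise_fn(data[r:r + patch, c:c + patch])
            out[r:r + patch, c:c + patch] += weight * tile
            norm[r:r + patch, c:c + patch] += weight
    # Normalise by the accumulated weight at every sample.
    return out / norm

rng = np.random.default_rng(4)
section = rng.normal(size=(150, 200))
# Identity "network": blending must return the input unchanged.
blended = blend_patches(section, lambda tile: tile)
```

Because each output sample is a weighted average of every patch that covers it, small disagreements between neighbouring patches fade smoothly instead of forming a visible grid.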

6. Typical deployment

Training. Build a synthetic dataset (millions of synthetic traces + noise models); train U-Net for ~24-48 GPU-hours.
Fine-tuning. Take a small labelled subset from the target survey; fine-tune the pre-trained network (~2-4 GPU-hours).
Application. Slide the trained network over the field data in overlapping patches; blend outputs.
QC. Amplitude check, spectral check, and visual inspection of the difference volume (noisy − denoised). Any coherent reflector energy left in the difference means the denoiser has removed signal and counts as a failure.
Calibration. Match output RMS to input at selected reference horizons; this preserves absolute amplitude for downstream QI.
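One simple way to put a number on the difference-volume check is to correlate the removed energy with the denoised output, trace by trace. This metric and its thresholds are assumptions for illustration, not a standard prescribed by this course. Values near zero suggest the network removed only noise, while a clearly positive value suggests it also removed signal.

```python
import numpy as np

def leakage_correlation(noisy, denoised, eps=1e-12):
    """Per-trace normalised correlation between the removed energy
    (noisy - denoised) and the denoised output. Arrays are
    shaped (n_traces, n_samples)."""
    diff = noisy - denoised
    num = np.sum(diff * denoised, axis=1)
    den = np.sqrt(np.sum(diff**2, axis=1) * np.sum(denoised**2, axis=1))
    return num / (den + eps)

rng = np.random.default_rng(5)
n_traces, n_samples = 20, 500
t = np.arange(n_samples)
signal = np.tile(np.sin(2.0 * np.pi * t / 40.0), (n_traces, 1))  # toy signal
noisy = signal + rng.normal(0.0, 0.3, size=signal.shape)

# Hypothetical well-behaved denoiser: returns the signal unchanged.
good = leakage_correlation(noisy, signal)
# Hypothetical over-aggressive denoiser: dims the signal by 30%.
bad = leakage_correlation(noisy, 0.7 * signal)

print(f"good denoiser, max |corr|: {np.max(np.abs(good)):.3f}")
print(f"bad denoiser,  min corr:   {np.min(bad):.3f}")
```

The number complements visual inspection rather than replacing it: a localised loss at a single bright spot can hide inside a whole-trace average, so in practice the same measure would be computed in sliding windows.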

The one sentence to remember

CNN denoising (typically U-Net) learns the seismic signal manifold from noisy-clean training pairs, or noisy-noisy pairs when no clean reference exists, and outperforms classical f-x deconvolution by 3-5 dB on production data, at the cost of requiring careful QC to catch amplitude distortion and hallucination.

Where this goes next

§9.3 covers ML-based trace reconstruction, filling in missing traces or reconstructing badly-sampled gathers using similar U-Net architectures and training strategies.
