Interpolation / reconstruction

Part 9 — Machine Learning in Processing

Learning objectives

  • Describe the trace-interpolation problem and typical sources of missing data
  • Compare linear-fill, sparse-Radon, and CNN-based reconstruction
  • Identify the gap-size-to-wavelength constraint that limits any interpolation
  • Recognise when ML interpolation is worth the training cost

Real seismic data is rarely complete. Dead channels, bad shots, navigation gaps, de-cable failures, infrastructure shadows, and planned under-sampling all leave missing traces in otherwise useful gathers. Downstream processing (migration, multiple elimination, AVO) assumes regular sampling; missing traces introduce aliasing, amplitude drops, or failed algorithm convergence. Trace interpolation fills the gaps.

1. The spectrum of interpolators

  • Linear / nearest-trace fill. Interpolate each sample from the nearest live neighbours at the same time sample. Cheap and robust; smears dipping and curved events because it ignores moveout across the gap.
  • Sparse Radon interpolation. Fit events to a parabolic or hyperbolic Radon basis, then resample to the missing trace positions. Handles curved events well if the events are sparse.
  • f-x prediction / Spitz interpolation. Assumes each frequency slice is locally predictable along x; uses an auto-regressive filter. Good for dense events; struggles with sparse events.
  • Curvelet / seislet / POCS. Threshold in a transform domain designed to make seismic events sparse; inverse-transform with missing positions filled. State-of-the-art classical method.
  • CNN-based (U-Net). Train a network on (gather_with_gaps, gather_true) pairs so that it maps a gapped gather to the complete one. Learns a geological prior from the training data.
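
As a reference point for the simplest entry in the list above, here is a minimal sketch of linear fill using NumPy. The function name `linear_fill`, the `(n_samples, n_traces)` array layout, and the boolean `live` mask are assumptions made for this illustration, not part of any particular processing package.

```python
import numpy as np

def linear_fill(gather, live):
    """Fill dead traces by linear interpolation along the trace axis.

    gather : 2-D array, shape (n_samples, n_traces)
    live   : 1-D boolean array, True where the trace was recorded
    """
    n_samples, n_traces = gather.shape
    x_all = np.arange(n_traces)
    x_live = x_all[live]
    filled = gather.copy()
    for it in range(n_samples):
        # Interpolate at constant time; np.interp holds the nearest live
        # value constant for dead traces beyond the first/last live trace.
        filled[it, :] = np.interp(x_all, x_live, gather[it, live])
    return filled

# Tiny example: a flat unit-amplitude event and a linear amplitude ramp.
gather = np.zeros((2, 5))
gather[0, :] = 1.0
gather[1, :] = [0.0, 1.0, 2.0, 3.0, 4.0]
live = np.array([True, False, True, True, False])
gapped = gather * live            # zero the dead traces
filled = linear_fill(gapped, live)
```

The interior gap is recovered exactly because both test rows vary linearly with trace index; the dead trace at the edge is only held at the last live value, which is one reason edge gaps are harder than interior ones.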

2. The widget

[Interactive figure: ML interpolation demo]

48-trace gather with two hyperbolic events. Slider sets the fraction of traces zeroed (5–70 %). Three panels side-by-side: input with gaps, linear fill, CNN-like reconstruction. The info strip reports RMS reconstruction error against the ground-truth gather for both methods.

At 30 % gaps, linear fill introduces visible "comb" artefacts — zero-amplitude stripes at the gap positions become short low-amplitude segments. The CNN output reconstructs the hyperbolic events almost to ground truth. As gap fraction climbs above 50 %, both methods degrade; at 60–70 % even the CNN struggles because it runs out of neighbourhood context.
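
For readers without the interactive figure, the linear-fill half of the experiment can be approximated offline. The sketch below is not the widget's code: the Ricker wavelet, velocities, zero-offset times, sample rate, and trace spacing are all illustrative assumptions, and the CNN branch is omitted because it needs a trained model.

```python
import numpy as np

def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), evaluated at times t (s)."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def make_gather(n_traces=48, n_samples=256, dt=0.004, dx=25.0):
    """Two hyperbolic events t(x) = sqrt(t0^2 + (x/v)^2); values are illustrative."""
    t = np.arange(n_samples) * dt
    x = (np.arange(n_traces) - n_traces // 2) * dx
    gather = np.zeros((n_samples, n_traces))
    for t0, v in [(0.3, 1800.0), (0.6, 2500.0)]:
        t_event = np.sqrt(t0**2 + (x / v) ** 2)
        gather += ricker(t[:, None] - t_event[None, :], f0=25.0)
    return gather

def zero_traces(gather, fraction, rng):
    """Zero a random subset of traces; return the gapped gather and live mask."""
    n_traces = gather.shape[1]
    n_dead = int(round(fraction * n_traces))
    dead = rng.choice(n_traces, size=n_dead, replace=False)
    live = np.ones(n_traces, dtype=bool)
    live[dead] = False
    return gather * live, live

def linear_fill(gapped, live):
    """Constant-time linear interpolation across dead traces."""
    x_all = np.arange(gapped.shape[1])
    return np.stack([np.interp(x_all, x_all[live], row[live]) for row in gapped])

def rms_error(estimate, truth):
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

rng = np.random.default_rng(0)
truth = make_gather()
for fraction in (0.1, 0.3, 0.5, 0.7):
    gapped, live = zero_traces(truth, fraction, rng)
    err_gapped = rms_error(gapped, truth)
    err_linear = rms_error(linear_fill(gapped, live), truth)
    print(f"{fraction:.0%} gaps: RMS unfilled {err_gapped:.4f}, "
          f"linear fill {err_linear:.4f}")
```

Plotting `gapped` next to `linear_fill(gapped, live)` at each fraction shows how the fill behaves on the steep flanks of the hyperbolas, where moveout per trace is largest.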

3. The gap-size-to-wavelength constraint

Any interpolator can only reconstruct events that are adequately sampled in the surviving traces. If the trace spacing is $\Delta x$ and the shortest horizontal wavelength among the events (the one with the steepest apparent dip) is $\lambda_x$, the sampling criterion $\Delta x \leq \lambda_x / 2$ (Nyquist) must still hold for the surviving traces. Losing ~50 % of traces roughly doubles the effective $\Delta x$; events with $\lambda_x$ near the original Nyquist limit become aliased beyond recovery.

ML interpolation bends but does not break this rule: it can recover aliased events that match its training distribution, but it cannot fabricate signal from nothing. Use interpolation to fill small gaps, not to substitute for proper acquisition.
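
A small worked check makes the constraint concrete. The sketch below assumes the usual relation between horizontal wavelength, apparent horizontal velocity, and frequency, $\lambda_x = v_{app} / f$; the symbols $v_{app}$ and $f$, the function names, and the numbers are introduced here for illustration only.

```python
def max_unaliased_frequency(dx, v_apparent):
    """Highest frequency (Hz) satisfying dx <= lambda_x / 2,
    assuming lambda_x = v_apparent / f."""
    return v_apparent / (2.0 * dx)

def is_aliased(dx, v_apparent, frequency):
    """True if an event at this frequency violates dx <= lambda_x / 2."""
    wavelength_x = v_apparent / frequency
    return dx > wavelength_x / 2.0

dx = 25.0            # original trace spacing in metres (illustrative)
v_apparent = 2000.0  # apparent horizontal velocity in m/s (illustrative)

f_full = max_unaliased_frequency(dx, v_apparent)        # 40.0 Hz
f_half = max_unaliased_frequency(2 * dx, v_apparent)    # 20.0 Hz

# A 30 Hz component is safe at 25 m spacing but aliased at 50 m.
ok_before = not is_aliased(dx, v_apparent, 30.0)
aliased_after = is_aliased(2 * dx, v_apparent, 30.0)
```

Under these assumed numbers, doubling the effective spacing halves the unaliased bandwidth from 40 Hz to 20 Hz, so everything between 20 and 40 Hz on that event is at risk.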

4. Training-data strategy

  • Synthetic gathers. The complete gather is known exactly; apply synthetic gap patterns and train end-to-end.
  • Real dense gathers with synthetic gaps. Remove traces from well-sampled field data; this captures real noise and wavelet statistics.
  • Self-supervised (Masked Autoencoder). Hide random traces and train the network to predict them from context alone. No ground truth is needed beyond the original data.
  • Domain-specific pre-training + fine-tune. Start from a generalist model trained on global data, then fine-tune on a small local dataset.
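
The gap-generation step shared by these strategies can be sketched as follows. This is an assumption-laden illustration rather than a prescribed recipe: `make_masked_pair`, `masked_mse`, the random-trace gap pattern, and the random array standing in for a dense gather are all hypothetical.

```python
import numpy as np

def make_masked_pair(gather, gap_fraction, rng):
    """Return (input_with_gaps, target, hidden_mask) for one training example.

    gather is (n_samples, n_traces). hidden_mask is True on the hidden
    traces, which is where a masked-reconstruction loss would be evaluated.
    """
    n_traces = gather.shape[1]
    n_hidden = max(1, int(round(gap_fraction * n_traces)))
    hidden = np.zeros(n_traces, dtype=bool)
    hidden[rng.choice(n_traces, size=n_hidden, replace=False)] = True
    masked = gather.copy()
    masked[:, hidden] = 0.0
    return masked, gather, hidden

def masked_mse(prediction, target, hidden):
    """Mean squared error over hidden traces only."""
    return float(np.mean((prediction[:, hidden] - target[:, hidden]) ** 2))

rng = np.random.default_rng(42)
dense = rng.standard_normal((128, 48))   # stand-in for a real dense gather
x, y, hidden = make_masked_pair(dense, gap_fraction=0.3, rng=rng)

# Loss of the trivial "predict zeros" baseline; a trained network should beat it.
baseline = masked_mse(x, y, hidden)
```

Drawing a fresh mask for every example, and varying `gap_fraction`, keeps the network from memorising one gap pattern. Gap patterns that mimic the real acquisition problem (contiguous dead cables, regular decimation) may be a better match than purely random traces.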

5. Production deployment

  • Regularisation step in the processing flow. Reconstruct to a regular grid once, then use the filled gather downstream. It plays the same role as classical interpolation, with potentially higher fidelity.
  • Adaptive integration with migration. Some flows apply ML reconstruction only within the migration aperture, where the reconstructed traces affect the output; outside the aperture, gaps are left as is to save cost.
  • Uncertainty flagging. Mark reconstructed traces as interpolated rather than observed, and give them lower weights in downstream processing.
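
One way to act on the uncertainty flag is a weighted stack in which reconstructed traces count for less than recorded ones. The sketch below assumes an NMO-corrected gather and a per-trace boolean flag; `interp_weight` is a hypothetical tuning parameter, not a standard value.

```python
import numpy as np

def weighted_stack(gather, is_interpolated, interp_weight=0.5):
    """Stack traces with reduced weight on reconstructed ones.

    gather          : (n_samples, n_traces), assumed NMO-corrected
    is_interpolated : boolean per trace, True if the trace was reconstructed
    interp_weight   : hypothetical weight in [0, 1] for reconstructed traces
    """
    weights = np.where(is_interpolated, interp_weight, 1.0)
    return gather @ weights / weights.sum()

# One time sample, four traces; the third trace is reconstructed and too strong.
gather = np.array([[1.0, 1.0, 4.0, 1.0]])
is_interpolated = np.array([False, False, True, False])

plain = weighted_stack(gather, is_interpolated, interp_weight=1.0)   # 1.75
damped = weighted_stack(gather, is_interpolated, interp_weight=0.5)
ignored = weighted_stack(gather, is_interpolated, interp_weight=0.0) # 1.0
```

Keeping the flag as a separate per-trace array (or trace-header word) rather than baking the weight into the amplitudes leaves later steps free to choose their own weighting.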

6. Failure modes to QC

  • Event mis-location. CNN shifts an event slightly off its correct position. Symptom: time shift at reconstructed traces relative to neighbours. Fix: overlap patches; cross-validate with held-out traces.
  • Hallucination. CNN adds a plausible event that was never in the data. Symptom: new reflectors appearing in the reconstructed gather that aren't in adjacent gathers. Fix: check spatial consistency.
  • Amplitude distortion. Reconstructed amplitudes don't match the amplitude trend of neighbouring live traces. Fix: post-hoc amplitude balancing to match neighbours.
  • Domain shift. Model trained on hyperbolic events applied to a field with faults produces wrong geometry across fault planes. Fix: domain-specific fine-tuning.
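
The first and third checks lend themselves to a simple held-out test: hide a trace that was actually recorded, reconstruct it, and compare. The sketch below measures time shift by cross-correlation and amplitude mismatch by RMS ratio. The function names and the Gaussian test pulse are assumptions for illustration; real QC would use recorded traces and a windowed analysis.

```python
import numpy as np

def lag_samples(reconstructed, reference):
    """Lag (in samples) at which reconstructed best matches reference.
    Positive means the reconstructed event arrives later."""
    xcorr = np.correlate(reconstructed, reference, mode="full")
    return int(np.argmax(xcorr)) - (len(reference) - 1)

def rms_ratio(reconstructed, reference):
    """RMS amplitude of reconstructed relative to reference (1.0 = matched)."""
    return float(np.sqrt(np.mean(reconstructed**2) / np.mean(reference**2)))

# Synthetic held-out trace and a "reconstruction" that is 3 samples late
# and 20 % too weak (both values chosen for the example).
n = np.arange(128)
reference = np.exp(-0.5 * ((n - 50) / 4.0) ** 2)
reconstructed = 0.8 * np.exp(-0.5 * ((n - 53) / 4.0) ** 2)

shift = lag_samples(reconstructed, reference)
ratio = rms_ratio(reconstructed, reference)
```

Repeating this over many held-out traces gives a distribution of shifts and amplitude ratios that can be thresholded before the model is allowed into a production flow.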

The one sentence to remember

CNN trace interpolation can outperform linear fill and sparse-Radon for realistic gap patterns by learning the geological prior from training data; it becomes unreliable once gaps exceed about half the shortest horizontal wavelength, and it needs uncertainty QC to catch event mis-location and hallucinations.

Where this goes next

§9.4 covers first-break picking — an arrival-time estimation task that has become the canonical success story for ML in seismic processing. Production networks now pick first breaks 100× faster than humans at comparable accuracy, freeing interpreter time for higher-value work.
