Interpolation / reconstruction
Learning objectives
- Describe the trace-interpolation problem and typical sources of missing data
- Compare linear-fill, sparse-Radon, and CNN-based reconstruction
- Identify the gap-size-to-wavelength constraint that limits any interpolation
- Recognise when ML interpolation is worth the training cost
Real seismic data is rarely complete. Dead channels, bad shots, navigation gaps, de-cable failures, infrastructure shadows, and planned under-sampling all leave missing traces in otherwise useful gathers. Downstream processing (migration, multiple elimination, AVO) assumes regular sampling; missing traces introduce aliasing, amplitude drops, or failed algorithm convergence. Trace interpolation fills the gaps.
1. The spectrum of interpolators
- Linear / nearest-trace fill. Interpolate each sample from the nearest live neighbours. Cheap and robust; blurs curved events because it assumes locally straight kinematics.
- Sparse Radon interpolation. Fit events to a parabolic or hyperbolic Radon basis, then resample to the missing trace positions. Handles curved events well if the events are sparse.
- f-x prediction / Spitz interpolation. Assumes each frequency slice is locally predictable along x; uses an auto-regressive filter. Good for dense events; struggles with sparse events.
- Curvelet / seislet / POCS. Threshold in a transform domain designed to make seismic events sparse; inverse-transform with missing positions filled. State-of-the-art classical method.
- CNN-based (U-Net). Train a network on (gather_with_gaps, gather_true) pairs so that it maps a gapped gather to its complete counterpart. The network learns a geological prior from the data.
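The transform-domain thresholding idea behind POCS can be sketched in a few lines. This is a minimal illustration only: it uses a plain 2-D FFT as a stand-in for the curvelet or seislet transform, and the function name `pocs_fill`, the linear threshold schedule, and the iteration count are all assumptions chosen for the example, not a production recipe.

```python
import numpy as np

def pocs_fill(gapped, live, n_iter=40, frac_start=0.99, frac_end=0.01):
    """POCS-style fill of dead traces (hypothetical helper).

    gapped : (n_t, n_x) array with zeros at dead traces
    live   : (n_x,) boolean array, True where a trace was recorded
    """
    filled = gapped.astype(float).copy()
    for k in range(n_iter):
        # Forward transform (2-D FFT used here instead of curvelets).
        spec = np.fft.fft2(filled)
        mag = np.abs(spec)
        # Threshold falls linearly from frac_start to frac_end of the peak.
        frac = frac_start + (frac_end - frac_start) * k / max(n_iter - 1, 1)
        spec[mag < frac * mag.max()] = 0.0
        model = np.fft.ifft2(spec).real
        # Projection step: keep observed traces, use the model only in gaps.
        filled = np.where(live[None, :], gapped, model)
    return filled

# Usage on a single plane wave, which is sparse in the f-k domain.
n_t, n_x = 64, 48
t = np.arange(n_t)[:, None]
x = np.arange(n_x)[None, :]
true = np.sin(2 * np.pi * (3 * t / n_t + 2 * x / n_x))
live = np.ones(n_x, dtype=bool)
live[[5, 6, 17, 23, 24, 25, 40]] = False
filled = pocs_fill(true * live[None, :], live)
```

The plane wave is an easy case because it occupies only two Fourier coefficients; curved events are less sparse in the FFT domain, which is one reason classical implementations prefer curvelet-like transforms.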
2. The widget
The widget shows a 48-trace gather with two hyperbolic events. A slider sets the fraction of traces zeroed (5–70 %). Three panels sit side by side: input with gaps, linear fill, and CNN-like reconstruction. The info strip reports RMS reconstruction error against the ground-truth gather for both methods.
At 30 % gaps, linear fill introduces visible "comb" artefacts — zero-amplitude stripes at the gap positions become short low-amplitude segments. The CNN output reconstructs the hyperbolic events almost to ground truth. As gap fraction climbs above 50 %, both methods degrade; at 60–70 % even the CNN struggles because it runs out of neighbourhood context.
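The linear-fill panel and the RMS readout can be approximated offline with NumPy. The sketch below is not the widget's code: the Ricker wavelet, the velocities, zero-offset times, trace spacing, and random seed are all assumed values picked to give two gentle hyperbolic events on 48 traces, and no CNN is included.

```python
import numpy as np

def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), centred at t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def make_gather(n_t=256, n_x=48, dt=0.004, dx=12.5, f0=20.0):
    """Synthetic gather with two hyperbolic events (assumed parameters)."""
    t = np.arange(n_t) * dt
    x = (np.arange(n_x) - n_x // 2) * dx
    gather = np.zeros((n_t, n_x))
    for t0, v in [(0.4, 2000.0), (0.8, 3000.0)]:
        t_event = np.sqrt(t0 ** 2 + (x / v) ** 2)  # hyperbolic moveout
        gather += ricker(t[:, None] - t_event[None, :], f0)
    return gather

def linear_fill(gapped, live):
    """Fill dead traces sample by sample from the nearest live neighbours."""
    x_all = np.arange(gapped.shape[1])
    x_live = x_all[live]
    filled = gapped.copy()
    for i in range(gapped.shape[0]):
        # np.interp holds the edge value constant outside the live range.
        filled[i] = np.interp(x_all, x_live, gapped[i, live])
    return filled

def rms_error(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
true = make_gather()
live = rng.random(true.shape[1]) >= 0.30   # roughly 30 % of traces zeroed
gapped = true * live[None, :]
filled = linear_fill(gapped, live)
print("RMS zero-filled:", rms_error(gapped, true))
print("RMS linear fill:", rms_error(filled, true))
```

Because the fill works along constant-time rows, it degrades as the moveout per trace approaches the wavelet period; making the events steeper or the gaps wider in this sketch shows the blurring described for curved events.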
3. The gap-size-to-wavelength constraint
Any interpolator can only reconstruct events that are adequately sampled in the surviving traces. If the trace spacing is $\Delta x$ and the most rapidly varying event (the one with the shortest apparent horizontal wavelength) has wavelength $\lambda_x$, the Nyquist sampling criterion $\Delta x \le \lambda_x / 2$ must still hold for the surviving traces. Losing ~50 % of traces effectively doubles $\Delta x$; events with $\lambda_x$ near the original Nyquist limit $2\,\Delta x$ become aliased beyond recovery.
ML interpolation bends but does not break this rule: it can recover aliased events that match its training distribution, but it cannot fabricate signal from nothing. Use interpolation to fill small gaps, not to substitute for proper acquisition.
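A short worked calculation makes the constraint concrete. Writing the apparent horizontal wavelength as the apparent velocity divided by frequency, the Nyquist criterion gives a highest unaliased frequency of v_app / (2 Δx). The trace spacing and apparent velocity below are assumed example values, and the 50 % loss is idealised as dropping every other trace.

```python
def max_unaliased_frequency(dx, v_app):
    """Highest frequency (Hz) still sampled with at least two traces
    per apparent horizontal wavelength, for trace spacing dx (m) and
    apparent horizontal velocity v_app (m/s)."""
    return v_app / (2.0 * dx)

dx = 25.0        # assumed trace spacing, m
v_app = 2000.0   # assumed apparent velocity of the steepest event, m/s

f_full = max_unaliased_frequency(dx, v_app)       # all traces live: 40.0 Hz
f_half = max_unaliased_frequency(2 * dx, v_app)   # every other trace lost: 20.0 Hz
print(f_full, f_half)
```

In this example, energy between 20 Hz and 40 Hz on the steepest event was properly sampled before the loss and is aliased after it.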
4. Training-data strategy
- Synthetic gathers with known ground truth; apply synthetic gap patterns; train end-to-end.
- Real dense gathers with synthetic gaps applied; this captures real noise and wavelet statistics.
- Self-supervised (Masked Autoencoder): hide random traces, train the network to predict them from context alone. No ground truth needed beyond the original data.
- Domain-specific pre-training + fine-tune: a generalist model trained on global data, followed by a small local fine-tune.
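The gap-pattern and masked-trace strategies above share one data-preparation step: hide traces from a complete gather and score the network only where traces were hidden. A minimal NumPy sketch follows; the helper names `make_masked_pair` and `masked_mse` are hypothetical, and random single-trace masking is an assumption (real gap patterns are often contiguous).

```python
import numpy as np

def make_masked_pair(gather, gap_fraction, rng):
    """Return (masked input, target, live mask) for one training example."""
    n_x = gather.shape[1]
    n_dead = max(1, int(round(gap_fraction * n_x)))  # hide at least one trace
    dead = rng.choice(n_x, size=n_dead, replace=False)
    live = np.ones(n_x, dtype=bool)
    live[dead] = False
    masked = gather * live[None, :]   # zero the hidden traces
    return masked, gather, live

def masked_mse(pred, target, live):
    """Mean squared error evaluated only on the hidden traces."""
    hidden = ~live
    return float(np.mean((pred[:, hidden] - target[:, hidden]) ** 2))

rng = np.random.default_rng(0)
gather = rng.standard_normal((32, 48))   # placeholder for a dense gather
masked, target, live = make_masked_pair(gather, 0.25, rng)
baseline = masked_mse(masked, target, live)   # loss of the "do nothing" output
```

Restricting the loss to hidden traces stops the network from being rewarded for simply copying the live input; drawing a fresh mask each epoch turns one dense gather into many training examples.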
5. Production deployment
- Regularisation step in the processing flow. Reconstruct to a regular grid once, then use the filled gather downstream. This plays the same role as classical interpolation, with potentially higher fidelity.
- Adaptive integration with migration. Some flows apply ML reconstruction only within the migration aperture, where the filled traces affect the output; outside the aperture, gaps are left as is to save cost.
- Uncertainty flagging. Tag reconstructed traces with lower weights in downstream processing; flag the fact that these are interpolated, not observed.
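One simple way to carry the uncertainty flag downstream is a per-trace weight vector kept alongside the gather. The sketch below applies it to a stack of an NMO-corrected gather; the function name `weighted_stack` and the weight of 0.5 for interpolated traces are assumptions for illustration, not recommended values.

```python
import numpy as np

def weighted_stack(gather, live, w_interp=0.5):
    """Stack a (n_t, n_x) gather, down-weighting interpolated traces.

    live     : (n_x,) boolean, True for observed traces
    w_interp : weight given to reconstructed traces (assumed value)
    """
    weights = np.where(live, 1.0, w_interp)
    return gather @ weights / weights.sum()

# Toy gather: observed traces have amplitude 1, the reconstructed one 3.
gather = np.ones((5, 4))
live = np.array([True, True, False, True])
gather[:, 2] = 3.0

trusted_only = weighted_stack(gather, live, w_interp=0.0)   # ignores trace 2
equal_weight = weighted_stack(gather, live, w_interp=1.0)   # plain mean
```

The same boolean `live` mask doubles as the audit trail: storing it in a trace header lets later steps tell interpolated traces from observed ones.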
6. Failure modes to QC
- Event mis-location. CNN shifts an event slightly off its correct position. Symptom: time shift at reconstructed traces relative to neighbours. Fix: overlap patches; cross-validate with held-out traces.
- Hallucination. CNN adds a plausible event that was never in the data. Symptom: new reflectors appearing in the reconstructed gather that aren't in adjacent gathers. Fix: check spatial consistency.
- Amplitude distortion. Symptom: reconstructed amplitudes don't follow the amplitude trend of the neighbouring live traces. Fix: post-hoc amplitude balancing to match neighbours.
- Domain shift. Model trained on hyperbolic events applied to a field with faults produces wrong geometry across fault planes. Fix: domain-specific fine-tuning.
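The held-out cross-validation mentioned under event mis-location can be sketched as follows: hide a few traces that were actually recorded, reconstruct them, and compare each against its observed version. The function names and the report layout are hypothetical, and `interpolate` stands for any reconstruction routine taking a gapped gather and a live mask.

```python
import numpy as np

def lag_of_best_match(trace, reference, dt):
    """Time shift (s) of `trace` relative to `reference`, taken from the
    cross-correlation peak; positive means `trace` arrives later."""
    xc = np.correlate(trace, reference, mode="full")
    lag = int(np.argmax(xc)) - (len(reference) - 1)
    return lag * dt

def held_out_qc(gather, live, interpolate, dt, n_hold=4, rng=None):
    """Hide observed traces, reconstruct them, and report the misfit.

    Returns a list of (trace index, time shift in s, RMS error).
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = np.flatnonzero(live)[1:-1]   # keep the two end traces live
    n_hold = min(n_hold, len(candidates))
    held = rng.choice(candidates, size=n_hold, replace=False)
    live_qc = live.copy()
    live_qc[held] = False
    recon = interpolate(gather * live_qc[None, :], live_qc)
    report = []
    for ix in held:
        shift = lag_of_best_match(recon[:, ix], gather[:, ix], dt)
        rms = float(np.sqrt(np.mean((recon[:, ix] - gather[:, ix]) ** 2)))
        report.append((int(ix), float(shift), rms))
    return report
```

A consistent non-zero shift across held-out traces points to event mis-location, while a large RMS with zero shift points to amplitude distortion. Hallucinated events need a separate spatial-consistency check against adjacent gathers.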
CNN trace interpolation outperforms linear fill and sparse-Radon for realistic gap patterns by learning the geological prior from training data; it fails at gaps larger than half a wavelength and needs uncertainty QC to catch event mis-location and hallucinations.
Where this goes next
§9.4 covers first-break picking — an arrival-time estimation task that has become the canonical success story for ML in seismic processing. Production networks now pick first breaks 100× faster than humans at comparable accuracy, freeing interpreter time for higher-value work.