OgbonLab

Free, open, interactive textbooks for geoscience and applied mathematics.

Chapter 2 Quiz: Big Data & ML Overview

Chapter 12: Big Data Pipelines in Earth Science

Learning objectives

Test your understanding of big data concepts, ML pipelines, and NumPy/Pandas basics

This quiz covers both the lecture material and lab exercises from Chapter 2.

Key Concepts Review

Four V's of Big Data: Volume, Velocity, Variety, Veracity.
ML Pipeline: Data Collection $\to$ Preprocessing $\to$ Feature Engineering $\to$ Model Training $\to$ Evaluation & Deployment.
Train/Val/Test Split: The training set fits model parameters; the validation set tunes hyperparameters; the test set, used only once at the end, gives an unbiased final estimate of generalization performance.
NumPy: Broadcasting rules, the shape attribute, and column slicing with [:,1] (all rows, column index 1).
Pandas: pd.read_csv(), boolean filtering with df[df[col] > val].
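
The NumPy, Pandas, and data-splitting items above can be tried together in one short script. This is a minimal sketch and not the chapter's lab code: the array values, the 60/20/20 split ratio, the column names (station, depth_km, magnitude), and the in-memory CSV text are all assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: 10 samples, 3 features (values are illustrative only).
X = np.arange(30, dtype=float).reshape(10, 3)
print(X.shape)  # shape is a (rows, columns) tuple

# Slicing with [:, 1] keeps every row of column index 1 and returns a 1-D array.
col1 = X[:, 1]

# Broadcasting: the length-3 vector of column means is applied to each of the 10 rows.
X_centered = X - X.mean(axis=0)

# Train/validation/test split on shuffled indices (60/20/20 is an assumed ratio).
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = idx[:6], idx[6:8], idx[8:]
X_train, X_val, X_test = X[train_idx], X[val_idx], X[test_idx]

# pd.read_csv() accepts a file path or a file-like object;
# StringIO stands in for a real file here (hypothetical contents).
csv_text = "station,depth_km,magnitude\nA,10.0,4.2\nB,35.5,5.1\nC,8.2,3.9\n"
df = pd.read_csv(io.StringIO(csv_text))

# Boolean filtering: df[col] > val builds a True/False mask, and df[mask] keeps the True rows.
col, val = "magnitude", 4.0
strong = df[df[col] > val]
print(strong)
```

The split is done on a shuffled index array rather than on the data itself, so the same indices can also be used to split a matching label array without the rows falling out of step.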

References

Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.), ch. 7. Springer.
James, G., Witten, D., Hastie, T., Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed.), ch. 2 & 5. Springer.
Harris, C.R., et al. (2020). Array programming with NumPy. Nature 585, 357-362.