Chapter 2 Quiz: Big Data & ML Overview
Learning objectives
- Test your understanding of big data concepts, ML pipelines, and NumPy/Pandas basics
This quiz covers both the lecture material and the lab exercises from Chapter 2.
Key Concepts Review
- Four V's of Big Data: Volume, Velocity, Variety, Veracity.
- ML Pipeline: Data Collection → Preprocessing → Feature Engineering → Model Training → Evaluation & Deployment.
- Train/Val/Test Split: The training set fits model parameters; the validation set tunes hyperparameters; the test set gives an unbiased final estimate of performance.
- NumPy: Broadcasting rules, shape, slicing with [:, 1].
- Pandas: pd.read_csv(), boolean filtering with df[df[col] > val].
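
The NumPy, Pandas, and data-splitting items above can be tried out together in a few lines. The following is a minimal sketch, not taken from the lab exercises: the toy array, the 60/20/20 split ratio, the in-memory CSV text, and all variable names are assumptions chosen for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data: 10 rows (samples), 3 columns (features).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
print(X.shape)  # (10, 3)

# Slicing: every row, second column only.
second_col = X[:, 1]
print(second_col.shape)  # (10,)

# Train/validation/test split via shuffled row indices
# (60/20/20 is an arbitrary choice for this sketch).
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = idx[:6], idx[6:8], idx[8:]

# Broadcasting: mu and sigma have shape (3,) and are stretched
# across the rows of each (n, 3) block. The statistics come from
# the training rows only, so nothing leaks from validation or test.
mu = X[train_idx].mean(axis=0)
sigma = X[train_idx].std(axis=0)
X_train = (X[train_idx] - mu) / sigma
X_val = (X[val_idx] - mu) / sigma
X_test = (X[test_idx] - mu) / sigma

# Pandas: read CSV text (an in-memory stand-in for a file on disk)
# and keep only the rows where the boolean mask is True.
csv_text = "name,score\nana,72\nben,91\ncho,85\n"
df = pd.read_csv(io.StringIO(csv_text))
high = df[df["score"] > 80]
print(high["name"].tolist())  # ['ben', 'cho']
```

Passing a file path to pd.read_csv() instead of the io.StringIO object works the same way; the in-memory text only keeps the sketch self-contained.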
References
- Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.), ch. 7. Springer.
- James, G., Witten, D., Hastie, T., Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed.), ch. 2 & 5. Springer.
- Harris, C.R., et al. (2020). Array programming with NumPy. Nature 585, 357–362.