Chapter 3 Quiz: Optimization Methods

Chapter 3: Numerical Optimization for Learning

Learning objectives

Test your understanding of gradient descent, core optimization concepts, and their implementation.

This quiz covers both the lecture material and lab exercises from Chapter 3.

Gradient: $\nabla f$ points in the direction of steepest ascent, so gradient descent steps against it. Update rule: $x \leftarrow x - \alpha \nabla f(x)$, where $\alpha > 0$ is the learning rate.
Learning Rate: Too large $\to$ divergence; too small $\to$ slow convergence.
Batch vs SGD vs Mini-batch: Batch = all data per update; SGD = 1 sample per update; Mini-batch = small subset per update (e.g., 32).
Loss Functions: MSE for regression, cross-entropy for classification.
Convergence Criteria: $\|\nabla f(x_k)\| < \epsilon$ or $|f_k - f_{k-1}| < \epsilon$, where $f_k = f(x_k)$.
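
The update rule, the effect of the learning rate, and the gradient-norm stopping test can be tried together in a few lines. The following is a minimal sketch, not a reference implementation: the names `gradient_descent` and `grad_f`, the test function $f(x, y) = x^2 + 10y^2$, and the three learning rates are assumptions chosen for illustration. Only the gradient-norm criterion is implemented; the function-change criterion is left out to keep the example short.

```python
import math

def gradient_descent(grad, x0, alpha, eps=1e-6, max_iter=5000):
    """Repeat x <- x - alpha * grad(x) until the gradient norm is below eps.

    Returns (x, iterations, converged). converged is False if the
    iteration budget runs out or the iterates blow up.
    """
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        grad_norm = math.sqrt(sum(gi * gi for gi in g))
        if not math.isfinite(grad_norm):
            return x, k, False  # diverged: gradient overflowed
        if grad_norm < eps:
            return x, k, True   # convergence criterion met
        # Step against the gradient (steepest descent direction).
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x, max_iter, False

# Hypothetical test function f(x, y) = x^2 + 10 y^2, minimum at (0, 0).
def grad_f(x):
    return [2.0 * x[0], 20.0 * x[1]]

# Too small, reasonable, and too large learning rates (illustrative values).
for alpha in (0.001, 0.05, 0.11):
    x, iters, ok = gradient_descent(grad_f, [1.0, 1.0], alpha)
    print(f"alpha={alpha}: converged={ok}, iterations={iters}")
```

For this particular function, any $\alpha$ above 0.1 makes the $y$ coordinate grow at every step, while a very small $\alpha$ shrinks the $x$ coordinate so slowly that the iteration budget runs out first.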
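
Batch, SGD, and mini-batch differ only in how many samples feed each gradient estimate, so one training loop with a `batch_size` parameter covers all three. The sketch below assumes a one-variable linear model fitted with MSE; the function name `train_linear`, the synthetic noise-free data, and the hyperparameter values are all made up for illustration.

```python
import random

def train_linear(xs, ys, batch_size, alpha=0.3, epochs=500, seed=0):
    """Fit y ~ w*x + b by minimizing MSE with (mini-)batch gradient descent.

    batch_size = len(xs) gives batch gradient descent,
    batch_size = 1 gives SGD, values in between give mini-batch.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch MSE with respect to w and b.
            grad_w = sum(2.0 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2.0 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b

# Synthetic, noise-free data from y = 3x + 1 (illustrative only).
xs = [i / 100.0 for i in range(100)]
ys = [3.0 * x + 1.0 for x in xs]

for name, size in (("batch", len(xs)), ("SGD", 1), ("mini-batch", 32)):
    w, b = train_linear(xs, ys, batch_size=size)
    print(f"{name:>10}: w={w:.4f}, b={b:.4f}")
```

Because the data here contain no noise, all three variants should recover roughly the same parameters; with noisy data, the smaller batch sizes would give noisier updates in exchange for cheaper steps.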
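
The two loss functions named above can be written directly from their definitions. This is a small sketch under stated assumptions: the helper names `mse` and `cross_entropy`, the use of integer class labels with per-sample probability lists, and the clipping constant `eps` are illustrative choices, not taken from the chapter.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error, the usual loss for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(labels, probs, eps=1e-12):
    """Mean negative log-probability assigned to the correct class.

    labels: integer class indices, one per sample.
    probs:  one list of predicted class probabilities per sample.
    eps clips probabilities away from 0 so that log() stays finite.
    """
    total = 0.0
    for k, p in zip(labels, probs):
        total -= math.log(max(p[k], eps))
    return total / len(labels)

# Regression example: targets vs predictions.
print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))

# Classification example: two samples, two classes.
print(cross_entropy([0, 1], [[0.5, 0.5], [0.25, 0.75]]))
```

Cross-entropy penalizes confident wrong predictions heavily, since the loss grows without bound as the probability of the correct class approaches zero.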
