Chapter 3 Quiz: Optimization Methods

Chapter 3: Numerical Optimization for Learning

Learning objectives

  • Test your understanding of gradient descent, optimization concepts, and implementation

This quiz covers both the lecture material and lab exercises from Chapter 3.

Key Concepts Review

  • Gradient: $\nabla f$ points in the direction of steepest ascent, so descent steps against it. Update rule: $x \leftarrow x - \alpha \nabla f(x)$, where $\alpha$ is the learning rate.
  • Learning Rate: Too large → divergence; too small → slow convergence.
  • Batch vs SGD vs Mini-batch: Batch = all data per update; SGD = 1 sample per update; Mini-batch = a small subset per update (e.g., 32 samples).
  • Loss Functions: MSE for regression, cross-entropy for classification.
  • Convergence Criteria: $|\nabla f| < \epsilon$ or $|f_k - f_{k-1}| < \epsilon$.
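
The review points above can be tried out in a few lines of plain Python. This is a minimal sketch, not the lab code. The function names `gradient_descent` and `minibatch_fit`, the test function f(x) = (x - 3)^2, the synthetic data y = 2x + 1, and every step size and iteration count are assumptions chosen for illustration.

```python
import random


def gradient_descent(f, grad, x0, alpha, eps=1e-8, max_iters=1000):
    """Repeat x <- x - alpha * grad(x) for a scalar x.

    Stops when |grad(x)| < eps or |f_k - f_{k-1}| < eps, or after max_iters.
    Returns (x, number of updates performed).
    """
    x, f_prev = x0, f(x0)
    for k in range(max_iters):
        g = grad(x)
        if abs(g) < eps:                    # gradient criterion
            return x, k
        x = x - alpha * g                   # update rule
        f_curr = f(x)
        if abs(f_curr - f_prev) < eps:      # loss-change criterion
            return x, k + 1
        f_prev = f_curr
    return x, max_iters


def minibatch_fit(xs, ys, batch_size, alpha=0.3, epochs=500, seed=0):
    """Fit y ~ w*x + b by gradient steps on the MSE of each batch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)                  # new sample order every epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # (residual, input) pairs for the samples in this batch
            pairs = [(w * xs[i] + b - ys[i], xs[i]) for i in batch]
            grad_w = sum(2 * r * x for r, x in pairs) / len(batch)
            grad_b = sum(2 * r for r, _ in pairs) / len(batch)
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b


# Learning-rate effect on f(x) = (x - 3)^2, whose minimizer is x = 3.
f = lambda x: (x - 3) ** 2
grad = lambda x: 2 * (x - 3)
x_good, n_good = gradient_descent(f, grad, x0=0.0, alpha=0.1)
x_bad, n_bad = gradient_descent(f, grad, x0=0.0, alpha=1.1, max_iters=50)
x_slow, n_slow = gradient_descent(f, grad, x0=0.0, alpha=0.001)
print("alpha=0.1  :", x_good, "after", n_good, "updates")
print("alpha=1.1  :", x_bad, "after", n_bad, "updates (diverging)")
print("alpha=0.001:", x_slow, "after", n_slow, "updates (still far from 3)")

# Batch (64), mini-batch (16), and SGD (1) on noise-free data y = 2x + 1.
xs = [i / 64 for i in range(64)]
ys = [2 * x + 1 for x in xs]
fits = {bs: minibatch_fit(xs, ys, bs) for bs in (64, 16, 1)}
for bs, (w, b) in fits.items():
    print("batch_size", bs, "-> w =", round(w, 4), "b =", round(b, 4))
```

Setting `batch_size` to the dataset size gives batch gradient descent, 1 gives SGD, and anything in between gives mini-batch. Because the synthetic data here has no noise, all three settings recover the same line; on noisy data the smaller batch sizes would be expected to fluctuate around the minimizer unless the learning rate is decreased.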

References

  • Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning, ch. 4 & 8. MIT Press.
  • Bishop, C.M. (2006). Pattern Recognition and Machine Learning, ch. 5.3. Springer.
  • Murphy, K.P. (2022). Probabilistic Machine Learning: An Introduction, ch. 8. MIT Press.
