\( \def\R{{\mathbb{R}}} \def\L{{\mathcal{L}}} \def\x{{\times}} \)
Guest lecture / 2019-11-04
Intro to Data Science, Fall 2019 @ CCNY
Tom Sercu - homepage - twitter - github.
This guest lecture - Preface - Main slides - Figure - lab (github)
Recapping part 1 (pdf)
Object recognition
Speech recognition
Machine Translation
"simple" Input->Output ML problems!
Common sense
Somewhat based on https://campus.datacamp.com/courses/deep-learning-in-python
I've been using PyTorch a few months now and I've never felt better. I have more energy. My skin is clearer. My eye sight has improved.
— Andrej Karpathy (@karpathy) May 26, 2017
“ What I cannot create,
I do not understand ”
Richard Feynman
This is all of ML:
$$\arg\min_\theta \L(\theta)$$
Find the argmin by taking small steps of size \(\alpha\) along the negative gradient \(\nabla_\theta \L(\theta)\):
$$\theta \gets \theta - \alpha \nabla_\theta \L(\theta)$$
Oops, \(\nabla_\theta \L(\theta)\) is expensive: it sums over all the data.
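To make the update rule concrete, here is a minimal sketch of full-batch gradient descent on a toy least-squares problem (the data, the model \(y \approx \theta x\), and all variable names are made up for illustration; this is not the lab code):

```python
import numpy as np

# Synthetic data: y is roughly 3*x (purely illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

theta = 0.0    # parameter we want to learn
alpha = 0.002  # step size

for step in range(200):
    # gradient of L(theta) = sum over ALL data of (theta*x - y)^2
    grad = np.sum(2.0 * (theta * x - y) * x)
    theta = theta - alpha * grad   # the update rule from above

print(theta)  # ends up close to 3.0
```

Note that every single step sums over the whole dataset, which is exactly the cost we want to avoid.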
Ok instead of \(\L (\theta) = \sum_{x,y \in D} \ell(x,y; \theta) \)
Let us use \(\L^{mb} (\theta) = \sum_{x,y \in mb} \ell(x,y; \theta) \)
\(\L^{mb} (\theta) \) is the loss for one minibatch.
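The same loop with minibatches: each step draws a small random subset of the data and follows \( \nabla_\theta \L^{mb}(\theta) \) instead of the full gradient (again a hypothetical toy example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 3.0 * x + 0.1 * rng.normal(size=10_000)

theta, alpha, batch_size = 0.0, 0.005, 32

for step in range(500):
    idx = rng.integers(0, len(x), size=batch_size)  # sample one minibatch
    xb, yb = x[idx], y[idx]
    # gradient of L^mb(theta): sums over the minibatch only, so it is cheap
    grad = np.sum(2.0 * (theta * xb - yb) * xb)
    theta = theta - alpha * grad
```

Each update is noisy, but far cheaper than summing over all 10,000 points, and on average it still points downhill.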
Compute \( \nabla_\theta \L^{mb} (\theta) \) by the chain rule:
traverse the computation graph in reverse (backpropagation).
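For anything more complicated than the toy model above, you do not write the gradient by hand: in PyTorch, autograd records the computation graph during the forward pass, and `loss.backward()` applies the chain rule while walking that graph in reverse. A minimal sketch (the linear model, data, and step size are invented for illustration):

```python
import torch

# Synthetic minibatch (illustrative)
x = torch.randn(32, 10)
y = torch.randn(32, 1)

theta = torch.zeros(10, 1, requires_grad=True)  # parameters to optimize
alpha = 0.005

for step in range(100):
    pred = x @ theta                    # forward pass: builds the graph
    loss = ((pred - y) ** 2).sum()      # L^mb(theta): sum over the minibatch
    loss.backward()                     # chain rule, graph traversed in reverse
    with torch.no_grad():
        theta -= alpha * theta.grad     # gradient step
        theta.grad.zero_()              # clear the gradient for the next step
```

`torch.optim.SGD` packages exactly this kind of update, with `optimizer.step()` and `optimizer.zero_grad()` replacing the manual bookkeeping.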