# Introduction to Deep Learning

**STAT 157, UC Berkeley, Spring 2019**

## Practical information

| Item | Details |
|---|---|
| Instructors | Alex Smola and Mu Li |
| TAs | Rachel Hu and Ryan Theisen |
| Lectures | Every Tuesday/Thursday, 3:30pm - 5:00pm |
| Lecture location | LeConte 3 |
| Instructor office hours | Thursday, 1:00pm - 3:00pm |
| Office hours location | 337 Evans Hall |
| Contact instructors and TAs | berkeley-stat-157@googlegroups.com |
| Course discussion | https://discuss.mxnet.io/c/courses |
| Grading policy | Homework 30%, Midterm 20%, Project 50% |

## News

- 3/18: Added solutions to homework 5.
- 3/15: Added slides/videos for lecture 3/14, with solutions to homework 3 and 4.
- 3/13: Added slides/videos for the lecture on 3/12, including midterm exam logistics.
- 3/5: Added midterm presentations
- 3/5: Added homework 6.
- 3/3: Added slides and (re-recorded) videos for lectures before 3/3.
- 2/21: Added slides for 2/19 and 2/21.
- 2/19: Added homework 5.
- 2/16: Added slides for 2/12 and 2/14.
- 2/12: Added homework 4.
- 2/11: Added solution for homework 2.
- 2/6: Added homework 3 and the solution for homework 1. Uploaded slides for 2/5.
- 1/31: Slightly updated homework 2; uploaded slides for lectures 2 and 3.
- 1/29: Homework 2 is uploaded.
- 1/26: Videos for Lecture 1 are up (they are not the originals; we had to re-record them due to technical issues).
- 1/24: Slides, videos, and notebooks for Lecture 2 are up.
- 1/22: Updated course information, uploaded slides for today’s lecture and homework 1.

## Overview

This class provides a practical introduction to deep learning, covering both its theoretical motivations and how to implement it in practice. As part of the course we will cover multilayer perceptrons, backpropagation, automatic differentiation, and stochastic gradient descent. We then introduce convolutional networks for image processing, starting from the simple LeNet and progressing to more recent, highly accurate architectures such as ResNet. Next, we discuss sequence models and recurrent networks, such as LSTMs and GRUs, as well as the attention mechanism. Throughout the course we emphasize efficient implementation, optimization, and scalability, e.g. to multiple GPUs and multiple machines. The goal of the course is to provide both a good understanding of modern nonparametric estimators and the practical ability to build them. The entire course is based on Jupyter notebooks so that students can gain hands-on experience quickly. Supporting material can be found at https://d2l.ai.
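
To make the first group of topics concrete (a multilayer perceptron trained by backpropagation and stochastic gradient descent), here is a minimal pure-Python sketch. It is purely illustrative and not taken from the course notebooks. It assumes a single hidden layer of tanh units, squared loss, a batch size of one, and a toy target y = sin(x); the function names (`init_params`, `forward`, `backward`, `sgd_step`, `train`) are hypothetical. The gradients are derived by hand with the chain rule, which is the bookkeeping that automatic differentiation takes over in practice.

```python
import math
import random

# Hypothetical illustration, not course code: a one-hidden-layer MLP
# (R -> R^hidden -> R) trained with per-example SGD on squared loss.


def init_params(hidden, seed=0):
    """Random input weights, zero biases, output weights scaled by 1/sqrt(hidden)."""
    rng = random.Random(seed)
    return {
        "w1": [rng.gauss(0.0, 1.0) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Return hidden activations h and the scalar prediction y_hat for input x."""
    h = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    y_hat = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return h, y_hat


def backward(p, x, y):
    """Backpropagation for the loss 0.5 * (y_hat - y)**2 on a single example."""
    h, y_hat = forward(p, x)
    g = y_hat - y  # dL/dy_hat
    # Output layer: y_hat = sum_j w2[j] * h[j] + b2
    grad_w2 = [g * hj for hj in h]
    grad_b2 = g
    # Chain rule through tanh, using d tanh(z)/dz = 1 - tanh(z)**2
    grad_z = [g * w * (1.0 - hj * hj) for w, hj in zip(p["w2"], h)]
    return {"w1": [gz * x for gz in grad_z], "b1": grad_z,
            "w2": grad_w2, "b2": grad_b2}


def sgd_step(p, x, y, lr):
    """Plain SGD update from one example; all gradients are computed before updating."""
    grads = backward(p, x, y)
    for name in ("w1", "b1", "w2"):
        p[name] = [v - lr * gv for v, gv in zip(p[name], grads[name])]
    p["b2"] -= lr * grads["b2"]


def mse(p, data):
    """Mean squared error of the current parameters over the whole dataset."""
    return sum((forward(p, x)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, hidden=16, lr=0.05, epochs=200, seed=0):
    """Run SGD for a fixed number of epochs; return parameters and per-epoch MSE."""
    p = init_params(hidden, seed)
    rng = random.Random(seed)
    losses = [mse(p, data)]
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)  # visit examples in a random order each epoch
        for x, y in order:
            sgd_step(p, x, y, lr)
        losses.append(mse(p, data))
    return p, losses


if __name__ == "__main__":
    # Toy task: fit y = sin(x) on 41 evenly spaced points in [-2, 2].
    data = [(i / 10, math.sin(i / 10)) for i in range(-20, 21)]
    params, losses = train(data)
    print(f"MSE before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Running it as a script prints the training MSE before and after training; the exact values depend on the seed and hyperparameters, which are arbitrary choices here. A framework with automatic differentiation and minibatches would replace the hand-written `backward` and the per-example Python loops.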

## Prerequisites

Programming in Python (CS 61A, or CS/STAT C8 and CS 88), Linear Algebra (MATH 54,
STAT 89A, or EE 16A), Probability (STAT 134, STAT 140, or EE 126), and
Statistics (STAT 20, STAT 135, or CS/STAT C100) are highly
desirable. Equivalent knowledge is fine, and we will try to make the
class as self-contained as possible. *This is a class where you need
to get your hands dirty with programming.*

## Course Format

The course consists of two 90-minute lectures per week, taught by the
instructors, plus office hours held by the instructors and TAs. Evaluation
is based on a **midterm exam** (20%), **homework** (30%), and a **research
project** (50%), which will be presented in lieu of a final
exam. As part of the course, you will carry out work similar
to the research that leads up to a paper.