STAT 157, Spring 19

Introduction to Deep Learning

STAT 157, UC Berkeley, Spring 2019

Practical information

Instructors: Alex Smola and Mu Li
TAs: Rachel Hu and Ryan Theisen
Lectures: Tuesdays and Thursdays, 3:30pm - 5:00pm
Location: LeConte 3
Instructor Office Hours: Thursdays, 1:00pm - 3:00pm
Location: 337 Evans Hall
Contact instructors and TAs
Course discussion
Grading Policy: Homework 30%, Midterm 20%, Project 50%

Announcements

  • 3/18: Added solutions to homework 5.
  • 3/15: Added slides/videos for lecture 3/14, with solutions to homework 3 and 4.
  • 3/13: Added slides/videos for the 3/12 lecture, including midterm exam logistics.
  • 3/5: Added midterm presentations.
  • 3/5: Added homework 6.
  • 3/3: Added slides and (re-recorded) videos for lectures before 3/3.
  • 2/21: Added slides for 2/19 and 2/21.
  • 2/19: Added homework 5.
  • 2/16: Added slides for 2/12 and 2/14.
  • 2/12: Added homework 4.
  • 2/11: Added solution for homework 2.
  • 2/6: Added homework 3, and solution for homework 1. Uploaded slides for 2/5.
  • 1/31: Slightly updated homework 2; uploaded slides for lectures 2 and 3.
  • 1/29: Homework 2 is uploaded.
  • 1/26: Videos for Lecture 1 are up (they are not the originals; we had to re-record them due to technical issues).
  • 1/24: Slides, Videos, and Notebooks for Lecture 2 are up.
  • 1/22: Updated course information, uploaded slides for today’s lecture and homework 1.

Overview

This class provides a practical introduction to deep learning, covering both its theoretical motivations and how to implement it in practice. We cover multilayer perceptrons, backpropagation, automatic differentiation, and stochastic gradient descent. We then introduce convolutional networks for image processing, ranging from the simple LeNet to more recent architectures such as ResNet, which yield highly accurate models. Next, we discuss sequence models and recurrent networks, such as LSTMs and GRUs, as well as the attention mechanism. Throughout the course we emphasize efficient implementation, optimization, and scalability, e.g. to multiple GPUs and multiple machines. The goal of the course is to give students both a solid understanding of modern nonparametric estimators and the ability to build them. The entire course is based on Jupyter notebooks so that students can gain hands-on experience quickly. Supporting material can be found at

Prerequisites

Programming in Python (CS 61A or CS/STAT C8 and CS 88), Linear Algebra (MATH 54, STAT 89A, or EE 16A), Probability (STAT 134, STAT 140, or EE 126), and Statistics (STAT 20, STAT 135, or CS/STAT C100) are highly desirable. Equivalent knowledge is fine, and we will try to make the class as self-contained as possible. This is a class where you need to get your hands dirty with programming.

Course Format

The course consists of two 90-minute lectures per week, taught by the instructors, plus office hours held by the instructors and TAs. Evaluation is based on a midterm exam (20%), homework (30%), and a research project (50%), which will be presented in lieu of a final exam. As part of the course, you will carry out work similar to a research project leading up to a paper.