This course covers several foundational topics in online learning and sequential decision making under uncertainty, a subject at the intersection of algorithms, machine learning, and operations research. Such problems have wide applications in online advertising, recommendation systems, crowdsourcing, revenue management, and beyond. The problems we study typically feature a tension between collecting data and using that data to make optimal sequential decisions (the exploration–exploitation dilemma). We cover both fundamental results and research frontiers, focusing on algorithmic results while also introducing lower bounds.
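To make the exploration–exploitation dilemma concrete, here is a minimal sketch of the classic UCB1 strategy on a toy Bernoulli bandit (not taken from the course notes; the arm means below are made up for illustration):

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given (hypothetical) arm means."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # Index = empirical mean + confidence bonus. The bonus shrinks
            # as an arm is pulled more (favoring exploitation) and grows
            # with t (forcing continued exploration of rarely pulled arms).
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.3, 0.5, 0.7], horizon=5000)
# The best arm (mean 0.7) should end up with the most pulls.
```

The confidence bonus is what resolves the dilemma: an arm is pulled either because its empirical mean is high (exploitation) or because it has been sampled too few times for its mean to be trusted (exploration).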

This course assumes background in basic probability theory, linear algebra, and algorithm design and analysis at an undergraduate level. Key mathematical concepts will be reviewed before they are used, but a certain level of mathematical maturity is expected.

Disclaimer: The lecture notes were scribed by students in the class and have not been proofread by the lecturer.

- Lecture 01 and 02: The Central Limit Theorem and Tail Bounds
- Lecture 03: The Multi-Armed Bandit Problem
- Lecture 04: Pure Exploration Algorithms for the MAB Problem
- Lecture 05: Expectation of Sample Complexity and Improved SR
- Lecture 06: The lil'UCB Algorithm
- Lecture 07: MAB Lower Bounds (Part I)
- Lecture 08: MAB Lower Bounds (Part II)
- Lecture 09: MAB Lower Bounds (Part III) and Adversarial Bandit
- Lecture 10: MAB Lower Bounds (Part III) and Adversarial Bandit
- Lecture 11: Stochastic Contextual Bandit
- Lecture 12: Linear Contextual Bandit (Part I)
- Lecture 13: Linear Contextual Bandit (Part II)
- Lecture 14: Linear Contextual Bandit Lower Bounds
- Lecture 15: Introduction to Reinforcement Learning
- Lecture 16: Value Iteration, Policy Iteration, and Policy Gradient
- Lecture 17: Model-Based Reinforcement Learning
- Lecture 18: Tighter Regret for Model-Based Reinforcement Learning
- Lecture 19: The OLIVE Algorithm for Learning MDPs with Low Bellman Rank