James Arambam


AIL7022/AIL722: Reinforcement Learning

Course Overview: Reinforcement Learning (RL) is a core area of machine learning focused on how intelligent agents learn to make decisions through interaction with an environment. This course provides a comprehensive introduction to the principles, algorithms, and applications of reinforcement learning, with an emphasis on both theoretical foundations and practical implementation. Students will learn how agents can optimize long-term reward by balancing exploration and exploitation, how to model problems as Markov Decision Processes (MDPs), and how to apply value-based and policy-based learning techniques.
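As a small taste of the planning material covered early in the course, the sketch below runs value iteration on a hypothetical two-state MDP. The states, actions, rewards, and discount factor here are illustrative assumptions for this example only, not part of the course materials.

```python
# Value iteration on a toy deterministic 2-state MDP (illustrative example).
import numpy as np

# States: 0 and 1. Actions: 0 ("stay"), 1 ("move").
# P[s, a] gives the (deterministic) next state; R[s, a] the immediate reward.
P = np.array([[0, 1],
              [1, 0]])
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Repeat Bellman optimality backups until successive iterates converge."""
    V = np.zeros(P.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * V[next state]
        Q = R + gamma * V[P]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and greedy policy
        V = V_new

V_star, pi_star = value_iteration(P, R, gamma)
print(V_star, pi_star)  # optimal policy: move in state 0, stay in state 1
```

The greedy policy keeps the agent in state 1 collecting reward 2 forever, so V*(1) = 2 / (1 - 0.9) = 20 and V*(0) = 1 + 0.9 * 20 = 19.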

Note: The course is currently offered under two different course IDs.
AIL722: Reinforcement Learning (3 Credit) - For old students only.
AIL7022: Reinforcement Learning (4 Credit) - For new students only.

Grading Scheme (AIL722 - 3 Credit): Minor - 30%, Major - 30%, Assignments - 30%, Quizzes - 10%.

Grading Scheme (AIL7022 - 4 Credit): Minor - 30%, Major - 30%, Assignments - 40%, Quizzes - 10%.

Prerequisites: Basic knowledge of Probability and Statistics.

Attendance Policy: Institute default (attendance below 75% leads to the final grade being lowered by one level).

Audit Pass Criteria: Marks equivalent to B- or higher, plus >=75% attendance.

Lecture Hall & Time: TBA

Office Hours: By appointment only.

Class Communication: Moodle.

Tentative List of Topics:

Week No. | Lecture Dates | Module | Topics
1-a | - | Introduction | Course Logistics; Motivation; Connection to Psychology and Neuroscience; Sequential Decision-Making Problem; The RL Problem; Key Challenges
1-b | - | Planning Problem | Deterministic Decision Processes; Markov Decision Process; Partially Observable MDP; Value Functions
2 | - | Planning Problem | Planning by Dynamic Programming - Value Iteration; Policy Iteration; Monte-Carlo Tree Search (MCTS)
3 | - | Monte-Carlo (MC) Methods | MC Prediction; MC Control; Off-policy Prediction via Importance Sampling; Off-policy MC Control
4 | - | Temporal Difference (TD) Methods | TD Prediction; SARSA; Expected SARSA; Off-policy Q-Learning; Q-Learning Convergence - Contraction Mapping, Banach's Fixed-Point Theorem
5-a | - | Temporal Difference (TD) Methods | Fitted Q-Learning; Double Q-Learning; n-step TD Prediction; n-step SARSA
5-b, 6-a | - | Approximate Prediction and Control | Value Function Approximation; Linear Methods; Tile Coding; Non-Linear Methods; Off-policy Divergence; The Deadly Triad
6-b | - | Eligibility Traces | Forward and Backward View; Lambda-Return; TD-Lambda; SARSA-Lambda
7 | - | Policy Gradient Methods | Stochastic and Deterministic Policy Gradient; Natural Policy Gradient; REINFORCE
8 | - | Policy Gradient Methods | Actor-Critic (AC); A2C; A3C; DDPG; TRPO; PPO; SAC
9 | - | RL as Probabilistic Inference | Graphical Model for Decision-Making; Policy Search as Probabilistic Inference; Maximum Entropy RL
10 | - | Offline RL | Motivation; Distributional Shift; Policy Constraints; Implicit Q-Learning; Conservative Q-Learning
11 | - | Model-Based RL (MBRL) | Model Learning; Planning with Models; MBRL via Policy Gradient; Latent Space Models; Dyna; Dreamer
12 | - | Bandits | Multi-Armed Bandits; Contextual Bandits; Applications
13 | - | RL for Training LLMs | RL with Human Feedback (RLHF); Preference-Based Learning - Direct Preference Optimization (DPO), Reward-aware Preference Optimization (RPO), Group Relative Policy Optimization (GRPO)
14 | - | RL Applications | RL for Real-World Applications and Case Studies
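To illustrate the model-free, off-policy flavor of the TD methods listed above (Weeks 4-5), here is a minimal tabular Q-learning sketch on a hypothetical two-state chain. The environment, rewards, and hyperparameters are illustrative assumptions for this example, not course material.

```python
# Tabular Q-learning with an epsilon-greedy behaviour policy (toy example).
import random

def step(s, a):
    """Deterministic 2-state chain: action 1 switches states, action 0 stays.
    Reward 2 for staying in state 1, reward 1 for moving from 0 to 1."""
    if s == 0:
        return (1, 1.0) if a == 1 else (0, 0.0)
    return (0, 0.0) if a == 1 else (1, 2.0)

gamma, alpha, eps = 0.9, 0.1, 0.2   # discount, step size, exploration rate
Q = [[0.0, 0.0], [0.0, 0.0]]        # Q[state][action], initialized to zero
random.seed(0)

s = 0
for _ in range(20000):
    # epsilon-greedy action selection from the current Q-table
    if random.random() < eps:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda x: Q[s][x])
    s2, r = step(s, a)
    # Off-policy TD update: target bootstraps from the greedy (max) next value
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = s2

print(Q)  # approaches Q*: e.g. Q[1][0] -> 2 / (1 - 0.9) = 20
```

Because the update target uses max over next-state actions rather than the action the behaviour policy actually takes, the learned Q-values estimate the optimal policy even though behaviour is epsilon-greedy; this is the off-policy property covered in Week 4.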


Reference Books:

  1. Sutton and Barto, Reinforcement Learning: An Introduction, Second Edition, MIT Press, 2018 [PDF]
  2. Dimitri Bertsekas, Neuro-Dynamic Programming, Athena Scientific, 1996 [PDF]
  3. Shie Mannor, Yishay Mansour and Aviv Tamar, Reinforcement Learning: Foundations [PDF]
  4. Csaba Szepesvari, Algorithms for Reinforcement Learning [PDF]
  5. Sergey Levine, Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review [PDF]