James Arambam


AIL8027: Advanced Reinforcement Learning (4 Credit)

AIL821: Special Topics in Machine Learning (3 Credit)

Course Overview: In this course, we will explore advanced topics in reinforcement learning (RL). RL is a sprawling research area that holds promise for diverse real-world domains such as robotics, autonomous driving, smart transportation, finance, supply-chain logistics, training LLMs, and games. However, the challenges we confront in these domains often do not align with ideal conditions, so we cannot simply apply our preferred off-the-shelf RL algorithms. For instance, we may encounter environments with multiple learning agents, sparse reward structures, multiple dynamic goals, or constraints that must be incorporated into policy optimization. We will also cover recent, crucial applications of RL to training LLMs.

Learning Outcome: Develop a comprehensive understanding of a wide range of sophisticated tools and techniques in reinforcement learning, empowering students to tackle the complex problem settings that arise in real-world applications. Students will gain insight into the latest concepts and pioneering research directions in the field, and will explore the open research challenges at its rapidly evolving frontier, equipping them to contribute to future advancements.

Note: The course is currently offered under two different course IDs.
AIL821: Special Topics in Machine Learning (3 Credit) - For old students only.
AIL8027: Advanced Reinforcement Learning (4 Credit) - For new students only.

Grading Scheme (AIL821 - 3 Credit): Minor - 30%, Major - 35%, Assignments - 10%, Quizzes - 10%, Paper Reading - 15%.

Grading Scheme (AIL8027 - 4 Credit): Minor - 30%, Major - 35%, Assignments - 20%, Quizzes - 10%, Paper Reading - 15%.

Attendance Policy: Institute default (below 75% attendance leads to the grade being lowered by one).

Audit Pass Criteria: Marks equivalent to B- or higher, plus at least 75% attendance.

Prerequisites: A foundational course in AI or ML; Proficiency in Python; Good knowledge of Probability and Statistics.

Lecture Location: LH521 for both course codes (AIL8027 and AIL821).

Lecture Timing: Monday & Thursday, 3:30 PM - 5:00 PM

Office Hours: By appointment.

Tentative List of Modules:

  • Multi-Agent Reinforcement Learning (MARL)
  • Constrained Reinforcement Learning (CRL)
  • Hierarchical Reinforcement Learning (HRL)
  • Goal Conditioned RL (GCRL)
  • Unsupervised Reinforcement Learning (URL)
  • Meta Reinforcement Learning (MRL)
  • Imitation Learning (IL)
  • Reinforcement Learning for LLMs (RL4LLM)
  • Distributional Reinforcement Learning (DRL)

Deadlines:

  • Assignment 1: Sep. 1
  • Quiz 1: Sep. 4
  • Assignment 2: Oct. 16
  • Quiz 2: Oct. 23

Lecture Schedule:

Week No. | Lecture Dates | Module | Topics | Reference Materials
1 | July 24, 28 | RL | Course motivation and logistics; Review of RL Basics - 1: Markov decision processes, value functions, Bellman equations, Monte Carlo RL, TD learning, SARSA | [1] Ch. 1, 3, 5, 6
2 | July 31, Aug. 4 | RL | Review of RL Basics - 2: off-policy learning (Q-learning, DQN); policy gradient methods (REINFORCE, Actor-Critic (AC), Advantage Actor-Critic (A2C)) | [1] Ch. 6, 13
3 | Aug. 7, 11 | MARL | Introduction to multi-agent RL: motivation and challenges in MARL; Dec-POMDPs; solution methods via single-agent RL: centralized learning, independent learning, parameter sharing, experience sharing | [2] Ch. 1, 3
4 | Aug. 14, 18 | MARL | Game-theoretic solutions: Nash Q-learning, no-regret learning; training & execution paradigms; multi-agent policy gradient (MAPG, MADDPG); counterfactual action-value function (COMA) | [2] Ch. 6
5 | Aug. 21, 25 | MARL | Value decomposition methods: Linear Value Decomposition (VDN), Monotonic Value Decomposition (QMIX), Multi-Agent Attention Actor-Critic (MAAC); many-agent training: mean-field MARL; collective Dec-POMDPs | -
6 | Aug. 28, Sep. 1 | CRL | Constrained MDPs; Lagrange relaxation: Reward Constrained Policy Optimization (RCPO); trust-region methods: Constrained Policy Optimization (CPO) | [3] Ch. 5
7 | Sep. 4, 8 | HRL | State and temporal abstractions in Markov decision processes; semi-Markov decision processes; the options framework: value iteration with options | -
8 | Sep. 11 | HRL | Option value and policy learning; the option-critic architecture | -
- | Sep. 12 - 17 | - | Minor Exam | -
8 | Sep. 18 | GCRL | Goal-augmented MDPs; the notion of goals and subgoals | -
9 | Sep. 22, 25 | URL | Reward-free pre-training and exploration; Hindsight Experience Replay (HER) | -
- | Sep. 28 - Oct. 5 | - | Mid-Semester Break | -
10 | Oct. 6, 9 | URL | Intrinsic-reward-based RL; empowerment | -
11 | Oct. 13, 16 | MRL | Fast RL via slow RL; learning to reinforcement learn; Model-Agnostic Meta-Learning (MAML) | -
12 | Oct. 23, 27 | IL | Imitation learning: behavior cloning, Dataset Aggregation (DAgger), Generative Adversarial Imitation Learning (GAIL) | -
13 | Oct. 30, Nov. 3 | RL4LLM | RL with Human Feedback (RLHF); preference-based learning: Direct Preference Optimization (DPO), Reward-aware Preference Optimization (RPO), Group Relative Policy Optimization (GRPO) | -
14 | Nov. 6, 10 | DRL | Learning return distributions; categorical TD learning; the distributional Bellman operator; distributional value iteration; distributional RL algorithms with deep neural networks | -
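
For students revisiting the Week 1-2 review material, the minimal sketch below illustrates tabular Q-learning (off-policy TD control, [1] Ch. 6) on a toy problem. It is an illustrative sketch only, not course material: the seven-state random-walk environment, its +1 reward at the right terminal state, and all hyperparameter values are assumptions chosen for the example.

    import numpy as np

    # Tabular Q-learning on an assumed seven-state random walk:
    # states 0..6, actions 0 = left / 1 = right, episodes start in the
    # middle, both edges are terminal, only the right edge pays reward +1.
    N_STATES, N_ACTIONS = 7, 2
    ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # step size, discount, exploration rate

    rng = np.random.default_rng(0)
    Q = np.zeros((N_STATES, N_ACTIONS))

    def step(s, a):
        """Deterministic transition; returns (next_state, reward, done)."""
        s_next = s - 1 if a == 0 else s + 1
        done = s_next in (0, N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        return s_next, reward, done

    for episode in range(2000):
        s, done = N_STATES // 2, False
        while not done:
            # Epsilon-greedy behavior policy.
            a = int(rng.integers(N_ACTIONS)) if rng.random() < EPSILON else int(np.argmax(Q[s]))
            s_next, r, done = step(s, a)
            # Q-learning bootstraps off the greedy next action (hence off-policy).
            target = r + (0.0 if done else GAMMA * np.max(Q[s_next]))
            Q[s, a] += ALPHA * (target - Q[s, a])
            s = s_next

    print(np.round(Q, 2))  # Greedy policy should move right in every interior state.

DQN (Week 2) replaces the table Q with a neural network and this one-step update with minibatch regression toward the same bootstrapped target.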


Reference Books:

  1. Sutton and Barto. Reinforcement Learning: An Introduction. Second Edition, MIT Press, 2018. [PDF]
  2. Albrecht, Christianos, and Schäfer. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, 2024. [PDF]
  3. Boyd and Vandenberghe. Convex Optimization. Cambridge University Press, 2004. [PDF]
  4. Bellemare, Dabney, and Rowland. Distributional Reinforcement Learning. MIT Press, 2023. [PDF]
  5. Sutton, Precup, and Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999.