James Arambam


AIL8027: Advanced Reinforcement Learning (4 Credit)

AIL821: Special Topics in Machine Learning (3 Credit)

Course Overview: In this course, we will delve into advanced topics in reinforcement learning (RL). RL, a sprawling research area, holds promise for applications in diverse real-world domains such as robotics, autonomous driving, smart transportation, finance, supply-chain logistics, training large language models (LLMs), and games. However, the challenges we confront in these domains often do not align with ideal conditions, so simply applying a preferred off-the-shelf RL algorithm is rarely enough. For instance, we may encounter scenarios with multiple learning agents in the environment, sparse reward structures, multiple dynamic goals, or constraints that must be incorporated into policy optimization. We will also cover recent, crucial applications of RL to training LLMs.

Learning Outcome: Students will develop a comprehensive understanding of a wide range of sophisticated tools and techniques in reinforcement learning, empowering them to effectively tackle complex problem scenarios that arise in real-world applications. They will gain insight into the latest innovative concepts and pioneering research directions in the field, and will explore open research challenges at the cutting edge of this rapidly evolving area, equipping them to contribute to future advancements.

Note: The course is currently offered under two different course IDs.
AIL821: Special Topics in Machine Learning (3 Credit) - For old students only.
AIL8027: Advanced Reinforcement Learning (4 Credit) - For new students only.

Grading Scheme (AIL821 - 3 Credit): Minor - 30%, Major - 35%, Assignments - 10%, Quizzes - 10%, Paper Reading - 15%.

Grading Scheme (AIL8027 - 4 Credit): Minor - 30%, Major - 35%, Assignments - 20%, Quizzes - 10%, Paper Reading - 15%.

Attendance Policy: Institute default (attendance below 75% leads to the grade being lowered by one).

Audit Pass Criteria: Marks equivalent to B- or higher, and at least 75% attendance.

Prerequisites: A foundational course in AI or ML; Proficiency in Python; Good knowledge of Probability and Statistics.

Lecture Location: LH521, for students registered under either course code (AIL8027 or AIL821).

Lecture Timing: Monday & Thursday, 3:30 PM - 5:00 PM

Office Hours: By appointment.

Tentative List of Modules:

  • Multi-Agent Reinforcement Learning (MARL)
  • Constrained Reinforcement Learning (CRL)
  • Hierarchical Reinforcement Learning (HRL)
  • Reinforcement Learning for LLMs (RL-LLM)
  • Unsupervised Reinforcement Learning (URL)
  • Distributional Reinforcement Learning (DRL)
  • Meta Reinforcement Learning (MRL)
  • Imitation Learning (IL)
  • Goal Conditioned RL (GCRL)
  • Human-in-the-loop Reinforcement Learning

Deadlines:

  • Assignment 1: TBA
  • Quiz 1: TBA

Lecture Schedule:

Week No. | Lecture Dates | Module | Topics | Reference Materials
1 | July 24, 28 | RL | Course motivation and logistics; Review of RL Basics - 1: Markov decision processes, value functions, Bellman equations, Monte Carlo RL, TD learning, SARSA. | [1] Ch. 1, 3, 5, 6
2 | July 31, Aug. 4 | RL | Review of RL Basics - 2: off-policy learning (Q-Learning, DQN); policy gradient methods (REINFORCE, Actor-Critic (AC), Advantage Actor-Critic (A2C)); a minimal tabular Q-learning sketch follows this table. | [1] Ch. 6, 13
3 | Aug. 7, 11 | MARL | Introduction to multi-agent RL and motivation; challenges in MARL; Dec-POMDPs; solution methods based on single-agent RL: centralized learning, independent learning, parameter sharing, experience sharing. | [2] Ch. 1, 3
4 | Aug. 14, 18 | MARL | Game-theoretic solutions: Nash Q-learning, no-regret learning; training and execution paradigms; multi-agent policy gradient methods (MAPG, MADDPG); counterfactual action-value functions (COMA). | -
5 | Aug. 21, 25 | MARL | Value decomposition methods: linear value decomposition (VDN), monotonic value decomposition (QMIX), multi-agent attention actor-critic (MAAC); many-agent training: mean-field MARL; collective Dec-POMDPs. | -
6 | Aug. 28, Sep. 1 | CRL | Constrained MDPs; Lagrangian relaxation - Reward Constrained Policy Optimization (RCPO); trust-region methods - Constrained Policy Optimization (CPO). | -
7 | Sep. 4, 8 | HRL | State and temporal abstractions in Markov decision processes; semi-Markov decision processes; the options framework: value iteration with options, option value and policy learning, option-critic architecture, natural option-critic. | -
8 | Sep. 11 | RL-LLM | RL from human feedback (RLHF); preference-based learning: Direct Preference Optimization (DPO). | -
- | Sep. 12 - 17 | - | Minor Exam | -
8 | Sep. 18 | RL-LLM | Preference-based learning: Reward-aware Preference Optimization (RPO), Group Relative Policy Optimization (GRPO). | -
9 | Sep. 22, 25 | URL | Reward-free pre-training and exploration; intrinsic motivation; empowerment; curiosity-driven exploration; unsupervised skill discovery; unsupervised control. | -
- | Sep. 28 - Oct. 5 | - | Mid-Semester Break | -
10 | Oct. 6, 9 | DRL | Learning return distributions; categorical TD learning; the distributional Bellman operator; distributional value iteration; distributional RL algorithms with deep neural networks. | -
11 | Oct. 13, 16 | MRL | Fast RL via slow RL; learning to reinforcement learn; Model-Agnostic Meta-Learning (MAML); meta-gradient RL. | -
12 | - | IL | Imitation learning: behavior cloning, Dataset Aggregation (DAgger), Generative Adversarial Imitation Learning (GAIL). | -
13 | - | GCRL | Goal-augmented MDPs; the notion of goals and subgoals; Hindsight Experience Replay (HER). | -
14 | - | HLRL | Human-in-the-loop reinforcement learning. | -
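
To ground the Weeks 1-2 review material, here is a minimal sketch of tabular Q-learning on a toy five-state chain MDP. The environment, hyperparameters, and episode count are illustrative assumptions for this page, not course-provided code.

```python
import random

# Minimal tabular Q-learning on a toy 5-state chain MDP.
# States 0..4 lie on a line; action 1 moves right, action 0 moves left.
# Reaching state 4 yields reward 1 and ends the episode.
# (Toy environment and hyperparameters are illustrative choices.)

N_STATES = 5
ACTIONS = [0, 1]                   # 0 = left, 1 = right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(s, a):
    """One environment transition of the chain."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    """Greedy action with random tie-breaking."""
    best = max(Q[(s, b)] for b in ACTIONS)
    return random.choice([b for b in ACTIONS if Q[(s, b)] == best])

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
        s2, r, done = step(s, a)
        # Off-policy TD update toward the Bellman optimality target;
        # replacing the max with the next action actually taken gives SARSA.
        target = r + (0.0 if done else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# State values under the learned greedy policy, V(s) = max_a Q(s, a)
print({s: round(max(Q[(s, a)] for a in ACTIONS), 3) for s in range(N_STATES)})
```

The printed state values increase toward the goal, approaching GAMMA ** (3 - s) for the non-terminal states; as noted in the comments, swapping the max in the target for the next action actually taken recovers on-policy SARSA.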


Reference Materials:

  1. Sutton and Barto. Reinforcement Learning: An Introduction. Second Edition, MIT Press, 2018. [PDF]
  2. Albrecht, Christianos, and Schäfer. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, 2024. [PDF]
  3. Bellemare et al. Distributional Reinforcement Learning. MIT Press, 2023. [PDF]
  4. Gupta et al. Cooperative Multi-Agent Control Using Deep Reinforcement Learning. AAMAS 2017.
  5. Christianos et al. Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning. NeurIPS 2020.
  6. Sutton et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999.
  7. Duan et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning. ICLR 2017.
  8. Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML). ICML 2017.
  9. Warde-Farley et al. Unsupervised Control Through Non-Parametric Discriminative Rewards. ICLR 2019.
  10. Achiam et al. Constrained Policy Optimization. ICML 2017.
  11. Christiano et al. Deep Reinforcement Learning from Human Preferences. NeurIPS 2017.
  12. Liu et al. Goal-Conditioned Reinforcement Learning: Problems and Solutions. IJCAI 2022.


More updates coming soon!