Reinforcement Learning

Training AI models through reward-based optimization, using techniques such as reinforcement learning from human feedback (RLHF), Group Relative Policy Optimization (GRPO), and other policy gradient methods to improve reasoning, alignment, and task performance.

Reading List