[UCLA RL-LLM] Chapter 3.2: Reinforcement learning with verifiable rewards (RLVR)

Course content

[UCLA RL-LLM] Chapter 0: Course outline and prologue
[UCLA RL-LLM] Chapter 1.1: MDP foundations, imitation learning, and value iteration
[UCLA RL-LLM] Chapter 1.2: Deep policy evaluation
[UCLA RL-LLM] Chapter 1.3: Deep policy gradient methods (A3C)
[UCLA RL-LLM] Chapter 1.4: Deep policy gradient methods (PPO, GRPO)
[UCLA RL-LLM] Chapter 1.5: AlphaGo, test-time compute, and expert iteration
[UCLA RL-LLM] Chapter 2.1: NLP foundations, language modeling, RNNs
[UCLA RL-LLM] Chapter 2.2: Transformers I (BERT, GPT-1)
[UCLA RL-LLM] Chapter 2.3: Transformers II (modern transformer updates and sampling methods)
[UCLA RL-LLM] Chapter 2.4: In-context learning and instruction fine-tuning
[UCLA RL-LLM] Chapter 3.1: Reinforcement learning from human feedback (PPO, DPO)


To obtain a certificate:

1- Register for the course.
2- Watch the entire course.
3- Track your course completion percentage as you progress.
4- Once you finish, the certificate will appear in your profile.