LLMs from Scratch – From Base Model to PPO & RLHF

Number of lessons: 1
Course duration: 06:06:21
Accredited certificate: Yes

To obtain a certificate

1- Register
2- Watch the entire course
3- Track your course completion percentage as you progress
4- After you finish, the certificate appears in your personal profile

A practical deep dive into building large language models from the ground up, including pretraining, fine-tuning, and reinforcement learning from human feedback (RLHF).

Lesson list

1 - LLMs from Scratch – Practical Engineering from Base Model to PPO RLHF

About the course

This course focuses on the full engineering pipeline of Large Language Models (LLMs), starting from fundamental model building and progressing all the way to advanced alignment techniques such as PPO-based Reinforcement Learning from Human Feedback (RLHF).

The course begins with the basics of constructing an LLM from scratch, including environment setup, data preparation, and implementing core components such as tokenization and embedding layers. Learners gain hands-on experience with how raw text is transformed into structured inputs that neural networks can process.
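To make the idea of turning raw text into model inputs concrete, here is a minimal sketch in plain Python. It is not taken from the course material: the character-level vocabulary, the function names (`build_vocab`, `encode`, `decode`, `make_embedding_table`, `embed`), and the embedding size of 4 are all assumptions chosen for illustration. Real systems typically use subword tokenizers and learned embedding matrices.

```python
import random

def build_vocab(text):
    """Map each distinct character to an integer id (toy character-level tokenizer)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}

def encode(text, vocab):
    """Turn a string into a list of token ids."""
    return [vocab[ch] for ch in text]

def decode(ids, vocab):
    """Turn a list of token ids back into a string."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)

def make_embedding_table(vocab_size, dim, seed=0):
    """One small random vector per token id; in a real model these are trained."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]

def embed(ids, table):
    """Embedding lookup: replace each token id with its row in the table."""
    return [table[i] for i in ids]

text = "hello world"
vocab = build_vocab(text)
ids = encode(text, vocab)
table = make_embedding_table(len(vocab), dim=4)  # dim=4 is an arbitrary toy size
vectors = embed(ids, table)
```

The round trip `decode(encode(text, vocab), vocab)` returns the original string, and `vectors` holds one 4-dimensional vector per input character, which is the shape a transformer layer would consume.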

Next, the course moves into model architecture design, covering transformer-based structures, attention mechanisms, and training objectives like next-token prediction. This section helps learners understand how modern LLMs such as GPT-style models are built and optimized.
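The two ideas named above, attention and next-token prediction, can be sketched in a few lines of plain Python. This is an illustrative sketch under simplifying assumptions (a single attention head, no learned projection matrices, no batching), and the function names are hypothetical rather than taken from the course.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors. Position t may only attend to
    positions 0..t, which is what makes GPT-style models autoregressive.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)  # causal mask: ignore future positions
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out

def next_token_loss(logits, token_ids):
    """Mean cross-entropy where logits[t] is scored against token_ids[t + 1]."""
    losses = []
    for t in range(len(token_ids) - 1):
        probs = softmax(logits[t])
        losses.append(-math.log(probs[token_ids[t + 1]]))
    return sum(losses) / len(losses)

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_attention(x, x, x)

uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
loss = next_token_loss(uniform_logits, [0, 1, 2, 3])
```

Because of the mask, the first output vector equals the first value vector, and a model that outputs uniform logits over 4 tokens scores a loss of ln 4, the baseline any trained model should beat.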

A major focus is placed on pretraining workflows, where models learn language patterns from large-scale unlabeled datasets. Learners explore training loops, optimization strategies, and performance considerations when scaling models.
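As a rough illustration of what a training loop does, the sketch below fits a toy bigram model with plain gradient descent on the next-token cross-entropy loss. The bigram model, the function name `train_bigram`, and the values of `steps` and `lr` are assumptions made for this example; actual pretraining uses transformer models, mini-batches, and adaptive optimizers at far larger scale.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(token_ids, vocab_size, steps=200, lr=0.5):
    """Fit logits[a][b], the score that token b follows token a, by gradient descent."""
    logits = [[0.0] * vocab_size for _ in range(vocab_size)]
    pairs = list(zip(token_ids[:-1], token_ids[1:]))
    n = len(pairs)
    history = []  # mean loss per step, measured before that step's update
    for _ in range(steps):
        grads = [[0.0] * vocab_size for _ in range(vocab_size)]
        loss = 0.0
        for prev, nxt in pairs:
            probs = softmax(logits[prev])
            loss -= math.log(probs[nxt])
            # Gradient of cross-entropy w.r.t. logits is probs - one_hot(target).
            for j in range(vocab_size):
                grads[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
        for a in range(vocab_size):
            for b in range(vocab_size):
                logits[a][b] -= lr * grads[a][b] / n
        history.append(loss / n)
    return logits, history

token_ids = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # a perfectly repetitive toy corpus
logits, history = train_bigram(token_ids, vocab_size=3)
```

The loss starts at ln 3 (uniform guessing over 3 tokens) and falls as the model learns that 0 is followed by 1, 1 by 2, and 2 by 0. The same forward pass, loss, gradient, and update cycle is what a full-scale pretraining loop repeats.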

The course then introduces fine-tuning techniques, showing how pretrained models can be adapted to specific tasks or domains using supervised datasets.

The final and most advanced section covers Reinforcement Learning from Human Feedback (RLHF), including the PPO (Proximal Policy Optimization) algorithm. This part explains how human preferences are used to align model behavior, improve response quality, and reduce harmful or irrelevant outputs.
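The core of PPO is its clipped surrogate objective, sketched below in plain Python. This is a simplified illustration, not the course's implementation: the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions, advantages are taken as given rather than estimated from a reward model, and additional terms often used in RLHF setups (such as a penalty for drifting from a reference model) are omitted.

```python
import math

def ppo_clipped_objective(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    new_logprobs: log-probabilities of the sampled tokens under the current policy
    old_logprobs: log-probabilities under the policy that generated the samples
    advantages:   how much better each action was than expected
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside the [1 - eps, 1 + eps] band in the favorable direction.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)

# Unchanged policy: ratio is 1, so the objective is the mean advantage.
same_policy = ppo_clipped_objective([-1.0, -2.0], [-1.0, -2.0], [1.0, -0.5])

# Policy moved far toward a good action: the gain is capped at (1 + eps) * advantage.
big_step = ppo_clipped_objective([0.0], [-1.0], [1.0])
```

In the second call the raw ratio is about 2.72, but the objective is capped at 1.2, which is how PPO discourages a single update from moving the policy too far from the one that produced the samples.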

By the end of the course, learners will understand the full lifecycle of LLM development—from raw data processing and transformer implement