This advanced PyTorch course focuses on Distributed Data Parallel (DDP) for high-performance deep learning. Designed for experienced machine learning engineers and AI researchers, it provides a practical guide to scaling models across multiple GPUs and nodes.
The course starts with an introduction to DDP, explaining its architecture, its benefits, and how it improves training efficiency. You will then explore multi-GPU training through hands-on code walkthroughs, including torchrun-based workflows that simplify launching distributed jobs.
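A minimal sketch of the kind of DDP training loop such a walkthrough covers. It assumes torchrun sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables; the `gloo` backend is used here so the snippet also runs on CPU (clusters with GPUs would use `nccl` and move the model to the local device). The function name `setup_and_train` and the toy linear model are illustrative, not from the course.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_and_train():
    # torchrun injects these for every worker process; the defaults
    # below let the script also run as a single plain process.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # "gloo" works on CPU; on GPU clusters you would use backend="nccl"
    # and wrap the model with device_ids=[local_rank].
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(10, 1)
    ddp_model = DDP(model)  # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for _ in range(5):
        x = torch.randn(8, 10)   # toy batch; a real job uses a
        y = torch.randn(8, 1)    # DistributedSampler-backed DataLoader
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()          # DDP synchronizes gradients here
        optimizer.step()

    dist.destroy_process_group()
    return loss.item()

if __name__ == "__main__":
    # Launch on one machine with 4 GPUs:
    #   torchrun --nproc_per_node=4 train.py
    setup_and_train()
```

The key point of the pattern: the script itself is rank-agnostic; torchrun spawns one copy per device and DDP handles gradient synchronization transparently during `backward()`.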
Advanced modules cover multi-node DDP setups, allowing you to scale large models across multiple machines. The course also includes a complete example of training a GPT-like model with DDP, demonstrating best practices for performance, memory optimization, and synchronization across GPUs and nodes.
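To make the multi-node scaling concrete, here is a sketch of how torchrun assigns global ranks when the same script is launched on several machines. The host address `10.0.0.1` in the comments is a hypothetical rendezvous endpoint, and the helper `global_rank_layout` is an illustrative function, not part of the course or of PyTorch.

```python
# In a multi-node job, torchrun is run once per machine, e.g. for
# 2 nodes with 8 GPUs each (10.0.0.1 is a hypothetical master host):
#   node 0: torchrun --nnodes=2 --node_rank=0 --nproc_per_node=8 \
#               --rdzv_backend=c10d --rdzv_endpoint=10.0.0.1:29500 train.py
#   node 1: torchrun --nnodes=2 --node_rank=1 --nproc_per_node=8 \
#               --rdzv_backend=c10d --rdzv_endpoint=10.0.0.1:29500 train.py

def global_rank_layout(nnodes: int, nproc_per_node: int) -> dict:
    """Map (node_rank, local_rank) to the global rank torchrun assigns.

    Global rank = node_rank * nproc_per_node + local_rank, so the
    world size is nnodes * nproc_per_node processes in total.
    """
    return {
        (node, local): node * nproc_per_node + local
        for node in range(nnodes)
        for local in range(nproc_per_node)
    }
```

For example, with 2 nodes of 8 GPUs, local rank 0 on node 1 becomes global rank 8; the training script stays unchanged between single-node and multi-node runs, only the launch command differs.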
By the end of this course, you will be able to implement and optimize distributed training for large deep learning models, cut training time substantially, and confidently handle multi-GPU and multi-node setups in research or production environments.