# Model Parallelism

Splitting a model across multiple GPUs when it is too large for one.
Model parallelism splits a single model across multiple GPUs, either by placing different layers on different devices or by sharding the weights within a layer. It is necessary when a model is too large to fit in a single GPU's memory.
## Example
```python
import torch.nn as nn

# Manually place different layers on different GPUs.
class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # The first layer lives on GPU 0, the second on GPU 1.
        self.layer1 = nn.Linear(4096, 4096).to("cuda:0")
        self.layer2 = nn.Linear(4096, 4096).to("cuda:1")

    def forward(self, x):
        x = self.layer1(x.to("cuda:0"))   # compute on GPU 0
        x = self.layer2(x.to("cuda:1"))   # copy activation to GPU 1, compute there
        return x
```
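A minimal usage sketch, assuming at least two CUDA devices are available: the input can start on the CPU, the forward pass moves it between devices, and the output lands on `cuda:1`.

```python
import torch

model = SplitModel()
x = torch.randn(8, 4096)   # batch of 8 on the CPU
out = model(x)             # forward pass hops GPU 0 -> GPU 1
print(out.device)          # cuda:1
```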
## Types
| Type | What's Split | When to Use |
|---|---|---|
| Pipeline parallelism | Sequential layers across GPUs | Very deep models |
| Tensor parallelism | Individual layers across GPUs | Very wide layers |
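The naive layer split in the example above is the building block of pipeline parallelism, but run as-is it keeps only one GPU busy at a time. Pipeline schedules recover utilization by splitting each batch into micro-batches so the stages overlap. A hand-rolled sketch of that idea (the micro-batch count and the reuse of `SplitModel` are illustrative choices, not a fixed API):

```python
import torch

def pipelined_forward(model, x, n_microbatches=4):
    """Split the batch into micro-batches so GPU 0 can start on
    micro-batch i+1 while GPU 1 is still working on micro-batch i
    (CUDA kernel launches are asynchronous with respect to the host)."""
    outputs = []
    for mb in x.chunk(n_microbatches, dim=0):
        outputs.append(model(mb))      # each call hops GPU 0 -> GPU 1
    return torch.cat(outputs, dim=0)   # gather results on cuda:1
```

Real pipeline engines (GPipe-style schedules, for example) also manage the backward pass and weight updates across stages; this sketch covers the forward pass only.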
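Tensor parallelism instead shards a single layer. Below is a sketch of the common column-wise split of a linear layer across two GPUs; the name `ColumnParallelLinear` echoes naming used by libraries such as Megatron-LM, but this standalone, single-process version is purely illustrative.

```python
import torch
import torch.nn as nn

class ColumnParallelLinear(nn.Module):
    """One 4096 -> 4096 linear layer split column-wise: each GPU
    holds a 4096 -> 2048 shard and computes half of the output."""
    def __init__(self, in_features=4096, out_features=4096):
        super().__init__()
        half = out_features // 2
        self.shard0 = nn.Linear(in_features, half).to("cuda:0")
        self.shard1 = nn.Linear(in_features, half).to("cuda:1")

    def forward(self, x):
        # Send the full input to both GPUs, compute the two output
        # halves in parallel, then concatenate them on one device.
        y0 = self.shard0(x.to("cuda:0"))
        y1 = self.shard1(x.to("cuda:1"))
        return torch.cat([y0, y1.to("cuda:0")], dim=-1)  # gather on cuda:0
```

In production systems the shards live in separate processes and the final concatenation becomes an all-gather collective; a single-process version like this mainly illustrates the math of the split.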