
Model Parallelism

Splitting a model across multiple GPUs when it's too large for one

Model parallelism splits a model's layers across multiple GPUs. This is necessary when a model is too large to fit in a single GPU's memory.

Example

import torch.nn as nn

# Manually place different layers on different GPUs
class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4096, 4096).to("cuda:0")  # first half on GPU 0
        self.layer2 = nn.Linear(4096, 4096).to("cuda:1")  # second half on GPU 1

    def forward(self, x):
        x = self.layer1(x.to("cuda:0"))  # move input to GPU 0, run layer 1
        x = self.layer2(x.to("cuda:1"))  # move activations to GPU 1, run layer 2
        return x
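
Running a forward pass might look like the following sketch (assuming at least two CUDA devices are visible). The input can start on the CPU; the output ends up on cuda:1:

import torch

model = SplitModel()
x = torch.randn(8, 4096)  # input starts on the CPU
out = model(x)            # forward() moves it across both GPUs
print(out.device)         # cuda:1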

Types

Type                 | What's Split                  | When to Use
Pipeline parallelism | Sequential layers across GPUs | Very deep models
Tensor parallelism   | Individual layers across GPUs | Very wide layers
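
To make the distinction concrete, here is a minimal pipeline-parallel sketch (stage0, stage1, and pipeline_forward are hypothetical names, and two CUDA devices are assumed). The batch is split into micro-batches that flow through stage 0 on cuda:0 and then stage 1 on cuda:1:

import torch
import torch.nn as nn

# Hypothetical two-stage pipeline: each stage owns a slice of the layers.
stage0 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
stage1 = nn.Linear(4096, 4096).to("cuda:1")

def pipeline_forward(batch, n_micro=4):
    outputs = []
    for micro in batch.chunk(n_micro):          # split batch into micro-batches
        h = stage0(micro.to("cuda:0"))          # stage 0 runs on GPU 0
        outputs.append(stage1(h.to("cuda:1")))  # stage 1 runs on GPU 1
    return torch.cat(outputs)                   # reassemble the batch on cuda:1

This naive loop only shows the data flow; real pipeline schedulers (GPipe-style, for example) interleave micro-batches so both GPUs stay busy.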
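
And a minimal tensor-parallel sketch (again with hypothetical names and two CUDA devices assumed): a single wide 4096-to-8192 linear layer is split column-wise, with each GPU computing half of the output features:

import torch
import torch.nn as nn

# Hypothetical column-wise split of one 4096 -> 8192 linear layer.
half_a = nn.Linear(4096, 4096).to("cuda:0")  # first 4096 output features
half_b = nn.Linear(4096, 4096).to("cuda:1")  # last 4096 output features

def tensor_parallel_forward(x):
    out_a = half_a(x.to("cuda:0"))  # both GPUs see the full input
    out_b = half_b(x.to("cuda:1"))
    # Gather the two output slices onto one device and concatenate.
    return torch.cat([out_a, out_b.to("cuda:0")], dim=-1)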
