---
title: "Model Parallelism"
canonical: "https://www.thundercompute.com/glossary/parallelism/model-parallelism"
description: "Splitting a model across multiple GPUs when it's too large for one"
sidebarTitle: "Model Parallelism"
icon: "scissors"
iconType: "solid"
---

**Model parallelism** splits a single model across multiple GPUs so that each GPU holds only part of the model's weights. This is necessary when a model is too large to fit in a single GPU's memory. In its simplest form, each GPU holds a contiguous block of layers, and activations are copied between devices during the forward pass.

## Example

```python
import torch.nn as nn

# Manually place different layers on different GPUs
class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4096, 4096).to("cuda:0")
        self.layer2 = nn.Linear(4096, 4096).to("cuda:1")

    def forward(self, x):
        x = self.layer1(x.to("cuda:0"))  # compute on GPU 0
        x = self.layer2(x.to("cuda:1"))  # copy activations to GPU 1, then compute
        return x
```
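Using the split model is no different from using a single-device one: call it with an input batch and the forward pass handles the device-to-device copies. A minimal, runnable sketch of the same pattern is below; the CPU fallback is only there so the example also runs on machines without two GPUs, and is not part of the technique itself.

```python
import torch
import torch.nn as nn

# Use two GPUs when available; otherwise fall back to CPU so the
# sketch still runs on a single-device machine (illustration only).
if torch.cuda.device_count() >= 2:
    dev0, dev1 = "cuda:0", "cuda:1"
else:
    dev0 = dev1 = "cpu"

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4096, 4096).to(dev0)
        self.layer2 = nn.Linear(4096, 4096).to(dev1)

    def forward(self, x):
        x = self.layer1(x.to(dev0))  # compute on the first device
        x = self.layer2(x.to(dev1))  # copy activations, then compute
        return x

model = SplitModel()
out = model(torch.randn(8, 4096))
print(out.shape)  # torch.Size([8, 4096])
```

Note that with this naive split, GPU 1 sits idle while GPU 0 computes, and vice versa; the refinements below exist to recover that lost parallelism.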

## Types

A drawback of naive layer splitting is that only one GPU computes at a time. Two refinements address this:

| Type | What's Split | When to Use |
|------|-------------|-------------|
| **Pipeline parallelism** | Sequential layers across GPUs | Very deep models |
| **Tensor parallelism** | Individual layers across GPUs | Very wide layers |

## See Also

- [Data Parallelism](/parallelism/data-parallelism)
- [Tensor Parallelism](/parallelism/tensor-parallelism)
- [VRAM](/gpu-hardware/vram)
