---
title: "Tensor Parallelism"
canonical: "https://www.thundercompute.com/glossary/parallelism/tensor-parallelism"
description: "Splitting individual layers across multiple GPUs"
sidebarTitle: "Tensor Parallelism"
icon: "table-cells"
iconType: "solid"
---

**Tensor parallelism** splits individual weight matrices (tensors) of a layer across multiple GPUs. Each GPU computes a partial output from its shard of the weights, and the partial results are combined with a collective operation (an all-gather or all-reduce). This reduces memory per GPU without the pipeline bubbles of pipeline parallelism.

## Example

```
A single linear layer: y = xW + b
Where W is [4096 × 4096]

With 2-way tensor parallelism:
  GPU 0 holds W[:, :2048]  →  computes y₀
  GPU 1 holds W[:, 2048:]  →  computes y₁
  y = concat(y₀, y₁)
```
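The split above can be sketched in NumPy. This is a single-process illustration with hypothetical shapes, not a real multi-GPU setup: each "GPU" is just a column shard of `W` (and of `b`), and the all-gather is simulated with a `concatenate`.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096
x = rng.standard_normal((1, d))   # input activation
W = rng.standard_normal((d, d))   # full weight matrix
b = rng.standard_normal(d)        # full bias

# "GPU 0" and "GPU 1" each hold half the columns of W and b.
W0, W1 = W[:, : d // 2], W[:, d // 2 :]
b0, b1 = b[: d // 2], b[d // 2 :]

y0 = x @ W0 + b0  # partial output computed on "GPU 0"
y1 = x @ W1 + b1  # partial output computed on "GPU 1"

# An all-gather across GPUs concatenates the partial outputs.
y = np.concatenate([y0, y1], axis=1)

# The sharded result matches the unsharded layer exactly.
assert np.allclose(y, x @ W + b)
```

Note that each GPU needs the full input `x` but only half the weights, which is where the memory saving comes from; splitting `W` by *rows* instead would require an all-reduce (sum) rather than a concatenation.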

## Key Points

- Requires **fast inter-GPU communication** (e.g., NVLink), since every layer's forward and backward pass triggers a collective operation; typically used within a single node
- Each GPU stores only a fraction of each layer's weights
- Common in large language model training and serving (e.g., Megatron-LM)

## See Also

- [Model Parallelism](/parallelism/model-parallelism)
- [Data Parallelism](/parallelism/data-parallelism)
