---
title: "Memory Bandwidth"
canonical: "https://www.thundercompute.com/glossary/memory/memory-bandwidth"
description: "The rate at which data can be read from or written to GPU memory"
sidebarTitle: "Memory Bandwidth"
icon: "gauge-high"
iconType: "solid"
---

**Memory bandwidth** is the rate at which data can be transferred between GPU memory (VRAM) and the GPU's compute cores, typically measured in GB/s. Because modern GPUs can perform arithmetic far faster than they can fetch operands from memory, bandwidth, not compute, is often the bottleneck in GPU workloads.

## Example

```python
import torch, time

x = torch.randn(100_000_000, device="cuda")  # 100M float32 values, ~400 MB

# Warm up so kernel-launch and allocation overhead is excluded from timing
_ = x * 2
torch.cuda.synchronize()

start = time.perf_counter()
y = x * 2  # memory-bandwidth-bound: one multiply per element loaded
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# The kernel reads x once and writes y once: 2 transfers per element
gb_moved = x.nelement() * x.element_size() * 2 / 1e9
print(f"Effective bandwidth: {gb_moved / elapsed:.0f} GB/s")
```
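A GPU's peak bandwidth follows directly from its memory specs: the per-pin data rate multiplied by the bus width. As a sketch (the helper name is our own, and the RTX 4090 figures of 21 Gbps GDDR6X on a 384-bit bus are the published specs):

```python
def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth: per-pin rate x bus width, bits -> bytes."""
    return data_rate_gbps * bus_width_bits / 8

# RTX 4090: 21 Gbps GDDR6X per pin on a 384-bit bus
print(peak_bandwidth_gbs(21.0, 384))  # 1008.0 GB/s
```

Measured effective bandwidth, as in the benchmark above, typically lands somewhat below this theoretical peak.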

## Bandwidth by GPU

| GPU | Memory Bandwidth |
|-----|-----------------|
| RTX 4090 (GDDR6X) | 1,008 GB/s |
| A100 80GB (HBM2e) | 2,039 GB/s |
| H100 SXM (HBM3) | 3,350 GB/s |
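These numbers matter in practice because memory-bound workloads scale with bandwidth. A rough back-of-envelope example: single-batch LLM decoding must stream every weight from VRAM for each generated token, so bandwidth divided by model size gives an upper bound on tokens per second. A minimal sketch (the helper is hypothetical, and it ignores KV-cache traffic, batching, and kernel overhead):

```python
def max_decode_tokens_per_s(bandwidth_gbs: float,
                            params_billion: float,
                            bytes_per_param: int = 2) -> float:
    """Upper bound on decode speed: each token reads all weights once."""
    weights_gb = params_billion * bytes_per_param  # model footprint in GB
    return bandwidth_gbs / weights_gb

# 7B-parameter model in fp16 (~14 GB) on an H100 SXM (3,350 GB/s)
print(f"{max_decode_tokens_per_s(3350, 7):.0f} tokens/s")  # ~239 tokens/s
```

Real throughput is lower, but this estimate explains why HBM-equipped datacenter GPUs decode so much faster than consumer cards with similar compute.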

## See Also

- [VRAM](/gpu-hardware/vram)
- [Global Memory](/memory/global-memory)
- [Throughput vs. Latency](/inference/throughput-vs-latency)
