Memory Bandwidth
The rate at which data can be read from or written to GPU memory
Memory bandwidth is the rate at which data can be transferred between GPU memory (VRAM) and the GPU's compute cores. It is often the bottleneck in GPU workloads.
Example
import torch, time
x = torch.randn(100_000_000, device="cuda") # ~400 MB
torch.cuda.synchronize()
start = time.time()
y = x * 2 # memory-bandwidth-bound operation
torch.cuda.synchronize()
elapsed = time.time() - start
gb_moved = x.nelement() * x.element_size() * 3 / 1e9 # read x, read 2, write y
print(f"Effective bandwidth: {gb_moved / elapsed:.0f} GB/s")
Bandwidth by GPU
| GPU | Memory Bandwidth |
|---|---|
| RTX 4090 | 1,008 GB/s |
| A100 (HBM2e) | 2,039 GB/s |
| H100 (HBM3) | 3,350 GB/s |