Memory Bandwidth

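Memory bandwidth is the rate at which data can be transferred between GPU memory (VRAM) and the GPU's compute cores, usually quoted in GB/s. It is often the bottleneck for GPU workloads that do little arithmetic per byte moved, such as elementwise tensor operations.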
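
One rough way to judge whether an operation is limited by bandwidth is to compare its arithmetic intensity (FLOPs per byte moved) with the ratio of the hardware's peak compute to its peak bandwidth. The sketch below does this for an elementwise multiply on float32 data. The two hardware figures are hypothetical placeholders chosen for illustration, not the specification of any real GPU, and the function name is an assumption of this sketch.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte transferred to or from memory."""
    return flops / bytes_moved

n = 100_000_000            # number of float32 elements
flops = n                  # one multiply per element
bytes_moved = n * 4 * 2    # read each 4-byte input once, write each 4-byte output once

ai = arithmetic_intensity(flops, bytes_moved)

# Hypothetical hardware numbers, for illustration only
peak_bandwidth_gbps = 1000.0     # GB/s
peak_compute_gflops = 50_000.0   # GFLOP/s

# FLOPs per byte the hardware can sustain when both limits are hit at once
machine_balance = peak_compute_gflops / peak_bandwidth_gbps

is_bandwidth_bound = ai < machine_balance
print(f"Arithmetic intensity: {ai} FLOP/byte")
print(f"Machine balance: {machine_balance} FLOP/byte")
print(f"Bandwidth-bound: {is_bandwidth_bound}")
```

With an intensity this far below the machine balance, the compute cores spend most of their time waiting on memory, so faster arithmetic would not help.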

Example

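import time
import torch

x = torch.randn(100_000_000, device="cuda")  # 100M float32 values, ~400 MB

# Warm-up run so one-time CUDA initialization and allocation costs are not timed
y = x * 2
torch.cuda.synchronize()

start = time.perf_counter()
y = x * 2  # elementwise multiply: memory-bandwidth-bound
torch.cuda.synchronize()  # wait for the kernel to finish before stopping the clock
elapsed = time.perf_counter() - start

# Traffic: read x once and write y once; the scalar 2 is not read from VRAM
gb_moved = x.nelement() * x.element_size() * 2 / 1e9
print(f"Effective bandwidth: {gb_moved / elapsed:.0f} GB/s")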

Bandwidth by GPU

GPU	Peak Memory Bandwidth
RTX 4090 (GDDR6X)	1,008 GB/s
A100 80GB SXM (HBM2e)	2,039 GB/s
H100 SXM (HBM3)	3,350 GB/s
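
The peak figures above give a lower bound on how long any memory-bound operation can take. As a minimal sketch, the snippet below estimates the minimum time to read and write a 100-million-element float32 tensor at each listed bandwidth. The helper name `min_transfer_ms` and the dictionary are illustrative, not part of any library, and real kernels typically reach only a fraction of peak.

```python
# Peak bandwidth in GB/s, taken from the table above
PEAK_GBPS = {"RTX 4090": 1008, "A100": 2039, "H100": 3350}

def min_transfer_ms(num_bytes: int, peak_gbps: float) -> float:
    """Lower bound, in milliseconds, on moving num_bytes at peak bandwidth."""
    return num_bytes / (peak_gbps * 1e9) * 1e3

# Elementwise op on 100M float32 values: read the input once, write the output once
bytes_moved = 100_000_000 * 4 * 2

for gpu, bandwidth in PEAK_GBPS.items():
    print(f"{gpu}: at least {min_transfer_ms(bytes_moved, bandwidth):.3f} ms")
```

Dividing a measured effective bandwidth by the matching peak figure gives a quick utilization estimate for a kernel.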

See Also