
Memory Bandwidth

The rate at which data can be read from or written to GPU memory

Memory bandwidth is the rate at which data can be transferred between GPU memory (VRAM) and the GPU's compute cores. It is often the limiting factor in GPU workloads: many kernels, such as element-wise operations, move far more bytes than they compute on, so they can only run as fast as memory supplies the data.

Example

import torch, time

x = torch.randn(100_000_000, device="cuda")  # 100M float32 values, ~400 MB
torch.cuda.synchronize()

_ = x * 2  # warm-up launch so one-time CUDA startup cost is not timed
torch.cuda.synchronize()

start = time.time()
y = x * 2  # memory-bandwidth-bound: one multiply per 8 bytes moved
torch.cuda.synchronize()

elapsed = time.time() - start
gb_moved = x.nelement() * x.element_size() * 2 / 1e9  # read x, write y
print(f"Effective bandwidth: {gb_moved / elapsed:.0f} GB/s")

Bandwidth by GPU

GPU             Memory Bandwidth
RTX 4090        1,008 GB/s
A100 (HBM2e)    2,039 GB/s
H100 (HBM3)     3,350 GB/s
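
These peak figures follow directly from the memory interface: effective data rate per pin multiplied by bus width. As a sanity check, the sketch below reproduces the RTX 4090 number from its published specs (21 Gbps GDDR6X on a 384-bit bus); treat the constants as assumptions to swap out for other cards.

data_rate_gbps = 21     # effective GDDR6X transfer rate per pin, in Gbit/s
bus_width_bits = 384    # memory interface width in bits

peak_gb_per_s = data_rate_gbps * bus_width_bits / 8  # (Gbit/s per pin) * pins / 8 = GB/s
print(f"Theoretical peak: {peak_gb_per_s:.0f} GB/s")  # 1008 GB/s, matching the table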
