
Warp

A group of 32 threads that execute in lockstep on a GPU

A warp is a group of 32 threads that execute the same instruction simultaneously on a GPU's streaming multiprocessor. Warps are the fundamental unit of scheduling on NVIDIA GPUs.

Example

// If you launch 256 threads per block, that's 256 / 32 = 8 warps per block.
// All 32 threads in a warp execute the same instruction at the same time.
int threadsPerBlock = 256;                                 // 8 warps per block
int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // n = element count of data, assumed defined
myKernel<<<blocks, threadsPerBlock>>>(data);
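
As a further illustration, here is a minimal sketch (the kernel name and output buffer are hypothetical) of how a thread can compute its warp and lane index within a block using CUDA's built-in warpSize variable, which is 32 on NVIDIA GPUs:

// Hypothetical kernel: records each thread's warp index and lane index.
__global__ void warpInfo(int *out)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    int lane = threadIdx.x % warpSize;                // position within the warp (0-31)
    int warp = threadIdx.x / warpSize;                // warp index within the block
    out[tid] = warp * 100 + lane;                     // pack both values for inspection
}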

Key Points

  • Warp size is always 32 on NVIDIA GPUs
  • If threads in a warp take different branches (if/else), both paths execute sequentially — this is called warp divergence (see the sketch after this list)
  • Keeping threads in a warp doing the same work maximizes performance
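
As a sketch of warp divergence (the kernel name and data pointer are hypothetical), the code below branches on the lane index, so within a single warp some threads take the if path and others the else path; the warp executes both paths one after the other, masking off the threads that are not on the active path:

// Hypothetical kernel: the condition splits every warp, forcing divergence.
__global__ void divergentKernel(float *data)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Lanes 0-15 of each warp take one branch, lanes 16-31 take the other.
    if (threadIdx.x % 32 < 16) {
        data[tid] *= 2.0f;
    } else {
        data[tid] += 1.0f;
    }
}

// By contrast, a condition that is uniform across each warp (for example,
// branching on blockIdx.x) lets all 32 threads follow the same path and
// avoids the serialized execution.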
