Warp

A warp is a group of 32 threads that a GPU's streaming multiprocessor (SM) executes together, issuing one instruction at a time for the whole group. The warp is the fundamental unit of scheduling on NVIDIA GPUs.

Example

// Launching 256 threads per block gives 256 / 32 = 8 warps per block.
// A block size that is not a multiple of 32 leaves the last warp partly idle.
// grid (the block count) and data (a device pointer) are assumed to be defined elsewhere.
myKernel<<<grid, 256>>>(data);
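
The warp count in the example, and each thread's position inside its warp, can be computed directly. The following is a minimal sketch that assumes a one-dimensional block. The names warpsPerBlock and recordWarpAndLane are hypothetical and not part of the CUDA API; warpSize, threadIdx, blockIdx, and blockDim are CUDA built-ins.

```cuda
#include <cuda_runtime.h>

// Host-side helper (hypothetical name): number of warps a block occupies.
// Rounds up, because a partially filled warp still takes a whole warp slot.
// The constant 32 is an assumption that holds on NVIDIA GPUs to date.
inline int warpsPerBlock(int threadsPerBlock) {
    const int kWarpSize = 32;
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Each thread records its warp index within the block and its lane
// (position 0..warpSize-1) within that warp. Assumes a 1D block, where
// consecutive threadIdx.x values are grouped into the same warp.
__global__ void recordWarpAndLane(int *warpIds, int *laneIds, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        warpIds[i] = threadIdx.x / warpSize;  // which warp of the block
        laneIds[i] = threadIdx.x % warpSize;  // which lane of that warp
    }
}
```

With 256 threads per block, warpsPerBlock(256) returns 8, matching the arithmetic in the launch example; a block of 100 threads would occupy 4 warps, the last one only partly filled.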

Key Points

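Warp size is 32 on every NVIDIA GPU released to date; device code can read it from the built-in warpSize variable
If threads in a warp take different branches (if/else), the warp executes both paths one after the other, with the threads not on the current path masked off; this is called warp divergence
Keeping all threads of a warp on the same control-flow path avoids this serialization and keeps every lane doing useful work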
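
The divergence point can be illustrated with two kernels that differ only in their branch condition. This is a sketch under stated assumptions, not a benchmark: the kernel names are hypothetical, the block is assumed to be one-dimensional, and for branches this small the compiler may emit predicated instructions rather than a real branch, so a measurable difference is not guaranteed. The two kernels also update different elements, so they show the branch pattern only and are not interchangeable.

```cuda
#include <cuda_runtime.h>

// Divergent pattern: even and odd lanes of the same warp take different
// branches, so each warp has to run both paths one after the other.
__global__ void divergentBranch(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0) {
        data[i] = data[i] * 2.0f;
    } else {
        data[i] = data[i] + 1.0f;
    }
}

// Warp-uniform pattern: the condition depends only on the warp index,
// so all threads of a given warp take the same branch.
__global__ void warpUniformBranch(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((threadIdx.x / warpSize) % 2 == 0) {
        data[i] = data[i] * 2.0f;
    } else {
        data[i] = data[i] + 1.0f;
    }
}
```

In the second kernel, whole warps alternate between the two paths, so no single warp needs to execute both.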
