Warp
A group of 32 threads that execute in lockstep on a GPU
A warp is a group of 32 threads that execute the same instruction simultaneously on a GPU's streaming multiprocessor. Warps are the fundamental unit of scheduling on NVIDIA GPUs.
Example
// If you launch 256 threads per block, that's 256 / 32 = 8 warps per block.
// All 32 threads in a warp execute the same instruction at the same time.
myKernel<<<grid, 256>>>(data);
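The arithmetic in the comment above only comes out even when the block size is a multiple of 32. As a minimal host-side sketch (the helper names `warpsPerBlock` and `idleLanes` are hypothetical, not part of the CUDA API), the warp count can be computed with ceiling division, assuming a warp size of 32:

```cpp
#include <cassert>

// Assumed warp size for this sketch; in device code the built-in
// warpSize variable holds the actual value.
constexpr int kWarpSize = 32;

// Hypothetical helper: number of warps needed for one block.
// Ceiling division, because a partially filled warp still counts as a warp.
constexpr int warpsPerBlock(int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Hypothetical helper: lanes in the block's last warp that have no thread.
constexpr int idleLanes(int threadsPerBlock) {
    return warpsPerBlock(threadsPerBlock) * kWarpSize - threadsPerBlock;
}

// Checked at compile time.
static_assert(warpsPerBlock(256) == 8, "256 threads fill exactly 8 warps");
static_assert(warpsPerBlock(100) == 4, "100 threads still need 4 warps");
static_assert(idleLanes(100) == 28, "the 4th warp has only 4 active threads");
```

Under these assumptions, a block of 100 threads occupies as many warps as a block of 128, which is one reason block sizes are usually chosen as multiples of 32.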
Key Points
- Warp size is 32 on all NVIDIA GPUs to date; device code can read it from the built-in warpSize variable
- If threads in a warp take different branches (if/else), both paths execute sequentially; this is called warp divergence
- Keeping threads in a warp doing the same work maximizes performance
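To make the divergence point concrete, here is a rough host-side model rather than a GPU measurement. It assumes a one-dimensional block in which threads 0 to 31 form warp 0, threads 32 to 63 form warp 1, and so on. The helper `divergentWarps` and the two predicates are hypothetical names introduced for illustration; each predicate stands in for the condition of an if/else inside a kernel.

```cpp
#include <cassert>

// Assumed warp size for this sketch.
constexpr int kWarpSize = 32;

// Hypothetical helper: counts the warps in a block whose threads disagree
// on the branch predicate, i.e. the warps that would run both paths.
template <typename Pred>
int divergentWarps(int threadsPerBlock, Pred takesBranch) {
    assert(threadsPerBlock > 0);
    int divergent = 0;
    for (int warpStart = 0; warpStart < threadsPerBlock; warpStart += kWarpSize) {
        const bool first = takesBranch(warpStart);
        for (int tid = warpStart + 1;
             tid < warpStart + kWarpSize && tid < threadsPerBlock; ++tid) {
            if (takesBranch(tid) != first) {
                ++divergent;  // at least two lanes disagree in this warp
                break;
            }
        }
    }
    return divergent;
}

// Branch on thread parity: even and odd threads share every warp.
inline bool byParity(int tid) { return tid % 2 == 0; }

// Branch on warp index: all threads of a warp take the same path.
inline bool byWarp(int tid) { return (tid / kWarpSize) % 2 == 0; }
```

With a block of 256 threads, `divergentWarps(256, byParity)` returns 8 (every warp diverges), while `divergentWarps(256, byWarp)` returns 0. Both predicates send half of the threads down each path; only the second keeps each warp on a single path.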