---
title: "Warp"
canonical: "https://www.thundercompute.com/glossary/cuda-programming/warp"
description: "A group of 32 threads that execute in lockstep on a GPU"
sidebarTitle: "Warp"
icon: "arrows-split-up-and-left"
iconType: "solid"
---

A **warp** is a group of 32 threads that execute the same instruction in lockstep on a GPU's streaming multiprocessor (SM). Warps are the fundamental unit of scheduling on NVIDIA GPUs: the SM issues instructions per warp, not per individual thread.

## Example

```cpp
// If you launch 256 threads per block, that's 256 / 32 = 8 warps per block.
// All 32 threads in a warp execute the same instruction at the same time.
myKernel<<<grid, 256>>>(data);
```

## Key Points

- **Warp size is 32** on all current NVIDIA GPUs; CUDA device code can read it via the built-in `warpSize` variable
- If threads in a warp take different branches (`if/else`), the paths execute serially, with threads masked off on the path they didn't take — this is called **warp divergence**
- Keeping all threads in a warp on the same execution path maximizes throughput

## See Also

- [Thread, Block, Grid](/cuda-programming/thread-block-grid)
- [CUDA Cores](/gpu-hardware/cuda-cores)
