Thread, Block, Grid
The hierarchical execution model of CUDA
CUDA organizes parallel execution into a three-level hierarchy: threads, blocks, and grids.
Hierarchy
| Level | Description |
|---|---|
| Thread | A single unit of execution |
| Block | A group of threads that can share memory and synchronize |
| Grid | A collection of blocks that together execute a kernel |
Example
// Launch a grid of 8 blocks, each with 128 threads = 1024 total threads
myKernel<<<8, 128>>>(data);
// Inside the kernel, compute a global thread ID:
int tid = blockIdx.x * blockDim.x + threadIdx.x;
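The two lines above can be placed in a complete program. The following is a minimal sketch, not a definitive implementation: the kernel name writeGlobalIds and the host helper collectGlobalIds are hypothetical names chosen for illustration. It assumes a one-dimensional grid and an element count that need not be a multiple of the block size, which is why the kernel guards against out-of-range indices.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its own global index.
__global__ void writeGlobalIds(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {  // the last block may contain threads past the end of the array
        out[tid] = tid;
    }
}

// Hypothetical host helper: launches enough blocks to cover n elements and
// returns what the threads wrote. Returns an empty vector on any CUDA error.
std::vector<int> collectGlobalIds(int n, int threadsPerBlock)
{
    assert(n > 0 && threadsPerBlock > 0);

    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) {
        return {};
    }

    // Round up so that blocks * threadsPerBlock >= n.
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    writeGlobalIds<<<blocks, threadsPerBlock>>>(dev, n);

    std::vector<int> host(n, -1);
    cudaError_t err = cudaGetLastError();  // reports launch-configuration errors
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel on the default stream to finish.
        err = cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return {};
    }
    return host;
}
```

Calling collectGlobalIds(1000, 128) launches 8 blocks of 128 threads (1024 threads in total). The 24 surplus threads fail the bounds check and do nothing.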
Key Points
- Threads within a block can use shared memory and call __syncthreads()
- Blocks are independent: they can execute in any order
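- The maximum number of threads per block is 1024 on current GPUs (compute capability 2.0 and later); the maxThreadsPerBlock device property reports the exact limit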
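The points above can be illustrated with a per-block sum. This is a sketch under stated assumptions rather than a tuned reduction: blockSum, sumOnDevice, and kThreadsPerBlock are hypothetical names, the block size is assumed to be a power of two, and the input values are assumed small enough that a block's partial sum fits in an int. Each block cooperates through shared memory and __syncthreads(), and writes only its own output slot, so the result does not depend on the order in which blocks run.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size: a power of two, well under the 1024-thread limit.
constexpr int kThreadsPerBlock = 128;

// Hypothetical kernel: each block sums its slice of `in` into partial[blockIdx.x].
__global__ void blockSum(const int *in, int *partial, int n)
{
    __shared__ int cache[kThreadsPerBlock];  // visible to this block's threads only

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (tid < n) ? in[tid] : 0;
    __syncthreads();  // all loads must finish before any thread reads a neighbour

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        }
        __syncthreads();  // reached by every thread, not only the active ones
    }

    if (threadIdx.x == 0) {
        partial[blockIdx.x] = cache[0];  // one slot per block, no cross-block sharing
    }
}

// Hypothetical host helper: returns false on any CUDA error.
bool sumOnDevice(const std::vector<int> &values, long long &total)
{
    total = 0;
    int n = static_cast<int>(values.size());
    if (n == 0) return true;

    // Check the device limit instead of assuming 1024.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    if (kThreadsPerBlock > prop.maxThreadsPerBlock) return false;

    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
    int *dIn = nullptr;
    int *dPartial = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dPartial, blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }

    std::vector<int> partial(blocks);
    cudaError_t err = cudaMemcpy(dIn, values.data(), n * sizeof(int),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        blockSum<<<blocks, kThreadsPerBlock>>>(dIn, dPartial, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(partial.data(), dPartial, blocks * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dPartial);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }

    // The per-block results are combined on the host.
    for (int p : partial) total += p;
    return true;
}
```

Combining the partial sums on the host keeps the kernel free of any communication between blocks. This matches the independence rule: __syncthreads() is a barrier for the threads of one block only, not for the whole grid.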