Thread, Block, Grid

CUDA organizes parallel execution into a three-level hierarchy: threads, blocks, and grids.

Hierarchy

Level	Description
Thread	A single unit of execution
Block	A group of threads that can share memory and synchronize
Grid	A collection of blocks that together execute a kernel

Example

// Launch a grid of 8 blocks, each with 128 threads = 1024 total threads
myKernel<<<8, 128>>>(data);

// Inside the kernel, compute a global thread ID:
int tid = blockIdx.x * blockDim.x + threadIdx.x;
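
The launch and the index computation above can be combined into a compilable unit. The following is a minimal sketch, not a definitive implementation: the kernel name writeGlobalIds, the host helper launchAndCollect, and the bounds check against n are assumptions added for illustration and do not come from the original example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread stores its own global ID in out[tid].
__global__ void writeGlobalIds(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard for launches where blocks * threadsPerBlock exceeds n.
    if (tid < n) {
        out[tid] = tid;
    }
}

// Hypothetical host helper: launches the kernel and copies the IDs back.
// Returns an empty vector if any CUDA call fails.
std::vector<int> launchAndCollect(int blocks, int threadsPerBlock)
{
    int n = blocks * threadsPerBlock;
    std::vector<int> host(n, -1);
    int *dev = nullptr;

    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return {};

    // Grid of `blocks` blocks, each with `threadsPerBlock` threads.
    writeGlobalIds<<<blocks, threadsPerBlock>>>(dev, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel on the default stream to finish.
        err = cudaMemcpy(host.data(), dev, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}
```

Calling launchAndCollect(8, 128) mirrors the launch above: thread 1 of block 1 computes 1 * 128 + 1 = 129, and the last thread of the last block computes 7 * 128 + 127 = 1023.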

Key Points

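Threads within a block can communicate through shared memory and synchronize with __syncthreads(), a barrier that every thread in the block must reach
Blocks are independent: they can execute in any order, so a kernel should not depend on one block finishing before another
Maximum threads per block is 1024 on GPUs of compute capability 2.0 and later; the actual limit can be queried through the maxThreadsPerBlock field of cudaDeviceProp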
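
The first two points can be illustrated with a per-block sum. This is a sketch under stated assumptions rather than a reference implementation: the names blockSum, sumOnesPerBlock, and kThreads are hypothetical, the block size is fixed at 128 (a power of two, which this simple tree reduction requires), and the input length is assumed to be an exact multiple of the block size.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 128;  // assumed block size; must be a power of two here

// Hypothetical kernel: each block sums its kThreads inputs into
// blockSums[blockIdx.x]. No block reads another block's result.
__global__ void blockSum(const int *in, int *blockSums)
{
    __shared__ int cache[kThreads];  // shared by the threads of this block only

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = in[tid];
    __syncthreads();  // all loads finish before any thread reads cache

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        }
        __syncthreads();  // placed outside the if so every thread reaches it
    }

    if (threadIdx.x == 0) {
        blockSums[blockIdx.x] = cache[0];
    }
}

// Hypothetical host helper: fills the input with ones and returns one sum
// per block. Returns an empty vector if any CUDA call fails.
std::vector<int> sumOnesPerBlock(int blocks)
{
    int n = blocks * kThreads;
    std::vector<int> input(n, 1), sums(blocks, -1);
    int *dIn = nullptr, *dSums = nullptr;

    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return {};
    if (cudaMalloc(&dSums, blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(dIn);
        return {};
    }

    cudaError_t err = cudaMemcpy(dIn, input.data(), n * sizeof(int),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        blockSum<<<blocks, kThreads>>>(dIn, dSums);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(sums.data(), dSums, blocks * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }

    cudaFree(dIn);
    cudaFree(dSums);
    if (err != cudaSuccess) return {};
    return sums;
}
```

Because each block writes only its own entry of blockSums, the result is the same whatever order the blocks run in. Combining the per-block sums into one total would need a second kernel launch or a host-side loop, since __syncthreads() does not synchronize across blocks.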

