
Thread, Block, Grid

The hierarchical execution model of CUDA

CUDA organizes parallel execution into a three-level hierarchy: threads, blocks, and grids.

Hierarchy

Level    Description
Thread   A single unit of execution
Block    A group of threads that can share memory and synchronize
Grid     A collection of blocks that together execute a kernel
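To make the three levels concrete, the sketch below launches a small grid and has every thread record the block it belongs to and its index within that block. The kernel `recordIndices`, the host helper `runHierarchyDemo`, and the sizes used in the usage note are hypothetical choices for illustration; error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes its block index and its thread index (within the block)
// into the slot given by its position in the whole grid.
__global__ void recordIndices(int* blockOf, int* threadOf) {
    int slot = blockIdx.x * blockDim.x + threadIdx.x;
    blockOf[slot] = blockIdx.x;    // which block this thread belongs to
    threadOf[slot] = threadIdx.x;  // position of this thread inside its block
}

// Hypothetical host helper: launches numBlocks blocks of threadsPerBlock
// threads each and copies the recorded indices back to the host.
void runHierarchyDemo(int numBlocks, int threadsPerBlock,
                      std::vector<int>& blockOf, std::vector<int>& threadOf) {
    int total = numBlocks * threadsPerBlock;
    assert(total > 0);

    int *dBlock = nullptr, *dThread = nullptr;
    cudaMalloc(&dBlock, total * sizeof(int));
    cudaMalloc(&dThread, total * sizeof(int));

    recordIndices<<<numBlocks, threadsPerBlock>>>(dBlock, dThread);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    blockOf.resize(total);
    threadOf.resize(total);
    cudaMemcpy(blockOf.data(), dBlock, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(threadOf.data(), dThread, total * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dBlock);
    cudaFree(dThread);
}
```

Calling runHierarchyDemo(2, 4, b, t) from a host main should leave b = {0, 0, 0, 0, 1, 1, 1, 1} and t = {0, 1, 2, 3, 0, 1, 2, 3}: one grid, two blocks, four threads per block.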

Example

// Launch a grid of 8 blocks, each with 128 threads = 1024 total threads
myKernel<<<8, 128>>>(data);

// Inside the kernel, compute a global thread ID:
int tid = blockIdx.x * blockDim.x + threadIdx.x;
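A self-contained version of the launch above might look like the following sketch. What `myKernel` computes is not specified, so the body here (doubling each element) and the extra `n` parameter are assumptions, as is the hypothetical host helper `doubleOnGpu`. The rounded-up block count and the `if (tid < n)` guard are a common pattern for array sizes that are not a multiple of the block size; error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel body: each thread doubles one element of data.
__global__ void myKernel(float* data, int n) {
    // Global thread ID: offset of this block plus position within the block.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {  // skip threads that fall past the end of the array
        data[tid] *= 2.0f;
    }
}

// Hypothetical host helper: doubles every element of values on the GPU.
void doubleOnGpu(std::vector<float>& values) {
    int n = static_cast<int>(values.size());
    assert(n > 0);
    const int threadsPerBlock = 128;
    // Round up so that numBlocks * threadsPerBlock >= n.
    const int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    myKernel<<<numBlocks, threadsPerBlock>>>(d, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(values.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```

For n = 1024 this computes numBlocks = 8, reproducing the myKernel<<<8, 128>>> launch. For n = 1000 it still launches 8 blocks (1024 threads), and the guard leaves the last 24 threads idle.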

Key Points

  • Threads within a block can use shared memory and call __syncthreads()
  • Blocks are independent: they may run in any order, concurrently or one after another, so a kernel must not rely on the order in which blocks execute
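  • The maximum number of threads per block is 1024 on GPUs of compute capability 2.0 and later; cudaDeviceProp::maxThreadsPerBlock reports the limit for a given device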
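The key points can be seen together in a block-level sum, sketched below under a few assumptions: the names `blockSum` and `sumOnGpu` are hypothetical, the block size of 128 must be a power of two for this particular tree reduction, and error checking is omitted. Threads in a block cooperate through `__shared__` memory and `__syncthreads()`; each block writes one independent partial sum, so block order does not matter; and the host checks the block size against `maxThreadsPerBlock` rather than assuming 1024.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreadsPerBlock = 128;  // must be a power of two for this reduction

// Each block sums its own slice of `in` in shared memory and writes one
// partial sum. Blocks never communicate, so their execution order does not matter.
__global__ void blockSum(const int* in, int* blockSums, int n) {
    __shared__ int partial[kThreadsPerBlock];  // one copy per block, shared by its threads
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (tid < n) ? in[tid] : 0;
    __syncthreads();  // wait until every thread in the block has stored its value

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();  // finish this step before any thread starts the next
    }
    if (threadIdx.x == 0) {
        blockSums[blockIdx.x] = partial[0];
    }
}

// Hypothetical host helper: returns the sum of values.
long long sumOnGpu(const std::vector<int>& values) {
    int n = static_cast<int>(values.size());
    assert(n > 0);

    // Check the block size against this device's limit instead of assuming 1024.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    assert(kThreadsPerBlock <= prop.maxThreadsPerBlock);

    int numBlocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
    int *dIn = nullptr, *dSums = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dSums, numBlocks * sizeof(int));
    cudaMemcpy(dIn, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<numBlocks, kThreadsPerBlock>>>(dIn, dSums, n);

    std::vector<int> sums(numBlocks);
    cudaMemcpy(sums.data(), dSums, numBlocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);

    // Combine the independent per-block results on the host.
    long long total = 0;
    for (int s : sums) total += s;
    return total;
}
```

The final combination happens on the host because threads in different blocks cannot synchronize with `__syncthreads()`; that barrier only covers the threads of one block.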
