Thunder Compute logo

Kernel

A function that executes in parallel on the GPU

A kernel is a function written to run on the GPU. When launched, it executes across many threads simultaneously — each thread typically processes a different piece of data.

Example

// Define a kernel
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Launch the kernel with 1024 threads
square<<<4, 256>>>(data, 1024);

Key Points

  • Declared with __global__ in CUDA C++
  • Launched with the <<<blocks, threads>>> syntax
  • Each thread has a unique ID computed from blockIdx and threadIdx

See Also