Kernel
A function that executes in parallel on the GPU
A kernel is a function written to run on the GPU. When launched, it executes across many threads simultaneously — each thread typically processes a different piece of data.
Example
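// Define a kernel: each thread squares one element of data
__global__ void square(float *data, int n) {
    // Global index of this thread across all blocks
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: threads whose index falls past the end of the array do nothing
    if (i < n) data[i] = data[i] * data[i];
}
// Launch 4 blocks of 256 threads each (1024 threads in total);
// data must point to memory the GPU can access
square<<<4, 256>>>(data, 1024);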
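The launch above passes data without showing where it comes from. The following is a minimal sketch of the host side, assuming the array starts in ordinary CPU memory and has to be copied to the GPU and back. The helper names check and squareOnGpu are hypothetical and introduced only for illustration; the cuda* calls are from the CUDA runtime API.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Same kernel as in the example: one thread per element.
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Hypothetical helper: turn a failed CUDA runtime call into an exception.
static void check(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical helper: square every element of `host` on the GPU.
// For brevity, device memory is not released if a call fails.
std::vector<float> squareOnGpu(std::vector<float> host) {
    const int n = static_cast<int>(host.size());
    if (n == 0) return host;
    const size_t bytes = n * sizeof(float);

    // Allocate device memory and copy the input to it.
    float *device = nullptr;
    check(cudaMalloc(&device, bytes));
    check(cudaMemcpy(device, host.data(), bytes, cudaMemcpyHostToDevice));

    // Round the block count up so every element gets a thread;
    // the `i < n` guard in the kernel handles the surplus threads.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    square<<<blocks, threads>>>(device, n);
    check(cudaGetLastError());       // reports launch failures
    check(cudaDeviceSynchronize());  // waits for the kernel and reports runtime failures

    // Copy the result back and release the device buffer.
    check(cudaMemcpy(host.data(), device, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(device));
    return host;
}
```

On a machine with a working CUDA device, calling squareOnGpu({1.0f, 2.0f, 3.0f}) should return 1, 4, 9. Computing the block count from n, rather than hard-coding 4, keeps the launch correct when the array length is not a multiple of the block size.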
Key Points
- Declared with __global__ in CUDA C++
- Launched with the <<<blocks, threads>>> syntax
- Each thread has a unique ID computed from blockIdx and threadIdx
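As a worked illustration of the last point, the index formula from the example kernel can be mirrored in ordinary host code. This is only a sketch: globalIndex and its parameter names are hypothetical, standing in for the built-in blockIdx.x, blockDim.x, and threadIdx.x values that the kernel reads.

```cuda
// Hypothetical host-side mirror of the kernel's index formula:
//   i = blockIdx.x * blockDim.x + threadIdx.x
int globalIndex(int block, int threadsPerBlock, int thread) {
    return block * threadsPerBlock + thread;
}
```

With 256 threads per block, thread 5 of block 2 gets globalIndex(2, 256, 5) = 2 * 256 + 5 = 517, and the last thread of the last block in a <<<4, 256>>> launch gets globalIndex(3, 256, 255) = 1023, so the 1024 threads cover indices 0 through 1023 exactly once.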