---
title: "Kernel"
canonical: "https://www.thundercompute.com/glossary/cuda-programming/kernel"
description: "A function that executes in parallel on the GPU"
sidebarTitle: "Kernel"
icon: "bolt"
iconType: "solid"
---

A **kernel** is a function that runs on the GPU and is launched from host (CPU) code. When launched, it executes across many threads in parallel, and each thread typically processes a different piece of data.

## Example

```cpp
// Define a kernel
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Launch 4 blocks of 256 threads each (1024 threads total).
// `data` must point to device memory holding at least 1024 floats.
square<<<4, 256>>>(data, 1024);
```
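The launch above works out evenly because 1024 elements divide into 4 blocks of 256 threads. When `n` is not a multiple of the block size, a common pattern is to round the block count up and rely on the `if (i < n)` guard to idle the extra threads. The sketch below shows that rounding as plain host-side C++; the helper name `blocks_for` is hypothetical and not part of CUDA.

```cpp
// Host-side helper (hypothetical name, not a CUDA API): the number of blocks
// needed so that blocks * threads_per_block >= n. Integer ceiling division.
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Intended use with the kernel above (shown as a comment, not compiled here):
//   int threads = 256;
//   square<<<blocks_for(n, threads), threads>>>(data, n);
```

For `n = 1000` and 256 threads per block this gives 4 blocks, so 1024 threads are launched and the last 24 skip the write because their index fails the `i < n` check.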

## Key Points

- Declared with `__global__` in CUDA C++ and must return `void`
- Launched from host code with the `<<<blocks, threadsPerBlock>>>` syntax
- Each thread computes a unique global index from `blockIdx`, `blockDim`, and `threadIdx`
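To make the index arithmetic concrete, here is a CPU-only sketch that imitates a one-dimensional launch of the `square` kernel from the example. It is an illustration under stated assumptions, not how the GPU schedules work: the function name `emulate_square_launch` is hypothetical, and the two loops run in order where real threads run in parallel.

```cpp
#include <vector>

// CPU-only emulation of `square<<<blocks, threads_per_block>>>(data, n)`.
// The outer loop stands in for blockIdx.x and the inner loop for threadIdx.x;
// threads_per_block plays the role of blockDim.x.
void emulate_square_launch(std::vector<float>& data, int blocks, int threads_per_block) {
    int n = static_cast<int>(data.size());
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            // Same formula as the kernel: one distinct index per (block, thread) pair.
            int i = block * threads_per_block + thread;
            // Same bounds guard as the kernel: surplus threads do nothing.
            if (i < n) data[i] = data[i] * data[i];
        }
    }
}
```

Calling it on a five-element vector with 2 blocks of 4 threads visits indices 0 through 7; each of the five elements is squared exactly once, and the three out-of-range indices are skipped by the guard.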

## See Also

- [CUDA](/cuda-programming/cuda)
- [Thread, Block, Grid](/cuda-programming/thread-block-grid)
- [Warp](/cuda-programming/warp)