CUDA
NVIDIA's parallel computing platform and API
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model that lets developers run general-purpose code on NVIDIA GPUs.
Example
// CUDA C++ kernel — adds two arrays element-wise
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the last block may have extra threads
}

// Launch with 256 threads per block; (n + 255) / 256 rounds up so every
// element is covered. a, b, and c must point to device (GPU) memory.
add<<<(n + 255) / 256, 256>>>(a, b, c, n);
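Kernel launches are asynchronous and do not return a status directly, so a common follow-up (a sketch, not part of the original example) is to check for launch errors and synchronize before reading results:

```cuda
// Check that the launch itself succeeded (e.g. invalid configuration)
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess)
    fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

// Wait for the kernel to finish and surface any runtime errors
err = cudaDeviceSynchronize();
if (err != cudaSuccess)
    fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
```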
Key Concepts
- Host — the CPU and its memory
- Device — the GPU and its memory
- Kernel — a function that runs on the GPU
- CUDA Toolkit — compiler (nvcc), libraries, and profiling tools
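The host/device split above can be sketched end to end: allocate device buffers, copy inputs from host to device, launch the kernel, and copy the result back. This is a minimal sketch using the standard CUDA runtime API, with error checking omitted for brevity:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) memory
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device (GPU) memory
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Copy inputs host -> device
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch the kernel, then copy the result device -> host
    // (this cudaMemcpy also synchronizes with the kernel)
    add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);  // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Compile with `nvcc add.cu -o add`. Note the naming convention: `h_` prefixes host pointers and `d_` prefixes device pointers, which helps avoid passing a host pointer to a kernel by mistake.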