# Thunder Compute

> A glossary of GPU computing terms, CUDA programming concepts, and machine learning terminology

## GPU Hardware

- [GPU](https://thunder-compute-glossary.jamdesk.app/gpu-hardware/gpu.md): Graphics Processing Unit — a processor designed for parallel computation
- [What is VRAM?](https://thunder-compute-glossary.jamdesk.app/gpu-hardware/vram.md): Short definition of VRAM and why GPU memory capacity matters for real workloads.
- [What are CUDA Cores?](https://thunder-compute-glossary.jamdesk.app/gpu-hardware/cuda-cores.md): Learn the basics regarding CUDA cores, plus a simple example of how they enable parallel work on NVIDIA GPUs.
- [What are Tensor Cores?](https://thunder-compute-glossary.jamdesk.app/gpu-hardware/tensor-cores.md): Specialized GPU cores for matrix multiply-accumulate operations

## CUDA Programming

- [CUDA](https://thunder-compute-glossary.jamdesk.app/cuda-programming/cuda.md): NVIDIA's parallel computing platform and API
- [Kernel](https://thunder-compute-glossary.jamdesk.app/cuda-programming/kernel.md): A function that executes in parallel on the GPU
- [Thread, Block, Grid](https://thunder-compute-glossary.jamdesk.app/cuda-programming/thread-block-grid.md): The hierarchical execution model of CUDA
- [Warp](https://thunder-compute-glossary.jamdesk.app/cuda-programming/warp.md): A group of 32 threads that execute in lockstep on a GPU

## Memory

- [Global Memory](https://thunder-compute-glossary.jamdesk.app/memory/global-memory.md): The main, large memory space on a GPU accessible by all threads
- [Shared Memory](https://thunder-compute-glossary.jamdesk.app/memory/shared-memory.md): Fast, on-chip memory shared among threads in a block
- [Memory Bandwidth](https://thunder-compute-glossary.jamdesk.app/memory/memory-bandwidth.md): The rate at which data can be read from or written to GPU memory

## Training

- [Forward Pass](https://thunder-compute-glossary.jamdesk.app/training/forward-pass.md): Computing the output of a neural network from input to prediction
- [Backpropagation](https://thunder-compute-glossary.jamdesk.app/training/backpropagation.md): The algorithm for computing gradients through a neural network
- [Batch Size](https://thunder-compute-glossary.jamdesk.app/training/batch-size.md): The number of training samples processed in one forward/backward pass
- [Epoch](https://thunder-compute-glossary.jamdesk.app/training/epoch.md): One complete pass through the entire training dataset

## Inference

- [Inference](https://thunder-compute-glossary.jamdesk.app/inference/inference.md): Using a trained model to make predictions on new data
- [Quantization](https://thunder-compute-glossary.jamdesk.app/inference/quantization.md): Reducing model precision to use less memory and compute faster
- [Throughput vs. Latency](https://thunder-compute-glossary.jamdesk.app/inference/throughput-vs-latency.md): Two key metrics for measuring inference performance

## Parallelism

- [Data Parallelism](https://thunder-compute-glossary.jamdesk.app/parallelism/data-parallelism.md): Distributing training data across multiple GPUs, each with a copy of the model
- [Model Parallelism](https://thunder-compute-glossary.jamdesk.app/parallelism/model-parallelism.md): Splitting a model across multiple GPUs when it's too large for one
- [Tensor Parallelism](https://thunder-compute-glossary.jamdesk.app/parallelism/tensor-parallelism.md): Splitting individual layers across multiple GPUs