Tensor Cores
Specialized GPU cores for matrix multiply-accumulate operations
Tensor Cores are specialized processing units on NVIDIA GPUs (Volta and later) that accelerate matrix multiply-accumulate operations — the core computation in deep learning.
Example
import torch

# Tensor Cores are used automatically with mixed precision
with torch.autocast(device_type="cuda", dtype=torch.float16):
    x = torch.randn(512, 512, device="cuda")
    w = torch.randn(512, 512, device="cuda")
    y = x @ w  # matrix multiply runs in FP16 on Tensor Cores
Key Facts
- Each Tensor Core performs a small matrix multiply-accumulate per clock cycle (4x4 matrices on Volta; later architectures support larger tiles)
- Require reduced-precision data types such as FP16, BF16, TF32, and INT8 (supported types vary by GPU architecture; see the TF32 sketch after this list)
- Dramatically accelerate training and inference when used with mixed precision
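TF32 lets ordinary FP32 matrix multiplies run on Tensor Cores on Ampere and newer GPUs. The snippet below is a minimal sketch of enabling this in PyTorch via torch.set_float32_matmul_precision, an API available in recent PyTorch releases; the tensor shapes are illustrative.

import torch

# Allow float32 matmuls to use TF32 Tensor Cores ("high" enables TF32,
# "highest" keeps full FP32 precision; small accuracy trade-off for speed)
torch.set_float32_matmul_precision("high")

a = torch.randn(1024, 1024, device="cuda")  # plain float32 tensors
b = torch.randn(1024, 1024, device="cuda")
c = a @ b  # on Ampere or newer GPUs, this matmul can run on TF32 Tensor Cores

The older flag torch.backends.cuda.matmul.allow_tf32 = True has the same effect for matrix multiplies.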