Thunder Compute logo

Tensor Cores

Specialized GPU cores for matrix multiply-accumulate operations

Tensor Cores are specialized processing units on NVIDIA GPUs (Volta and later) that accelerate matrix multiply-accumulate operations — the core computation in deep learning.

Example

import torch

# Tensor Cores are used automatically with mixed precision
with torch.autocast(device_type="cuda", dtype=torch.float16):
    x = torch.randn(512, 512, device="cuda")
    w = torch.randn(512, 512, device="cuda")
    y = x @ w  # matrix multiply — runs on Tensor Cores

Key Facts

  • Perform 4x4 matrix multiplications in a single clock cycle
  • Require specific data types: FP16, BF16, TF32, INT8
  • Dramatically accelerate training and inference when used with mixed precision

See Also