---
title: "What are Tensor Cores?"
canonical: "https://www.thundercompute.com/glossary/gpu-hardware/tensor-cores"
description: "Specialized GPU cores for matrix multiply-accumulate operations"
sidebarTitle: "Tensor Cores"
icon: "cube"
iconType: "solid"
---


**Tensor Cores** are specialized processing units on NVIDIA GPUs (Volta and later) that accelerate matrix multiply-accumulate operations — the core computation in deep learning.


## Example


```python
import torch

# Tensor Cores are used automatically under autocast (mixed precision):
# the inputs are created in FP32, but the matmul below is cast to FP16
# and dispatched to Tensor Core kernels on supported GPUs.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    x = torch.randn(512, 512, device="cuda")
    w = torch.randn(512, 512, device="cuda")
    y = x @ w  # matrix multiply, runs on Tensor Cores
```


## An Overview of Tensor Cores


- Perform small **matrix multiply-accumulate** operations (e.g., a 4x4x4 FP16 MMA per clock on Volta) in dedicated hardware
- Require specific data types: FP16, BF16, TF32, INT8 (plus FP8 on Hopper/Ada Lovelace and FP4 on Blackwell)
- Dramatically accelerate training and inference when used with mixed precision
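Two practical consequences follow from these points. On Ampere and later GPUs, PyTorch can route ordinary FP32 matmuls through Tensor Cores via TF32, and matrix dimensions that are multiples of 8 (for FP16) map cleanly onto Tensor Core tiles. A minimal sketch, assuming PyTorch; the `pad_to_multiple` helper is illustrative, not a library function:

```python
import torch

# On Ampere+ GPUs, these flags let FP32 matmuls and convolutions use
# TF32 on Tensor Cores without any other code changes.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

def pad_to_multiple(n: int, multiple: int = 8) -> int:
    """Round n up to the nearest multiple (8 suits FP16 Tensor Core tiles)."""
    return ((n + multiple - 1) // multiple) * multiple

# A 500-wide layer would be padded to 504 for better Tensor Core utilization.
print(pad_to_multiple(500))  # 504
```

Padding layer sizes this way is a common optimization because partially filled Tensor Core tiles waste throughput.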


## Tensor Core Generations


NVIDIA has iterated on Tensor Core technology across several architecture generations, with each generation adding lower-precision formats and higher matrix throughput.


- **Blackwell (5th Gen):** Featured in the **RTX PRO 6000**, delivering up to 4,000 AI TOPS and introducing support for FP4 precision to maximize throughput for massive LLMs.
- **Hopper (4th Gen):** Introduced the Transformer Engine in the **H100**, specifically designed to dynamically scale precision for Transformer-based models using FP8.
- **Ada Lovelace (4th Gen):** Found in the **RTX 6000** and **RTX 4090**, these cores include an enhanced 8-bit floating point (FP8) engine to double throughput over the previous generation.
- **Ampere (3rd Gen):** Found in the **A100**, **RTX A6000**, and **RTX 3090**, this generation introduced **TF32** (Tensor Float 32), providing speedups on FP32 workloads without requiring code changes.

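The generation available at runtime can be inferred from the GPU's CUDA compute capability. A rough mapping sketch, assuming PyTorch; the `TENSOR_CORE_GEN` table is illustrative and simplified (e.g., major version 8 covers both Ampere and Ada Lovelace):

```python
import torch

# Approximate mapping from compute-capability major version to
# Tensor Core generation (illustrative; not an NVIDIA API).
TENSOR_CORE_GEN = {
    7: "Volta/Turing (1st/2nd Gen)",
    8: "Ampere/Ada Lovelace (3rd/4th Gen)",
    9: "Hopper (4th Gen)",
    10: "Blackwell (5th Gen)",
    12: "Blackwell (5th Gen)",  # consumer Blackwell parts
}

def tensor_core_generation(major: int) -> str:
    """Return the Tensor Core generation for a compute-capability major version."""
    return TENSOR_CORE_GEN.get(major, "No Tensor Cores (pre-Volta)")

if torch.cuda.is_available():
    major, _minor = torch.cuda.get_device_capability()
    print(tensor_core_generation(major))
```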

## NVIDIA GPU Tensor Core Comparison


| Graphics Card | Architecture | Tensor Core Gen | AI TOPS | CUDA Cores | FP32 TFLOPS |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **RTX PRO 6000** | NVIDIA Blackwell | 5th Gen | 4,000 | 24,064 | 125.0 |
| **RTX 6000** | NVIDIA Ada Lovelace | 4th Gen | 1,457 | 18,176 | 91.1 |
| **RTX A6000** | NVIDIA Ampere | 3rd Gen | 309.7 | 10,752 | 38.7 |
| **A100 80GB** | NVIDIA Ampere | 3rd Gen | 624 | 6,912 | 19.5 |
| **H100 PCIe** | NVIDIA Hopper | 4th Gen | 1,513 | 14,592 | 51.2 |
| **H200 NVL** | NVIDIA Hopper | 4th Gen | 3,341 | 16,896 | 60.3 |
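The throughput gap between the FP32 and AI TOPS columns can be observed directly by timing the same matmul in FP32 and FP16. A rough benchmarking sketch, assuming PyTorch and a CUDA-capable GPU (timings and speedups will vary by card):

```python
import time
import torch

def time_matmul(dtype: torch.dtype, n: int = 4096, iters: int = 20) -> float:
    """Average seconds per n x n matmul in the given dtype."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()  # wait for allocation/warmup work
    start = time.perf_counter()
    for _ in range(iters):
        _ = a @ b
    torch.cuda.synchronize()  # wait for all queued matmuls to finish
    return (time.perf_counter() - start) / iters

if torch.cuda.is_available():
    fp32 = time_matmul(torch.float32)
    fp16 = time_matmul(torch.float16)  # dispatched to Tensor Core kernels
    print(f"FP32: {fp32 * 1e3:.2f} ms/matmul, FP16: {fp16 * 1e3:.2f} ms/matmul")
```

The explicit `torch.cuda.synchronize()` calls matter: CUDA kernels launch asynchronously, so timing without them would measure only launch overhead.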


<Card title="Recommended article" icon="book" href="https://www.thundercompute.com/blog/best-gpu-for-ai-guide" cta="Read more">
Read about how architecture choices translate to real-world training.
</Card>


## See Also


- [CUDA Cores](/gpu-hardware/cuda-cores)
- [Quantization](/inference/quantization)



