
Throughput vs. Latency

Two key metrics for measuring inference performance

Throughput is how many requests a system can handle per unit of time. Latency is how long a single request takes to complete. These two metrics often trade off against each other.

Example

Scenario: Serving an LLM

Latency-optimized:
  - Batch size 1 → 50ms per request → 20 req/s

Throughput-optimized:
  - Batch size 32 → 200ms per request → 160 req/s
  - Each request is slower, but total work done is 8x higher
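The arithmetic behind the example can be sketched directly: throughput is batch size divided by per-batch latency. A minimal sketch, using the example's numbers (not measurements):

```python
def throughput_req_per_s(batch_size: int, batch_latency_ms: float) -> float:
    """Requests completed per second when a full batch finishes every batch_latency_ms."""
    return batch_size / (batch_latency_ms / 1000)

# Latency-optimized: one request per batch
print(throughput_req_per_s(1, 50))    # → 20.0 req/s

# Throughput-optimized: 32 requests per batch
print(throughput_req_per_s(32, 200))  # → 160.0 req/s
```

Each request in the large batch waits 200ms instead of 50ms, but the system completes 8x as many requests per second.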

Key Differences

Metric       Optimized For                        Typical Strategy
Latency      Real-time apps, chatbots             Small batch, fast hardware
Throughput   Batch processing, serving at scale   Large batch, parallelism
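The tradeoff follows from how batch latency typically scales: a fixed overhead per forward pass plus roughly constant per-request work. A toy cost model illustrates this (the `base_ms` and `per_req_ms` coefficients are illustrative assumptions, not benchmarks):

```python
def batch_latency_ms(batch_size: int, base_ms: float = 45.0, per_req_ms: float = 5.0) -> float:
    # Hypothetical linear cost model: fixed per-batch overhead plus per-request work.
    return base_ms + per_req_ms * batch_size

for b in (1, 8, 32):
    lat = batch_latency_ms(b)
    tput = b / (lat / 1000)
    print(f"batch={b:2d}  latency={lat:.0f}ms  throughput={tput:.0f} req/s")
```

Because the fixed overhead is amortized over more requests, throughput climbs with batch size while each individual request gets slower, which is exactly the tension the table summarizes.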
