Throughput vs. Latency
Two key metrics for measuring inference performance
Throughput is the number of requests a system completes per unit of time. Latency is how long a single request takes from submission to completion. The two often trade off against each other: the batching that raises throughput also makes each individual request wait longer.
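In practice both numbers fall out of the same timing loop. A minimal sketch of measuring them, assuming a synchronous `run_batch` callable that stands in for whatever executes one batch against the model (the function names and defaults here are illustrative, not from any particular serving library):

```python
import time

def measure(run_batch, batch_size: int, n_batches: int = 10):
    # `run_batch` is a placeholder: it should execute one batch of
    # `batch_size` requests against the model and block until done.
    start = time.perf_counter()
    for _ in range(n_batches):
        run_batch(batch_size)
    elapsed = time.perf_counter() - start
    latency_s = elapsed / n_batches                  # time per batch; per request when batch_size == 1
    throughput = (batch_size * n_batches) / elapsed  # requests completed per second
    return latency_s, throughput
```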
Example
Scenario: Serving an LLM
Latency-optimized:
- Batch size 1 → 50ms per request → 20 req/s
Throughput-optimized:
- Batch size 32 → 200ms per request → 160 req/s
- Each request takes 4x longer (200ms vs. 50ms), but total throughput is 8x higher (see the sketch below)
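The arithmetic behind those figures is just requests per batch divided by time per batch. A quick check (the helper name is ours, for illustration only):

```python
def req_per_s(batch_size: int, batch_latency_s: float) -> float:
    # Throughput = requests completed per batch / time per batch.
    return batch_size / batch_latency_s

print(req_per_s(1, 0.050))   # 20.0  -> latency-optimized
print(req_per_s(32, 0.200))  # 160.0 -> 8x the throughput, at 4x the latency
```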
Key Differences
| Metric | Optimized For | Typical Strategy |
|---|---|---|
| Latency | Real-time apps, chatbots | Small batch, fast hardware |
| Throughput | Batch processing, serving at scale | Large batch, parallelism |
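As a concrete illustration of the throughput row, here is a minimal sketch of dynamic batching, a common way servers assemble large batches out of independent incoming requests. Everything in it is assumed for illustration: `run_model` stands in for the real inference call, and the two constants are not tuned values.

```python
import queue
import time

MAX_BATCH = 32      # cap on requests per forward pass
MAX_WAIT_S = 0.010  # how long to hold a batch open for stragglers

requests: "queue.Queue[str]" = queue.Queue()

def run_model(batch: list[str]) -> None:
    time.sleep(0.2)  # pretend one forward pass takes 200ms regardless of batch size

def serve_forever() -> None:
    while True:
        batch = [requests.get()]  # block until at least one request arrives
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        # Larger batches raise aggregate throughput, but every request in
        # the batch waits for the whole batch: the latency cost of batching.
        run_model(batch)
```

The two constants are exactly the latency/throughput dial: raising `MAX_BATCH` or `MAX_WAIT_S` grows batches and throughput at the cost of per-request wait, while shrinking them approaches the batch-size-1, latency-optimized configuration.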