Batch Size

The number of training samples processed in one forward/backward pass

Batch size is the number of training samples that are processed together in a single forward and backward pass. It directly affects GPU memory usage, training speed, and model convergence.
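To make the relationship concrete, the batch size determines how many optimizer steps one epoch takes: ceil(dataset size / batch size). A quick sketch with hypothetical numbers:

```python
import math

num_samples = 50_000   # hypothetical dataset size
batch_size = 64
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 782 (the final batch holds only the remaining 16 samples)
```

Doubling the batch size roughly halves the number of steps per epoch, which is why larger batches tend to shorten wall-clock time per epoch when the GPU can keep up.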

Example

from torch.utils.data import DataLoader

# Assumes dataset, model, criterion, and optimizer are already defined
# Batch size of 64: each iteration processes 64 samples
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

for inputs, labels in train_loader:
    optimizer.zero_grad()
    outputs = model(inputs.cuda())  # forward pass on 64 samples at once
    loss = criterion(outputs, labels.cuda())
    loss.backward()  # backward pass computes gradients for the whole batch
    optimizer.step()
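When the batch size you want exceeds GPU memory, a common workaround is gradient accumulation: run several small forward/backward passes and step the optimizer once. Below is a minimal sketch; the toy `nn.Linear` model, batch sizes, and step counts are hypothetical, chosen only to keep the example self-contained.

```python
import torch
from torch import nn

# Hypothetical tiny model so the sketch runs on its own
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

accum_steps = 4  # effective batch size = 16 * 4 = 64
optimizer.zero_grad()
for step in range(8):
    inputs = torch.randn(16, 10)           # micro-batch of 16 samples
    labels = torch.randint(0, 2, (16,))
    loss = criterion(model(inputs), labels) / accum_steps  # average over micro-batches
    loss.backward()                        # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                   # one update per 4 micro-batches
        optimizer.zero_grad()
```

This trades extra compute time for memory: activations only ever exist for one micro-batch at a time.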

Trade-offs

| Larger Batch | Smaller Batch |
| --- | --- |
| Better GPU utilization | Lower memory usage |
| Faster wall-clock time per epoch | Can generalize better |
| May need learning rate scaling | Noisier gradients |
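The learning-rate scaling mentioned above usually refers to the linear scaling rule: if you grow the batch size by a factor of k, multiply the learning rate by k as well. A sketch with hypothetical values:

```python
base_lr = 0.1      # hypothetical learning rate tuned at the base batch size
base_batch = 256   # batch size the base_lr was tuned for
batch_size = 1024  # new, larger batch size

# Linear scaling rule: lr grows proportionally with batch size
scaled_lr = base_lr * (batch_size / base_batch)
print(scaled_lr)  # 0.4
```

In practice the rule is a starting point, not a guarantee; very large batches often also need a warmup schedule.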

See Also