Batch Size
The number of training samples processed in one forward/backward pass
Batch size directly affects GPU memory usage, training throughput, and convergence behavior: each batch produces one gradient estimate and one optimizer update.
Example
```python
from torch.utils.data import DataLoader

# Batch size of 64: each iteration processes 64 samples
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

for batch in train_loader:
    inputs, labels = batch
    outputs = model(inputs.cuda())            # forward pass on 64 samples at once
    loss = criterion(outputs, labels.cuda())
    loss.backward()                           # backward pass accumulates gradients
    optimizer.step()                          # update weights
    optimizer.zero_grad()                     # clear gradients before the next batch
```
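Batch size also sets how many optimizer steps make up one epoch: with `drop_last=False` (the `DataLoader` default), the number of steps is the dataset size divided by the batch size, rounded up. A minimal sketch (the dataset size and batch sizes here are illustrative assumptions):

```python
import math

dataset_size = 50_000  # illustrative; e.g. the CIFAR-10 training set

for batch_size in (32, 64, 256):
    # DataLoader yields a final partial batch by default, hence the ceiling
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    print(f"batch_size={batch_size}: {steps_per_epoch} steps per epoch")
```

Doubling the batch size roughly halves the steps per epoch, which is why larger batches shorten wall-clock time per epoch when the GPU can keep up.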
Trade-offs
| Larger Batch | Smaller Batch |
|---|---|
| Better GPU utilization | Lower memory usage |
| Faster wall-clock time per epoch | Can generalize better |
| May require learning rate scaling | Noisier gradient estimates |
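The learning rate scaling mentioned in the table is commonly done with the linear scaling rule: when the batch size grows by a factor of k, scale the learning rate by k as well. A minimal sketch, where the base values are illustrative assumptions rather than recommendations:

```python
# Linear scaling rule: lr scales proportionally with batch size.
# base_lr and base_batch are hypothetical reference values.
BASE_LR = 0.1
BASE_BATCH = 256

def scaled_lr(batch_size, base_lr=BASE_LR, base_batch=BASE_BATCH):
    """Return a learning rate scaled linearly with batch size."""
    return base_lr * batch_size / base_batch

print(scaled_lr(512))   # twice the base lr
print(scaled_lr(1024))  # four times the base lr
```

In practice the rule is usually combined with a warmup period, and very large batches may need more careful tuning than a single multiplier.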