Batch Size
The number of training samples processed in one forward/backward pass
Batch size directly affects GPU memory usage, training throughput, and convergence behavior: each batch produces one gradient estimate and one optimizer update.
Example
```python
from torch.utils.data import DataLoader

# Batch size of 64: each iteration processes 64 samples
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

for batch in train_loader:
    inputs, labels = batch
    outputs = model(inputs.cuda())            # forward pass on 64 samples at once
    loss = criterion(outputs, labels.cuda())
    loss.backward()                           # backward pass accumulates gradients
    optimizer.step()                          # update weights
    optimizer.zero_grad()                     # clear gradients before the next batch
```
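Batch size also sets how many optimizer steps make up one epoch: with `drop_last=False` (the `DataLoader` default), the number of steps is the dataset size divided by the batch size, rounded up. A minimal sketch (the dataset size and batch sizes here are illustrative assumptions):

```python
import math

dataset_size = 50_000  # illustrative; e.g. the CIFAR-10 training set

for batch_size in (32, 64, 256):
    # DataLoader yields a final partial batch by default, hence the ceiling
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    print(f"batch_size={batch_size}: {steps_per_epoch} steps per epoch")
```

Doubling the batch size roughly halves the steps per epoch, which is why larger batches shorten wall-clock time per epoch when the GPU can keep up.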
Trade-offs
| Larger Batch | Smaller Batch |
|---|---|
| Better GPU utilization | Lower memory usage |
| Faster wall-clock time per epoch | Can generalize better |
| May require learning rate scaling | Noisier gradient estimates |
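The learning rate scaling mentioned in the table is commonly done with the linear scaling rule: when the batch size grows by a factor of k, scale the learning rate by k as well. A minimal sketch, where the base values are illustrative assumptions rather than recommendations:

```python
# Linear scaling rule: lr scales proportionally with batch size.
# base_lr and base_batch are hypothetical reference values.
BASE_LR = 0.1
BASE_BATCH = 256

def scaled_lr(batch_size, base_lr=BASE_LR, base_batch=BASE_BATCH):
    """Return a learning rate scaled linearly with batch size."""
    return base_lr * batch_size / base_batch

print(scaled_lr(512))   # twice the base lr
print(scaled_lr(1024))  # four times the base lr
```

In practice the rule is usually combined with a warmup period, and very large batches may need more careful tuning than a single multiplier.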