Backpropagation
The algorithm for computing gradients through a neural network
Backpropagation (backward pass) is the algorithm that computes the gradient of the loss function with respect to each weight in the network, by applying the chain rule layer by layer from output back to input.
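To make the chain rule concrete before the full example below, here is a minimal sketch for a single scalar weight, with the hand-derived gradient checked against autograd (values are illustrative):
import torch
x, t = 3.0, 1.0
w = torch.tensor(2.0, requires_grad=True)
y = w * x                    # forward: y = w * x
loss = (y - t) ** 2          # loss = (y - t)^2
# Chain rule by hand: dloss/dw = dloss/dy * dy/dw = 2 * (y - t) * x
manual_grad = 2 * (y.item() - t) * x    # 2 * (6 - 1) * 3 = 30
loss.backward()                          # backpropagation computes the same value
print(w.grad.item(), manual_grad)        # 30.0 30.0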
Example
import torch
import torch.nn as nn
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU if no GPU
model = nn.Linear(784, 10).to(device)
x = torch.randn(32, 784, device=device)
target = torch.randint(0, 10, (32,), device=device)
output = model(x) # forward pass
loss = nn.CrossEntropyLoss()(output, target) # compute loss
loss.backward() # backpropagation
print(model.weight.grad.shape) # [10, 784] — gradient for each weight
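The gradients left in each parameter's .grad are then consumed by an optimizer. A minimal sketch of one full training step, continuing the example above (SGD and the learning rate are chosen for illustration):
import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.1)   # illustrative optimizer and learning rate
optimizer.zero_grad()                                # clear gradients from any previous step
output = model(x)                                    # forward pass
loss = nn.CrossEntropyLoss()(output, target)         # compute loss
loss.backward()                                      # backpropagation fills .grad on each parameter
optimizer.step()                                     # update weights using the stored gradients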
Key Points
- Runs after the forward pass and requires the activations stored during it (see the checkpointing sketch after this list)
- Computes gradients used by the optimizer to update weights
- Typically uses ~2x the compute of the forward pass
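Because the backward pass depends on stored activations, memory can be traded for extra compute by recomputing them on the fly. A minimal sketch using torch.utils.checkpoint (layer sizes are illustrative):
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint
block = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.randn(32, 784, requires_grad=True)
# Activations inside `block` are not stored; they are recomputed during backward
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
print(x.grad.shape)  # [32, 784]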