
Backpropagation

The algorithm for computing gradients through a neural network

Backpropagation (the backward pass) is the algorithm that computes the gradient of the loss function with respect to every weight in the network by applying the chain rule layer by layer, from the output back to the input.
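To make the chain rule concrete, here is a minimal scalar sketch (the values of x, t, and w are made up for illustration): the gradient that loss.backward() computes is checked against the same derivative written out by hand.

import torch

# Tiny computation: y = w * x, loss = (y - t)^2
# Chain rule: dloss/dw = dloss/dy * dy/dw = 2 * (y - t) * x
x = torch.tensor(3.0)
t = torch.tensor(1.0)
w = torch.tensor(2.0, requires_grad=True)

y = w * x                  # forward: y = 6.0
loss = (y - t) ** 2        # forward: loss = 25.0
loss.backward()            # backprop applies the chain rule for us

manual = 2 * (y.detach() - t) * x
print(w.grad, manual)      # both tensor(30.)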

Example

import torch
import torch.nn as nn

model = nn.Linear(784, 10).cuda()
x = torch.randn(32, 784, device="cuda")
target = torch.randint(0, 10, (32,), device="cuda")

output = model(x)                            # forward pass
loss = nn.CrossEntropyLoss()(output, target)  # compute loss
loss.backward()                               # backpropagation

print(model.weight.grad.shape)  # [10, 784] — gradient for each weight

Key Points

  • Follows the forward pass, which must store the activations that backpropagation reuses
  • Computes the gradients the optimizer reads to update the weights (see the sketch after this list)
  • Typically costs roughly 2x the compute (FLOPs) of the forward pass
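
How these pieces fit together in one full training step, as a minimal sketch (SGD and the learning rate 0.01 are arbitrary choices for illustration):

import torch
import torch.nn as nn

model = nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is arbitrary here
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)
target = torch.randint(0, 10, (32,))

optimizer.zero_grad()             # clear gradients left over from prior steps
loss = loss_fn(model(x), target)  # forward pass + loss
loss.backward()                   # backpropagation fills .grad on every parameter
optimizer.step()                  # optimizer reads .grad to update the weights

Note that backward() accumulates into .grad rather than overwriting it, which is why the loop above clears gradients before each step.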
