Stochastic gradient descent with momentum maintains an exponentially weighted average of past gradients, called the momentum (or velocity) term, and uses it in place of the raw gradient to update the model's parameters at each iteration. Because this averaging damps oscillations across noisy mini-batch gradients, the optimizer holds a more stable descent direction and typically converges faster.
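To make the update concrete, here is a minimal sketch in Python/NumPy, assuming the exponentially-weighted-average form of the update: v ← βv + (1 − β)g followed by θ ← θ − ηv. The function name `sgd_momentum`, the `grad_fn` callback, and the hyperparameter values are illustrative assumptions, not details from the text.

```python
import numpy as np

def sgd_momentum(grad_fn, theta, lr=0.1, beta=0.9, num_steps=200):
    """Minimal SGD-with-momentum loop (illustrative sketch).

    grad_fn: hypothetical callback returning the (stochastic) gradient at theta.
    beta: controls the exponential weighting of past gradients.
    """
    v = np.zeros_like(theta)  # momentum term: an EMA of past gradients
    for _ in range(num_steps):
        g = grad_fn(theta)
        v = beta * v + (1.0 - beta) * g  # exponentially weighted average of gradients
        theta = theta - lr * v           # step along the smoothed direction
    return theta

# Usage: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta0 = np.array([5.0, -3.0])
theta_min = sgd_momentum(lambda t: t, theta0)
print(theta_min)  # approaches [0, 0]
```

Note that some formulations accumulate the raw gradient instead, v ← βv + g; that variant behaves the same up to a rescaling of the effective learning rate.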