In plain gradient descent, each parameter update is based solely on the current gradient, which can lead to undesirable oscillations during optimization. As in control tasks, where a filter is often needed to smooth a noisy signal, a momentum mechanism is introduced to damp these oscillations. Momentum acts as a filter by replacing the raw gradient with a moving average of past gradients in each update.
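The filtering view of momentum can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes the exponential-moving-average form of momentum, and the names `gd_momentum`, `lr`, and `beta` are hypothetical choices made for this example.

```python
def gd_momentum(grad, x0, lr=0.1, beta=0.9, steps=500):
    """Gradient descent where the step uses a moving average of gradients.

    Assumption: momentum is written as an exponential moving average,
    v <- beta * v + (1 - beta) * g. Other formulations omit the
    (1 - beta) factor and fold it into the learning rate.
    """
    x = x0
    v = 0.0  # filtered gradient, starts at zero
    for _ in range(steps):
        g = grad(x)                      # current (raw) gradient
        v = beta * v + (1.0 - beta) * g  # low-pass filter over past gradients
        x = x - lr * v                   # step along the filtered gradient
    return x


# Usage: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gd_momentum(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Setting `beta = 0` recovers plain gradient descent, since the update then depends only on the current gradient. Larger values of `beta` weight past gradients more heavily and so filter more strongly.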
A widely used control algorithm in industrial applications is Proportional-Integral-Derivative (PID) control, which acts on the error between a target value and the measured (sensor) input. As its name implies, the PID algorithm combines three terms: one proportional to the current error, one proportional to its integral over time, and one proportional to its derivative. The coefficients (gains) of these three terms are tuned to achieve the desired response, reducing fluctuations and ensuring smoother adjustments.
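To make the three terms concrete, here is a minimal discrete-time PID sketch in Python. It is an illustration under stated assumptions, not a production controller: the class name `PID`, the gain names `kp`, `ki`, `kd`, and the fixed time step `dt` are hypothetical, and practical details such as integral anti-windup and derivative filtering are left out.

```python
class PID:
    """Textbook discrete PID: u = kp * e + ki * integral(e) + kd * de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0     # running sum of error * dt
        self.prev_error = None  # no previous error before the first call

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt  # integral term accumulates past error
        if self.prev_error is None:
            derivative = 0.0         # assumption: no derivative kick on first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


# Usage: two consecutive control steps toward a setpoint of 1.0.
pid = PID(kp=2.0, ki=0.5, kd=0.1)
u1 = pid.update(setpoint=1.0, measurement=0.0, dt=0.1)
u2 = pid.update(setpoint=1.0, measurement=0.5, dt=0.1)
```

The integral term plays a role loosely comparable to momentum's accumulation of past gradients, while the derivative term reacts to how quickly the error is changing.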