# What is: Gradient Clipping?

Year | 2000 |

Data Source | CC BY-SA - https://paperswithcode.com |

One difficulty that arises with optimization of deep neural networks is that large parameter gradients can lead an SGD optimizer to update the parameters strongly into a region where the loss function is much greater, effectively undoing much of the work that was needed to get to the current solution.

**Gradient Clipping** clips the size of the gradients to ensure optimization performs more reasonably near sharp areas of the loss surface. It can be performed in a number of ways. One option is to simply clip the parameter gradient element-wise before a parameter update. Another option is to clip the norm ||$\textbf{g}$|| of the gradient $\textbf{g}$ before a parameter update:

$\text{ if } ||\textbf{g}|| > v \text{ then } \textbf{g} \leftarrow \frac{\textbf{g}{v}}{||\textbf{g}||}$

where $v$ is a norm threshold.

Source: Deep Learning, Goodfellow et al

Image Source: Pascanu et al