
What is: Attention Dropout?

Data source: CC BY-SA

Attention Dropout is a type of dropout used in attention-based architectures, in which elements of the softmax output (the attention weights) are randomly set to zero. For example, in scaled dot-product attention, elements are dropped from the first factor, the softmax term, before it is multiplied by V:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
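
To make the equation concrete, here is a minimal NumPy sketch of one way attention dropout could be applied. The function name `attention_with_dropout` and the parameters `drop_prob`, `rng`, and `training` are illustrative and do not come from the source. The sketch also assumes "inverted" dropout, where surviving weights are rescaled by 1 / (1 - drop_prob) during training; the definition above does not specify a rescaling scheme.

```python
import numpy as np

def attention_with_dropout(q, k, v, drop_prob=0.1, rng=None, training=True):
    """Scaled dot-product attention with dropout on the softmax output.

    Illustrative sketch only; requires 0 <= drop_prob < 1.
    """
    d_k = q.shape[-1]
    # Scaled similarity scores: Q K^T / sqrt(d_k)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    if training and drop_prob > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        # Keep each attention weight with probability 1 - drop_prob
        keep = rng.random(weights.shape) >= drop_prob
        # Assumption: inverted-dropout rescaling of the surviving weights
        weights = weights * keep / (1.0 - drop_prob)
    # Weighted sum of the values
    return weights @ v, weights
```

After dropout, the rows of the weight matrix generally no longer sum to 1, since some entries are zeroed and the rest are rescaled. With `training=False` the function reduces to ordinary scaled dot-product attention.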