What is: Dueling Network?

Source: Dueling Network Architectures for Deep Reinforcement Learning
Year: 2015
Data Source: CC BY-SA - https://paperswithcode.com

A Dueling Network is a type of Q-Network with two streams that separately estimate the (scalar) state value and the advantage of each action. Both streams share a common convolutional feature learning module. A special aggregating layer combines the two streams to produce an estimate of the state-action value function Q, as shown in the figure to the right.
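The two-stream layout can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: a single dense ReLU layer stands in for the convolutional feature module, and the helper names (`linear`, `relu`, `random_layer`, `two_streams`), the layer sizes, and the random initialization are all hypothetical rather than taken from the paper.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute weights @ x + bias for a small dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, xi) for xi in x]


def random_layer(rng: random.Random, n_out: int, n_in: int) -> Layer:
    """Hypothetical initializer: small random weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def two_streams(state: Vector, shared: Layer, value_head: Layer,
                advantage_head: Layer) -> Tuple[float, Vector]:
    """Return (V(s), [A(s, a) for each action]) computed from shared features."""
    # Assumption: a dense layer replaces the convolutional feature module.
    features = relu(linear(shared[0], shared[1], state))
    # Value stream: a single scalar output.
    value = linear(value_head[0], value_head[1], features)[0]
    # Advantage stream: one output per action.
    advantages = linear(advantage_head[0], advantage_head[1], features)
    return value, advantages


rng = random.Random(0)
n_state, n_hidden, n_actions = 4, 8, 3
shared = random_layer(rng, n_hidden, n_state)
value_head = random_layer(rng, 1, n_hidden)
advantage_head = random_layer(rng, n_actions, n_hidden)

value, advantages = two_streams([0.5, -1.0, 0.25, 2.0], shared, value_head, advantage_head)
print(value, advantages)
```

Both heads read the same `features` vector, so the value and advantage streams share the feature module; only the last layers differ in output size (one scalar versus one entry per action).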

The last module uses the following mapping:

$$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left(A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s, a'; \theta, \alpha)\right)$$

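This formulation is chosen for identifiability: given only Q, the value and advantage cannot be recovered uniquely, because adding a constant to V and subtracting it from every A leaves Q unchanged. Subtracting the maximum advantage would fix this by forcing the advantage of the chosen (greedy) action to zero. The mapping used here subtracts the average advantage instead of the maximum, which increases the stability of the optimization.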
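A small numeric sketch may make the identifiability point concrete. The function names `aggregate_mean` and `aggregate_max` and the input numbers are illustrative assumptions. The mean variant follows the mapping given here, and the max variant is the alternative this paragraph alludes to.

```python
from typing import List


def aggregate_mean(value: float, advantages: List[float]) -> List[float]:
    """Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (a - mean_adv) for a in advantages]


def aggregate_max(value: float, advantages: List[float]) -> List[float]:
    """Max variant: Q(s, a) = V(s) + (A(s, a) - max over a' of A(s, a'))."""
    max_adv = max(advantages)
    return [value + (a - max_adv) for a in advantages]


# Hypothetical stream outputs for one state with three actions.
value = 2.0
advantages = [1.0, 0.0, -1.0]
shifted = [a + 5.0 for a in advantages]  # same advantages plus a constant

q_mean = aggregate_mean(value, advantages)
q_max = aggregate_max(value, advantages)

print(q_mean)  # [3.0, 2.0, 1.0]
print(q_max)   # [2.0, 1.0, 0.0]
print(aggregate_mean(value, shifted) == q_mean)  # True
```

Adding a constant to every advantage leaves the Q-values unchanged, so the aggregation removes the ambiguity between the two streams. With the mean variant the average of the Q-values equals V(s). With the max variant the largest Q-value equals V(s), so the greedy action has zero advantage.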