What is: Deformable Convolutional Networks?
Source | Deformable Convolutional Networks |
Year | 2017 |
Data Source | CC BY-SA - https://paperswithcode.com |
Deformable ConvNets do not learn an affine transformation. Instead, they view convolution as two steps: first sampling features on a regular grid from the input feature map $X$, then aggregating the sampled features by a weighted summation with the convolution kernel $w$. For an output position $p_{0}$, a standard 3×3 convolution can be written as:
\begin{align} Y(p_{0}) &= \sum_{p_i \in \mathcal{R}} w(p_{i}) X(p_{0} + p_{i}) \end{align}
\begin{align} \mathcal{R} &= \{(-1,-1), (-1, 0), \dots, (1, 1)\} \end{align}
Deformable convolution augments the sampling step with a group of learnable offsets $\Delta p_{i}$, one per grid position $p_{i}$, which can be generated by a lightweight CNN. Using the offsets $\Delta p_{i}$, the deformable convolution can be formulated as:
\begin{align} Y(p_{0}) &= \sum_{p_i \in \mathcal{R}} w(p_{i}) X(p_{0} + p_{i} + \Delta p_{i}) \end{align}
In this way, adaptive sampling is achieved. However, $\Delta p_{i}$ is a floating point value, so the sampling position $p_{0} + p_{i} + \Delta p_{i}$ generally does not fall on the integer grid. To address this problem, the value of $X$ at that position is computed by bilinear interpolation. Deformable RoI pooling, which applies learned offsets to RoI pooling in the same way, is also used and improves object detection.
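The deformable convolution formula above can be illustrated with a minimal sketch in plain Python. It is not the reference implementation: it assumes a single-channel feature map, treats positions outside the map as zero, and uses hand-picked offsets rather than offsets predicted by a CNN. The names `GRID`, `bilinear_sample`, and `deformable_conv_at` are hypothetical and chosen only for this illustration.

```python
import math

# Regular 3x3 sampling grid R = {(-1,-1), (-1,0), ..., (1,1)} as (row, col) displacements.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear_sample(x, row, col):
    """Sample feature map x at a fractional (row, col).

    Assumption for this sketch: positions outside the map contribute 0.
    """
    height, width = len(x), len(x[0])
    r0, c0 = math.floor(row), math.floor(col)
    value = 0.0
    for r in (r0, r0 + 1):
        for c in (c0, c0 + 1):
            if 0 <= r < height and 0 <= c < width:
                # Weight decreases linearly with distance along each axis.
                weight = (1.0 - abs(row - r)) * (1.0 - abs(col - c))
                value += weight * x[r][c]
    return value


def deformable_conv_at(x, weights, offsets, p0):
    """Compute Y(p0) = sum_i w(p_i) * X(p0 + p_i + delta_p_i)."""
    total = 0.0
    for w, (dy, dx), (oy, ox) in zip(weights, GRID, offsets):
        total += w * bilinear_sample(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return total


# Toy 4x4 feature map with X[r][c] = 4*r + c.
X = [[float(4 * r + c) for c in range(4)] for r in range(4)]
W = [1.0 / 9.0] * 9  # averaging kernel, chosen only for illustration

zero_offsets = [(0.0, 0.0)] * 9      # reduces to the standard convolution
shifted_offsets = [(0.0, 0.5)] * 9   # hand-picked: every sample moves half a pixel right

y_regular = deformable_conv_at(X, W, zero_offsets, (1, 1))
y_shifted = deformable_conv_at(X, W, shifted_offsets, (1, 1))
print(y_regular, y_shifted)
```

With all offsets set to zero, the sketch reduces to the standard convolution. In a trained network, each output position would receive its own nine offsets, so the sampling pattern can differ from one location to the next.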
Deformable ConvNets adaptively select the important regions and enlarge the valid receptive field of convolutional neural networks; this is important in object detection and semantic segmentation tasks.