deform_conv2d

paddle.vision.ops.deform_conv2d(x: Tensor, offset: Tensor, weight: Tensor, bias: Tensor | None = None, stride: Size2 = 1, padding: Size2 = 0, dilation: Size2 = 1, deformable_groups: int = 1, groups: int = 1, mask: Tensor | None = None, name: str | None = None) → Tensor

Computes 2-D deformable convolution on a 4-D input. Given an input image x and an output feature map y, the deformable convolution operation can be expressed as follows:

Deformable Convolution v2:

\[y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k) * \Delta m_k}\]

Deformable Convolution v1:

\[y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k)}\]

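where \(K\) is the number of sampling locations of the kernel (for example, \(K = 9\) for a 3x3 kernel), \(w_k\) and \(p_k\) are the weight and the pre-specified offset of the k-th location in the regular sampling grid, and \(\Delta p_k\) and \(\Delta m_k\) are the learnable offset and modulation scalar for the k-th location. In deformable convolution v1, \(\Delta m_k\) is fixed to 1. For details, refer to Deformable ConvNets v2: More Deformable, Better Results and Deformable Convolutional Networks.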
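To make the formula concrete, here is a minimal pure-Python sketch that evaluates \(y(p)\) for a single input channel at a single output location. It is not Paddle's implementation. It assumes the fractional position \(p + p_k + \Delta p_k\) is sampled with bilinear interpolation, that taps falling outside the image read as zero, and it indexes \(p_k\) from the top-left corner of the sampling window rather than its centre. The helper names `bilinear` and `deform_conv_at` are hypothetical. Passing `masks=None` reproduces v1 (\(\Delta m_k = 1\)).

```python
import math


def bilinear(img, y, x):
    """Sample the 2-D list `img` at fractional (y, x); taps outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight of this integer neighbour.
                wgt = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += wgt * img[yy][xx]
    return val


def deform_conv_at(img, weight, p, offsets, masks=None, dilation=1):
    """Evaluate y(p) = sum_k w_k * x(p + p_k + dp_k) * dm_k for one channel.

    img:     H x W nested list (one input channel)
    weight:  kh x kw nested list (w_k in row-major order)
    p:       (row, col) of the top-left tap of the sampling window
    offsets: list of (dy, dx) pairs, one per tap k (dp_k)
    masks:   list of scalars dm_k, or None for v1 (dm_k = 1)
    """
    kh, kw = len(weight), len(weight[0])
    total = 0.0
    for i in range(kh):
        for j in range(kw):
            k = i * kw + j
            dy, dx = offsets[k]
            m = 1.0 if masks is None else masks[k]
            # Regular grid position p + p_k, shifted by the learned offset dp_k.
            y = p[0] + i * dilation + dy
            x = p[1] + j * dilation + dx
            total += weight[i][j] * bilinear(img, y, x) * m
    return total


# Usage: zero offsets and no mask reduce to an ordinary 3x3 correlation.
img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
weight = [[1.0] * 3 for _ in range(3)]
zero = [(0.0, 0.0)] * 9
print(deform_conv_at(img, weight, (0, 0), zero))  # 45.0
```

A half-pixel horizontal offset on every tap averages each pair of neighbouring columns, and a mask of 0.5 on every tap simply halves the result, which is the modulation that v2 adds on top of v1.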

Shapes

Input:

x shape: \((N, C_{in}, H_{in}, W_{in})\)

weight shape: \((C_{out}, C_{in} / \text{groups}, H_f, W_f)\)

offset shape: \((N, 2 * \text{deformable\_groups} * H_f * W_f, H_{out}, W_{out})\)

mask shape: \((N, \text{deformable\_groups} * H_f * W_f, H_{out}, W_{out})\)

Output:

Output shape: \((N, C_{out}, H_{out}, W_{out})\)

Where

\[\begin{split}H_{out}&= \left\lfloor \frac{H_{in} + 2 * padding[0] - (dilation[0] * (H_f - 1) + 1)}{stride[0]} \right\rfloor + 1 \\ W_{out}&= \left\lfloor \frac{W_{in} + 2 * padding[1] - (dilation[1] * (W_f - 1) + 1)}{stride[1]} \right\rfloor + 1\end{split}\]

Parameters

x (Tensor) – The input image in [N, C, H, W] format. Data type: float32 or float64.
offset (Tensor) – The coordinate offsets of the deformable convolution layer, with shape [N, 2 * deformable_groups * kH * kW, H_out, W_out]. Data type: float32 or float64.
weight (Tensor) – The convolution kernel with shape [M, C/g, kH, kW], where M is the number of output channels, g is the number of groups, kH is the filter’s height, kW is the filter’s width.
bias (Tensor, optional) – The bias with shape [M,]. Default: None.
stride (int|list|tuple, optional) – The stride size. If stride is a list/tuple, it must contain two integers, (stride_H, stride_W). Otherwise, stride_H = stride_W = stride. Default: 1.
padding (int|list|tuple, optional) – The padding size. If padding is a list/tuple, it must contain two integers, (padding_H, padding_W). Otherwise, padding_H = padding_W = padding. Default: 0.
dilation (int|list|tuple, optional) – The dilation size. If dilation is a list/tuple, it must contain two integers, (dilation_H, dilation_W). Otherwise, dilation_H = dilation_W = dilation. Default: 1.
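deformable_groups (int, optional) – The number of deformable group partitions. The input channels are split into deformable_groups groups, and each group uses its own set of offsets (and masks). Default: 1.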
groups (int, optional) – The number of groups of the deformable conv layer. As in the grouped convolution of Alex Krizhevsky’s deep CNN paper, when groups=2 the first half of the filters is connected only to the first half of the input channels, and the second half of the filters only to the second half of the input channels. Default: 1.
mask (Tensor, optional) – The modulation mask of the deformable convolution layer, with shape [N, deformable_groups * kH * kW, H_out, W_out]. Data type: float32 or float64. Leave it as None to use deformable convolution v1. Default: None.
name (str|None, optional) – For details, please refer to Name. Generally, no setting is required. Default: None.
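The channel partitioning implied by groups and deformable_groups can be sketched as follows. Both helpers are hypothetical and not part of Paddle; they assume contiguous channel blocks (the usual convention for grouped convolution) and that C_in and C_out are divisible by the respective group counts. Under these assumptions each filter sees C_in / groups input channels, which matches the [M, C/g, kH, kW] weight shape.

```python
def input_channels_for_output(o, c_in, c_out, groups=1):
    """Input channels that output channel `o` is connected to under grouped convolution."""
    assert c_in % groups == 0 and c_out % groups == 0
    g = o // (c_out // groups)        # filter group that output channel `o` belongs to
    per_group = c_in // groups        # this is the C/g in weight's [M, C/g, kH, kW]
    return list(range(g * per_group, (g + 1) * per_group))


def offset_group_for_input(c, c_in, deformable_groups=1):
    """Deformable group (i.e. which set of offsets and masks) input channel `c` uses."""
    assert c_in % deformable_groups == 0
    return c // (c_in // deformable_groups)


# Usage: with groups=2, output channels 4..7 only see input channels 2 and 3.
print(input_channels_for_output(5, c_in=4, c_out=8, groups=2))  # [2, 3]
print(offset_group_for_input(3, c_in=4, deformable_groups=2))   # 1
```

Note that groups and deformable_groups are independent: the first controls which filters connect to which input channels, the second controls which input channels share a sampling pattern.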

Returns

A 4-D Tensor holding the deformable convolution result, with data type float32 or float64.

Return type

Tensor

Examples

>>> # deformable conv v2:

>>> import paddle
>>> input = paddle.rand((8, 1, 28, 28))
>>> kh, kw = 3, 3
>>> weight = paddle.rand((16, 1, kh, kw))
>>> # offset shape should be [bs, 2 * kh * kw, out_h, out_w]
>>> # mask shape should be [bs, kh * kw, out_h, out_w]
>>> # In this case, for an input of 28, stride of 1
>>> # and kernel size of 3, without padding, the output size is 26
>>> offset = paddle.rand((8, 2 * kh * kw, 26, 26))
>>> mask = paddle.rand((8, kh * kw, 26, 26))
>>> out = paddle.vision.ops.deform_conv2d(input, offset, weight, mask=mask)
>>> print(out.shape)
[8, 16, 26, 26]

>>> # deformable conv v1:

>>> import paddle
>>> input = paddle.rand((8, 1, 28, 28))
>>> kh, kw = 3, 3
>>> weight = paddle.rand((16, 1, kh, kw))
>>> # offset shape should be [bs, 2 * kh * kw, out_h, out_w]
>>> # In this case, for an input of 28, stride of 1
>>> # and kernel size of 3, without padding, the output size is 26
>>> offset = paddle.rand((8, 2 * kh * kw, 26, 26))
>>> out = paddle.vision.ops.deform_conv2d(input, offset, weight)
>>> print(out.shape)
[8, 16, 26, 26]