generate_proposals

paddle.vision.ops.generate_proposals(scores: Tensor, bbox_deltas: Tensor, img_size: Tensor, anchors: Tensor, variances: Tensor, pre_nms_top_n: float = 6000, post_nms_top_n: float = 1000, nms_thresh: float = 0.5, min_size: float = 0.1, eta: float = 1.0, pixel_offset: bool = False, return_rois_num: Literal[True] = False, name: str | None = None) → tuple[Tensor, Tensor, Tensor]

paddle.vision.ops.generate_proposals(scores: Tensor, bbox_deltas: Tensor, img_size: Tensor, anchors: Tensor, variances: Tensor, pre_nms_top_n: float = 6000, post_nms_top_n: float = 1000, nms_thresh: float = 0.5, min_size: float = 0.1, eta: float = 1.0, pixel_offset: bool = False, return_rois_num: Literal[False] = False, name: str | None = None) → tuple[Tensor, Tensor, None]

paddle.vision.ops.generate_proposals(scores: Tensor, bbox_deltas: Tensor, img_size: Tensor, anchors: Tensor, variances: Tensor, pre_nms_top_n: float = 6000, post_nms_top_n: float = 1000, nms_thresh: float = 0.5, min_size: float = 0.1, eta: float = 1.0, pixel_offset: bool = False, return_rois_num: bool = False, name: str | None = None) → tuple[Tensor, Tensor, Tensor | None]

This operation generates region proposals (RoIs) from the output of a Region Proposal Network (RPN), ranking each candidate box by its probability of being a foreground object. The proposals are computed from anchors, bbox_deltas, and scores. The final proposals can be used to train a detection network.

To generate proposals, this operation performs the following steps:

Transpose and reshape scores and bbox_deltas to shapes (H * W * A, 1) and (H * W * A, 4), respectively.
Calculate box locations as proposal candidates.
Clip boxes to the image.
Remove predicted boxes whose height or width is less than min_size.
Apply non-maximum suppression (NMS) to get the final proposals as output.
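The decode, clip, and filter steps can be sketched for a single image in plain Python. This is a minimal illustration under stated assumptions, not Paddle's kernel: the decoding uses the common center/size box parameterization with each delta scaled by its variance, pixel_offset is taken to be False, and the helper names decode_box, clip_box, is_large_enough, and candidate_boxes are hypothetical.

```python
import math


def decode_box(anchor, delta, variance):
    """Shift and scale one (xmin, ymin, xmax, ymax) anchor by its deltas."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1                # anchor width and height
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah    # anchor center
    # Assumption: each delta is scaled by the matching variance entry.
    dx, dy, dw, dh = (d * v for d, v in zip(delta, variance))
    cx, cy = acx + dx * aw, acy + dy * ah        # shifted center
    w, h = aw * math.exp(dw), ah * math.exp(dh)  # rescaled size
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def clip_box(box, img_h, img_w):
    """Clamp box corners to the image rectangle [0, img_w] x [0, img_h]."""
    x1, y1, x2, y2 = box
    return (min(max(x1, 0.0), img_w), min(max(y1, 0.0), img_h),
            min(max(x2, 0.0), img_w), min(max(y2, 0.0), img_h))


def is_large_enough(box, min_size):
    """Keep a box only if both its width and height reach min_size."""
    x1, y1, x2, y2 = box
    return (x2 - x1) >= min_size and (y2 - y1) >= min_size


def candidate_boxes(anchors, deltas, variances, img_h, img_w, min_size=0.1):
    """Decode every anchor, clip it to the image, and drop tiny boxes."""
    kept = []
    for anchor, delta, variance in zip(anchors, deltas, variances):
        box = clip_box(decode_box(anchor, delta, variance), img_h, img_w)
        if is_large_enough(box, min_size):
            kept.append(box)
    return kept


anchors = [(10.0, 10.0, 30.0, 50.0), (0.0, 0.0, 8.0, 8.0)]
deltas = [(0.5, 0.0, 0.0, 0.0), (-2.0, -2.0, 0.0, 0.0)]
variances = [(1.0, 1.0, 1.0, 1.0)] * 2
boxes = candidate_boxes(anchors, deltas, variances, img_h=45.0, img_w=35.0)
print(boxes)  # [(20.0, 10.0, 35.0, 45.0)]
```

The first anchor is shifted right by half its width and then clipped to the 35 x 45 image. The second is moved entirely outside the image, collapses to zero size after clipping, and is removed by the size filter.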

Parameters

scores (Tensor) – A 4-D Tensor with shape [N, A, H, W] that represents the probability of each box being an object. N is the batch size, A is the number of anchors, and H and W are the height and width of the feature map. The data type must be float32.
bbox_deltas (Tensor) – A 4-D Tensor with shape [N, 4*A, H, W] that represents the difference between the predicted box location and the anchor location. The data type must be float32.
img_size (Tensor) – A 2-D Tensor with shape [N, 2] that holds the original image size, as height and width, for each of the N images in the batch. The data type can be float32 or float64.
anchors (Tensor) – A 4-D Tensor that represents the anchors with layout [H, W, A, 4]. H and W are the height and width of the feature map, and A is the number of anchors at each position. Each anchor is in unnormalized (xmin, ymin, xmax, ymax) format. The data type must be float32.
variances (Tensor) – A 4-D Tensor that holds the expanded variances of the anchors with layout [H, W, num_priors, 4]. Each variance is in (xcenter, ycenter, w, h) format. The data type must be float32.
pre_nms_top_n (float, optional) – Total number of boxes to keep per image before NMS. 6000 by default.
post_nms_top_n (float, optional) – Total number of boxes to keep per image after NMS. 1000 by default.
nms_thresh (float, optional) – The threshold used in NMS. The data type must be float32. 0.5 by default.
min_size (float, optional) – Predicted boxes whose height or width is less than this value are removed. 0.1 by default.
eta (float, optional) – Decay factor for adaptive NMS. It only takes effect while the adaptive threshold is greater than 0.5; in each iteration, adaptive_threshold = adaptive_threshold * eta. 1.0 by default.
pixel_offset (bool, optional) – Whether to apply a pixel offset. If True, the offset of img_size will be 1. False by default.
return_rois_num (bool, optional) – Whether to return rpn_rois_num. If True, a 1-D Tensor with shape [N] containing the number of RoIs for each image in the batch is also returned. False by default.
name (str|None, optional) – For detailed information, please refer to Name. Usually there is no need to set name. None by default.
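How nms_thresh and eta interact can be illustrated with a greedy NMS loop in plain Python. This is a sketch based only on the eta description above, not Paddle's kernel: it assumes the threshold is an IoU threshold, that it is decayed each time a box is kept, and that the decay applies only while eta < 1 and the threshold is above 0.5. The names iou and adaptive_nms are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def adaptive_nms(boxes, scores, nms_thresh=0.5, eta=1.0, post_nms_top_n=1000):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, thresh = [], nms_thresh
    for i in order:
        if len(keep) >= post_nms_top_n:
            break
        # Keep the box only if it overlaps no kept box by more than thresh.
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
            # Assumed decay rule: tighten the threshold while it exceeds 0.5.
            if eta < 1.0 and thresh > 0.5:
                thresh *= eta
    return keep


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.7]
# Boxes 0 and 1 overlap with IoU = 81 / 119, roughly 0.68.
print(adaptive_nms(boxes, scores, nms_thresh=0.5))           # [0, 2]
print(adaptive_nms(boxes, scores, nms_thresh=0.7))           # [0, 1, 2]
print(adaptive_nms(boxes, scores, nms_thresh=0.7, eta=0.9))  # [0, 2]
```

With eta = 0.9 the threshold drops from 0.7 to 0.63 after the first box is kept, so the second box, which passes at 0.7, is suppressed.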

Returns

rpn_rois (Tensor): The generated RoIs. 2-D Tensor with shape [N, 4], where N is the number of RoIs. The data type is the same as scores.
rpn_roi_probs (Tensor): The scores of the generated RoIs. 2-D Tensor with shape [N, 1], where N is the number of RoIs. The data type is the same as scores.
rpn_rois_num (Tensor): The number of RoIs for each image in the batch. 1-D Tensor with shape [B], where B is the batch size. Its sum equals the total number of RoIs N. Returned only when return_rois_num is True; otherwise None.

Return type

tuple[Tensor, Tensor, Tensor | None]
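Because the RoIs of all images are returned in one [N, 4] tensor, the per-image counts are what let a caller recover which rows belong to which image. The sketch below shows that bookkeeping with plain Python lists standing in for tensors; the helper name split_rois_by_image is hypothetical.

```python
def split_rois_by_image(rois, rois_num):
    """Group a flat list of RoIs into one list per image using the counts."""
    if sum(rois_num) != len(rois):
        raise ValueError("rois_num must sum to the number of RoIs")
    per_image, start = [], 0
    for count in rois_num:
        per_image.append(rois[start:start + count])
        start += count
    return per_image


rois = [[0, 0, 4, 4], [1, 1, 5, 5], [2, 2, 6, 6]]
rois_num = [2, 1]  # two RoIs for the first image, one for the second
print(split_rois_by_image(rois, rois_num))
# [[[0, 0, 4, 4], [1, 1, 5, 5]], [[2, 2, 6, 6]]]
```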

Examples

>>> import paddle
>>> paddle.seed(2023)

>>> # N=2 images, A=4 anchors per position, 5x5 feature map
>>> scores = paddle.rand((2, 4, 5, 5), dtype=paddle.float32)
>>> bbox_deltas = paddle.rand((2, 16, 5, 5), dtype=paddle.float32)
>>> img_size = paddle.to_tensor([[224.0, 224.0], [224.0, 224.0]])
>>> # anchors and variances follow the [H, W, A, 4] layout
>>> anchors = paddle.rand((5, 5, 4, 4), dtype=paddle.float32)
>>> variances = paddle.rand((5, 5, 4, 4), dtype=paddle.float32)
>>> rois, roi_probs, roi_nums = paddle.vision.ops.generate_proposals(scores, bbox_deltas,
...                 img_size, anchors, variances, return_rois_num=True)
>>> # The number of RoIs depends on the random inputs, so only fixed dimensions are printed
>>> print(rois.shape[1], roi_probs.shape[1], roi_nums.shape)
4 1 [2]