retinanet_detection_output

paddle.fluid.layers.detection.retinanet_detection_output(bboxes, scores, anchors, im_info, score_threshold=0.05, nms_top_k=1000, keep_top_k=100, nms_threshold=0.3, nms_eta=1.0)

Detection Output Layer for the detector RetinaNet.

In the RetinaNet detector, several FPN levels each output category and location predictions. This OP produces the final detection results by performing the following steps:

For each FPN level, threshold the detector confidence at score_threshold, keep at most nms_top_k top-scoring predictions, and decode their box predictions according to the corresponding anchor boxes.
Merge the top predictions from all levels and apply multi-class non-maximum suppression (NMS) to them to get the final detections.
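
The two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the OP's implementation: the boxes are assumed to be already decoded, the helper names (`iou`, `select_candidates`, `merge_and_nms`) are hypothetical, the strict `>` comparisons are an assumption, and the IoU uses plain continuous-coordinate areas.

```python
def iou(a, b):
    """Intersection-over-Union of two [xmin, ymin, xmax, ymax] boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def select_candidates(boxes, scores, score_threshold=0.05, nms_top_k=1000):
    """Step 1 for one FPN level (boxes assumed already decoded).

    boxes:  list of M_i boxes, each [xmin, ymin, xmax, ymax]
    scores: list of M_i rows, each holding C class scores
    Returns (label, score, box) tuples, highest score first.
    """
    cands = [
        (label, score, boxes[i])
        for i, row in enumerate(scores)
        for label, score in enumerate(row)
        if score > score_threshold
    ]
    cands.sort(key=lambda c: c[1], reverse=True)
    return cands[:nms_top_k]


def merge_and_nms(per_level, nms_threshold=0.3, keep_top_k=100):
    """Step 2: merge all levels, then run greedy NMS within each class."""
    merged = sorted((c for level in per_level for c in level),
                    key=lambda c: c[1], reverse=True)
    kept = []
    for label, score, box in merged:
        # Suppress a box only if a higher-scoring box of the SAME class overlaps it.
        suppressed = any(k[0] == label and iou(k[2], box) > nms_threshold
                         for k in kept)
        if not suppressed:
            kept.append((label, score, box))
    # keep_top_k == -1 means "keep everything that survived NMS".
    return kept if keep_top_k < 0 else kept[:keep_top_k]


# Two toy FPN levels with C = 2 classes.
level0 = select_candidates([[0, 0, 10, 10], [20, 20, 30, 30]],
                           [[0.9, 0.01], [0.02, 0.6]])
level1 = select_candidates([[1, 1, 10, 10]], [[0.8, 0.03]])
detections = merge_and_nms([level0, level1])
```

In this toy input the class-0 box from the second level overlaps the higher-scoring class-0 box from the first level, so it is suppressed, while the class-1 box survives because NMS is applied per class.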

Parameters

bboxes (List) – A list of Tensors from multiple FPN levels, representing the location predictions for all anchor boxes. Each element is a 3-D Tensor with shape $[N, M_i, 4]$, where $N$ is the batch size and $M_i$ is the number of bounding boxes from the $i$-th FPN level. Each bounding box has four coordinate values in the layout [xmin, ymin, xmax, ymax]. The data type of each element is float32 or float64.
scores (List) – A list of Tensors from multiple FPN levels, representing the category predictions for all anchor boxes. Each element is a 3-D Tensor with shape $[N, M_i, C]$, where $N$ is the batch size, $C$ is the number of classes (excluding background), and $M_i$ is the number of bounding boxes from the $i$-th FPN level. The data type of each element is float32 or float64.
anchors (List) – A list of Tensors from multiple FPN levels, representing the locations of all anchor boxes. Each element is a 2-D Tensor with shape $[M_i, 4]$, where $M_i$ is the number of bounding boxes from the $i$-th FPN level. Each bounding box has four coordinate values in the layout [xmin, ymin, xmax, ymax]. The data type of each element is float32 or float64.
im_info (Variable) – A 2-D Tensor with shape $[N, 3]$ that represents the size information of the input images, where $N$ is the batch size. The size information of each image is a 3-vector: the height and width of the network input, followed by the factor that scales the original image to the network input. The data type of im_info is float32.
score_threshold (float) – Threshold used before NMS to filter out bounding boxes with a low confidence score. The default value is 0.05.
nms_top_k (int) – Maximum number of detections per FPN level to be kept, according to their confidences, before NMS. The default value is 1000.
keep_top_k (int) – Total number of bounding boxes to be kept per image after the NMS step. The default value is 100; -1 means keeping all bounding boxes after the NMS step.
nms_threshold (float) – The Intersection-over-Union (IoU) threshold used to filter out boxes in NMS. The default value is 0.3.
nms_eta (float) – The parameter for adjusting nms_threshold in NMS. The default value is 1.0, which keeps nms_threshold unchanged during NMS. If nms_eta is lower than 1.0 and nms_threshold is higher than 0.5, then every time a bounding box is filtered out the threshold is updated as nms_threshold = nms_threshold * nms_eta, and this adjustment continues until nms_threshold is lower than or equal to 0.5.
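
The effect of nms_eta on the threshold can be illustrated with a small sketch. The helper `threshold_schedule` is hypothetical and only replays the update rule described for nms_eta over a given number of adjustment opportunities; it does not model which boxes trigger an adjustment inside the OP.

```python
def threshold_schedule(nms_threshold, nms_eta, num_adjustments):
    """Replay the nms_eta update rule and return every threshold value seen.

    The threshold is multiplied by nms_eta only while nms_eta < 1.0 and
    the current threshold is still above 0.5; otherwise it stays fixed.
    """
    values = [nms_threshold]
    current = nms_threshold
    for _ in range(num_adjustments):
        if nms_eta < 1.0 and current > 0.5:
            current *= nms_eta
        values.append(current)
    return values


decayed = threshold_schedule(0.8, 0.5, 3)    # decays once, then stops at <= 0.5
constant = threshold_schedule(0.8, 1.0, 3)   # nms_eta == 1.0 never changes it
low_start = threshold_schedule(0.3, 0.9, 2)  # already <= 0.5, never changes
```

With the signature defaults (nms_threshold=0.3, nms_eta=1.0) neither condition holds, so the threshold stays constant for the whole NMS pass.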

Notice: In some cases where the image sizes are very small, it is possible that no detection remains if score_threshold is applied at all levels. Hence, this OP does not filter out anchors from the highest FPN level before NMS. The last element in bboxes, scores, and anchors is therefore required to come from the highest FPN level.
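
One way to picture this rule is as a per-level threshold list in which the last entry disables filtering. The helper `level_thresholds` is hypothetical, and representing "no filtering" as a threshold of 0.0 is an assumption of this sketch rather than a documented detail of the OP.

```python
def level_thresholds(num_levels, score_threshold=0.05):
    """Score threshold applied at each FPN level, lowest level first.

    Assumption: "not filtered" is modelled as a threshold of 0.0 on the
    last (highest) level, so every positive score there is kept.
    """
    if num_levels < 1:
        raise ValueError("at least one FPN level is required")
    return [score_threshold] * (num_levels - 1) + [0.0]


thresholds = level_thresholds(5)
```

This is why the ordering of the input lists matters: whichever level is passed last is the one treated as the highest level.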

Returns: The detection output is a 1-level LoDTensor with shape $[No, 6]$. Each row has six values: [label, confidence, xmin, ymin, xmax, ymax]. $No$ is the total number of detections in this mini-batch. The $i$-th image has LoD[i + 1] - LoD[i] detected results; if LoD[i + 1] - LoD[i] is 0, the $i$-th image has no detected results. If no image has any detected results, LoD will be set to 0, and the output tensor is empty (None).
Return type: Variable (the data type is float32 or float64)
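
The LoD bookkeeping described above can be sketched as follows. The sketch assumes the output rows and the level-0 LoD offsets are already available as plain Python lists (how they are fetched from the LoDTensor is not shown), and `split_by_lod` is a hypothetical helper name.

```python
def split_by_lod(rows, lod_offsets):
    """Group detection rows per image using cumulative LoD offsets.

    rows:        list of [label, confidence, xmin, ymin, xmax, ymax]
    lod_offsets: cumulative offsets of length batch_size + 1
    Image i owns rows[lod_offsets[i]:lod_offsets[i + 1]].
    """
    return [rows[lod_offsets[i]:lod_offsets[i + 1]]
            for i in range(len(lod_offsets) - 1)]


# Toy mini-batch of 3 images with 2, 0, and 3 detections respectively.
rows = [
    [0, 0.90, 0.0, 0.0, 10.0, 10.0],
    [1, 0.60, 20.0, 20.0, 30.0, 30.0],
    [0, 0.75, 5.0, 5.0, 15.0, 15.0],
    [2, 0.55, 1.0, 2.0, 3.0, 4.0],
    [2, 0.30, 8.0, 8.0, 9.0, 9.0],
]
per_image = split_by_lod(rows, [0, 2, 2, 5])
counts = [len(dets) for dets in per_image]
```

Here the second image has LoD[2] - LoD[1] = 0, so its list of detections is empty.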

Examples

import paddle.fluid as fluid

# Two FPN levels: a lower level with 44 anchors and the highest level
# with 11 anchors. Batch size is 1 and there are 10 classes.
bboxes_low = fluid.data(
    name='bboxes_low', shape=[1, 44, 4], dtype='float32')
bboxes_high = fluid.data(
    name='bboxes_high', shape=[1, 11, 4], dtype='float32')
scores_low = fluid.data(
    name='scores_low', shape=[1, 44, 10], dtype='float32')
scores_high = fluid.data(
    name='scores_high', shape=[1, 11, 10], dtype='float32')
anchors_low = fluid.data(
    name='anchors_low', shape=[44, 4], dtype='float32')
anchors_high = fluid.data(
    name='anchors_high', shape=[11, 4], dtype='float32')
# Per-image [height, width, scale] of the network input.
im_info = fluid.data(
    name='im_info', shape=[1, 3], dtype='float32')
# The highest FPN level must be the last element of each list.
nmsed_outs = fluid.layers.retinanet_detection_output(
    bboxes=[bboxes_low, bboxes_high],
    scores=[scores_low, scores_high],
    anchors=[anchors_low, anchors_high],
    im_info=im_info,
    score_threshold=0.05,
    nms_top_k=1000,
    keep_top_k=100,
    nms_threshold=0.45,
    nms_eta=1.0)