retinanet_detection_output
- paddle.fluid.layers.detection.retinanet_detection_output(bboxes, scores, anchors, im_info, score_threshold=0.05, nms_top_k=1000, keep_top_k=100, nms_threshold=0.3, nms_eta=1.0) [source]
-
Detection Output Layer for the detector RetinaNet.
In the RetinaNet detector, multiple FPN levels output category and location predictions. This OP computes the detection results by performing the following steps:
1. For each FPN level, decode the box predictions according to the anchor boxes for at most nms_top_k top-scoring predictions, after thresholding detector confidence at score_threshold.
2. Merge the top predictions from all levels and apply multi-class non-maximum suppression (NMS) on them to get the final detections.
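For intuition, these two steps can be sketched for a single image (batch dimension omitted) in NumPy. This is only an illustration, not the OP's actual kernel, and decode_boxes is a hypothetical placeholder for the anchor-based box decoding:

import numpy as np

def decode_boxes(anchors, deltas):
    # Hypothetical placeholder: the real OP decodes the regression
    # deltas against the anchors; here we simply return the anchors.
    return anchors

def select_and_merge(bboxes, scores, anchors,
                     score_threshold=0.05, nms_top_k=1000):
    merged_boxes, merged_scores = [], []
    # Step 1: per FPN level, keep at most nms_top_k top-scoring
    # predictions whose best class score exceeds score_threshold,
    # then decode their boxes from the anchors.
    for deltas, cls_scores, level_anchors in zip(bboxes, scores, anchors):
        conf = cls_scores.max(axis=1)                # best class score per anchor
        keep = np.where(conf > score_threshold)[0]
        keep = keep[np.argsort(-conf[keep])][:nms_top_k]
        merged_boxes.append(decode_boxes(level_anchors[keep], deltas[keep]))
        merged_scores.append(cls_scores[keep])
    # Step 2: merge all levels; multi-class NMS would follow here.
    return np.concatenate(merged_boxes), np.concatenate(merged_scores)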
- Parameters
-
bboxes (List) – A list of Tensors from multiple FPN levels representing the location predictions for all anchor boxes. Each element is a 3-D Tensor with shape \([N, Mi, 4]\), where \(N\) is the batch size, \(Mi\) is the number of bounding boxes from the \(i\)-th FPN level, and each bounding box has four coordinate values with the layout [xmin, ymin, xmax, ymax]. The data type of each element is float32 or float64.
scores (List) – A list of Tensors from multiple FPN levels representing the category predictions for all anchor boxes. Each element is a 3-D Tensor with shape \([N, Mi, C]\), where \(N\) is the batch size, \(C\) is the class number (excluding background), and \(Mi\) is the number of bounding boxes from the \(i\)-th FPN level. The data type of each element is float32 or float64.
anchors (List) – A list of Tensors from multiple FPN levels representing the locations of all anchor boxes. Each element is a 2-D Tensor with shape \([Mi, 4]\), where \(Mi\) is the number of bounding boxes from the \(i\)-th FPN level, and each bounding box has four coordinate values with the layout [xmin, ymin, xmax, ymax]. The data type of each element is float32 or float64.
im_info (Variable) – A 2-D Tensor with shape \([N, 3]\) that represents the size information of the input images. \(N\) is the batch size; the size information of each image is a 3-vector consisting of the height and width of the network input, along with the factor scaling the original image to the network input. The data type of im_info is float32.
score_threshold (float) – Threshold used to filter out bounding boxes with low confidence scores before NMS. Default value is 0.05.
nms_top_k (int) – Maximum number of detections per FPN level to be kept according to the confidences before NMS. Default value is 1000.
keep_top_k (int) – Number of total bounding boxes to be kept per image after the NMS step. Default value is 100; -1 means keeping all bounding boxes after the NMS step.
nms_threshold (float) – The Intersection-over-Union (IoU) threshold used to filter out boxes in NMS. Default value is 0.3.
nms_eta (float) – The parameter for adaptively adjusting nms_threshold in NMS. The default value of 1.0 keeps nms_threshold unchanged during NMS. If nms_eta is set lower than 1.0 and nms_threshold is set higher than 0.5, then every time a bounding box is filtered out, nms_threshold is updated as nms_threshold = nms_threshold * nms_eta; this adjustment stops once the actual value of nms_threshold is lower than or equal to 0.5 (see the sketch after this list).
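For intuition, the adaptive-threshold rule described above behaves like the following plain-Python sketch (an illustration of the documented update, not the OP's internal code):

nms_threshold, nms_eta = 0.7, 0.9
# Each time a box is suppressed, the threshold decays by nms_eta;
# the decay stops once the threshold reaches 0.5 or below.
while nms_threshold > 0.5:
    nms_threshold *= nms_eta
print(nms_threshold)  # ~0.459, the first value at or below 0.5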
Notice: In some cases where the image sizes are very small, it is possible that no detection survives if score_threshold is applied at all levels. Hence, this OP does not filter out anchors from the highest FPN level before NMS, and the last element in bboxes, scores and anchors is required to be from the highest FPN level.
- Returns
-
The detection output is a 1-level LoDTensor with shape \([No, 6]\). Each row has six values: [label, confidence, xmin, ymin, xmax, ymax]. \(No\) is the total number of detections in this mini-batch. The \(i\)-th image has LoD[i + 1] - LoD[i] detected results; if LoD[i + 1] - LoD[i] is 0, the \(i\)-th image has no detected results. If all images have no detected results, the LoD will be set to 0, and the output tensor is empty (None).
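Since the output is a 1-level LoDTensor, the per-image detections can be recovered from its level-0 LoD offsets. A minimal sketch, assuming out is a fetched LoDTensor (e.g. produced by the run sketch after the Examples) that exposes lod() and converts to a NumPy array:

import numpy as np

lod = out.lod()[0]               # level-0 offsets, e.g. [0, 3, 3, 7]
dets = np.array(out)             # shape [No, 6]
per_image = [dets[lod[i]:lod[i + 1]] for i in range(len(lod) - 1)]
# per_image[i] holds the [label, confidence, xmin, ymin, xmax, ymax]
# rows of the i-th image; an empty slice means no detections there.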
- Return type
-
Variable (the data type is float32 or float64)
Examples
import paddle.fluid as fluid

bboxes_low = fluid.data(
    name='bboxes_low', shape=[1, 44, 4], dtype='float32')
bboxes_high = fluid.data(
    name='bboxes_high', shape=[1, 11, 4], dtype='float32')
scores_low = fluid.data(
    name='scores_low', shape=[1, 44, 10], dtype='float32')
scores_high = fluid.data(
    name='scores_high', shape=[1, 11, 10], dtype='float32')
anchors_low = fluid.data(
    name='anchors_low', shape=[44, 4], dtype='float32')
anchors_high = fluid.data(
    name='anchors_high', shape=[11, 4], dtype='float32')
im_info = fluid.data(
    name="im_info", shape=[1, 3], dtype='float32')
nmsed_outs = fluid.layers.retinanet_detection_output(
    bboxes=[bboxes_low, bboxes_high],
    scores=[scores_low, scores_high],
    anchors=[anchors_low, anchors_high],
    im_info=im_info,
    score_threshold=0.05,
    nms_top_k=1000,
    keep_top_k=100,
    nms_threshold=0.45,
    nms_eta=1.0)
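To actually run the example above, one can feed NumPy arrays through an Executor, roughly as follows. This is a sketch; the fed values are random and only illustrate the expected shapes and dtypes:

import numpy as np

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
feed = {
    'bboxes_low': np.random.rand(1, 44, 4).astype('float32'),
    'bboxes_high': np.random.rand(1, 11, 4).astype('float32'),
    'scores_low': np.random.rand(1, 44, 10).astype('float32'),
    'scores_high': np.random.rand(1, 11, 10).astype('float32'),
    'anchors_low': np.random.rand(44, 4).astype('float32'),
    'anchors_high': np.random.rand(11, 4).astype('float32'),
    # [height, width, scale] of the network input per image
    'im_info': np.array([[800., 1333., 1.]], dtype='float32'),
}
# return_numpy=False keeps the LoD information on the fetched result.
out, = exe.run(feed=feed, fetch_list=[nmsed_outs], return_numpy=False)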