multiclass_nms
paddle.fluid.layers.detection.multiclass_nms(bboxes, scores, score_threshold, nms_top_k, keep_top_k, nms_threshold=0.3, normalized=True, nms_eta=1.0, background_label=0, name=None)
Multiclass NMS
This operator performs multi-class non-maximum suppression (NMS) on boxes and scores.
In the NMS step, this operator first keeps only the detection bounding boxes whose scores are larger than score_threshold, if this threshold is provided, and then keeps the nms_top_k highest-scoring boxes if nms_top_k is larger than -1. It then greedily prunes away boxes that have a high IOU (intersection over union) overlap with already selected boxes, using adaptive-threshold NMS controlled by the parameters nms_threshold and nms_eta. After the NMS step, at most keep_top_k bboxes in total are kept per image if keep_top_k is larger than -1.
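The per-class part of the steps above can be sketched in plain Python. This is a minimal illustration and not the operator's actual implementation: the function names `iou` and `nms_single_class` are hypothetical, boxes are assumed to be normalized (xmin, ymin, xmax, ymax) tuples, and the decay rule for the adaptive threshold (multiply by nms_eta after each kept box, only while nms_eta < 1 and the threshold is above 0.5) is an assumption about how nms_eta is applied.

```python
def iou(a, b):
    """IoU of two boxes in (xmin, ymin, xmax, ymax) layout."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms_single_class(boxes, scores, score_threshold, nms_top_k,
                     nms_threshold=0.3, nms_eta=1.0):
    """Return the indices of the boxes kept for one class, best score first."""
    # 1. Keep candidates scoring above score_threshold, sorted by score.
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    # 2. Keep only the nms_top_k best candidates when nms_top_k > -1.
    if nms_top_k > -1:
        order = order[:nms_top_k]
    # 3. Greedily keep a box unless it overlaps an already kept box too much.
    kept = []
    threshold = nms_threshold
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in kept):
            kept.append(i)
            # Assumed decay rule for the adaptive threshold.
            if nms_eta < 1.0 and threshold > 0.5:
                threshold *= nms_eta
    return kept


boxes = [(2.0, 3.0, 7.0, 5.0), (3.0, 4.0, 8.0, 5.0)]
kept_label1 = nms_single_class(boxes, [0.2, 0.3], 0.0, -1, nms_threshold=0.3)
kept_label2 = nms_single_class(boxes, [0.4, 0.1], 0.0, -1, nms_threshold=0.3)
```

With the default nms_eta=1.0 the threshold never changes, so the sketch reduces to ordinary greedy NMS.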
See below for an example:
    if:
        box1.data = (2.0, 3.0, 7.0, 5.0)    format is (xmin, ymin, xmax, ymax)
        box1.scores = (0.7, 0.2, 0.4)       which is (label0.score=0.7, label1.score=0.2, label2.score=0.4)
        box2.data = (3.0, 4.0, 8.0, 5.0)
        box2.scores = (0.3, 0.3, 0.1)
        nms_threshold = 0.3
        background_label = 0
        score_threshold = 0
    Then:
        iou = 4/11 > 0.3
        out.data = [[1, 0.3, 3.0, 4.0, 8.0, 5.0],
                    [2, 0.4, 2.0, 3.0, 7.0, 5.0]]
        Out format is (label, confidence, xmin, ymin, xmax, ymax)
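The IOU value in the example can be checked by hand. The short sketch below redoes the arithmetic with exact fractions; the variable names are illustrative only, and the coordinates are treated as normalized (no +1 added to widths and heights).

```python
from fractions import Fraction

box1 = (2, 3, 7, 5)  # (xmin, ymin, xmax, ymax)
box2 = (3, 4, 8, 5)

# Overlap rectangle: x from 3 to 7, y from 4 to 5.
inter_w = min(box1[2], box2[2]) - max(box1[0], box2[0])
inter_h = min(box1[3], box2[3]) - max(box1[1], box2[1])
inter = max(inter_w, 0) * max(inter_h, 0)

area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])

# Intersection over union: 4 / (10 + 5 - 4).
iou = Fraction(inter, area1 + area2 - inter)
print(iou, float(iou) > 0.3)
```

Because the overlap exceeds nms_threshold, only the higher-scoring box survives for each non-background label: box2 for label 1 (0.3 vs 0.2) and box1 for label 2 (0.4 vs 0.1).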
Parameters
bboxes (Variable) – Two types of bboxes are supported: 1. (Tensor) A 3-D Tensor with shape [N, M, 4 or 8, 16, 24, 32] representing the predicted locations of M bounding boxes, where N is the batch size. When the box size equals 4, each bounding box has four coordinate values laid out as [xmin, ymin, xmax, ymax]. The data type is float32 or float64. 2. (LoDTensor) A 3-D Tensor with shape [M, C, 4], where M is the number of bounding boxes and C is the class number. The data type is float32 or float64.
scores (Variable) – Two types of scores are supported: 1. (Tensor) A 3-D Tensor with shape [N, C, M] representing the predicted confidences, where N is the batch size, C is the class number, and M is the number of bounding boxes. For each category there are M scores in total, which correspond to the M bounding boxes. Note that M equals the 2nd dimension of bboxes. The data type is float32 or float64. 2. (LoDTensor) A 2-D LoDTensor with shape [M, C], where M is the number of bounding boxes and C is the class number. In this case, the input bboxes should be of the second type, with shape [M, C, 4]. The data type is float32 or float64.
background_label (int) – The index of the background label; the background label is ignored. If set to -1, all categories are considered. Default: 0
score_threshold (float) – Threshold used to filter out bounding boxes with a low confidence score. If not provided, all boxes are considered.
nms_top_k (int) – Maximum number of detections to be kept, according to their confidences, after filtering detections based on score_threshold.
nms_threshold (float) – The threshold to be used in NMS. Default: 0.3
nms_eta (float) – The parameter for adaptive NMS, used together with nms_threshold. Default: 1.0
keep_top_k (int) – Total number of bboxes to be kept per image after the NMS step. -1 means keeping all bboxes after the NMS step.
normalized (bool) – Whether detections are normalized. Default: True
name (str) – Name of the multiclass nms op. Default: None.
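To illustrate how keep_top_k and background_label interact with the per-class results, the sketch below merges the boxes kept for each label in one image and applies the keep_top_k limit, using the two boxes from the example above. The helper `keep_top_detections` and its input layout are hypothetical, and reporting the final rows grouped by label is an assumption rather than documented behavior.

```python
def keep_top_detections(kept_per_label, boxes, scores, keep_top_k):
    """Merge per-class NMS results for one image and apply keep_top_k.

    kept_per_label: {label: [indices of boxes kept by NMS]} (hypothetical layout)
    boxes[i]:       (xmin, ymin, xmax, ymax) of box i
    scores[i]:      per-label confidences of box i
    """
    rows = [[label, scores[i][label], *boxes[i]]
            for label, indices in kept_per_label.items()
            for i in indices]
    # Keep only the keep_top_k highest-scoring rows when keep_top_k > -1.
    if keep_top_k > -1 and len(rows) > keep_top_k:
        rows = sorted(rows, key=lambda r: r[1], reverse=True)[:keep_top_k]
    # Assumption: rows are reported grouped by label.
    return sorted(rows, key=lambda r: r[0])


boxes = [(2.0, 3.0, 7.0, 5.0), (3.0, 4.0, 8.0, 5.0)]
scores = [(0.7, 0.2, 0.4), (0.3, 0.3, 0.1)]
# Label 0 is the background (background_label=0), so it has no entry here.
kept_per_label = {1: [1], 2: [0]}

all_rows = keep_top_detections(kept_per_label, boxes, scores, keep_top_k=-1)
best_row = keep_top_detections(kept_per_label, boxes, scores, keep_top_k=1)
```

With keep_top_k=-1 both detections survive; with keep_top_k=1 only the label-2 detection with confidence 0.4 remains.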
Returns
A 2-D LoDTensor with shape [No, 6] representing the detections, where each row has 6 values: [label, confidence, xmin, ymin, xmax, ymax]; or a 2-D LoDTensor with shape [No, 10], where each row has 10 values: [label, confidence, x1, y1, x2, y2, x3, y3, x4, y4]. No is the total number of detections. If no boxes are detected in any image, the lod is set to {1} and Out contains only one value, which is -1. (After version 1.3, the lod for the no-detection case changed from {0} to {1}.)
Return type
Variable
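Since the returned rows for all images are stacked into one [No, 6] result, the LoD is what tells the images apart. The following sketch shows one way to split fetched rows per image. It assumes the LoD is available as a list of level-0 offsets such as [0, 2, 3]; the helper names and the sample values are hypothetical.

```python
def split_by_image(rows, offsets):
    """Split stacked detection rows into one list per image.

    offsets: assumed offset-based LoD, e.g. [0, 2, 3] means image 0 owns
    rows 0..1 and image 1 owns row 2.
    """
    return [rows[start:end] for start, end in zip(offsets, offsets[1:])]


def has_no_detections(rows):
    """True when the result is the single value -1 described above."""
    return len(rows) == 1 and len(rows[0]) == 1 and rows[0][0] == -1


# Hypothetical fetched output: [label, confidence, xmin, ymin, xmax, ymax].
rows = [
    [1, 0.9, 0.1, 0.1, 0.5, 0.5],
    [2, 0.8, 0.2, 0.2, 0.6, 0.6],
    [1, 0.7, 0.3, 0.3, 0.9, 0.9],
]
offsets = [0, 2, 3]

per_image = split_by_image(rows, offsets)
```

Checking `has_no_detections` before splitting avoids treating the -1 placeholder as a detection row.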
Examples
    import paddle.fluid as fluid
    import paddle

    paddle.enable_static()
    boxes = fluid.data(name='bboxes', shape=[None, 81, 4],
                       dtype='float32', lod_level=1)
    scores = fluid.data(name='scores', shape=[None, 81],
                        dtype='float32', lod_level=1)
    out = fluid.layers.multiclass_nms(bboxes=boxes,
                                      scores=scores,
                                      background_label=0,
                                      score_threshold=0.5,
                                      nms_top_k=400,
                                      nms_threshold=0.3,
                                      keep_top_k=200,
                                      normalized=False)