generate_mask_labels
paddle.fluid.layers.detection.generate_mask_labels(im_info, gt_classes, is_crowd, gt_segms, rois, labels_int32, num_classes, resolution)
Generate Mask Labels for Mask-RCNN
Given the RoIs and their corresponding labels, this operator samples the foreground RoIs. For each foreground RoI, the mask branch has a K × M² dimensional output target, which encodes K binary masks of resolution M × M, one for each of the K classes. These mask targets are used to compute the loss of the mask branch.
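To make the K × M² layout concrete, here is a minimal NumPy sketch of how the target for one foreground RoI could be laid out. It is an illustration, not the operator's implementation: the helper name `expand_mask_target` is hypothetical, and both the class-major ordering and the fill value of -1 for the other classes' slots are assumptions that the text above does not state.

```python
import numpy as np

def expand_mask_target(mask, label, num_classes, ignore_value=-1):
    """Place one M x M binary mask into a flat K * M * M target vector.

    Hypothetical helper: the class-major layout and the ignore_value used
    for the other classes are assumptions, not documented behavior.
    """
    m = mask.shape[0]
    assert mask.shape == (m, m), "mask must be square (M x M)"
    target = np.full(num_classes * m * m, ignore_value, dtype=np.int32)
    start = label * m * m
    # Only the slot that belongs to the RoI's own class receives the mask.
    target[start:start + m * m] = mask.reshape(-1)
    return target

# One RoI with class label 2, K = 3 classes, resolution M = 2 (toy values).
mask = np.array([[1, 0],
                 [0, 1]], dtype=np.int32)
target = expand_mask_target(mask, label=2, num_classes=3)
print(target.shape)                # (12,), i.e. K * M * M
print(target.reshape(3, 2, 2)[2])  # the class-2 slot holds the mask
```

Under these assumptions, stacking one such vector per sampled RoI would give an array of shape [P, K * M * M].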
Note the expected data format of the ground-truth segmentations. In the example below, the first instance has two ground-truth objects, and the second instance has one ground-truth object with two segmentations.
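    import numpy as np
    import paddle.fluid as fluid

    # batch_segms is nested as instance -> object -> polygon -> [x0, y0, x1, y1, ...]:
    # [
    #   [[[229.14, 370.9, 229.14, 370.9, ...]],
    #    [[343.7, 139.85, 349.01, 138.46, ...]]],        # 0-th instance
    #   [[[500.0, 390.62, ...], [115.48, 187.86, ...]]]  # 1-th instance
    # ]
    batch_masks = []
    for segms in batch_segms:
        gt_masks = []
        for segm in segms:
            gt_segm = []
            for polys in segm:
                # Each polygon becomes an array of (x, y) points.
                gt_segm.append(np.array(polys).reshape(-1, 2))
            gt_masks.append(gt_segm)
        batch_masks.append(gt_masks)

    # feeds is the list of input Variables, defined elsewhere.
    place = fluid.CPUPlace()
    feeder = fluid.DataFeeder(place=place, feed_list=feeds)
    feeder.feed(batch_masks)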
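The nested format above corresponds to a flat list of (x, y) points plus three levels of length information. The sketch below computes both in plain Python. The helper name `segms_to_lod` and the toy coordinates are made up for illustration, and reading the third level as "points per polygon" is an interpretation based on the [S, 2] shape of gt_segms rather than something the operator documents in these words.

```python
def segms_to_lod(batch_segms):
    """Flatten nested polygons into (x, y) points plus length-based LoD.

    batch_segms[i][j][k] is a flat list [x0, y0, x1, y1, ...] for the
    k-th polygon of the j-th object in the i-th instance.
    """
    objs_per_instance, polys_per_obj, points_per_poly, points = [], [], [], []
    for segms in batch_segms:
        objs_per_instance.append(len(segms))
        for segm in segms:
            polys_per_obj.append(len(segm))
            for poly in segm:
                # Pair up alternating x and y values.
                xy = list(zip(poly[0::2], poly[1::2]))
                points_per_poly.append(len(xy))
                points.extend(xy)
    return points, [objs_per_instance, polys_per_obj, points_per_poly]

# Toy data: instance 0 has two objects, instance 1 has one object
# described by two polygons.
batch_segms = [
    [[[0.0, 0.0, 4.0, 0.0, 4.0, 4.0]],
     [[1.0, 1.0, 2.0, 1.0, 2.0, 2.0, 1.0, 2.0]]],
    [[[5.0, 5.0, 6.0, 5.0, 6.0, 6.0], [7.0, 7.0, 8.0, 7.0, 8.0, 8.0]]],
]
points, lod = segms_to_lod(batch_segms)
print(lod)          # [[2, 1], [1, 1, 2], [3, 4, 3, 3]]
print(len(points))  # 13, the S in shape [S, 2]
```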
Parameters
im_info (Variable) – A 2-D Tensor with shape [N, 3] and float32 data type. N is the batch size; each element is the [height, width, scale] of an image. The image scale is target_size / original_size, where target_size is the size after resizing and original_size is the original image size.
gt_classes (Variable) – A 2-D LoDTensor with shape [M, 1]. The data type should be int. M is the total number of ground-truth objects; each element is a class label.
is_crowd (Variable) – A 2-D LoDTensor with the same shape and data type as gt_classes. Each element is a flag indicating whether a ground-truth object is a crowd.
gt_segms (Variable) – A 2-D LoDTensor with shape [S, 2] and float32 data type; its LoD level is 3. Users usually do not need to understand LoD; they only need to return data in the correct format from the reader. LoD[0] represents the number of ground-truth objects in each instance. LoD[1] represents the number of segmentations of each object. LoD[2] represents the number of polygon points of each segmentation. S is the total number of polygon coordinate points. Each element is an (x, y) coordinate point.
rois (Variable) – A 2-D LoDTensor with shape [R, 4] and float32 data type. R is the total number of RoIs; each element is a bounding box in (xmin, ymin, xmax, ymax) format, within the range of the original image.
labels_int32 (Variable) – A 2-D LoDTensor with shape [R, 1] and int32 data type. R is the same as in rois. Each element is the class label of a RoI.
num_classes (int) – Number of classes.
resolution (int) – Resolution of mask predictions.
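As a small worked example of the im_info row described above, the sketch below builds [height, width, scale] for one resized image. The helper name `make_im_info` is hypothetical. It also makes two assumptions that the parameter description leaves open: that target_size refers to the shorter image side, and that height and width are the values after resizing.

```python
def make_im_info(orig_h, orig_w, target_size):
    """Return [height, width, scale] for one image as floats.

    Assumptions (not stated in the parameter docs): target_size is the
    shorter side after resizing, and height/width are the resized values.
    """
    scale = target_size / min(orig_h, orig_w)
    return [float(round(orig_h * scale)), float(round(orig_w * scale)), scale]

# A 400 x 600 image resized so that its shorter side becomes 800.
im_info_row = make_im_info(400, 600, 800)
print(im_info_row)  # [800.0, 1200.0, 2.0]
```

A batch of N images would stack N such rows into the [N, 3] float32 tensor.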
Returns
mask_rois (Variable): A 2-D LoDTensor with shape [P, 4] and the same data type as rois. P is the total number of sampled RoIs. Each element is a bounding box in [xmin, ymin, xmax, ymax] format, within the range of the original image size.
mask_rois_has_mask_int32 (Variable): A 2-D LoDTensor with shape [P, 1] and int data type. Each element is the index of the output mask RoI with respect to the input RoIs.
mask_int32 (Variable): A 2-D LoDTensor with shape [P, K * M * M] and int data type, where K is the number of classes and M is the resolution of the mask predictions. Each element is a binary mask target.
Return type
tuple of Variables (mask_rois, mask_rois_has_mask_int32, mask_int32)
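Once the outputs have been fetched as NumPy arrays, they can be unpacked along the lines of the sketch below. The arrays here are made-up stand-ins, not real operator output, and the class-major ordering within each K * M * M row is an assumption.

```python
import numpy as np

K, M = 3, 2  # toy values: number of classes and mask resolution

# Made-up stand-ins for the inputs and outputs, with R = 4 and P = 2.
roi_labels = np.array([[0], [2], [1], [0]], dtype=np.int32)  # [R, 1]
mask_index = np.array([[1], [2]], dtype=np.int32)            # [P, 1]
mask_int32 = -np.ones((2, K * M * M), dtype=np.int32)        # [P, K*M*M]
mask_int32[0, 2 * M * M:3 * M * M] = [1, 1, 0, 0]
mask_int32[1, 1 * M * M:2 * M * M] = [0, 1, 0, 1]

# Assumption: each row is laid out class-major, so it reshapes to [K, M, M].
per_class = mask_int32.reshape(-1, K, M, M)
# Look up each sampled RoI's label through its index into the input RoIs.
labels = roi_labels[mask_index[:, 0], 0]
# Pick, for every sampled RoI, the M x M mask in its own class slot.
fg_masks = per_class[np.arange(len(labels)), labels]

print(labels)          # [2 1]
print(fg_masks.shape)  # (2, 2, 2)
```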
Examples
    import paddle.fluid as fluid

    im_info = fluid.data(name="im_info", shape=[None, 3], dtype="float32")
    gt_classes = fluid.data(name="gt_classes", shape=[None, 1],
                            dtype="int32", lod_level=1)
    is_crowd = fluid.data(name="is_crowd", shape=[None, 1],
                          dtype="int32", lod_level=1)
    gt_masks = fluid.data(name="gt_masks", shape=[None, 2],
                          dtype="float32", lod_level=3)

    # rois and roi_labels can be the output of
    # fluid.layers.generate_proposal_labels.
    rois = fluid.data(name="rois", shape=[None, 4],
                      dtype="float32", lod_level=1)
    roi_labels = fluid.data(name="roi_labels", shape=[None, 1],
                            dtype="int32", lod_level=1)

    mask_rois, mask_index, mask_int32 = fluid.layers.generate_mask_labels(
        im_info=im_info,
        gt_classes=gt_classes,
        is_crowd=is_crowd,
        gt_segms=gt_masks,
        rois=rois,
        labels_int32=roi_labels,
        num_classes=81,
        resolution=14)