margin_cross_entropy
- paddle.nn.functional.margin_cross_entropy(logits, label, margin1=1.0, margin2=0.5, margin3=0.0, scale=64.0, group=None, return_softmax=False, reduction='mean') [source]
\[L=-\frac{1}{N}\sum^N_{i=1}\log\frac{e^{s(\cos(m_{1}\theta_{y_i}+m_{2})-m_{3})}}{e^{s(\cos(m_{1}\theta_{y_i}+m_{2})-m_{3})}+\sum^n_{j=1,j\neq y_i} e^{s\cos\theta_{j}}}\]
where \(\theta_{y_i}\) is the angle between the feature \(x\) and the class representation of \(y_i\). For details of the ArcFace loss, refer to https://arxiv.org/abs/1801.07698.
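The computation can be read directly from the formula: apply the margin to the target-class angle, scale by s, and take a standard softmax cross entropy over the modified logits. The helper below is a minimal illustrative sketch, not part of the API (the name naive_margin_ce is hypothetical); it assumes cosine holds the normalized X·W product and label holds integer class ids.

>>> import paddle
>>> def naive_margin_ce(cosine, label, m1=1.0, m2=0.5, m3=0.0, s=64.0):
...     # cosine: [N, num_classes] = normalized X @ normalized W; label: [N] int64
...     theta = paddle.acos(paddle.clip(cosine, -1.0, 1.0))
...     # apply the margin only to the target-class angle: cos(m1*theta + m2) - m3
...     idx = label.reshape([-1, 1])
...     target = paddle.cos(m1 * paddle.take_along_axis(theta, idx, axis=1) + m2) - m3
...     modified = paddle.put_along_axis(cosine, idx, target, axis=1)
...     # scale by s and reduce with an ordinary softmax cross entropy
...     return paddle.nn.functional.cross_entropy(s * modified, label)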
Hint
The API supports single GPU and multi GPU, and does not support CPU. For data parallel mode, set group=False. For model parallel mode, set group=None or to the group instance returned by paddle.distributed.new_group. Note that logits.shape[-1] can be different at each rank.
- Parameters
  - logits (Tensor) – shape [N, local_num_classes], the output of normalized X multiplied by normalized W. The logits are shard_logits when using model parallel.
  - label (Tensor) – shape [N] or [N, 1], the ground truth label.
  - margin1 (float, optional) – m1 of the margin loss. Default value is 1.0.
  - margin2 (float, optional) – m2 of the margin loss. Default value is 0.5.
  - margin3 (float, optional) – m3 of the margin loss. Default value is 0.0.
  - scale (float, optional) – s of the margin loss. Default value is 64.0.
  - group (Group, optional) – The group instance returned by paddle.distributed.new_group, or None for the global default group, or False for data parallel (no communication across ranks); see the sketch after this parameter list. Default is None.
  - return_softmax (bool, optional) – Whether to return the softmax probability. Default value is False.
  - reduction (str, optional) – The candidates are 'none' | 'mean' | 'sum'. If reduction is 'mean', return the average of the loss; if reduction is 'sum', return the sum of the loss; if reduction is 'none', no reduction will be applied. Default value is 'mean'.
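As referenced in the group parameter above, a model-parallel setup typically creates its own communication group and passes it in. The following is a minimal, hypothetical sketch (the rank list and the shard_logits name are placeholders), which must be run under paddle.distributed.launch.

>>> import paddle
>>> import paddle.distributed as dist
>>> dist.init_parallel_env()
>>> mp_group = dist.new_group(ranks=[0, 1])   # communicate only within these ranks
>>> # Each rank holds its own classifier shard, so shard_logits has shape
>>> # [N, local_num_classes] and local_num_classes may differ per rank.
>>> # loss = paddle.nn.functional.margin_cross_entropy(shard_logits, label, group=mp_group)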
- Returns
  - Tensor|tuple[Tensor, Tensor]. Returns the cross entropy loss if return_softmax is False, otherwise the tuple (loss, softmax). softmax is shard_softmax when using model parallel; otherwise softmax has the same shape as the input logits. If reduction == None, the shape of loss is [N, 1]; otherwise the shape is [].
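For instance, reusing the logits and label built in the single-GPU example below (an illustrative snippet, not additional API behavior):

>>> loss_none = paddle.nn.functional.margin_cross_entropy(logits, label, reduction=None)
>>> print(loss_none.shape)   # [N, 1]: one loss value per sample
>>> loss_mean = paddle.nn.functional.margin_cross_entropy(logits, label, reduction='mean')
>>> print(loss_mean.shape)   # []: a 0-D scalar tensor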
Examples
>>> import paddle
>>> paddle.seed(2023)
>>> paddle.device.set_device('gpu')
>>> m1 = 1.0
>>> m2 = 0.5
>>> m3 = 0.0
>>> s = 64.0
>>> batch_size = 2
>>> feature_length = 4
>>> num_classes = 4
>>> label = paddle.randint(low=0, high=num_classes, shape=[batch_size], dtype='int64')
>>> X = paddle.randn(
...     shape=[batch_size, feature_length],
...     dtype='float64')
>>> X_l2 = paddle.sqrt(paddle.sum(paddle.square(X), axis=1, keepdim=True))
>>> X = paddle.divide(X, X_l2)
>>> W = paddle.randn(
...     shape=[feature_length, num_classes],
...     dtype='float64')
>>> W_l2 = paddle.sqrt(paddle.sum(paddle.square(W), axis=0, keepdim=True))
>>> W = paddle.divide(W, W_l2)
>>> logits = paddle.matmul(X, W)
>>> loss, softmax = paddle.nn.functional.margin_cross_entropy(
...     logits, label, margin1=m1, margin2=m2, margin3=m3, scale=s, return_softmax=True, reduction=None)
>>> print(logits)
Tensor(shape=[2, 4], dtype=float64, place=Place(gpu:0), stop_gradient=True,
       [[-0.59561850,  0.32797505,  0.80279214,  0.00144975],
        [-0.16265212,  0.84155098,  0.62008629,  0.79126072]])
>>> print(label)
Tensor(shape=[2], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       [1, 0])
>>> print(loss)
Tensor(shape=[2, 1], dtype=float64, place=Place(gpu:0), stop_gradient=True,
       [[61.94391901],
        [93.30853839]])
>>> print(softmax)
Tensor(shape=[2, 4], dtype=float64, place=Place(gpu:0), stop_gradient=True,
       [[0.00000000, 0.00000000, 1.        , 0.00000000],
        [0.00000000, 0.96152676, 0.00000067, 0.03847257]])
>>> # Multi GPU, test_margin_cross_entropy.py
>>> import paddle
>>> import paddle.distributed as dist
>>> paddle.seed(2023)
>>> strategy = dist.fleet.DistributedStrategy()
>>> dist.fleet.init(is_collective=True, strategy=strategy)
>>> rank_id = dist.get_rank()
>>> m1 = 1.0
>>> m2 = 0.5
>>> m3 = 0.0
>>> s = 64.0
>>> batch_size = 2
>>> feature_length = 4
>>> num_class_per_card = [4, 8]
>>> num_classes = paddle.sum(paddle.to_tensor(num_class_per_card))
>>> label = paddle.randint(low=0, high=num_classes.item(), shape=[batch_size], dtype='int64')
>>> label_list = []
>>> dist.all_gather(label_list, label)
>>> label = paddle.concat(label_list, axis=0)
>>> X = paddle.randn(
...     shape=[batch_size, feature_length],
...     dtype='float64')
>>> X_list = []
>>> dist.all_gather(X_list, X)
>>> X = paddle.concat(X_list, axis=0)
>>> X_l2 = paddle.sqrt(paddle.sum(paddle.square(X), axis=1, keepdim=True))
>>> X = paddle.divide(X, X_l2)
>>> W = paddle.randn(
...     shape=[feature_length, num_class_per_card[rank_id]],
...     dtype='float64')
>>> W_l2 = paddle.sqrt(paddle.sum(paddle.square(W), axis=0, keepdim=True))
>>> W = paddle.divide(W, W_l2)
>>> logits = paddle.matmul(X, W)
>>> loss, softmax = paddle.nn.functional.margin_cross_entropy(
...     logits, label, margin1=m1, margin2=m2, margin3=m3, scale=s, return_softmax=True, reduction=None)
>>> print(logits)
>>> print(label)
>>> print(loss)
>>> print(softmax)
>>> # python -m paddle.distributed.launch --gpus=0,1 --log_dir log test_margin_cross_entropy.py
>>> # cat log/workerlog.0
>>> # Tensor(shape=[4, 4], dtype=float64, place=Place(gpu:0), stop_gradient=True,
>>> #        [[-0.59561850,  0.32797505,  0.80279214,  0.00144975],
>>> #         [-0.16265212,  0.84155098,  0.62008629,  0.79126072],
>>> #         [-0.59561850,  0.32797505,  0.80279214,  0.00144975],
>>> #         [-0.16265212,  0.84155098,  0.62008629,  0.79126072]])
>>> # Tensor(shape=[4], dtype=int64, place=Place(gpu:0), stop_gradient=True,
>>> #        [5, 4, 5, 4])
>>> # Tensor(shape=[4, 1], dtype=float64, place=Place(gpu:0), stop_gradient=True,
>>> #        [[104.27437027],
>>> #         [113.40243782],
>>> #         [104.27437027],
>>> #         [113.40243782]])
>>> # Tensor(shape=[4, 4], dtype=float64, place=Place(gpu:0), stop_gradient=True,
>>> #        [[0.00000000, 0.00000000, 0.01210039, 0.00000000],
>>> #         [0.00000000, 0.96152674, 0.00000067, 0.03847257],
>>> #         [0.00000000, 0.00000000, 0.01210039, 0.00000000],
>>> #         [0.00000000, 0.96152674, 0.00000067, 0.03847257]])
>>> # cat log/workerlog.1
>>> # Tensor(shape=[4, 8], dtype=float64, place=Place(gpu:1), stop_gradient=True,
>>> #        [[-0.34913275, -0.35180883, -0.53976657, -0.75234331,  0.70534995,
>>> #           0.87157838,  0.31064437,  0.19537700],
>>> #         [-0.63941012, -0.05631600, -0.02561853,  0.09363013,  0.56571130,
>>> #           0.13611246,  0.08849565,  0.39219619],
>>> #         [-0.34913275, -0.35180883, -0.53976657, -0.75234331,  0.70534995,
>>> #           0.87157838,  0.31064437,  0.19537700],
>>> #         [-0.63941012, -0.05631600, -0.02561853,  0.09363013,  0.56571130,
>>> #           0.13611246,  0.08849565,  0.39219619]])
>>> # Tensor(shape=[4], dtype=int64, place=Place(gpu:1), stop_gradient=True,
>>> #        [5, 4, 5, 4])
>>> # Tensor(shape=[4, 1], dtype=float64, place=Place(gpu:1), stop_gradient=True,
>>> #        [[104.27437027],
>>> #         [113.40243782],
>>> #         [104.27437027],
>>> #         [113.40243782]])
>>> # Tensor(shape=[4, 8], dtype=float64, place=Place(gpu:1), stop_gradient=True,
>>> #        [[0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00002368, 0.98787593,
>>> #          0.00000000, 0.00000000],
>>> #         [0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000002, 0.00000000,
>>> #          0.00000000, 0.00000000],
>>> #         [0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00002368, 0.98787593,
>>> #          0.00000000, 0.00000000],
>>> #         [0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000002, 0.00000000,
>>> #          0.00000000, 0.00000000]])