margin_cross_entropy
paddle.nn.functional.margin_cross_entropy(logits, label, margin1=1.0, margin2=0.5, margin3=0.0, scale=64.0, group=None, return_softmax=False, reduction='mean') [source]

Computes the margin cross entropy loss

\[L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\,(\cos(m_{1}\theta_{y_i} + m_{2}) - m_{3})}}{e^{s\,(\cos(m_{1}\theta_{y_i} + m_{2}) - m_{3})} + \sum_{j \neq y_i} e^{s\,\cos\theta_{j}}}\]

where \(\theta_{y_i}\) is the angle between the feature \(x\) and the class \(w_{i}\). For more details, please refer to Arcface loss: https://arxiv.org/abs/1801.07698 .
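As an illustration only (not part of the original page), the short Python sketch below shows how the margins m1, m2, m3 and the scale s act on the cosine of the target-class angle before the usual softmax cross entropy is applied; cos_theta is a made-up similarity value:

import numpy as np

m1, m2, m3, s = 1.0, 0.5, 0.0, 64.0
cos_theta = 0.8                                # made-up cos(theta_{y_i}) between x and w_{y_i}
theta = np.arccos(cos_theta)                   # recover the angle theta_{y_i}
target_logit = np.cos(m1 * theta + m2) - m3    # margin-adjusted cosine for the target class
print(s * target_logit)                        # scaled logit that enters the softmax cross entropy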
Note:
This API supports both single-GPU and multi-GPU (model parallel) training. When model parallelism is used, logits.shape[-1] may differ on each card; see the multi-GPU example below.
Parameters
- logits (Tensor) - 2-D Tensor of shape [N, local_num_classes]. logits is the matrix product of the normalized X and the normalized W; the data type is float16, float32 or float64. When model parallelism is used, logits == shard_logits.
- label (Tensor) - Label tensor of shape [N] or [N, 1].
- margin1 (float, optional) - m1 in the formula. Default: 1.0.
- margin2 (float, optional) - m2 in the formula. Default: 0.5.
- margin3 (float, optional) - m3 in the formula. Default: 0.0.
- scale (float, optional) - s in the formula. Default: 64.0.
- group (Group, optional) - Abstract description of the communication group; see paddle.distributed.collective.Group for details. Default: None.
- return_softmax (bool, optional) - Whether to also return the softmax probabilities. Default: False.
- reduction (str, optional) - How to reduce the loss. Options: 'none' | 'mean' | 'sum'. With reduction='mean' the loss is averaged, with reduction='sum' the loss is summed, and with reduction='none' the loss is returned directly. Default: 'mean'.
Returns
Tensor (loss) or a 2-tuple of Tensors (loss, softmax). If return_softmax=False, only loss is returned; otherwise (loss, softmax) is returned. When model parallelism is used, softmax == shard_softmax; otherwise softmax has the same shape as logits. If reduction is 'none' (or None), loss has shape [N, 1]; otherwise its shape is []. A short follow-up sketch after the single-GPU code example below illustrates this shape difference.
Code Examples
# required: gpu
# Single GPU
import paddle
m1 = 1.0
m2 = 0.5
m3 = 0.0
s = 64.0
batch_size = 2
feature_length = 4
num_classes = 4
label = paddle.randint(low=0, high=num_classes, shape=[batch_size], dtype='int64')
X = paddle.randn(
    shape=[batch_size, feature_length],
    dtype='float64')
X_l2 = paddle.sqrt(paddle.sum(paddle.square(X), axis=1, keepdim=True))
X = paddle.divide(X, X_l2)
W = paddle.randn(
    shape=[feature_length, num_classes],
    dtype='float64')
W_l2 = paddle.sqrt(paddle.sum(paddle.square(W), axis=0, keepdim=True))
W = paddle.divide(W, W_l2)
logits = paddle.matmul(X, W)
loss, softmax = paddle.nn.functional.margin_cross_entropy(
    logits, label, margin1=m1, margin2=m2, margin3=m3, scale=s,
    return_softmax=True, reduction=None)
print(logits)
print(label)
print(loss)
print(softmax)
#Tensor(shape=[2, 4], dtype=float64, place=CUDAPlace(0), stop_gradient=True,
# [[ 0.85204151, -0.55557678, 0.04994566, 0.71986042],
# [-0.20198586, -0.35270476, -0.55182702, 0.09749021]])
#Tensor(shape=[2], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
# [2, 3])
#Tensor(shape=[2, 1], dtype=float64, place=CUDAPlace(0), stop_gradient=True,
# [[82.37059586],
# [12.13448420]])
#Tensor(shape=[2, 4], dtype=float64, place=CUDAPlace(0), stop_gradient=True,
# [[0.99978819, 0.00000000, 0.00000000, 0.00021181],
# [0.99992995, 0.00006468, 0.00000000, 0.00000537]])
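# A minimal follow-up sketch (not part of the original example), reusing the
# logits, label and m1/m2/m3/s defined above: with reduction='mean' the loss is
# averaged into a single value with shape [] instead of [N, 1].
loss_mean = paddle.nn.functional.margin_cross_entropy(
    logits, label, margin1=m1, margin2=m2, margin3=m3, scale=s,
    return_softmax=False, reduction='mean')
print(loss_mean.shape)
# expected: [] (a 0-D tensor; older Paddle versions may print [1])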
# required: distributed
# Multi GPU, test_margin_cross_entropy.py
import paddle
import paddle.distributed as dist
strategy = dist.fleet.DistributedStrategy()
dist.fleet.init(is_collective=True, strategy=strategy)
rank_id = dist.get_rank()
m1 = 1.0
m2 = 0.5
m3 = 0.0
s = 64.0
batch_size = 2
feature_length = 4
num_class_per_card = [4, 8]
num_classes = paddle.sum(paddle.to_tensor(num_class_per_card))
label = paddle.randint(low=0, high=num_classes.item(), shape=[batch_size], dtype='int64')
label_list = []
dist.all_gather(label_list, label)
label = paddle.concat(label_list, axis=0)
X = paddle.randn(
    shape=[batch_size, feature_length],
    dtype='float64')
X_list = []
dist.all_gather(X_list, X)
X = paddle.concat(X_list, axis=0)
X_l2 = paddle.sqrt(paddle.sum(paddle.square(X), axis=1, keepdim=True))
X = paddle.divide(X, X_l2)
W = paddle.randn(
    shape=[feature_length, num_class_per_card[rank_id]],
    dtype='float64')
W_l2 = paddle.sqrt(paddle.sum(paddle.square(W), axis=0, keepdim=True))
W = paddle.divide(W, W_l2)
logits = paddle.matmul(X, W)
loss, softmax = paddle.nn.functional.margin_cross_entropy(
    logits, label, margin1=m1, margin2=m2, margin3=m3, scale=s,
    return_softmax=True, reduction=None)
print(logits)
print(label)
print(loss)
print(softmax)
# python -m paddle.distributed.launch --gpus=0,1 test_margin_cross_entropy.py
## for rank0 input
#Tensor(shape=[4, 4], dtype=float64, place=CUDAPlace(0), stop_gradient=True,
# [[ 0.32888934, 0.02408748, -0.02763289, 0.18173063],
# [-0.52893978, -0.10623845, -0.21596515, -0.06432517],
# [-0.00536345, -0.03924667, 0.66735314, -0.28640926],
# [-0.09907366, -0.48534973, -0.10365338, -0.39472322]])
#Tensor(shape=[4], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
# [11, 1 , 10, 11])
## for rank1 input
#Tensor(shape=[4, 8], dtype=float64, place=CUDAPlace(1), stop_gradient=True,
# [[ 0.68654754, 0.28137170, 0.69694954, -0.60923933, -0.57077653, 0.54576703, -0.38709028, 0.56028204],
# [-0.80360371, -0.03042448, -0.45107338, 0.49559349, 0.69998950, -0.45411693, 0.61927630, -0.82808600],
# [ 0.11457570, -0.34785879, -0.68819499, -0.26189226, -0.48241491, -0.67685711, 0.06510185, 0.49660849],
# [ 0.31604851, 0.52087884, 0.53124749, -0.86176582, -0.43426329, 0.34786144, -0.10850784, 0.51566383]])
#Tensor(shape=[4], dtype=int64, place=CUDAPlace(1), stop_gradient=True,
# [11, 1 , 10, 11])
## for rank0 output
#Tensor(shape=[4, 1], dtype=float64, place=CUDAPlace(0), stop_gradient=True,
# [[38.96608230],
# [81.28152394],
# [69.67229865],
# [31.74197251]])
#Tensor(shape=[4, 4], dtype=float64, place=CUDAPlace(0), stop_gradient=True,
# [[0.00000000, 0.00000000, 0.00000000, 0.00000000],
# [0.00000000, 0.00000000, 0.00000000, 0.00000000],
# [0.00000000, 0.00000000, 0.99998205, 0.00000000],
# [0.00000000, 0.00000000, 0.00000000, 0.00000000]])
## for rank1 output
#Tensor(shape=[4, 1], dtype=float64, place=CUDAPlace(1), stop_gradient=True,
# [[38.96608230],
# [81.28152394],
# [69.67229865],
# [31.74197251]])
#Tensor(shape=[4, 8], dtype=float64, place=CUDAPlace(1), stop_gradient=True,
# [[0.33943993, 0.00000000, 0.66051859, 0.00000000, 0.00000000, 0.00004148, 0.00000000, 0.00000000],
# [0.00000000, 0.00000000, 0.00000000, 0.00000207, 0.99432097, 0.00000000, 0.00567696, 0.00000000],
# [0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00001795],
# [0.00000069, 0.33993085, 0.66006319, 0.00000000, 0.00000000, 0.00000528, 0.00000000, 0.00000000]])