dropout
paddle.nn.functional.dropout(x, p=0.5, axis=None, training=True, mode='upscale_in_train', name=None) [source]
Dropout is a regularization technique for reducing overfitting by preventing neuron co-adaptation during training. The dropout operator randomly sets the outputs of some units to zero, while upscaling the others according to the given dropout probability.
Parameters
x (Tensor) – The input tensor. The data type is float16, float32 or float64.
p (float|int, optional) – Probability of setting units to zero. Default: 0.5.
axis (int|list|tuple, optional) – The axis (or axes) along which dropout is performed. Default: None.
training (bool, optional) – A flag indicating whether it is in the training phase or not. Default: True.
mode (str, optional) – ['upscale_in_train' (default) | 'downscale_in_infer'].
upscale_in_train (default): upscale the output at training time
train: \(out = input \times \frac{mask}{(1.0 - dropout\_prob)}\)
inference: \(out = input\)
downscale_in_infer: downscale the output at inference time
train: \(out = input \times mask\)
inference: \(out = input \times (1.0 - dropout\_prob)\)
A numerical check of the two modes is sketched below the Returns section.
name (str, optional) – Name for the operation. Default: None. For more information, please refer to Name.
Returns
A Tensor representing the dropout, which has the same shape and data type as x.
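The two modes differ only in where the \(1/(1.0 - dropout\_prob)\) factor is applied; in both, the expected activation at inference matches the expected activation during training. A minimal sketch illustrating this (plain NumPy, not part of this API; the constant input is chosen only for clarity):

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.5
    x = np.ones((1000, 1000), dtype=np.float32)    # constant input for clarity
    mask = rng.binomial(1, 1.0 - p, size=x.shape)  # keep each unit with prob 1-p

    # mode='upscale_in_train': scale during training, identity at inference
    train_up, infer_up = x * mask / (1.0 - p), x
    # mode='downscale_in_infer': identity during training, scale at inference
    train_down, infer_down = x * mask, x * (1.0 - p)

    print(train_up.mean(), infer_up.mean())      # both ~1.0
    print(train_down.mean(), infer_down.mean())  # both ~0.5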
Examples
We use ``p=0.5`` in the following description for simplicity. When ``axis=None``, this is the commonly used dropout, which drops each element of x independently at random.

Let's see a simple case when x is a 2d tensor with shape 2*3:

    [[1 2 3]
     [4 5 6]]

We generate a mask with the same shape as x, which is 2*3. The values of the mask are sampled from a Bernoulli distribution. For example, we may get such a mask:

    [[0 1 0]
     [1 0 1]]

So the output is obtained from the elementwise multiplication of x and the mask:

    [[0 2 0]
     [4 0 6]]

Using the default setting, i.e. ``mode='upscale_in_train'``, in the training phase the final upscaled output is:

    [[0 4 0 ]
     [8 0 12]]

and in the test phase the output is the same as the input:

    [[1 2 3]
     [4 5 6]]

We can also set ``mode='downscale_in_infer'``; then in the training phase the final output is:

    [[0 2 0]
     [4 0 6]]

and in the test phase the downscaled output is:

    [[0.5 1.  1.5]
     [2.  2.5 3. ]]
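The walkthrough above can be reproduced by hand. This is a minimal sketch in which the mask is fixed to the values assumed in the example (a real call to dropout draws it randomly):

    import paddle

    x = paddle.to_tensor([[1., 2., 3.], [4., 5., 6.]])
    mask = paddle.to_tensor([[0., 1., 0.], [1., 0., 1.]])  # the mask assumed above
    p = 0.5

    print(x * mask / (1.0 - p))  # upscale_in_train, training: [[0., 4., 0.], [8., 0., 12.]]
    print(x)                     # upscale_in_train, inference: input unchanged
    print(x * mask)              # downscale_in_infer, training: [[0., 2., 0.], [4., 0., 6.]]
    print(x * (1.0 - p))         # downscale_in_infer, inference: [[0.5, 1., 1.5], [2., 2.5, 3.]]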
When ``axis!=None``, this is useful for dropping whole channels from an image or sequence.

Let's see the simple case when x is a 2d tensor with shape 2*3 again:

    [[1 2 3]
     [4 5 6]]

(1) If ``axis=0``, the dropout is performed only along axis 0. We generate a mask with shape 2*1; only along axis 0 are the values randomly selected. For example, we may get such a mask:

    [[1]
     [0]]

The output is obtained from the elementwise multiplication of x and the mask. In doing so, the mask is broadcast from 2*1 to 2*3:

    [[1 1 1]
     [0 0 0]]

and the result after elementwise multiplication is:

    [[1 2 3]
     [0 0 0]]

Then we can upscale or downscale it according to the setting of the other arguments.

(2) If ``axis=1``, the dropout is performed only along axis 1. We generate a mask with shape 1*3; only along axis 1 are the values randomly selected. For example, we may get such a mask:

    [[1 0 1]]

For the elementwise multiplication the mask is broadcast from 1*3 to 2*3:

    [[1 0 1]
     [1 0 1]]

and the result after elementwise multiplication is:

    [[1 0 3]
     [4 0 6]]

(3) What about ``axis=[0, 1]``? This means the dropout is performed over all axes of x, which is the same as the default setting ``axis=None``.

(4) You may note that, logically, ``axis=None`` could instead mean the dropout is performed along no axis of x: we would generate a mask with shape 1*1, and the whole input would be randomly kept or dropped together. For example, we may get such a mask:

    [[0]]

For the elementwise multiplication the mask would be broadcast from 1*1 to 2*3:

    [[0 0 0]
     [0 0 0]]

and the result after elementwise multiplication would be:

    [[0 0 0]
     [0 0 0]]

This is not what we want, because all elements may be set to zero; hence ``axis=None`` is defined as the elementwise case above instead.
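The broadcasting in (1) and (2) can likewise be sketched by hand with fixed masks (the masks below are the ones assumed in the walkthrough, not randomly drawn; upscaling/downscaling is omitted):

    import paddle

    x = paddle.to_tensor([[1., 2., 3.], [4., 5., 6.]])

    # axis=0: mask of shape 2*1, broadcast across columns
    mask0 = paddle.to_tensor([[1.], [0.]])
    print(x * mask0)  # [[1., 2., 3.], [0., 0., 0.]]

    # axis=1: mask of shape 1*3, broadcast across rows
    mask1 = paddle.to_tensor([[1., 0., 1.]])
    print(x * mask1)  # [[1., 0., 3.], [4., 0., 6.]]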
When x is a 4d tensor with shape NCHW, where N is the batch size, C is the number of channels, and H and W are the height and width of the feature map, we can set ``axis=[0,1]`` so that dropout is performed per (N, C) channel while H and W are tied together, i.e. ``paddle.nn.functional.dropout(x, p, axis=[0,1])``. Please refer to paddle.nn.functional.dropout2d for more details. Similarly, when x is a 5d tensor with shape NCDHW, where D is the depth of the feature map, we can set ``axis=[0,1]`` to perform dropout3d. Please refer to paddle.nn.functional.dropout3d for more details.
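To see the 4d behavior concretely, here is a small sketch (shapes chosen only for illustration) showing that ``axis=[0, 1]`` keeps or drops entire H*W feature maps, matching the dropout2d semantics:

    import paddle

    paddle.seed(2023)
    x = paddle.ones([2, 3, 4, 4])  # NCHW: batch 2, 3 channels, 4x4 feature maps

    y = paddle.nn.functional.dropout(x, p=0.5, axis=[0, 1])
    # Each (n, c) feature map is uniformly all zeros or all 2.0
    # (kept values are upscaled by 1/(1-p) in the default mode).
    print(y[0, 0])
    print(y[0, 1])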
>>> import paddle
>>> paddle.seed(2023)
>>> x = paddle.to_tensor([[1,2,3], [4,5,6]]).astype(paddle.float32)
>>> y_train = paddle.nn.functional.dropout(x, 0.5)
>>> y_test = paddle.nn.functional.dropout(x, 0.5, training=False)
>>> y_0 = paddle.nn.functional.dropout(x, axis=0)
>>> y_1 = paddle.nn.functional.dropout(x, axis=1)
>>> y_01 = paddle.nn.functional.dropout(x, axis=[0,1])
>>> print(x)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[1., 2., 3.],
        [4., 5., 6.]])
>>> print(y_train)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[2., 4., 0.],
        [8., 0., 0.]])
>>> print(y_test)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[1., 2., 3.],
        [4., 5., 6.]])
>>> print(y_0)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[2. , 4. , 6. ],
        [8. , 10., 12.]])
>>> print(y_1)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[2. , 4. , 6. ],
        [8. , 10., 12.]])
>>> print(y_01)
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[0., 0., 6.],
        [0., 0., 0.]])