Embedding
- class paddle.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, sparse=False, weight_attr=None, name=None) [source]
Embedding Layer, used to construct a callable object of the Embedding class. For specific usage, refer to the code examples. It implements the function of the Embedding Layer: it looks up the embedding vectors of the ids provided by x. It automatically constructs a 2D embedding matrix based on the input num_embeddings and embedding_dim. The shape of the output Tensor is generated by appending an embedding_dim dimension to the last dimension of the input Tensor shape.
Note
The id in x must satisfy \(0 \leq id < num\_embeddings\), otherwise the program will throw an exception and exit.

Case 1:

    x is a Tensor.
    padding_idx = -1
    x.data = [[1, 3], [2, 4], [4, 127]]
    x.shape = [3, 2]
    Given size = [128, 16]

    output is a Tensor:
        out.shape = [3, 2, 16]
        out.data = [[[0.129435295, 0.244512452, ..., 0.436322452],
                     [0.345421456, 0.524563927, ..., 0.144534654]],
                    [[0.345249859, 0.124939536, ..., 0.194353745],
                     [0.945345345, 0.435394634, ..., 0.435345365]],
                    [[0.945345345, 0.435394634, ..., 0.435345365],
                     [0.0,         0.0,         ..., 0.0        ]]]  # padding data

    The input padding_idx is less than 0, so it is automatically converted to
    padding_idx = -1 + 128 = 127. All-zero data is output whenever an id equals 127.
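A minimal runnable sketch of this padding behaviour (the small vocabulary size and the ids below are illustrative, not part of the case above):

>>> import paddle
>>> # padding_idx=-1 is converted to num_embeddings + padding_idx = 4 - 1 = 3
>>> emb = paddle.nn.Embedding(4, 3, padding_idx=-1)
>>> x = paddle.to_tensor([[1, 3]], dtype="int64")
>>> out = emb(x)            # out.shape == [1, 2, 3]
>>> # out[0, 1] is all zeros because id 3 is the padding index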
- Parameters
num_embeddings (int) – The size of the dictionary of embeddings (the vocabulary size).
embedding_dim (int) – The size of each embedding vector.
padding_idx (int|long|None, optional) – padding_idx needs to be in the interval [-num_embeddings, num_embeddings). If \(padding\_idx < 0\), the \(padding\_idx\) will automatically be converted to \(num\_embeddings + padding\_idx\). All-zero padding data is output whenever lookup encounters \(padding\_idx\) in an id, and the padding data is not updated during training. If set to None, it has no effect on the output. Default: None.
sparse (bool, optional) – The flag indicating whether to use sparse update. This parameter only affects the performance of the backward gradient update. It is recommended to set it to True because sparse update is faster. However, some optimizers do not support sparse update, such as paddle.optimizer.Adadelta, paddle.optimizer.Adamax and paddle.optimizer.Lamb; in these cases sparse must be False. Default: False.
weight_attr (ParamAttr, optional) – To specify the weight parameter property. Default: None, which means the default weight parameter property is used. See ParamAttr for usage details. In addition, user-defined or pre-trained word vectors can be loaded with the weight_attr parameter. The local word vectors need to be converted to numpy format, and their shape should be consistent with [num_embeddings, embedding_dim]. The Assign initializer is then used to load the custom or pre-trained word vectors, as shown in the sketch after this list.
name (str, optional) – For detailed information, please refer to Name. Usually name does not need to be set and is None by default.
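A minimal sketch of loading pre-trained vectors through weight_attr, assuming the local word vectors are already available as a numpy array of shape [num_embeddings, embedding_dim] (the array below is a stand-in):

>>> import numpy as np
>>> import paddle
>>> pretrained = np.arange(12, dtype="float32").reshape(4, 3)  # stand-in for real vectors
>>> attr = paddle.ParamAttr(
...     initializer=paddle.nn.initializer.Assign(pretrained),
...     trainable=False)  # set trainable=False to freeze the loaded vectors
>>> embedding = paddle.nn.Embedding(4, 3, weight_attr=attr)
>>> out = embedding(paddle.to_tensor([[0], [2]], dtype="int64"))  # shape [2, 1, 3]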
- Attribute
weight (Parameter) – the learnable weights of this layer.
- Returns
None
Examples
>>> import paddle
>>> x = paddle.to_tensor([[0], [1], [3]], dtype="int64", stop_gradient=False)
>>> embedding = paddle.nn.Embedding(4, 3, sparse=True)

>>> w0 = paddle.to_tensor([[0., 0., 0.],
...                        [1., 1., 1.],
...                        [2., 2., 2.],
...                        [3., 3., 3.]], dtype="float32")
>>> embedding.weight.set_value(w0)
>>> print(embedding.weight)
Parameter containing:
Tensor(shape=[4, 3], dtype=float32, place=Place(cpu), stop_gradient=False,
       [[0., 0., 0.],
        [1., 1., 1.],
        [2., 2., 2.],
        [3., 3., 3.]])

>>> adam = paddle.optimizer.Adam(parameters=[embedding.weight], learning_rate=0.01)
>>> adam.clear_grad()

>>> out = embedding(x)
>>> print(out)
Tensor(shape=[3, 1, 3], dtype=float32, place=Place(cpu), stop_gradient=False,
       [[[0., 0., 0.]],
        [[1., 1., 1.]],
        [[3., 3., 3.]]])

>>> out.backward()
>>> adam.step()
forward(x)
Defines the computation performed at every call. Should be overridden by all subclasses.
- Parameters
*inputs (tuple) – unpacked tuple arguments
**kwargs (dict) – unpacked dict arguments
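A minimal sketch of the calling convention, using a hypothetical subclass; calling the layer object dispatches to its forward method:

>>> import paddle
>>> class BagOfWords(paddle.nn.Layer):        # hypothetical example layer
...     def __init__(self):
...         super().__init__()
...         self.emb = paddle.nn.Embedding(10, 4)
...     def forward(self, x):                 # the overridden computation
...         return self.emb(x).sum(axis=-2)   # sum the word vectors in each sample
>>> layer = BagOfWords()
>>> out = layer(paddle.to_tensor([[1, 2, 5]], dtype="int64"))  # invokes forward
>>> # out.shape == [1, 4]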
extra_repr()
Extra representation of this layer; you can provide a custom implementation in your own layer.
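A minimal sketch of a custom extra_repr, using a hypothetical wrapper layer; the returned string is appended to the layer's printed representation:

>>> import paddle
>>> class TaggedEmbedding(paddle.nn.Layer):   # hypothetical example layer
...     def __init__(self, num_embeddings, embedding_dim):
...         super().__init__()
...         self.num_embeddings = num_embeddings
...         self.emb = paddle.nn.Embedding(num_embeddings, embedding_dim)
...     def forward(self, x):
...         return self.emb(x)
...     def extra_repr(self):                 # shown when the layer is printed
...         return f"num_embeddings={self.num_embeddings}"
>>> print(TaggedEmbedding(4, 3))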