fastNLP.modules.encoder.gpt2 module

class fastNLP.modules.encoder.gpt2.GPT2Model(config)[source]

Bases: fastNLP.modules.encoder.gpt2.GPT2PreTrainedModel

Aliases: fastNLP.modules.GPT2Model, fastNLP.modules.encoder.gpt2.GPT2Model

Outputs: Tuple comprising various elements depending on the configuration (config) and inputs:
last_hidden_state: torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)

Sequence of hidden-states at the last layer of the model.

past:

list of torch.FloatTensor (one for each layer) of shape (2, batch_size, num_heads, sequence_length, embed_size_per_head), containing pre-computed hidden states (keys and values in the attention blocks). Can be used (see the past input) to speed up sequential decoding. Token ids whose past is given to this model should not be passed as input ids, as they have already been computed.

hidden_states: (optional, returned when config.output_hidden_states=True)

list of torch.FloatTensor (one for the output of each layer + the output of the embeddings) of shape (batch_size, sequence_length, hidden_size): Hidden-states of the model at the output of each layer plus the initial embedding outputs.

attentions: (optional, returned when config.output_attentions=True)

list of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length): Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Examples:

import torch
from fastNLP.modules import GPT2Model
from fastNLP.modules.tokenizer import GPT2Tokenizer  # import path assumed; may differ by fastNLP version

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple
property dtype

torch.dtype: The dtype of the module (assuming that all the module parameters have the same dtype).
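A minimal usage sketch, assuming model is the GPT2Model loaded in the example above and the mask values are purely illustrative:

import torch

# Cast a hand-built attention mask to the module's parameter dtype so it can
# safely be combined with the model's activations.
mask = torch.tensor([[1, 1, 1, 0]])                   # 1 = keep, 0 = padding
mask = mask[:, None, None, :].to(dtype=model.dtype)   # broadcast over heads/positions
mask = (1.0 - mask) * -10000.0                        # additive mask in the model's dtype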

forward(input_ids, state=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, output_attentions=True)[source]
Parameters
  • input_ids (torch.LongTensor) – batch_size x max_len or batch_size x beam_size x 1

  • state (GPT2State) – the previous state

  • attention_mask (torch.ByteTensor) – batch_size x (pre_len+past_len); the same size as the concatenation of input_ids and state. Positions with value 0 are padding.

  • token_type_ids (torch.LongTensor) – batch_size x max_len.

  • position_ids (torch.LongTensor) – the positions corresponding to input_ids

  • head_mask

  • output_attentions (bool) – whether to output the attention states

Returns

A tuple of outputs as described under Outputs above (see the sketch below).
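A minimal sketch of a forward call on a padded batch, assuming model is loaded as in the example above; the token ids below are hypothetical placeholders:

import torch

# Two sequences of different length; mask positions set to 0 mark padding.
input_ids = torch.tensor([[50256, 318, 257, 1332],
                          [50256, 318,   0,    0]])                # batch_size x max_len, placeholder ids
attention_mask = torch.tensor([[1, 1, 1, 1],
                               [1, 1, 0, 0]], dtype=torch.uint8)   # torch.ByteTensor, 0 = padding
outputs = model(input_ids, attention_mask=attention_mask, output_attentions=True)
last_hidden_state = outputs[0]   # (batch_size, max_len, hidden_size)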

training: bool