EduNLP.ModelZoo
rnn
- class EduNLP.ModelZoo.rnn.LM(rnn_type: str, vocab_size: int, embedding_dim: int, hidden_size: int, num_layers=1, bidirectional=False, embedding=None, model_params=None, **kwargs)
- Parameters
rnn_type (str) – The recurrent architecture to use. Legal types are RNN, LSTM, GRU, and ELMO.
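vocab_size (int) – Number of tokens in the vocabulary.
embedding_dim (int) – Dimension of each token embedding.
hidden_size (int) – Dimension of the recurrent hidden state.
num_layers (int) – Number of stacked recurrent layers. Default 1.
bidirectional (bool) – Whether the recurrent network is bidirectional. Default False.
embedding – Default None.
model_params – Default None.
kwargs – Additional keyword arguments.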
Examples
>>> import torch
>>> from EduNLP.ModelZoo.rnn import LM
>>> seq_idx = torch.LongTensor([[1, 2, 3], [1, 2, 0], [3, 0, 0]])
>>> seq_len = torch.LongTensor([3, 2, 1])
>>> lm = LM("RNN", 4, 3, 2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([1, 3, 2])
>>> lm = LM("RNN", 4, 3, 2, num_layers=2)
>>> output, hn = lm(seq_idx, seq_len)
>>> output.shape
torch.Size([3, 3, 2])
>>> hn.shape
torch.Size([2, 3, 2])
utils
- class EduNLP.ModelZoo.utils.Masker(mask: (int, str, ...) = 0, per=0.2, seed=None)
- Parameters
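mask (int, str) – The value written into masked positions. Default 0.
per (float) – The proportion of each sequence to mask. Default 0.2.
seed – The seed for the random number generator. Default None.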
Examples
>>> from EduNLP.ModelZoo.utils import Masker
>>> masker = Masker(per=0.5, seed=10)
>>> items = [[1, 1, 3, 4, 6], [2], [5, 9, 1, 4]]
>>> masked_seq, mask_label = masker(items)
>>> masked_seq
[[1, 1, 0, 0, 6], [2], [0, 9, 0, 4]]
>>> mask_label
[[0, 0, 1, 1, 0], [0], [1, 0, 1, 0]]
>>> items = [[1, 2, 3], [1, 1, 0], [2, 0, 0]]
>>> masked_seq, mask_label = masker(items, [3, 2, 1])
>>> masked_seq
[[1, 0, 3], [0, 1, 0], [2, 0, 0]]
>>> mask_label
[[0, 1, 0], [1, 0, 0], [0, 0, 0]]
>>> masker = Masker(mask="[MASK]", per=0.5, seed=10)
>>> items = [["a", "b", "c"], ["d", "[PAD]", "[PAD]"], ["hello", "world", "[PAD]"]]
>>> masked_seq, mask_label = masker(items, length=[3, 1, 2])
>>> masked_seq
[['a', '[MASK]', 'c'], ['d', '[PAD]', '[PAD]'], ['hello', '[MASK]', '[PAD]']]
>>> mask_label
[[0, 1, 0], [0, 0, 0], [0, 1, 0]]
- Returns
The masked sequences (masked_seq) and the matching mask labels (mask_label), in which 1 marks a masked position and 0 an untouched one.
- Return type
list
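The masking behaviour shown in the examples can be approximated in plain Python. The sketch below is an illustration only, not the Masker implementation: `mask_sequences` is a hypothetical name, it uses the standard-library `random` module, and it assumes that `int(length * per)` positions are masked per sequence, which is consistent with the counts in the examples. Because the random source differs, the masked positions will generally not match the outputs listed for Masker.

```python
import random


def mask_sequences(items, length=None, mask=0, per=0.2, seed=None):
    """Hypothetical sketch of sequence masking (not the EduNLP code).

    Masks int(n * per) positions among the first n tokens of each
    sequence, where n is the true (unpadded) length of that sequence.
    """
    rng = random.Random(seed)
    if length is None:
        # Without explicit lengths, treat every token as real content.
        length = [len(seq) for seq in items]

    masked_seq, mask_label = [], []
    for seq, n in zip(items, length):
        # Pick the positions to mask, never touching padding beyond n.
        chosen = set(rng.sample(range(n), int(n * per)))
        masked_seq.append(
            [mask if i in chosen else tok for i, tok in enumerate(seq)]
        )
        mask_label.append([1 if i in chosen else 0 for i in range(len(seq))])
    return masked_seq, mask_label
```

Passing `length` keeps padding tokens out of the candidate positions, which is why the `[PAD]` entries in the third example above are never replaced.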
- class EduNLP.ModelZoo.utils.PadSequence(length, pad_val=0, clip=True)
Pad the sequence.
Pad the sequence to the given length by appending pad_val. If clip is set, a sequence longer than length is clipped to length.
- Parameters
length (int) – The maximum length to pad/clip the sequence
pad_val (number) – The pad value. Default 0
clip (bool) – Whether to clip sequences longer than length. Default True.
- Returns
The padded (and, if clip is set, clipped) sequence.
- Return type
list of number
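A callable of this kind is typically applied to one sequence at a time. The following is a minimal sketch of the described behaviour, assuming that padding is appended at the end of the sequence; `PadSequenceSketch` is a hypothetical stand-in and is not the EduNLP class.

```python
class PadSequenceSketch:
    """Hypothetical sketch of a pad-or-clip callable (not the EduNLP class)."""

    def __init__(self, length, pad_val=0, clip=True):
        self.length = length
        self.pad_val = pad_val
        self.clip = clip

    def __call__(self, sample):
        n = len(sample)
        if n > self.length and self.clip:
            # Too long: keep only the first `length` items.
            return list(sample[: self.length])
        # Too short: append pad values. When n >= length the repeat count
        # is zero or negative, so nothing is appended.
        return list(sample) + [self.pad_val] * (self.length - n)


pad = PadSequenceSketch(4)
print(pad([1, 2]))           # short sequence gets padded
print(pad([1, 2, 3, 4, 5]))  # long sequence gets clipped
```

Holding the target length in the object means the same callable can be mapped over a whole dataset, for example `[pad(s) for s in samples]`.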
- EduNLP.ModelZoo.utils.pad_sequence(sequence: list, max_length=None, pad_val=0, clip=True)
- Parameters
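sequence (list) – The list of sequences to pad.
max_length – The target length. Default None, in which case the length of the longest sequence is used.
pad_val – The pad value. Default 0.
clip – Whether to clip sequences longer than max_length. Default True.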
- Returns
Modified list – the sequences padded to max_length. Longer sequences are clipped to max_length unless clip is False.
- Return type
list
Examples
>>> from EduNLP.ModelZoo.utils import pad_sequence
>>> seq = [[4, 3, 3], [2], [3, 3, 2]]
>>> pad_sequence(seq)
[[4, 3, 3], [2, 0, 0], [3, 3, 2]]
>>> pad_sequence(seq, pad_val=1)
[[4, 3, 3], [2, 1, 1], [3, 3, 2]]
>>> pad_sequence(seq, max_length=2)
[[4, 3], [2, 0], [3, 3]]
>>> pad_sequence(seq, max_length=2, clip=False)
[[4, 3, 3], [2, 0], [3, 3, 2]]
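The batch behaviour in these examples can be reproduced with a few lines of plain Python. This is a sketch under assumptions rather than the library code: `pad_sequence_sketch` is a hypothetical name, and defaulting max_length to the longest sequence is inferred from the first example above.

```python
def pad_sequence_sketch(sequence, max_length=None, pad_val=0, clip=True):
    """Hypothetical re-implementation of the documented behaviour."""
    if max_length is None:
        # Assumption: default to the longest sequence in the batch.
        max_length = max(len(seq) for seq in sequence)

    padded = []
    for seq in sequence:
        if clip:
            # Drop anything beyond max_length.
            seq = seq[:max_length]
        # A non-positive repeat count appends nothing, so sequences that
        # are already long enough pass through unchanged.
        padded.append(list(seq) + [pad_val] * (max_length - len(seq)))
    return padded


seq = [[4, 3, 3], [2], [3, 3, 2]]
print(pad_sequence_sketch(seq))
print(pad_sequence_sketch(seq, max_length=2, clip=False))
```

With `clip=False` the result is no longer rectangular, so it cannot be converted directly into a tensor; clipping is the safer default when the output feeds a model.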