Modules¶
ModuleBase¶
-
class texar.tf.ModuleBase(hparams=None)[source]¶
Base class inherited by modules that create Variables and are configurable through hyperparameters.
A Texar module inheriting ModuleBase has the following key features:
- Convenient variable re-use: A module instance creates its own sets of variables, and automatically re-uses them on subsequent calls. Hence TF variable/name scopes are transparent to users. For example:
encoder = UnidirectionalRNNEncoder(hparams)  # create instance
output_1 = encoder(inputs_1)  # variables are created
output_2 = encoder(inputs_2)  # variables are re-used
print(encoder.trainable_variables)  # access trainable variables
# [ ... ]
- Configurable through hyperparameters: Each module defines allowed hyperparameters and default values. Hyperparameters not specified by users take the default values.
- Callable: As in the above example, a module instance is "called" with input tensors and returns output tensors. Every call of a module adds ops to the Graph to perform the module's logic.
Parameters: hparams (dict, optional) – Hyperparameters of the module. See default_hparams() for the structure and default values.
-
_build(*args, **kwargs)[source]¶
Subclasses must implement this method to build the module logic.
Parameters: - *args – Arguments.
- **kwargs – Keyword arguments.
Returns: Output Tensor(s).
-
static default_hparams()[source]¶
Returns a dict of hyperparameters of the module with default values. Used to replace the missing values of input hparams during module construction.
{ "name": "module" }
-
variable_scope¶
The variable scope of the module.
-
name¶
The uniquified name of the module.
-
trainable_variables¶
The list of trainable variables of the module.
Embedders¶
WordEmbedder¶
-
class texar.tf.modules.WordEmbedder(init_value=None, vocab_size=None, hparams=None)[source]¶
Simple word embedder that maps indexes into embeddings. The indexes can be soft (e.g., distributions over the vocabulary).
Either init_value or vocab_size is required. If both are given, there must be init_value.shape[0] == vocab_size.
Parameters: - init_value (optional) – A Tensor or numpy array that contains the initial value of embeddings. It is typically of shape [vocab_size] + embedding-dim. Embedding can have dimensionality > 1. If None, embedding is initialized as specified in hparams["initializer"]. Otherwise, the "initializer" and "dim" hyperparameters in hparams are ignored.
- vocab_size (int, optional) – The vocabulary size. Required if init_value is not given.
- hparams (dict, optional) – Embedder hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
See _build() for the inputs and outputs of the embedder.
Example
ids = tf.random_uniform(shape=[32, 10], maxval=10, dtype=tf.int64)
soft_ids = tf.random_uniform(shape=[32, 10, 100])

embedder = WordEmbedder(vocab_size=100, hparams={'dim': 256})

ids_emb = embedder(ids=ids)  # shape: [32, 10, 256]
soft_ids_emb = embedder(soft_ids=soft_ids)  # shape: [32, 10, 256]
# Use with Texar data module
hparams = {
    'dataset': {
        'embedding_init': {'file': 'word2vec.txt'}
        ...
    },
}
data = MonoTextData(data_params)
iterator = DataIterator(data)
batch = iterator.get_next()

# Use data vocab size
embedder_1 = WordEmbedder(vocab_size=data.vocab.size)
emb_1 = embedder_1(batch['text_ids'])

# Use pre-trained embedding
embedder_2 = WordEmbedder(init_value=data.embedding_init_value)
emb_2 = embedder_2(batch['text_ids'])
-
_build(ids=None, soft_ids=None, mode=None, **kwargs)[source]¶
Embeds (soft) ids.
Either ids or soft_ids must be given, but not both at the same time.
Parameters: - ids (optional) – An integer tensor containing the ids to embed.
- soft_ids (optional) – A tensor of weights (probabilities) used to mix the embedding vectors.
- mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, dropout is controlled by texar.tf.global_mode().
- kwargs – Additional keyword arguments for tf.nn.embedding_lookup besides params and ids.
Returns: If ids is given, returns a Tensor of shape shape(ids) + embedding-dim. For example, if shape(ids) = [batch_size, max_time] and shape(embedding) = [vocab_size, emb_dim], then the returned tensor has shape [batch_size, max_time, emb_dim].
If soft_ids is given, returns a Tensor of shape shape(soft_ids)[:-1] + embedding-dim. For example, if shape(soft_ids) = [batch_size, max_time, vocab_size] and shape(embedding) = [vocab_size, emb_dim], then the returned tensor has shape [batch_size, max_time, emb_dim].
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
{ "dim": 100, "dropout_rate": 0, "dropout_strategy": 'element', "trainable": True, "initializer": { "type": "random_uniform_initializer", "kwargs": { "minval": -0.1, "maxval": 0.1, "seed": None } }, "regularizer": { "type": "L1L2", "kwargs": { "l1": 0., "l2": 0. } }, "name": "word_embedder", }
Here:
- “dim”: int or list
  Embedding dimension. Can be a list of integers to yield embeddings with dimensionality > 1. Ignored if init_value is given to the embedder constructor.
- “dropout_rate”: float
  The dropout rate between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the embedding. Set to 0 to disable dropout.
- “dropout_strategy”: str
  The dropout strategy. Can be one of the following:
  - "element": The regular strategy that drops individual elements of embedding vectors.
  - "item": Drops individual items (e.g., words) entirely. E.g., for the word sequence "the simpler the better", the strategy can yield "_ simpler the better", where the first "the" is dropped.
  - "item_type": Drops item types (e.g., word types). E.g., for the above sequence, the strategy can yield "_ simpler _ better", where the word type "the" is dropped. This strategy will never yield "_ simpler the better" as the "item" strategy can.
- “trainable”: bool
  Whether the embedding is trainable.
- “initializer”: dict or None
  Hyperparameters of the initializer for embedding values. See get_initializer() for details. Ignored if init_value is given to the embedder constructor.
- “regularizer”: dict
  Hyperparameters of the regularizer for embedding values. See get_regularizer() for details.
- “name”: str
  Name of the embedding variable.
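For instance, a minimal configuration sketch that drops entire words during training (the sizes and rates here are illustrative, not defaults):
embedder = WordEmbedder(
    vocab_size=10000,  # illustrative vocabulary size
    hparams={'dim': 256, 'dropout_rate': 0.2, 'dropout_strategy': 'item'})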
-
embedding¶
The embedding tensor, of shape [vocab_size] + dim.
-
dim¶
The embedding dimension.
-
vocab_size¶
The vocabulary size.
PositionEmbedder¶
-
class texar.tf.modules.PositionEmbedder(init_value=None, position_size=None, hparams=None)[source]¶
Simple position embedder that maps position indexes into embeddings via lookup.
Either init_value or position_size is required. If both are given, there must be init_value.shape[0] == position_size.
Parameters: - init_value (optional) – A Tensor or numpy array that contains the initial value of embeddings. It is typically of shape [position_size, embedding dim]. If None, embedding is initialized as specified in hparams["initializer"]. Otherwise, the "initializer" and "dim" hyperparameters in hparams are ignored.
- position_size (int, optional) – The number of possible positions, e.g., the maximum sequence length. Required if init_value is not given.
- hparams (dict, optional) – Embedder hyperparameters. If not specified, the default hyperparameter setting is used. See default_hparams for the structure and default values.
-
_build(positions=None, sequence_length=None, mode=None, **kwargs)[source]¶
Embeds the positions.
Either positions or sequence_length is required:
- If both are given, sequence_length is used to mask out embeddings of those time steps beyond the respective sequence lengths.
- If only sequence_length is given, then positions from 0 to sequence_length-1 are embedded.
Parameters: - positions (optional) – An integer tensor containing the position ids to embed.
- sequence_length (optional) – An integer tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero-valued embeddings.
- mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, dropout will be controlled by texar.tf.global_mode().
- kwargs – Additional keyword arguments for tf.nn.embedding_lookup besides params and ids.
Returns: A Tensor of shape shape(inputs) + embedding dimension.
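For example, a minimal usage sketch (max_seq_len and batch are assumed to come from the surrounding data pipeline, as in the WordEmbedder example above):
position_embedder = PositionEmbedder(position_size=max_seq_len, hparams={'dim': 256})
# Embeds positions 0 .. max_time-1, with steps beyond each length zeroed out
pos_emb = position_embedder(sequence_length=batch['length'])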
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
{ "dim": 100, "initializer": { "type": "random_uniform_initializer", "kwargs": { "minval": -0.1, "maxval": 0.1, "seed": None } }, "regularizer": { "type": "L1L2", "kwargs": { "l1": 0., "l2": 0. } }, "dropout_rate": 0, "trainable": True, "name": "position_embedder" }
The hyperparameters have the same meaning as those in texar.tf.modules.WordEmbedder.default_hparams().
-
embedding¶
The embedding tensor.
-
dim¶
The embedding dimension.
-
position_size¶
The position size, i.e., the maximum number of positions.
SinusoidsPositionEmbedder¶
-
class texar.tf.modules.SinusoidsPositionEmbedder(position_size, hparams=None)[source]¶
Sinusoid position embedder that maps position indexes into embeddings via sinusoid calculation. This module does not have trainable parameters. Used in, e.g., Transformer models (Vaswani et al.) "Attention Is All You Need".
Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase. This allows attention to learn to use absolute and relative positions.
Timing signals should be added to some precursors of both the query and the memory inputs to attention. The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x), and cos(x). In particular, we use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to dim / 2. For each timescale, we generate the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All of these sinusoids are concatenated in the dim dimension.
Parameters: position_size (int) – The number of possible positions, e.g., the maximum sequence length. Set position_size=None and hparams['cache_embeddings']=False to enable arbitrarily large or negative position indexes.
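To make the construction concrete, here is a minimal NumPy sketch of the computation described above (an illustration, not the module's actual implementation; assumes dim is even and at least 4):
import numpy as np

def sinusoid_embedding(num_positions, dim, min_timescale=1.0, max_timescale=1.0e4):
    # Geometric sequence of dim/2 inverse timescales.
    num_timescales = dim // 2
    log_increment = np.log(max_timescale / min_timescale) / (num_timescales - 1)
    inv_timescales = min_timescale * np.exp(-np.arange(num_timescales) * log_increment)
    # Outer product: positions x timescales.
    scaled_time = np.arange(num_positions)[:, None] * inv_timescales[None, :]
    # Concatenate the sin and cos signals along the dim dimension.
    return np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)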
_build
(positions=None, sequence_length=None)[source]¶ Embeds. Either
positions
orsequence_length
is required:- If both are given,
sequence_length
is used to mask out embeddings of those time steps beyond the respective sequence lengths. - If only
sequence_length
is given, then positions from 0 to sequence_length-1 are embedded.
Parameters: - positions (optional) – An integer tensor containing the position ids to embed.
- sequence_length (optional) – An integer tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero-valued embeddings.
Returns: A Tensor of shape [batch_size, max_time, dim].
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values. We use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to dim/2.
{
    'min_timescale': 1.0,
    'max_timescale': 10000.0,
    'dim': 512,
    'cache_embeddings': True,
    'name': 'sinusoid_posisiton_embedder',
}
Here:
- “cache_embeddings”: bool
If True, precompute embeddings for positions in range [0, position_size - 1]. This leads to faster lookup but requires lookup indices to be within this range.
If False, embeddings are computed on-the-fly during lookup. Set to False if your application needs to handle sequences of arbitrary length, or requires embeddings at negative positions.
EmbedderBase¶
-
class texar.tf.modules.EmbedderBase(num_embeds=None, hparams=None)[source]¶
The base embedder class that all embedder classes inherit.
Parameters: - num_embeds (int, optional) – The number of embedding elements, e.g., the vocabulary size of a word embedder.
- hparams (dict or HParams, optional) – Embedder hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
{ "name": "embedder" }
-
num_embeds¶
The number of embedding elements.
Encoders¶
UnidirectionalRNNEncoder¶
-
class texar.tf.modules.UnidirectionalRNNEncoder(cell=None, cell_dropout_mode=None, output_layer=None, hparams=None)[source]¶
One-directional RNN encoder.
Parameters: - cell (RNNCell, optional) – If not specified, a cell is created as specified in hparams["rnn_cell"].
- cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given.
- output_layer (optional) – An instance of tf.layers.Layer. Applied to the RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer"].
- hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
See _build() for the inputs and outputs of the encoder.
Example
# Use with embedder
embedder = WordEmbedder(vocab_size, hparams=emb_hparams)
encoder = UnidirectionalRNNEncoder(hparams=enc_hparams)

outputs, final_state = encoder(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'])
-
_build(inputs, sequence_length=None, initial_state=None, time_major=False, mode=None, return_cell_output=False, return_output_size=False, **kwargs)[source]¶
Encodes the inputs.
Parameters: - inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time are exchanged if time_major=True is specified.
- sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element's sequence length.
- initial_state (optional) – Initial state of the RNN.
- time_major (bool) – The shape format of the inputs and outputs Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth].
- mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls output layer dropout if the output layer is specified with hparams. If None (default), texar.tf.global_mode() is used.
- return_cell_output (bool) – Whether to return the output of the RNN cell. These are the results prior to the output layer.
- return_output_size (bool) – Whether to return the size of the output (i.e., the results after output layers).
- **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.
Returns: By default (both return_cell_output and return_output_size are False), returns a pair (outputs, final_state):
- outputs: The RNN output tensor by the output layer (if it exists) or the RNN cell (otherwise). The tensor is of shape [batch_size, max_time, output_size] if time_major is False, or [max_time, batch_size, output_size] if time_major is True. If the RNN cell output is a (nested) tuple of Tensors, then outputs will be a (nested) tuple having the same nest structure as the cell output.
- final_state: The final state of the RNN, which is a Tensor of shape [batch_size] + cell.state_size or a (nested) tuple of Tensors if cell.state_size is a (nested) tuple.
If return_cell_output is True, returns a triple (outputs, final_state, cell_outputs):
- cell_outputs: The outputs by the RNN cell prior to the output layer, having the same structure as outputs except for the output_dim.
If return_output_size is True, returns a tuple (outputs, final_state, output_size):
- output_size: A (possibly nested tuple of) int representing the size of outputs. If a single int or an int array, then outputs has shape [batch/time, time/batch] + output_size. If a (nested) tuple, then output_size has the same structure as outputs.
If both return_cell_output and return_output_size are True, returns (outputs, final_state, cell_outputs, output_size).
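For instance, a sketch of requesting all the extra return values at once (continuing the encoder example above):
outputs, final_state, cell_outputs, output_size = encoder(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'],
    return_cell_output=True,
    return_output_size=True)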
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
{ "rnn_cell": default_rnn_cell_hparams(), "output_layer": { "num_layers": 0, "layer_size": 128, "activation": "identity", "final_layer_activation": None, "other_dense_kwargs": None, "dropout_layer_ids": [], "dropout_rate": 0.5, "variational_dropout": False }, "name": "unidirectional_rnn_encoder" }
Here:
- “rnn_cell”: dict
  A dictionary of RNN cell hyperparameters. Ignored if cell is given to the encoder constructor. The default value is defined in default_rnn_cell_hparams().
- “output_layer”: dict
  Output layer hyperparameters. Ignored if output_layer is given to the encoder constructor. Includes:
  - "num_layers": int
    The number of output (dense) layers. Set to 0 to avoid any output layers applied to the cell outputs.
  - "layer_size": int or list
    The size of each of the output (dense) layers. If an int, each output layer will have the same size. If a list, the length must equal num_layers.
  - "activation": str or callable or None
    Activation function for each of the output (dense) layers except for the final layer. This can be a function, or its string name or module path. If a function name is given, the function must be from module tf.nn or tf. For example:
    "activation": "relu"  # function name
    "activation": "my_module.my_activation_fn"  # module path
    "activation": my_module.my_activation_fn  # function
    Default is None which maintains a linear activation.
  - "final_layer_activation": str or callable or None
    The activation function for the final output layer.
  - "other_dense_kwargs": dict or None
    Other keyword arguments to construct each of the output dense layers, e.g., use_bias. See Dense for the keyword arguments.
  - "dropout_layer_ids": int or list
    The indexes of layers (starting from 0) whose inputs are applied with dropout. The index = num_layers means dropout applies to the final layer output. E.g., { "num_layers": 2, "dropout_layer_ids": [0, 2] } leads to a series of layers as -dropout-layer0-layer1-dropout-. The dropout mode (training or not) is controlled by the mode argument of _build().
  - "dropout_rate": float
    The dropout rate, between 0 and 1. E.g., "dropout_rate": 0.1 would drop out 10% of elements.
  - "variational_dropout": bool
    Whether the dropout mask is the same across all time steps.
- “name”: str
  Name of the encoder.
-
cell¶
The RNN cell.
-
state_size¶
The state size of the encoder cell. Same as encoder.cell.state_size.
-
output_layer¶
The output layer.
BidirectionalRNNEncoder¶
-
class texar.tf.modules.BidirectionalRNNEncoder(cell_fw=None, cell_bw=None, cell_dropout_mode=None, output_layer_fw=None, output_layer_bw=None, hparams=None)[source]¶
Bidirectional forward-backward RNN encoder.
Parameters: - cell_fw (RNNCell, optional) – The forward RNN cell. If not given, a cell is created as specified in hparams["rnn_cell_fw"].
- cell_bw (RNNCell, optional) – The backward RNN cell. If not given, a cell is created as specified in hparams["rnn_cell_bw"].
- cell_dropout_mode (optional) – A tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cells (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if the respective cell is given.
- output_layer_fw (optional) – An instance of tf.layers.Layer. Applied to the forward RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer_fw"].
- output_layer_bw (optional) – An instance of tf.layers.Layer. Applied to the backward RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer_bw"].
- hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
See _build() for the inputs and outputs of the encoder.
Example
# Use with embedder
embedder = WordEmbedder(vocab_size, hparams=emb_hparams)
encoder = BidirectionalRNNEncoder(hparams=enc_hparams)

outputs, final_state = encoder(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'])
# outputs == (outputs_fw, outputs_bw)
# final_state == (final_state_fw, final_state_bw)
-
_build(inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, time_major=False, mode=None, return_cell_output=False, return_output_size=False, **kwargs)[source]¶
Encodes the inputs.
Parameters: - inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time may be exchanged if time_major=True is specified.
- sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element's sequence length.
- initial_state_fw (optional) – Initial state of the forward RNN.
- initial_state_bw (optional) – Initial state of the backward RNN.
- time_major (bool) – The shape format of the inputs and outputs Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth].
- mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls output layer dropout if the output layer is specified with hparams. If None (default), texar.tf.global_mode() is used.
- return_cell_output (bool) – Whether to return the output of the RNN cells. These are the results prior to the output layers.
- **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.
Returns: By default (both return_cell_output and return_output_size are False), returns a pair (outputs, final_state):
- outputs: A tuple (outputs_fw, outputs_bw) containing the forward and the backward RNN outputs, each of which is of shape [batch_size, max_time, output_dim] if time_major is False, or [max_time, batch_size, output_dim] if time_major is True. If the RNN cell output is a (nested) tuple of Tensors, then outputs_fw and outputs_bw will be a (nested) tuple having the same structure as the cell output.
- final_state: A tuple (final_state_fw, final_state_bw) containing the final states of the forward and backward RNNs, each of which is a Tensor of shape [batch_size] + cell.state_size, or a (nested) tuple of Tensors if cell.state_size is a (nested) tuple.
If return_cell_output is True, returns a triple (outputs, final_state, cell_outputs) where:
- cell_outputs: A tuple (cell_outputs_fw, cell_outputs_bw) containing the outputs by the forward and backward RNN cells prior to the output layers, having the same structure as outputs except for the output_dim.
If return_output_size is True, returns a tuple (outputs, final_state, output_size) where:
- output_size: A tuple (output_size_fw, output_size_bw) containing the sizes of outputs_fw and outputs_bw, respectively. Take *_fw for example: output_size_fw is a (possibly nested tuple of) int. If a single int or an int array, then outputs_fw has shape [batch/time, time/batch] + output_size_fw. If a (nested) tuple, then output_size_fw has the same structure as outputs_fw. The same applies to output_size_bw.
If both return_cell_output and return_output_size are True, returns (outputs, final_state, cell_outputs, output_size).
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
{ "rnn_cell_fw": default_rnn_cell_hparams(), "rnn_cell_bw": default_rnn_cell_hparams(), "rnn_cell_share_config": True, "output_layer_fw": { "num_layers": 0, "layer_size": 128, "activation": "identity", "final_layer_activation": None, "other_dense_kwargs": None, "dropout_layer_ids": [], "dropout_rate": 0.5, "variational_dropout": False }, "output_layer_bw": { # Same hyperparams and default values as "output_layer_fw" # ... }, "output_layer_share_config": True, "name": "bidirectional_rnn_encoder" }
Here:
- “rnn_cell_fw”: dict
  Hyperparameters of the forward RNN cell. Ignored if cell_fw is given to the encoder constructor. The default value is defined in default_rnn_cell_hparams().
- “rnn_cell_bw”: dict
  Hyperparameters of the backward RNN cell. Ignored if cell_bw is given to the encoder constructor, or if "rnn_cell_share_config" is True. The default value is defined in default_rnn_cell_hparams().
- “rnn_cell_share_config”: bool
  Whether to share hyperparameters of the backward cell with the forward cell. Note that the cell parameters (variables) are not shared.
- “output_layer_fw”: dict
  Hyperparameters of the forward output layer. Ignored if output_layer_fw is given to the constructor. See the "output_layer" field of default_hparams() for details.
- “output_layer_bw”: dict
  Hyperparameters of the backward output layer. Ignored if output_layer_bw is given to the constructor. Has the same structure and defaults as "output_layer_fw". Also ignored if "output_layer_share_config" is True.
- “output_layer_share_config”: bool
  Whether to share hyperparameters of the backward output layer with the forward output layer. Note that the layer parameters (variables) are not shared.
- “name”: str
  Name of the encoder.
-
cell_fw¶
The forward RNN cell.
-
cell_bw¶
The backward RNN cell.
-
state_size_fw¶
The state size of the forward encoder cell. Same as encoder.cell_fw.state_size.
-
state_size_bw¶
The state size of the backward encoder cell. Same as encoder.cell_bw.state_size.
-
output_layer_fw¶
The output layer of the forward RNN.
-
output_layer_bw¶
The output layer of the backward RNN.
HierarchicalRNNEncoder¶
-
class texar.tf.modules.HierarchicalRNNEncoder(encoder_major=None, encoder_minor=None, hparams=None)[source]¶
A hierarchical encoder that stacks basic RNN encoders into two layers. Can be used to encode long, structured sequences, e.g., paragraphs, dialog history, etc.
Parameters: - encoder_major (optional) – An instance of a subclass of RNNEncoderBase. The high-level encoder taking final states from the low-level encoder as its inputs. If not specified, an encoder is created as specified in hparams["encoder_major"].
- encoder_minor (optional) – An instance of a subclass of RNNEncoderBase. The low-level encoder. If not specified, an encoder is created as specified in hparams["encoder_minor"].
- hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
See _build() for the inputs and outputs of the encoder.
-
_build(inputs, order='btu', medium=None, sequence_length_major=None, sequence_length_minor=None, **kwargs)[source]¶
Encodes the inputs.
Parameters: - inputs – A 4-D tensor of shape [B, T, U, dim], where
  - B: batch_size
  - T: the max length of high-level sequences, e.g., the max number of utterances in dialog history.
  - U: the max length of low-level sequences, e.g., the max length of each utterance in dialog history.
  - dim: embedding dimension
  The order of the first three dimensions can be changed according to order.
- order – A 3-char string containing 'b', 't', and 'u', that specifies the order of the inputs dimensions above. The following four values are accepted:
  - 'btu': Neither encoder is time-major.
  - 'utb': Both encoders are time-major.
  - 'tbu': The major encoder is time-major.
  - 'ubt': The minor encoder is time-major.
- medium (optional) – A list of callables that subsequently process the final states of the minor encoder and obtain the inputs for the major encoder. If not specified, flatten() is used for processing the minor's final states.
- sequence_length_major (optional) – The sequence_length argument sent to the major encoder. This is a 1-D Tensor of shape [B].
- sequence_length_minor (optional) – The sequence_length argument sent to the minor encoder. It can be either a 1-D Tensor of shape [B*T], or a 2-D Tensor of shape [B, T] or [T, B] according to order.
- **kwargs – Other keyword arguments for the major and minor encoders, such as initial_state, etc. Note that sequence_length and time_major must not be included here; time_major is derived from order automatically. By default, arguments will be sent to both the major and minor encoders. To specify which encoder an argument should be sent to, add '_minor'/'_major' as its suffix. Note that initial_state_minor must have a batch dimension of size B*T. If you have an initial state of batch dimension = T, use tile_initial_state_minor() to tile it according to order.
Returns: A tuple (outputs, final_state) by the major encoder.
See the return values of the _build() method of the respective encoder class for details.
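For example, a usage sketch for encoding dialog histories (the tensor names and shapes are illustrative assumptions):
# utterance_embeds: [B, T, U, emb_dim]
encoder = HierarchicalRNNEncoder()
outputs, final_state = encoder(
    inputs=utterance_embeds,
    order='btu',
    sequence_length_major=num_utterances,     # shape [B]
    sequence_length_minor=utterance_lengths)  # shape [B, T]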
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
{ "encoder_major_type": "UnidirectionalRNNEncoder", "encoder_major_hparams": {}, "encoder_minor_type": "UnidirectionalRNNEncoder", "encoder_minor_hparams": {}, "config_share": False, "name": "hierarchical_encoder_wrapper" }
Here:
- “encoder_major_type”: str or class or instance
  The high-level encoder. Can be an RNN encoder class, its name or module path, or a class instance. Ignored if encoder_major is given to the encoder constructor.
- “encoder_major_hparams”: dict
  The hyperparameters for the high-level encoder. The high-level encoder is created with encoder_class(hparams=encoder_major_hparams). Ignored if encoder_major is given to the encoder constructor, or if "encoder_major_type" is an encoder instance.
- “encoder_minor_type”: str or class or instance
  The low-level encoder. Can be an RNN encoder class, its name or module path, or a class instance. Ignored if encoder_minor is given to the encoder constructor, or if "config_share" is True.
- “encoder_minor_hparams”: dict
  The hyperparameters for the low-level encoder. The low-level encoder is created with encoder_class(hparams=encoder_minor_hparams). Ignored if encoder_minor is given to the encoder constructor, or if "config_share" is True, or if "encoder_minor_type" is an encoder instance.
- “config_share”: bool
  Whether to use encoder_major's hyperparameters to construct encoder_minor.
- “name”: str
  Name of the encoder.
-
static tile_initial_state_minor(initial_state, order, inputs_shape)[source]¶
Tiles an initial state to be used for the minor encoder.
The batch dimension of initial_state must equal T. The state will be copied B times and used to start encoding each low-level sequence. For example, the first utterance in each dialog history in the batch will have the same initial state.
Returns: A tiled initial state with batch dimension of size B*T.
-
static flatten(x)[source]¶
Flattens a cell state by concatenating a sequence of cell states along the last dimension. If the cell states are LSTMStateTuple, only the hidden state LSTMStateTuple.h is used.
This process is used by default if medium is not provided to _build().
-
encoder_major¶
The high-level encoder.
-
encoder_minor¶
The low-level encoder.
MultiheadAttentionEncoder¶
-
class texar.tf.modules.MultiheadAttentionEncoder(hparams=None)[source]¶
Multihead Attention Encoder.
Parameters: hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
-
_build(queries, memory, memory_attention_bias, cache=None, mode=None)[source]¶
Encodes the inputs.
Parameters: - queries – A 3D tensor with shape [batch, length_query, depth_query].
- memory – A 3D tensor with shape [batch, length_key, depth_key].
- memory_attention_bias – A 3D tensor with shape [batch, length_key, num_units].
- cache – Memory cache, used only when decoding sentences from scratch at inference time.
- mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls dropout mode. If None (default), texar.tf.global_mode() is used.
Returns: A Tensor of shape [batch_size, max_time, dim] containing the encoded vectors.
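A usage sketch for self-attention follows. The bias construction is an assumption for illustration: it uses attention_bias_ignore_padding from texar.tf.utils.transformer_attentions (the helper TransformerEncoder uses internally), which may differ across versions; inputs and lengths are assumed tensors:
from texar.tf.utils.transformer_attentions import attention_bias_ignore_padding

# memory_padding: 1.0 at PAD positions, 0.0 elsewhere; shape [batch, max_time]
memory_padding = 1.0 - tf.sequence_mask(
    lengths, tf.shape(inputs)[1], dtype=tf.float32)
ignore_bias = attention_bias_ignore_padding(memory_padding)

attn_encoder = MultiheadAttentionEncoder()
# Self-attention: queries and memory are the same tensor
encoded = attn_encoder(
    queries=inputs, memory=inputs, memory_attention_bias=ignore_bias)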
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
{ "initializer": None, 'num_heads': 8, 'output_dim': 512, 'num_units': 512, 'dropout_rate': 0.1, 'use_bias': False, "name": "multihead_attention" }
Here:
- “initializer”: dict, optional
  Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
- “num_heads”: int
  Number of heads for attention calculation.
- “output_dim”: int
  Output dimension of the returned tensor.
- “num_units”: int
  Hidden dimension of the unsplit attention space. Should be divisible by num_heads.
- “dropout_rate”: float
  Dropout rate in the attention.
- “use_bias”: bool
  Whether to use bias when projecting the key, value and query.
- “name”: str
  Name of the module.
TransformerEncoder¶
-
class texar.tf.modules.TransformerEncoder(hparams=None)[source]¶
Transformer encoder that applies multi-head self-attention for encoding sequences.
This module basically stacks MultiheadAttentionEncoder, FeedForwardNetwork, and residual connections.
This module supports two types of architectures, namely, the standard Transformer encoder architecture first proposed in (Vaswani et al.) "Attention Is All You Need", and the variant first used in (Devlin et al.) BERT. See default_hparams() for the nuances between the two types of architectures.
Parameters: hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
-
_build(inputs, sequence_length, mode=None)[source]¶
Encodes the inputs.
Parameters: - inputs – A 3D Tensor of shape [batch_size, max_time, dim], containing the embedding of input sequences. Note that the embedding dimension dim must equal "dim" in hparams. The input embedding is typically an aggregation of word embedding and position embedding.
- sequence_length – A 1D Tensor of shape [batch_size]. Input tokens beyond respective sequence lengths are masked out automatically.
- mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Used to toggle dropout. If None (default), texar.tf.global_mode() is used.
Returns: A Tensor of shape [batch_size, max_time, dim] containing the encoded vectors.
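A usage sketch combining the word and position embedders documented above (vocab_size, max_seq_len, and batch are assumed to come from the surrounding data pipeline; embedding scaling is omitted):
word_embedder = WordEmbedder(vocab_size=vocab_size, hparams={'dim': 512})
pos_embedder = SinusoidsPositionEmbedder(max_seq_len, hparams={'dim': 512})

encoder = TransformerEncoder()  # default hparams: dim=512, 6 blocks
inputs = word_embedder(batch['text_ids']) + pos_embedder(sequence_length=batch['length'])
encoded = encoder(inputs=inputs, sequence_length=batch['length'])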
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
{ "num_blocks": 6, "dim": 512, 'use_bert_config': False, "embedding_dropout": 0.1, "residual_dropout": 0.1, "poswise_feedforward": default_transformer_poswise_net_hparams, 'multihead_attention': { 'name': 'multihead_attention', 'num_units': 512, 'output_dim': 512, 'num_heads': 8, 'dropout_rate': 0.1, 'output_dim': 512, 'use_bias': False, }, "initializer": None, "name": "transformer_encoder" }
Here:
- “num_blocks”: int
  Number of stacked blocks.
- “dim”: int
  Hidden dimension of the encoders.
- “use_bert_config”: bool
  If False, apply the standard Transformer encoder architecture from the original paper (Vaswani et al.) "Attention Is All You Need". If True, apply the Transformer encoder architecture used in BERT (Devlin et al.).
  The differences lie in:
  - The standard arch restricts the word embedding of the PAD token to all zero. The BERT arch does not.
  - The attention bias for padding tokens: the standard arch uses -1e8 for the negative attention mask; BERT uses -1e4 instead.
  - The residual connections between internal tensors: in BERT, a residual layer connects the tensors after layer normalization. In the standard arch, the tensors are connected before layer normalization.
- “embedding_dropout”: float
  Dropout rate of the input embedding.
- “residual_dropout”: float
  Dropout rate of the residual connections.
- “poswise_feedforward”: dict
  Hyperparameters for a feed-forward network used in residual connections. Make sure the dimension of the output tensor is equal to dim. See default_transformer_poswise_net_hparams() for details.
- “multihead_attention”: dict
  Hyperparameters for the multihead attention strategy. Make sure the "output_dim" in this module is equal to "dim". See default_hparams() of MultiheadAttentionEncoder for details.
- “initializer”: dict, optional
  Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
- “name”: str
  Name of the module.
BERTEncoder¶
-
class texar.tf.modules.BERTEncoder(pretrained_model_name=None, cache_dir=None, hparams=None)[source]¶
Raw BERT Transformer for encoding sequences. Please see PretrainedBERTMixin for a brief description of BERT.
This module basically stacks WordEmbedder, PositionEmbedder, TransformerEncoder, and a dense pooler.
Parameters: - pretrained_model_name (optional) – A str, the name of the pre-trained model (e.g., bert-base-uncased). Please refer to PretrainedBERTMixin for all supported models. If None, the model name in hparams is used.
- cache_dir (optional) – The path to a folder in which the pre-trained models will be cached. If None (default), a default directory (the texar_data folder under the user's home directory) will be used.
- hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
-
_build(inputs, sequence_length=None, segment_ids=None, mode=None, **kwargs)[source]¶
Encodes the inputs.
Parameters: - inputs – A 2D Tensor of shape [batch_size, max_time], containing the token ids of tokens in the input sequences.
- segment_ids (optional) – A 2D Tensor of shape [batch_size, max_time], containing the segment ids of tokens in input sequences. If None (default), a tensor with all elements set to zero is used.
- sequence_length (optional) – A 1D Tensor of shape [batch_size]. Input tokens beyond respective sequence lengths are masked out automatically.
- mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Used to toggle dropout. If None (default), texar.tf.global_mode() is used.
- **kwargs – Keyword arguments.
Returns: A pair (outputs, pooled_output):
- outputs: A Tensor of shape [batch_size, max_time, dim] containing the encoded vectors.
- pooled_output: A Tensor of shape [batch_size, hidden_size] which is the output of a pooler built on top of the hidden state associated with the first token of the input ([CLS]); see BERT's paper.
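A usage sketch loading the default uncased BERT-Base weights (batch is an assumed data batch as in the earlier examples):
bert_encoder = BERTEncoder(pretrained_model_name='bert-base-uncased')
outputs, pooled_output = bert_encoder(
    inputs=batch['text_ids'],
    sequence_length=batch['length'])
# outputs:       [batch_size, max_time, 768]
# pooled_output: [batch_size, 768]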
-
reset_parameters()[source]¶
Initialize parameters of the pre-trained model. This method is only called if pre-trained checkpoints are not loaded.
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
- The encoder arch is determined by the constructor argument pretrained_model_name if it's specified. In this case, hparams are ignored.
- Otherwise, the encoder arch is determined by hparams['pretrained_model_name'] if it's specified. All other configurations in hparams are ignored.
- If the above two are None, the encoder arch is defined by the configurations in hparams and weights are randomly initialized.
{ "pretrained_model_name": "bert-base-uncased", "embed": { "dim": 768, "name": "word_embeddings" }, "vocab_size": 30522, "segment_embed": { "dim": 768, "name": "token_type_embeddings" }, "type_vocab_size": 2, "position_embed": { "dim": 768, "name": "position_embeddings" }, "position_size": 512, "encoder": { "dim": 768, "embedding_dropout": 0.1, "multihead_attention": { "dropout_rate": 0.1, "name": "self", "num_heads": 12, "num_units": 768, "output_dim": 768, "use_bias": True }, "name": "encoder", "num_blocks": 12, "poswise_feedforward": { "layers": [ { "kwargs": { "activation": "gelu", "name": "intermediate", "units": 3072, "use_bias": True }, "type": "Dense" }, { "kwargs": {"activation": None, "name": "output", "units": 768, "use_bias": True }, "type": "Dense" } ] }, "residual_dropout": 0.1, "use_bert_config": True }, "hidden_size": 768, "initializer": None, "name": "bert_encoder" }
Here:
The default parameters are values for the uncased BERT-Base model.
- “pretrained_model_name”: str or None
  The name of the pre-trained BERT model. If None, the model will be randomly initialized.
- “embed”: dict
  Hyperparameters for the word embedding layer.
- “vocab_size”: int
  The vocabulary size of inputs in the BERT model.
- “segment_embed”: dict
  Hyperparameters for the segment embedding layer.
- “type_vocab_size”: int
  The vocabulary size of the segment_ids passed into BertModel.
- “position_embed”: dict
  Hyperparameters for the position embedding layer.
- “position_size”: int
  The maximum sequence length that this model might ever be used with.
- “encoder”: dict
  Hyperparameters for the TransformerEncoder. See default_hparams() of TransformerEncoder for details.
- “hidden_size”: int
  Size of the pooler dense layer.
- “initializer”: dict, optional
  Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
- “name”: str
  Name of the module.
Conv1DEncoder¶
-
class texar.tf.modules.Conv1DEncoder(hparams=None)[source]¶
Simple Conv-1D encoder which consists of a sequence of convolutional layers followed by a sequence of dense layers.
Wraps Conv1DNetwork to be a subclass of EncoderBase. Has exactly the same functionality as Conv1DNetwork.
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
The same as default_hparams() of Conv1DNetwork, except that the default name is 'conv_encoder'.
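A minimal usage sketch (inputs is an assumed [batch_size, max_time, emb_dim] tensor; see Conv1DNetwork for the full hyperparameter set):
conv_encoder = Conv1DEncoder()  # default hparams
outputs = conv_encoder(inputs)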
EncoderBase¶
RNNEncoderBase¶
-
class texar.tf.modules.RNNEncoderBase(hparams=None)[source]¶
Base class for all RNN encoder classes to inherit.
Parameters: hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
XLNetEncoder¶
-
class texar.tf.modules.XLNetEncoder(pretrained_model_name=None, cache_dir=None, hparams=None)[source]¶
Raw XLNet module for encoding sequences. Please see PretrainedXLNetMixin for a brief description of XLNet.
Parameters: - pretrained_model_name (optional) – A str, the name of the pre-trained model (e.g., xlnet-base-cased). Please refer to PretrainedXLNetMixin for all supported models. If None, the model name in hparams is used.
- cache_dir (optional) – The path to a folder in which the pre-trained models will be cached. If None (default), a default directory (the texar_data folder under the user's home directory) will be used.
- hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
-
_build(token_ids, segment_ids=None, input_mask=None, memory=None, permute_mask=None, target_mapping=None, bi_data=False, clamp_len=None, cache_len=0, same_length=False, attn_type='bi', two_stream=False, mode=None)[source]¶
Computes XLNet representations for the input.
Parameters: - token_ids – Shape [batch_size, max_time].
- segment_ids – Shape [batch_size, max_time].
- input_mask – Float tensor of shape [batch_size, max_time]. Note that positions with value 1 are masked out.
- memory – Memory from previous batches. A list of length num_layers, each tensor of shape [batch_size, mem_len, hidden_dim].
- permute_mask – The permutation mask. Float tensor of shape [batch_size, max_time, max_time]. A value of 0 for permute_mask[i, j, k] indicates that position i attends to position j in batch k.
- target_mapping – The target token mapping. Float tensor of shape [batch_size, num_targets, max_time]. A value of 1 for target_mapping[i, j, k] indicates that the i-th target token (in order of permutation) in batch k is the token at position j. Each row target_mapping[i, :, k] can have no more than one value of 1.
- bi_data (bool) – Whether to use bidirectional data input pipeline.
- clamp_len (int) – Clamp all relative distances larger than clamp_len. A value of -1 means no clamping.
- cache_len (int) – Length of memory (number of tokens) to cache.
- same_length (bool) – Whether to use the same attention length for each token.
- attn_type (str) – Attention type. Supported values are "uni" and "bi".
- two_stream (bool) – Whether to use two-stream attention. Only set to True when pre-training or generating text. Defaults to False.
Returns: A tuple of (output, new_memory):
- output: The final layer output representations. Shape [batch_size, max_time, hidden_dim].
- new_memory: The memory of the current batch. If cache_len is 0, then new_memory is None. Otherwise, it is a list of length num_layers, each tensor of shape [batch_size, cache_len, hidden_dim]. This can be used as the memory argument in the next batch.
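A usage sketch showing memory being carried across two consecutive segments (the token id tensors are assumed inputs):
xlnet = XLNetEncoder(pretrained_model_name='xlnet-base-cased')

output_1, memory = xlnet(token_ids=token_ids_1, cache_len=128)
# Reuse the cached memory when encoding the next segment
output_2, memory = xlnet(token_ids=token_ids_2, memory=memory, cache_len=128)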
-
reset_parameters()[source]¶
Initialize parameters of the pre-trained model. This method is only called if pre-trained checkpoints are not loaded.
-
static default_hparams()[source]¶
Returns a dictionary of hyperparameters with default values.
- The encoder arch is determined by the constructor argument pretrained_model_name if it's specified. In this case, hparams are ignored.
- Otherwise, the encoder arch is determined by hparams['pretrained_model_name'] if it's specified. All other configurations in hparams are ignored.
- If the above two are None, the encoder arch is defined by the configurations in hparams and weights are randomly initialized.
{ "name": "xlnet_encoder", "pretrained_model_name": "xlnet-base-cased", "untie_r": True, "num_layers": 12, "mem_len": 0, "reuse_len": 0, "initializer": None, "num_heads": 12, "hidden_dim": 768, "head_dim": 64, "dropout": 0.1, "attention_dropout": 0.1, "use_segments": True, "ffn_inner_dim": 3072, "activation": 'gelu', "vocab_size": 32000, "max_seq_len": 512, }
Here:
The default parameters are values for the cased XLNet-Base model.
- “pretrained_model_name”: str or None
  The name of the pre-trained XLNet model. If None, the model will be randomly initialized.
- “untie_r”: bool
  Whether biases should be untied for all the layers.
- “num_layers”: int
  Number of layers in the network.
- “mem_len”: int
  Length of the memory to be used during attention score calculation.
- “reuse_len”: int
  Length of the memory that can be re-used.
- “initializer”: dict, optional
  Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
- “num_heads”: int
  Number of heads in the attention.
- “hidden_dim”: int
  Hidden dimension of the embeddings.
- “head_dim”: int
  Size of the vectors after head projection.
- “dropout”: float
  Dropout rate for layers.
- “attention_dropout”: float
  Dropout rate for attention layers.
- “use_segments”: bool
  Whether the input has segments.
- “ffn_inner_dim”: int
  Dimension of the position-wise feed-forward network's hidden layer.
- “activation”: str or callable
  Activation function applied to the output of the position-wise feed-forward network. See get_activation_fn() for more details.
- “vocab_size”: int
  The vocabulary size of inputs in XLNet.
- “max_seq_len”: int
  Maximum length of the sequence allowed in one segment.
- “name”: str
  Name of the module.
-
param_groups(lr=None, lr_layer_scale=1.0, decay_base_params=False)[source]¶
Creates parameter groups for optimizers. When lr_layer_scale is not 1.0, parameters from each layer form separate groups with different base learning rates.
This method should be called before applying gradients to the variables through the optimizer. Particularly, after calling the optimizer's compute_gradients method, the user can call this method to get variable-specific learning rates for the network. The gradients for each variable can then be scaled accordingly. These scaled gradients are finally applied by calling the optimizer's apply_gradients method.
Example
grads_and_vars = optimizer.compute_gradients(loss)
# Map each variable to its gradient
vars_to_grads = {var: grad for grad, var in grads_and_vars}
vars_to_learning_rates = xlnet_encoder.param_groups(
    lr=1, lr_layer_scale=0.75)
# Scale each gradient by its variable-specific learning rate
for var in vars_to_grads.keys():
    vars_to_grads[var] *= vars_to_learning_rates[var]
train_op = optimizer.apply_gradients(
    [(grad, var) for var, grad in vars_to_grads.items()])
Parameters: - lr (float) – The learning rate. Can be omitted if lr_layer_scale is 1.0.
- lr_layer_scale (float) – Per-layer LR scaling rate. The i-th layer will be scaled by lr_layer_scale ^ (num_layers - i - 1).
- decay_base_params (bool) – If True, treat non-layer parameters (e.g., embeddings) as if they're in layer 0. If False, these parameters are not scaled.
Returns: A dict mapping TensorFlow variables to their learning rates.
default_transformer_poswise_net_hparams¶
-
texar.tf.modules.default_transformer_poswise_net_hparams(output_dim=512)[source]¶
Returns default hyperparameters of a FeedForwardNetwork as a pos-wise network used in TransformerEncoder and TransformerDecoder.
This is a 2-layer dense network with dropout in-between.
{ "layers": [ { "type": "Dense", "kwargs": { "name": "conv1", "units": output_dim*4, "activation": "relu", "use_bias": True, } }, { "type": "Dropout", "kwargs": { "rate": 0.1, } }, { "type": "Dense", "kwargs": { "name": "conv2", "units": output_dim, "use_bias": True, } } ], "name": "ffn" }
Parameters: output_dim (int) – The size of the output dense layer.
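For example, a sketch that widens the feed-forward network to match a larger encoder dimension (the sizes are illustrative):
ffn_hparams = default_transformer_poswise_net_hparams(output_dim=768)
encoder = TransformerEncoder(hparams={
    'dim': 768,
    'poswise_feedforward': ffn_hparams,
    'multihead_attention': {'num_units': 768, 'output_dim': 768}})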
Decoders¶
RNNDecoderBase¶
-
class texar.tf.modules.RNNDecoderBase(cell=None, vocab_size=None, output_layer=None, cell_dropout_mode=None, hparams=None)[source]¶
Base class inherited by all RNN decoder classes. See BasicRNNDecoder for the arguments.
See _build() for the inputs and outputs of RNN decoders in general.
-
_build(decoding_strategy='train_greedy', initial_state=None, inputs=None, sequence_length=None, embedding=None, start_tokens=None, end_token=None, softmax_temperature=None, max_decoding_length=None, impute_finished=False, output_time_major=False, input_time_major=False, helper=None, mode=None, **kwargs)[source]¶
Performs decoding. This is a shared interface for both BasicRNNDecoder and AttentionRNNDecoder.
The function provides 3 ways to specify the decoding method, with varying flexibility:
The decoding_strategy argument: A string taking value of:
- "train_greedy": decoding in teacher-forcing fashion (i.e., feeding the ground truth to decode the next step), and each sample is obtained by taking the argmax of the RNN output logits. Arguments (inputs, sequence_length, input_time_major) are required for this strategy, and argument embedding is optional.
- "infer_greedy": decoding in inference fashion (i.e., feeding the generated sample to decode the next step), and each sample is obtained by taking the argmax of the RNN output logits. Arguments (embedding, start_tokens, end_token) are required for this strategy, and argument max_decoding_length is optional.
- "infer_sample": decoding in inference fashion, and each sample is obtained by random sampling from the RNN output distribution. Arguments (embedding, start_tokens, end_token) are required for this strategy, and argument max_decoding_length is optional.
This argument is used only when argument helper is None.
Example:
embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Teacher-forcing decoding
outputs_1, _, _ = decoder(
    decoding_strategy='train_greedy',
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length']-1)

# Random sample decoding. Gets 100 sequence samples
outputs_2, _, sequence_length = decoder(
    decoding_strategy='infer_sample',
    start_tokens=[data.vocab.bos_token_id]*100,
    end_token=data.vocab.eos_token_id,
    embedding=embedder,
    max_decoding_length=60)
The helper argument: An instance of a subclass of texar.tf.modules.Helper. This provides a superset of the decoding strategies above. For example:
- TrainingHelper corresponding to the "train_greedy" strategy.
- GreedyEmbeddingHelper and SampleEmbeddingHelper corresponding to the "infer_greedy" and "infer_sample" strategies, respectively.
- TopKSampleEmbeddingHelper for Top-K sample decoding.
- ScheduledEmbeddingTrainingHelper and ScheduledOutputTrainingHelper for scheduled sampling.
- SoftmaxEmbeddingHelper and GumbelSoftmaxEmbeddingHelper for soft decoding and gradient backpropagation.
Helpers give the maximal flexibility of configuring the decoding strategy.
Example:
embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Teacher-forcing decoding, same as above with
# `decoding_strategy='train_greedy'`
helper_1 = tx.modules.TrainingHelper(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length']-1)
outputs_1, _, _ = decoder(helper=helper_1)

# Gumbel-softmax decoding
helper_2 = GumbelSoftmaxEmbeddingHelper(
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id]*100,
    end_token=data.vocab.eos_token_id,
    tau=0.1)
outputs_2, _, sequence_length = decoder(
    max_decoding_length=60, helper=helper_2)
hparams["helper_train"]
andhparams["helper_infer"]
: Specifying the helper through hyperparameters. Train and infer strategy is toggled based onmode
. Appriopriate arguments (e.g.,inputs
,start_tokens
, etc) are selected to construct the helper. Additional arguments for helper constructor can be provided either through**kwargs
, or throughhparams["helper_train/infer"]["kwargs"]
.This means is used only when both
decoding_strategy
andhelper
are None.Example:
h = {
    "helper_infer": {
        "type": "GumbelSoftmaxEmbeddingHelper",
        "kwargs": { "tau": 0.1 }
    }
}
embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size, hparams=h)

# Gumbel-softmax decoding
output, _, _ = decoder(
    decoding_strategy=None,  # Sets to None explicitly
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id]*100,
    end_token=data.vocab.eos_token_id,
    max_decoding_length=60,
    mode=tf.estimator.ModeKeys.PREDICT)
    # PREDICT mode also shuts down dropout
Parameters: - decoding_strategy (str) – A string specifying the decoding strategy. Different arguments are required based on the strategy. Ignored if helper is given.
- initial_state (optional) – Initial state of decoding. If None (default), zero state is used.
- inputs (optional) – Input tensors for teacher-forcing decoding. Used when decoding_strategy is set to "train_greedy", or when an hparams-configured helper is used.
  - If embedding is None, inputs is directly fed to the decoder. E.g., in the "train_greedy" strategy, inputs must be a 3D Tensor of shape [batch_size, max_time, emb_dim] (or [max_time, batch_size, emb_dim] if input_time_major == True).
  - If embedding is given, inputs is used as indexes to look up embeddings that are fed into the decoder. E.g., if embedding is an instance of WordEmbedder, then inputs is usually a 2D int Tensor of shape [batch_size, max_time] (or [max_time, batch_size] if input_time_major == True) containing the token indexes.
- sequence_length (optional) – A 1D int Tensor containing the sequence length of inputs. Used when decoding_strategy="train_greedy" or an hparams-configured helper is used.
- embedding (optional) – Embedding used when:
  - "infer_greedy" or "infer_sample" decoding_strategy is used. This can be a callable or the params argument for embedding_lookup. If a callable, it can take a vector tensor of token ids, or take two arguments (ids, times), where ids is a vector tensor of token ids, and times is a vector tensor of time steps (i.e., position ids). The latter case can be used when embedding is a combination of word embedding and position embedding. embedding is required in this case.
  - "train_greedy" decoding_strategy is used. This can be a callable or the params argument for embedding_lookup. If a callable, it can take inputs and return the input embedding. embedding is optional in this case.
- start_tokens (optional) –
A int Tensor of shape [batch_size], the start tokens. Used when decoding_strategy=”infer_greedy” or “infer_sample”, or when the helper specified in hparams is used.
Example
data = tx.data.MonoTextData(hparams)
iterator = DataIterator(data)
batch = iterator.get_next()

bos_token_id = data.vocab.bos_token_id
start_tokens = tf.ones_like(batch['length']) * bos_token_id
- end_token (optional) – An int 0D Tensor, the token that marks end of decoding. Used when decoding_strategy=”infer_greedy” or “infer_sample”, or when the helper specified in hparams is used.
- softmax_temperature (optional) – A float 0D Tensor, value to divide the logits by before computing the softmax. Larger values (above 1.0) result in more random samples. Must be > 0. If None, 1.0 is used. Used when decoding_strategy=”infer_sample”.
- max_decoding_length – An int scalar Tensor indicating the maximum
allowed number of decoding steps. If None (default), either
hparams[“max_decoding_length_train”] or
hparams[“max_decoding_length_infer”] is used
according to
mode
. - impute_finished (bool) – If True, then states for batch entries which are marked as finished get copied through and the corresponding outputs get zeroed out. This causes some slowdown at each time step, but ensures that the final state and outputs have the correct values and that backprop ignores time steps that were marked as finished.
- output_time_major (bool) – If True, outputs are returned as time major tensors. If False (default), outputs are returned as batch major tensors.
- input_time_major (optional) – Whether the
inputs
tensor is time major. Used when decoding_strategy=”train_greedy” or hparams-configured helper is used. - helper (optional) – An instance of
texar.tf.modules.Helper
that defines the decoding strategy. If given, decoding_strategy and the helper configs in hparams
are ignored. - mode (str, optional) – A string taking value in tf.estimator.ModeKeys. If TRAIN, training-related hyperparameters are used (e.g., hparams[‘max_decoding_length_train’]); otherwise, inference-related hyperparameters are used (e.g., hparams[‘max_decoding_length_infer’]). If None (default), TRAIN mode is used.
- **kwargs – Other keyword arguments for constructing helpers defined by hparams[“helper_train”] or hparams[“helper_infer”].
Returns: (outputs, final_state, sequence_lengths), where
- `outputs`: an object containing the decoder output on all time steps.
- `final_state`: the cell state of the final time step.
- `sequence_lengths`: an int Tensor of shape [batch_size] containing the length of each sample.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
The hyperparameters are the same as in
default_hparams()
of BasicRNNDecoder
, except that the default “name” here is “rnn_decoder”.
-
batch_size
¶ The batch size of input values.
-
cell
¶ The RNN cell.
-
zero_state
(batch_size, dtype)[source]¶ Zero state of the RNN cell. Equivalent to
decoder.cell.zero_state
.
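For instance, a zero initial state can be built explicitly and passed to decoding. A minimal sketch, assuming a BasicRNNDecoder with default cell hparams, an illustrative batch size of 64, and an illustrative embedding parameter tensor:

import tensorflow as tf
import texar.tf as tx

decoder = tx.modules.BasicRNNDecoder(vocab_size=100)
# Same as decoder.cell.zero_state(64, tf.float32)
init_state = decoder.zero_state(batch_size=64, dtype=tf.float32)
outputs, final_state, seq_lens = decoder(
    decoding_strategy='infer_greedy',
    embedding=tf.random_uniform([100, 256]),  # params for embedding_lookup
    start_tokens=[0] * 64,
    end_token=1,
    initial_state=init_state,
    max_decoding_length=60)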
-
state_size
¶ The state size of decoder cell. Equivalent to
decoder.cell.state_size
.
-
vocab_size
¶ The vocab size.
-
name
¶ The uniquified name of the module.
-
output_layer
¶ The output layer.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
-
BasicRNNDecoder¶
-
class
texar.tf.modules.
BasicRNNDecoder
(cell=None, cell_dropout_mode=None, vocab_size=None, output_layer=None, hparams=None)[source]¶ Basic RNN decoder.
Parameters: - cell (RNNCell, optional) – An instance of
RNNCell. If None
(default), a cell is created as specified in
hparams
. - cell_dropout_mode (optional) – A Tensor taking value of
tf.estimator.ModeKeys, which
toggles dropout in the RNN cell (e.g., activates dropout in
TRAIN mode). If None,
global_mode()
is used. Ignored if cell
is given. - vocab_size (int, optional) – Vocabulary size. Required if
output_layer
is None. - output_layer (optional) –
An output layer that transforms cell output to logits. This can be:
- A callable layer, e.g., an instance of tf.layers.Layer.
- A tensor. A dense layer will be created using the tensor as the kernel weights. The bias of the dense layer is determined by hparams.output_layer_bias. This can be used to tie the output layer with the input embedding matrix, as proposed in https://arxiv.org/pdf/1608.05859.pdf
- None. A dense layer will be created based on attr:vocab_size and hparams.output_layer_bias.
- If no output layer after the cell output is needed, set (vocab_size=None, output_layer=tf.identity).
- hparams (dict, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
See
_build()
for the inputs and outputs of the decoder. The decoder returns (outputs, final_state, sequence_lengths), where outputs is an instance of BasicRNNDecoderOutput
.Example
embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Training loss
outputs, _, _ = decoder(
    decoding_strategy='train_greedy',
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'] - 1)

loss = tx.losses.sequence_sparse_softmax_cross_entropy(
    labels=data_batch['text_ids'][:, 1:],
    logits=outputs.logits,
    sequence_length=data_batch['length'] - 1)

# Inference sample
outputs, _, _ = decoder(
    decoding_strategy='infer_sample',
    start_tokens=[data.vocab.bos_token_id] * 100,
    end_token=data.vocab.eos_token_id,
    embedding=embedder,
    max_decoding_length=60,
    mode=tf.estimator.ModeKeys.PREDICT)

sample_id = sess.run(outputs.sample_id)
sample_text = tx.utils.map_ids_to_strs(sample_id, data.vocab)
print(sample_text)
# [
#   the first sequence sample .
#   the second sequence sample .
#   ...
# ]
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{
    "rnn_cell": default_rnn_cell_hparams(),
    "max_decoding_length_train": None,
    "max_decoding_length_infer": None,
    "helper_train": {
        "type": "TrainingHelper",
        "kwargs": {}
    },
    "helper_infer": {
        "type": "SampleEmbeddingHelper",
        "kwargs": {}
    },
    "name": "basic_rnn_decoder"
}
Here:
- “rnn_cell”: dict
- A dictionary of RNN cell hyperparameters. Ignored if
cell
is given to the decoder constructor. The default value is defined in default_rnn_cell_hparams()
. - “max_decoding_length_train”: int or None
- Maximum allowed number of decoding steps in training mode. If None (default), decoding is performed until fully done, e.g., encountering the <EOS> token. Ignored if max_decoding_length is given when calling the decoder.
- “max_decoding_length_infer”: int or None
- Same as “max_decoding_length_train” but for inference mode.
- “helper_train”: dict
- The hyperparameters of the helper used in training.
“type” can be a helper class, its name or module path, or a
helper instance. If a class name is given, the class must be
from module tf.contrib.seq2seq,
texar.tf.modules
, ortexar.tf.custom
. This is used only when both decoding_strategy and helper arguments are None when calling the decoder. See _build()
for more details. - “helper_infer”: dict
- Same as “helper_train” but during inference mode.
- “name”: str
Name of the decoder.
The default value is “basic_rnn_decoder”.
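For example, the defaults above can be selectively overridden when constructing the decoder. A sketch; the GRU cell type, sizes, and helper choice are illustrative assumptions rather than recommended settings, and data is a data module as in the examples above:

hparams = {
    "rnn_cell": {"type": "GRUCell", "kwargs": {"num_units": 512}},
    "max_decoding_length_infer": 50,
    "helper_infer": {"type": "GreedyEmbeddingHelper", "kwargs": {}}
}
decoder = BasicRNNDecoder(vocab_size=data.vocab.size, hparams=hparams)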
-
batch_size
¶ The batch size of input values.
-
cell
¶ The RNN cell.
-
name
¶ The uniquified name of the module.
-
output_layer
¶ The output layer.
-
state_size
¶ The state size of decoder cell. Equivalent to
decoder.cell.state_size
.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
-
vocab_size
¶ The vocab size.
-
zero_state
(batch_size, dtype)¶ Zero state of the RNN cell. Equivalent to
decoder.cell.zero_state
.
BasicRNNDecoderOutput¶
-
class
texar.tf.modules.
BasicRNNDecoderOutput
[source]¶ The outputs of basic RNN decoder that include both RNN outputs and sampled ids at each step. This is also used to store results of all the steps after decoding the whole sequence.
-
logits
¶ The outputs of RNN (at each step/of all steps) by applying the output layer on cell outputs. E.g., in
BasicRNNDecoder
with default hyperparameters, this is a Tensor of shape [batch_size, max_time, vocab_size] after decoding the whole sequence.
-
sample_id
¶ The sampled results (at each step/of all steps). E.g., in BasicRNNDecoder with decoding strategy of train_greedy, this is a Tensor of shape [batch_size, max_time] containing the sampled token indexes of all steps.
-
cell_output
¶ The output of RNN cell (at each step/of all steps). This is the results prior to the output layer. E.g., in BasicRNNDecoder with default hyperparameters, this is a Tensor of shape [batch_size, max_time, cell_output_size] after decoding the whole sequence.
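A sketch of reading these fields after decoding, assuming decoder, embedder, and data_batch as in the BasicRNNDecoder example above:

outputs, final_state, seq_lens = decoder(
    decoding_strategy='train_greedy',
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'] - 1)

logits = outputs.logits            # [batch_size, max_time, vocab_size]
sample_id = outputs.sample_id      # [batch_size, max_time]
cell_output = outputs.cell_output  # [batch_size, max_time, cell_output_size]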
-
AttentionRNNDecoder¶
-
class
texar.tf.modules.
AttentionRNNDecoder
(memory, memory_sequence_length=None, cell=None, cell_dropout_mode=None, vocab_size=None, output_layer=None, cell_input_fn=None, hparams=None)[source]¶ RNN decoder with attention mechanism.
Parameters: - memory – The memory to query, e.g., the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, dim].
- memory_sequence_length (optional) – A tensor of shape [batch_size] containing the sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
- cell (RNNCell, optional) – An instance of RNNCell. If None, a cell
is created as specified in
hparams
. - cell_dropout_mode (optional) – A Tensor taking value of
tf.estimator.ModeKeys, which
toggles dropout in the RNN cell (e.g., activates dropout in
TRAIN mode). If None,
global_mode()
is used. Ignored if cell
is given. - vocab_size (int, optional) – Vocabulary size. Required if
output_layer
is None. - output_layer (optional) –
An output layer that transforms cell output to logits. This can be:
- A callable layer, e.g., an instance of tf.layers.Layer.
- A tensor. A dense layer will be created using the tensor as the kernel weights. The bias of the dense layer is determined by hparams.output_layer_bias. This can be used to tie the output layer with the input embedding matrix, as proposed in https://arxiv.org/pdf/1608.05859.pdf
- None. A dense layer will be created based on attr:vocab_size and hparams.output_layer_bias.
- If no output layer after the cell output is needed, set (vocab_size=None, output_layer=tf.identity).
- cell_input_fn (callable, optional) – A callable that produces RNN cell inputs. If None (default), the default is used: lambda inputs, attention: tf.concat([inputs, attention], -1), which concatenates regular RNN cell inputs with attentions.
- hparams (dict, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
See
_build()
for the inputs and outputs of the decoder. The decoder returns (outputs, final_state, sequence_lengths), where outputs is an instance of AttentionRNNDecoderOutput
.Example
# Encodes the source
enc_embedder = WordEmbedder(data.source_vocab.size, ...)
encoder = UnidirectionalRNNEncoder(...)
enc_outputs, _ = encoder(
    inputs=enc_embedder(data_batch['source_text_ids']),
    sequence_length=data_batch['source_length'])

# Decodes while attending to the source
dec_embedder = WordEmbedder(vocab_size=data.target_vocab.size, ...)
decoder = AttentionRNNDecoder(
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    vocab_size=data.target_vocab.size)
outputs, _, _ = decoder(
    decoding_strategy='train_greedy',
    inputs=dec_embedder(data_batch['target_text_ids']),
    sequence_length=data_batch['target_length'] - 1)
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values:
Common hyperparameters are the same as in
BasicRNNDecoder
.default_hparams()
. Additional hyperparameters are for attention mechanism configuration.

{
    "attention": {
        "type": "LuongAttention",
        "kwargs": {
            "num_units": 256,
        },
        "attention_layer_size": None,
        "alignment_history": False,
        "output_attention": True,
    },
    # The following hyperparameters are the same as with
    # `BasicRNNDecoder`
    "rnn_cell": default_rnn_cell_hparams(),
    "max_decoding_length_train": None,
    "max_decoding_length_infer": None,
    "helper_train": {
        "type": "TrainingHelper",
        "kwargs": {}
    },
    "helper_infer": {
        "type": "SampleEmbeddingHelper",
        "kwargs": {}
    },
    "name": "attention_rnn_decoder"
}
Here:
- “attention”: dict
Attention hyperparameters, including:
- “type”: str or class or instance
The attention type. Can be an attention class, its name or module path, or a class instance. The class must be a subclass of TF AttentionMechanism. If class name is given, the class must be from modules tf.contrib.seq2seq or
texar.tf.custom
.Example:
# class name
"type": "LuongAttention"
"type": "BahdanauAttention"

# module path
"type": "tf.contrib.seq2seq.BahdanauMonotonicAttention"
"type": "my_module.MyAttentionMechanismClass"

# class
"type": tf.contrib.seq2seq.LuongMonotonicAttention

# instance
"type": LuongAttention(...)
- “kwargs”: dict
Keyword arguments for the attention class constructor. Arguments
memory
and memory_sequence_length
should not be specified here because they are given to the decoder constructor. Ignored if “type” is an attention class instance. Example:

"type": "LuongAttention",
"kwargs": {
    "num_units": 256,
    "probability_fn": tf.nn.softmax
}
Here “probability_fn” can also be set to the string name or module path to a probability function.
- “attention_layer_size”: int or None
- The depth of the attention (output) layer. The context and cell output are fed into the attention layer to generate attention at each time step. If None (default), use the context as attention at each time step.
- “alignment_history”: bool
- Whether to store the alignment history of all time steps in the final output state. (Stored as a time-major TensorArray on which you must call stack().)
- “output_attention”: bool
- If True (default), the output at each time step is the attention value. This is the behavior of Luong-style attention mechanisms. If False, the output at each time step is the output of the cell. This is the behavior of Bahdanau-style attention mechanisms. In both cases, the attention tensor is propagated to the next time step via the state and is used there. This flag only controls whether the attention mechanism is propagated up to the next cell in an RNN stack or to the top RNN output.
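As a usage sketch of these hyperparameters, the following switches to Bahdanau-style attention with a dedicated attention layer; enc_outputs, data_batch, and data are as in the example above, and the sizes are illustrative assumptions:

decoder = AttentionRNNDecoder(
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    vocab_size=data.target_vocab.size,
    hparams={
        "attention": {
            "type": "BahdanauAttention",
            "kwargs": {"num_units": 256},
            "attention_layer_size": 256,
            # Bahdanau-style: emit the cell output instead of the attention
            "output_attention": False
        }
    })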
-
zero_state
(batch_size, dtype)[source]¶ Returns zero state of the basic cell. Equivalent to
decoder.cell._cell.zero_state
.
-
wrapper_zero_state
(batch_size, dtype)[source]¶ Returns zero state of the attention-wrapped cell. Equivalent to
decoder.cell.zero_state
.
-
state_size
¶ The state size of the basic cell. Equivalent to
decoder.cell._cell.state_size
.
-
wrapper_state_size
¶ The state size of the attention-wrapped cell. Equivalent to
decoder.cell.state_size
.
-
batch_size
¶ The batch size of input values.
-
cell
¶ The RNN cell.
-
name
¶ The uniquified name of the module.
-
output_layer
¶ The output layer.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
-
vocab_size
¶ The vocab size.
AttentionRNNDecoderOutput¶
-
class
texar.tf.modules.
AttentionRNNDecoderOutput
[source]¶ The outputs of attention RNN decoders that additionally include attention results.
-
logits
¶ The outputs of RNN (at each step/of all steps) by applying the output layer on cell outputs. E.g., in
AttentionRNNDecoder
, this is a Tensor of shape [batch_size, max_time, vocab_size] after decoding.
-
sample_id
¶ The sampled results (at each step/of all steps). E.g., in
AttentionRNNDecoder
with decoding strategy of train_greedy, this is a Tensor of shape [batch_size, max_time] containing the sampled token indexes of all steps.
-
cell_output
¶ The output of RNN cell (at each step/of all steps). This is the results prior to the output layer. E.g., in AttentionRNNDecoder with default hyperparameters, this is a Tensor of shape [batch_size, max_time, cell_output_size] after decoding the whole sequence.
-
attention_scores
¶ A single or tuple of Tensor(s) containing the alignments emitted (at the previous time step/of all time steps) for each attention mechanism.
-
attention_context
¶ The attention emitted (at the previous time step/of all time steps).
-
GPT2Decoder¶
-
class
texar.tf.modules.
GPT2Decoder
(pretrained_model_name=None, cache_dir=None, hparams=None)[source]¶ Raw GPT2 Transformer for decoding sequences. Please see
PretrainedGPT2Mixin
for a brief description of GPT2. This module basically stacks
WordEmbedder
, PositionEmbedder
, and TransformerDecoder
. This module supports the architecture first proposed in GPT2 (Radford et al.).
Parameters: - pretrained_model_name (optional) – a str, the name
of pre-trained model (e.g.,
gpt2-small
). Please refer toPretrainedGPT2Mixin
for all supported models. If None, the model name in hparams
is used. - cache_dir (optional) – the path to a folder in which the
pre-trained models will be cached. If None (default),
a default directory (
texar_data
folder under user’s home directory) will be used. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameter will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
_build
(decoding_strategy='train_greedy', inputs=None, memory=None, memory_sequence_length=None, memory_attention_bias=None, beam_width=None, length_penalty=0.0, start_tokens=None, end_token=None, context=None, context_sequence_length=None, softmax_temperature=None, max_decoding_length=None, impute_finished=False, helper=None, mode=None)[source]¶ Performs decoding. Has exactly the same interface as
texar.tf.modules.TransformerDecoder._build()
except that inputs is a tensor of shape [batch_size, max_time]. Please refer to it for detailed usage.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
- The decoder arch is determined by the constructor argument
pretrained_model_name
if it’s specified. In this case, hparams are ignored. - Otherwise, the decoder arch is determined by hparams[‘pretrained_model_name’] if it’s specified. All other configurations in hparams are ignored.
- If the above two are None, the decoder arch is defined by the configurations in hparams and weights are randomly initialized.
{
    "name": "gpt2_decoder",
    "pretrained_model_name": "gpt2-small",
    "vocab_size": 50257,
    "context_size": 1024,
    "embedding_size": 768,
    "embed": {
        "dim": 768,
        "name": "word_embeddings"
    },
    "position_size": 1024,
    "position_embed": {
        "dim": 768,
        "name": "position_embeddings"
    },

    # hparams for TransformerDecoder
    "decoder": {
        "dim": 768,
        "num_blocks": 12,
        "use_gpt_config": True,
        "embedding_dropout": 0,
        "residual_dropout": 0,
        "multihead_attention": {
            "use_bias": True,
            "num_units": 768,
            "num_heads": 12,
            "dropout_rate": 0.0,
            "output_dim": 768
        },
        "initializer": {
            "type": "variance_scaling_initializer",
            "kwargs": {
                "factor": 1.0,
                "mode": "FAN_AVG",
                "uniform": True
            }
        },
        "poswise_feedforward": {
            "layers": [
                {
                    "type": "Dense",
                    "kwargs": {
                        "activation": "gelu",
                        "name": "intermediate",
                        "units": 3072,
                        "use_bias": True
                    }
                },
                {
                    "type": "Dense",
                    "kwargs": {
                        "activation": None,
                        "name": "output",
                        "units": 768,
                        "use_bias": True
                    }
                }
            ],
            "name": "ffn"
        }
    }
}
Here:
The default parameters are the values used for the 124M GPT2 model.
- “pretrained_model_name”: str or None
- The name of the pre-trained GPT2 model. If None, the model will be randomly initialized.
- “embed”: dict
- Hyperparameters for word embedding layer.
- “vocab_size”: int
- The vocabulary size of inputs in GPT2Model.
- “position_embed”: dict
- Hyperparameters for position embedding layer.
- “position_size”: int
- The maximum sequence length that this model might ever be used with.
- “name”: str
- Name of the module.
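A minimal usage sketch: loading the pre-trained weights and generating a continuation of a given context. Here context_ids, context_lengths, and end_token_id are hypothetical placeholders for an int Tensor of shape [batch_size, length], its lengths, and the vocabulary's end-of-text id:

decoder = GPT2Decoder(pretrained_model_name="gpt2-small")

outputs, seq_lens = decoder(
    decoding_strategy='infer_sample',
    context=context_ids,
    context_sequence_length=context_lengths,
    end_token=end_token_id,
    max_decoding_length=128)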
beam_search_decode¶
-
texar.tf.modules.
beam_search_decode
(decoder_or_cell, embedding, start_tokens, end_token, beam_width, initial_state=None, tiled_initial_state=None, output_layer=None, length_penalty_weight=0.0, max_decoding_length=None, output_time_major=False, **kwargs)[source]¶ Performs beam search sampling decoding.
Parameters: - decoder_or_cell – An instance of
subclass of
RNNDecoderBase
, or an instance of RNNCell. The decoder or RNN cell to perform decoding. - embedding – A callable that takes a vector tensor of indexes (e.g.,
an instance of subclass of
EmbedderBase
), or the params
argument for tf.nn.embedding_lookup. - start_tokens – int32 vector shaped [batch_size], the start tokens.
- end_token – int32 scalar, the token that marks end of decoding.
- beam_width (int) – Python integer, the number of beams.
- initial_state (optional) –
Initial state of decoding. If None (default), zero state is used.
The state must not be tiled with tile_batch. If you have an already-tiled initial state, use
tiled_initial_state
instead. In the case of attention RNN decoder, initial_state must not be an AttentionWrapperState. Instead, it must be a state of the wrapped RNNCell, which will be wrapped into an AttentionWrapperState automatically.
Ignored if
tiled_initial_state
is given. - tiled_initial_state (optional) –
Initial state that has been tiled (typically with tile_batch) so that the batch dimension has size batch_size * beam_width.
In the case of attention RNN decoder, this can be either a state of the wrapped RNNCell, or an AttentionWrapperState.
If not given,
initial_state
is used. - output_layer (optional) – A Layer instance to
apply to the RNN output prior to storing the result or sampling. If
None and
decoder_or_cell
is a decoder, the decoder’s output layer will be used. - length_penalty_weight – Float weight to penalize length. Disabled with 0.0 (default).
- max_decoding_length (optional) – A int scalar Tensor indicating the maximum allowed number of decoding steps. If None (default), decoding will continue until the end token is encountered.
- output_time_major (bool) – If True, outputs are returned as time major tensors. If False (default), outputs are returned as batch major tensors.
- **kwargs – Other keyword arguments for dynamic_decode except argument
maximum_iterations which is set to
max_decoding_length
.
Returns: A tuple (outputs, final_state, sequence_length), where
- outputs: An instance of FinalBeamSearchDecoderOutput.
- final_state: An instance of BeamSearchDecoderState.
- sequence_length: A Tensor of shape [batch_size] containing the lengths of samples.
Example
## Beam search with basic RNN decoder

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

outputs, _, _ = beam_search_decode(
    decoder_or_cell=decoder,
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id] * 100,
    end_token=data.vocab.eos_token_id,
    beam_width=5,
    max_decoding_length=60)

sample_ids = sess.run(outputs.predicted_ids)
sample_text = tx.utils.map_ids_to_strs(sample_ids[:, :, 0], data.vocab)
print(sample_text)
# [
#   the first sequence sample .
#   the second sequence sample .
#   ...
# ]
## Beam search with attention RNN decoder

# Encodes the source
enc_embedder = WordEmbedder(data.source_vocab.size, ...)
encoder = UnidirectionalRNNEncoder(...)
enc_outputs, enc_state = encoder(
    inputs=enc_embedder(data_batch['source_text_ids']),
    sequence_length=data_batch['source_length'])

# Decodes while attending to the source
dec_embedder = WordEmbedder(vocab_size=data.target_vocab.size, ...)
decoder = AttentionRNNDecoder(
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    vocab_size=data.target_vocab.size)

# Beam search
outputs, _, _ = beam_search_decode(
    decoder_or_cell=decoder,
    embedding=dec_embedder,
    start_tokens=[data.vocab.bos_token_id] * 100,
    end_token=data.vocab.eos_token_id,
    beam_width=5,
    initial_state=enc_state,
    max_decoding_length=60)
TransformerDecoder¶
-
class
texar.tf.modules.
TransformerDecoder
(vocab_size=None, output_layer=None, hparams=None)[source]¶ Transformer decoder that applies multi-head self-attention for sequence decoding.
It is a stack of
MultiheadAttentionEncoder
, FeedForwardNetwork
and residual connections. Parameters: - vocab_size (int, optional) – Vocabulary size. Required if
output_layer
is None. - output_layer (optional) –
An output layer that transforms cell output to logits. This can be:
- A callable layer, e.g., an instance of tf.layers.Layer.
- A tensor. A dense layer will be created using the tensor as the kernel weights. The bias of the dense layer is determined by hparams.output_layer_bias. This can be used to tie the output layer with the input embedding matrix, as proposed in https://arxiv.org/pdf/1608.05859.pdf
- None. A dense layer will be created based on attr:vocab_size and hparams.output_layer_bias.
- If no output layer in the end is needed, set (vocab_size=None, output_layer=tf.identity).
-
_build
(decoding_strategy='train_greedy', inputs=None, memory=None, memory_sequence_length=None, memory_attention_bias=None, beam_width=None, length_penalty=0.0, start_tokens=None, end_token=None, context=None, context_sequence_length=None, softmax_temperature=None, max_decoding_length=None, impute_finished=False, embedding=None, helper=None, mode=None)[source]¶ Performs decoding.
The interface is mostly the same as that of RNN decoders (see
_build()
). The main difference is that, here, sequence_length is not needed, and continuation generation is additionally supported. The function provides 3 ways to specify the decoding method, with varying flexibility:
The
decoding_strategy
argument.
- “train_greedy”: decoding in teacher-forcing fashion (i.e.,
feeding ground truth to decode the next step), and for each step
the sample is obtained by taking the argmax of logits.
Argument
inputs
is required for this strategy.
- “infer_greedy”: decoding in inference fashion (i.e., feeding the
generated sample to decode the next step), and for each step
the sample is obtained by taking the argmax of logits.
Arguments
(start_tokens, end_token, embedding)
are required for this strategy, and argument max_decoding_length
is optional.
- “infer_sample”: decoding in inference fashion, and for each
step the sample is obtained by random sampling from the logits.
Arguments
(start_tokens, end_token, embedding)
are required for this strategy, and argument max_decoding_length
is optional.
This argument is used only when arguments helper
and beam_width
are both None.
The
helper
argument: An instance of subclass of texar.tf.modules.Helper
. This provides a superset of the decoding strategies above. The interface is the same as in RNN decoders. Please refer to texar.tf.modules.RNNDecoderBase._build()
for detailed usage and examples.
Note that, here, though using a
TrainingHelper
corresponds to the “train_greedy” strategy above and will yield the same output results, the implementation is slower than directly setting decoding_strategy = “train_greedy”.
Argument
max_decoding_length
is optional.
Beam search: set
beam_width
to use beam search decoding. Arguments (start_tokens, end_token)
are required, and argument max_decoding_length
is optional.
Parameters: - memory (optional) – The memory to attend, e.g., the output of an RNN encoder. A Tensor of shape [batch_size, memory_max_time, dim].
- memory_sequence_length (optional) – A Tensor of shape [batch_size]
containing the sequence lengths for the batch entries in
memory. Used to create an attention bias if
memory_attention_bias
is not given. Ignored if memory_attention_bias is provided. - memory_attention_bias (optional) – A Tensor of shape
[batch_size, num_heads, memory_max_time, dim].
An attention bias typically sets the value of a padding
position to a large negative value for masking. If not given,
memory_sequence_length
is used to automatically create an attention bias. - inputs (optional) – Input tensor for teacher forcing decoding, of
shape [batch_size, target_max_time, emb_dim] containing the
target sequence word embeddings.
Used when
decoding_strategy
is set to “train_greedy”. - decoding_strategy (str) – A string specifying the decoding
strategy, including “train_greedy”, “infer_greedy”,
“infer_sample”.
Different arguments are required based on the
strategy. See above for details. Ignored if
beam_width
orhelper
is set. - beam_width (int) – Set to use beam search. If given,
decoding_strategy
is ignored. - length_penalty (float) – Length penalty coefficient used in beam search decoding. Refer to https://arxiv.org/abs/1609.08144 for more details. It Should be larger if longer sentences are wanted.
- start_tokens (optional) – An int Tensor of shape [batch_size],
containing the start tokens.
Used when
decoding_strategy
= “infer_greedy” or “infer_sample”, orbeam_width
is set. Ignored when context is set. - end_token (optional) – An int 0D Tensor, the token that marks end
of decoding.
Used when
decoding_strategy
= “infer_greedy” or “infer_sample”, orbeam_width
is set. - context (optional) – An int Tensor of shape [batch_size, length], containing the starting tokens for decoding. If context is set, the start_tokens will be ignored.
- context_sequence_length (optional) – Specifies the length of context.
- softmax_temperature (optional) – A float 0D Tensor, value to divide the logits by before computing the softmax. Larger values (above 1.0) result in more random samples. Must be > 0. If None, 1.0 is used. Used when decoding_strategy = “infer_sample”.
- max_decoding_length (optional) – An int scalar Tensor indicating
the maximum allowed number of decoding steps.
If None (default), use “max_decoding_length” defined in
hparams
. Ignored in “train_greedy” decoding. - impute_finished (bool) – If True, then states for batch entries which are marked as finished get copied through and the corresponding outputs get zeroed out. This causes some slowdown at each time step, but ensures that the final state and outputs have the correct values and that backprop ignores time steps that were marked as finished. Ignored in “train_greedy” decoding.
- embedding (optional) – Embedding used when “infer_greedy” or “infer_sample” decoding_strategy, or beam search, is used. This can be a callable or the params argument for embedding_lookup. If a callable, it can take a vector tensor of token ids, or take two arguments (ids, times), where ids is a vector tensor of token ids, and times is a vector tensor of time steps (i.e., position ids). The latter case can be used when embedding is a combination of word embedding and position embedding.
- helper (optional) – An instance of
Helper that defines the
decoding strategy. If given,
decoding_strategy
is ignored. - mode (optional) – A tensor taking value in
tf.estimator.ModeKeys, including
TRAIN, EVAL, and PREDICT. Controls dropout mode.
If None (default),
texar.tf.global_mode()
is used.
Returns: For “train_greedy” decoding, returns an instance of
TransformerDecoderOutput
which contains sample_id and logits. For “infer_greedy” and “infer_sample” decoding or decoding with
helper
, returns a tuple (outputs, sequence_lengths), where outputs is an instance of TransformerDecoderOutput
as in “train_greedy”, and sequence_lengths is a Tensor of shape [batch_size] containing the length of each sample. For beam search decoding, returns a dict containing keys “sample_id” and “log_prob”.
- ”sample_id” is an int Tensor of shape [batch_size, max_time, beam_width] containing generated token indexes. sample_id[:, :, 0] is the most probable sample.
- ”log_prob” is a float Tensor of shape [batch_size, beam_width] containing the log probability of each sequence sample.
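A sketch of beam search decoding with this interface, assuming a TransformerDecoder instance decoder, and enc_outputs, data_batch, data, and dec_embedder as in the attention decoder example earlier:

bs_outputs = decoder(
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    beam_width=5,
    length_penalty=0.6,
    start_tokens=[data.target_vocab.bos_token_id] * 100,
    end_token=data.target_vocab.eos_token_id,
    embedding=dec_embedder,
    max_decoding_length=60)

best_ids = bs_outputs['sample_id'][:, :, 0]  # most probable beam
log_probs = bs_outputs['log_prob']           # [batch_size, beam_width]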
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{
    # Same as in TransformerEncoder
    "num_blocks": 6,
    "dim": 512,
    "embedding_dropout": 0.1,
    "residual_dropout": 0.1,
    "poswise_feedforward": default_transformer_poswise_net_hparams,
    "multihead_attention": {
        'name': 'multihead_attention',
        'num_units': 512,
        'num_heads': 8,
        'dropout_rate': 0.1,
        'output_dim': 512,
        'use_bias': False,
    },
    "initializer": None,
    "name": "transformer_decoder",

    # Additional for TransformerDecoder
    "embedding_tie": True,
    "output_layer_bias": False,
    "max_decoding_length": int(1e10),
}
Here:
- “num_blocks”: int
- Number of stacked blocks.
- “dim”: int
- Hidden dimension of the decoder.
- “embedding_dropout”: float
- Dropout rate of the input word and position embeddings.
- “residual_dropout”: float
- Dropout rate of the residual connections.
- “poswise_feedforward”: dict
Hyperparameters for a feed-forward network used in residual connections. Make sure the dimension of the output tensor is equal to dim.
See
default_transformer_poswise_net_hparams()
for details.- “multihead_attention”: dict
Hyperparameters for the multihead attention strategy. Make sure the output_dim in this module is equal to dim.
See
default_hparams()
for details.- “initializer”: dict, optional
- Hyperparameters of the default initializer that initializes
variables created in this module.
See
get_initializer()
for details. - “output_layer_bias”: bool
- Whether to use bias to the output layer.
Used only if
output_layer
is None when constructing the class instance. - “max_decoding_length”: int
The maximum allowed number of decoding steps. Set to a very large number to avoid the length constraint. Ignored if provided in
_build()
or “train_greedy” decoding is used.
- “name”: str
- Name of the module.
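A sketch of constructing the decoder with a few overridden hyperparameters and running teacher-forcing decoding; enc_outputs, dec_embedder, data, and data_batch are assumed as in the AttentionRNNDecoder example:

decoder = TransformerDecoder(
    vocab_size=data.target_vocab.size,
    hparams={"num_blocks": 6, "dim": 512})

outputs = decoder(
    decoding_strategy='train_greedy',
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    inputs=dec_embedder(data_batch['target_text_ids']))
# outputs.logits: [batch_size, max_time, vocab_size]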
-
batch_size
¶ The batch size of input values.
-
output_size
¶ Output size of one step.
-
output_dtype
¶ Types of output of one step.
-
initialize
(name=None)[source]¶ Called before any decoding iterations.
This method computes initial input values and initial state (i.e. cache).
Parameters: name – Name scope for any created operations. Returns: (finished, initial_inputs, initial_state), representing initial values of finished flags, inputs and state (i.e. cache).
-
step
(time, inputs, state, name=None)[source]¶ Called per step of decoding.
Parameters: - time – Scalar int32 tensor. Current step number.
- inputs – Input tensor for this time step.
- state – State (i.e. cache) from previous time step.
- name – Name scope for any created operations.
Returns: (outputs, next_state, next_inputs, finished). outputs is an object containing the decoder output, next_state is the state (i.e. cache), next_inputs is the tensor that should be used as input for the next step, finished is a boolean tensor telling whether the sequence is complete, for each sequence in the batch.
-
vocab_size
¶ The vocab size.
TransformerDecoderOutput¶
-
class
texar.tf.modules.
TransformerDecoderOutput
[source]¶ The output of
TransformerDecoder
.-
logits
¶ A float Tensor of shape [batch_size, max_time, vocab_size] containing the logits.
-
sample_id
¶ An int Tensor of shape [batch_size, max_time] containing the sampled token indexes.
-
Helper¶
-
class
texar.tf.modules.
Helper
[source]¶ Interface for implementing different decoding strategies in
RNN decoders
and Transformer decoder
. Adapted from the tensorflow.contrib.seq2seq package.
-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
GreedyEmbeddingHelper¶
-
class
texar.tf.modules.
GreedyEmbeddingHelper
(embedding, start_tokens, end_token)[source]¶ A helper for use during inference.
Uses the argmax of the output (treated as logits) and passes the result through an embedding layer to get the next input.
Note that for greedy decoding, Texar’s decoders provide a simpler interface by specifying decoding_strategy=’infer_greedy’ when calling a decoder (see, e.g.,
RNN decoder
). In this case, use of GreedyEmbeddingHelper is not necessary.-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
SampleEmbeddingHelper¶
-
class
texar.tf.modules.
SampleEmbeddingHelper
(embedding, start_tokens, end_token, softmax_temperature=None, seed=None)[source]¶ A helper for use during inference.
Uses sampling (from a distribution) instead of argmax and passes the result through an embedding layer to get the next input.
Note that for sample decoding, Texar’s decoders provide a simpler interface by specifying decoding_strategy=’infer_sample’ when calling a decoder (see, e.g.,
RNN decoder
). In this case, use of SampleEmbeddingHelper is not necessary.-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
initialize
(name=None)¶ Returns (initial_finished, initial_inputs).
-
next_inputs
(time, outputs, state, sample_ids, name=None)¶ Gets the inputs for next step.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
TopKSampleEmbeddingHelper¶
-
class
texar.tf.modules.
TopKSampleEmbeddingHelper
(embedding, start_tokens, end_token, top_k=10, softmax_temperature=None, seed=None)[source]¶ A helper for use during inference.
Samples from top_k most likely candidates from a vocab distribution, and passes the result through an embedding layer to get the next input.
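A usage sketch, assuming decoder, embedder, and data as in the RNN decoder examples; top_k and the temperature are illustrative values:

helper = TopKSampleEmbeddingHelper(
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id] * 100,
    end_token=data.vocab.eos_token_id,
    top_k=10,
    softmax_temperature=0.7)

outputs, _, seq_lens = decoder(
    helper=helper, max_decoding_length=60)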
-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
initialize
(name=None)¶ Returns (initial_finished, initial_inputs).
-
next_inputs
(time, outputs, state, sample_ids, name=None)¶ Gets the inputs for next step.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
SoftmaxEmbeddingHelper¶
-
class
texar.tf.modules.
SoftmaxEmbeddingHelper
(embedding, start_tokens, end_token, tau, embedding_size=None, stop_gradient=False, use_finish=True)[source]¶ A helper that feeds softmax probabilities over vocabulary to the next step. Uses the softmax probability vector to pass through word embeddings to get the next input (i.e., a mixed word embedding).
A subclass of Helper. Used as a helper to
RNNDecoderBase
_build()
in inference mode. Parameters: - embedding – A callable or the params argument for tf.nn.embedding_lookup. If a callable, it can take a float tensor named soft_ids which is a distribution over indexes. For example, the shape of the tensor is typically [batch_size, vocab_size]. The callable can also take two arguments (soft_ids, times), where soft_ids is as above, and times is an int vector tensor of current time steps (i.e., position ids). The latter case can be used when embedding is a combination of word embedding and position embedding.
- start_tokens – An int tensor shaped [batch_size]. The start tokens.
- end_token – An int scalar tensor. The token that marks end of decoding.
- tau – A float scalar tensor, the softmax temperature.
- embedding_size (optional) – An int scalar tensor, the number of
embedding vectors. Usually it is the vocab size. Required if
embedding
is a callable. - stop_gradient (bool) – Whether to stop the gradient backpropagation when feeding softmax vector to the next step.
- use_finish (bool) – Whether to stop decoding once end_token is generated. If False, decoding will continue until max_decoding_length of the decoder is reached.
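A usage sketch, assuming decoder, embedder, and data as in earlier examples; tau=0.5 is an illustrative temperature, and embedding_size is given because the embedder is a callable:

helper = SoftmaxEmbeddingHelper(
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id] * 100,
    end_token=data.vocab.eos_token_id,
    tau=0.5,
    embedding_size=data.vocab.size)

# Each step feeds the softmax distribution through the embedding,
# keeping decoding differentiable end-to-end
outputs, _, seq_lens = decoder(helper=helper, max_decoding_length=60)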
-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
GumbelSoftmaxEmbeddingHelper¶
-
class
texar.tf.modules.
GumbelSoftmaxEmbeddingHelper
(embedding, start_tokens, end_token, tau, embedding_size=None, straight_through=False, stop_gradient=False, use_finish=True)[source]¶ A helper that feeds the Gumbel softmax sample to the next step. Uses the Gumbel softmax vector to pass through word embeddings to get the next input (i.e., a mixed word embedding).
A subclass of Helper. Used as a helper to
RNNDecoderBase
_build()
in inference mode. Same as
SoftmaxEmbeddingHelper
except that here Gumbel softmax (instead of softmax) is used. Parameters: - embedding – A callable or the params argument for tf.nn.embedding_lookup. If a callable, it can take a float tensor named soft_ids which is a distribution over indexes. For example, the shape of the tensor is typically [batch_size, vocab_size]. The callable can also take two arguments (soft_ids, times), where soft_ids is as above, and times is an int vector tensor of current time steps (i.e., position ids). The latter case can be used when embedding is a combination of word embedding and position embedding.
- start_tokens – An int tensor shaped [batch_size]. The start tokens.
- end_token – An int scalar tensor. The token that marks end of decoding.
- tau – A float scalar tensor, the softmax temperature.
- embedding_size (optional) – An int scalar tensor, the number of
embedding vectors. Usually it is the vocab size. Required if
embedding
is a callable. - straight_through (bool) – Whether to use straight through gradient between time steps. If True, a single token with highest probability (i.e., greedy sample) is fed to the next step and gradient is computed using straight through. If False (default), the soft gumbel-softmax distribution is fed to the next step.
- stop_gradient (bool) – Whether to stop the gradient backpropagation when feeding softmax vector to the next step.
- use_finish (bool) – Whether to stop decoding once end_token is generated. If False, decoding will continue until max_decoding_length of the decoder is reached.
-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
initialize
(name=None)¶ Returns (initial_finished, initial_inputs).
-
next_inputs
(time, outputs, state, sample_ids, name=None)¶ Returns (finished, next_inputs, next_state).
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
TrainingHelper¶
-
class
texar.tf.modules.
TrainingHelper
(inputs, sequence_length, time_major=False, name=None)[source]¶ A helper for use during training. Performs teacher-forcing decoding.
Returned sample_ids are the argmax of the RNN output logits.
Note that for teacher-forcing decoding, Texar’s decoders provide a simpler interface by specifying decoding_strategy=’train_greedy’ when calling a decoder (see, e.g.,
RNN decoder
). In this case, use of TrainingHelper is not necessary.-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
ScheduledEmbeddingTrainingHelper¶
-
class
texar.tf.modules.
ScheduledEmbeddingTrainingHelper
(inputs, sequence_length, embedding, sampling_probability, time_major=False, seed=None, scheduling_seed=None, name=None)[source]¶ A training helper that adds scheduled sampling.
Returns -1s for sample_ids where no sampling took place; valid sample id values elsewhere.
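A usage sketch, assuming embedder, decoder, and data_batch as in the RNN decoder examples; the sampling probability is an illustrative value:

helper = ScheduledEmbeddingTrainingHelper(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'] - 1,
    embedding=embedder,
    sampling_probability=0.25)  # feed the model's own sample 25% of the time

outputs, _, _ = decoder(helper=helper)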
-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
ScheduledOutputTrainingHelper¶
-
class
texar.tf.modules.
ScheduledOutputTrainingHelper
(inputs, sequence_length, sampling_probability, time_major=False, seed=None, next_inputs_fn=None, auxiliary_inputs=None, name=None)[source]¶ A training helper that adds scheduled sampling directly to outputs.
Returns False for sample_ids where no sampling took place; True elsewhere.
-
next_inputs
(time, outputs, state, sample_ids, name=None)[source]¶ Gets the next inputs for next step.
-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
InferenceHelper¶
-
class
texar.tf.modules.
InferenceHelper
(sample_fn, sample_shape, sample_dtype, start_inputs, end_fn, next_inputs_fn=None)[source]¶ A helper to use during inference with a custom sampling function.
-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
CustomHelper¶
-
class
texar.tf.modules.
CustomHelper
(initialize_fn, sample_fn, next_inputs_fn, sample_ids_shape=None, sample_ids_dtype=None)[source]¶ Base abstract class that allows the user to customize decoding.
-
batch_size
¶ Batch size of tensor returned by sample.
Returns a scalar int32 tensor.
-
sample_ids_shape
¶ Shape of tensor returned by sample, excluding the batch dimension.
Returns a TensorShape.
-
sample_ids_dtype
¶ DType of tensor returned by sample.
Returns a DType.
-
get_helper¶
-
texar.tf.modules.
get_helper
(helper_type, inputs=None, sequence_length=None, embedding=None, start_tokens=None, end_token=None, **kwargs)[source]¶ Creates a Helper instance.
Parameters: - helper_type – A Helper class, its name or module path, or a class instance. If a class instance is given, it is returned directly.
- inputs (optional) – Inputs to the RNN decoder, e.g., ground truth tokens for teacher forcing decoding.
- sequence_length (optional) – A 1D int Tensor containing the
sequence length of
inputs
. - embedding (optional) – A callable that takes a vector tensor of
indexes (e.g., an instance of subclass of
EmbedderBase
), or the params argument for embedding_lookup (e.g., the embedding Tensor). - start_tokens (optional) – An int Tensor of shape [batch_size], the start tokens.
- end_token (optional) – An int 0D Tensor, the token that marks end of decoding.
- **kwargs – Additional keyword arguments for constructing the helper.
Returns: A helper instance.
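A usage sketch, assuming embedder, decoder, and data as in earlier examples; the temperature is forwarded through **kwargs to the helper constructor:

helper = tx.modules.get_helper(
    helper_type="SampleEmbeddingHelper",
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id] * 100,
    end_token=data.vocab.eos_token_id,
    softmax_temperature=0.7)

outputs, _, _ = decoder(helper=helper, max_decoding_length=60)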
Classifiers¶
Conv1DClassifier¶
-
class
texar.tf.modules.
Conv1DClassifier
(hparams=None)[source]¶ Simple Conv-1D classifier. This is a combination of the
Conv1DEncoder
with a classification layer. Parameters: hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams()
for the hyperparameter structure and default values. Example
clas = Conv1DClassifier(hparams={'num_classes': 10})
inputs = tf.random_uniform([64, 20, 256])
logits, pred = clas(inputs)
# logits == Tensor of shape [64, 10]
# pred   == Tensor of shape [64]
-
_build
(inputs, sequence_length=None, dtype=None, mode=None)[source]¶ Feeds the inputs through the network and makes classification.
The arguments are the same as in
Conv1DEncoder
.The predictions of binary classification (“num_classes”=1) and multi-way classification (“num_classes”>1) are different, as explained below.
Parameters: - inputs – The inputs to the network, which is a 3D tensor. See
Conv1DEncoder
for more details. - sequence_length (optional) – An int tensor of shape [batch_size]
containing the length of each element in
inputs
. If given, time steps beyond the length will first be masked out before feeding to the layers. - dtype (optional) – Type of the inputs. If not provided, infers from inputs automatically.
- mode (optional) – A tensor taking value in
tf.estimator.ModeKeys, including
TRAIN, EVAL, and PREDICT. If None,
texar.tf.global_mode()
is used.
Returns: A tuple (logits, pred), where
- `logits` is a Tensor of shape [batch_size, num_classes] for num_classes >1, and [batch_size] for num_classes =1 (i.e., binary classification).
- `pred` is the prediction, a Tensor of shape [batch_size] and type tf.int64. For binary classification, the standard sigmoid function is used for prediction, and the class labels are {0, 1}.
- inputs – The inputs to the network, which is a 3D tensor. See
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{
    # (1) Same hyperparameters as in Conv1DEncoder
    ...

    # (2) Additional hyperparameters
    "num_classes": 2,
    "logit_layer_kwargs": {
        "use_bias": False
    },
    "name": "conv1d_classifier"
}
Here:
1. Same hyperparameters as in
Conv1DEncoder
. See thedefault_hparams()
. An instance of Conv1DEncoder is created for feature extraction.Additional hyperparameters:
- “num_classes”: int
Number of classes:
- If `> 0`, an additional Dense layer is appended to the encoder to compute the logits over classes.
- If `<= 0`, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
- “logit_layer_kwargs”: dict
Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to “num_classes”. Ignored if no extra logit layer is appended.
- “name”: str
Name of the classifier.
-
trainable_variables
¶ The list of trainable variables of the module.
-
num_classes
¶ The number of classes.
-
nn
¶ The classifier neural network.
-
has_layer
(layer_name)[source]¶ Returns True if a layer with the given name exists. Returns False otherwise.
Parameters: layer_name (str) – Name of the layer.
-
layer_by_name
(layer_name)[source]¶ Returns the layer with the given name. Returns None if the layer name does not exist.
Parameters: layer_name (str) – Name of the layer.
-
layers_by_name
¶ A dictionary mapping layer names to the layers.
-
layers
¶ A list of the layers.
-
layer_names
¶ A list of uniquified layer names.
-
layer_outputs_by_name
(layer_name)[source]¶ Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.
Parameters: layer_name (str) – Name of the layer.
-
layer_outputs
¶ A list containing output tensors of each layer.
-
name
¶ The uniquified name of the module.
-
variable_scope
¶ The variable scope of the module.
-
UnidirectionalRNNClassifier¶
-
class
texar.tf.modules.
UnidirectionalRNNClassifier
(cell=None, cell_dropout_mode=None, output_layer=None, hparams=None)[source]¶ One directional RNN classifier. This is a combination of the
UnidirectionalRNNEncoder
with a classification layer. Both step-wise classification and sequence-level classification are supported, specified inhparams
.Arguments are the same as in
UnidirectionalRNNEncoder
.Parameters: - cell – (RNNCell, optional) If not specified,
a cell is created as specified in
hparams["rnn_cell"]
. - cell_dropout_mode (optional) – A Tensor taking value of
tf.estimator.ModeKeys, which
toggles dropout in the RNN cell (e.g., activates dropout in
TRAIN mode). If None,
global_mode()
is used. Ignored if cell
is given. - output_layer (optional) – An instance of
tf.layers.Layer. Applies to the RNN cell
output of each step. If None (default), the output layer is
created as specified in
hparams["output_layer"]
. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
_build
(inputs, sequence_length=None, initial_state=None, time_major=False, mode=None, **kwargs)[source]¶ Feeds the inputs through the network and makes classification.
The arguments are the same as in
UnidirectionalRNNEncoder
.Parameters: - inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time may be exchanged if time_major=True is specified.
- sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element’s sequence length.
- initial_state (optional) – Initial state of the RNN.
- time_major (bool) – The shape format of the
inputs
andoutputs
Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth]. - mode (optional) – A tensor taking value in
tf.estimator.ModeKeys, including
TRAIN, EVAL, and PREDICT. Controls output layer dropout
if the output layer is specified with
hparams
. If None (default),texar.tf.global_mode()
is used. - return_cell_output (bool) – Whether to return the output of the RNN cell. This is the results prior to the output layer.
- **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.
Returns: A tuple (logits, pred), containing the logits over classes and the predictions, respectively.
If “clas_strategy”==”final_time” or “all_time”:
- If “num_classes”==1, logits and pred are both of shape [batch_size].
- If “num_classes”>1, logits is of shape [batch_size, num_classes] and pred is of shape [batch_size].
If “clas_strategy”==”time_wise”:
- If “num_classes”==1, logits and pred are both of shape [batch_size, max_time].
- If “num_classes”>1, logits is of shape [batch_size, max_time, num_classes] and pred is of shape [batch_size, max_time].
- If time_major is True, the batch and time dimensions are exchanged.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{
    # (1) Same hyperparameters as in UnidirectionalRNNEncoder
    ...

    # (2) Additional hyperparameters
    "num_classes": 2,
    "logit_layer_kwargs": None,
    "clas_strategy": "final_time",
    "max_seq_length": None,
    "name": "unidirectional_rnn_classifier"
}
Here:
1. Same hyperparameters as in
UnidirectionalRNNEncoder
. See thedefault_hparams()
. An instance of UnidirectionalRNNEncoder is created for feature extraction.Additional hyperparameters:
- “num_classes”: int
Number of classes:
- If `> 0`, an additional Dense layer is appended to the encoder to compute the logits over classes.
- If `<= 0`, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
- “logit_layer_kwargs”: dict
Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to “num_classes”. Ignored if no extra logit layer is appended.
- “clas_strategy”: str
The classification strategy, one of:
- “final_time”: Sequence-level classification based on the output of the final time step. One sequence has one class.
- “all_time”: Sequence-level classification based on the output of all time steps. One sequence has one class.
- “time_wise”: Step-wise classification, i.e., make classification for each time step based on its output.
- “max_seq_length”: int, optional
Maximum possible length of input sequences. Required if “clas_strategy” is “all_time”.
- “name”: str
Name of the classifier.
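A usage sketch with randomly generated inputs; the shapes and number of classes are illustrative assumptions:

clas = UnidirectionalRNNClassifier(
    hparams={"num_classes": 10, "clas_strategy": "final_time"})

inputs = tf.random_uniform([64, 20, 256])  # [batch_size, max_time, dim]
logits, pred = clas(inputs)
# logits: [64, 10];  pred: [64]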
-
name
¶ The uniquified name of the module.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
BertClassifier¶
-
class
texar.tf.modules.
BERTClassifier
(pretrained_model_name=None, cache_dir=None, hparams=None)[source]¶ Classifier based on BERT modules. Please see
PretrainedBERTMixin
for a brief description of BERT.This is a combination of the
BertEncoder
with a classification layer. Both step-wise classification and sequence-level classification are supported, specified inhparams
.Arguments are the same as in
BERTEncoder
.Parameters: - pretrained_model_name (optional) – a str, the name
of pre-trained model (e.g.,
bert-base-uncased
). Please refer toPretrainedBERTMixin
for all supported models. If None, the model name inhparams
is used. - cache_dir (optional) – the path to a folder in which the
pre-trained models will be cached. If None (default),
a default directory (
texar_data
folder under user’s home directory) will be used. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
_build
(inputs, sequence_length=None, segment_ids=None, mode=None, **kwargs)[source]¶ Feeds the inputs through the network and makes classification.
The arguments are the same as in
BertEncoder
.Parameters: - inputs – A 2D Tensor of shape [batch_size, max_time], containing the token ids of tokens in input sequences.
- sequence_length (optional) – A 1D Tensor of shape [batch_size]. Input tokens beyond respective sequence lengths are masked out automatically.
- segment_ids (optional) – A 2D Tensor of shape [batch_size, max_time], containing the segment ids of tokens in input sequences. If None (default), a tensor with all elements set to zero is used.
- mode (optional) – A tensor taking value in
tf.estimator.ModeKeys,
including TRAIN, EVAL, and PREDICT. Used to toggle
dropout.
If None (default),
texar.tf.global_mode()
is used. - **kwargs – Keyword arguments.
Returns: A tuple (logits, pred), containing the logits over classes and the predictions, respectively.
If “clas_strategy” == “cls_time” or “all_time”:
- If “num_classes” == 1, logits and pred are both of shape [batch_size].
- If “num_classes” > 1, logits is of shape [batch_size, num_classes] and pred is of shape [batch_size].
If “clas_strategy” == “time_wise”:
- If “num_classes” == 1, logits and pred are both of shape [batch_size, max_time].
- If “num_classes” > 1, logits is of shape [batch_size, max_time, num_classes] and pred is of shape [batch_size, max_time].
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ # (1) Same hyperparameters as in BERTEncoder ... # (2) Additional hyperparameters "num_classes": 2, "logit_layer_kwargs": None, "clas_strategy": "cls_time", "max_seq_length": None, "dropout": 0.1, "name": "bert_classifier" }
Here:
1. Same hyperparameters as in
BERTEncoder
. See its default_hparams() for details.
An instance of BERTEncoder is created for feature extraction.
Additional hyperparameters:
- “num_classes”: int
Number of classes:
- If > 0, an additional Dense layer is appended to the encoder to compute the logits over classes.
- If <= 0, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
- “logit_layer_kwargs”: dict
Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to num_classes. Ignored if no extra logit layer is appended.
- “clas_strategy”: str
The classification strategy, one of:
- cls_time: Sequence-level classification based on the output of the first time step (which is the CLS token). Each sequence has a class.
- all_time: Sequence-level classification based on the output of all time steps. Each sequence has a class.
- time_wise: Step-wise classification, i.e., make classification for each time step based on its output.
- “max_seq_length”: int, optional
Maximum possible length of input sequences. Required if clas_strategy is all_time.
- “dropout”: float
The dropout rate of the BERT encoder output.
- “name”: str
Name of the classifier.
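For illustration, a minimal usage sketch (the model name and sequence length are illustrative assumptions; downloading the checkpoint is assumed to succeed):

import tensorflow as tf
from texar.tf.modules import BERTClassifier

clas = BERTClassifier(pretrained_model_name='bert-base-uncased')
input_ids = tf.placeholder(tf.int64, shape=[None, 128])
logits, pred = clas(input_ids)
# With the default "clas_strategy" == "cls_time" and "num_classes" == 2:
# logits: [batch_size, 2], pred: [batch_size]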
-
classmethod
download_checkpoint
(pretrained_model_name, cache_dir=None)¶ Download the specified pre-trained checkpoint, and return the directory in which the checkpoint is cached.
Parameters: - pretrained_model_name (str) – Name of the pre-trained checkpoint to download. - cache_dir (optional) – Path to the folder in which the checkpoint will be cached. If None, a default directory is used.
Returns: Path to the cache directory.
-
load_pretrained_config
(pretrained_model_name=None, cache_dir=None, hparams=None)¶ Load paths and configurations of the pre-trained model.
Parameters: - pretrained_model_name (optional) – A str with the name
of a pre-trained model to load. If None, will use the model
name in
hparams
. - cache_dir (optional) – The path to a folder in which the pre-trained models will be cached. If None (default), a default directory will be used.
- hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameter will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
name
¶ The uniquified name of the module.
-
reset_parameters
()¶ Initialize parameters of the pre-trained model. This method is only called if pre-trained checkpoints are not loaded.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
XLNetClassifier¶
-
class
texar.tf.modules.
XLNetClassifier
(pretrained_model_name=None, cache_dir=None, hparams=None)[source]¶ Classifier based on XLNet modules. Please see
PretrainedXLNetMixin
for a brief description of XLNet. This is a combination of the
XLNetEncoder
with a classification layer. Both step-wise classification and sequence-level classification are supported, specified in hparams
.Arguments are the same as in
XLNetEncoder
.Parameters: - pretrained_model_name (optional) – a str, the name
of pre-trained model (e.g.,
xlnet-base-cased
). Please refer toPretrainedXLNetMixin
for all supported models. If None, the model name inhparams
is used. - cache_dir (optional) – the path to a folder in which the
pre-trained models will be cached. If None (default),
a default directory (
texar_data
folder under user’s home directory) will be used. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
_build
(token_ids, segment_ids=None, input_mask=None, mode=None)[source]¶ Feeds the inputs through the network and makes classification.
Parameters: - token_ids – Shape [batch_size, max_time].
- segment_ids – Shape [batch_size, max_time].
- input_mask – Float tensor of shape [batch_size, max_time]. Note that positions with value 1 are masked out.
- mode (optional) – A tensor taking value in
tf.estimator.ModeKeys,
including TRAIN, EVAL, and PREDICT. Used to toggle
dropout.
If None (default),
texar.tf.global_mode()
is used.
Returns: A tuple (logits, pred), containing the logits over classes and the predictions, respectively.
If clas_strategy is cls_time or all_time:
- If num_classes == 1, logits and pred are both of shape [batch_size].
- If num_classes > 1, logits is of shape [batch_size, num_classes] and pred is of shape [batch_size].
If clas_strategy is time_wise:
- If num_classes == 1, logits and pred are both of shape [batch_size, max_time].
- If num_classes > 1, logits is of shape [batch_size, max_time, num_classes] and pred is of shape [batch_size, max_time].
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ # (1) Same hyperparameters as in XLNetEncoder ... # (2) Additional hyperparameters "clas_strategy": "cls_time", "use_projection": True, "num_classes": 2, "logit_layer_kwargs": None, "name": "xlnet_classifier", }
Here:
- Same hyperparameters as in
XLNetEncoder
. See its default_hparams() for details.
An instance of XLNetEncoder is created for feature extraction.
Additional hyperparameters:
- “clas_strategy”: str
The classification strategy, one of:
- cls_time: Sequence-level classification based on the output of the last time step (which is the CLS token). Each sequence has a class.
- all_time: Sequence-level classification based on the output of all time steps. Each sequence has a class.
- time_wise: Step-wise classification, i.e., make classification for each time step based on its output.
- “use_projection”: bool
If True, an additional Dense layer is added after the summary step.
- “num_classes”: int
Number of classes:
- If > 0, an additional dense layer is appended to the encoder to compute the logits over classes.
- If <= 0, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
- “logit_layer_kwargs” : dict
Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to “num_classes”. Ignored if no extra logit layer is appended.
- “name”: str
Name of the classifier.
-
param_groups
(lr=None, lr_layer_scale=1.0, decay_base_params=False)[source]¶ Create parameter groups for optimizers. When
lr_layer_decay_rate
is not 1.0, parameters from each layer form separate groups with different base learning rates. This method should be called before applying gradients to the variables through the optimizer. Particularly, after calling the optimizer’s compute_gradients method, the user can call this method to get variable-specific learning rates for the network. The gradients for each variable can then be scaled accordingly. These scaled gradients are finally applied by calling the optimizer’s apply_gradients method.
Parameters: - lr (float) – The learning rate. Can be omitted if
lr_layer_decay_rate
is 1.0. - lr_layer_scale (float) – Per-layer LR scaling rate. The i-th layer will be scaled by lr_layer_scale ^ (num_layers - i - 1).
- decay_base_params (bool) – If True, treat non-layer parameters (e.g. embeddings) as if they’re in layer 0. If False, these parameters are not scaled.
Returns: A dict mapping tensorflow variables to their learning rates.
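For illustration, a hedged sketch of wiring param_groups into a TF1-style training loop (the classifier, loss, decay rate, and shapes are illustrative assumptions, not part of the API reference):

import tensorflow as tf
from texar.tf.modules import XLNetClassifier

clas = XLNetClassifier()
token_ids = tf.placeholder(tf.int64, shape=[None, 128])
labels = tf.placeholder(tf.int64, shape=[None])
logits, _ = clas(token_ids)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits))

base_lr = 1e-5
optimizer = tf.train.AdamOptimizer(base_lr)
grads_and_vars = optimizer.compute_gradients(loss)
# Map each variable to its layer-decayed learning rate.
var_lr = clas.param_groups(lr=base_lr, lr_layer_scale=0.75)

def _scale(grad, factor):
    # Embedding lookups yield IndexedSlices; scale their values directly.
    if isinstance(grad, tf.IndexedSlices):
        return tf.IndexedSlices(grad.values * factor, grad.indices,
                                grad.dense_shape)
    return grad * factor

scaled = [(_scale(g, var_lr[v] / base_lr), v)
          for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(scaled)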
Regressors¶
XLNetRegressor¶
-
class
texar.tf.modules.
XLNetRegressor
(pretrained_model_name=None, cache_dir=None, hparams=None)[source]¶ Regressor based on XLNet modules. Please see
PretrainedXLNetMixin
for a brief description of XLNet. This is a combination of the
XLNetEncoder
with a regression layer. Both step-wise regression and sequence-level regression are supported, specified in hparams
.Arguments are the same as in
XLNetEncoder
.Parameters: - pretrained_model_name (optional) – a str, the name
of pre-trained model (e.g.,
xlnet-base-cased
). Please refer toPretrainedXLNetMixin
for all supported models. If None, the model name inhparams
is used. - cache_dir (optional) – the path to a folder in which the
pre-trained models will be cached. If None (default),
a default directory (
texar_data
folder under user’s home directory) will be used. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
_build
(token_ids, segment_ids=None, input_mask=None, mode=None)[source]¶ Feeds the inputs through the network and makes regression.
Parameters: - token_ids – Shape [batch_size, max_time].
- segment_ids – Shape [batch_size, max_time].
- input_mask – Float tensor of shape [batch_size, max_time]. Note that positions with value 1 are masked out.
- mode (optional) – A tensor taking value in
tf.estimator.ModeKeys,
including TRAIN, EVAL, and PREDICT. Used to toggle
dropout.
If None (default),
texar.tf.global_mode()
is used.
Returns: Regression predictions.
- If regr_strategy is cls_time or all_time, predictions have shape [batch_size].
- If regr_strategy is time_wise, predictions have shape [batch_size, max_time].
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ # (1) Same hyperparameters as in XLNetEncoder ... # (2) Additional hyperparameters "regr_strategy": "cls_time", "use_projection": True, "logit_layer_kwargs": None, "name": "xlnet_regressor", }
Here:
Same hyperparameters as in
XLNetEncoder
. See its default_hparams() for details.
An instance of XLNetEncoder is created for feature extraction.
Additional hyperparameters:
- “regr_strategy”: str
The regression strategy, one of:
- cls_time: Sequence-level regression based on the output of the last time step (which is the CLS token). Each sequence has a prediction.
- all_time: Sequence-level regression based on the output of all time steps. Each sequence has a prediction.
- time_wise: Step-wise regression, i.e., make regression for each time step based on its output.
- “logit_layer_kwargs” : dict
Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to 1 (the regression output dimension). Ignored if no extra logit layer is appended.
- “use_projection”: bool
If True, an additional dense layer is added after the summary step.
- “name”: str
Name of the regressor.
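For illustration, a minimal usage sketch (the model name and sequence length are illustrative assumptions):

import tensorflow as tf
from texar.tf.modules import XLNetRegressor

regressor = XLNetRegressor(pretrained_model_name='xlnet-base-cased')
token_ids = tf.placeholder(tf.int64, shape=[None, 128])
preds = regressor(token_ids)
# With the default "regr_strategy" == "cls_time": preds shape [batch_size]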
-
param_groups
(lr=None, lr_layer_scale=1.0, decay_base_params=False)[source]¶ Create parameter groups for optimizers. When
lr_layer_decay_rate
is not 1.0, parameters from each layer form separate groups with different base learning rates. This method should be called before applying gradients to the variables through the optimizer. Particularly, after calling the optimizer’s compute_gradients method, the user can call this method to get variable-specific learning rates for the network. The gradients for each variable can then be scaled accordingly. These scaled gradients are finally applied by calling the optimizer’s apply_gradients method.
Parameters: - lr (float) – The learning rate. Can be omitted if
lr_layer_decay_rate
is 1.0. - lr_layer_scale (float) – Per-layer LR scaling rate. The i-th layer will be scaled by lr_layer_scale ^ (num_layers - i - 1).
- decay_base_params (bool) – If True, treat non-layer parameters (e.g. embeddings) as if they’re in layer 0. If False, these parameters are not scaled.
Returns: A dict mapping tensorflow variables to their learning rates.
Pre-trained¶
PretrainedMixin¶
-
class
texar.tf.modules.
PretrainedMixin
(hparams=None)[source]¶ A mixin class for all pre-trained classes to inherit.
-
load_pretrained_config
(pretrained_model_name=None, cache_dir=None, hparams=None)[source]¶ Load paths and configurations of the pre-trained model.
Parameters: - pretrained_model_name (optional) – A str with the name
of a pre-trained model to load. If None, will use the model
name in
hparams
. - cache_dir (optional) – The path to a folder in which the pre-trained models will be cached. If None (default), a default directory will be used.
- hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameter will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
reset_parameters
()[source]¶ Initialize parameters of the pre-trained model. This method is only called if pre-trained checkpoints are not loaded.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "pretrained_model_name": None, "name": "pretrained_base" }
-
classmethod
download_checkpoint
(pretrained_model_name, cache_dir=None)[source]¶ Download the specified pre-trained checkpoint, and return the directory in which the checkpoint is cached.
Parameters: - pretrained_model_name (str) – Name of the pre-trained checkpoint to download. - cache_dir (optional) – Path to the folder in which the checkpoint will be cached. If None, a default directory is used.
Returns: Path to the cache directory.
-
classmethod
_transform_config
(pretrained_model_name, cache_dir)[source]¶ Load the official configuration file and transform it into Texar-style hyperparameters.
Parameters: - pretrained_model_name (str) – Name of the pre-trained model. - cache_dir (str) – Path to the cache directory.
Returns: Texar module hyperparameters.
Return type: dict
-
PretrainedBERTMixin¶
-
class
texar.tf.modules.
PretrainedBERTMixin
(hparams=None)[source]¶ A mixin class to support loading pre-trained checkpoints for modules that implement the BERT model.
The BERT model was proposed in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018). It is a bidirectional Transformer language model pre-trained on large text corpora. Available model names include:
bert-base-uncased
: 12-layer, 768-hidden, 12-heads, 110M parameters.bert-large-uncased
: 24-layer, 1024-hidden, 16-heads, 340M parameters.bert-base-cased
: 12-layer, 768-hidden, 12-heads, 110M parameters.bert-large-cased
: 24-layer, 1024-hidden, 16-heads, 340M parameters.bert-base-multilingual-uncased
: 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters.bert-base-multilingual-cased
: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters.bert-base-chinese
: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters.
We provide the following BERT classes:
BERTEncoder
for text encoding.BERTClassifier
for text classification and sequence tagging.
PretrainedXLNetMixin¶
-
class
texar.tf.modules.
PretrainedXLNetMixin
(hparams=None)[source]¶ A mixin class to support loading pre-trained checkpoints for modules that implement the XLNet model.
The XLNet model was proposed in XLNet: Generalized Autoregressive Pretraining for Language Understanding by Yang et al. It is based on the Transformer-XL model, pre-trained on a large corpus using a language modeling objective that considers all permutations of the input sentence.
The available XLNet models are as follows:
xlnet-base-cased
: 12-layer, 768-hidden, 12-heads. This model is trained on full data (different from the one in the paper).xlnet-large-cased
: 24-layer, 1024-hidden, 16-heads.
We provide the following XLNet classes:
XLNetEncoder
for text encoding.XLNetDecoder
for text generation and decoding.XLNetClassifier
for text classification and sequence tagging.XLNetRegressor
for text regression.
Connectors¶
ConnectorBase¶
-
class
texar.tf.modules.
ConnectorBase
(output_size, hparams=None)[source]¶ Base class inherited by all connector classes. A connector transforms inputs into outputs with any specified structure and shape, for example, transforming the final state of an encoder to the initial state of a decoder, or performing stochastic sampling in between as in Variational Autoencoders (VAEs).
Parameters: - output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a Tensorshape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
- hparams (dict, optional) – Hyperparameters. Missing
hyperparamerter will be set to default values. See
default_hparams()
for the hyperparameter sturcture and default values.
-
output_size
¶ The output size.
-
name
¶ The uniquified name of the module.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
ConstantConnector¶
-
class
texar.tf.modules.
ConstantConnector
(output_size, hparams=None)[source]¶ Creates a constant Tensor or (nested) tuple of Tensors that contains a constant value.
Parameters: - output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
- hparams (dict, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
This connector does not have trainable parameters. See
_build()
for the inputs and outputs of the connector.Example
connector = ConstantConnector(cell.state_size) zero_state = connector(batch_size=64, value=0.) one_state = connector(batch_size=64, value=1.)
-
_build
(batch_size, value=None)[source]¶ Creates output tensor(s) that has the given value.
Parameters: - batch_size – An int or int scalar Tensor, the batch size.
- value (optional) – A scalar, the value that
the output tensor(s) has. If None, “value” in
hparams
is used.
Returns: A (structure of) tensor whose structure is the same as
output_size
, with value specified by value or hparams
.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "value": 0., "name": "constant_connector" }
Here:
- “value”: float
- The constant scalar that the output tensor(s) has. Ignored if
value is given to
_build()
. - “name”: str
- Name of the connector.
-
name
¶ The uniquified name of the module.
-
output_size
¶ The output size.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
ForwardConnector¶
-
class
texar.tf.modules.
ForwardConnector
(output_size, hparams=None)[source]¶ Transforms inputs to have specified structure.
Parameters: - output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
- hparams (dict, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
This connector does not have trainable parameters. See
_build()
for the inputs and outputs of the connector.The input to the connector must have the same structure with
output_size
, or must have the same number of elements and be re-packable into the structure ofoutput_size
. Note that if input is or contains a dict instance, the keys will be sorted to pack in deterministic order (See pack_sequence_as for more details).Example
cell = LSTMCell(num_units=256) # cell.state_size == LSTMStateTuple(c=256, h=256) connector = ForwardConnector(cell.state_size) output = connector([tensor_1, tensor_2]) # output == LSTMStateTuple(c=tensor_1, h=tensor_2)
-
_build
(inputs)[source]¶ Transforms inputs to have the same structure as output_size. Values of the inputs are not changed. inputs must either have the same structure as, or have the same number of elements as, output_size.
Parameters: inputs – The input (structure of) tensor to pass forward.
Returns: A (structure of) tensors that re-packs inputs to have the specified structure of output_size.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "name": "forward_connector" }
Here:
- “name”: str
- Name of the connector.
-
name
¶ The uniquified name of the module.
-
output_size
¶ The output size.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
MLPTransformConnector¶
-
class
texar.tf.modules.
MLPTransformConnector
(output_size, hparams=None)[source]¶ Transforms inputs with an MLP layer and packs the results into the specified structure and size.
Parameters: - output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
- hparams (dict, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
See
_build()
for the inputs and outputs of the connector.The input to the connector can have arbitrary structure and size.
Example
cell = LSTMCell(num_units=256) # cell.state_size == LSTMStateTuple(c=256, h=256) connector = MLPTransformConnector(cell.state_size) inputs = tf.zeros([64, 10]) output = connector(inputs) # output == LSTMStateTuple(c=tensor_of_shape_(64, 256), # h=tensor_of_shape_(64, 256))
## Use to connect encoder and decoder with different state size encoder = UnidirectionalRNNEncoder(...) _, final_state = encoder(inputs=...) decoder = BasicRNNDecoder(...) connector = MLPTransformConnector(decoder.state_size) _ = decoder( initial_state=connector(final_state), ...)
-
_build
(inputs)[source]¶ Transforms inputs with an MLP layer and packs the results to have the same structure as specified by
output_size
.Parameters: inputs – Input (structure of) tensors to be transformed. Must be a Tensor of shape [batch_size, …] or a (nested) tuple of such Tensors. That is, the first dimension of (each) tensor must be the batch dimension. Returns: A Tensor or a (nested) tuple of Tensors of the same structure of output_size.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "activation_fn": "identity", "name": "mlp_connector" }
Here:
- “activation_fn”: str or callable
- The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
- “name”: str
- Name of the connector.
-
name
¶ The uniquified name of the module.
-
output_size
¶ The output size.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
ReparameterizedStochasticConnector¶
-
class
texar.tf.modules.
ReparameterizedStochasticConnector
(output_size, hparams=None)[source]¶ Samples from a distribution with reparameterization trick, and transforms samples into specified size.
Reparameterization allows gradients to be back-propagated through the stochastic samples. Used in, e.g., Variational Autoencoders (VAEs).
Parameters: - output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
- hparams (dict, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
Example
cell = LSTMCell(num_units=256) # cell.state_size == LSTMStateTuple(c=256, h=256) connector = ReparameterizedStochasticConnector(cell.state_size) kwargs = { 'loc': tf.zeros([batch_size, 10]), 'scale_diag': tf.ones([batch_size, 10]) } output, sample = connector(distribution_kwargs=kwargs) # output == LSTMStateTuple(c=tensor_of_shape_(batch_size, 256), # h=tensor_of_shape_(batch_size, 256)) # sample == Tensor([batch_size, 10]) kwargs = { 'loc': tf.zeros([10]), 'scale_diag': tf.ones([10]) } output_, sample_ = connector(distribution_kwargs=kwargs, num_samples=batch_size_) # output_ == LSTMStateTuple(c=tensor_of_shape_(batch_size_, 256), # h=tensor_of_shape_(batch_size_, 256)) # sample == Tensor([batch_size_, 10])
-
_build
(distribution='MultivariateNormalDiag', distribution_kwargs=None, transform=True, num_samples=None)[source]¶ Samples from a distribution and optionally performs transformation with an MLP layer.
The distribution must be reparameterizable, i.e., distribution.reparameterization_type = FULLY_REPARAMETERIZED.
Parameters: - distribution – A instance of subclass of TF Distribution, or tensorflow_probability Distribution, Can be a class, its name or module path, or a class instance.
- distribution_kwargs (dict, optional) – Keyword arguments for the distribution constructor. Ignored if distribution is a class instance.
- transform (bool) – Whether to perform MLP transformation of the
distribution samples. If False, the structure/shape of a
sample must match
output_size
. - num_samples (optional) – An int or int Tensor. Number of samples to generate. If not given, generate a single sample. Note that if batch size has already been included in distribution’s dimensionality, num_samples should be left as None.
Returns: A tuple (output, sample), where
- output: A Tensor or a (nested) tuple of Tensors with the same structure and size of output_size. The batch dimension equals num_samples if specified, or is determined by the distribution dimensionality. If transform is False, output will be equal to sample.
- sample: The sample from the distribution, prior to transformation.
Raises:
ValueError – If distribution cannot be reparameterized.
ValueError – If the output does not match output_size.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "activation_fn": "identity", "name": "reparameterized_stochastic_connector" }
Here:
- “activation_fn”: str
- The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
- “name”: str
- Name of the connector.
-
name
¶ The uniquified name of the module.
-
output_size
¶ The output size.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
StochasticConnector¶
-
class
texar.tf.modules.
StochasticConnector
(output_size, hparams=None)[source]¶ Samples from a distribution and transforms samples into specified size.
The connector is the same as
ReparameterizedStochasticConnector
, except that here reparameterization is disabled, and thus the gradients cannot be back-propagated through the stochastic samples.
Parameters: - output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
- hparams (dict, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
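Example (a hedged sketch; the batch size, latent dimension, and output size are illustrative assumptions):

import tensorflow as tf
from texar.tf.modules import StochasticConnector

connector = StochasticConnector(256)
kwargs = {
    'loc': tf.zeros([64, 10]),
    'scale_diag': tf.ones([64, 10])
}
output, sample = connector(distribution_kwargs=kwargs)
# output: Tensor of shape [64, 256], an MLP transform of `sample`.
# Unlike ReparameterizedStochasticConnector, gradients do not flow
# back through `sample` here.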
-
_build
(distribution='MultivariateNormalDiag', distribution_kwargs=None, transform=True, num_samples=None)[source]¶ Samples from a distribution and optionally performs transformation with an MLP layer.
The inputs and outputs are the same as
ReparameterizedStochasticConnector
except that the distribution does not need to be reparameterizable, and gradients cannot be back-propagated through the samples.
Parameters: - distribution – An instance of a subclass of TF Distribution, or a tensorflow_probability Distribution. Can be a class, its name or module path, or a class instance.
- distribution_kwargs (dict, optional) – Keyword arguments for the distribution constructor. Ignored if distribution is a class instance.
- transform (bool) – Whether to perform MLP transformation of the
distribution samples. If False, the structure/shape of a
sample must match
output_size
. - num_samples (optional) – An int or int Tensor. Number of samples to generate. If not given, generate a single sample. Note that if batch size has already been included in distribution’s dimensionality, num_samples should be left as None.
Returns: A tuple (output, sample), where
- output: A Tensor or a (nested) tuple of Tensors with the same structure and size of output_size. The batch dimension equals num_samples if specified, or is determined by the distribution dimensionality. If transform is False, output will be equal to sample.
- sample: The sample from the distribution, prior to transformation.
Raises:
ValueError – The output does not match output_size.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "activation_fn": "identity", "name": "stochastic_connector" }
Here:
- “activation_fn”: str
- The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
- “name”: str
- Name of the connector.
-
name
¶ The uniquified name of the module.
-
output_size
¶ The output size.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
Networks¶
FeedForwardNetworkBase¶
-
class
texar.tf.modules.
FeedForwardNetworkBase
(hparams=None)[source]¶ Base class inherited by all feed-forward network classes.
Parameters: hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams()
for the hyperparameter structure and default values. See
_build()
for the inputs and outputs.-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "name": "NN" }
-
append_layer
(layer)[source]¶ Appends a layer to the end of the network. The method is only feasible before
_build
is called.Parameters: layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
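For illustration, a minimal sketch of building a network incrementally, using the concrete FeedForwardNetwork subclass (the layer sizes are illustrative assumptions):

from texar.tf.modules import FeedForwardNetwork

nn = FeedForwardNetwork()  # "layers" defaults to an empty list
nn.append_layer({'type': 'Dense', 'kwargs': {'units': 256}})
nn.append_layer({'type': 'Dense', 'kwargs': {'units': 10}})
# Layers must be appended before the network is first called.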
-
has_layer
(layer_name)[source]¶ Returns True if the network has a layer with the given name; returns False otherwise.
Parameters: layer_name (str) – Name of the layer.
-
layer_by_name
(layer_name)[source]¶ Returns the layer with the given name, or None if no such layer exists.
Parameters: layer_name (str) – Name of the layer.
-
layers_by_name
¶ A dictionary mapping layer names to the layers.
-
layers
¶ A list of the layers.
-
layer_names
¶ A list of uniquified layer names.
-
layer_outputs_by_name
(layer_name)[source]¶ Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.
Parameters: layer_name (str) – Name of the layer.
-
layer_outputs
¶ A list containing output tensors of each layer.
-
name
¶ The uniquified name of the module.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
FeedForwardNetwork¶
-
class
texar.tf.modules.
FeedForwardNetwork
(layers=None, hparams=None)[source]¶ Feed-forward neural network that consists of a sequence of layers.
Parameters: - layers (list, optional) – A list of Layer
instances composing the network. If not given, layers are created
according to
hparams
- hparams (dict, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
See
_build()
ofFeedForwardNetworkBase
for the inputs and outputs.Example
hparams = { # Builds a two-layer dense NN "layers": [ { "type": "Dense", "kwargs": { "units": 256 } }, { "type": "Dense", "kwargs": { "units": 10 } } ] } nn = FeedForwardNetwork(hparams=hparams) inputs = tf.random_uniform([64, 100]) outputs = nn(inputs) # outputs == Tensor of shape [64, 10]
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "layers": [], "name": "NN" }
Here:
- “layers”: list
- A list of layer hyperparameters. See
get_layer()
for the details of layer hyperparameters. - “name”: str
- Name of the network.
-
append_layer
(layer)¶ Appends a layer to the end of the network. The method is only feasible before
_build
is called.Parameters: layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
-
has_layer
(layer_name)¶ Returns True if the network has a layer with the given name; returns False otherwise.
Parameters: layer_name (str) – Name of the layer.
-
layer_by_name
(layer_name)¶ Returns the layer with the given name, or None if no such layer exists.
Parameters: layer_name (str) – Name of the layer.
-
layer_names
¶ A list of uniquified layer names.
-
layer_outputs
¶ A list containing output tensors of each layer.
-
layer_outputs_by_name
(layer_name)¶ Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.
Parameters: layer_name (str) – Name of the layer.
-
layers
¶ A list of the layers.
-
layers_by_name
¶ A dictionary mapping layer names to the layers.
-
name
¶ The uniquified name of the module.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
Conv1DNetwork¶
-
class
texar.tf.modules.
Conv1DNetwork
(hparams=None)[source]¶ Simple Conv-1D network which consists of a sequence of conv layers followed with a sequence of dense layers.
Parameters: hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams()
for the hyperparameter structure and default values. See
_build()
for the inputs and outputs. The inputs must be a 3D Tensor of shape [batch_size, length, channels] (default), or [batch_size, channels, length] (if data_format is set to ‘channels_first’ through hparams
). For example, for sequence classification, length corresponds to time steps, and channels corresponds to embedding dim.Example
nn = Conv1DNetwork() # Use the default structure inputs = tf.random_uniform([64, 20, 256]) outputs = nn(inputs) # outputs == Tensor of shape [64, 128], because the final dense layer # has size 128.
-
_build
(inputs, sequence_length=None, dtype=None, mode=None)[source]¶ Feeds forward inputs through the network layers and returns outputs.
Parameters: - inputs – The inputs to the network, which is a 3D tensor.
- sequence_length (optional) – An int tensor of shape [batch_size]
containing the length of each element in
inputs
. If given, time steps beyond the length will first be masked out before feeding to the layers. - dtype (optional) – Type of the inputs. If not provided, infers from inputs automatically.
- mode (optional) – A tensor taking value in
tf.estimator.ModeKeys, including
TRAIN, EVAL, and PREDICT. If None,
texar.tf.global_mode()
is used.
Returns: The output of the final layer.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ # (1) Conv layers "num_conv_layers": 1, "filters": 128, "kernel_size": [3, 4, 5], "conv_activation": "relu", "conv_activation_kwargs": None, "other_conv_kwargs": None, # (2) Pooling layers "pooling": "MaxPooling1D", "pool_size": None, "pool_strides": 1, "other_pool_kwargs": None, # (3) Dense layers "num_dense_layers": 1, "dense_size": 128, "dense_activation": "identity", "dense_activation_kwargs": None, "final_dense_activation": None, "final_dense_activation_kwargs": None, "other_dense_kwargs": None, # (4) Dropout "dropout_conv": [1], "dropout_dense": [], "dropout_rate": 0.75, # (5) Others "name": "conv1d_network", }
Here:
For convolutional layers:
- “num_conv_layers”: int
Number of convolutional layers.
- “filters”: int or list
The number of filters in the convolution, i.e., the dimensionality of the output space. If “num_conv_layers” > 1, “filters” must be a list of “num_conv_layers” integers.
- “kernel_size”: int or list
Lengths of 1D convolution windows.
- If “num_conv_layers” == 1, this can be a list of arbitrarily many ints denoting conv windows of different sizes. The number of filters for each size is specified by “filters”. For example, the default values create 3 sets of filters with kernel sizes 3, 4, and 5, respectively, each containing 128 filters.
- If “num_conv_layers” > 1, this must be a list of length “num_conv_layers”. Each element can be an int or a list of arbitrary number of int denoting the kernel size of respective layer.
- “conv_activation”: str or callable
Activation function applied to the output of the convolutional layers. Set to “identity” to maintain a linear activation. See
get_activation_fn()
for more details.- “conv_activation_kwargs”: dict, optional
Keyword arguments for conv layer activation functions. See
get_activation_fn()
for more details.- “other_conv_kwargs”: dict, optional
Other keyword arguments for tf.layers.Conv1D constructor, e.g., “data_format”, “padding”, etc.
For pooling layers:
- “pooling”: str or class or instance
Pooling layer after each of the convolutional layer(s). Can be a pooling layer class, its name or module path, or a class instance.
- “pool_size”: int or list, optional
Size of the pooling window. If an int, all pooling layers will have the same pool size. If a list, the list length must equal “num_conv_layers”. If None and the pooling type is either MaxPooling or AveragePooling, the pool size will be set to the input size. That is, the output of the pooling layer is a single unit.
- “pool_strides”: int or list, optional
Strides of the pooling operation. If an int, all pooling layers will have the same stride. If a list, the list length must equal “num_conv_layers”.
- “other_pool_kwargs”: dict, optional
Other keyword arguments for pooling layer class constructor.
For dense layers (note that here dense layers always follow conv and pooling layers):
- “num_dense_layers”: int
Number of dense layers.
- “dense_size”: int or list
Number of units of each dense layer. If an int, all dense layers will have the same size. If a list of ints, the list length must equal “num_dense_layers”.
- “dense_activation”: str or callable
Activation function applied to the output of the dense layers except the last dense layer. Set to “identity” to maintain a linear activation. See
get_activation_fn()
for more details.- “dense_activation_kwargs”: dict, optional
Keyword arguments for dense layer activation functions before the last dense layer. See
get_activation_fn()
for more details.- “final_dense_activation”: str or callable
Activation function applied to the output of the last dense layer. Set to None or “identity” to maintain a linear activation. See
get_activation_fn()
for more details.- “final_dense_activation_kwargs”: dict, optional
Keyword arguments for the activation function of last dense layer. See
get_activation_fn()
for more details.- “other_dense_kwargs”: dict, optional
Other keyword arguments for Dense layer class constructor.
For dropouts:
- “dropout_conv”: int or list
The indexes of conv layers (starting from 0) whose inputs are applied with dropout. The index =
num_conv_layers
means dropout applies to the final conv layer output. E.g.,{ "num_conv_layers": 2, "dropout_conv": [0, 2] }
leads to a series of layers -dropout-conv0-conv1-dropout-.
The dropout mode (training or not) is controlled by the
mode
argument of_build()
.- “dropout_dense”: int or list
Same as “dropout_conv” but applied to dense layers (index starting from 0).
- “dropout_rate”: float
The dropout rate, between 0 and 1. E.g., “dropout_rate”: 0.1 would drop out 10% of elements.
Others:
- “name”: str
Name of the network.
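For illustration, a hedged sketch of a customized network (all hyperparameter values are illustrative assumptions, not recommendations):

import tensorflow as tf
from texar.tf.modules import Conv1DNetwork

hparams = {
    "num_conv_layers": 1,
    "filters": 64,
    "kernel_size": [3, 5],    # two conv windows, of sizes 3 and 5
    "num_dense_layers": 2,
    "dense_size": [128, 10],  # one list entry per dense layer
}
nn = Conv1DNetwork(hparams=hparams)
inputs = tf.random_uniform([64, 20, 256])
outputs = nn(inputs)  # final dense layer has 10 units -> shape [64, 10]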
-
append_layer
(layer)¶ Appends a layer to the end of the network. The method is only feasible before
_build
is called.Parameters: layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
-
has_layer
(layer_name)¶ Returns True if the network with the name exists. Returns False otherwise.
Parameters: layer_name (str) – Name of the layer.
-
layer_by_name
(layer_name)¶ Returns the layer with the name. Returns ‘None’ if the layer name does not exist.
Parameters: layer_name (str) – Name of the layer.
-
layer_names
¶ A list of uniquified layer names.
-
layer_outputs
¶ A list containing output tensors of each layer.
-
layer_outputs_by_name
(layer_name)¶ Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.
Parameters: layer_name (str) – Name of the layer.
-
layers
¶ A list of the layers.
-
layers_by_name
¶ A dictionary mapping layer names to the layers.
-
name
¶ The uniquified name of the module.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
Memory¶
MemNetBase¶
-
class
texar.tf.modules.
MemNetBase
(raw_memory_dim, input_embed_fn=None, output_embed_fn=None, query_embed_fn=None, hparams=None)[source]¶ Base class inherited by all memory network classes.
Parameters: - raw_memory_dim (int) – Dimension size of raw memory entries (before embedding). For example, if a raw memory entry is a word, this is the vocabulary size (imagine a one-hot representation of word). If a raw memory entry is a dense vector, this is the dimension size of the vector.
- input_embed_fn (optional) – A callable that embeds raw memory entries
as inputs.
This corresponds to the A embedding operation in
(Sukhbaatar et al.)
If not provided, a default embedding operation is created as
specified in
hparams
. Seeget_default_embed_fn()
for details. - output_embed_fn (optional) – A callable that embeds raw memory entries
as outputs.
This corresponds to the C embedding operation in
(Sukhbaatar et al.)
If not provided, a default embedding operation is created as
specified in
hparams
. Seeget_default_embed_fn()
for details. - query_embed_fn (optional) – A callable that embeds query.
This corresponds to the B embedding operation in
(Sukhbaatar et al.). If not provided and “use_B” is True
in
hparams
, a default embedding operation is created as specified inhparams
. Seeget_default_embed_fn()
for details. Notice: If you’d like to customize this callable, please follow the same number and style of dimensions as in input_embed_fn or output_embed_fn, and assume that the 2nd dimension of its input and output (which corresponds to memory_size) is 1. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
get_default_embed_fn
(memory_size, embed_fn_hparams)[source]¶ Creates a default embedding function. Can be used for A, C, or B operation.
For B operation (i.e., query_embed_fn),
memory_size
must be 1.The function is a combination of both memory embedding and temporal embedding, with the combination method specified by “combine_mode” in the embed_fn_hparams.
Parameters: - memory_size (int) – Size of the memory; must be 1 for the B operation. - embed_fn_hparams (dict or HParams) – Hyperparameters of the embedding function. See default_memnet_embed_fn_hparams()
for details.
Returns: A tuple (embed_fn, memory_dim), where
- memory_dim is the dimension of memory entry embedding, inferred from embed_fn_hparams:
- If combine_mode == ‘add’, memory_dim is the embedder dimension.
- If combine_mode == ‘concat’, memory_dim is the sum of the memory embedder dimension and the temporal embedder dimension.
- embed_fn is an embedding function that takes in memory and returns memory embedding. Specifically, the function has signature memory_embedding = embed_fn(memory=None, soft_memory=None), where exactly one of memory and soft_memory is provided (but not both):
- memory: An int Tensor of shape [batch_size, memory_size] containing memory indexes used for embedding lookup.
- soft_memory: A Tensor of shape [batch_size, memory_size, raw_memory_dim] containing soft weights used to mix the embedding vectors.
The function returns a Tensor of shape [batch_size, memory_size, memory_dim] containing the memory entry embeddings.
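For illustration, a hedged sketch of creating and calling such an embedding function (the vocabulary size is an illustrative assumption; the hparams values mirror the defaults of default_memnet_embed_fn_hparams()):

import tensorflow as tf
from texar.tf.modules import MemNetRNNLike

memnet = MemNetRNNLike(raw_memory_dim=10000)
embed_fn, memory_dim = memnet.get_default_embed_fn(
    memory_size=100,
    embed_fn_hparams={
        'embedding': {'dim': 100},
        'temporal_embedding': {'dim': 100},
        'combine_mode': 'add',
    })
memory = tf.random_uniform([32, 100], maxval=10000, dtype=tf.int64)
emb = embed_fn(memory=memory)  # shape: [32, 100, 100]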
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "n_hops": 1, "memory_dim": 100, "relu_dim": 50, "memory_size": 100, "A": default_embed_fn_hparams, "C": default_embed_fn_hparams, "B": default_embed_fn_hparams, "use_B": False, "use_H": False, "dropout_rate": 0, "variational": False, "name": "memnet", }
Here:
- “n_hops”: int
- Number of hops.
- “memory_dim”: int
- Memory dimension, i.e., the dimension size of a memory entry
embedding. Ignored if at least one of the embedding functions is
created according to
hparams
. In this casememory_dim
is inferred from the created embed_fn. - “relu_dim”: int
- Number of elements in
memory_dim
that have relu at the end of each hop. Should be not less than 0 and not more than memory_dim.
Number of entries in memory.
For example, the number of sentences {x_i} in Fig.1(a) of (Sukhbaatar et al.) End-To-End Memory Networks.
- “use_B”: bool
- Whether to create the query embedding function. Ignored if query_embed_fn is given to the constructor.
- “use_H”: bool
- Whether to perform a linear transformation with matrix H at the end of each A-C layer.
- “dropout_rate”: float
- The dropout rate to apply to the output of each hop. Should be between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the units.
- “variational”: bool
- Whether to share dropout masks after each hop.
-
memory_size
¶ The memory size.
-
raw_memory_dim
¶ The dimension of memory element (or vocabulary size).
-
memory_dim
¶ The dimension of embedded memory and all vectors in hops.
-
name
¶ The uniquified name of the module.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
MemNetRNNLike¶
-
class
texar.tf.modules.
MemNetRNNLike
(raw_memory_dim, input_embed_fn=None, output_embed_fn=None, query_embed_fn=None, hparams=None)[source]¶ An implementation of multi-layer end-to-end memory network, with RNN-like weight tying described in (Sukhbaatar et al.) End-To-End Memory Networks .
See
get_default_embed_fn()
for default embedding functions. Customized embedding functions must follow the same signature.Parameters: - raw_memory_dim (int) – Dimension size of raw memory entries (before embedding). For example, if a raw memory entry is a word, this is the vocabulary size (imagine a one-hot representation of word). If a raw memory entry is a dense vector, this is the dimension size of the vector.
- input_embed_fn (optional) – A callable that embeds raw memory entries
as inputs.
This corresponds to the A embedding operation in
(Sukhbaatar et al.)
If not provided, a default embedding operation is created as
specified in
hparams
. Seeget_default_embed_fn()
for details. - output_embed_fn (optional) – A callable that embeds raw memory entries
as outputs.
This corresponds to the C embedding operation in
(Sukhbaatar et al.)
If not provided, a default embedding operation is created as
specified in
hparams
. Seeget_default_embed_fn()
for details. - query_embed_fn (optional) – A callable that embeds query.
This corresponds to the B embedding operation in
(Sukhbaatar et al.). If not provided and “use_B” is True
in
hparams
, a default embedding operation is created as specified inhparams
. Seeget_default_embed_fn()
for details. For customized query_embed_fn, note that the function must follow the signature of the default embed_fn where memory_size must be 1. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ "n_hops": 1, "memory_dim": 100, "relu_dim": 50, "memory_size": 100, "A": default_embed_fn_hparams, "C": default_embed_fn_hparams, "B": default_embed_fn_hparams, "use_B": False, "use_H": True, "dropout_rate": 0, "variational": False, "name": "memnet_rnnlike", }
Here:
- “n_hops”: int
- Number of hops.
- “memory_dim”: int
- Memory dimension, i.e., the dimension size of a memory entry
embedding. Ignored if at least one of the embedding functions is
created according to
hparams
. In this casememory_dim
is inferred from the created embed_fn. - “relu_dim”: int
- Number of elements in
memory_dim
that have relu at the end of each hop. Should be not less than 0 and not more than memory_dim.
Number of entries in memory.
For example, the number of sentences {x_i} in Fig.1(a) of (Sukhbaatar et al.) End-To-End Memory Networks.
- “use_B”: bool
- Whether to create the query embedding function. Ignored if query_embed_fn is given to the constructor.
- “use_H”: bool
- Whether to perform a linear transformation with matrix H at the end of each A-C layer.
- “dropout_rate”: float
- The dropout rate to apply to the output of each hop. Should be between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the units.
- “variational”: bool
- Whether to share dropout masks after each hop.
-
memory_dim
¶ The dimension of embedded memory and all vectors in hops.
-
memory_size
¶ The memory size.
-
name
¶ The uniquified name of the module.
-
raw_memory_dim
¶ The dimension of memory element (or vocabulary size).
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
default_memnet_embed_fn_hparams¶
-
texar.tf.modules.
default_memnet_embed_fn_hparams
()[source]¶ Returns a dictionary of hyperparameters with default hparams for
default_embed_fn()
{ "embedding": { "dim": 100 }, "temporal_embedding": { "dim": 100 }, "combine_mode": "add" }
Here:
- “embedding”: dict, optional
- Hyperparameters for embedding operations. See
default_hparams()
ofWordEmbedder
for details. If None, the default hyperparameters are used. - “temporal_embedding”: dict, optional
- Hyperparameters for temporal embedding operations. See
default_hparams()
ofPositionEmbedder
for details. If None, the default hyperparameters are used. - “combine_mode”: str
- Either ‘add’ or ‘concat’. If ‘add’, memory embedding and temporal embedding are added up; in this case the two embedders must have the same dimension. If ‘concat’, the two embeddings are concatenated.
Policy¶
PolicyNetBase¶
-
class
texar.tf.modules.
PolicyNetBase
(network=None, network_kwargs=None, hparams=None)[source]¶ Policy net that takes in states and outputs actions.
Parameters: - network (optional) – A network that takes in state and returns
outputs for generating actions. For example, an instance of subclass
of
FeedForwardNetworkBase
. If None, a network is created as specified inhparams
. - network_kwargs (dict, optional) – Keyword arguments for network
constructor.
Note that the hparams argument for network
constructor is specified in the “network_hparams” field of
hparams
and should not be included in network_kwargs. Ignored ifnetwork
is given. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ 'network_type': 'FeedForwardNetwork', 'network_hparams': { 'layers': [ { 'type': 'Dense', 'kwargs': {'units': 256, 'activation': 'relu'} }, { 'type': 'Dense', 'kwargs': {'units': 256, 'activation': 'relu'} }, ] }, 'distribution_kwargs': None, 'name': 'policy_net', }
Here:
- “network_type”: str or class or instance
- A network that takes in state and returns outputs for generating actions. This can be a class, its name or module path, or a class instance. Ignored if network is given to the constructor.
- “network_hparams”: dict
Hyperparameters for the network. With the
network_kwargs
argument to the constructor, a network is created withnetwork_class(**network_kwargs, hparams=network_hparams)
.For example, the default values creates a two-layer dense network.
- “distribution_kwargs”: dict, optional
- Keyword arguments for distribution constructor. A distribution would be created for action sampling.
- “name”: str
- Name of the policy.
-
network
¶ The network.
-
name
¶ The uniquified name of the module.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
CategoricalPolicyNet¶
-
class
texar.tf.modules.
CategoricalPolicyNet
(action_space=None, network=None, network_kwargs=None, hparams=None)[source]¶ Policy net with Categorical distribution for discrete scalar actions.
This is a combination of a network with a top-layer distribution for action sampling.
Parameters: - action_space (optional) – An instance of
Space
specifying the action space. If not given, a discrete action space [0, high] is created with high specified in hparams
. - network (optional) – A network that takes in state and returns
outputs for generating actions. For example, an instance of subclass
of
FeedForwardNetworkBase
. If None, a network is created as specified inhparams
. - network_kwargs (dict, optional) – Keyword arguments for network
constructor.
Note that the hparams argument for network
constructor is specified in the “network_hparams” field of
hparams
and should not be included in network_kwargs. Ignored ifnetwork
is given. - hparams (dict or HParams, optional) – Hyperparameters. Missing
hyperparameters will be set to default values. See
default_hparams()
for the hyperparameter structure and default values.
-
_build
(inputs, mode=None)[source]¶ Takes in states and outputs actions.
Parameters: - inputs – Inputs to the policy network with the first dimension the batch dimension.
- mode (optional) – A tensor taking value in
tf.estimator.ModeKeys, including
TRAIN, EVAL, and PREDICT. If None,
texar.tf.global_mode()
is used.
Returns: A dict including fields “logits”, “action”, and “dist”, where
- “logits”: A Tensor of shape [batch_size] + action_space size used for categorical distribution sampling.
- “action”: A Tensor of shape [batch_size] + action_space.shape.
- “dist”: The Categorical based on the logits.
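For illustration, a minimal usage sketch (the 4-dim state and batch size are illustrative assumptions):

import tensorflow as tf
from texar.tf.modules import CategoricalPolicyNet

policy = CategoricalPolicyNet()  # default action space is {0, 1}
states = tf.random_uniform([64, 4])
outputs = policy(states)
# outputs['action']: Tensor of shape [64]
# outputs['dist']: the Categorical distribution behind the sampling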
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{ 'network_type': 'FeedForwardNetwork', 'network_hparams': { 'layers': [ { 'type': 'Dense', 'kwargs': {'units': 256, 'activation': 'relu'} }, { 'type': 'Dense', 'kwargs': {'units': 256, 'activation': 'relu'} }, ] }, 'distribution_kwargs': { 'dtype': 'int32', 'validate_args': False, 'allow_nan_stats': True }, 'action_space': 2, 'make_output_layer': True, 'name': 'categorical_policy_net' }
Here:
- “distribution_kwargs”: dict
- Keyword arguments for the Categorical distribution constructor. Arguments logits and probs should not be included as they are inferred from the inputs. Argument dtype can be a string (e.g., int32) and will be converted to a corresponding tf dtype.
- “action_space”: int
- Upper bound of the action space. The resulting action space is all discrete scalar numbers between 0 and the upper bound specified here (both inclusive).
- “make_output_layer”: bool
- Whether to append a dense layer to the network to transform features to logits for action sampling. If False, the final layer output of network must match the action space.
See
default_hparams
for details of other hyperparameters.
-
name
¶ The uniquified name of the module.
-
network
¶ The network.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
Q-Nets¶
QNetBase¶
-
class
texar.tf.modules.
QNetBase
(network=None, network_kwargs=None, hparams=None)[source]¶ Base class inherited by all Q net classes. A Q net takes in states and outputs Q values of actions.
Parameters: - network (optional) – A network that takes in state and returns Q values. For example, an instance of a subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams.
- network_kwargs (dict, optional) – Keyword arguments for the network constructor. Note that the hparams argument for the network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given.
- hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{
    'network_type': 'FeedForwardNetwork',
    'network_hparams': {
        'layers': [
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
        ]
    },
    'name': 'q_net',
}
Here:
- “network_type”: str or class or instance
- A network that takes in state and returns Q values. This can be a class, its name or module path, or a class instance. Ignored if network is given to the constructor.
- “network_hparams”: dict
- Hyperparameters for the network. With the network_kwargs argument to the constructor, a network is created with network_class(**network_kwargs, hparams=network_hparams). For example, the default values create a two-layer dense network (see the sketch after this list).
- “name”: str
- Name of the Q net.
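A sketch of supplying a pre-built network instead of having one created from hparams, assuming FeedForwardNetwork accepts the 'layers' structure shown in the defaults above (this uses CategoricalQNet, documented next):

from texar.tf.modules import FeedForwardNetwork, CategoricalQNet

network = FeedForwardNetwork(hparams={
    'layers': [
        {'type': 'Dense', 'kwargs': {'units': 256, 'activation': 'relu'}},
    ]
})
# 'network_type' and 'network_hparams' are ignored because network is given.
qnet = CategoricalQNet(network=network)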
-
network
¶ The network.
-
name
¶ The uniquified name of the module.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.
CategoricalQNet¶
-
class
texar.tf.modules.
CategoricalQNet
(action_space=None, network=None, network_kwargs=None, hparams=None)[source]¶ Q net with categorical scalar action space.
Parameters: - action_space (optional) – An instance of Space specifying the action space. If not given, a discrete action space [0, high] is created with high specified in hparams.
- network (optional) – A network that takes in state and returns Q values. For example, an instance of a subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams.
- network_kwargs (dict, optional) – Keyword arguments for the network constructor. Note that the hparams argument for the network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given.
- hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
-
_build
(inputs, mode=None)[source]¶ Takes in states and outputs Q values.
Parameters: - inputs – Inputs to the Q net with the first dimension the batch dimension.
- mode (optional) – A tensor taking value in
tf.estimator.ModeKeys, including
TRAIN, EVAL, and PREDICT. If None,
texar.tf.global_mode()
is used.
- Returns
A dict including the field “qvalues”, where
- “qvalues”: A Tensor of shape [batch_size] + action-space size, containing the Q values of all possible actions.
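For instance, a minimal sketch of greedy action selection from the Q values (the state dimensionality is an arbitrary assumption):

import tensorflow as tf
from texar.tf.modules import CategoricalQNet

states = tf.placeholder(tf.float32, shape=[None, 4])
qnet = CategoricalQNet()                     # default action space and network
qvalues = qnet(states)['qvalues']            # shape: [batch_size] + action-space size
greedy_action = tf.argmax(qvalues, axis=-1)  # pick the highest-valued action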
-
static
default_hparams
()[source]¶ Returns a dictionary of hyperparameters with default values.
{
    'network_type': 'FeedForwardNetwork',
    'network_hparams': {
        'layers': [
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
        ]
    },
    'action_space': 2,
    'make_output_layer': True,
    'name': 'q_net'
}
Here:
- “action_space”: int
- Upper bound of the action space. The resulting action space is all discrete scalar numbers between 0 and the upper bound specified here (both inclusive).
- “make_output_layer”: bool
- Whether to append a dense layer to the network to transform features to Q values. If False, the final layer output of the network must match the action space.
See
default_hparams
for details of other hyperparameters.
-
name
¶ The uniquified name of the module.
-
network
¶ The network.
-
trainable_variables
¶ The list of trainable variables of the module.
-
variable_scope
¶ The variable scope of the module.