# Modules

## ModuleBase

class texar.ModuleBase(hparams=None)

Base class inherited by modules that create Variables and are configurable through hyperparameters.

A Texar module inheriting ModuleBase has the following key features:

• Convenient variable re-use: A module instance creates its own sets of variables, and automatically re-uses its variables on subsequent calls. Hence TF variable/name scope is transparent to users. For example:

encoder = UnidirectionalRNNEncoder(hparams) # create instance
output_1 = encoder(inputs_1) # variables are created
output_2 = encoder(inputs_2) # variables are re-used

print(encoder.trainable_variables) # access trainable variables
# [ ... ]

• Configurable through hyperparameters: Each module defines allowed hyperparameters and default values. Hyperparameters not specified by users will take default values.

• Callable: As in the example above, a module instance is “called” with input tensors and returns output tensors. Every call of a module adds ops to the Graph to perform the module’s logic.
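The variable re-use and callable behaviour listed above can be pictured with a small framework-free sketch. `TinyModule` and its `scale` hyperparameter are hypothetical names used only for illustration. This is not how Texar implements ModuleBase, which relies on TensorFlow variable scopes; it only shows the "create on first call, re-use afterwards" pattern.

```python
class TinyModule:
    """Illustrative stand-in for a module that builds its variables once."""

    def __init__(self, hparams=None):
        self.hparams = dict(hparams or {})
        self._variables = {}   # filled lazily on the first call
        self._built = False

    def _build(self, inputs):
        # First call: create the "variable". Later calls: re-use it.
        if not self._built:
            self._variables["scale"] = self.hparams.get("scale", 2.0)
            self._built = True
        return [x * self._variables["scale"] for x in inputs]

    def __call__(self, *args, **kwargs):
        return self._build(*args, **kwargs)

    @property
    def trainable_variables(self):
        return list(self._variables.values())


module = TinyModule({"scale": 3.0})
output_1 = module([1.0, 2.0])   # variables are created
output_2 = module([4.0])        # variables are re-used
```

Both calls share the same `scale` value, mirroring how a Texar module shares its variables across calls.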

Parameters: hparams (dict, optional) – Hyperparameters of the module. See default_hparams() for the structure and default values.
_build(*args, **kwargs)

Subclasses must implement this method to build the logic.

Parameters:
• *args – Arguments.
• **kwargs – Keyword arguments.

Returns: Output Tensor(s).
static default_hparams()

Returns a dict of hyperparameters of the module with default values. Used to replace the missing values of input hparams during module construction.

{
"name": "module"
}
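Filling missing values from the defaults can be sketched as a recursive dictionary merge. `fill_defaults` and the nested "initializer" entry are hypothetical and only illustrate the idea; Texar's own HParams handling may apply further rules that are not shown here.

```python
import copy


def fill_defaults(hparams, defaults):
    """Return a copy of `defaults` overridden by the user-given `hparams`."""
    merged = copy.deepcopy(defaults)
    for key, value in (hparams or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested dictionaries key by key.
            merged[key] = fill_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged


# Hypothetical defaults, for illustration only.
defaults = {"name": "module", "initializer": {"minval": -0.1, "maxval": 0.1}}
merged = fill_defaults({"initializer": {"maxval": 0.5}}, defaults)
```

Only "maxval" is overridden; "name" and "minval" keep their default values, and `defaults` itself is left untouched.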

variable_scope

The variable scope of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

hparams

An HParams instance. The hyperparameters of the module.

## Embedders

### WordEmbedder

class texar.modules.WordEmbedder(init_value=None, vocab_size=None, hparams=None)

Simple word embedder that maps indexes into embeddings. The indexes can be soft (e.g., distributions over vocabulary).

Either init_value or vocab_size is required. If both are given, init_value.shape[0] must equal vocab_size.

Parameters:
• init_value (optional) – A Tensor or numpy array that contains the initial value of embeddings. It is typically of shape [vocab_size] + embedding-dim. Embedding can have dimensionality > 1. If None, embedding is initialized as specified in hparams["initializer"]. Otherwise, the "initializer" and "dim" hyperparameters in hparams are ignored.
• vocab_size (int, optional) – The vocabulary size. Required if init_value is not given.
• hparams (dict, optional) – Embedder hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the embedder.

Example

ids = tf.random_uniform(shape=[32, 10], maxval=10, dtype=tf.int64)
soft_ids = tf.random_uniform(shape=[32, 10, 100])

embedder = WordEmbedder(vocab_size=100, hparams={'dim': 256})
ids_emb = embedder(ids=ids) # shape: [32, 10, 256]
soft_ids_emb = embedder(soft_ids=soft_ids) # shape: [32, 10, 256]

# Use with Texar data module
hparams = {
    'dataset': {
        'embedding_init': {'file': 'word2vec.txt'}
        ...
    },
}
data = MonoTextData(hparams)
iterator = DataIterator(data)
batch = iterator.get_next()

# Use data vocab size
embedder_1 = WordEmbedder(vocab_size=data.vocab.size)
emb_1 = embedder_1(batch['text_ids'])

# Use pre-trained embedding
embedder_2 = WordEmbedder(init_value=data.embedding_init_value)
emb_2 = embedder_2(batch['text_ids'])

_build(ids=None, soft_ids=None, mode=None, **kwargs)

Embeds (soft) ids.

Either ids or soft_ids must be given, and they must not be given at the same time.

Parameters:
• ids (optional) – An integer tensor containing the ids to embed.
• soft_ids (optional) – A tensor of weights (probabilities) used to mix the embedding vectors.
• mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, dropout is controlled by texar.global_mode().
• kwargs – Additional keyword arguments for tf.nn.embedding_lookup besides params and ids.

Returns: If ids is given, returns a Tensor of shape shape(ids) + embedding-dim. For example, if shape(ids) = [batch_size, max_time] and shape(embedding) = [vocab_size, emb_dim], then the return tensor has shape [batch_size, max_time, emb_dim].

If soft_ids is given, returns a Tensor of shape shape(soft_ids)[:-1] + embedding-dim. For example, if shape(soft_ids) = [batch_size, max_time, vocab_size] and shape(embedding) = [vocab_size, emb_dim], then the return tensor has shape [batch_size, max_time, emb_dim].
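The two output shapes can be checked with a tiny framework-free sketch that uses nested lists in place of tensors. `embed_ids` and `embed_soft_ids` are hypothetical helpers, not Texar functions: the first performs a plain row lookup, the second mixes embedding rows with the given weights.

```python
def embed_ids(embedding, ids):
    """Look up one embedding row per id: shape(ids) + [emb_dim]."""
    return [[embedding[i] for i in row] for row in ids]


def embed_soft_ids(embedding, soft_ids):
    """Mix embedding rows with per-token weights over the vocabulary."""
    emb_dim = len(embedding[0])
    return [
        [
            [sum(w * embedding[v][d] for v, w in enumerate(weights))
             for d in range(emb_dim)]
            for weights in row
        ]
        for row in soft_ids
    ]


# vocab_size = 3, emb_dim = 2
embedding = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
ids = [[0, 2]]                                   # [batch_size=1, max_time=2]
soft_ids = [[[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]]  # [1, 2, vocab_size=3]

ids_emb = embed_ids(embedding, ids)                 # [1, 2, emb_dim=2]
soft_ids_emb = embed_soft_ids(embedding, soft_ids)  # [1, 2, emb_dim=2]
```

A one-hot row in `soft_ids` reproduces the plain lookup, while a mixed row yields a weighted average of embedding vectors.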
static default_hparams()

Returns a dictionary of hyperparameters with default values.

{
"dim": 100,
"dropout_rate": 0,
"dropout_strategy": 'element',
"trainable": True,
"initializer": {
"type": "random_uniform_initializer",
"kwargs": {
"minval": -0.1,
"maxval": 0.1,
"seed": None
}
},
"regularizer": {
"type": "L1L2",
"kwargs": {
"l1": 0.,
"l2": 0.
}
},
"name": "word_embedder",
}


Here:

“dim” : int or list

Embedding dimension. Can be a list of integers to yield embeddings with dimensionality > 1.

Ignored if init_value is given to the embedder constructor.

“dropout_rate” : float
The dropout rate between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the embedding. Set to 0 to disable dropout.
“dropout_strategy” : str

The dropout strategy. Can be one of the following:

• "element": The regular strategy that drops individual elements of embedding vectors.
• "item": Drops individual items (e.g., words) entirely. E.g., for the word sequence ‘the simpler the better’, the strategy can yield ‘_ simpler the better’, where the first ‘the’ is dropped.
• "item_type": Drops item types (e.g., word types). E.g., for the above sequence, the strategy can yield ‘_ simpler _ better’, where the word type ‘the’ is dropped. The dropout will never yield ‘_ simpler the better’ as in the ‘item’ strategy.
“trainable” : bool
Whether the embedding is trainable.
“initializer” : dict or None
Hyperparameters of the initializer for embedding values. See get_initializer() for the details. Ignored if init_value is given to the embedder constructor.
“regularizer” : dict
Hyperparameters of the regularizer for embedding values. See get_regularizer() for the details.
“name” : str
Name of the embedding variable.
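The difference between the "item" and "item_type" values of “dropout_strategy” described above can be sketched on plain word lists. `dropout_tokens` is a hypothetical helper that marks dropped words with ‘_’; it is an illustration under these assumptions, not Texar's implementation, and the "element" strategy (which acts on individual vector entries) is left out.

```python
import random


def dropout_tokens(tokens, rate, strategy, rng):
    """Return `tokens` with dropped items replaced by '_'."""
    if strategy == "item":
        # Each position is dropped independently of the others.
        return ["_" if rng.random() < rate else t for t in tokens]
    if strategy == "item_type":
        # One decision per distinct word, shared by all its occurrences.
        dropped = {t for t in sorted(set(tokens)) if rng.random() < rate}
        return ["_" if t in dropped else t for t in tokens]
    raise ValueError("unsupported strategy: %s" % strategy)


tokens = "the simpler the better".split()
rng = random.Random(0)
by_item = dropout_tokens(tokens, 0.5, "item", rng)
by_type = dropout_tokens(tokens, 0.5, "item_type", rng)
```

With "item_type", both occurrences of ‘the’ are always kept or dropped together, whereas "item" may drop only one of them.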
embedding

The embedding tensor, of shape [vocab_size] + dim.

dim

The embedding dimension.

vocab_size

The vocabulary size.

### PositionEmbedder

class texar.modules.PositionEmbedder(init_value=None, position_size=None, hparams=None)

Simple position embedder that maps position indexes into embeddings via lookup.

Either init_value or position_size is required. If both are given, init_value.shape[0] must equal position_size.

Parameters:
• init_value (optional) – A Tensor or numpy array that contains the initial value of embeddings. It is typically of shape [position_size, embedding dim]. If None, embedding is initialized as specified in hparams["initializer"]. Otherwise, the "initializer" and "dim" hyperparameters in hparams are ignored.
• position_size (int, optional) – The number of possible positions, e.g., the maximum sequence length. Required if init_value is not given.
• hparams (dict, optional) – Embedder hyperparameters. If it is not specified, the default hyperparameter setting is used. See default_hparams() for the structure and default values.
_build(positions=None, sequence_length=None, mode=None, **kwargs)

Embeds the positions.

Either positions or sequence_length is required:

• If both are given, sequence_length is used to mask out embeddings of those time steps beyond the respective sequence lengths.
• If only sequence_length is given, then positions from 0 to sequence_length-1 are embedded.
Parameters:
• positions (optional) – An integer tensor containing the position ids to embed.
• sequence_length (optional) – An integer tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero-valued embeddings.
• mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, dropout will be controlled by texar.global_mode().
• kwargs – Additional keyword arguments for tf.nn.embedding_lookup besides params and ids.

Returns: A Tensor of shape shape(inputs) + embedding dimension.
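The case where only sequence_length is given can be sketched without TensorFlow. `positions_from_lengths` and `apply_mask` are hypothetical helpers, and the two-dimensional lookup table is a toy; the sketch only illustrates generating positions 0 to max_time-1 and zeroing the embeddings beyond each sequence length.

```python
def positions_from_lengths(sequence_length, max_time=None):
    """Build position ids 0..max_time-1 per example, plus a 0/1 mask."""
    max_time = max_time or max(sequence_length)
    positions = [list(range(max_time)) for _ in sequence_length]
    mask = [[1.0 if t < n else 0.0 for t in range(max_time)]
            for n in sequence_length]
    return positions, mask


def apply_mask(embedded, mask):
    """Zero out embedding vectors at time steps beyond each length."""
    return [[[m * x for x in vec] for vec, m in zip(row, row_mask)]
            for row, row_mask in zip(embedded, mask)]


positions, mask = positions_from_lengths([3, 1])
table = [[float(p), 1.0] for p in range(3)]   # toy [position_size, dim] table
embedded = [[table[p] for p in row] for row in positions]
masked = apply_mask(embedded, mask)
```

The second example has length 1, so only its first time step keeps a non-zero embedding.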
static default_hparams()

Returns a dictionary of hyperparameters with default values.

{
"dim": 100,
"initializer": {
"type": "random_uniform_initializer",
"kwargs": {
"minval": -0.1,
"maxval": 0.1,
"seed": None
}
},
"regularizer": {
"type": "L1L2",
"kwargs": {
"l1": 0.,
"l2": 0.
}
},
"dropout_rate": 0,
"trainable": True,
"name": "position_embedder"
}


The hyperparameters have the same meaning as those in texar.modules.WordEmbedder.default_hparams().

embedding

The embedding tensor.

dim

The embedding dimension.

position_size

The position size, i.e., maximum number of positions.

### SinusoidsPositionEmbedder

class texar.modules.SinusoidsPositionEmbedder(hparams=None)

Sinusoid position embedder that maps position indexes into embeddings via sinusoid calculation. This module does not have trainable parameters. Used in, e.g., TransformerEncoder.

Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase. This allows attention to learn to use absolute and relative positions.

Timing signals should be added to some precursors of both the query and the memory inputs to attention. The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x) and cos(x). In particular, we use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to dim / 2. For each timescale, we generate the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All of these sinusoids are concatenated in the dim dimension.
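The computation described above can be sketched in pure Python. `sinusoid_position_embedding` is a hypothetical function written from the description (a geometric sequence of dim / 2 timescales, with the sine and cosine signals concatenated along the dim axis); it assumes an even dim and is not taken from the Texar source.

```python
import math


def sinusoid_position_embedding(num_positions, dim,
                                min_timescale=1.0, max_timescale=1.0e4):
    """Return a [num_positions, dim] nested list of timing signals."""
    num_timescales = dim // 2
    # Geometric sequence from min_timescale to max_timescale.
    log_increment = (math.log(max_timescale / min_timescale)
                     / max(num_timescales - 1, 1))
    timescales = [min_timescale * math.exp(i * log_increment)
                  for i in range(num_timescales)]
    signal = []
    for pos in range(num_positions):
        sines = [math.sin(pos / ts) for ts in timescales]
        cosines = [math.cos(pos / ts) for ts in timescales]
        signal.append(sines + cosines)   # concatenate along the dim axis
    return signal


emb = sinusoid_position_embedding(num_positions=4, dim=8)
```

At position 0 every sine channel is 0 and every cosine channel is 1; later positions rotate each channel at its own frequency.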

_build(positions)

Embeds.

Parameters: positions (optional) – An integer tensor containing the position ids to embed.

Returns: A Tensor of shape [1, position_size, dim].
default_hparams()

Returns a dictionary of hyperparameters with default values. We use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to dim/2.

{
'min_timescale': 1.0,
'max_timescale': 10000.0,
'dim': 512,
'name':'sinusoid_posisiton_embedder',
}


### EmbedderBase

class texar.modules.EmbedderBase(num_embeds=None, hparams=None)

The base embedder class that all embedder classes inherit.

Parameters:
• num_embeds (int, optional) – The number of embedding elements, e.g., the vocabulary size of a word embedder.
• hparams (dict or HParams, optional) – Embedder hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()

Returns a dictionary of hyperparameters with default values.

{
"name": "embedder"
}

num_embeds

The number of embedding elements.

## Encoders

### UnidirectionalRNNEncoder

class texar.modules.UnidirectionalRNNEncoder(cell=None, cell_dropout_mode=None, output_layer=None, hparams=None)

One directional RNN encoder.

Parameters:
• cell (RNNCell, optional) – If not specified, a cell is created as specified in hparams["rnn_cell"].
• cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given.
• output_layer (optional) – An instance of tf.layers.Layer. Applies to the RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer"].
• hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the encoder.

Example

# Use with embedder
embedder = WordEmbedder(vocab_size, hparams=emb_hparams)
encoder = UnidirectionalRNNEncoder(hparams=enc_hparams)

outputs, final_state = encoder(
inputs=embedder(data_batch['text_ids']),
sequence_length=data_batch['length'])

_build(inputs, sequence_length=None, initial_state=None, time_major=False, mode=None, return_cell_output=False, return_output_size=False, **kwargs)

Encodes the inputs.

Parameters:
• inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time are exchanged if time_major=True is specified.
• sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element’s sequence length.
• initial_state (optional) – Initial state of the RNN.
• time_major (bool) – The shape format of the inputs and outputs Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth].
• mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls output layer dropout if the output layer is specified with hparams. If None (default), texar.global_mode() is used.
• return_cell_output (bool) – Whether to return the output of the RNN cell. This is the result prior to the output layer.
• return_output_size (bool) – Whether to return the size of the output (i.e., the results after output layers).
• **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.

Returns: By default (both return_cell_output and return_output_size are False), returns a pair (outputs, final_state):
• outputs: The RNN output tensor by the output layer (if exists) or the RNN cell (otherwise). The tensor is of shape [batch_size, max_time, output_size] if time_major is False, or [max_time, batch_size, output_size] if time_major is True. If RNN cell output is a (nested) tuple of Tensors, then the outputs will be a (nested) tuple having the same nest structure as the cell output.
• final_state: The final state of the RNN, which is a Tensor of shape [batch_size] + cell.state_size or a (nested) tuple of Tensors if cell.state_size is a (nested) tuple.

If return_cell_output is True, returns a triple (outputs, final_state, cell_outputs):
• cell_outputs: The outputs by the RNN cell prior to the output layer, having the same structure as outputs except for the output_dim.

If return_output_size is True, returns a tuple (outputs, final_state, output_size):
• output_size: A (possibly nested tuple of) int representing the size of outputs. If a single int or an int array, then outputs has shape [batch/time, time/batch] + output_size. If a (nested) tuple, then output_size has the same structure as outputs.

If both return_cell_output and return_output_size are True, returns (outputs, final_state, cell_outputs, output_size).
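The effect of sequence_length ("copy-through state and zero-out outputs") can be illustrated with a toy, framework-free recurrent loop. `run_rnn` and `cumulative_sum_cell` are hypothetical names; the sketch works on scalar inputs in batch-major layout and only mimics the behaviour described above, not the actual tf.nn.dynamic_rnn implementation.

```python
def run_rnn(inputs, sequence_length, step_fn, initial_state):
    """Run `step_fn` over batch-major `inputs` ([batch][time] of floats)."""
    outputs, final_state = [], []
    for seq, length in zip(inputs, sequence_length):
        state, seq_outputs = initial_state, []
        for t, x in enumerate(seq):
            if t < length:
                output, state = step_fn(x, state)
            else:
                output = 0.0   # zero-out outputs past the sequence length
                # `state` is left unchanged, i.e. copied through
            seq_outputs.append(output)
        outputs.append(seq_outputs)
        final_state.append(state)
    return outputs, final_state


def cumulative_sum_cell(x, state):
    """Toy cell: the state is a running sum; the output is the new state."""
    new_state = state + x
    return new_state, new_state


outputs, final_state = run_rnn(
    inputs=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    sequence_length=[3, 1],
    step_fn=cumulative_sum_cell,
    initial_state=0.0)
```

The second sequence has length 1, so its outputs after the first step are zero and its final state is the state reached at step 0.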
static default_hparams()

Returns a dictionary of hyperparameters with default values.

{
"rnn_cell": default_rnn_cell_hparams(),
"output_layer": {
"num_layers": 0,
"layer_size": 128,
"activation": "identity",
"final_layer_activation": None,
"other_dense_kwargs": None,
"dropout_layer_ids": [],
"dropout_rate": 0.5,
"variational_dropout": False
},
"name": "unidirectional_rnn_encoder"
}


Here:

“rnn_cell” : dict

A dictionary of RNN cell hyperparameters. Ignored if cell is given to the encoder constructor.

The default value is defined in default_rnn_cell_hparams().

“output_layer” : dict

Output layer hyperparameters. Ignored if output_layer is given to the encoder constructor. Includes:

“num_layers” : int
The number of output (dense) layers. Set to 0 to avoid applying any output layers to the cell outputs.
“layer_size” : int or list

The size of each of the output (dense) layers.

If an int, each output layer will have the same size. If a list, the length must equal num_layers.

“activation” : str or callable or None

Activation function for each of the output (dense) layers except for the final layer. This can be a function, or its string name or module path. If a function name is given, the function must be from module tf.nn or tf. For example:

"activation": "relu" # function name
"activation": "my_module.my_activation_fn" # module path
"activation": my_module.my_activation_fn # function


The default "identity" maintains a linear activation, as does None.

“final_layer_activation” : str or callable or None
The activation function for the final output layer.
“other_dense_kwargs” : dict or None
Other keyword arguments to construct each of the output dense layers, e.g., use_bias. See Dense for the keyword arguments.
“dropout_layer_ids” : int or list

The indexes of layers (starting from 0) whose inputs are applied with dropout. The index = num_layers means dropout applies to the final layer output. E.g.,

{
"num_layers": 2,
"dropout_layer_ids": [0, 2]
}


leads to a series of layers as -dropout-layer0-layer1-dropout-.

The dropout mode (training or not) is controlled by the mode argument of _build().

“dropout_rate” : float
The dropout rate, between 0 and 1. E.g., “dropout_rate”: 0.1 would drop out 10% of elements.
“variational_dropout”: bool
Whether the dropout mask is the same across all time steps.
“name” : str
Name of the encoder.
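How “dropout_layer_ids” places dropout around the output layers can be sketched as a function that lists the resulting sequence of operations. `layer_sequence` is a hypothetical helper written from the description of “dropout_layer_ids” above; it produces labels only and builds no layers.

```python
def layer_sequence(num_layers, dropout_layer_ids):
    """List the ops applied in order, given the output-layer hparams."""
    if isinstance(dropout_layer_ids, int):
        dropout_layer_ids = [dropout_layer_ids]
    sequence = []
    for i in range(num_layers):
        if i in dropout_layer_ids:
            sequence.append("dropout")   # dropout on the input of layer i
        sequence.append("layer%d" % i)
    if num_layers in dropout_layer_ids:
        sequence.append("dropout")       # dropout on the final layer output
    return sequence


layout = layer_sequence(num_layers=2, dropout_layer_ids=[0, 2])
```

For the example hyperparameters this gives the -dropout-layer0-layer1-dropout- series mentioned above.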
cell

The RNN cell.

state_size

The state size of encoder cell.

Same as encoder.cell.state_size.

output_layer

The output layer.

### BidirectionalRNNEncoder

class texar.modules.BidirectionalRNNEncoder(cell_fw=None, cell_bw=None, cell_dropout_mode=None, output_layer_fw=None, output_layer_bw=None, hparams=None)

Bidirectional forward-backward RNN encoder.

Parameters:
• cell_fw (RNNCell, optional) – The forward RNN cell. If not given, a cell is created as specified in hparams["rnn_cell_fw"].
• cell_bw (RNNCell, optional) – The backward RNN cell. If not given, a cell is created as specified in hparams["rnn_cell_bw"].
• cell_dropout_mode (optional) – A tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cells (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if the respective cell is given.
• output_layer_fw (optional) – An instance of tf.layers.Layer. Applies to the forward RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer_fw"].
• output_layer_bw (optional) – An instance of tf.layers.Layer. Applies to the backward RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer_bw"].
• hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the encoder.

Example

# Use with embedder
embedder = WordEmbedder(vocab_size, hparams=emb_hparams)
encoder = BidirectionalRNNEncoder(hparams=enc_hparams)

outputs, final_state = encoder(
inputs=embedder(data_batch['text_ids']),
sequence_length=data_batch['length'])
# outputs == (outputs_fw, outputs_bw)
# final_state == (final_state_fw, final_state_bw)

_build(inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, time_major=False, mode=None, return_cell_output=False, return_output_size=False, **kwargs)

Encodes the inputs.

Parameters:
• inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time are exchanged if time_major=True is specified.
• sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element’s sequence length.
• initial_state_fw (optional) – Initial state of the forward RNN.
• initial_state_bw (optional) – Initial state of the backward RNN.
• time_major (bool) – The shape format of the inputs and outputs Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth].
• mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls output layer dropout if the output layer is specified with hparams. If None (default), texar.global_mode() is used.
• return_cell_output (bool) – Whether to return the output of the RNN cell. This is the result prior to the output layer.
• return_output_size (bool) – Whether to return the size of the output (i.e., the results after output layers).
• **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.

Returns: By default (both return_cell_output and return_output_size are False), returns a pair (outputs, final_state):
• outputs: A tuple (outputs_fw, outputs_bw) containing the forward and the backward RNN outputs, each of which is of shape [batch_size, max_time, output_dim] if time_major is False, or [max_time, batch_size, output_dim] if time_major is True. If RNN cell output is a (nested) tuple of Tensors, then outputs_fw and outputs_bw will be a (nested) tuple having the same structure as the cell output.
• final_state: A tuple (final_state_fw, final_state_bw) containing the final states of the forward and backward RNNs, each of which is a Tensor of shape [batch_size] + cell.state_size, or a (nested) tuple of Tensors if cell.state_size is a (nested) tuple.

If return_cell_output is True, returns a triple (outputs, final_state, cell_outputs), where:
• cell_outputs: A tuple (cell_outputs_fw, cell_outputs_bw) containing the outputs by the forward and backward RNN cells prior to the output layers, having the same structure as outputs except for the output_dim.

If return_output_size is True, returns a tuple (outputs, final_state, output_size), where:
• output_size: A tuple (output_size_fw, output_size_bw) containing the size of outputs_fw and outputs_bw, respectively. Take *_fw for example: output_size_fw is a (possibly nested tuple of) int. If a single int or an int array, then outputs_fw has shape [batch/time, time/batch] + output_size_fw. If a (nested) tuple, then output_size_fw has the same structure as outputs_fw. The same applies to output_size_bw.

If both return_cell_output and return_output_size are True, returns (outputs, final_state, cell_outputs, output_size).
static default_hparams()

Returns a dictionary of hyperparameters with default values.

{
"rnn_cell_fw": default_rnn_cell_hparams(),
"rnn_cell_bw": default_rnn_cell_hparams(),
"rnn_cell_share_config": True,
"output_layer_fw": {
"num_layers": 0,
"layer_size": 128,
"activation": "identity",
"final_layer_activation": None,
"other_dense_kwargs": None,
"dropout_layer_ids": [],
"dropout_rate": 0.5,
"variational_dropout": False
},
"output_layer_bw": {
# Same hyperparams and default values as "output_layer_fw"
# ...
},
"output_layer_share_config": True,
"name": "bidirectional_rnn_encoder"
}


Here:

“rnn_cell_fw” : dict

Hyperparameters of the forward RNN cell. Ignored if cell_fw is given to the encoder constructor.

The default value is defined in default_rnn_cell_hparams().

“rnn_cell_bw” : dict

Hyperparameters of the backward RNN cell. Ignored if cell_bw is given to the encoder constructor, or if "rnn_cell_share_config" is True.

The default value is defined in default_rnn_cell_hparams().

“rnn_cell_share_config” : bool
Whether to share the hyperparameters of the forward cell with the backward cell. Note that the cell parameters (variables) are not shared.
“output_layer_fw” : dict
Hyperparameters of the forward output layer. Ignored if output_layer_fw is given to the constructor. See the “output_layer” field of UnidirectionalRNNEncoder.default_hparams() for details.
“output_layer_bw” : dict

Hyperparameters of the backward output layer. Ignored if output_layer_bw is given to the constructor. Has the same structure and defaults as "output_layer_fw".

Ignored if "output_layer_share_config" is True.

“output_layer_share_config” : bool
Whether to share the hyperparameters of the forward output layer with the backward output layer. Note that the layer parameters (variables) are not shared.
“name” : str
Name of the encoder.
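The two *_share_config flags can be read as a simple selection rule for the backward-side hyperparameters. The sketch below is an assumption-laden illustration: `resolve_backward_hparams` is a hypothetical helper, the numeric values are made up, and the case where cell or layer instances are passed to the constructor is not covered.

```python
def resolve_backward_hparams(hparams):
    """Pick the hparams used to build the backward cell and output layer."""
    cell_bw = (hparams["rnn_cell_fw"] if hparams["rnn_cell_share_config"]
               else hparams["rnn_cell_bw"])
    layer_bw = (hparams["output_layer_fw"]
                if hparams["output_layer_share_config"]
                else hparams["output_layer_bw"])
    return cell_bw, layer_bw


# Hypothetical values, for illustration only.
hparams = {
    "rnn_cell_fw": {"kwargs": {"num_units": 256}},
    "rnn_cell_bw": {"kwargs": {"num_units": 64}},
    "rnn_cell_share_config": True,
    "output_layer_fw": {"num_layers": 0},
    "output_layer_bw": {"num_layers": 1},
    "output_layer_share_config": False,
}
cell_bw, layer_bw = resolve_backward_hparams(hparams)
```

Here the backward cell re-uses the forward cell configuration, while the backward output layer keeps its own. Only the configuration is shared; the variables of the two directions stay separate.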
cell_fw

The forward RNN cell.

cell_bw

The backward RNN cell.

state_size_fw

The state size of the forward encoder cell.

Same as encoder.cell_fw.state_size.

state_size_bw

The state size of the backward encoder cell.

Same as encoder.cell_bw.state_size.

output_layer_fw

The output layer of the forward RNN.

output_layer_bw

The output layer of the backward RNN.

### HierarchicalRNNEncoder

class texar.modules.HierarchicalRNNEncoder(encoder_major=None, encoder_minor=None, hparams=None)

A hierarchical encoder that stacks basic RNN encoders into two layers. Can be used to encode long, structured sequences, e.g. paragraphs, dialog history, etc.

Parameters:
• encoder_major (optional) – An instance of a subclass of RNNEncoderBase. The high-level encoder taking final states from the low-level encoder as its inputs. If not specified, an encoder is created as specified in hparams["encoder_major"].
• encoder_minor (optional) – An instance of a subclass of RNNEncoderBase. The low-level encoder. If not specified, an encoder is created as specified in hparams["encoder_minor"].
• hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the encoder.

_build(inputs, order='btu', medium=None, sequence_length_major=None, sequence_length_minor=None, **kwargs)

Encodes the inputs.

Parameters:
• inputs – A 4-D tensor of shape [B, T, U, dim], where
  B: batch_size
  T: the max length of high-level sequences. E.g., the max number of utterances in dialog history.
  U: the max length of low-level sequences. E.g., the max length of each utterance in dialog history.
  dim: embedding dimension
  The order of the first three dimensions can be changed according to order.
• order – A 3-char string containing ‘b’, ‘t’, and ‘u’, that specifies the order of the inputs dimensions above. The following four values are accepted:
  ‘btu’: None of the encoders are time-major.
  ‘utb’: Both encoders are time-major.
  ‘tbu’: The major encoder is time-major.
  ‘ubt’: The minor encoder is time-major.
• medium (optional) – A list of callables that subsequently process the final states of the minor encoder and obtain the inputs for the major encoder. If not specified, flatten() is used for processing the minor’s final states.
• sequence_length_major (optional) – The sequence_length argument sent to the major encoder. This is a 1-D Tensor of shape [B].
• sequence_length_minor (optional) – The sequence_length argument sent to the minor encoder. It can be either a 1-D Tensor of shape [B*T], or a 2-D Tensor of shape [B, T] or [T, B] according to order.
• **kwargs – Other keyword arguments for the major and minor encoders, such as initial_state, etc. Note that sequence_length and time_major must not be included here. time_major is derived from order automatically. By default, arguments will be sent to both major and minor encoders. To specify which encoder an argument should be sent to, add ‘_minor’/‘_major’ as its suffix. Note that initial_state_minor must have a batch dimension of size B*T. If you have an initial state of batch dimension = T, use tile_initial_state_minor() to tile it according to order.

Returns: A tuple (outputs, final_state) by the major encoder. See the return values of the _build() method of the respective encoder class for details.
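The meaning of the four accepted order strings can be summarized in a small lookup. `time_major_flags` and `minor_batch_size` are hypothetical helpers written from the description above; they show how time_major could be derived for each encoder and why the minor encoder sees a batch of B*T low-level sequences.

```python
def time_major_flags(order):
    """Map `order` to (major_is_time_major, minor_is_time_major)."""
    flags = {
        "btu": (False, False),  # neither encoder is time-major
        "utb": (True, True),    # both encoders are time-major
        "tbu": (True, False),   # only the major encoder is time-major
        "ubt": (False, True),   # only the minor encoder is time-major
    }
    if order not in flags:
        raise ValueError("order must be one of %s" % sorted(flags))
    return flags[order]


def minor_batch_size(inputs_shape, order):
    """Number of low-level sequences (B*T) seen by the minor encoder."""
    sizes = dict(zip(order, inputs_shape[:3]))
    return sizes["b"] * sizes["t"]


flags = time_major_flags("tbu")
n_minor = minor_batch_size([5, 2, 7, 16], "tbu")   # T=5, B=2, U=7, dim=16
```

With order ‘tbu’ and inputs of shape [5, 2, 7, 16], the minor encoder processes 2 * 5 = 10 sequences of length up to 7.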
static default_hparams()

Returns a dictionary of hyperparameters with default values.

{
"encoder_major_type": "UnidirectionalRNNEncoder",
"encoder_major_hparams": {},
"encoder_minor_type": "UnidirectionalRNNEncoder",
"encoder_minor_hparams": {},
"config_share": False,
"name": "hierarchical_encoder_wrapper"
}


Here:

“encoder_major_type” : str or class or instance
The high-level encoder. Can be a RNN encoder class, its name or module path, or a class instance. Ignored if encoder_major is given to the encoder constructor.
“encoder_major_hparams” : dict
The hyperparameters for the high-level encoder. The high-level encoder is created with encoder_class(hparams=encoder_major_hparams). Ignored if encoder_major is given to the encoder constructor, or if “encoder_major_type” is an encoder instance.
“encoder_minor_type” : str or class or instance
The low-level encoder. Can be a RNN encoder class, its name or module path, or a class instance. Ignored if encoder_minor is given to the encoder constructor, or if “config_share” is True.
“encoder_minor_hparams” : dict
The hyperparameters for the low-level encoder. The low-level encoder is created with encoder_class(hparams=encoder_minor_hparams). Ignored if encoder_minor is given to the encoder constructor, or if “config_share” is True, or if “encoder_minor_type” is an encoder instance.
“config_share”:
Whether to use encoder_major’s hyperparameters to construct encoder_minor.
“name”:
Name of the encoder.
static tile_initial_state_minor(initial_state, order, inputs_shape)

Tiles an initial state to be used for encoder minor.

The batch dimension of initial_state must equal T. The state will be copied B times and used to start encoding each low-level sequence. For example, the first utterance in each dialog history in the batch will have the same initial state.

Parameters:
• initial_state – Initial state with the batch dimension of size T.
• order (str) – The dimension order of inputs. Must be the same as used in _build().
• inputs_shape – Shape of inputs for _build(). Can usually be obtained with tf.shape(inputs).

Returns: A tiled initial state with batch dimension of size B*T.
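For the ‘btu’ order, the tiling can be pictured as repeating the T per-position states once per batch element. `tile_state_btu` is a hypothetical helper operating on a plain list; it assumes that the flattened B*T batch is laid out batch-first (index b*T + t), which is an assumption of this sketch and not a statement about the other orders.

```python
def tile_state_btu(initial_state, batch_size):
    """Repeat T per-position states B times, giving B*T states."""
    return [state for _ in range(batch_size) for state in initial_state]


# T = 3 per-position states, B = 2 dialogs in the batch.
tiled = tile_state_btu(["s0", "s1", "s2"], batch_size=2)
```

Every dialog in the batch then starts its first utterance from "s0", its second from "s1", and so on.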
static flatten(x)

Flattens a cell state by concatenating a sequence of cell states along the last dimension. If the cell states are LSTMStateTuple, only the hidden LSTMStateTuple.h is used.

This process is used by default if medium is not provided to _build().
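The flattening rule can be sketched on nested lists. `LSTMState` is a stand-in for tf.nn.rnn_cell.LSTMStateTuple and `flatten_states` is a hypothetical helper; the sketch keeps only the hidden part h of each LSTM state and concatenates the per-example feature vectors along the last dimension, as described above.

```python
from collections import namedtuple

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple, holding [batch, dim] lists.
LSTMState = namedtuple("LSTMState", ["c", "h"])


def flatten_states(x):
    """Concatenate a (nested) sequence of states along the last dimension."""
    if isinstance(x, LSTMState):
        return x.h                       # only the hidden state is kept
    if isinstance(x, tuple):
        parts = [flatten_states(s) for s in x]
        # Concatenate the feature vectors example by example.
        return [sum(rows, []) for rows in zip(*parts)]
    return x                             # a plain [batch, dim] state


layer_1 = LSTMState(c=[[9.0], [9.0]], h=[[1.0], [2.0]])
layer_2 = LSTMState(c=[[9.0], [9.0]], h=[[3.0], [4.0]])
flat = flatten_states((layer_1, layer_2))
```

For a two-layer LSTM state with batch size 2, the result has one row per example containing the hidden states of both layers.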

encoder_major

The high-level encoder.

encoder_minor

The low-level encoder.

### MultiheadAttentionEncoder

class texar.modules.MultiheadAttentionEncoder(hparams=None)

Parameters: hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(queries, memory, memory_attention_bias, cache=None, mode=None)

Encodes the inputs.

Parameters:
• queries – A 3D tensor of shape [batch, length_query, depth_query].
• memory – A 3D tensor of shape [batch, length_key, depth_key].
• memory_attention_bias – A 3D tensor of shape [batch, length_key, num_units].
• cache – Memory cache, used only when decoding a sentence from scratch at inference time.
• mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL and PREDICT. Controls dropout mode. If None (default), texar.global_mode() is used.

Returns: A Tensor of shape [batch_size, max_time, dim] containing the encoded vectors.
static default_hparams()

Returns a dictionary of hyperparameters with default values.

{
"initializer": None,
'output_dim': 512,
'num_units': 512,
'dropout_rate': 0.1,
'use_bias': False,
}


Here:

“initializer” : dict, optional
Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
“num_heads” : int
Number of heads for attention calculation.
“output_dim” : int
Output dimension of the returned tensor.
“num_units” : int
Hidden dimension of the unsplit attention space. Should be divisible by num_heads.
“dropout_rate” : float
Dropout rate in the attention.
“use_bias”: bool
Use bias when projecting the key, value and query.
“name” : str
Name of the module.
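The requirement that “num_units” be divisible by num_heads follows from how the attention space is split into heads. The sketch below uses a hypothetical `split_heads` helper on nested lists to show the split; it is an illustration only and omits the projections and the attention computation itself.

```python
def split_heads(vectors, num_heads):
    """Split each `num_units`-sized vector into `num_heads` equal slices."""
    num_units = len(vectors[0])
    if num_units % num_heads != 0:
        raise ValueError("num_units must be divisible by num_heads")
    depth = num_units // num_heads
    return [[vec[h * depth:(h + 1) * depth] for h in range(num_heads)]
            for vec in vectors]


# One vector with num_units = 6, split into 3 heads of depth 2.
heads = split_heads([[1, 2, 3, 4, 5, 6]], num_heads=3)
```

Each head then attends over its own slice of depth num_units / num_heads.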

### TransformerEncoder

class texar.modules.TransformerEncoder(hparams=None)

Transformer encoder that applies multi-head self attention for encoding sequences. It stacks MultiheadAttentionEncoder and FeedForwardNetwork modules with residual connections.

Parameters: hparams (optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, sequence_length, mode=None)

Encodes the inputs.

Parameters:
• inputs – A 3D Tensor of shape [batch_size, max_time, dim], containing the word embeddings of input sequences. Note that the embedding dimension dim must equal “dim” in hparams.
• sequence_length – A 1D Tensor of shape [batch_size]. Input tokens beyond respective sequence lengths are masked out automatically.
• mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Used to toggle dropout. If None (default), texar.global_mode() is used.

Returns: A Tensor of shape [batch_size, max_time, dim] containing the encoded vectors.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"num_blocks": 6,
"dim": 512,
"position_embedder_type": "sinusoids",
"position_size": None,
"position_embedder_hparams": None,
"embedding_dropout": 0.1,
"residual_dropout": 0.1,
"poswise_feedforward": default_transformer_poswise_net_hparams,
"multihead_attention": {
"num_units": 512,
"output_dim": 512,
"dropout_rate": 0.1,
"use_bias": False,
},
"initializer": None,
"name": "transformer_encoder",
"use_bert_config": False,
}


Here:

“num_blocks” : int
Number of stacked blocks.
“dim” : int
Hidden dimension of the encoders.
“use_bert_config”: bool

If False, apply the default Transformer Encoder architecture. If True, apply the Transformer Encoder architecture used in BERT. The differences lie in:

1. The Normalization of the input embedding with dimension
2. The attention bias for padding tokens.
3. The residual connections between the internal tensors.
“position_embedder_type”:

Choose from “sinusoids” or “variables”.

“sinusoids”:
create the position embedding as sinusoids, which is fixed.
“variables”:
create the position embedding as trainable variables.
“position_size”: int
The size of position embeddings. Only used when “position_embedder_type” is “variables”.
“position_embedder_hparams” : dict, optional
Hyperparameters of a PositionEmbedder as position embedder if “position_embedder_type” is “variables”, or hyperparameters of a SinusoidsPositionEmbedder as position embedder if “position_embedder_type” is “sinusoids”.
“embedding_dropout” : float
Dropout rate of the input word and position embeddings.
“residual_dropout” : float
Dropout rate of the residual connections.
“poswise_feedforward” : dict

Hyperparameters for a feed-forward network used in residual connections. Make sure the dimension of the output tensor is equal to dim.

See default_transformer_poswise_net_hparams() for details.

“multihead_attention” : dict
Hyperparameters for the multi-head attention strategy. Make sure the “output_dim” in this module is equal to “dim”. See default_hparams() of MultiheadAttentionEncoder for details.

“initializer” : dict, optional
Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
“name” : str
Name of the module.
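The two “make sure” constraints above (the feed-forward output size and the attention “output_dim” must both equal “dim”) can be checked before building a model. The sketch below is a hypothetical helper, not a Texar function. It assumes the attention settings live under a key named "multihead_attention" and that the feed-forward settings follow the "layers" layout of default_transformer_poswise_net_hparams().

```python
def check_transformer_dims(hparams):
    """Raise ValueError if a sub-module's output size differs from "dim".

    Hypothetical helper for illustration; the key names are assumptions
    based on the documented hyperparameter structure.
    """
    dim = hparams["dim"]

    attn_out = hparams["multihead_attention"]["output_dim"]
    if attn_out != dim:
        raise ValueError(
            "multihead_attention output_dim=%d != dim=%d" % (attn_out, dim))

    # The feed-forward output size is the "units" of its last Dense layer.
    dense_layers = [layer for layer in hparams["poswise_feedforward"]["layers"]
                    if layer["type"] == "Dense"]
    ffn_out = dense_layers[-1]["kwargs"]["units"]
    if ffn_out != dim:
        raise ValueError(
            "poswise_feedforward output units=%d != dim=%d" % (ffn_out, dim))
    return True


example_hparams = {
    "dim": 256,
    "multihead_attention": {"num_units": 256, "output_dim": 256},
    "poswise_feedforward": {
        "layers": [
            {"type": "Dense", "kwargs": {"units": 1024}},
            {"type": "Dropout", "kwargs": {"rate": 0.1}},
            {"type": "Dense", "kwargs": {"units": 256}},
        ],
    },
}
print(check_transformer_dims(example_hparams))  # True
```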

### Conv1DEncoder¶

class texar.modules.Conv1DEncoder(hparams=None)[source]

Simple Conv-1D encoder which consists of a sequence of conv layers followed by a sequence of dense layers.

Wraps Conv1DNetwork to be a subclass of EncoderBase. Has exactly the same functionality as Conv1DNetwork.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

The same as default_hparams() of Conv1DNetwork, except that the default name is ‘conv_encoder’.

### EncoderBase¶

class texar.modules.EncoderBase(hparams=None)[source]

Base class inherited by all encoder classes.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

### RNNEncoderBase¶

class texar.modules.RNNEncoderBase(hparams=None)[source]

Base class for all RNN encoder classes to inherit.

Parameters: hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"name": "rnn_encoder"
}


### default_transformer_poswise_net_hparams¶

texar.modules.default_transformer_poswise_net_hparams(output_dim=512)[source]

Returns default hyperparameters of a FeedForwardNetwork as a pos-wise network used in TransformerEncoder and TransformerDecoder.

This is a 2-layer dense network with dropout in-between.

{
"layers": [
{
"type": "Dense",
"kwargs": {
"name": "conv1",
"units": output_dim*4,
"activation": "relu",
"use_bias": True,
}
},
{
"type": "Dropout",
"kwargs": {
"rate": 0.1,
}
},
{
"type": "Dense",
"kwargs": {
"name": "conv2",
"units": output_dim,
"use_bias": True,
}
}
],
"name": "ffn"
}

Parameters: output_dim (int) – The size of output dense layer.
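To show how output_dim shapes the two dense layers, the following sketch rebuilds the documented dictionary in plain Python. The function name `poswise_net_hparams` is hypothetical and the body is a re-creation from the structure shown above, not the library source.

```python
def poswise_net_hparams(output_dim=512):
    """Re-create the documented structure for a given output size.

    Hypothetical stand-in used only to illustrate the layer sizes.
    """
    return {
        "layers": [
            {
                "type": "Dense",
                "kwargs": {
                    "name": "conv1",
                    "units": output_dim * 4,  # hidden layer is 4x wider
                    "activation": "relu",
                    "use_bias": True,
                },
            },
            {"type": "Dropout", "kwargs": {"rate": 0.1}},
            {
                "type": "Dense",
                "kwargs": {
                    "name": "conv2",
                    "units": output_dim,  # project back to output_dim
                    "use_bias": True,
                },
            },
        ],
        "name": "ffn",
    }


hp = poswise_net_hparams(output_dim=256)
dense_units = [layer["kwargs"]["units"]
               for layer in hp["layers"] if layer["type"] == "Dense"]
print(dense_units)  # [1024, 256]
```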

## Decoders¶

### RNNDecoderBase¶

class texar.modules.RNNDecoderBase(cell=None, vocab_size=None, output_layer=None, cell_dropout_mode=None, hparams=None)[source]

Base class inherited by all RNN decoder classes. See BasicRNNDecoder for the arguments.

See _build() for the inputs and outputs of RNN decoders in general.

_build(decoding_strategy='train_greedy', initial_state=None, inputs=None, sequence_length=None, embedding=None, start_tokens=None, end_token=None, softmax_temperature=None, max_decoding_length=None, impute_finished=False, output_time_major=False, input_time_major=False, helper=None, mode=None, **kwargs)[source]

Performs decoding. This is a shared interface for both BasicRNNDecoder and AttentionRNNDecoder.

The function provides 3 ways to specify the decoding method, with varying flexibility:

1. The decoding_strategy argument: A string taking value of:

• “train_greedy”: decoding in teacher-forcing fashion (i.e., feeding ground truth to decode the next step), and each sample is obtained by taking the argmax of the RNN output logits. Arguments (inputs, sequence_length, input_time_major) are required for this strategy, and argument embedding is optional.
• “infer_greedy”: decoding in inference fashion (i.e., feeding the generated sample to decode the next step), and each sample is obtained by taking the argmax of the RNN output logits. Arguments (embedding, start_tokens, end_token) are required for this strategy, and argument max_decoding_length is optional.
• “infer_sample”: decoding in inference fashion, and each sample is obtained by random sampling from the RNN output distribution. Arguments (embedding, start_tokens, end_token) are required for this strategy, and argument max_decoding_length is optional.

This argument is used only when argument helper is None.

Example:

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Teacher-forcing decoding
outputs_1, _, _ = decoder(
decoding_strategy='train_greedy',
inputs=embedder(data_batch['text_ids']),
sequence_length=data_batch['length']-1)

# Random sample decoding. Gets 100 sequence samples
outputs_2, _, sequence_length = decoder(
decoding_strategy='infer_sample',
start_tokens=[data.vocab.bos_token_id]*100,
end_token=data.vocab.eos_token_id,
embedding=embedder,
max_decoding_length=60)

2. The helper argument: An instance of a subclass of tf.contrib.seq2seq.Helper. This provides a superset of the decoding strategies above.

This method gives the maximal flexibility in configuring the decoding strategy.

Example:

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Teacher-forcing decoding, same as above with
# decoding_strategy='train_greedy'
helper_1 = tf.contrib.seq2seq.TrainingHelper(
inputs=embedder(data_batch['text_ids']),
sequence_length=data_batch['length']-1)
outputs_1, _, _ = decoder(helper=helper_1)

# Gumbel-softmax decoding
helper_2 = GumbelSoftmaxEmbeddingHelper(
embedding=embedder,
start_tokens=[data.vocab.bos_token_id]*100,
end_token=data.vocab.eos_token_id,
tau=0.1)
outputs_2, _, sequence_length = decoder(
max_decoding_length=60, helper=helper_2)

3. hparams["helper_train"] and hparams["helper_infer"]: Specifying the helper through hyperparameters. The train or infer helper is selected based on mode. Appropriate arguments (e.g., inputs, start_tokens, etc.) are selected to construct the helper. Additional arguments for the helper constructor can be provided either through **kwargs, or through hparams["helper_train/infer"]["kwargs"].

This method is used only when both decoding_strategy and helper are None.

Example:

h = {
"helper_infer": {
"type": "GumbelSoftmaxEmbeddingHelper",
"kwargs": { "tau": 0.1 }
}
}
embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size, hparams=h)

# Gumbel-softmax decoding
output, _, _ = decoder(
decoding_strategy=None, # Set to None explicitly
embedding=embedder,
start_tokens=[data.vocab.bos_token_id]*100,
end_token=data.vocab.eos_token_id,
max_decoding_length=60,
mode=tf.estimator.ModeKeys.PREDICT)
# PREDICT mode also shuts down dropout

Parameters:
• decoding_strategy (str) – A string specifying the decoding strategy. Different arguments are required based on the strategy. Ignored if helper is given.
• initial_state (optional) – Initial state of decoding. If None (default), zero state is used.
• inputs (optional) – Input tensors for teacher-forcing decoding. Used when decoding_strategy is set to “train_greedy”, or when an hparams-configured helper is used. If embedding is None, inputs is fed directly to the decoder. E.g., in the “train_greedy” strategy, inputs must be a 3D Tensor of shape [batch_size, max_time, emb_dim] (or [max_time, batch_size, emb_dim] if input_time_major==True). If embedding is given, inputs is used as indexes to look up embeddings, which are then fed to the decoder. E.g., if embedding is an instance of WordEmbedder, then inputs is usually a 2D int Tensor [batch_size, max_time] (or [max_time, batch_size] if input_time_major==True) containing the token indexes.
• sequence_length (optional) – A 1D int Tensor containing the sequence length of inputs. Used when decoding_strategy=“train_greedy” or an hparams-configured helper is used.
• embedding (optional) – A callable that returns embedding vectors of inputs (e.g., an instance of a subclass of EmbedderBase), or the params argument of tf.nn.embedding_lookup. If provided, inputs (if used) will be passed to embedding to fetch the embedding vectors of the inputs. Required when decoding_strategy=“infer_greedy” or “infer_sample”; optional when decoding_strategy=“train_greedy”.
• start_tokens (optional) – An int Tensor of shape [batch_size], the start tokens. Used when decoding_strategy=“infer_greedy” or “infer_sample”, or when an hparams-configured helper is used. When used with the Texar data module, to get batch_size samples where batch_size changes according to the data module, this can be set as start_tokens=tf.ones_like(batch['length'])*bos_token_id.
• end_token (optional) – An int 0D Tensor, the token that marks end of decoding. Used when decoding_strategy=“infer_greedy” or “infer_sample”, or when an hparams-configured helper is used.
• softmax_temperature (optional) – A float 0D Tensor, the value to divide the logits by before computing the softmax. Larger values (above 1.0) result in more random samples. Must be > 0. If None, 1.0 is used. Used when decoding_strategy=“infer_sample”.
• max_decoding_length – An int scalar Tensor indicating the maximum allowed number of decoding steps. If None (default), either hparams[“max_decoding_length_train”] or hparams[“max_decoding_length_infer”] is used according to mode.
• impute_finished (bool) – If True, then states for batch entries which are marked as finished get copied through and the corresponding outputs get zeroed out. This causes some slowdown at each time step, but ensures that the final state and outputs have the correct values and that backprop ignores time steps that were marked as finished.
• output_time_major (bool) – If True, outputs are returned as time major tensors. If False (default), outputs are returned as batch major tensors.
• input_time_major (optional) – Whether the inputs tensor is time major. Used when decoding_strategy=“train_greedy” or an hparams-configured helper is used.
• helper (optional) – An instance of Helper that defines the decoding strategy. If given, decoding_strategy and helper configs in hparams are ignored.
• mode (str, optional) – A string taking value in tf.estimator.ModeKeys. If TRAIN, training related hyperparameters are used (e.g., hparams[“max_decoding_length_train”]), otherwise, inference related hyperparameters are used (e.g., hparams[“max_decoding_length_infer”]). If None (default), TRAIN mode is used.
• **kwargs – Other keyword arguments for constructing helpers defined by hparams[“helper_train”] or hparams[“helper_infer”].

Returns: (outputs, final_state, sequence_lengths), where
• outputs: an object containing the decoder output on all time steps.
• final_state: the cell state of the final time step.
• sequence_lengths: an int Tensor of shape [batch_size] containing the length of each sample.
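The difference between greedy and sample decoding, and the effect of softmax_temperature, can be seen on a single step of logits. The sketch below uses only the Python standard library and hypothetical helper names; it illustrates the idea (argmax versus sampling from a temperature-scaled softmax) and is not the decoder's actual code.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (must be > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy choice: index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=1.0, rng=random):
    """Sampling choice: draw an index from the softmax distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
sharp = softmax(logits, temperature=0.5)  # concentrates mass on index 0
flat = softmax(logits, temperature=2.0)   # spreads mass more evenly
print(greedy_pick(logits))                # 0
print(sharp[0] > flat[0])                 # True
```

A higher temperature flattens the distribution, so sampled tokens become more random, which matches the description of softmax_temperature above.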
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

The hyperparameters are the same as in default_hparams() of BasicRNNDecoder, except that the default “name” here is “rnn_decoder”.

batch_size

The batch size of input values.

cell

The RNN cell.

zero_state(batch_size, dtype)[source]

Zero state of the RNN cell. Equivalent to decoder.cell.zero_state.

state_size

The state size of decoder cell. Equivalent to decoder.cell.state_size.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

vocab_size

The vocab size.

output_layer

The output layer.

### BasicRNNDecoder¶

class texar.modules.BasicRNNDecoder(cell=None, cell_dropout_mode=None, vocab_size=None, output_layer=None, hparams=None)[source]

Basic RNN decoder.

Parameters:
• cell (RNNCell, optional) – An instance of RNNCell. If None (default), a cell is created as specified in hparams.
• cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given.
• vocab_size (int, optional) – Vocabulary size. Required if output_layer is None.
• output_layer (optional) – An instance of tf.layers.Layer, or tf.identity. Applied to the RNN cell output to get logits. If None, a dense layer is used with output dimension set to vocab_size. Set output_layer=tf.identity if you do not want an output layer after the RNN cell outputs.
• hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the decoder. The decoder returns (outputs, final_state, sequence_lengths), where outputs is an instance of BasicRNNDecoderOutput.

Example

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Training loss
outputs, _, _ = decoder(
decoding_strategy='train_greedy',
inputs=embedder(data_batch['text_ids']),
sequence_length=data_batch['length']-1)

loss = tx.losses.sequence_sparse_softmax_cross_entropy(
labels=data_batch['text_ids'][:, 1:],
logits=outputs.logits,
sequence_length=data_batch['length']-1)

# Inference sample
outputs, _, _ = decoder(
decoding_strategy='infer_sample',
start_tokens=[data.vocab.bos_token_id]*100,
end_token=data.vocab.eos_token_id,
embedding=embedder,
max_decoding_length=60,
mode=tf.estimator.ModeKeys.PREDICT)

sample_id = sess.run(outputs.sample_id)
sample_text = tx.utils.map_ids_to_strs(sample_id, data.vocab)
print(sample_text)
# [
#   the first sequence sample .
#   the second sequence sample .
#   ...
# ]
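The training part of the example feeds the full text_ids with sequence_length = length - 1 and uses text_ids[:, 1:] as labels. The sketch below shows, for a single sequence in plain Python, which input token is paired with which label. The token ids and the helper name are hypothetical, and it assumes that length counts both the BOS and EOS tokens.

```python
def teacher_forcing_pairs(token_ids, length):
    """Pair each decoder input token with the label it should predict.

    Hypothetical helper; `length` is assumed to include BOS and EOS.
    """
    steps = length - 1               # matches sequence_length = length - 1
    inputs = token_ids[:steps]       # tokens fed to the decoder
    labels = token_ids[1:length]     # the same sequence shifted left by one
    return list(zip(inputs, labels))


BOS, EOS, PAD = 1, 2, 0              # illustrative ids only
text_ids = [BOS, 11, 12, EOS, PAD, PAD]
pairs = teacher_forcing_pairs(text_ids, length=4)
print(pairs)  # [(1, 11), (11, 12), (12, 2)]
```

The final EOS is never used as an input, and BOS is never used as a label, which is why both the decoder call and the loss use length - 1.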

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"rnn_cell": default_rnn_cell_hparams(),
"max_decoding_length_train": None,
"max_decoding_length_infer": None,
"helper_train": {
"type": "TrainingHelper",
"kwargs": {}
},
"helper_infer": {
"type": "SampleEmbeddingHelper",
"kwargs": {}
},
"name": "basic_rnn_decoder"
}


Here:

“rnn_cell” : dict
A dictionary of RNN cell hyperparameters. Ignored if cell is given to the decoder constructor. The default value is defined in default_rnn_cell_hparams().
“max_decoding_length_train”: int or None
Maximum allowed number of decoding steps in training mode. If None (default), decoding is performed until fully done, e.g., encountering the <EOS> token. Ignored if max_decoding_length is given when calling the decoder.
“max_decoding_length_infer” : int or None
Same as “max_decoding_length_train” but for inference mode.
“helper_train” : dict
The hyperparameters of the helper used in training. “type” can be a helper class, its name or module path, or a helper instance. If a class name is given, the class must be from module tf.contrib.seq2seq, texar.modules, or texar.custom. This is used only when both the decoding_strategy and helper arguments are None when calling the decoder. See _build() for more details.
“helper_infer”: dict
Same as “helper_train” but during inference mode.
“name” : str

Name of the decoder.

The default value is “basic_rnn_decoder”.

batch_size

The batch size of input values.

cell

The RNN cell.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_layer

The output layer.

state_size

The state size of decoder cell. Equivalent to decoder.cell.state_size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

vocab_size

The vocab size.

zero_state(batch_size, dtype)

Zero state of the RNN cell. Equivalent to decoder.cell.zero_state.

### BasicRNNDecoderOutput¶

class texar.modules.BasicRNNDecoderOutput[source]

The outputs of basic RNN decoder that include both RNN outputs and sampled ids at each step. This is also used to store results of all the steps after decoding the whole sequence.

logits

The outputs of RNN (at each step/of all steps) by applying the output layer on cell outputs. E.g., in BasicRNNDecoder with default hyperparameters, this is a Tensor of shape [batch_size, max_time, vocab_size] after decoding the whole sequence.

sample_id

The sampled results (at each step/of all steps). E.g., in BasicRNNDecoder with decoding strategy of train_greedy, this is a Tensor of shape [batch_size, max_time] containing the sampled token indexes of all steps.

cell_output

The output of RNN cell (at each step/of all steps). This is the results prior to the output layer. E.g., in BasicRNNDecoder with default hyperparameters, this is a Tensor of shape [batch_size, max_time, cell_output_size] after decoding the whole sequence.
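The relation between the logits and sample_id fields under greedy decoding can be pictured with nested lists in place of Tensors. The container name `DecoderOutputSketch` and the helper `argmax_ids` are hypothetical stand-ins for illustration; they only mirror the three documented fields and their shapes.

```python
import collections

# Hypothetical stand-in with the three documented fields.
DecoderOutputSketch = collections.namedtuple(
    "DecoderOutputSketch", ("logits", "sample_id", "cell_output"))


def argmax_ids(logits):
    """Map [batch, time, vocab] logits to [batch, time] token indexes."""
    return [[max(range(len(step)), key=lambda i: step[i]) for step in seq]
            for seq in logits]


# batch_size=2, max_time=2, vocab_size=3 (toy values)
logits = [
    [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]
outputs = DecoderOutputSketch(
    logits=logits,
    sample_id=argmax_ids(logits),
    cell_output=None)  # cell outputs omitted in this toy example
print(outputs.sample_id)  # [[1, 0], [2, 1]]
```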

### AttentionRNNDecoder¶

class texar.modules.AttentionRNNDecoder(memory, memory_sequence_length=None, cell=None, cell_dropout_mode=None, vocab_size=None, output_layer=None, cell_input_fn=None, hparams=None)[source]

RNN decoder with attention mechanism.

Parameters:
• memory – The memory to query, e.g., the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, dim].
• memory_sequence_length (optional) – A tensor of shape [batch_size] containing the sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
• cell (RNNCell, optional) – An instance of RNNCell. If None, a cell is created as specified in hparams.
• cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given.
• vocab_size (int, optional) – Vocabulary size. Required if output_layer is None.
• output_layer (optional) – An instance of tf.layers.Layer, or tf.identity. Applied to the RNN cell output to get logits. If None, a dense layer is used with output dimension set to vocab_size. Set output_layer=tf.identity if you do not want an output layer after the RNN cell outputs.
• cell_input_fn (callable, optional) – A callable that produces RNN cell inputs. If None (default), the default is used: lambda inputs, attention: tf.concat([inputs, attention], -1), which concatenates regular RNN cell inputs with attentions.
• hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the decoder. The decoder returns (outputs, final_state, sequence_lengths), where outputs is an instance of AttentionRNNDecoderOutput.

Example

# Encodes the source
enc_embedder = WordEmbedder(data.source_vocab.size, ...)
encoder = UnidirectionalRNNEncoder(...)

enc_outputs, _ = encoder(
inputs=enc_embedder(data_batch['source_text_ids']),
sequence_length=data_batch['source_length'])

# Decodes while attending to the source
dec_embedder = WordEmbedder(vocab_size=data.target_vocab.size, ...)
decoder = AttentionRNNDecoder(
memory=enc_outputs,
memory_sequence_length=data_batch['source_length'],
vocab_size=data.target_vocab.size)

outputs, _, _ = decoder(
decoding_strategy='train_greedy',
inputs=dec_embedder(data_batch['target_text_ids']),
sequence_length=data_batch['target_length']-1)

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values:

Common hyperparameters are the same as in BasicRNNDecoder.default_hparams(). Additional hyperparameters are for attention mechanism configuration.

{
"attention": {
"type": "LuongAttention",
"kwargs": {
"num_units": 256,
},
"attention_layer_size": None,
"alignment_history": False,
"output_attention": True,
},
# The following hyperparameters are the same as with
# BasicRNNDecoder
"rnn_cell": default_rnn_cell_hparams(),
"max_decoding_length_train": None,
"max_decoding_length_infer": None,
"helper_train": {
"type": "TrainingHelper",
"kwargs": {}
},
"helper_infer": {
"type": "SampleEmbeddingHelper",
"kwargs": {}
},
"name": "attention_rnn_decoder"
}


Here:

“attention” : dict

Attention hyperparameters, including:

“type” : str or class or instance

The attention type. Can be an attention class, its name or module path, or a class instance. The class must be a subclass of TF AttentionMechanism. If class name is given, the class must be from modules tf.contrib.seq2seq or texar.custom.

Example:

# class name
"type": "LuongAttention"
"type": "BahdanauAttention"
# module path
"type": "tf.contrib.seq2seq.BahdanauMonotonicAttention"
"type": "my_module.MyAttentionMechanismClass"
# class
"type": tf.contrib.seq2seq.LuongMonotonicAttention
# instance
"type": LuongAttention(...)

“kwargs” : dict

Keyword arguments for the attention class constructor. Arguments memory and memory_sequence_length should not be specified here because they are given to the decoder constructor. Ignored if “type” is an attention class instance.

Example:

"type": "LuongAttention",
"kwargs": {
"num_units": 256,
"probability_fn": tf.nn.softmax
}


Here “probability_fn” can also be set to the string name or module path to a probability function.

“attention_layer_size” : int or None
The depth of the attention (output) layer. The context and cell output are fed into the attention layer to generate attention at each time step. If None (default), use the context as attention at each time step.
“alignment_history”: bool
Whether to store alignment history from all time steps in the final output state. (Stored as a time major TensorArray on which you must call stack().)
“output_attention”: bool
If True (default), the output at each time step is the attention value. This is the behavior of Luong-style attention mechanisms. If False, the output at each time step is the output of cell. This is the behavior of Bahdanau-style attention mechanisms. In both cases, the attention tensor is propagated to the next time step via the state and is used there. This flag only controls whether the attention mechanism is propagated up to the next cell in an RNN stack or to the top RNN output.
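As a rough picture of what an attention mechanism computes at one decoding step, the sketch below scores each memory position by a plain dot product with the query, ignores positions beyond the memory sequence length, and builds the context as the weighted sum of memory rows. It is a simplified illustration under stated assumptions: the function name is hypothetical, and the learned projections and scaling used by real attention classes such as LuongAttention are omitted.

```python
import math


def attend(query, memory, memory_length):
    """Dot-product attention over one sequence.

    query: list of `dim` floats; memory: `max_time` rows of `dim` floats;
    memory_length: number of valid (non-padding) rows, at least 1.
    """
    if memory_length < 1:
        raise ValueError("memory_length must be at least 1")
    scores = []
    for t, key in enumerate(memory):
        if t < memory_length:
            scores.append(sum(q * k for q, k in zip(query, key)))
        else:
            scores.append(float("-inf"))  # padding never receives weight
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alignments = [e / total for e in exps]
    context = [sum(a * row[d] for a, row in zip(alignments, memory))
               for d in range(len(query))]
    return alignments, context


query = [1.0, 0.0]
memory = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # last row is padding
alignments, context = attend(query, memory, memory_length=2)

# Default cell_input_fn behavior: concatenate the regular input and attention.
rnn_input = [0.5, 0.5]
cell_input = rnn_input + context
print(len(cell_input))  # 4
```

The alignments correspond to what the decoder output exposes as attention scores, and the context to the attention value, in this simplified setting.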
zero_state(batch_size, dtype)[source]

Returns zero state of the basic cell. Equivalent to decoder.cell._cell.zero_state.

wrapper_zero_state(batch_size, dtype)[source]

Returns zero state of the attention-wrapped cell. Equivalent to decoder.cell.zero_state.

state_size

The state size of the basic cell. Equivalent to decoder.cell._cell.state_size.

wrapper_state_size

The state size of the attention-wrapped cell. Equivalent to decoder.cell.state_size.

batch_size

The batch size of input values.

cell

The RNN cell.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_layer

The output layer.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

vocab_size

The vocab size.

### AttentionRNNDecoderOutput¶

class texar.modules.AttentionRNNDecoderOutput[source]

The outputs of attention RNN decoders that additionally include attention results.

logits

The outputs of RNN (at each step/of all steps) by applying the output layer on cell outputs. E.g., in AttentionRNNDecoder, this is a Tensor of shape [batch_size, max_time, vocab_size] after decoding.

sample_id

The sampled results (at each step/of all steps). E.g., in AttentionRNNDecoder with decoding strategy of train_greedy, this is a Tensor of shape [batch_size, max_time] containing the sampled token indexes of all steps.

cell_output

The output of RNN cell (at each step/of all steps). This is the results prior to the output layer. E.g., in AttentionRNNDecoder with default hyperparameters, this is a Tensor of shape [batch_size, max_time, cell_output_size] after decoding the whole sequence.

attention_scores

A single or tuple of Tensor(s) containing the alignments emitted (at the previous time step/of all time steps) for each attention mechanism.

attention_context

The attention emitted (at the previous time step/of all time steps).

### beam_search_decode¶

texar.modules.beam_search_decode(decoder_or_cell, embedding, start_tokens, end_token, beam_width, initial_state=None, tiled_initial_state=None, output_layer=None, length_penalty_weight=0.0, max_decoding_length=None, output_time_major=False, **kwargs)[source]

Performs beam search sampling decoding.

Parameters:
• decoder_or_cell – An instance of a subclass of RNNDecoderBase, or an instance of RNNCell. The decoder or RNN cell to perform decoding.
• embedding – A callable that takes a vector tensor of indexes (e.g., an instance of a subclass of EmbedderBase), or the params argument for tf.nn.embedding_lookup.
• start_tokens – int32 vector shaped [batch_size], the start tokens.
• end_token – int32 scalar, the token that marks end of decoding.
• beam_width (int) – Python integer, the number of beams.
• initial_state (optional) – Initial state of decoding. If None (default), zero state is used. The state must not be tiled with tile_batch. If you have an already-tiled initial state, use tiled_initial_state instead. In the case of an attention RNN decoder, initial_state must not be an AttentionWrapperState. Instead, it must be a state of the wrapped RNNCell, which will be wrapped into an AttentionWrapperState automatically. Ignored if tiled_initial_state is given.
• tiled_initial_state (optional) – Initial state that has been tiled (typically with tile_batch) so that the batch dimension has size batch_size * beam_width. In the case of an attention RNN decoder, this can be either a state of the wrapped RNNCell, or an AttentionWrapperState. If not given, initial_state is used.
• output_layer (optional) – A Layer instance to apply to the RNN output prior to storing the result or sampling. If None and decoder_or_cell is a decoder, the decoder’s output layer will be used.
• length_penalty_weight – Float weight to penalize length. Disabled with 0.0 (default).
• max_decoding_length (optional) – An int scalar Tensor indicating the maximum allowed number of decoding steps. If None (default), decoding will continue until the end token is encountered.
• output_time_major (bool) – If True, outputs are returned as time major tensors. If False (default), outputs are returned as batch major tensors.
• **kwargs – Other keyword arguments for dynamic_decode, except the argument maximum_iterations, which is set to max_decoding_length.

Returns: A tuple (outputs, final_state, sequence_length), where
• outputs: An instance of FinalBeamSearchDecoderOutput.
• final_state: An instance of BeamSearchDecoderState.
• sequence_length: A Tensor of shape [batch_size] containing the lengths of samples.

Example

## Beam search with basic RNN decoder

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

outputs, _, _, = beam_search_decode(
decoder_or_cell=decoder,
embedding=embedder,
start_tokens=[data.vocab.bos_token_id] * 100,
end_token=data.vocab.eos_token_id,
beam_width=5,
max_decoding_length=60)

sample_ids = sess.run(outputs.predicted_ids)
sample_text = tx.utils.map_ids_to_strs(sample_ids[:, :, 0], data.vocab)
print(sample_text)
# [
#   the first sequence sample .
#   the second sequence sample .
#   ...
# ]

## Beam search with attention RNN decoder

# Encodes the source
enc_embedder = WordEmbedder(data.source_vocab.size, ...)
encoder = UnidirectionalRNNEncoder(...)

enc_outputs, enc_state = encoder(
inputs=enc_embedder(data_batch['source_text_ids']),
sequence_length=data_batch['source_length'])

# Decodes while attending to the source
dec_embedder = WordEmbedder(vocab_size=data.target_vocab.size, ...)
decoder = AttentionRNNDecoder(
memory=enc_outputs,
memory_sequence_length=data_batch['source_length'],
vocab_size=data.target_vocab.size)

# Beam search
outputs, _, _, = beam_search_decode(
decoder_or_cell=decoder,
embedding=dec_embedder,
start_tokens=[data.target_vocab.bos_token_id] * 100,
end_token=data.target_vocab.eos_token_id,
beam_width=5,
initial_state=enc_state,
max_decoding_length=60)
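For readers unfamiliar with beam search itself, the following is a minimal pure-Python sketch of the idea: keep the beam_width best partial sequences at every step instead of only the single best one. It is not the TensorFlow implementation used by beam_search_decode; the function, the toy next-token table and the token ids are all hypothetical, and the length penalty is omitted.

```python
import math


def beam_search(step_log_probs, start_token, end_token, beam_width, max_len):
    """Return up to `beam_width` (sequence, log_prob) pairs, best first.

    step_log_probs(prefix) must return {token: log_prob} for the next token.
    """
    beams = [([start_token], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_log_probs(prefix).items():
                candidates.append((prefix + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == end_token:
                finished.append((prefix, score))  # hypothesis is complete
            else:
                beams.append((prefix, score))     # keep expanding
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


# Toy model: the next-token distribution depends only on the last token.
BOS, EOS, A, B = 0, 1, 2, 3
TABLE = {
    BOS: {A: 0.6, B: 0.4},
    A: {EOS: 0.4, A: 0.3, B: 0.3},
    B: {EOS: 0.9, A: 0.05, B: 0.05},
}


def toy_step(prefix):
    return {tok: math.log(p) for tok, p in TABLE[prefix[-1]].items()}


greedy_like = beam_search(toy_step, BOS, EOS, beam_width=1, max_len=5)
wider = beam_search(toy_step, BOS, EOS, beam_width=2, max_len=5)
print(greedy_like[0][0])  # [0, 2, 1]
print(wider[0][0])        # [0, 3, 1]
```

With a beam of 1 the search commits to the locally best first token and ends with total probability 0.24, while a beam of 2 finds the sequence with probability 0.36. This is the reason for tracking several hypotheses per batch entry.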


### TransformerDecoder¶

class texar.modules.TransformerDecoder(embedding, hparams=None)[source]

Transformer decoder that applies multi-head attention for sequence decoding. It stacks MultiheadAttentionEncoder layers (for encoder-decoder attention and self attention) and FeedForwardNetwork layers with residual connections.

The passed embedding variable is used as the parameters of the layer that transforms decoder outputs to logits.

Parameters: embedding – A Tensor of shape [vocab_size, dim] containing the word embedding. The Tensor is used as the decoder output layer. hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(memory, memory_sequence_length=None, memory_attention_bias=None, inputs=None, sequence_length=None, decoding_strategy='train_greedy', beam_width=1, alpha=0, start_tokens=None, end_token=None, max_decoding_length=None, mode=None)[source]

Performs decoding.

The decoder supports 4 decoding strategies. For the first 3 strategies, set decoding_strategy to the respective string.

• “train_greedy”: decoding in teacher-forcing fashion (i.e., feeding ground truth to decode the next step); at each step the sample is obtained by taking the argmax of the logits. Argument inputs is required for this strategy. sequence_length is optional.
• “infer_greedy”: decoding in inference fashion (i.e., feeding the generated sample to decode the next step); at each step the sample is obtained by taking the argmax of the logits. Arguments (start_tokens, end_token) are required for this strategy, and argument max_decoding_length is optional.
• “infer_sample”: decoding in inference fashion; at each step the sample is obtained by random sampling from the logits. Arguments (start_tokens, end_token) are required for this strategy, and argument max_decoding_length is optional.
• Beam Search: set beam_width to > 1 to use beam search decoding. Arguments (start_tokens, end_token) are required, and argument max_decoding_length is optional.
Parameters:

• memory – The memory to attend to, e.g., the output of an RNN encoder. A Tensor of shape [batch_size, memory_max_time, dim].
• memory_sequence_length (optional) – A Tensor of shape [batch_size] containing the sequence lengths for the batch entries in memory. Used to create the attention bias if memory_attention_bias is not given. Ignored if memory_attention_bias is provided.
• memory_attention_bias (optional) – A Tensor of shape [batch_size, num_heads, memory_max_time, dim]. An attention bias typically sets the value of a padding position to a large negative value for masking. If not given, memory_sequence_length is used to automatically create an attention bias.
• inputs (optional) – Input tensor for teacher-forcing decoding, of shape [batch_size, target_max_time, emb_dim], containing the target sequence word embeddings. Used when decoding_strategy is set to “train_greedy”.
• sequence_length (optional) – A Tensor of shape [batch_size], containing the sequence length of inputs. Tokens beyond the respective sequence length are masked out. Used when decoding_strategy is set to “train_greedy”.
• decoding_strategy (str) – A string specifying the decoding strategy, one of “train_greedy”, “infer_greedy”, and “infer_sample”. Different arguments are required based on the strategy. See above for details. Ignored if beam_width > 1.
• beam_width (int) – Set to > 1 to use beam search.
• alpha (float) – Length penalty coefficient. Refer to https://arxiv.org/abs/1609.08144 for more details.
• start_tokens (optional) – An int Tensor of shape [batch_size], containing the start tokens. Used when decoding_strategy = “infer_greedy” or “infer_sample”, or beam_width > 1.
• end_token (optional) – An int 0D Tensor, the token that marks the end of decoding. Used when decoding_strategy = “infer_greedy” or “infer_sample”, or beam_width > 1.
• max_decoding_length (optional) – An int scalar Tensor indicating the maximum allowed number of decoding steps. If None (default), “max_decoding_length” defined in hparams is used. Ignored in “train_greedy” decoding.
• mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls dropout mode. If None (default), texar.global_mode() is used.

Returns:

• For “train_greedy” decoding, returns an instance of TransformerDecoderOutput which contains sample_id and logits.
• For “infer_greedy” and “infer_sample” decoding, returns a tuple (outputs, sequence_lengths), where outputs is an instance of TransformerDecoderOutput as in “train_greedy”, and sequence_lengths is a Tensor of shape [batch_size] containing the length of each sample.
• For beam search decoding, returns a dict containing keys “sample_id” and “log_prob”. “sample_id” is an int Tensor of shape [batch_size, max_time, beam_width] containing generated token indexes; sample_id[:,:,0] is the most probable sample. “log_prob” is a float Tensor of shape [batch_size, beam_width] containing the log probability of each sequence sample.
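The masking role of the attention bias described for memory_attention_bias can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: `attention_bias_from_lengths` is a hypothetical helper, the constant `-1e18` is an arbitrary "large negative value", and the result is a simple [batch_size, memory_max_time] nested list rather than the broadcastable Tensor the decoder actually builds.

```python
def attention_bias_from_lengths(sequence_lengths, max_time, neg_value=-1e18):
    """Return a [batch_size, max_time] bias (hypothetical helper).

    Real positions get 0.0, padding positions get a large negative value,
    so that adding the bias to attention logits before the softmax drives
    the attention weight of padding positions towards zero.
    """
    bias = []
    for length in sequence_lengths:
        row = [0.0 if t < length else neg_value for t in range(max_time)]
        bias.append(row)
    return bias


memory_lengths = [3, 1]  # made-up lengths for a batch of two memories
bias = attention_bias_from_lengths(memory_lengths, max_time=4)
```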
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
# Same as in TransformerEncoder
"num_blocks": 6,
"dim": 512,
"position_embedder_hparams": None,
"embedding_dropout": 0.1,
"residual_dropout": 0.1,
"poswise_feedforward": default_transformer_poswise_net_hparams,
"multihead_attention": {
    "num_units": 512,
},
"initializer": None,
"embedding_tie": True,
"output_layer_bias": False,
"max_decoding_length": 1e10,
"name": "transformer_decoder"
}


Here:

“num_blocks” : int
Number of stacked blocks.
“dim” : int
Hidden dimension of the decoder.
“position_embedder_hparams” : dict, optional
Hyperparameters of a SinusoidsPositionEmbedder as position embedder. If None, the default_hparams() is used.
“embedding_dropout”: float
Dropout rate of the input word and position embeddings.
“residual_dropout” : float
Dropout rate of the residual connections.
“poswise_feedforward” : dict
Hyperparameters for a feed-forward network used in residual connections. Make sure the dimension of the output tensor is equal to dim.

See default_transformer_poswise_net_hparams() for details.

“multihead_attention” : dict
Hyperparameters for the multihead attention strategy. Make sure the output_dim in this module is equal to dim.


“initializer” : dict, optional
Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
“embedding_tie” : bool
Whether to use the word embedding matrix as the output layer that computes logits. If False, an additional dense layer is created.
“output_layer_bias” : bool
Whether to use bias to the output layer.
“max_decoding_length” : int
The maximum allowed number of decoding steps. Set to a very large number to avoid the length constraint. Ignored if provided in _build() or if “train_greedy” decoding is used.

“name” : str
Name of the module.
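The alpha argument of _build() refers to the length penalty of https://arxiv.org/abs/1609.08144. A minimal sketch of that formula is given below, assuming the paper's definition lp(Y) = ((5 + |Y|) / 6)^alpha; whether the decoder's beam search applies it in exactly this form is an assumption, and `length_penalty` and `rescore` are hypothetical names.

```python
def length_penalty(length, alpha):
    """Length penalty from arXiv:1609.08144: ((5 + length) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def rescore(log_prob, length, alpha):
    """Length-normalized score used to rank beam-search hypotheses."""
    return log_prob / length_penalty(length, alpha)


# Made-up hypotheses: a short one and a longer one with a lower log-probability.
short_score = rescore(-4.0, length=4, alpha=0.6)
long_score = rescore(-5.0, length=10, alpha=0.6)
```

With alpha = 0 the penalty is 1 and hypotheses are ranked by raw log-probability; larger alpha values favor longer hypotheses.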

### TransformerDecoderOutput¶

class texar.modules.TransformerDecoderOutput[source]

The output of TransformerDecoder.

logits

A float Tensor of shape [batch_size, max_time, vocab_size] containing the logits.

sample_id

An int Tensor of shape [batch_size, max_time] containing the sampled token indexes.

### SoftmaxEmbeddingHelper¶

class texar.modules.SoftmaxEmbeddingHelper(embedding, start_tokens, end_token, tau, stop_gradient=False, use_finish=True)[source]

A helper that feeds softmax probabilities over vocabulary to the next step. Uses the softmax probability vector to pass through word embeddings to get the next input (i.e., a mixed word embedding).

A subclass of Helper. Used as a helper to RNNDecoderBase _build() in inference mode.

Parameters: embedding – An embedding argument (params) for tf.nn.embedding_lookup, or an instance of subclass of texar.modules.EmbedderBase. Note that other callables are not acceptable here. start_tokens – An int tensor shaped [batch_size]. The start tokens. end_token – An int scalar tensor. The token that marks end of decoding. tau – A float scalar tensor, the softmax temperature. stop_gradient (bool) – Whether to stop the gradient backpropagation when feeding softmax vector to the next step. use_finish (bool) – Whether to stop decoding once end_token is generated. If False, decoding will continue until max_decoding_length of the decoder is reached.
batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, state, name=None)[source]

Returns sample_id which is softmax distributions over vocabulary with temperature tau. Shape = [batch_size, vocab_size]

next_inputs(time, outputs, state, sample_ids, name=None)[source]

Returns (finished, next_inputs, next_state).
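The idea of feeding a "mixed word embedding" to the next step can be sketched in plain Python. This is an illustrative sketch rather than the helper's implementation: `softmax_with_temperature` and `mixed_embedding` are hypothetical names, and the embedding table and logits are toy values.

```python
import math


def softmax_with_temperature(logits, tau):
    """Softmax of logits / tau; a smaller tau gives a sharper distribution."""
    scaled = [x / tau for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_embedding(probs, embedding):
    """Probability-weighted average of the rows of `embedding`."""
    dim = len(embedding[0])
    return [sum(p * row[d] for p, row in zip(probs, embedding))
            for d in range(dim)]


embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy table, [vocab_size=3, dim=2]
logits = [2.0, 1.0, 0.0]                          # made-up step outputs

soft = softmax_with_temperature(logits, tau=1.0)
sharp = softmax_with_temperature(logits, tau=0.1)
next_input = mixed_embedding(soft, embedding)     # what would be fed to the next step
```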

### GumbelSoftmaxEmbeddingHelper¶

class texar.modules.GumbelSoftmaxEmbeddingHelper(embedding, start_tokens, end_token, tau, straight_through=False, stop_gradient=False, use_finish=True)[source]

A helper that feeds gumbel softmax sample to the next step. Uses the gumbel softmax vector to pass through word embeddings to get the next input (i.e., a mixed word embedding).

A subclass of Helper. Used as a helper to RNNDecoderBase _build() in inference mode.

Same as SoftmaxEmbeddingHelper except that here gumbel softmax (instead of softmax) is used.

Parameters: embedding – An embedding argument (params) for tf.nn.embedding_lookup, or an instance of subclass of texar.modules.EmbedderBase. Note that other callables are not acceptable here. start_tokens – An int tensor shaped [batch_size]. The start tokens. end_token – An int scalar tensor. The token that marks end of decoding. tau – A float scalar tensor, the softmax temperature. straight_through (bool) – Whether to use straight through gradient between time steps. If True, a single token with highest probability (i.e., greedy sample) is fed to the next step and gradient is computed using straight through. If False (default), the soft gumbel-softmax distribution is fed to the next step. stop_gradient (bool) – Whether to stop the gradient backpropagation when feeding softmax vector to the next step. use_finish (bool) – Whether to stop decoding once end_token is generated. If False, decoding will continue until max_decoding_length of the decoder is reached.
sample(time, outputs, state, name=None)[source]

Returns sample_id of shape [batch_size, vocab_size]. If straight_through is False, this is gumbel softmax distributions over vocabulary with temperature tau. If straight_through is True, this is one-hot vectors of the greedy samples.
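The two outputs described above (a soft Gumbel-softmax distribution, or a one-hot vector of the greedy sample) can be sketched for a single batch entry as follows. This is a forward-pass sketch only, with hypothetical names; the straight-through gradient itself cannot be shown without an autodiff framework.

```python
import math
import random


def gumbel_softmax_sample(logits, tau, rng, straight_through=False):
    """Draw one Gumbel-softmax sample over the vocabulary (hypothetical helper)."""
    noisy = []
    for x in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 and 1 to keep the logarithms finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((x - math.log(-math.log(u))) / tau)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not straight_through:
        return soft
    # Forward value in the straight-through case: one-hot of the argmax.
    best = max(range(len(soft)), key=lambda i: soft[i])
    return [1.0 if i == best else 0.0 for i in range(len(soft))]


logits = [2.0, 1.0, 0.0]  # made-up step outputs
soft_sample = gumbel_softmax_sample(logits, tau=0.5, rng=random.Random(0))
hard_sample = gumbel_softmax_sample(logits, tau=0.5, rng=random.Random(0),
                                    straight_through=True)
```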

### get_helper¶

texar.modules.get_helper(helper_type, inputs=None, sequence_length=None, embedding=None, start_tokens=None, end_token=None, **kwargs)[source]

Creates a Helper instance.

Parameters: helper_type – A Helper class, its name or module path, or a class instance. If a class instance is given, it is returned directly. inputs (optional) – Inputs to the RNN decoder, e.g., ground truth tokens for teacher forcing decoding. sequence_length (optional) – A 1D int Tensor containing the sequence length of inputs. embedding (optional) – A callable that takes a vector tensor of indexes (e.g., an instance of a subclass of EmbedderBase), or the params argument for embedding_lookup (e.g., the embedding Tensor). start_tokens (optional) – An int Tensor of shape [batch_size], the start tokens. end_token (optional) – An int 0D Tensor, the token that marks the end of decoding. **kwargs – Additional keyword arguments for constructing the helper.

Returns: A helper instance.

## Connectors¶

### ConnectorBase¶

class texar.modules.ConnectorBase(output_size, hparams=None)[source]

Base class inherited by all connector classes. A connector transforms inputs into outputs with any specified structure and shape. For example, a connector can transform the final state of an encoder into the initial state of a decoder, or perform stochastic sampling in between, as in Variational Autoencoders (VAEs).

Parameters: output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size. hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

output_size

The output size.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### ConstantConnector¶

class texar.modules.ConstantConnector(output_size, hparams=None)[source]

Creates a constant Tensor or (nested) tuple of Tensors that contains a constant value.

Parameters: output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size. hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

This connector does not have trainable parameters. See _build() for the inputs and outputs of the connector.

Example

connector = ConstantConnector(cell.state_size)
zero_state = connector(batch_size=64, value=0.)
one_state = connector(batch_size=64, value=1.)

_build(batch_size, value=None)[source]

Creates output tensor(s) that has the given value.

Parameters: batch_size – An int or int scalar Tensor, the batch size. value (optional) – A scalar, the value that the output tensor(s) has. If None, “value” in hparams is used.

Returns: A (structure of) tensor whose structure is the same as output_size, with the value specified by value or hparams.
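The behavior described above, filling a (possibly nested) structure of sizes with a constant, can be sketched without TensorFlow. The sketch uses nested Python lists in place of Tensors, and `constant_like` is a hypothetical name, not part of the library.

```python
def constant_like(output_size, batch_size, value=0.0):
    """Build constant "tensors" matching a (nested) structure of sizes.

    An int size produces a [batch_size, size] nested list filled with `value`;
    a tuple or list of sizes produces a tuple with one entry per element.
    """
    if isinstance(output_size, int):
        return [[value] * output_size for _ in range(batch_size)]
    return tuple(constant_like(size, batch_size, value) for size in output_size)


# A state structure with two parts of sizes 3 and 2, for a batch of 2.
state = constant_like((3, 2), batch_size=2, value=1.0)
```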
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"value": 0.,
"name": "constant_connector"
}


Here:

“value” : float
The constant scalar that the output tensor(s) has. Ignored if value is given to _build().
“name” : str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### ForwardConnector¶

class texar.modules.ForwardConnector(output_size, hparams=None)[source]

Transforms inputs to have specified structure.

Parameters: output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size. hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

This connector does not have trainable parameters. See _build() for the inputs and outputs of the connector.

The input to the connector must have the same structure as output_size, or must have the same number of elements and be re-packable into the structure of output_size. Note that if the input is or contains a dict instance, the keys will be sorted to pack in deterministic order (see pack_sequence_as for more details).

Example

cell = LSTMCell(num_units=256)
# cell.state_size == LSTMStateTuple(c=256, h=256)

connector = ForwardConnector(cell.state_size)
output = connector([tensor_1, tensor_2])
# output == LSTMStateTuple(c=tensor_1, h=tensor_2)

_build(inputs)[source]

Transforms inputs to have the same structure as output_size. Values of the inputs are not changed.

inputs must either have the same structure as output_size, or have the same number of elements as output_size.

Parameters: inputs – The input (structure of) tensor to pass forward.

Returns: A (structure of) tensors that re-packs inputs to have the specified structure of output_size.
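The re-packing described above can be sketched in plain Python. This is an illustrative stand-in for pack_sequence_as, not the library's code: `StateTuple`, `flatten` and `repack` are hypothetical names, strings stand in for tensors, and dict inputs (which the connector packs in sorted-key order) are not handled.

```python
from collections import namedtuple

# Stand-in for TensorFlow's LSTMStateTuple; a hypothetical name for this sketch.
StateTuple = namedtuple("StateTuple", ["c", "h"])


def flatten(structure):
    """Return the leaves of a nested list/tuple structure, left to right."""
    if isinstance(structure, (list, tuple)):
        leaves = []
        for item in structure:
            leaves.extend(flatten(item))
        return leaves
    return [structure]


def repack(inputs, output_size):
    """Re-pack the leaves of `inputs` into the structure of `output_size`."""
    leaves = flatten(inputs)
    if len(leaves) != len(flatten(output_size)):
        raise ValueError("inputs and output_size have different numbers of elements")
    remaining = iter(leaves)

    def build(node):
        if isinstance(node, tuple) and hasattr(node, "_fields"):  # namedtuple
            return type(node)(*[build(child) for child in node])
        if isinstance(node, (list, tuple)):
            return type(node)(build(child) for child in node)
        return next(remaining)  # a leaf: take the next input value unchanged

    return build(output_size)


state = repack(["tensor_1", "tensor_2"], StateTuple(c=256, h=256))
```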
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"name": "forward_connector"
}


Here:

“name” : str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### MLPTransformConnector¶

class texar.modules.MLPTransformConnector(output_size, hparams=None)[source]

Transforms inputs with an MLP layer and packs the results into the specified structure and size.

Parameters: output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size. hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the connector.

The input to the connector can have arbitrary structure and size.

Example

cell = LSTMCell(num_units=256)
# cell.state_size == LSTMStateTuple(c=256, h=256)

connector = MLPTransformConnector(cell.state_size)
inputs = tf.zeros([64, 10])
output = connector(inputs)
# output == LSTMStateTuple(c=tensor_of_shape_(64, 256),
#                          h=tensor_of_shape_(64, 256))

# Use to connect encoder and decoder with different state size
encoder = UnidirectionalRNNEncoder(...)
_, final_state = encoder(inputs=...)

decoder = BasicRNNDecoder(...)
connector = MLPTransformConnector(decoder.state_size)

_ = decoder(
initial_state=connector(final_state),
...)

_build(inputs)[source]

Transforms inputs with an MLP layer and packs the results to have the same structure as specified by output_size.

Parameters: inputs – Input (structure of) tensors to be transformed. Must be a Tensor of shape [batch_size, …] or a (nested) tuple of such Tensors. That is, the first dimension of (each) tensor must be the batch dimension.

Returns: A Tensor or a (nested) tuple of Tensors with the same structure as output_size.
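One way to picture this transformation is: concatenate the features of all inputs per batch entry, apply a single dense layer whose width is the total output size, and split the result into the parts of output_size. The sketch below follows that reading with nested lists in place of Tensors. It is an assumption about the mechanics rather than the library's code, `mlp_transform` is a hypothetical name, and the weights are random placeholders. No activation is applied, matching the "identity" default.

```python
import random


def mlp_transform(inputs, output_sizes, weights, bias):
    """Dense-transform `inputs` and split the result by `output_sizes`.

    inputs:       list of [batch_size, dim_i] nested lists.
    output_sizes: tuple of ints, one per output part.
    weights:      [sum(output_sizes), sum(dim_i)] nested list.
    bias:         list of length sum(output_sizes).
    """
    batch_size = len(inputs[0])
    outputs = tuple([] for _ in output_sizes)
    for b in range(batch_size):
        # Concatenate the features of every input tensor for this batch entry.
        features = [x for tensor in inputs for x in tensor[b]]
        dense = [sum(f * w for f, w in zip(features, row)) + offset
                 for row, offset in zip(weights, bias)]
        # Split the dense output into consecutive chunks, one per output part.
        start = 0
        for part, size in zip(outputs, output_sizes):
            part.append(dense[start:start + size])
            start += size
    return outputs


rng = random.Random(0)
inputs = [[[0.0] * 4 for _ in range(2)],   # shape [2, 4]
          [[1.0] * 3 for _ in range(2)]]   # shape [2, 3]
output_sizes = (5, 5)                      # e.g. the c and h parts of a state
weights = [[rng.uniform(-0.1, 0.1) for _ in range(7)] for _ in range(10)]
bias = [0.0] * 10
c, h = mlp_transform(inputs, output_sizes, weights, bias)
```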
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"activation_fn": "identity",
"name": "mlp_connector"
}


Here:

“activation_fn” : str or callable
The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
“name” : str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### ReparameterizedStochasticConnector¶

class texar.modules.ReparameterizedStochasticConnector(output_size, hparams=None)[source]

Samples from a distribution with reparameterization trick, and transforms samples into specified size.

Reparameterization allows gradients to be back-propagated through the stochastic samples. Used in, e.g., Variational Autoencoders (VAEs).

Parameters: output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size. hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

Example

cell = LSTMCell(num_units=256)
# cell.state_size == LSTMStateTuple(c=256, h=256)

connector = ReparameterizedStochasticConnector(cell.state_size)

kwargs = {
'loc': tf.zeros([batch_size, 10]),
'scale_diag': tf.ones([batch_size, 10])
}
output, sample = connector(distribution_kwargs=kwargs)
# output == LSTMStateTuple(c=tensor_of_shape_(batch_size, 256),
#                          h=tensor_of_shape_(batch_size, 256))
# sample == Tensor([batch_size, 10])

kwargs = {
'loc': tf.zeros([10]),
'scale_diag': tf.ones([10])
}
output_, sample_ = connector(distribution_kwargs=kwargs,
num_samples=batch_size_)
# output_ == LSTMStateTuple(c=tensor_of_shape_(batch_size_, 256),
#                           h=tensor_of_shape_(batch_size_, 256))
# sample_ == Tensor([batch_size_, 10])

_build(distribution='MultivariateNormalDiag', distribution_kwargs=None, transform=True, num_samples=None)[source]

Samples from a distribution and optionally performs transformation with an MLP layer.

The distribution must be reparameterizable, i.e., distribution.reparameterization_type = FULLY_REPARAMETERIZED.

Parameters: distribution – An instance of a subclass of TF Distribution or tensorflow_probability Distribution. Can be a class, its name or module path, or a class instance. distribution_kwargs (dict, optional) – Keyword arguments for the distribution constructor. Ignored if distribution is a class instance. transform (bool) – Whether to perform MLP transformation of the distribution samples. If False, the structure/shape of a sample must match output_size. num_samples (optional) – An int or int Tensor. Number of samples to generate. If not given, generate a single sample. Note that if batch size has already been included in the distribution’s dimensionality, num_samples should be left as None.

Returns: A tuple (output, sample), where output is a Tensor or a (nested) tuple of Tensors with the same structure and size as output_size. The batch dimension equals num_samples if specified, or is determined by the distribution dimensionality. sample is the sample from the distribution, prior to transformation.

Raises: ValueError – If distribution cannot be reparameterized. ValueError – If the output does not match output_size.
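For a diagonal Gaussian, the reparameterization trick amounts to drawing parameter-free noise and then applying a deterministic transformation of the distribution parameters. A minimal pure-Python sketch follows; `reparameterized_sample` is a hypothetical name, and plain floats stand in for Tensors.

```python
import random


def reparameterized_sample(loc, scale_diag, rng):
    """Sample from N(loc, diag(scale_diag**2)) as loc + scale_diag * eps.

    The noise eps ~ N(0, 1) does not depend on the parameters, so the sample
    is a differentiable function of loc and scale_diag:
    d(sample_i)/d(loc_i) = 1 and d(sample_i)/d(scale_diag_i) = eps_i.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in loc]
    sample = [m + s * e for m, s, e in zip(loc, scale_diag, eps)]
    return sample, eps


rng = random.Random(0)
loc = [0.5, -1.0, 2.0]      # made-up distribution parameters
scale = [1.0, 0.1, 2.0]
sample, eps = reparameterized_sample(loc, scale, rng)
```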
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"activation_fn": "identity",
"name": "reparameterized_stochastic_connector"
}


Here:

“activation_fn” : str
The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
“name” : str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### StochasticConnector¶

class texar.modules.StochasticConnector(output_size, hparams=None)[source]

Samples from a distribution and transforms samples into specified size.

The connector is the same as ReparameterizedStochasticConnector, except that here reparameterization is disabled, and thus the gradients cannot be back-propagated through the stochastic samples.

Parameters: output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of int, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size. hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(distribution='MultivariateNormalDiag', distribution_kwargs=None, transform=False, num_samples=None)[source]

Samples from a distribution and optionally performs transformation with an MLP layer.

The inputs and outputs are the same as in ReparameterizedStochasticConnector, except that the distribution does not need to be reparameterizable, and gradients cannot be back-propagated through the samples.

Parameters: distribution – An instance of a subclass of TF Distribution or tensorflow_probability Distribution. Can be a class, its name or module path, or a class instance. distribution_kwargs (dict, optional) – Keyword arguments for the distribution constructor. Ignored if distribution is a class instance. transform (bool) – Whether to perform MLP transformation of the distribution samples. If False, the structure/shape of a sample must match output_size. num_samples (optional) – An int or int Tensor. Number of samples to generate. If not given, generate a single sample. Note that if batch size has already been included in the distribution’s dimensionality, num_samples should be left as None.

Returns: A tuple (output, sample), where output is a Tensor or a (nested) tuple of Tensors with the same structure and size as output_size. The batch dimension equals num_samples if specified, or is determined by the distribution dimensionality. sample is the sample from the distribution, prior to transformation.

Raises: ValueError – If the output does not match output_size.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"activation_fn": "identity",
"name": "stochastic_connector"
}


Here:

“activation_fn” : str
The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
“name” : str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

## Classifiers¶

### Conv1DClassifier¶

class texar.modules.Conv1DClassifier(hparams=None)[source]

Simple Conv-1D classifier. This is a combination of the Conv1DEncoder with a classification layer.

Parameters: hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

Example

clas = Conv1DClassifier(hparams={'num_classes': 10})

inputs = tf.random_uniform([64, 20, 256])
logits, pred = clas(inputs)
# logits == Tensor of shape [64, 10]
# pred   == Tensor of shape [64]

_build(inputs, sequence_length=None, dtype=None, mode=None)[source]

Feeds the inputs through the network and makes classification.

The arguments are the same as in Conv1DEncoder.

The predictions of binary classification (“num_classes”=1) and multi-way classification (“num_classes”>1) are different, as explained below.

Parameters: inputs – The inputs to the network, which is a 3D tensor. See Conv1DEncoder for more details. sequence_length (optional) – An int tensor of shape [batch_size] containing the length of each element in inputs. If given, time steps beyond the length will first be masked out before feeding to the layers. dtype (optional) – Type of the inputs. If not provided, infers from inputs automatically. mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, texar.global_mode() is used.

Returns: A tuple (logits, pred), where logits is a Tensor of shape [batch_size, num_classes] for num_classes > 1, and [batch_size] for num_classes = 1 (i.e., binary classification). pred is the prediction, a Tensor of shape [batch_size] and type tf.int64. For binary classification, the standard sigmoid function is used for prediction, and the class labels are {0, 1}.
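The difference between binary and multi-way prediction can be sketched as follows. The 0.5 decision threshold on the sigmoid output is an assumption (the text only states that the sigmoid is used and that labels are {0, 1}), `predict` is a hypothetical name, and nested lists stand in for Tensors.

```python
def predict(logits, num_classes):
    """Turn logits into class predictions (hypothetical helper).

    num_classes == 1: logits has shape [batch_size]; sigmoid(x) > 0.5 is
    equivalent to x > 0, so the sign of the logit decides between 0 and 1.
    num_classes > 1:  logits has shape [batch_size, num_classes]; the
    prediction is the index of the largest logit.
    """
    if num_classes == 1:
        return [1 if x > 0.0 else 0 for x in logits]
    return [max(range(num_classes), key=lambda k: row[k]) for row in logits]


binary_pred = predict([-2.0, 0.3], num_classes=1)
multi_pred = predict([[0.1, 2.0, -1.0], [3.0, 0.0, 0.0]], num_classes=3)
```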
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
# (1) Same hyperparameters as in Conv1DEncoder
...

"num_classes": 2,
"logit_layer_kwargs": {
"use_bias": False
},
"name": "conv1d_classifier"
}


Here:

1. Same hyperparameters as in Conv1DEncoder. See the default_hparams(). An instance of Conv1DEncoder is created for feature extraction.

“num_classes” : int

Number of classes:

• If > 0, an additional Dense layer is appended to the encoder to compute the logits over classes.
• If <= 0, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
“logit_layer_kwargs” : dict

Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to “num_classes”. Ignored if no extra logit layer is appended.

“name” : str

Name of the classifier.

trainable_variables

The list of trainable variables of the module.

num_classes

The number of classes.

nn

The classifier neural network.

has_layer(layer_name)[source]

Returns True if a layer with the given name exists in the network. Returns False otherwise.

Parameters: layer_name (str) – Name of the layer.
layer_by_name(layer_name)[source]

Returns the layer with the given name. Returns None if the layer name does not exist.

Parameters: layer_name (str) – Name of the layer.
layers_by_name

A dictionary mapping layer names to the layers.

layers

A list of the layers.

layer_names

A list of uniquified layer names.

layer_outputs_by_name(layer_name)[source]

Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.

Parameters: layer_name (str) – Name of the layer.
layer_outputs

A list containing output tensors of each layer.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

variable_scope

The variable scope of the module.

### UnidirectionalRNNClassifier¶

class texar.modules.UnidirectionalRNNClassifier(cell=None, cell_dropout_mode=None, output_layer=None, hparams=None)[source]

One directional RNN classifier. This is a combination of the UnidirectionalRNNEncoder with a classification layer. Both step-wise classification and sequence-level classification are supported, specified in hparams.

Arguments are the same as in UnidirectionalRNNEncoder.

Parameters: cell (RNNCell, optional) – If not specified, a cell is created as specified in hparams["rnn_cell"]. cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given. output_layer (optional) – An instance of tf.layers.Layer. Applies to the RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer"]. hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, sequence_length=None, initial_state=None, time_major=False, mode=None, **kwargs)[source]

Feeds the inputs through the network and makes classification.

The arguments are the same as in UnidirectionalRNNEncoder.

Parameters:

• inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time may be exchanged if time_major=True is specified.
• sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element’s sequence length.
• initial_state (optional) – Initial state of the RNN.
• time_major (bool) – The shape format of the inputs and outputs Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth].
• mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls output layer dropout if the output layer is specified with hparams. If None (default), texar.global_mode() is used.
• return_cell_output (bool) – Whether to return the output of the RNN cell. This is the result prior to the output layer.
• **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.

Returns: A tuple (logits, pred), containing the logits over classes and the predictions, respectively.

• If “clas_strategy” is “final_time” or “all_time” and “num_classes”==1, logits and pred are both of shape [batch_size].
• If “clas_strategy” is “final_time” or “all_time” and “num_classes”>1, logits is of shape [batch_size, num_classes] and pred is of shape [batch_size].
• If “clas_strategy” is “time_wise” and “num_classes”==1, logits and pred are both of shape [batch_size, max_time].
• If “clas_strategy” is “time_wise” and “num_classes”>1, logits is of shape [batch_size, max_time, num_classes] and pred is of shape [batch_size, max_time].

If time_major is True, the batch and time dimensions are exchanged.
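The shape rules for (logits, pred) stated above can be collected into a small lookup function. This is a bookkeeping sketch only, with a hypothetical function name; it covers num_classes >= 1 and does not run any network.

```python
def classifier_output_shapes(batch_size, max_time, num_classes,
                             clas_strategy, time_major=False):
    """Return (logits_shape, pred_shape) following the documented rules."""
    if clas_strategy in ("final_time", "all_time"):
        # One class per sequence.
        pred_shape = [batch_size]
    elif clas_strategy == "time_wise":
        # One class per time step; batch and time swap when time_major=True.
        pred_shape = ([max_time, batch_size] if time_major
                      else [batch_size, max_time])
    else:
        raise ValueError("Unknown clas_strategy: %s" % clas_strategy)
    logits_shape = list(pred_shape)
    if num_classes > 1:
        # Multi-way classification adds a trailing class dimension to logits.
        logits_shape.append(num_classes)
    return logits_shape, pred_shape


seq_level = classifier_output_shapes(64, 20, 10, "final_time")
step_level = classifier_output_shapes(64, 20, 10, "time_wise")
```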
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
# (1) Same hyperparameters as in UnidirectionalRNNEncoder
...

"num_classes": 2,
"logit_layer_kwargs": None,
"clas_strategy": "final_time",
"max_seq_length": None,
"name": "unidirectional_rnn_classifier"
}


Here:

1. Same hyperparameters as in UnidirectionalRNNEncoder. See the default_hparams(). An instance of UnidirectionalRNNEncoder is created for feature extraction.

“num_classes” : int

Number of classes:

• If > 0, an additional Dense layer is appended to the encoder to compute the logits over classes.
• If <= 0, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
“logit_layer_kwargs” : dict

Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to “num_classes”. Ignored if no extra logit layer is appended.

“clas_strategy” : str

The classification strategy, one of:

• “final_time”: Sequence-level classification based on the output of the final time step. One sequence has one class.
• “all_time”: Sequence-level classification based on the output of all time steps. One sequence has one class.
• “time_wise”: Step-wise classification, i.e., make a classification for each time step based on its output.
“max_seq_length” : int, optional

Maximum possible length of input sequences. Required if “clas_strategy” is “all_time”.

“name” : str

Name of the classifier.
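As a rough illustration of how these fields fit together, the snippet below builds a hyperparameter dict for the “all_time” strategy and runs a pre-check on it. The function check_classifier_hparams is a hypothetical helper written for this example, not a Texar API; it only mirrors the constraints stated above.

```python
def check_classifier_hparams(hparams):
    """Hypothetical pre-check mirroring the constraints described above."""
    strategies = ("final_time", "all_time", "time_wise")
    if hparams["clas_strategy"] not in strategies:
        raise ValueError("clas_strategy must be one of %s" % (strategies,))
    # "max_seq_length" is documented as required for the "all_time" strategy.
    if (hparams["clas_strategy"] == "all_time"
            and hparams.get("max_seq_length") is None):
        raise ValueError(
            '"max_seq_length" is required when "clas_strategy" is "all_time"')
    return hparams


hparams = check_classifier_hparams({
    "num_classes": 3,
    "clas_strategy": "all_time",
    "max_seq_length": 50,
    "name": "my_rnn_classifier",  # illustrative name
})
```

A dict like this would then be passed as the hparams argument of the classifier constructor; fields left out take their default values.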

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

num_classes

The number of classes, specified in hparams.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

## Networks¶

### FeedForwardNetworkBase¶

class texar.modules.FeedForwardNetworkBase(hparams=None)[source]

Base class inherited by all feed-forward network classes.

Parameters: hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"name": "NN"
}

append_layer(layer)[source]

Appends a layer to the end of the network. This method can only be called before _build is called.

Parameters: layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
has_layer(layer_name)[source]

Returns True if a layer with the given name exists in the network. Returns False otherwise.

Parameters: layer_name (str) – Name of the layer.
layer_by_name(layer_name)[source]

Returns the layer with the given name. Returns None if no layer has that name.

Parameters: layer_name (str) – Name of the layer.
layers_by_name

A dictionary mapping layer names to the layers.

layers

A list of the layers.

layer_names

A list of uniquified layer names.

layer_outputs_by_name(layer_name)[source]

Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.

Parameters: layer_name (str) – Name of the layer.
layer_outputs

A list containing output tensors of each layer.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### FeedForwardNetwork¶

class texar.modules.FeedForwardNetwork(layers=None, hparams=None)[source]

Feed-forward neural network that consists of a sequence of layers.

Parameters: layers (list, optional) – A list of Layer instances composing the network. If not given, layers are created according to hparams. hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() of FeedForwardNetworkBase for the inputs and outputs.

Example

import tensorflow as tf
from texar.modules import FeedForwardNetwork

hparams = {  # Builds a two-layer dense NN
    "layers": [
        {"type": "Dense", "kwargs": {"units": 256}},
        {"type": "Dense", "kwargs": {"units": 10}}
    ]
}
nn = FeedForwardNetwork(hparams=hparams)

inputs = tf.random_uniform([64, 100])
outputs = nn(inputs)
# outputs == Tensor of shape [64, 10]

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"layers": [],
"name": "NN"
}


Here:

“layers” : list
A list of layer hyperparameters. See get_layer() for the details of layer hyperparameters.
“name” : str
Name of the network.
append_layer(layer)

Appends a layer to the end of the network. This method can only be called before _build is called.

Parameters: layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
has_layer(layer_name)

Returns True if a layer with the given name exists in the network. Returns False otherwise.

Parameters: layer_name (str) – Name of the layer.
hparams

An HParams instance. The hyperparameters of the module.

layer_by_name(layer_name)

Returns the layer with the given name. Returns None if no layer has that name.

Parameters: layer_name (str) – Name of the layer.
layer_names

A list of uniquified layer names.

layer_outputs

A list containing output tensors of each layer.

layer_outputs_by_name(layer_name)

Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.

Parameters: layer_name (str) – Name of the layer.
layers

A list of the layers.

layers_by_name

A dictionary mapping layer names to the layers.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### Conv1DNetwork¶

class texar.modules.Conv1DNetwork(hparams=None)[source]

Simple Conv-1D network which consists of a sequence of conv layers followed with a sequence of dense layers.

Parameters: hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs. The inputs must be a 3D Tensor of shape [batch_size, length, channels] (default), or [batch_size, channels, length] (if data_format is set to ‘channels_first’ through hparams). For example, for sequence classification, length corresponds to time steps, and channels corresponds to the embedding dimension.

Example

nn = Conv1DNetwork() # Use the default structure

inputs = tf.random_uniform([64, 20, 256])
outputs = nn(inputs)
# outputs == Tensor of shape [64, 128], because the final dense layer
# has size 128.

_build(inputs, sequence_length=None, dtype=None, mode=None)[source]

Feeds forward inputs through the network layers and returns outputs.

Parameters: inputs – The inputs to the network, which is a 3D tensor. sequence_length (optional) – An int tensor of shape [batch_size] containing the length of each element in inputs. If given, time steps beyond the length will first be masked out before feeding to the layers. dtype (optional) – Type of the inputs. If not provided, infers from inputs automatically. mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, texar.global_mode() is used. The output of the final layer.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
# (1) Conv layers
"num_conv_layers": 1,
"filters": 128,
"kernel_size": [3, 4, 5],
"conv_activation": "relu",
"conv_activation_kwargs": None,
"other_conv_kwargs": None,
# (2) Pooling layers
"pooling": "MaxPooling1D",
"pool_size": None,
"pool_strides": 1,
"other_pool_kwargs": None,
# (3) Dense layers
"num_dense_layers": 1,
"dense_size": 128,
"dense_activation": "identity",
"dense_activation_kwargs": None,
"final_dense_activation": None,
"final_dense_activation_kwargs": None,
"other_dense_kwargs": None,
# (4) Dropout
"dropout_conv": [1],
"dropout_dense": [],
"dropout_rate": 0.75,
# (5) Others
"name": "conv1d_network",
}


Here:

1. For convolutional layers:

“num_conv_layers” : int

Number of convolutional layers.

“filters” : int or list

The number of filters in the convolution, i.e., the dimensionality of the output space. If “num_conv_layers” > 1, “filters” must be a list of “num_conv_layers” integers.

“kernel_size” : int or list

Lengths of 1D convolution windows.

• If “num_conv_layers” == 1, this can be a list of arbitrary number of int denoting different sized conv windows. The number of filters of each size is specified by “filters”. For example, the default values will create 3 sets of filters, each of which has kernel size of 3, 4, and 5, respectively, and has filter number 128.
• If “num_conv_layers” > 1, this must be a list of length “num_conv_layers”. Each element can be an int or a list of arbitrary number of int denoting the kernel size of respective layer.
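The two cases for “kernel_size” can be made concrete with a small normalization routine. This is a plain-Python sketch under the rules stated above; kernel_sizes_per_layer is a hypothetical name and is not a Texar function.

```python
def kernel_sizes_per_layer(num_conv_layers, kernel_size):
    """Expand "kernel_size" into one list of window sizes per conv layer.

    Hypothetical helper for illustration only.
    """
    if num_conv_layers == 1:
        # A single layer may use an int or a list of several window sizes.
        if isinstance(kernel_size, (list, tuple)):
            return [list(kernel_size)]
        return [[kernel_size]]
    # Several layers: one entry per layer, each an int or a list of ints.
    if (not isinstance(kernel_size, (list, tuple))
            or len(kernel_size) != num_conv_layers):
        raise ValueError(
            '"kernel_size" must be a list of length "num_conv_layers"')
    return [list(k) if isinstance(k, (list, tuple)) else [k]
            for k in kernel_size]


# Default values: one conv layer with three window sizes.
print(kernel_sizes_per_layer(1, [3, 4, 5]))
# Two conv layers: the first with size 3, the second with sizes 4 and 5.
print(kernel_sizes_per_layer(2, [3, [4, 5]]))
```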
“conv_activation”: str or callable

Activation function applied to the output of the convolutional layers. Set to “identity” to maintain a linear activation. See get_activation_fn() for more details.

“conv_activation_kwargs” : dict, optional

Keyword arguments for conv layer activation functions. See get_activation_fn() for more details.

“other_conv_kwargs” : dict, optional

Other keyword arguments for tf.layers.Conv1D constructor, e.g., “data_format”, “padding”, etc.

2. For pooling layers:

“pooling” : str or class or instance

Pooling layer after each of the convolutional layer(s). Can be a pooling layer class, its name or module path, or a class instance.

“pool_size” : int or list, optional

Size of the pooling window. If an int, all pooling layers will have the same pool size. If a list, the list length must equal “num_conv_layers”. If None and the pooling type is either MaxPooling or AveragePooling, the pool size will be set to the input size. That is, the output of the pooling layer is a single unit.

“pool_strides” : int or list, optional

Strides of the pooling operation. If an int, all pooling layers will have the same stride. If a list, the list length must equal “num_conv_layers”.
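The int-or-list convention used by “pool_size” and “pool_strides” amounts to broadcasting a scalar to every layer. A minimal sketch of that convention, assuming a hypothetical helper named per_layer (not part of Texar):

```python
def per_layer(value, num_layers, name):
    """Broadcast an int (or None) to all layers, or validate a list's length.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, (list, tuple)):
        if len(value) != num_layers:
            raise ValueError(
                '"%s" must have length %d, got %d'
                % (name, num_layers, len(value)))
        return list(value)
    # A single value (including None) applies to every layer.
    return [value] * num_layers


print(per_layer(1, 3, "pool_strides"))      # same stride for all three layers
print(per_layer([2, 3], 2, "pool_size"))    # one pool size per layer
print(per_layer(None, 2, "pool_size"))      # None is kept for every layer
```

In the last call, None is left as-is for each layer; per the description above, it means the pool size is set to the input size.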

“other_pool_kwargs” : dict, optional

Other keyword arguments for pooling layer class constructor.

3. For dense layers (note that here dense layers always follow conv and pooling layers):

“num_dense_layers” : int

Number of dense layers.

“dense_size” : int or list

Number of units of each dense layer. If an int, all dense layers will have the same size. If a list of int, the list length must equal “num_dense_layers”.

“dense_activation” : str or callable

Activation function applied to the output of the dense layers, except the output of the last dense layer. Set to “identity” to maintain a linear activation. See get_activation_fn() for more details.

“dense_activation_kwargs” : dict, optional

Keyword arguments for dense layer activation functions before the last dense layer. See get_activation_fn() for more details.

“final_dense_activation” : str or callable

Activation function applied to the output of the last dense layer. Set to None or “identity” to maintain a linear activation. See get_activation_fn() for more details.

“final_dense_activation_kwargs” : dict, optional

Keyword arguments for the activation function of last dense layer. See get_activation_fn() for more details.

“other_dense_kwargs” : dict, optional

Other keyword arguments for Dense layer class constructor.

4. For dropouts:

“dropout_conv” : int or list

The indexes of conv layers (starting from 0) to whose inputs dropout is applied. The index num_conv_layers means dropout is applied to the output of the final conv layer. E.g.,

{
"num_conv_layers": 2,
"dropout_conv": [0, 2]
}


leads to the layer sequence -dropout-conv0-conv1-dropout-.

The dropout mode (training or not) is controlled by the mode argument of _build().
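The index convention for “dropout_conv” can be traced with a short routine that lists where dropout is placed. This is a plain-Python sketch, not Texar code; conv_layer_sequence is a hypothetical name, and pooling layers are left out to mirror the example above.

```python
def conv_layer_sequence(num_conv_layers, dropout_conv):
    """List the conv layers and the dropout positions implied by the indexes.

    Hypothetical helper for illustration only.
    """
    if isinstance(dropout_conv, int):
        indexes = [dropout_conv]
    else:
        indexes = list(dropout_conv)
    sequence = []
    for i in range(num_conv_layers):
        if i in indexes:
            # Index i: dropout on the input of conv layer i.
            sequence.append("dropout")
        sequence.append("conv%d" % i)
    if num_conv_layers in indexes:
        # Index num_conv_layers: dropout on the final conv layer output.
        sequence.append("dropout")
    return sequence


print("-".join(conv_layer_sequence(2, [0, 2])))  # dropout-conv0-conv1-dropout
print("-".join(conv_layer_sequence(1, [1])))     # conv0-dropout (the defaults)
```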

“dropout_dense” : int or list

Same as “dropout_conv” but applied to dense layers (index starting from 0).

“dropout_rate” : float

The dropout rate, between 0 and 1. E.g., “dropout_rate”: 0.1 would drop out 10% of elements.

5. Others:

“name” : str

Name of the network.

append_layer(layer)

Appends a layer to the end of the network. This method can only be called before _build is called.

Parameters: layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
has_layer(layer_name)

Returns True if a layer with the given name exists in the network. Returns False otherwise.

Parameters: layer_name (str) – Name of the layer.
hparams

An HParams instance. The hyperparameters of the module.

layer_by_name(layer_name)

Returns the layer with the given name. Returns None if no layer has that name.

Parameters: layer_name (str) – Name of the layer.
layer_names

A list of uniquified layer names.

layer_outputs

A list containing output tensors of each layer.

layer_outputs_by_name(layer_name)

Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.

Parameters: layer_name (str) – Name of the layer.
layers

A list of the layers.

layers_by_name

A dictionary mapping layer names to the layers.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

## Memory¶

### MemNetBase¶

class texar.modules.MemNetBase(raw_memory_dim, input_embed_fn=None, output_embed_fn=None, query_embed_fn=None, hparams=None)[source]

Base class inherited by all memory network classes.

Parameters: raw_memory_dim (int) – Dimension size of raw memory entries (before embedding). For example, if a raw memory entry is a word, this is the vocabulary size (imagine a one-hot representation of the word). If a raw memory entry is a dense vector, this is the dimension size of the vector. input_embed_fn (optional) – A callable that embeds raw memory entries as inputs. This corresponds to the A embedding operation in (Sukhbaatar et al.). If not provided, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details. output_embed_fn (optional) – A callable that embeds raw memory entries as outputs. This corresponds to the C embedding operation in (Sukhbaatar et al.). If not provided, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details. query_embed_fn (optional) – A callable that embeds the query. This corresponds to the B embedding operation in (Sukhbaatar et al.). If not provided and “use_B” is True in hparams, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details. Note: a customized callable must use the same number and layout of dimensions as input_embed_fn or output_embed_fn, and may assume that the 2nd dimension of its input and output (which corresponds to memory_size) is 1. hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
get_default_embed_fn(memory_size, embed_fn_hparams)[source]

Creates a default embedding function. Can be used for A, C, or B operation.

For B operation (i.e., query_embed_fn), memory_size must be 1.

The function is a combination of both memory embedding and temporal embedding, with the combination method specified by “combine_mode” in the embed_fn_hparams.

Parameters: embed_fn_hparams (dict or HParams) – Hyperparameters of the embedding function. See default_memnet_embed_fn_hparams() for details.
Returns: A tuple (embed_fn, memory_dim), where
• memory_dim is the dimension of memory entry embedding, inferred from embed_fn_hparams.
  • If combine_mode == ‘add’, memory_dim is the embedder dimension.
  • If combine_mode == ‘concat’, memory_dim is the sum of the memory embedder dimension and the temporal embedder dimension.
• embed_fn is an embedding function that takes in memory and returns memory embedding. Specifically, the function has signature memory_embedding = embed_fn(memory=None, soft_memory=None), where exactly one of memory and soft_memory is provided. Its arguments and return value are:
  • memory – An int Tensor of shape [batch_size, memory_size] containing memory indexes used for embedding lookup.
  • soft_memory – A Tensor of shape [batch_size, memory_size, raw_memory_dim] containing soft weights used to mix the embedding vectors.
  • Returns a Tensor of shape [batch_size, memory_size, memory_dim] containing the memory entry embeddings.
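The way memory_dim follows from the combination mode can be sketched as below. This is an illustration in plain Python rather than Texar's implementation; infer_memory_dim is a hypothetical name, and the dict layout follows default_memnet_embed_fn_hparams().

```python
def infer_memory_dim(embed_fn_hparams):
    """Derive memory_dim from the embedder dims and the combine mode.

    Hypothetical helper for illustration only.
    """
    embed_dim = embed_fn_hparams["embedding"]["dim"]
    temporal_dim = embed_fn_hparams["temporal_embedding"]["dim"]
    mode = embed_fn_hparams["combine_mode"]
    if mode == "add":
        # Element-wise addition needs both embeddings to have the same size.
        if embed_dim != temporal_dim:
            raise ValueError("'add' requires equal embedder dimensions")
        return embed_dim
    if mode == "concat":
        # Concatenation stacks the two embeddings along the last dimension.
        return embed_dim + temporal_dim
    raise ValueError("combine_mode must be 'add' or 'concat'")


print(infer_memory_dim({
    "embedding": {"dim": 100},
    "temporal_embedding": {"dim": 50},
    "combine_mode": "concat",
}))
```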
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"n_hops": 1,
"memory_dim": 100,
"relu_dim": 50,
"memory_size": 100,
"A": default_embed_fn_hparams,
"C": default_embed_fn_hparams,
"B": default_embed_fn_hparams,
"use_B": False,
"use_H": False,
"dropout_rate": 0,
"variational": False,
"name": "memnet",
}


Here:

“n_hops” : int
Number of hops.
“memory_dim” : int
Memory dimension, i.e., the dimension size of a memory entry embedding. Ignored if at least one of the embedding functions is created according to hparams. In this case memory_dim is inferred from the created embed_fn.
“relu_dim” : int
Number of elements in memory_dim that have relu applied at the end of each hop. Must be between 0 and memory_dim (inclusive).
“memory_size” : int

Number of entries in memory.

For example, the number of sentences {x_i} in Fig.1(a) of (Sukhbaatar et al.) End-To-End Memory Networks.

“use_B” : bool
Whether to create the query embedding function. Ignored if query_embed_fn is given to the constructor.
“use_H” : bool
Whether to perform a linear transformation with matrix H at the end of each A-C layer.
“dropout_rate” : float
The dropout rate to apply to the output of each hop. Should be between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the units.
“variational” : bool
Whether to share dropout masks after each hop.
memory_size

The memory size.

raw_memory_dim

The dimension of memory element (or vocabulary size).

memory_dim

The dimension of embedded memory and all vectors in hops.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### MemNetRNNLike¶

class texar.modules.MemNetRNNLike(raw_memory_dim, input_embed_fn=None, output_embed_fn=None, query_embed_fn=None, hparams=None)[source]

An implementation of a multi-layer end-to-end memory network, with RNN-like weight tying, as described in (Sukhbaatar et al.) End-To-End Memory Networks.

See get_default_embed_fn() for default embedding functions. Customized embedding functions must follow the same signature.
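For orientation, the hop update with RNN-like weight tying from the cited paper can be written out in a few lines of plain Python. This is a sketch of the update rule from the paper (attention over input memory, read from output memory, then u ← H·u + o), not Texar's code: the embeddings, relu_dim, and dropout are left out, and the names softmax and memnet_hop are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memnet_hop(u, m, c, H=None):
    """One hop: attend over input memory m, read output memory c, update u.

    u: query vector; m, c: lists of embedded memory vectors (A and C
    embeddings); H: optional square matrix applied to u before adding o.
    """
    p = softmax([sum(ui * mi for ui, mi in zip(u, row)) for row in m])
    o = [sum(p_i * row[d] for p_i, row in zip(p, c)) for d in range(len(u))]
    if H is not None:
        u = [sum(h * ui for h, ui in zip(h_row, u)) for h_row in H]
    return [ui + oi for ui, oi in zip(u, o)], p


u = [1.0, 0.0]
m = [[1.0, 0.0], [0.0, 1.0]]
c = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(2):  # with weight tying, the same m, c, and H serve every hop
    u, p = memnet_hop(u, m, c, H=identity)
```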

Parameters: raw_memory_dim (int) – Dimension size of raw memory entries (before embedding). For example, if a raw memory entry is a word, this is the vocabulary size (imagine a one-hot representation of the word). If a raw memory entry is a dense vector, this is the dimension size of the vector. input_embed_fn (optional) – A callable that embeds raw memory entries as inputs. This corresponds to the A embedding operation in (Sukhbaatar et al.). If not provided, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details. output_embed_fn (optional) – A callable that embeds raw memory entries as outputs. This corresponds to the C embedding operation in (Sukhbaatar et al.). If not provided, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details. query_embed_fn (optional) – A callable that embeds the query. This corresponds to the B embedding operation in (Sukhbaatar et al.). If not provided and “use_B” is True in hparams, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details. A customized query_embed_fn must follow the signature of the default embed_fn, with memory_size equal to 1. hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
"n_hops": 1,
"memory_dim": 100,
"relu_dim": 50,
"memory_size": 100,
"A": default_embed_fn_hparams,
"C": default_embed_fn_hparams,
"B": default_embed_fn_hparams,
"use_B": False,
"use_H": True,
"dropout_rate": 0,
"variational": False,
"name": "memnet_rnnlike",
}


Here:

“n_hops” : int
Number of hops.
“memory_dim” : int
Memory dimension, i.e., the dimension size of a memory entry embedding. Ignored if at least one of the embedding functions is created according to hparams. In this case memory_dim is inferred from the created embed_fn.
“relu_dim” : int
Number of elements in memory_dim that have relu applied at the end of each hop. Must be between 0 and memory_dim (inclusive).
“memory_size” : int

Number of entries in memory.

For example, the number of sentences {x_i} in Fig.1(a) of (Sukhbaatar et al.) End-To-End Memory Networks.

“use_B” : bool
Whether to create the query embedding function. Ignored if query_embed_fn is given to the constructor.
“use_H” : bool
Whether to perform a linear transformation with matrix H at the end of each A-C layer.
“dropout_rate” : float
The dropout rate to apply to the output of each hop. Should be between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the units.
“variational” : bool
Whether to share dropout masks after each hop.
hparams

An HParams instance. The hyperparameters of the module.

memory_dim

The dimension of embedded memory and all vectors in hops.

memory_size

The memory size.

name

The uniquified name of the module.

raw_memory_dim

The dimension of memory element (or vocabulary size).

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### default_memnet_embed_fn_hparams¶

texar.modules.default_memnet_embed_fn_hparams()[source]

Returns a dictionary of default hyperparameters for the default embedding function (see get_default_embed_fn()).

{
"embedding": {
"dim": 100
},
"temporal_embedding": {
"dim": 100
},
}


Here:

“embedding” : dict, optional
Hyperparameters for embedding operations. See default_hparams() of WordEmbedder for details. If None, the default hyperparameters are used.
“temporal_embedding” : dict, optional
Hyperparameters for temporal embedding operations. See default_hparams() of PositionEmbedder for details. If None, the default hyperparameters are used.
“combine_mode” : str
Either ‘add’ or ‘concat’. If ‘add’, memory embedding and temporal embedding are added up. In this case the two embedders must have the same dimension. If ‘concat’, the two embeddings are concatenated.

## Policy¶

### PolicyNetBase¶

class texar.modules.PolicyNetBase(network=None, network_kwargs=None, hparams=None)[source]

Policy net that takes in states and outputs actions.

Parameters: network (optional) – A network that takes in state and returns outputs for generating actions. For example, an instance of subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams. network_kwargs (dict, optional) – Keyword arguments for network constructor. Note that the hparams argument for network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given. hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
'network_type': 'FeedForwardNetwork',
'network_hparams': {
'layers': [
{
'type': 'Dense',
'kwargs': {'units': 256, 'activation': 'relu'}
},
{
'type': 'Dense',
'kwargs': {'units': 256, 'activation': 'relu'}
},
]
},
'distribution_kwargs': None,
'name': 'policy_net',
}


Here:

“network_type” : str or class or instance
A network that takes in state and returns outputs for generating actions. This can be a class, its name or module path, or a class instance. Ignored if network is given to the constructor.
“network_hparams” : dict

Hyperparameters for the network. With the network_kwargs argument to the constructor, a network is created with network_class(**network_kwargs, hparams=network_hparams).

For example, the default values create a two-layer dense network.

“distribution_kwargs” : dict, optional
Keyword arguments for distribution constructor. A distribution would be created for action sampling.
“name” : str
Name of the policy.
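The construction pattern network_class(**network_kwargs, hparams=network_hparams) described under “network_hparams” can be sketched as follows. ToyNetwork and make_network are hypothetical names used only for this illustration; the actual class would be something like FeedForwardNetwork, and this is not the code Texar itself runs.

```python
class ToyNetwork(object):
    """Hypothetical stand-in for a network class such as FeedForwardNetwork."""

    def __init__(self, layers=None, hparams=None):
        self.layers = layers
        self.hparams = hparams


def make_network(network_class, network_kwargs, network_hparams):
    """Create a network from constructor kwargs plus separate hparams."""
    kwargs = dict(network_kwargs or {})
    # hparams comes from "network_hparams", never from network_kwargs.
    if "hparams" in kwargs:
        raise ValueError('"hparams" must not be included in network_kwargs')
    return network_class(hparams=network_hparams, **kwargs)


network_hparams = {
    "layers": [
        {"type": "Dense", "kwargs": {"units": 256, "activation": "relu"}},
        {"type": "Dense", "kwargs": {"units": 256, "activation": "relu"}},
    ]
}
net = make_network(ToyNetwork, None, network_hparams)
```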
network

The network.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### CategoricalPolicyNet¶

class texar.modules.CategoricalPolicyNet(action_space=None, network=None, network_kwargs=None, hparams=None)[source]

Policy net with Categorical distribution for discrete scalar actions.

This is a combination of a network with a top-layer distribution for action sampling.

Parameters: action_space (optional) – An instance of Space specifying the action space. If not given, a discrete action space [0, high] is created with high specified in hparams. network (optional) – A network that takes in state and returns outputs for generating actions. For example, an instance of subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams. network_kwargs (dict, optional) – Keyword arguments for network constructor. Note that the hparams argument for network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given. hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, mode=None)[source]

Takes in states and outputs actions.

Parameters: inputs – Inputs to the policy network with the first dimension the batch dimension. mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, texar.global_mode() is used.
Returns

A dict including fields “logits”, “action”, and “dist”, where

• “logits”: A Tensor of shape [batch_size] + action_space size used for categorical distribution sampling.
• “action”: A Tensor of shape [batch_size] + action_space.shape.
• “dist”: The Categorical based on the logits.
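To make the relation between “logits” and “action” concrete, the sketch below samples one action from a single state's logits using only the Python standard library. It is an assumption-laden illustration, not Texar code: sample_categorical is a hypothetical name, it handles one unbatched state, and it returns the probabilities under a “probs” key instead of a distribution object.

```python
import math
import random


def sample_categorical(logits, rng=random):
    """Sample an action index from unnormalized log-probabilities."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # shift for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return {"logits": logits, "action": action, "probs": probs}


out = sample_categorical([0.5, 1.5, -1.0], rng=random.Random(0))
print(out["action"], out["probs"])
```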
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
'network_type': 'FeedForwardNetwork',
'network_hparams': {
'layers': [
{
'type': 'Dense',
'kwargs': {'units': 256, 'activation': 'relu'}
},
{
'type': 'Dense',
'kwargs': {'units': 256, 'activation': 'relu'}
},
]
},
'distribution_kwargs': {
'dtype': 'int32',
'validate_args': False,
'allow_nan_stats': True
},
'action_space': 2,
'make_output_layer': True,
'name': 'categorical_policy_net'
}


Here:

“distribution_kwargs” : dict
Keyword arguments for the Categorical distribution constructor. Arguments logits and probs should not be included as they are inferred from the inputs. Argument dtype can be a string (e.g., int32) and will be converted to a corresponding tf dtype.
“action_space” : int
Upper bound of the action space. The resulting action space is all discrete scalar numbers between 0 and the upper bound specified here (both inclusive).
“make_output_layer” : bool
Whether to append a dense layer to the network to transform features to logits for action sampling. If False, the final layer output of network must match the action space.

See default_hparams for details of other hyperparameters.

action_space

An instance of Space specifying the action space.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

network

The network.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

## Q-Nets¶

### QNetBase¶

class texar.modules.QNetBase(network=None, network_kwargs=None, hparams=None)[source]

Base class inherited by all Q net classes. A Q net takes in states and outputs Q values of actions.

Parameters: network (optional) – A network that takes in state and returns Q values. For example, an instance of subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams. network_kwargs (dict, optional) – Keyword arguments for network constructor. Note that the hparams argument for network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given. hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
'network_type': 'FeedForwardNetwork',
'network_hparams': {
'layers': [
{
'type': 'Dense',
'kwargs': {'units': 256, 'activation': 'relu'}
},
{
'type': 'Dense',
'kwargs': {'units': 256, 'activation': 'relu'}
},
]
},
'name': 'q_net',
}


Here:

“network_type” : str or class or instance
A network that takes in state and returns outputs for generating actions. This can be a class, its name or module path, or a class instance. Ignored if network is given to the constructor.
“network_hparams” : dict

Hyperparameters for the network. With the network_kwargs argument to the constructor, a network is created with network_class(**network_kwargs, hparams=network_hparams).

For example, the default values create a two-layer dense network.

“name” : str
Name of the Q net.
network

The network.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

### CategoricalQNet¶

class texar.modules.CategoricalQNet(action_space=None, network=None, network_kwargs=None, hparams=None)[source]

Q net with categorical scalar action space.

Parameters: action_space (optional) – An instance of Space specifying the action space. If not given, a discrete action space [0, high] is created with high specified in hparams. network (optional) – A network that takes in state and returns Q values. For example, an instance of subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams. network_kwargs (dict, optional) – Keyword arguments for network constructor. Note that the hparams argument for network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given. hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, mode=None)[source]

Takes in states and outputs Q values.

Parameters: inputs – Inputs to the Q net with the first dimension the batch dimension. mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, texar.global_mode() is used.
Returns

A dict including the field “qvalues”, where

• “qvalues”: A Tensor of shape [batch_size] + action_space size containing Q values of all possible actions.
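A common way to turn Q values into an action is epsilon-greedy selection. The snippet below is a generic plain-Python sketch for one unbatched state; it is not something this module provides, and epsilon_greedy is a hypothetical name.

```python
import random


def epsilon_greedy(qvalues, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(qvalues))
    # Greedy choice: index of the largest Q value.
    return max(range(len(qvalues)), key=lambda a: qvalues[a])


# With epsilon=0.0 the choice is always greedy.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```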
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
'network_type': 'FeedForwardNetwork',
'network_hparams': {
'layers': [
{
'type': 'Dense',
'kwargs': {'units': 256, 'activation': 'relu'}
},
{
'type': 'Dense',
'kwargs': {'units': 256, 'activation': 'relu'}
},
]
},
'action_space': 2,
'make_output_layer': True,
'name': 'q_net'
}


Here:

“action_space” : int
Upper bound of the action space. The resulting action space is all discrete scalar numbers between 0 and the upper bound specified here (both inclusive).
“make_output_layer” : bool
Whether to append a dense layer to the network to transform features to Q values. If False, the final layer output of network must match the action space.

See default_hparams for details of other hyperparameters.

action_space

An instance of Space specifying the action space.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

network

The network.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.