Modules

ModuleBase

class texar.tf.ModuleBase(hparams=None)[source]

Base class inherited by modules that create Variables and are configurable through hyperparameters.

A Texar module inheriting ModuleBase has the following key features:

  • Convenient variable re-use: A module instance creates its own sets of variables, and automatically re-uses its variables on subsequent calls. Hence TF variable/name scope is transparent to users. For example:

    encoder = UnidirectionalRNNEncoder(hparams) # create instance
    output_1 = encoder(inputs_1) # variables are created
    output_2 = encoder(inputs_2) # variables are re-used
    
    print(encoder.trainable_variables) # access trainable variables
    # [ ... ]
    
  • Configurable through hyperparameters: Each module defines allowed hyperparameters and default values. Hyperparameters not specified by users will take default values.

  • Callable: As in the above example, a module instance is “called” with input tensors and returns output tensors. Every call of a module adds ops to the Graph to perform the module’s logic.
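
For example, a minimal sketch of subclassing ModuleBase (the MyProjector class and its "out_dim" hyperparameter are hypothetical, for illustration only):

import tensorflow as tf
import texar.tf as tx

class MyProjector(tx.ModuleBase):
    """Projects inputs to a fixed dimension (illustrative only)."""

    @staticmethod
    def default_hparams():
        # Hyperparameters not specified by the user fall back to these values.
        return {
            "name": "my_projector",
            "out_dim": 64,   # hypothetical hyperparameter
        }

    def _build(self, inputs):
        # Variables created here live under the module's variable scope
        # and are re-used on subsequent calls of this module instance.
        return tf.layers.dense(inputs, self.hparams.out_dim)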

Parameters:hparams (dict, optional) – Hyperparameters of the module. See default_hparams() for the structure and default values.
_build(*args, **kwargs)[source]

Subclass must implement this method to build the logic.

Parameters:
  • *args – Arguments.
  • **kwargs – Keyword arguments.
Returns:

Output Tensor(s).

static default_hparams()[source]

Returns a dict of hyperparameters of the module with default values. Used to replace the missing values of input hparams during module construction.

{
    "name": "module"
}
variable_scope

The variable scope of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

hparams

An HParams instance. The hyperparameters of the module.

Embedders

WordEmbedder

class texar.tf.modules.WordEmbedder(init_value=None, vocab_size=None, hparams=None)[source]

Simple word embedder that maps indexes into embeddings. The indexes can be soft (e.g., distributions over vocabulary).

Either init_value or vocab_size is required. If both are given, init_value.shape[0] must equal vocab_size.

Parameters:
  • init_value (optional) –

    A Tensor or numpy array that contains the initial value of embeddings. It is typically of shape [vocab_size] + embedding-dim. Embedding can have dimensionality > 1.

    If None, embedding is initialized as specified in hparams["initializer"]. Otherwise, the "initializer" and "dim" hyperparameters in hparams are ignored.

  • vocab_size (int, optional) – The vocabulary size. Required if init_value is not given.
  • hparams (dict, optional) – Embedder hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the embedder.

Example

ids = tf.random_uniform(shape=[32, 10], maxval=10, dtype=tf.int64)
soft_ids = tf.random_uniform(shape=[32, 10, 100])

embedder = WordEmbedder(vocab_size=100, hparams={'dim': 256})
ids_emb = embedder(ids=ids) # shape: [32, 10, 256]
soft_ids_emb = embedder(soft_ids=soft_ids) # shape: [32, 10, 256]
# Use with Texar data module
data_hparams = {
    'dataset': {
        'embedding_init': {'file': 'word2vec.txt'},
        ...
    },
}
data = MonoTextData(data_hparams)
iterator = DataIterator(data)
batch = iterator.get_next()

# Use data vocab size
embedder_1 = WordEmbedder(vocab_size=data.vocab.size)
emb_1 = embedder_1(batch['text_ids'])

# Use pre-trained embedding
embedder_2 = WordEmbedder(init_value=data.embedding_init_value)
emb_2 = embedder_2(batch['text_ids'])
_build(ids=None, soft_ids=None, mode=None, **kwargs)[source]

Embeds (soft) ids.

Either ids or soft_ids must be given, and they must not be given at the same time.

Parameters:
  • ids (optional) – An integer tensor containing the ids to embed.
  • soft_ids (optional) – A tensor of weights (probabilities) used to mix the embedding vectors.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, dropout is controlled by texar.tf.global_mode().
  • kwargs – Additional keyword arguments for tf.nn.embedding_lookup besides params and ids.
Returns:

If ids is given, returns a Tensor of shape shape(ids) + embedding-dim. For example, if shape(ids) = [batch_size, max_time] and shape(embedding) = [vocab_size, emb_dim], then the return tensor has shape [batch_size, max_time, emb_dim].

If soft_ids is given, returns a Tensor of shape shape(soft_ids)[:-1] + embedding-dim. For example, if shape(soft_ids) = [batch_size, max_time, vocab_size] and shape(embedding) = [vocab_size, emb_dim], then the return tensor has shape [batch_size, max_time, emb_dim].

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "dim": 100,
    "dropout_rate": 0,
    "dropout_strategy": 'element',
    "trainable": True,
    "initializer": {
        "type": "random_uniform_initializer",
        "kwargs": {
            "minval": -0.1,
            "maxval": 0.1,
            "seed": None
        }
    },
    "regularizer": {
        "type": "L1L2",
        "kwargs": {
            "l1": 0.,
            "l2": 0.
        }
    },
    "name": "word_embedder",
}

Here:

“dim”: int or list

Embedding dimension. Can be a list of integers to yield embeddings with dimensionality > 1.

Ignored if init_value is given to the embedder constructor.

“dropout_rate”: float
The dropout rate between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the embedding. Set to 0 to disable dropout.
“dropout_strategy”: str

The dropout strategy. Can be one of the following

  • "element": The regular strategy that drops individual elements of embedding vectors.
  • "item": Drops individual items (e.g., words) entirely. E.g., for the word sequence “the simpler the better”, the strategy can yield “_ simpler the better”, where the first “the” is dropped.
  • "item_type": Drops item types (e.g., word types). E.g., for the above sequence, the strategy can yield “_ simpler _ better”, where the word type “the” is dropped. The dropout will never yield “_ simpler the better” as in the "item" strategy.
“trainable”: bool
Whether the embedding is trainable.
“initializer”: dict or None
Hyperparameters of the initializer for embedding values. See get_initializer() for the details. Ignored if init_value is given to the embedder constructor.
“regularizer”: dict
Hyperparameters of the regularizer for embedding values. See get_regularizer() for the details.
“name”: str
Name of the embedding variable.
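
For instance, a hedged sketch of overriding a few of these defaults (the values are illustrative):

emb_hparams = {
    "dim": 300,
    "dropout_rate": 0.2,
    "dropout_strategy": "item",  # drop whole tokens rather than individual elements
}
embedder = WordEmbedder(vocab_size=10000, hparams=emb_hparams)
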
embedding

The embedding tensor, of shape [vocab_size] + dim.

dim

The embedding dimension.

vocab_size

The vocabulary size.

PositionEmbedder

class texar.tf.modules.PositionEmbedder(init_value=None, position_size=None, hparams=None)[source]

Simple position embedder that maps position indexes into embeddings via lookup.

Either init_value or position_size is required. If both are given, init_value.shape[0] must equal position_size.

Parameters:
  • init_value (optional) –

    A Tensor or numpy array that contains the initial value of embeddings. It is typically of shape [position_size, embedding dim].

    If None, embedding is initialized as specified in hparams["initializer"]. Otherwise, the "initializer" and "dim" hyperparameters in hparams are ignored.

  • position_size (int, optional) – The number of possible positions, e.g., the maximum sequence length. Required if init_value is not given.
  • hparams (dict, optional) – Embedder hyperparameters. If it is not specified, the default hyperparameter setting is used. See default_hparams for the structure and default values.
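
A hedged usage sketch that adds position embeddings to word embeddings (the sizes and batch fields are illustrative):

word_embedder = WordEmbedder(vocab_size=10000, hparams={'dim': 256})
pos_embedder = PositionEmbedder(position_size=200, hparams={'dim': 256})

word_emb = word_embedder(batch['text_ids'])              # [batch_size, max_time, 256]
pos_emb = pos_embedder(sequence_length=batch['length'])  # [batch_size, max_time, 256]

enc_inputs = word_emb + pos_emb
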
_build(positions=None, sequence_length=None, mode=None, **kwargs)[source]

Embeds the positions.

Either positions or sequence_length is required:

  • If both are given, sequence_length is used to mask out embeddings of those time steps beyond the respective sequence lengths.
  • If only sequence_length is given, then positions from 0 to sequence_length-1 are embedded.
Parameters:
  • positions (optional) – An integer tensor containing the position ids to embed.
  • sequence_length (optional) – An integer tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero-valued embeddings.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, dropout will be controlled by texar.tf.global_mode().
  • kwargs – Additional keyword arguments for tf.nn.embedding_lookup besides params and ids.
Returns:

A Tensor of shape shape(inputs) + embedding dimension.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "dim": 100,
    "initializer": {
        "type": "random_uniform_initializer",
        "kwargs": {
            "minval": -0.1,
            "maxval": 0.1,
            "seed": None
        }
    },
    "regularizer": {
        "type": "L1L2",
        "kwargs": {
            "l1": 0.,
            "l2": 0.
        }
    },
    "dropout_rate": 0,
    "trainable": True,
    "name": "position_embedder"
}

The hyperparameters have the same meaning as those in texar.tf.modules.WordEmbedder.default_hparams().

embedding

The embedding tensor.

dim

The embedding dimension.

position_size

The position size, i.e., maximum number of positions.

SinusoidsPositionEmbedder

class texar.tf.modules.SinusoidsPositionEmbedder(position_size, hparams=None)[source]

Sinusoid position embedder that maps position indexes into embeddings via sinusoid calculation. This module does not have trainable parameters. Used in, e.g., Transformer models (Vaswani et al.) “Attention Is All You Need”.

Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase. This allows attention to learn to use absolute and relative positions.

Timing signals should be added to some precursors of both the query and the memory inputs to attention. The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x), and cos(x). In particular, we use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to dim / 2. For each timescale, we generate the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All of these sinusoids are concatenated in the dim dimension.
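
A hedged NumPy sketch of the timing signal described above (it mirrors the description, not necessarily the module's exact internals):

import numpy as np

def sinusoid_embedding(position, dim, min_timescale=1.0, max_timescale=1.0e4):
    # Geometric sequence of dim/2 inverse timescales.
    num_timescales = dim // 2
    log_increment = np.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    inv_timescales = min_timescale * np.exp(-np.arange(num_timescales) * log_increment)
    scaled_time = position * inv_timescales
    # sin and cos signals are concatenated along the dim dimension.
    return np.concatenate([np.sin(scaled_time), np.cos(scaled_time)])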

Parameters:position_size (int) – The number of possible positions, e.g., the maximum sequence length. Set position_size=None and hparams['cache_embeddings']=False to support arbitrarily large or negative position indexes.
_build(positions=None, sequence_length=None)[source]

Embeds. Either positions or sequence_length is required:

  • If both are given, sequence_length is used to mask out embeddings of those time steps beyond the respective sequence lengths.
  • If only sequence_length is given, then positions from 0 to sequence_length-1 are embedded.
Parameters:
  • positions (optional) – An integer tensor containing the position ids to embed.
  • sequence_length (optional) – An integer tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero-valued embeddings.
Returns:

A Tensor of shape [batch_size, max_time, dim].

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values. We use a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to dim/2.

{
    'min_timescale': 1.0,
    'max_timescale': 10000.0,
    'dim': 512,
    'cache_embeddings': True,
    'name':'sinusoid_posisiton_embedder',
}

Here:

“cache_embeddings”: bool

If True, precompute embeddings for positions in range [0, position_size - 1]. This leads to faster lookup but requires lookup indices to be within this range.

If False, embeddings are computed on-the-fly during lookup. Set to False if your application needs to handle sequences of arbitrary length, or requires embeddings at negative positions.

EmbedderBase

class texar.tf.modules.EmbedderBase(num_embeds=None, hparams=None)[source]

The base embedder class that all embedder classes inherit.

Parameters:
  • num_embeds (int, optional) – The number of embedding elements, e.g., the vocabulary size of a word embedder.
  • hparams (dict or HParams, optional) – Embedder hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "name": "embedder"
}
num_embeds

The number of embedding elements.

Encoders

UnidirectionalRNNEncoder

class texar.tf.modules.UnidirectionalRNNEncoder(cell=None, cell_dropout_mode=None, output_layer=None, hparams=None)[source]

One directional RNN encoder.

Parameters:
  • cell – (RNNCell, optional) If not specified, a cell is created as specified in hparams["rnn_cell"].
  • cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given.
  • output_layer (optional) – An instance of tf.layers.Layer. Applies to the RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer"].
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the encoder.

Example

# Use with embedder
embedder = WordEmbedder(vocab_size, hparams=emb_hparams)
encoder = UnidirectionalRNNEncoder(hparams=enc_hparams)

outputs, final_state = encoder(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'])
_build(inputs, sequence_length=None, initial_state=None, time_major=False, mode=None, return_cell_output=False, return_output_size=False, **kwargs)[source]

Encodes the inputs.

Parameters:
  • inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time are exchanged if time_major=True is specified.
  • sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element’s sequence length.
  • initial_state (optional) – Initial state of the RNN.
  • time_major (bool) – The shape format of the inputs and outputs Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth].
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls output layer dropout if the output layer is specified with hparams. If None (default), texar.tf.global_mode() is used.
  • return_cell_output (bool) – Whether to return the output of the RNN cell. These are the results prior to the output layer.
  • return_output_size (bool) – Whether to return the size of the output (i.e., the results after output layers).
  • **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.
Returns:

  • By default (both return_cell_output and return_output_size are False), returns a pair (outputs, final_state)

    • outputs: The RNN output tensor by the output layer (if it exists) or the RNN cell (otherwise). The tensor is of shape [batch_size, max_time, output_size] if time_major is False, or [max_time, batch_size, output_size] if time_major is True. If the RNN cell output is a (nested) tuple of Tensors, then outputs will be a (nested) tuple having the same nest structure as the cell output.
    • final_state: The final state of the RNN, which is a Tensor of shape [batch_size] + cell.state_size or a (nested) tuple of Tensors if cell.state_size is a (nested) tuple.
  • If return_cell_output is True, returns a triple (outputs, final_state, cell_outputs)

    • cell_outputs: The outputs by the RNN cell prior to the output layer, having the same structure as outputs except for the output_dim.
  • If return_output_size is True, returns a tuple (outputs, final_state, output_size)

    • output_size: A (possibly nested tuple of) int representing the size of outputs. If a single int or an int array, then outputs has shape [batch/time, time/batch] + output_size. If a (nested) tuple, then output_size has the same structure as outputs.
  • If both return_cell_output and return_output_size are True, returns (outputs, final_state, cell_outputs, output_size).

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "rnn_cell": default_rnn_cell_hparams(),
    "output_layer": {
        "num_layers": 0,
        "layer_size": 128,
        "activation": "identity",
        "final_layer_activation": None,
        "other_dense_kwargs": None,
        "dropout_layer_ids": [],
        "dropout_rate": 0.5,
        "variational_dropout": False
    },
    "name": "unidirectional_rnn_encoder"
}

Here:

“rnn_cell”: dict

A dictionary of RNN cell hyperparameters. Ignored if cell is given to the encoder constructor.

The default value is defined in default_rnn_cell_hparams().

“output_layer”: dict

Output layer hyperparameters. Ignored if output_layer is given to the encoder constructor. Includes:

“num_layers”: int
The number of output (dense) layers. Set to 0 to avoid any output layers applied to the cell outputs.
“layer_size”: int or list

The size of each of the output (dense) layers.

If an int, each output layer will have the same size. If a list, the length must equal num_layers.

“activation”: str or callable or None

Activation function for each of the output (dense) layer except for the final layer. This can be a function, or its string name or module path. If function name is given, the function must be from module tf.nn or tf. For example

"activation": "relu" # function name
"activation": "my_module.my_activation_fn" # module path
"activation": my_module.my_activation_fn # function

Default is None which maintains a linear activation.

“final_layer_activation”: str or callable or None
The activation function for the final output layer.
“other_dense_kwargs”: dict or None
Other keyword arguments to construct each of the output dense layers, e.g., use_bias. See Dense for the keyword arguments.
“dropout_layer_ids”: int or list

The indexes of layers (starting from 0) whose inputs are applied with dropout. The index = num_layers means dropout applies to the final layer output. E.g.,

{
    "num_layers": 2,
    "dropout_layer_ids": [0, 2]
}

will lead to a series of layers as -dropout-layer0-layer1-dropout-.

The dropout mode (training or not) is controlled by the mode argument of _build().

“dropout_rate”: float
The dropout rate, between 0 and 1. E.g., “dropout_rate”: 0.1 would drop out 10% of elements.
“variational_dropout”: bool
Whether the dropout mask is the same across all time steps.
“name”: str
Name of the encoder.
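
For example, a hedged configuration sketch that adds two output layers with dropout (the sizes are illustrative):

enc_hparams = {
    "output_layer": {
        "num_layers": 2,
        "layer_size": [128, 64],
        "activation": "relu",
        "dropout_layer_ids": [0],  # dropout applied to the input of layer 0
        "dropout_rate": 0.3,
    },
}
encoder = UnidirectionalRNNEncoder(hparams=enc_hparams)
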
cell

The RNN cell.

state_size

The state size of encoder cell.

Same as encoder.cell.state_size.

output_layer

The output layer.

BidirectionalRNNEncoder

class texar.tf.modules.BidirectionalRNNEncoder(cell_fw=None, cell_bw=None, cell_dropout_mode=None, output_layer_fw=None, output_layer_bw=None, hparams=None)[source]

Bidirectional forward-backward RNN encoder.

Parameters:
  • cell_fw (RNNCell, optional) – The forward RNN cell. If not given, a cell is created as specified in hparams["rnn_cell_fw"].
  • cell_bw (RNNCell, optional) – The backward RNN cell. If not given, a cell is created as specified in hparams["rnn_cell_bw"].
  • cell_dropout_mode (optional) – A tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cells (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if respective cell is given.
  • output_layer_fw (optional) – An instance of tf.layers.Layer. Applies to the forward RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer_fw"].
  • output_layer_bw (optional) – An instance of tf.layers.Layer. Applies to the backward RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer_bw"].
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the encoder.

Example

# Use with embedder
embedder = WordEmbedder(vocab_size, hparams=emb_hparams)
encoder = BidirectionalRNNEncoder(hparams=enc_hparams)

outputs, final_state = encoder(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'])
# outputs == (outputs_fw, outputs_bw)
# final_state == (final_state_fw, final_state_bw)
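
The forward and backward results are often concatenated along the feature dimension before downstream use; a hedged sketch:

# Concatenate the forward/backward outputs (a common pattern,
# not something the encoder does for you).
outputs_concat = tf.concat(outputs, axis=-1)  # [batch_size, max_time, dim_fw + dim_bw]
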
_build(inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, time_major=False, mode=None, return_cell_output=False, return_output_size=False, **kwargs)[source]

Encodes the inputs.

Parameters:
  • inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time may be exchanged if time_major=True is specified.
  • sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element’s sequence length.
  • initial_state_fw (optional) – Initial state of the forward RNN.
  • initial_state_bw (optional) – Initial state of the backward RNN.
  • time_major (bool) – The shape format of the inputs and outputs Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth].
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls output layer dropout if the output layer is specified with hparams. If None (default), texar.tf.global_mode() is used.
  • return_cell_output (bool) – Whether to return the output of the RNN cell. These are the results prior to the output layer.
  • **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.
Returns:

  • By default (both return_cell_output and return_output_size are False), returns a pair (outputs, final_state)

    • outputs: A tuple (outputs_fw, outputs_bw) containing the forward and the backward RNN outputs, each of which is of shape [batch_size, max_time, output_dim] if time_major is False, or [max_time, batch_size, output_dim] if time_major is True. If RNN cell output is a (nested) tuple of Tensors, then outputs_fw and outputs_bw will be a (nested) tuple having the same structure as the cell output.
    • final_state: A tuple (final_state_fw, final_state_bw) containing the final states of the forward and backward RNNs, each of which is a Tensor of shape [batch_size] + cell.state_size, or a (nested) tuple of Tensors if cell.state_size is a (nested) tuple.
  • If return_cell_output is True, returns a triple (outputs, final_state, cell_outputs) where

    • cell_outputs: A tuple (cell_outputs_fw, cell_outputs_bw) containing the outputs by the forward and backward RNN cells prior to the output layers, having the same structure as outputs except for the output_dim.
  • If return_output_size is True, returns a tuple (outputs, final_state, output_size) where

    • output_size: A tuple (output_size_fw, output_size_bw) containing the sizes of outputs_fw and outputs_bw, respectively. Take *_fw for example: output_size_fw is a (possibly nested tuple of) int. If a single int or an int array, then outputs_fw has shape [batch/time, time/batch] + output_size_fw. If a (nested) tuple, then output_size_fw has the same structure as outputs_fw. The same applies to output_size_bw.
  • If both return_cell_output and return_output_size are True, returns (outputs, final_state, cell_outputs, output_size).

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "rnn_cell_fw": default_rnn_cell_hparams(),
    "rnn_cell_bw": default_rnn_cell_hparams(),
    "rnn_cell_share_config": True,
    "output_layer_fw": {
        "num_layers": 0,
        "layer_size": 128,
        "activation": "identity",
        "final_layer_activation": None,
        "other_dense_kwargs": None,
        "dropout_layer_ids": [],
        "dropout_rate": 0.5,
        "variational_dropout": False
    },
    "output_layer_bw": {
        # Same hyperparams and default values as "output_layer_fw"
        # ...
    },
    "output_layer_share_config": True,
    "name": "bidirectional_rnn_encoder"
}

Here:

“rnn_cell_fw”: dict

Hyperparameters of the forward RNN cell. Ignored if cell_fw is given to the encoder constructor.

The default value is defined in default_rnn_cell_hparams().

“rnn_cell_bw”: dict

Hyperparameters of the backward RNN cell. Ignored if cell_bw is given to the encoder constructor, or if "rnn_cell_share_config" is True.

The default value is defined in default_rnn_cell_hparams().

“rnn_cell_share_config”: bool
Whether to share hyperparameters of the backward cell with the forward cell. Note that the cell parameters (variables) are not shared.
“output_layer_fw”: dict
Hyperparameters of the forward output layer. Ignored if output_layer_fw is given to the constructor. See the “output_layer” field of default_hparams() for details.
“output_layer_bw”: dict

Hyperparameters of the backward output layer. Ignored if output_layer_bw is given to the constructor. Have the same structure and defaults with "output_layer_fw".

Ignored if "output_layer_share_config" is True.

“output_layer_share_config”: bool
Whether to share hyperparameters of the backward output layer with the forward output layer. Note that the layer parameters (variables) are not shared.
“name”: str
Name of the encoder.
cell_fw

The forward RNN cell.

cell_bw

The backward RNN cell.

state_size_fw

The state size of the forward encoder cell.

Same as encoder.cell_fw.state_size.

state_size_bw

The state size of the backward encoder cell.

Same as encoder.cell_bw.state_size.

output_layer_fw

The output layer of the forward RNN.

output_layer_bw

The output layer of the backward RNN.

HierarchicalRNNEncoder

class texar.tf.modules.HierarchicalRNNEncoder(encoder_major=None, encoder_minor=None, hparams=None)[source]

A hierarchical encoder that stacks basic RNN encoders into two layers. Can be used to encode long, structured sequences, e.g. paragraphs, dialog history, etc.

Parameters:
  • encoder_major (optional) – An instance of a subclass of RNNEncoderBase. The high-level encoder, which takes the final states from the low-level encoder as its inputs. If not specified, an encoder is created as specified in hparams["encoder_major"].
  • encoder_minor (optional) – An instance of a subclass of RNNEncoderBase. The low-level encoder. If not specified, an encoder is created as specified in hparams["encoder_minor"].
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the encoder.

_build(inputs, order='btu', medium=None, sequence_length_major=None, sequence_length_minor=None, **kwargs)[source]

Encodes the inputs.

Parameters:
  • inputs

    A 4-D tensor of shape [B, T, U, dim], where

    • B: batch_size
    • T: the max length of high-level sequences. E.g., the max number of utterances in dialog history.
    • U: the max length of low-level sequences. E.g., the max length of each utterance in dialog history.
    • dim: embedding dimension

    The order of first three dimensions can be changed according to order.

  • order

    A 3-char string containing ‘b’, ‘t’, and ‘u’ that specifies the order of the inputs dimensions above. The following four values are accepted:

    • ’btu’: None of the encoders are time-major.
    • ’utb’: Both encoders are time-major.
    • ’tbu’: The major encoder is time-major.
    • ’ubt’: The minor encoder is time-major.
  • medium (optional) – A list of callables that subsequently process the final states of minor encoder and obtain the inputs for the major encoder. If not specified, flatten() is used for processing the minor’s final states.
  • sequence_length_major (optional) – The sequence_length argument sent to major encoder. This is a 1-D Tensor of shape [B].
  • sequence_length_minor (optional) – The sequence_length argument sent to minor encoder. It can be either a 1-D Tensor of shape [B*T], or a 2-D Tensor of shape [B, T] or [T, B] according to order.
  • **kwargs

    Other keyword arguments for the major and minor encoders, such as initial_state, etc. Note that sequence_length, and time_major must not be included here. time_major is derived from order automatically. By default, arguments will be sent to both major and minor encoders. To specify which encoder an argument should be sent to, add ‘_minor’/’_major’ as its suffix.

    Note that initial_state_minor must have a batch dimension of size B*T. If you have an initial state of batch dimension = T, use tile_initial_state_minor() to tile it according to order.

Returns:

A tuple (outputs, final_state) by the major encoder.

See the return values of _build() method of respective encoder class for details.
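
A hedged usage sketch for encoding dialog histories (dialog_token_ids is a hypothetical [B, T, U] int tensor; the length tensors are illustrative):

embedder = WordEmbedder(vocab_size=vocab_size, hparams={'dim': 256})
encoder = HierarchicalRNNEncoder()

outputs, final_state = encoder(
    inputs=embedder(dialog_token_ids),        # [B, T, U, 256]
    order='btu',
    sequence_length_major=utterance_counts,   # [B]
    sequence_length_minor=utterance_lengths)  # [B, T]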

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "encoder_major_type": "UnidirectionalRNNEncoder",
    "encoder_major_hparams": {},
    "encoder_minor_type": "UnidirectionalRNNEncoder",
    "encoder_minor_hparams": {},
    "config_share": False,
    "name": "hierarchical_encoder_wrapper"
}

Here:

“encoder_major_type”: str or class or instance
The high-level encoder. Can be a RNN encoder class, its name or module path, or a class instance. Ignored if encoder_major is given to the encoder constructor.
“encoder_major_hparams”: dict
The hyperparameters for the high-level encoder. The high-level encoder is created with encoder_class(hparams=encoder_major_hparams). Ignored if encoder_major is given to the encoder constructor, or if “encoder_major_type” is an encoder instance.
“encoder_minor_type”: str or class or instance
The low-level encoder. Can be a RNN encoder class, its name or module path, or a class instance. Ignored if encoder_minor is given to the encoder constructor, or if “config_share” is True.
“encoder_minor_hparams”: dict
The hyperparameters for the low-level encoder. The low-level encoder is created with encoder_class(hparams=encoder_minor_hparams). Ignored if encoder_minor is given to the encoder constructor, or if “config_share” is True, or if “encoder_minor_type” is an encoder instance.
“config_share”:
Whether to use encoder_major’s hyperparameters to construct encoder_minor.
“name”:
Name of the encoder.
static tile_initial_state_minor(initial_state, order, inputs_shape)[source]

Tiles an initial state to be used for encoder minor.

The batch dimension of initial_state must equal T. The state will be copied B times and used to start encoding each low-level sequence. For example, the first utterance in each dialog history in the batch will have the same initial state.

Parameters:
  • initial_state – Initial state with the batch dimension of size T.
  • order (str) – The dimension order of inputs. Must be the same as used in _build().
  • inputs_shape – Shape of inputs for _build(). Can usually be obtained with tf.shape(inputs).
Returns:

A tiled initial state with batch dimension of size B*T

static flatten(x)[source]

Flattens a cell state by concatenating a sequence of cell states along the last dimension. If the cell states are LSTMStateTuple, only the hidden LSTMStateTuple.h is used.

This process is used by default if medium is not provided to _build().

encoder_major

The high-level encoder.

encoder_minor

The low-level encoder.

MultiheadAttentionEncoder

class texar.tf.modules.MultiheadAttentionEncoder(hparams=None)[source]

Multihead Attention Encoder

Parameters:hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(queries, memory, memory_attention_bias, cache=None, mode=None)[source]

Encodes the inputs.

Parameters:
  • queries – A 3d tensor with shape of [batch, length_query, depth_query].
  • memory – A 3d tensor with shape of [batch, length_key, depth_key].
  • memory_attention_bias – A 3d tensor with shape of [batch, length_key, num_units].
  • cache – Memory cache, used only when decoding sentences from scratch during inference.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL and PREDICT. Controls dropout mode. If None (default), texar.tf.global_mode() is used.
Returns:

A Tensor of shape [batch_size, max_time, dim] containing the encoded vectors.
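
A hedged self-attention usage sketch (queries and memory are the same tensor; the attention bias is assumed to be computed elsewhere, e.g., from padding positions):

attn = MultiheadAttentionEncoder(hparams={'num_heads': 8, 'num_units': 512,
                                          'output_dim': 512})
encoded = attn(queries=inputs,
               memory=inputs,
               memory_attention_bias=attention_bias)  # [batch, length_query, 512]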

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "initializer": None,
    'num_heads': 8,
    'output_dim': 512,
    'num_units': 512,
    'dropout_rate': 0.1,
    'use_bias': False,
    "name": "multihead_attention"
}

Here:

“initializer”: dict, optional
Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
“num_heads”: int
Number of heads for attention calculation.
“output_dim”: int
Output dimension of the returned tensor.
“num_units”: int
Hidden dimension of the unsplit attention space. Should be divisible by num_heads.
“dropout_rate”: float
Dropout rate in the attention.
“use_bias”: bool
Whether to use bias when projecting the keys, values, and queries.
“name”: str
Name of the module.

TransformerEncoder

class texar.tf.modules.TransformerEncoder(hparams=None)[source]

Transformer encoder that applies multi-head self attention for encoding sequences.

This module basically stacks MultiheadAttentionEncoder, FeedForwardNetwork and residual connections.

This module supports two types of architectures, namely, the standard Transformer Encoder architecture first proposed in (Vaswani et al.) “Attention is All You Need”, and the variant first used in (Devlin et al.) BERT. See default_hparams() for the nuance between the two types of architectures.

Parameters:hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, sequence_length, mode=None)[source]

Encodes the inputs.

Parameters:
  • inputs – A 3D Tensor of shape [batch_size, max_time, dim], containing the embedding of input sequences. Note that the embedding dimension dim must equal “dim” in hparams. The input embedding is typically an aggregation of word embedding and position embedding.
  • sequence_length – A 1D Tensor of shape [batch_size]. Input tokens beyond respective sequence lengths are masked out automatically.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Used to toggle dropout. If None (default), texar.tf.global_mode() is used.
Returns:

A Tensor of shape [batch_size, max_time, dim] containing the encoded vectors.
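
A hedged usage sketch that sums scaled word embeddings and sinusoid position embeddings as the encoder input (the sqrt(dim) scaling follows common Transformer practice and is an assumption here; vocab_size, max_len, and the batch fields are illustrative):

word_embedder = WordEmbedder(vocab_size=vocab_size, hparams={'dim': 512})
pos_embedder = SinusoidsPositionEmbedder(position_size=max_len, hparams={'dim': 512})

enc_inputs = word_embedder(batch['text_ids']) * (512 ** 0.5) \
    + pos_embedder(sequence_length=batch['length'])

encoder = TransformerEncoder(hparams={'dim': 512})
enc_outputs = encoder(inputs=enc_inputs, sequence_length=batch['length'])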

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "num_blocks": 6,
    "dim": 512,
    'use_bert_config': False,
    "embedding_dropout": 0.1,
    "residual_dropout": 0.1,
    "poswise_feedforward": default_transformer_poswise_net_hparams,
    'multihead_attention': {
        'name': 'multihead_attention',
        'num_units': 512,
        'output_dim': 512,
        'num_heads': 8,
        'dropout_rate': 0.1,
        'use_bias': False,
    },
    "initializer": None,
    "name": "transformer_encoder"
}

Here:

“num_blocks”: int
Number of stacked blocks.
“dim”: int
Hidden dimension of the encoders.
“use_bert_config”: bool

If False, apply the standard Transformer Encoder architecture from the original paper (Vaswani et al.) “Attention is All You Need”. If True, apply the Transformer Encoder architecture used in BERT (Devlin et al.).

The differences lie in:

  1. The standard arch restricts the word embedding of PAD token to all zero. The BERT arch does not.
  2. The attention bias for padding tokens: The standard arch uses -1e8 as the negative attention mask. BERT uses -1e4 instead.
  3. The residual connections between internal tensors: In BERT, a residual layer connects the tensors after layer normalization. In the standard arch, the tensors are connected before layer normalization.
“embedding_dropout”: float
Dropout rate of the input embedding.
“residual_dropout”: float
Dropout rate of the residual connections.
“poswise_feedforward”: dict

Hyperparameters for a feed-forward network used in residual connections. Make sure the dimension of the output tensor is equal to dim.

See default_transformer_poswise_net_hparams() for details.

“multihead_attention”: dict
Hyperparameters for the multihead attention strategy. Make sure the “output_dim” in this module is equal to “dim”. See default_hparams() for details.
“initializer”: dict, optional
Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
“name”: str
Name of the module.

BERTEncoder

class texar.tf.modules.BERTEncoder(pretrained_model_name=None, cache_dir=None, hparams=None)[source]

Raw BERT Transformer for encoding sequences. Please see PretrainedBERTMixin for a brief description of BERT.

This module basically stacks WordEmbedder, PositionEmbedder, TransformerEncoder and a dense pooler.

Parameters:
  • pretrained_model_name (optional) – a str, the name of the pre-trained model (e.g., bert-base-uncased). Please refer to PretrainedBERTMixin for all supported models. If None, the model name in hparams is used.
  • cache_dir (optional) – the path to a folder in which the pre-trained models will be cached. If None (default), a default directory (texar_data folder under user’s home directory) will be used.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameter will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, sequence_length=None, segment_ids=None, mode=None, **kwargs)[source]

Encodes the inputs.

Parameters:
  • inputs – A 2D Tensor of shape [batch_size, max_time], containing the token ids of tokens in the input sequences.
  • segment_ids (optional) – A 2D Tensor of shape [batch_size, max_time], containing the segment ids of tokens in input sequences. If None (default), a tensor with all elements set to zero is used.
  • sequence_length (optional) – A 1D Tensor of shape [batch_size]. Input tokens beyond respective sequence lengths are masked out automatically.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Used to toggle dropout. If None (default), texar.tf.global_mode() is used.
  • **kwargs – Keyword arguments.
Returns:

A pair (outputs, pooled_output)

  • outputs: A Tensor of shape [batch_size, max_time, dim] containing the encoded vectors.
  • pooled_output: A Tensor of size [batch_size, hidden_size], which is the output of a dense pooler on top of the hidden state associated with the first token of the input ([CLS]); see BERT’s paper.
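
A hedged usage sketch (the batch fields are illustrative):

encoder = BERTEncoder(pretrained_model_name="bert-base-uncased")
outputs, pooled_output = encoder(
    inputs=batch['input_ids'],        # [batch_size, max_time]
    sequence_length=batch['length'],
    segment_ids=batch['segment_ids'])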

reset_parameters()[source]

Initialize parameters of the pre-trained model. This method is only called if pre-trained checkpoints are not loaded.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

  • The encoder arch is determined by the constructor argument pretrained_model_name if it’s specified. In this case, hparams are ignored.
  • Otherwise, the encoder arch is determined by hparams[‘pretrained_model_name’] if it’s specified. All other configurations in hparams are ignored.
  • If the above two are None, the encoder arch is defined by the configurations in hparams and weights are randomly initialized.
{
    "pretrained_model_name": "bert-base-uncased",
    "embed": {
        "dim": 768,
        "name": "word_embeddings"
    },
    "vocab_size": 30522,
    "segment_embed": {
        "dim": 768,
        "name": "token_type_embeddings"
    },
    "type_vocab_size": 2,
    "position_embed": {
        "dim": 768,
        "name": "position_embeddings"
    },
    "position_size": 512,

    "encoder": {
        "dim": 768,
        "embedding_dropout": 0.1,
        "multihead_attention": {
            "dropout_rate": 0.1,
            "name": "self",
            "num_heads": 12,
            "num_units": 768,
            "output_dim": 768,
            "use_bias": True
        },
        "name": "encoder",
        "num_blocks": 12,
        "poswise_feedforward": {
            "layers": [
                {   "kwargs": {
                        "activation": "gelu",
                        "name": "intermediate",
                        "units": 3072,
                        "use_bias": True
                    },
                    "type": "Dense"
                },
                {   "kwargs": {"activation": None,
                    "name": "output",
                    "units": 768,
                    "use_bias": True
                    },
                    "type": "Dense"
                }
            ]
        },
        "residual_dropout": 0.1,
        "use_bert_config": True
    },
    "hidden_size": 768,
    "initializer": None,
    "name": "bert_encoder"
}

Here:

The default parameters are values for the uncased BERT-Base model.

“pretrained_model_name”: str or None
The name of the pre-trained BERT model. If None, the model will be randomly initialized.
“embed”: dict
Hyperparameters for word embedding layer.
“vocab_size”: int
The vocabulary size of inputs in BERT model.
“segment_embed”: dict
Hyperparameters for segment embedding layer.
“type_vocab_size”: int
The vocabulary size of the segment_ids passed into BertModel.
“position_embed”: dict
Hyperparameters for position embedding layer.
“position_size”: int
The maximum sequence length that this model might ever be used with.
“encoder”: dict
Hyperparameters for the TransformerEncoder. See default_hparams() for details.
“hidden_size”: int
Size of the pooler dense layer.
“initializer”: dict, optional
Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
“name”: str
Name of the module.

Conv1DEncoder

class texar.tf.modules.Conv1DEncoder(hparams=None)[source]

Simple Conv-1D encoder that consists of a sequence of conv layers followed by a sequence of dense layers.

Wraps Conv1DNetwork to be a subclass of EncoderBase. Has exactly the same functionality as Conv1DNetwork.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

The same as default_hparams() of Conv1DNetwork, except that the default name is ‘conv_encoder’.

EncoderBase

class texar.tf.modules.EncoderBase(hparams=None)[source]

Base class inherited by all encoder classes.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

RNNEncoderBase

class texar.tf.modules.RNNEncoderBase(hparams=None)[source]

Base class for all RNN encoder classes to inherit.

Parameters:hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "name": "rnn_encoder"
}

XLNetEncoder

class texar.tf.modules.XLNetEncoder(pretrained_model_name=None, cache_dir=None, hparams=None)[source]

Raw XLNet module for encoding sequences. Please see PretrainedXLNetMixin for a brief description of XLNet.

Parameters:
  • pretrained_model_name (optional) – a str, the name of the pre-trained model (e.g., xlnet-base-cased). Please refer to PretrainedXLNetMixin for all supported models. If None, the model name in hparams is used.
  • cache_dir (optional) – the path to a folder in which the pre-trained models will be cached. If None (default), a default directory (texar_data folder under user’s home directory) will be used.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameter will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(token_ids, segment_ids=None, input_mask=None, memory=None, permute_mask=None, target_mapping=None, bi_data=False, clamp_len=None, cache_len=0, same_length=False, attn_type='bi', two_stream=False, mode=None)[source]

Compute XLNet representations for the input.

Parameters:
  • token_ids – Shape [batch_size, max_time].
  • segment_ids – Shape [batch_size, max_time].
  • input_mask – Float tensor of shape [batch_size, max_time]. Note that positions with value 1 are masked out.
  • memory – Memory from previous batches. A list of length num_layers, each tensor of shape [batch_size, mem_len, hidden_dim].
  • permute_mask – The permutation mask. Float tensor of shape [batch_size, max_time, max_time]. A value of 0 for permute_mask[i, j, k] indicates that position i attends to position j in batch k.
  • target_mapping – The target token mapping. Float tensor of shape [batch_size, num_targets, max_time]. A value of 1 for target_mapping[i, j, k] indicates that the i-th target token (in order of permutation) in batch k is the token at position j. Each row target_mapping[i, :, k] can have no more than one value of 1.
  • bi_data (bool) – Whether to use bidirectional data input pipeline.
  • clamp_len (int) – Clamp all relative distances larger than clamp_len. A value of -1 means no clamping.
  • cache_len (int) – Length of memory (number of tokens) to cache.
  • same_length (bool) – Whether to use the same attention length for each token.
  • attn_type (str) – Attention type. Supported values are “uni” and “bi”.
  • two_stream (bool) – Whether to use two-stream attention. Only set to True when pre-training or generating text. Defaults to False.

Returns: A tuple of (output, new_memory):

  • output: The final layer output representations. Shape [batch_size, max_time, hidden_dim].
  • new_memory: The memory of the current batch. If cache_len is 0, then new_memory is None. Otherwise, it is a list of length num_layers, each tensor of shape [batch_size, cache_len, hidden_dim]. This can be used as the memory argument in the next batch.
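
A hedged usage sketch (the token_ids and segment_ids tensors are assumed to be prepared elsewhere):

xlnet = XLNetEncoder(pretrained_model_name="xlnet-base-cased")
output, new_memory = xlnet(
    token_ids=token_ids,      # [batch_size, max_time]
    segment_ids=segment_ids,
    cache_len=128)            # cache memory for use in the next batch
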
reset_parameters()[source]

Initialize parameters of the pre-trained model. This method is only called if pre-trained checkpoints are not loaded.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

  • The encoder arch is determined by the constructor argument pretrained_model_name if it’s specified. In this case, hparams are ignored.
  • Otherwise, the encoder arch is determined by hparams[‘pretrained_model_name’] if it’s specified. All other configurations in hparams are ignored.
  • If the above two are None, the encoder arch is defined by the configurations in hparams and weights are randomly initialized.
{
    "name": "xlnet_encoder",
    "pretrained_model_name": "xlnet-base-cased",
    "untie_r": True,
    "num_layers": 12,
    "mem_len": 0,
    "reuse_len": 0,
    "initializer": None,
    "num_heads": 12,
    "hidden_dim": 768,
    "head_dim": 64,
    "dropout": 0.1,
    "attention_dropout": 0.1,
    "use_segments": True,
    "ffn_inner_dim": 3072,
    "activation": 'gelu',
    "vocab_size": 32000,
    "max_seq_len": 512,
}

Here:

The default parameters are values for the cased XLNet-Base model.

“pretrained_model_name”: str or None
The name of the pre-trained XLNet model. If None, the model will be randomly initialized.
“untie_r”: bool
Whether biases should be untied for all layers.
“num_layers”: int
Number of layers in the network.
“mem_len”: int
Length of the memory to be used during attention score calculation.
“reuse_len”: int
Length of the memory that can be re-used.
“initializer”: dict, optional
Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
“num_heads”: int
Number of heads in the attention.
“hidden_dim”: int
Hidden dimension of the embeddings.
“head_dim”: int
Size of the vectors after head projection.
“dropout”: float
Dropout rate for layers.
“attention_dropout”: float
Dropout rate for attention layers.
“use_segments”: bool
Whether the input has segments.
“ffn_inner_dim”: int
Dimension of the position-wise feed-forward network’s hidden layer.
“activation”: str or callable
Activation function applied to the output of the position-wise feed-forward network. See get_activation_fn() for more details.
“vocab_size”: int
The vocabulary size of inputs in XLNet.
“max_seq_len”: int
Maximum length of the sequence allowed in one segment.
“name”: str
Name of the module.
param_groups(lr=None, lr_layer_scale=1.0, decay_base_params=False)[source]

Create parameter groups for optimizers. When lr_layer_scale is not 1.0, parameters from each layer form separate groups with different base learning rates.

This method should be called before applying gradients to the variables through the optimizer. Particularly, after calling the optimizer’s compute_gradients method, the user can call this method to get variable-specific learning rates for the network. The gradient for each variable can then be scaled accordingly. These scaled gradients are finally applied by calling the optimizer’s apply_gradients method.

Example

grads_and_vars = optimizer.compute_gradients(loss)

# Map each variable to its gradient
vars_to_grads = {var: grad for grad, var in grads_and_vars}

vars_to_learning_rates = xlnet_encoder.param_groups(
    lr=1, lr_layer_scale=0.75)

# Scale each gradient by its variable-specific learning rate
for var in vars_to_grads:
    vars_to_grads[var] *= vars_to_learning_rates[var]

train_op = optimizer.apply_gradients(
    [(grad, var) for var, grad in vars_to_grads.items()])
Parameters:
  • lr (float) – The learning rate. Can be omitted if lr_layer_scale is 1.0.
  • lr_layer_scale (float) – Per-layer LR scaling rate. The i-th layer will be scaled by lr_layer_scale ^ (num_layers - i - 1).
  • decay_base_params (bool) – If True, treat non-layer parameters (e.g. embeddings) as if they’re in layer 0. If False, these parameters are not scaled.

Returns: A dict mapping tensorflow variables to their learning rates.

output_size

The last dimension of the encoder output.

Note: _build() returns two tensors of shapes [batch_size, max_time, hidden_dim] and [batch_size, cache_len, hidden_dim]. output_size here equals hidden_dim.

default_transformer_poswise_net_hparams

texar.tf.modules.default_transformer_poswise_net_hparams(output_dim=512)[source]

Returns default hyperparameters of a FeedForwardNetwork as a pos-wise network used in TransformerEncoder and TransformerDecoder.

This is a 2-layer dense network with dropout in-between.

{
    "layers": [
        {
            "type": "Dense",
            "kwargs": {
                "name": "conv1",
                "units": output_dim*4,
                "activation": "relu",
                "use_bias": True,
            }
        },
        {
            "type": "Dropout",
            "kwargs": {
                "rate": 0.1,
            }
        },
        {
            "type": "Dense",
            "kwargs": {
                "name": "conv2",
                "units": output_dim,
                "use_bias": True,
            }
        }
    ],
    "name": "ffn"
}
Parameters:output_dim (int) – The size of output dense layer.
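
For example, a hedged sketch of shrinking the feed-forward network for a smaller Transformer (the dimension is illustrative):

ffn_hparams = default_transformer_poswise_net_hparams(output_dim=256)
enc_hparams = {
    "dim": 256,
    "poswise_feedforward": ffn_hparams,
}
encoder = TransformerEncoder(hparams=enc_hparams)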

Decoders

RNNDecoderBase

class texar.tf.modules.RNNDecoderBase(cell=None, vocab_size=None, output_layer=None, cell_dropout_mode=None, hparams=None)[source]

Base class inherited by all RNN decoder classes. See BasicRNNDecoder for the arguments.

See _build() for the inputs and outputs of RNN decoders in general.

_build(decoding_strategy='train_greedy', initial_state=None, inputs=None, sequence_length=None, embedding=None, start_tokens=None, end_token=None, softmax_temperature=None, max_decoding_length=None, impute_finished=False, output_time_major=False, input_time_major=False, helper=None, mode=None, **kwargs)[source]

Performs decoding. This is a shared interface for both BasicRNNDecoder and AttentionRNNDecoder.

The function provides 3 ways to specify the decoding method, with varying flexibility:

  1. The decoding_strategy argument: A string taking value of:

    • “train_greedy”: decoding in teacher-forcing fashion (i.e., feeding ground truth to decode the next step), and each sample is obtained by taking the argmax of the RNN output logits. Arguments (inputs, sequence_length, input_time_major) are required for this strategy, and argument embedding is optional.
    • “infer_greedy”: decoding in inference fashion (i.e., feeding the generated sample to decode the next step), and each sample is obtained by taking the argmax of the RNN output logits. Arguments (embedding, start_tokens, end_token) are required for this strategy, and argument max_decoding_length is optional.
    • “infer_sample”: decoding in inference fashion, and each sample is obtained by random sampling from the RNN output distribution. Arguments (embedding, start_tokens, end_token) are required for this strategy, and argument max_decoding_length is optional.

This argument is used only when argument helper is None.

Example:

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Teacher-forcing decoding
outputs_1, _, _ = decoder(
    decoding_strategy='train_greedy',
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length']-1)

# Random sample decoding. Gets 100 sequence samples
outputs_2, _, sequence_length = decoder(
    decoding_strategy='infer_sample',
    start_tokens=[data.vocab.bos_token_id]*100,
    end_token=data.vocab.eos_token_id,
    embedding=embedder,
    max_decoding_length=60)
  2. The helper argument: An instance of a subclass of texar.tf.modules.Helper. This provides a superset of the decoding strategies above.

Helpers give the maximal flexibility of configuring the decoding strategy.

Example:

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Teacher-forcing decoding, same as above with
# `decoding_strategy='train_greedy'`
helper_1 = tx.modules.TrainingHelper(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length']-1)
outputs_1, _, _ = decoder(helper=helper_1)

# Gumbel-softmax decoding
helper_2 = GumbelSoftmaxEmbeddingHelper(
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id]*100,
    end_token=data.vocab.eos_token_id,
    tau=0.1)
outputs_2, _, sequence_length = decoder(
    max_decoding_length=60, helper=helper_2)
  3. hparams["helper_train"] and hparams["helper_infer"]: Specifying the helper through hyperparameters. Train and infer strategies are toggled based on mode. Appropriate arguments (e.g., inputs, start_tokens, etc.) are selected to construct the helper. Additional arguments for the helper constructor can be provided either through **kwargs, or through hparams["helper_train/infer"]["kwargs"].

    This approach is used only when both decoding_strategy and helper are None.

    Example:

    h = {
        "helper_infer": {
            "type": "GumbelSoftmaxEmbeddingHelper",
            "kwargs": { "tau": 0.1 }
        }
    }
    embedder = WordEmbedder(vocab_size=data.vocab.size)
    decoder = BasicRNNDecoder(vocab_size=data.vocab.size,
                              hparams=h)
    
    # Gumbel-softmax decoding
    output, _, _ = decoder(
        decoding_strategy=None, # Set to None explicitly
        embedding=embedder,
        start_tokens=[data.vocab.bos_token_id]*100,
        end_token=data.vocab.eos_token_id,
        max_decoding_length=60,
        mode=tf.estimator.ModeKeys.PREDICT)
            # PREDICT mode also shuts down dropout
    
Parameters:
  • decoding_strategy (str) – A string specifying the decoding strategy. Different arguments are required based on the strategy. Ignored if helper is given.
  • initial_state (optional) – Initial state of decoding. If None (default), zero state is used.
  • inputs (optional) –

    Input tensors for teacher forcing decoding. Used when decoding_strategy is set to "train_greedy", or when hparams-configured helper is used.

    • If embedding is None, inputs is directly fed to the decoder. E.g., in “train_greedy” strategy, inputs must be a 3D Tensor of shape [batch_size, max_time, emb_dim] (or [max_time, batch_size, emb_dim] if input_time_major == True).
    • If embedding is given, inputs is used as index to look up embeddings and feed in the decoder. E.g., if embedding is an instance of WordEmbedder, then inputs is usually a 2D int Tensor [batch_size, max_time] (or [max_time, batch_size] if input_time_major == True) containing the token indexes.
  • sequence_length (optional) – A 1D int Tensor containing the sequence length of inputs. Used when decoding_strategy=”train_greedy” or hparams-configured helper is used.
  • embedding (optional) –

    Embedding used when:

    • ”infer_greedy” or “infer_sample” decoding_strategy is used. This can be a callable or the params argument for embedding_lookup. If a callable, it can take a vector tensor of token ids, or take two arguments (ids, times), where ids is a vector tensor of token ids, and times is a vector tensor of time steps (i.e., position ids). The latter case can be used when embedding is a combination of word embedding and position embedding. embedding is required in this case.
    • ”train_greedy” decoding_strategy is used. This can be a callable or the params argument for embedding_lookup. If a callable, it takes inputs and returns the input embedding. embedding is optional in this case.
  • start_tokens (optional) –

    An int Tensor of shape [batch_size], the start tokens. Used when decoding_strategy=”infer_greedy” or “infer_sample”, or when the helper specified in hparams is used.

    Example

    data = tx.data.MonoTextData(hparams)
    iterator = DataIterator(data)
    batch = iterator.get_next()
    
    bos_token_id = data.vocab.bos_token_id
    start_tokens=tf.ones_like(batch['length'])*bos_token_id
    
  • end_token (optional) – An int 0D Tensor, the token that marks the end of decoding. Used when decoding_strategy=”infer_greedy” or “infer_sample”, or when the helper specified in hparams is used.
  • softmax_temperature (optional) – A float 0D Tensor, the value to divide the logits by before computing the softmax. Larger values (above 1.0) result in more random samples. Must be > 0. If None, 1.0 is used. Used when decoding_strategy=”infer_sample”.
  • max_decoding_length – An int scalar Tensor indicating the maximum allowed number of decoding steps. If None (default), either hparams[“max_decoding_length_train”] or hparams[“max_decoding_length_infer”] is used according to mode.
  • impute_finished (bool) – If True, then states for batch entries which are marked as finished get copied through and the corresponding outputs get zeroed out. This causes some slowdown at each time step, but ensures that the final state and outputs have the correct values and that backprop ignores time steps that were marked as finished.
  • output_time_major (bool) – If True, outputs are returned as time major tensors. If False (default), outputs are returned as batch major tensors.
  • input_time_major (optional) – Whether the inputs tensor is time major. Used when decoding_strategy=”train_greedy” or hparams-configured helper is used.
  • helper (optional) – An instance of texar.tf.modules.Helper that defines the decoding strategy. If given, decoding_strategy and helper configs in hparams are ignored.
  • mode (str, optional) – A string taking value in tf.estimator.ModeKeys. If TRAIN, training related hyperparameters are used (e.g., hparams[‘max_decoding_length_train’]), otherwise, inference related hyperparameters are used (e.g., hparams[‘max_decoding_length_infer’]). If None (default), TRAIN mode is used.
  • **kwargs – Other keyword arguments for constructing helpers defined by hparams[“helper_train”] or hparams[“helper_infer”].
Returns:

(outputs, final_state, sequence_lengths), where

  • `outputs`: an object containing the decoder output on all time steps.
  • `final_state`: the cell state of the final time step.
  • `sequence_lengths`: an int Tensor of shape [batch_size] containing the length of each sample.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

The hyperparameters are the same as in default_hparams() of BasicRNNDecoder, except that the default “name” here is “rnn_decoder”.

batch_size

The batch size of input values.

cell

The RNN cell.

zero_state(batch_size, dtype)[source]

Zero state of the RNN cell. Equivalent to decoder.cell.zero_state.

state_size

The state size of decoder cell. Equivalent to decoder.cell.state_size.

vocab_size

The vocab size.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_layer

The output layer.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

BasicRNNDecoder

class texar.tf.modules.BasicRNNDecoder(cell=None, cell_dropout_mode=None, vocab_size=None, output_layer=None, hparams=None)[source]

Basic RNN decoder.

Parameters:
  • cell (RNNCell, optional) – An instance of RNNCell. If None (default), a cell is created as specified in hparams.
  • cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given.
  • vocab_size (int, optional) – Vocabulary size. Required if output_layer is None.
  • output_layer (optional) –

    An output layer that transforms cell output to logits. This can be:

    • A callable layer, e.g., an instance of tf.layers.Layer.
    • A tensor. A dense layer will be created using the tensor as the kernel weights. The bias of the dense layer is determined by hparams.output_layer_bias. This can be used to tie the output layer with the input embedding matrix, as proposed in https://arxiv.org/pdf/1608.05859.pdf
    • None. A dense layer will be created based on attr:vocab_size and hparams.output_layer_bias.
    • If no output layer after the cell output is needed, set (vocab_size=None, output_layer=tf.identity).
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the decoder. The decoder returns (outputs, final_state, sequence_lengths), where outputs is an instance of BasicRNNDecoderOutput.

Example

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

# Training loss
outputs, _, _ = decoder(
    decoding_strategy='train_greedy',
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length']-1)

loss = tx.losses.sequence_sparse_softmax_cross_entropy(
    labels=data_batch['text_ids'][:, 1:],
    logits=outputs.logits,
    sequence_length=data_batch['length']-1)

# Inference sample
outputs, _, _ = decoder(
    decoding_strategy='infer_sample',
    start_tokens=[data.vocab.bos_token_id]*100,
    end_token=data.vocab.eos_token_id,
    embedding=embedder,
    max_decoding_length=60,
    mode=tf.estimator.ModeKeys.PREDICT)

sample_id = sess.run(outputs.sample_id)
sample_text = tx.utils.map_ids_to_strs(sample_id, data.vocab)
print(sample_text)
# [
#   the first sequence sample .
#   the second sequence sample .
#   ...
# ]
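
As noted for the output_layer argument above, passing a tensor creates a dense output layer with that tensor as the kernel weights. A minimal weight-tying sketch (assuming the RNN cell output size equals the embedding dimension, here 256, so the shapes are compatible):

embedder = WordEmbedder(vocab_size=data.vocab.size, hparams={'dim': 256})
decoder = BasicRNNDecoder(
    # Kernel of the output dense layer: [cell_output_size, vocab_size]
    output_layer=tf.transpose(embedder.embedding))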
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "rnn_cell": default_rnn_cell_hparams(),
    "max_decoding_length_train": None,
    "max_decoding_length_infer": None,
    "helper_train": {
        "type": "TrainingHelper",
        "kwargs": {}
    }
    "helper_infer": {
        "type": "SampleEmbeddingHelper",
        "kwargs": {}
    }
    "name": "basic_rnn_decoder"
}

Here:

“rnn_cell”: dict
A dictionary of RNN cell hyperparameters. Ignored if cell is given to the decoder constructor. The default value is defined in default_rnn_cell_hparams().
“max_decoding_length_train”: int or None
Maximum allowed number of decoding steps in training mode. If None (default), decoding is performed until fully done, e.g., encountering the <EOS> token. Ignored if max_decoding_length is given when calling the decoder.
“max_decoding_length_infer”: int or None
Same as “max_decoding_length_train” but for inference mode.
“helper_train”: dict
The hyperparameters of the helper used in training. “type” can be a helper class, its name or module path, or a helper instance. If a class name is given, the class must be from module tf.contrib.seq2seq, texar.tf.modules, or texar.tf.custom. This is used only when both the decoding_strategy and helper arguments are None when calling the decoder. See _build() for more details.
“helper_infer”: dict
Same as “helper_train” but during inference mode.
“name”: str

Name of the decoder.

The default value is “basic_rnn_decoder”.

batch_size

The batch size of input values.

cell

The RNN cell.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_layer

The output layer.

state_size

The state size of decoder cell. Equivalent to decoder.cell.state_size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

vocab_size

The vocab size.

zero_state(batch_size, dtype)

Zero state of the RNN cell. Equivalent to decoder.cell.zero_state.

BasicRNNDecoderOutput

class texar.tf.modules.BasicRNNDecoderOutput[source]

The outputs of basic RNN decoder that include both RNN outputs and sampled ids at each step. This is also used to store results of all the steps after decoding the whole sequence.

logits

The outputs of RNN (at each step/of all steps) by applying the output layer on cell outputs. E.g., in BasicRNNDecoder with default hyperparameters, this is a Tensor of shape [batch_size, max_time, vocab_size] after decoding the whole sequence.

sample_id

The sampled results (at each step/of all steps). E.g., in BasicRNNDecoder with decoding strategy of train_greedy, this is a Tensor of shape [batch_size, max_time] containing the sampled token indexes of all steps.

cell_output

The output of the RNN cell (at each step/of all steps). These are the results prior to the output layer. E.g., in BasicRNNDecoder with default hyperparameters, this is a Tensor of shape [batch_size, max_time, cell_output_size] after decoding the whole sequence.
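
For example, the fields can be accessed on the result of the teacher-forcing example of BasicRNNDecoder above (shapes are for the default hyperparameters; embedder and data_batch are as in that example):

outputs, _, _ = decoder(
    decoding_strategy='train_greedy',
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length']-1)

outputs.logits       # [batch_size, max_time, vocab_size]
outputs.sample_id    # [batch_size, max_time]
outputs.cell_output  # [batch_size, max_time, cell_output_size]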

AttentionRNNDecoder

class texar.tf.modules.AttentionRNNDecoder(memory, memory_sequence_length=None, cell=None, cell_dropout_mode=None, vocab_size=None, output_layer=None, cell_input_fn=None, hparams=None)[source]

RNN decoder with attention mechanism.

Parameters:
  • memory – The memory to query, e.g., the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, dim].
  • memory_sequence_length (optional) – A tensor of shape [batch_size] containing the sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
  • cell (RNNCell, optional) – An instance of RNNCell. If None, a cell is created as specified in hparams.
  • cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given.
  • vocab_size (int, optional) – Vocabulary size. Required if output_layer is None.
  • output_layer (optional) –

    An output layer that transforms cell output to logits. This can be:

    • A callable layer, e.g., an instance of tf.layers.Layer.
    • A tensor. A dense layer will be created using the tensor as the kernel weights. The bias of the dense layer is determined by hparams.output_layer_bias. This can be used to tie the output layer with the input embedding matrix, as proposed in https://arxiv.org/pdf/1608.05859.pdf
    • None. A dense layer will be created based on attr:vocab_size and hparams.output_layer_bias.
    • If no output layer after the cell output is needed, set (vocab_size=None, output_layer=tf.identity).
  • cell_input_fn (callable, optional) – A callable that produces RNN cell inputs. If None (default), the default is used: lambda inputs, attention: tf.concat([inputs, attention], -1), which concatenates the regular RNN cell inputs with the attention.
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the decoder. The decoder returns (outputs, final_state, sequence_lengths), where outputs is an instance of AttentionRNNDecoderOutput.

Example

# Encodes the source
enc_embedder = WordEmbedder(data.source_vocab.size, ...)
encoder = UnidirectionalRNNEncoder(...)

enc_outputs, _ = encoder(
    inputs=enc_embedder(data_batch['source_text_ids']),
    sequence_length=data_batch['source_length'])

# Decodes while attending to the source
dec_embedder = WordEmbedder(vocab_size=data.target_vocab.size, ...)
decoder = AttentionRNNDecoder(
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    vocab_size=data.target_vocab.size)

outputs, _, _ = decoder(
    decoding_strategy='train_greedy',
    inputs=dec_embedder(data_batch['target_text_ids']),
    sequence_length=data_batch['target_length']-1)
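
Inference with the same decoder follows the pattern of the BasicRNNDecoder example (a sketch; the batch size of 100 is illustrative):

# Greedy inference, attending to the same source memory
outputs_infer, _, lengths = decoder(
    decoding_strategy='infer_greedy',
    embedding=dec_embedder,
    start_tokens=[data.target_vocab.bos_token_id] * 100,
    end_token=data.target_vocab.eos_token_id,
    max_decoding_length=60)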
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values:

Common hyperparameters are the same as in BasicRNNDecoder.default_hparams(). Additional hyperparameters are for attention mechanism configuration.

{
    "attention": {
        "type": "LuongAttention",
        "kwargs": {
            "num_units": 256,
        },
        "attention_layer_size": None,
        "alignment_history": False,
        "output_attention": True,
    },
    # The following hyperparameters are the same as with
    # `BasicRNNDecoder`
    "rnn_cell": default_rnn_cell_hparams(),
    "max_decoding_length_train": None,
    "max_decoding_length_infer": None,
    "helper_train": {
        "type": "TrainingHelper",
        "kwargs": {}
    }
    "helper_infer": {
        "type": "SampleEmbeddingHelper",
        "kwargs": {}
    }
    "name": "attention_rnn_decoder"
}

Here:

“attention”: dict

Attention hyperparameters, including:

“type”: str or class or instance

The attention type. Can be an attention class, its name or module path, or a class instance. The class must be a subclass of TF AttentionMechanism. If class name is given, the class must be from modules tf.contrib.seq2seq or texar.tf.custom.

Example:

# class name
"type": "LuongAttention"
"type": "BahdanauAttention"
# module path
"type": "tf.contrib.seq2seq.BahdanauMonotonicAttention"
"type": "my_module.MyAttentionMechanismClass"
# class
"type": tf.contrib.seq2seq.LuongMonotonicAttention
# instance
"type": LuongAttention(...)
“kwargs”: dict

Keyword arguments for the attention class constructor. Arguments memory and memory_sequence_length should not be specified here because they are given to the decoder constructor. Ignored if “type” is an attention class instance.

Example:

"type": "LuongAttention",
"kwargs": {
    "num_units": 256,
    "probability_fn": tf.nn.softmax
}

Here “probability_fn” can also be set to the string name or module path to a probability function.

“attention_layer_size”: int or None
The depth of the attention (output) layer. The context and cell output are fed into the attention layer to generate attention at each time step. If None (default), use the context as attention at each time step.
“alignment_history”: bool
Whether to store alignment history from all time steps in the final output state (stored as a time-major TensorArray on which you must call stack()).
“output_attention”: bool
If True (default), the output at each time step is the attention value. This is the behavior of Luong-style attention mechanisms. If False, the output at each time step is the output of cell. This is the behavior of Bahdanau-style attention mechanisms. In both cases, the attention tensor is propagated to the next time step via the state and is used there. This flag only controls whether the attention mechanism is propagated up to the next cell in an RNN stack or to the top RNN output.
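
For example, a Bahdanau-style configuration may be specified as follows (a sketch; the values and the enc_outputs/data names are illustrative, following the example above):

decoder = AttentionRNNDecoder(
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    vocab_size=data.target_vocab.size,
    hparams={
        'attention': {
            'type': 'BahdanauAttention',
            'kwargs': {'num_units': 256},
            'output_attention': False,  # output the cell output at each step
        }
    })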
zero_state(batch_size, dtype)[source]

Returns zero state of the basic cell. Equivalent to decoder.cell._cell.zero_state.

wrapper_zero_state(batch_size, dtype)[source]

Returns zero state of the attention-wrapped cell. Equivalent to decoder.cell.zero_state.

state_size

The state size of the basic cell. Equivalent to decoder.cell._cell.state_size.

wrapper_state_size

The state size of the attention-wrapped cell. Equivalent to decoder.cell.state_size.

batch_size

The batch size of input values.

cell

The RNN cell.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_layer

The output layer.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

vocab_size

The vocab size.

AttentionRNNDecoderOutput

class texar.tf.modules.AttentionRNNDecoderOutput[source]

The outputs of attention RNN decoders that additionally include attention results.

logits

The outputs of RNN (at each step/of all steps) by applying the output layer on cell outputs. E.g., in AttentionRNNDecoder, this is a Tensor of shape [batch_size, max_time, vocab_size] after decoding.

sample_id

The sampled results (at each step/of all steps). E.g., in AttentionRNNDecoder with decoding strategy of train_greedy, this is a Tensor of shape [batch_size, max_time] containing the sampled token indexes of all steps.

cell_output

The output of the RNN cell (at each step/of all steps). These are the results prior to the output layer. E.g., in AttentionRNNDecoder with default hyperparameters, this is a Tensor of shape [batch_size, max_time, cell_output_size] after decoding the whole sequence.

attention_scores

A single or tuple of Tensor(s) containing the alignments emitted (at the previous time step/of all time steps) for each attention mechanism.

attention_context

The attention emitted (at the previous time step/of all time steps).

GPT2Decoder

class texar.tf.modules.GPT2Decoder(pretrained_model_name=None, cache_dir=None, hparams=None)[source]

Raw GPT2 Transformer for decoding sequences. Please see PretrainedGPT2Mixin for a brief description of GPT2.

This module basically stacks WordEmbedder, PositionEmbedder, TransformerDecoder.

This module supports the architecture first proposed by Radford et al. (GPT-2).

Parameters:
  • pretrained_model_name (optional) – a str, the name of pre-trained model (e.g., gpt2-small). Please refer to PretrainedGPT2Mixin for all supported models. If None, the model name in hparams is used.
  • cache_dir (optional) – the path to a folder in which the pre-trained models will be cached. If None (default), a default directory (texar_data folder under user’s home directory) will be used.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameter will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(decoding_strategy='train_greedy', inputs=None, memory=None, memory_sequence_length=None, memory_attention_bias=None, beam_width=None, length_penalty=0.0, start_tokens=None, end_token=None, context=None, context_sequence_length=None, softmax_temperature=None, max_decoding_length=None, impute_finished=False, helper=None, mode=None)[source]

Performs decoding. Has exactly the same interface as texar.tf.modules.TransformerDecoder._build(), except that inputs here is a tensor of shape [batch_size, max_time]. Please refer to it for detailed usage.
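
A minimal usage sketch (the batch size of 8 is illustrative; for GPT-2, the <|endoftext|> id is conventionally used as both the start and end token):

decoder = GPT2Decoder(pretrained_model_name="gpt2-small")

eos_id = 50256  # GPT-2 <|endoftext|> id

# Sample continuations from the language model
outputs, sequence_lengths = decoder(
    decoding_strategy='infer_sample',
    start_tokens=tf.fill([8], eos_id),
    end_token=eos_id,
    max_decoding_length=128)
# outputs.sample_id: [batch_size, max_time]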

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

  • The decoder arch is determined by the constructor argument pretrained_model_name if it’s specified. In this case, hparams are ignored.
  • Otherwise, the decoder arch is determined by hparams[‘pretrained_model_name’] if it’s specified. All other configurations in hparams are ignored.
  • If the above two are None, the decoder arch is defined by the configurations in hparams and weights are randomly initialized.
{
    "name": "gpt2_decoder",
    "pretrained_model_name": "gpt2-small",
    "vocab_size": 50257,
    "context_size": 1024,
    "embedding_size": 768,
    "embed": {
        "dim": 768,
        "name": "word_embeddings"
    },
    "position_size": 1024,
    "position_embed": {
        "dim": 768,
        "name": "position_embeddings"
    },

    # hparams for TransformerDecoder
    "decoder": {
        "dim": 768,
        "num_blocks": 12,
        "use_gpt_config": True,
        "embedding_dropout": 0,
        "residual_dropout": 0,
        "multihead_attention": {
            "use_bias": True,
            "num_units": 768,
            "num_heads": 12,
            "dropout_rate": 0.0,
            "output_dim": 768
        },
        "initializer": {
            "type": "variance_scaling_initializer",
            "kwargs": {
                "factor": 1.0,
                "mode": "FAN_AVG",
                "uniform": True
            }
        },
        "poswise_feedforward": {
            "layers": [
                {
                    "type": "Dense",
                    "kwargs": {
                        "activation": "gelu",
                        "name": "intermediate",
                        "units": 3072,
                        "use_bias": True
                    }
                },
                {
                    "type": "Dense",
                    "kwargs": {
                        "activation": None,
                        "name": "output",
                        "units": 3072,
                        "use_bias": True
                    }
                }
            ],
            "name": "ffn"
        }
    },
    "name": "gpt2_decoder",
}

Here:

The default parameters are the values for the 124M GPT-2 model.

“pretrained_model_name”: str or None
The name of the pre-trained GPT2 model. If None, the model will be randomly initialized.
“embed”: dict
Hyperparameters for word embedding layer.
“vocab_size”: int
The vocabulary size of inputs in GPT2Model.
“position_embed”: dict
Hyperparameters for position embedding layer.
“position_size”: int
The maximum sequence length that this model might ever be used with.
“name”: str
Name of the module.

beam_search_decode

texar.tf.modules.beam_search_decode(decoder_or_cell, embedding, start_tokens, end_token, beam_width, initial_state=None, tiled_initial_state=None, output_layer=None, length_penalty_weight=0.0, max_decoding_length=None, output_time_major=False, **kwargs)[source]

Performs beam search sampling decoding.

Parameters:
  • decoder_or_cell – An instance of subclass of RNNDecoderBase, or an instance of RNNCell. The decoder or RNN cell to perform decoding.
  • embedding – A callable that takes a vector tensor of indexes (e.g., an instance of subclass of EmbedderBase), or the params argument for tf.nn.embedding_lookup.
  • start_tokens – int32 vector shaped [batch_size], the start tokens.
  • end_token – int32 scalar, the token that marks end of decoding.
  • beam_width (int) – Python integer, the number of beams.
  • initial_state (optional) –

    Initial state of decoding. If None (default), zero state is used.

    The state must not be tiled with tile_batch. If you have an already-tiled initial state, use tiled_initial_state instead.

    In the case of attention RNN decoder, initial_state must not be an AttentionWrapperState. Instead, it must be a state of the wrapped RNNCell, which state will be wrapped into AttentionWrapperState automatically.

    Ignored if tiled_initial_state is given.

  • tiled_initial_state (optional) –

    Initial state that has been tiled (typically with tile_batch) so that the batch dimension has size batch_size * beam_width.

    In the case of attention RNN decoder, this can be either a state of the wrapped RNNCell, or an AttentionWrapperState.

    If not given, initial_state is used.

  • output_layer (optional) – A Layer instance to apply to the RNN output prior to storing the result or sampling. If None and decoder_or_cell is a decoder, the decoder’s output layer will be used.
  • length_penalty_weight – Float weight to penalize length. Disabled with 0.0 (default).
  • max_decoding_length (optional) – A int scalar Tensor indicating the maximum allowed number of decoding steps. If None (default), decoding will continue until the end token is encountered.
  • output_time_major (bool) – If True, outputs are returned as time major tensors. If False (default), outputs are returned as batch major tensors.
  • **kwargs – Other keyword arguments for dynamic_decode except argument maximum_iterations which is set to max_decoding_length.
Returns:

A tuple (outputs, final_state, sequence_length).

Example

## Beam search with basic RNN decoder

embedder = WordEmbedder(vocab_size=data.vocab.size)
decoder = BasicRNNDecoder(vocab_size=data.vocab.size)

outputs, _, _, = beam_search_decode(
    decoder_or_cell=decoder,
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id] * 100,
    end_token=data.vocab.eos_token_id,
    beam_width=5,
    max_decoding_length=60)

sample_ids = sess.run(outputs.predicted_ids)
sample_text = tx.utils.map_ids_to_strs(sample_ids[:, :, 0], data.vocab)
print(sample_text)
# [
#   the first sequence sample .
#   the second sequence sample .
#   ...
# ]
## Beam search with attention RNN decoder

# Encodes the source
enc_embedder = WordEmbedder(data.source_vocab.size, ...)
encoder = UnidirectionalRNNEncoder(...)

enc_outputs, enc_state = encoder(
    inputs=enc_embedder(data_batch['source_text_ids']),
    sequence_length=data_batch['source_length'])

# Decodes while attending to the source
dec_embedder = WordEmbedder(vocab_size=data.target_vocab.size, ...)
decoder = AttentionRNNDecoder(
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    vocab_size=data.target_vocab.size)

# Beam search
outputs, _, _, = beam_search_decode(
    decoder_or_cell=decoder,
    embedding=dec_embedder,
    start_tokens=[data.vocab.bos_token_id] * 100,
    end_token=data.vocab.eos_token_id,
    beam_width=5,
    initial_state=enc_state,
    max_decoding_length=60)

TransformerDecoder

class texar.tf.modules.TransformerDecoder(vocab_size=None, output_layer=None, hparams=None)[source]

Transformer decoder that applies multi-head self-attention for sequence decoding.

It is a stack of MultiheadAttentionEncoder, FeedForwardNetwork and residual connections.

Parameters:
  • vocab_size (int, optional) – Vocabulary size. Required if output_layer is None.
  • output_layer (optional) –

    An output layer that transforms cell output to logits. This can be:

    • A callable layer, e.g., an instance of tf.layers.Layer.
    • A tensor. A dense layer will be created using the tensor as the kernel weights. The bias of the dense layer is determined by hparams.output_layer_bias. This can be used to tie the output layer with the input embedding matrix, as proposed in https://arxiv.org/pdf/1608.05859.pdf
    • None. A dense layer will be created based on attr:vocab_size and hparams.output_layer_bias.
    • If no output layer in the end is needed, set (vocab_size=None, output_layer=tf.identity).
_build(decoding_strategy='train_greedy', inputs=None, memory=None, memory_sequence_length=None, memory_attention_bias=None, beam_width=None, length_penalty=0.0, start_tokens=None, end_token=None, context=None, context_sequence_length=None, softmax_temperature=None, max_decoding_length=None, impute_finished=False, embedding=None, helper=None, mode=None)[source]

Performs decoding.

The interface is mostly the same as that of RNN decoders (see _build()). The main difference is that, here, sequence_length is not needed, and continuation generation is additionally supported.

The function provides 3 ways to specify the decoding method, with varying flexibility:

  1. The decoding_strategy argument.

    • “train_greedy”: decoding in teacher-forcing fashion (i.e., feeding ground truth to decode the next step), and for each step sample is obtained by taking the argmax of logits. Argument inputs is required for this strategy.
    • “infer_greedy”: decoding in inference fashion (i.e., feeding generated sample to decode the next step), and for each step sample is obtained by taking the argmax of logits. Arguments (start_tokens, end_token, embedding) are required for this strategy, and argument max_decoding_length is optional.
    • “infer_sample”: decoding in inference fashion, and for each step sample is obtained by random sampling from the logits. Arguments (start_tokens, end_token, embedding) are required for this strategy, and argument max_decoding_length is optional.
This argument is used only when arguments helper and beam_width are both None.
  2. The helper argument: An instance of a subclass of texar.tf.modules.Helper. This provides a superset of the decoding strategies above. The interface is the same as in RNN decoders. Please refer to texar.tf.modules.RNNDecoderBase._build() for detailed usage and examples.

    Note that, here, though using a TrainingHelper corresponds to the “train_greedy” strategy above and will get the same output results, the implementation is slower than directly setting decoding_strategy = “train_greedy”.

    Argument max_decoding_length is optional.

  3. Beam search: set beam_width to use beam search decoding. Arguments (start_tokens, end_token) are required, and argument max_decoding_length is optional.

Parameters:
  • memory (optional) – The memory to attend, e.g., the output of an RNN encoder. A Tensor of shape [batch_size, memory_max_time, dim].
  • memory_sequence_length (optional) – A Tensor of shape [batch_size] containing the sequence lengths for the batch entries in memory. Used to create the attention bias if memory_attention_bias is not given. Ignored if memory_attention_bias is provided.
  • memory_attention_bias (optional) – A Tensor of shape [batch_size, num_heads, memory_max_time, dim]. An attention bias typically sets the value of a padding position to a large negative value for masking. If not given, memory_sequence_length is used to automatically create an attention bias.
  • inputs (optional) – Input tensor for teacher forcing decoding, of shape [batch_size, target_max_time, emb_dim] containing the target sequence word embeddings. Used when decoding_strategy is set to “train_greedy”.
  • decoding_strategy (str) – A string specifying the decoding strategy, including “train_greedy”, “infer_greedy”, “infer_sample”. Different arguments are required based on the strategy. See above for details. Ignored if beam_width or helper is set.
  • beam_width (int) – Set to use beam search. If given, decoding_strategy is ignored.
  • length_penalty (float) – Length penalty coefficient used in beam search decoding. Refer to https://arxiv.org/abs/1609.08144 for more details. It should be larger if longer sentences are wanted.
  • start_tokens (optional) – An int Tensor of shape [batch_size], containing the start tokens. Used when decoding_strategy = “infer_greedy” or “infer_sample”, or beam_width is set. Ignored when context is set.
  • end_token (optional) – An int 0D Tensor, the token that marks end of decoding. Used when decoding_strategy = “infer_greedy” or “infer_sample”, or beam_width is set.
  • context (optional) – An int Tensor of shape [batch_size, length], containing the starting tokens for decoding. If context is set, the start_tokens will be ignored.
  • context_sequence_length (optional) – Specifies the length of context.
  • softmax_temperature (optional) – A float 0D Tensor, the value to divide the logits by before computing the softmax. Larger values (above 1.0) result in more random samples. Must be > 0. If None, 1.0 is used. Used when decoding_strategy = “infer_sample”.
  • max_decoding_length (optional) – An int scalar Tensor indicating the maximum allowed number of decoding steps. If None (default), use “max_decoding_length” defined in hparams. Ignored in “train_greedy” decoding.
  • impute_finished (bool) – If True, then states for batch entries which are marked as finished get copied through and the corresponding outputs get zeroed out. This causes some slowdown at each time step, but ensures that the final state and outputs have the correct values and that backprop ignores time steps that were marked as finished. Ignored in “train_greedy” decoding.
  • embedding (optional) – Embedding used when “infer_greedy” or “infer_sample” decoding_strategy, or beam search, is used. This can be a callable or the params argument for embedding_lookup. If a callable, it can take a vector tensor of token ids, or take two arguments (ids, times), where ids is a vector tensor of token ids, and times is a vector tensor of time steps (i.e., position ids). The latter case can be used when embedding is a combination of word embedding and position embedding.
  • helper (optional) – An instance of Helper that defines the decoding strategy. If given, decoding_strategy is ignored.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls dropout mode. If None (default), texar.tf.global_mode() is used.
Returns:

  • For “train_greedy” decoding, returns an instance of TransformerDecoderOutput which contains sample_id and logits.

  • For “infer_greedy” and “infer_sample” decoding or decoding with helper, returns a tuple (outputs, sequence_lengths), where outputs is an instance of TransformerDecoderOutput as in “train_greedy”, and sequence_lengths is a Tensor of shape [batch_size] containing the length of each sample.

  • For beam search decoding, returns a dict containing keys “sample_id” and “log_prob”.

    • ”sample_id” is an int Tensor of shape [batch_size, max_time, beam_width] containing generated token indexes. sample_id[:,:,0] is the most probable sample.
    • ”log_prob” is a float Tensor of shape [batch_size, beam_width] containing the log probability of each sequence sample.
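
Example (a sketch reusing the encoder, embedder, and data names from the AttentionRNNDecoder example above):

decoder = TransformerDecoder(vocab_size=data.target_vocab.size)

# Teacher-forcing training
outputs = decoder(
    decoding_strategy='train_greedy',
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    inputs=dec_embedder(data_batch['target_text_ids']))

# Beam search inference
bs_outputs = decoder(
    memory=enc_outputs,
    memory_sequence_length=data_batch['source_length'],
    embedding=dec_embedder,
    start_tokens=[data.target_vocab.bos_token_id] * 100,
    end_token=data.target_vocab.eos_token_id,
    beam_width=5,
    max_decoding_length=60)
# bs_outputs['sample_id'][:, :, 0] is the most probable sample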

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    # Same as in TransformerEncoder
    "num_blocks": 6,
    "dim": 512,
    "embedding_dropout": 0.1,
    "residual_dropout": 0.1,
    "poswise_feedforward": default_transformer_poswise_net_hparams,
    "multihead_attention": {
        'name': 'multihead_attention',
        'num_units': 512,
        'output_dim': 512,
        'num_heads': 8,
        'dropout_rate': 0.1,
        'output_dim': 512,
        'use_bias': False,
    },
    "initializer": None,
    "name": "transformer_decoder"
    # Additional for TransformerDecoder
    "embedding_tie": True,
    "output_layer_bias": False,
    "max_decoding_length": int(1e10),
}

Here:

“num_blocks”: int
Number of stacked blocks.
“dim”: int
Hidden dimension of the decoder.
“embedding_dropout”: float
Dropout rate of the input word and position embeddings.
“residual_dropout”: float
Dropout rate of the residual connections.
“poswise_feedforward”: dict

Hyperparameters for a feed-forward network used in residual connections. Make sure the dimension of the output tensor is equal to dim.

See default_transformer_poswise_net_hparams() for details.

“multihead_attention”: dict

Hyperparameters for the multihead attention strategy. Make sure the output_dim in this module is equal to dim.

See default_hparams() for details.

“initializer”: dict, optional
Hyperparameters of the default initializer that initializes variables created in this module. See get_initializer() for details.
“output_layer_bias”: bool
Whether to add a bias to the output layer. Used only if output_layer is None when constructing the class instance.
“max_decoding_length”: int

The maximum allowed number of decoding steps. Set to a very large number to avoid the length constraint. Ignored if provided in _build() or “train_greedy” decoding is used.


“name”: str
Name of the module.
batch_size

The batch size of input values.

output_size

Output size of one step.

output_dtype

Types of output of one step.

initialize(name=None)[source]

Called before any decoding iterations.

This method computes initial input values and initial state (i.e., cache).

Parameters:name – Name scope for any created operations.
Returns:(finished, initial_inputs, initial_state), representing initial values of finished flags, inputs and state (i.e. cache).
step(time, inputs, state, name=None)[source]

Called per step of decoding.

Parameters:
  • time – Scalar int32 tensor. Current step number.
  • inputs – Input tensor for this time step.
  • state – State (i.e. cache) from previous time step.
  • name – Name scope for any created operations.
Returns:

(outputs, next_state, next_inputs, finished). outputs is an object containing the decoder output, next_state is the state (i.e. cache), next_inputs is the tensor that should be used as input for the next step, finished is a boolean tensor telling whether the sequence is complete, for each sequence in the batch.

vocab_size

The vocab size.

TransformerDecoderOutput

class texar.tf.modules.TransformerDecoderOutput[source]

The output of TransformerDecoder.

logits

A float Tensor of shape [batch_size, max_time, vocab_size] containing the logits.

sample_id

An int Tensor of shape [batch_size, max_time] containing the sampled token indexes.

Helper

class texar.tf.modules.Helper[source]

Interface for implementing different decoding strategies in RNN decoders and Transformer decoder.

Adapted from the tensorflow.contrib.seq2seq package.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, state, name=None)[source]

Returns sample_ids.

next_inputs(time, outputs, state, sample_ids, name=None)[source]

Returns (finished, next_inputs, next_state).

GreedyEmbeddingHelper

class texar.tf.modules.GreedyEmbeddingHelper(embedding, start_tokens, end_token)[source]

A helper for use during inference.

Uses the argmax of the output (treated as logits) and passes the result through an embedding layer to get the next input.

Note that for greedy decoding, Texar’s decoders provide a simpler interface by specifying decoding_strategy=’infer_greedy’ when calling a decoder (see, e.g., the RNN decoder). In this case, use of GreedyEmbeddingHelper is not necessary.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, state, name=None)[source]

Gets a sample for one step.

next_inputs(time, outputs, state, sample_ids, name=None)[source]

Gets the inputs for next step.

SampleEmbeddingHelper

class texar.tf.modules.SampleEmbeddingHelper(embedding, start_tokens, end_token, softmax_temperature=None, seed=None)[source]

A helper for use during inference.

Uses sampling (from a distribution) instead of argmax and passes the result through an embedding layer to get the next input.

Note that for sample decoding, Texar’s decoders provide a simpler interface by specifying decoding_strategy=’infer_sample’ when calling a decoder (see, e.g., the RNN decoder). In this case, use of SampleEmbeddingHelper is not necessary.

sample(time, outputs, state, name=None)[source]

Gets a sample for one step.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

initialize(name=None)

Returns (initial_finished, initial_inputs).

next_inputs(time, outputs, state, sample_ids, name=None)

Gets the inputs for next step.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

TopKSampleEmbeddingHelper

class texar.tf.modules.TopKSampleEmbeddingHelper(embedding, start_tokens, end_token, top_k=10, softmax_temperature=None, seed=None)[source]

A helper for use during inference.

Samples from top_k most likely candidates from a vocab distribution, and passes the result through an embedding layer to get the next input.

sample(time, outputs, state, name=None)[source]

Gets a sample for one step.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

initialize(name=None)

Returns (initial_finished, initial_inputs).

next_inputs(time, outputs, state, sample_ids, name=None)

Gets the inputs for next step.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

SoftmaxEmbeddingHelper

class texar.tf.modules.SoftmaxEmbeddingHelper(embedding, start_tokens, end_token, tau, embedding_size=None, stop_gradient=False, use_finish=True)[source]

A helper that feeds softmax probabilities over vocabulary to the next step. Uses the softmax probability vector to pass through word embeddings to get the next input (i.e., a mixed word embedding).

A subclass of Helper. Used as a helper to RNNDecoderBase _build() in inference mode.

Parameters:
  • embedding – A callable or the params argument for tf.nn.embedding_lookup. If a callable, it can take a float tensor named soft_ids which is a distribution over indexes. For example, the shape of the tensor is typically [batch_size, vocab_size]. The callable can also take two arguments (soft_ids, times), where soft_ids is as above, and times is an int vector tensor of current time steps (i.e., position ids). The latter case can be used when embedding is a combination of word embedding and position embedding.
  • start_tokens – An int tensor shaped [batch_size]. The start tokens.
  • end_token – An int scalar tensor. The token that marks end of decoding.
  • tau – A float scalar tensor, the softmax temperature.
  • embedding_size (optional) – An int scalar tensor, the number of embedding vectors. Usually it is the vocab size. Required if embedding is a callable.
  • stop_gradient (bool) – Whether to stop the gradient backpropagation when feeding softmax vector to the next step.
  • use_finish (bool) – Whether to stop decoding once end_token is generated. If False, decoding will continue until max_decoding_length of the decoder is reached.
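
Example (a sketch mirroring the GumbelSoftmaxEmbeddingHelper example in the RNN decoder section; embedder, decoder, and data are assumed to be built as there):

helper = SoftmaxEmbeddingHelper(
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id]*100,
    end_token=data.vocab.eos_token_id,
    tau=0.5)
outputs, _, sequence_length = decoder(
    max_decoding_length=60, helper=helper)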
batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, state, name=None)[source]

Returns sample_id, which is the softmax distribution over the vocabulary with temperature tau, of shape [batch_size, vocab_size].

next_inputs(time, outputs, state, sample_ids, name=None)[source]

Returns (finished, next_inputs, next_state).

GumbelSoftmaxEmbeddingHelper

class texar.tf.modules.GumbelSoftmaxEmbeddingHelper(embedding, start_tokens, end_token, tau, embedding_size=None, straight_through=False, stop_gradient=False, use_finish=True)[source]

A helper that feeds gumbel softmax sample to the next step. Uses the gumbel softmax vector to pass through word embeddings to get the next input (i.e., a mixed word embedding).

A subclass of Helper. Used as a helper to RNNDecoderBase _build() in inference mode.

Same as SoftmaxEmbeddingHelper except that here gumbel softmax (instead of softmax) is used.

Parameters:
  • embedding – A callable or the params argument for tf.nn.embedding_lookup. If a callable, it can take a float tensor named soft_ids which is a distribution over indexes. For example, the shape of the tensor is typically [batch_size, vocab_size]. The callable can also take two arguments (soft_ids, times), where soft_ids is as above, and times is an int vector tensor of current time steps (i.e., position ids). The latter case can be used when embedding is a combination of word embedding and position embedding.
  • start_tokens – An int tensor shaped [batch_size]. The start tokens.
  • end_token – An int scalar tensor. The token that marks end of decoding.
  • tau – A float scalar tensor, the softmax temperature.
  • embedding_size (optional) – An int scalar tensor, the number of embedding vectors. Usually it is the vocab size. Required if embedding is a callable.
  • straight_through (bool) – Whether to use straight through gradient between time steps. If True, a single token with highest probability (i.e., greedy sample) is fed to the next step and gradient is computed using straight through. If False (default), the soft gumbel-softmax distribution is fed to the next step.
  • stop_gradient (bool) – Whether to stop the gradient backpropagation when feeding softmax vector to the next step.
  • use_finish (bool) – Whether to stop decoding once end_token is generated. If False, decoding will continue until max_decoding_length of the decoder is reached.
batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

initialize(name=None)

Returns (initial_finished, initial_inputs).

next_inputs(time, outputs, state, sample_ids, name=None)

Returns (finished, next_inputs, next_state).

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

sample(time, outputs, state, name=None)[source]

Returns sample_id of shape [batch_size, vocab_size]. If straight_through is False, this is the Gumbel-softmax distribution over the vocabulary with temperature tau. If straight_through is True, this contains one-hot vectors of the greedy samples.

TrainingHelper

class texar.tf.modules.TrainingHelper(inputs, sequence_length, time_major=False, name=None)[source]

A helper for use during training. Performs teacher-forcing decoding.

Returned sample_ids are the argmax of the RNN output logits.

Note that for teacher-forcing decoding, Texar’s decoders provide a simpler interface by specifying decoding_strategy=’train_greedy’ when calling a decoder (see, e.g., the RNN decoder). In this case, use of TrainingHelper is not necessary.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, name=None, **unused_kwargs)[source]

Gets a sample for one step.

next_inputs(time, outputs, state, name=None, **unused_kwargs)[source]

Gets the inputs for next step.

ScheduledEmbeddingTrainingHelper

class texar.tf.modules.ScheduledEmbeddingTrainingHelper(inputs, sequence_length, embedding, sampling_probability, time_major=False, seed=None, scheduling_seed=None, name=None)[source]

A training helper that adds scheduled sampling.

Returns -1s for sample_ids where no sampling took place; valid sample id values elsewhere.
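
Example (a sketch; embedder, decoder, and data_batch follow the conventions of the decoder examples above):

helper = ScheduledEmbeddingTrainingHelper(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length']-1,
    embedding=embedder,
    sampling_probability=0.25)
outputs, _, _ = decoder(helper=helper)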

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, state, name=None)[source]

Gets a sample for one step.

next_inputs(time, outputs, state, sample_ids, name=None)[source]

Gets the inputs for the next step.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

ScheduledOutputTrainingHelper

class texar.tf.modules.ScheduledOutputTrainingHelper(inputs, sequence_length, sampling_probability, time_major=False, seed=None, next_inputs_fn=None, auxiliary_inputs=None, name=None)[source]

A training helper that adds scheduled sampling directly to outputs.

Returns False for sample_ids where no sampling took place; True elsewhere.

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, state, name=None)[source]

Gets a sample for one step.

next_inputs(time, outputs, state, sample_ids, name=None)[source]

Gets the inputs for the next step.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

InferenceHelper

class texar.tf.modules.InferenceHelper(sample_fn, sample_shape, sample_dtype, start_inputs, end_fn, next_inputs_fn=None)[source]

A helper to use during inference with a custom sampling function.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, state, name=None)[source]

Gets a sample for one step.

next_inputs(time, outputs, state, sample_ids, name=None)[source]

Gets the inputs for the next step.

CustomHelper

class texar.tf.modules.CustomHelper(initialize_fn, sample_fn, next_inputs_fn, sample_ids_shape=None, sample_ids_dtype=None)[source]

Base abstract class that allows the user to customize decoding.

batch_size

Batch size of tensor returned by sample.

Returns a scalar int32 tensor.

sample_ids_shape

Shape of tensor returned by sample, excluding the batch dimension.

Returns a TensorShape.

sample_ids_dtype

DType of tensor returned by sample.

Returns a DType.

initialize(name=None)[source]

Returns (initial_finished, initial_inputs).

sample(time, outputs, state, name=None)[source]

Returns sample_ids.

next_inputs(time, outputs, state, sample_ids, name=None)[source]

Returns (finished, next_inputs, next_state).

get_helper

texar.tf.modules.get_helper(helper_type, inputs=None, sequence_length=None, embedding=None, start_tokens=None, end_token=None, **kwargs)[source]

Creates a Helper instance.

Parameters:
  • helper_type – A Helper class, its name or module path, or a class instance. If a class instance is given, it is returned directly.
  • inputs (optional) – Inputs to the RNN decoder, e.g., ground truth tokens for teacher forcing decoding.
  • sequence_length (optional) – A 1D int Tensor containing the sequence length of inputs.
  • embedding (optional) – A callable that takes a vector tensor of indexes (e.g., an instance of subclass of EmbedderBase), or the params argument for embedding_lookup (e.g., the embedding Tensor).
  • start_tokens (optional) – An int Tensor of shape [batch_size], the start tokens.
  • end_token (optional) – An int 0D Tensor, the token that marks the end of decoding.
  • **kwargs – Additional keyword arguments for constructing the helper.
Returns:

A helper instance.
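
Example (a sketch; embedder and decoder follow the conventions of the decoder examples above, and top_k is passed through **kwargs):

helper = tx.modules.get_helper(
    'TopKSampleEmbeddingHelper',
    embedding=embedder,
    start_tokens=[data.vocab.bos_token_id]*100,
    end_token=data.vocab.eos_token_id,
    top_k=10)
outputs, _, sequence_length = decoder(
    max_decoding_length=60, helper=helper)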

Classifiers

Conv1DClassifier

class texar.tf.modules.Conv1DClassifier(hparams=None)[source]

Simple Conv-1D classifier. This is a combination of the Conv1DEncoder with a classification layer.

Parameters:hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

Example

clas = Conv1DClassifier(hparams={'num_classes': 10})

inputs = tf.random_uniform([64, 20, 256])
logits, pred = clas(inputs)
# logits == Tensor of shape [64, 10]
# pred   == Tensor of shape [64]
_build(inputs, sequence_length=None, dtype=None, mode=None)[source]

Feeds the inputs through the network and makes classification.

The arguments are the same as in Conv1DEncoder.

The predictions of binary classification (“num_classes”=1) and multi-way classification (“num_classes”>1) are different, as explained below.

Parameters:
  • inputs – The inputs to the network, which is a 3D tensor. See Conv1DEncoder for more details.
  • sequence_length (optional) – An int tensor of shape [batch_size] containing the length of each element in inputs. If given, time steps beyond the length will first be masked out before feeding to the layers.
  • dtype (optional) – Type of the inputs. If not provided, infers from inputs automatically.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, texar.tf.global_mode() is used.
Returns:

A tuple (logits, pred), where

  • `logits` is a Tensor of shape [batch_size, num_classes] for num_classes >1, and [batch_size] for num_classes =1 (i.e., binary classification).
  • `pred` is the prediction, a Tensor of shape [batch_size] and type tf.int64. For binary classification, the standard sigmoid function is used for prediction, and the class labels are {0, 1}.
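
For instance, binary classification with length masking might look as follows (a sketch; the shapes are illustrative):

clas = Conv1DClassifier(hparams={'num_classes': 1})

inputs = tf.random_uniform([64, 20, 256])
lengths = tf.random_uniform([64], minval=1, maxval=21, dtype=tf.int32)
logits, pred = clas(inputs, sequence_length=lengths)
# logits == Tensor of shape [64]
# pred   == Tensor of shape [64], with values in {0, 1}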

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    # (1) Same hyperparameters as in Conv1DEncoder
    ...

    # (2) Additional hyperparameters
    "num_classes": 2,
    "logit_layer_kwargs": {
        "use_bias": False
    },
    "name": "conv1d_classifier"
}

Here:

1. Same hyperparameters as in Conv1DEncoder. See the default_hparams(). An instance of Conv1DEncoder is created for feature extraction.

2. Additional hyperparameters:

    “num_classes”: int

    Number of classes:

    • If `> 0`, an additional Dense layer is appended to the encoder to compute the logits over classes.
    • If `<= 0`, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
    “logit_layer_kwargs”: dict

    Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to “num_classes”. Ignored if no extra logit layer is appended.

    “name”: str

    Name of the classifier.

trainable_variables

The list of trainable variables of the module.

num_classes

The number of classes.

nn

The classifier neural network.

has_layer(layer_name)[source]

Returns True if the layer with the name exists; returns False otherwise.

Parameters:layer_name (str) – Name of the layer.
layer_by_name(layer_name)[source]

Returns the layer with the name. Returns None if the layer name does not exist.

Parameters:layer_name (str) – Name of the layer.
layers_by_name

A dictionary mapping layer names to the layers.

layers

A list of the layers.

layer_names

A list of uniquified layer names.

layer_outputs_by_name(layer_name)[source]

Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.

Parameters:layer_name (str) – Name of the layer.
layer_outputs

A list containing output tensors of each layer.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

variable_scope

The variable scope of the module.

UnidirectionalRNNClassifier

class texar.tf.modules.UnidirectionalRNNClassifier(cell=None, cell_dropout_mode=None, output_layer=None, hparams=None)[source]

One-directional RNN classifier. This is a combination of the UnidirectionalRNNEncoder with a classification layer. Both step-wise classification and sequence-level classification are supported, as specified in hparams.

Arguments are the same as in UnidirectionalRNNEncoder.

Parameters:
  • cell – (RNNCell, optional) If not specified, a cell is created as specified in hparams["rnn_cell"].
  • cell_dropout_mode (optional) – A Tensor taking value of tf.estimator.ModeKeys, which toggles dropout in the RNN cell (e.g., activates dropout in TRAIN mode). If None, global_mode() is used. Ignored if cell is given.
  • output_layer (optional) – An instance of tf.layers.Layer. Applies to the RNN cell output of each step. If None (default), the output layer is created as specified in hparams["output_layer"].
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
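
Example (a sketch using the default "final_time" strategy; the embedder and data names follow the earlier examples):

embedder = WordEmbedder(vocab_size=data.vocab.size)
clas = UnidirectionalRNNClassifier(hparams={'num_classes': 10})

logits, pred = clas(
    inputs=embedder(data_batch['text_ids']),
    sequence_length=data_batch['length'])
# logits == Tensor of shape [batch_size, 10]
# pred   == Tensor of shape [batch_size]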
_build(inputs, sequence_length=None, initial_state=None, time_major=False, mode=None, **kwargs)[source]

Feeds the inputs through the network and makes classification.

The arguments are the same as in UnidirectionalRNNEncoder.

Parameters:
  • inputs – A 3D Tensor of shape [batch_size, max_time, dim]. The first two dimensions batch_size and max_time may be exchanged if time_major=True is specified.
  • sequence_length (optional) – A 1D int tensor of shape [batch_size]. Sequence lengths of the batch inputs. Used to copy-through state and zero-out outputs when past a batch element’s sequence length.
  • initial_state (optional) – Initial state of the RNN.
  • time_major (bool) – The shape format of the inputs and outputs Tensors. If True, these tensors are of shape [max_time, batch_size, depth]. If False (default), these tensors are of shape [batch_size, max_time, depth].
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Controls output layer dropout if the output layer is specified with hparams. If None (default), texar.tf.global_mode() is used.
  • return_cell_output (bool) – Whether to return the output of the RNN cell. This is the result prior to applying the output layer.
  • **kwargs – Optional keyword arguments of tf.nn.dynamic_rnn, such as swap_memory, dtype, parallel_iterations, etc.
Returns:

A tuple (logits, pred), containing the logits over classes and the predictions, respectively.

  • If “clas_strategy”==”final_time” or “all_time”:

    • If “num_classes”==1, logits and pred are both of shape [batch_size].
    • If “num_classes”>1, logits is of shape [batch_size, num_classes] and pred is of shape [batch_size].
  • If “clas_strategy”==”time_wise”:

    • If “num_classes”==1, logits and pred are both of shape [batch_size, max_time].
    • If “num_classes”>1, logits is of shape [batch_size, max_time, num_classes] and pred is of shape [batch_size, max_time].
    • If time_major is True, the batch and time dimensions are exchanged.
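
A minimal usage sketch (shapes and num_classes are illustrative assumptions; the default "clas_strategy" is "final_time"):

clf = UnidirectionalRNNClassifier(hparams={'num_classes': 5})
inputs = tf.random_uniform([64, 20, 100])   # [batch_size, max_time, dim]
logits, pred = clf(inputs)
# logits: Tensor of shape [64, 5]
# pred:   Tensor of shape [64]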

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    # (1) Same hyperparameters as in UnidirectionalRNNEncoder
    ...

    # (2) Additional hyperparameters
    "num_classes": 2,
    "logit_layer_kwargs": None,
    "clas_strategy": "final_time",
    "max_seq_length": None,
    "name": "unidirectional_rnn_classifier"
}

Here:

1. Same hyperparameters as in UnidirectionalRNNEncoder. See the default_hparams(). An instance of UnidirectionalRNNEncoder is created for feature extraction.

2. Additional hyperparameters:

    “num_classes”: int

    Number of classes:

    • If `> 0`, an additional Dense layer is appended to the encoder to compute the logits over classes.
    • If `<= 0`, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
    “logit_layer_kwargs”: dict

    Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to “num_classes”. Ignored if no extra logit layer is appended.

    “clas_strategy”: str

    The classification strategy, one of:

    • “final_time”: Sequence-level classification based on the output of the final time step. One sequence has one class.
    • “all_time”: Sequence-level classification based on the output of all time steps. One sequence has one class.
    • “time_wise”: Step-wise classification, i.e., make classification for each time step based on its output.
    “max_seq_length”: int, optional

    Maximum possible length of input sequences. Required if “clas_strategy” is “all_time”.

    “name”: str

    Name of the classifier.
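
For example, a sketch of configuring a step-wise (per-time-step) classifier (the values and the "token_tagger" name are illustrative assumptions; RNN-cell hyperparameters keep their defaults):

clf = UnidirectionalRNNClassifier(hparams={
    'num_classes': 10,
    'clas_strategy': 'time_wise',
    'name': 'token_tagger'
})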

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

num_classes

The number of classes, specified in hparams.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

BERTClassifier

class texar.tf.modules.BERTClassifier(pretrained_model_name=None, cache_dir=None, hparams=None)[source]

Classifier based on BERT modules. Please see PretrainedBERTMixin for a brief description of BERT.

This is a combination of the BertEncoder with a classification layer. Both step-wise classification and sequence-level classification are supported, specified in hparams.

Arguments are the same as in BERTEncoder.

Parameters:
  • pretrained_model_name (optional) – a str, the name of pre-trained model (e.g., bert-base-uncased). Please refer to PretrainedBERTMixin for all supported models. If None, the model name in hparams is used.
  • cache_dir (optional) – the path to a folder in which the pre-trained models will be cached. If None (default), a default directory (texar_data folder under user’s home directory) will be used.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, sequence_length=None, segment_ids=None, mode=None, **kwargs)[source]

Feeds the inputs through the network and makes classification.

The arguments are the same as in BertEncoder.

Parameters:
  • inputs – A 2D Tensor of shape [batch_size, max_time], containing the token ids of tokens in input sequences.
  • sequence_length (optional) – A 1D Tensor of shape [batch_size]. Input tokens beyond respective sequence lengths are masked out automatically.
  • segment_ids (optional) – A 2D Tensor of shape [batch_size, max_time], containing the segment ids of tokens in input sequences. If None (default), a tensor with all elements set to zero is used.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Used to toggle dropout. If None (default), texar.tf.global_mode() is used.
  • **kwargs – Keyword arguments.
Returns:

A tuple (logits, pred), containing the logits over classes and the predictions, respectively.

  • If “clas_strategy”==”cls_time” or “all_time”:

    • If “num_classes”==1, logits and pred are both of shape [batch_size].
    • If “num_classes”>1, logits is of shape [batch_size, num_classes] and pred is of shape [batch_size].
  • If “clas_strategy”==”time_wise”:

    • If “num_classes”==1, logits and pred are both of shape [batch_size, max_time].
    • If “num_classes”>1, logits is of shape [batch_size, max_time, num_classes] and pred is of shape [batch_size, max_time].
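
A minimal usage sketch (the model name, shapes, and placeholders are illustrative assumptions; the default "clas_strategy" is "cls_time"):

clf = BERTClassifier(pretrained_model_name='bert-base-uncased')
input_ids = tf.placeholder(tf.int64, shape=[None, 128])
seq_len = tf.placeholder(tf.int64, shape=[None])
logits, pred = clf(input_ids, sequence_length=seq_len)
# logits: Tensor of shape [batch_size, 2]   (default "num_classes" is 2)
# pred:   Tensor of shape [batch_size]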

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    # (1) Same hyperparameters as in BertEncoder
    ...
    # (2) Additional hyperparameters
    "num_classes": 2,
    "logit_layer_kwargs": None,
    "clas_strategy": "cls_time",
    "max_seq_length": None,
    "dropout": 0.1,
    "name": "bert_classifier"
}

Here:

1. Same hyperparameters as in BertEncoder. See the default_hparams(). An instance of BertEncoder is created for feature extraction.

2. Additional hyperparameters:

    “num_classes”: int

    Number of classes:

    • If > 0, an additional Dense layer is appended to the encoder to compute the logits over classes.
    • If <= 0, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
    “logit_layer_kwargs”: dict

    Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to num_classes. Ignored if no extra logit layer is appended.

    “clas_strategy”: str

    The classification strategy, one of:

    • cls_time: Sequence-level classification based on the output of the first time step (which is the CLS token). Each sequence has a class.
    • all_time: Sequence-level classification based on the output of all time steps. Each sequence has a class.
    • time_wise: Step-wise classification, i.e., make classification for each time step based on its output.
    “max_seq_length”: int, optional

    Maximum possible length of input sequences. Required if clas_strategy is all_time.

    “dropout”: float

    The dropout rate of the BERT encoder output.

    “name”: str

    Name of the classifier.

classmethod download_checkpoint(pretrained_model_name, cache_dir=None)

Download the specified pre-trained checkpoint, and return the directory in which the checkpoint is cached.

Parameters:
  • pretrained_model_name (str) – Name of the model checkpoint.
  • cache_dir (str, optional) – Path to the cache directory. If None, uses the default directory (user’s home directory).
Returns:

Path to the cache directory.
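
For instance (the model name below is an illustrative choice from the supported BERT models):

cache_path = BERTClassifier.download_checkpoint('bert-base-uncased')
# cache_path is the directory in which the checkpoint files are cached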

hparams

An HParams instance. The hyperparameters of the module.

load_pretrained_config(pretrained_model_name=None, cache_dir=None, hparams=None)

Load paths and configurations of the pre-trained model.

Parameters:
  • pretrained_model_name (optional) – A str with the name of a pre-trained model to load. If None, will use the model name in hparams.
  • cache_dir (optional) – The path to a folder in which the pre-trained models will be cached. If None (default), a default directory will be used.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameter will be set to default values. See default_hparams() for the hyperparameter structure and default values.
name

The uniquified name of the module.

reset_parameters()

Initialize parameters of the pre-trained model. This method is only called if pre-trained checkpoints are not loaded.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

XLNetClassifier

class texar.tf.modules.XLNetClassifier(pretrained_model_name=None, cache_dir=None, hparams=None)[source]

Classifier based on XLNet modules. Please see PretrainedXLNetMixin for a brief description of XLNet.

This is a combination of the XLNetEncoder with a classification layer. Both step-wise classification and sequence-level classification are supported, specified in hparams.

Arguments are the same as in XLNetEncoder.

Parameters:
  • pretrained_model_name (optional) – a str, the name of pre-trained model (e.g., xlnet-base-cased). Please refer to PretrainedXLNetMixin for all supported models. If None, the model name in hparams is used.
  • cache_dir (optional) – the path to a folder in which the pre-trained models will be cached. If None (default), a default directory (texar_data folder under user’s home directory) will be used.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(token_ids, segment_ids=None, input_mask=None, mode=None)[source]

Feeds the inputs through the network and makes classification.

Parameters:
  • token_ids – Shape [batch_size, max_time].
  • segment_ids – Shape [batch_size, max_time].
  • input_mask – Float tensor of shape [batch_size, max_time]. Note that positions with value 1 are masked out.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Used to toggle dropout. If None (default), texar.tf.global_mode() is used.
Returns:

A tuple (logits, preds), containing the logits over classes and the predictions, respectively.

  • If clas_strategy is cls_time or all_time:

    • If num_classes == 1, logits and pred are both of shape [batch_size].
    • If num_classes > 1, logits is of shape [batch_size, num_classes] and pred is of shape [batch_size].
  • If clas_strategy is time_wise:

    • If num_classes == 1, logits and pred are both of shape [batch_size].
    • If num_classes > 1, logits is of shape [batch_size, max_time, num_classes] and pred is of shape [batch_size, max_time].

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    # (1) Same hyperparameters as in XLNetEncoder
    ...
    # (2) Additional hyperparameters
    "clas_strategy": "cls_time",
    "use_projection": True,
    "num_classes": 2,
    "logit_layer_kwargs": None,
    "name": "xlnet_classifier",
}

Here:

  1. Same hyperparameters as in XLNetEncoder. See the default_hparams(). An instance of XLNetEncoder is created for feature extraction.

  2. Additional hyperparameters:

    “clas_strategy”: str

    The classification strategy, one of:

    • cls_time: Sequence-level classification based on the output of the last time step (which is the CLS token). Each sequence has a class.
    • all_time: Sequence-level classification based on the output of all time steps. Each sequence has a class.
    • time_wise: Step-wise classification, i.e., make classification for each time step based on its output.
    “use_projection”: bool

    If True, an additional Dense layer is added after the summary step.

    “num_classes”: int

    Number of classes:

    • If > 0, an additional dense layer is appended to the encoder to compute the logits over classes.
    • If <= 0, no dense layer is appended. The number of classes is assumed to be the final dense layer size of the encoder.
    “logit_layer_kwargs” : dict

    Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to “num_classes”. Ignored if no extra logit layer is appended.

    “name”: str

    Name of the classifier.

param_groups(lr=None, lr_layer_scale=1.0, decay_base_params=False)[source]

Create parameter groups for optimizers. When lr_layer_scale is not 1.0, parameters from each layer form separate groups with different base learning rates.

This method should be called before applying gradients to the variables through the optimizer. Particularly, after calling the optimizer’s compute_gradients method, the user can call this method to get variable-specific learning rates for the network. The gradients for each variable can then be scaled accordingly. These scaled gradients are finally applied by calling the optimizer’s apply_gradients method.

Parameters:
  • lr (float) – The learning rate. Can be omitted if lr_layer_scale is 1.0.
  • lr_layer_scale (float) – Per-layer LR scaling rate. The i-th layer will be scaled by lr_layer_scale ^ (num_layers - i - 1).
  • decay_base_params (bool) – If True, treat non-layer parameters (e.g. embeddings) as if they’re in layer 0. If False, these parameters are not scaled.

Returns: A dict mapping tensorflow variables to their learning rates.
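
For instance, a simplified sketch of the workflow described above (the optimizer choice, loss, and hyperparameter values are illustrative assumptions; it assumes loss depends only on the classifier's variables and ignores IndexedSlices gradients from embedding lookups):

var_lr = classifier.param_groups(lr=2e-5, lr_layer_scale=0.75)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)  # per-variable rates applied below
grads_and_vars = optimizer.compute_gradients(loss)
scaled = [(grad * var_lr[var], var)
          for grad, var in grads_and_vars if grad is not None]
train_op = optimizer.apply_gradients(scaled)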

Regressors

XLNetRegressor

class texar.tf.modules.XLNetRegressor(pretrained_model_name=None, cache_dir=None, hparams=None)[source]

Regressor based on XLNet modules. Please see PretrainedXLNetMixin for a brief description of XLNet.

This is a combination of the XLNetEncoder with a regression layer. Both step-wise regression and sequence-level regression are supported, specified in hparams.

Arguments are the same as in XLNetEncoder.

Parameters:
  • pretrained_model_name (optional) – a str, the name of pre-trained model (e.g., xlnet-base-cased). Please refer to PretrainedXLNetMixin for all supported models. If None, the model name in hparams is used.
  • cache_dir (optional) – the path to a folder in which the pre-trained models will be cached. If None (default), a default directory (texar_data folder under user’s home directory) will be used.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(token_ids, segment_ids=None, input_mask=None, mode=None)[source]

Feeds the inputs through the network and makes regression.

Parameters:
  • token_ids – Shape [batch_size, max_time].
  • segment_ids – Shape [batch_size, max_time].
  • input_mask – Float tensor of shape [batch_size, max_time]. Note that positions with value 1 are masked out.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. Used to toggle dropout. If None (default), texar.tf.global_mode() is used.
Returns:

Regression predictions.

  • If regr_strategy is cls_time or all_time, predictions have shape [batch_size].
  • If regr_strategy is time_wise, predictions have shape [batch_size, max_time].
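
A minimal usage sketch (the model name and shapes are illustrative assumptions; the default "regr_strategy" is "cls_time"):

regressor = XLNetRegressor(pretrained_model_name='xlnet-base-cased')
token_ids = tf.placeholder(tf.int64, shape=[None, 128])
preds = regressor(token_ids)
# preds: Tensor of shape [batch_size]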

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    # (1) Same hyperparameters as in XLNetEncoder
    ...
    # (2) Additional hyperparameters
    "regr_strategy": "cls_time",
    "use_projection": True,
    "logit_layer_kwargs": None,
    "name": "xlnet_regressor",
}

Here:

  1. Same hyperparameters as in XLNetEncoder. See the default_hparams(). An instance of XLNetEncoder is created for feature extraction.

  2. Additional hyperparameters:

    “regr_strategy”: str

    The regression strategy, one of:

    • cls_time: Sequence-level regression based on the output of the first time step (which is the CLS token). Each sequence has a prediction.
    • all_time: Sequence-level regression based on the output of all time steps. Each sequence has a prediction.
    • time_wise: Step-wise regression, i.e., make regression for each time step based on its output.
    “logit_layer_kwargs”: dict

    Keyword arguments for the logit Dense layer constructor, except for argument “units” which is set to 1 (a single regression output). Ignored if no extra logit layer is appended.

    “use_projection”: bool

    If True, an additional dense layer is added after the summary step.

    “name”: str

    Name of the regressor.

param_groups(lr=None, lr_layer_scale=1.0, decay_base_params=False)[source]

Create parameter groups for optimizers. When lr_layer_scale is not 1.0, parameters from each layer form separate groups with different base learning rates.

This method should be called before applying gradients to the variables through the optimizer. Particularly, after calling the optimizer’s compute_gradients method, the user can call this method to get variable-specific learning rates for the network. The gradients for each variable can then be scaled accordingly. These scaled gradients are finally applied by calling the optimizer’s apply_gradients method.

Parameters:
  • lr (float) – The learning rate. Can be omitted if lr_layer_scale is 1.0.
  • lr_layer_scale (float) – Per-layer LR scaling rate. The i-th layer will be scaled by lr_layer_scale ^ (num_layers - i - 1).
  • decay_base_params (bool) – If True, treat non-layer parameters (e.g. embeddings) as if they’re in layer 0. If False, these parameters are not scaled.

Returns: A dict mapping tensorflow variables to their learning rates.

Pre-trained

PretrainedMixin

class texar.tf.modules.PretrainedMixin(hparams=None)[source]

A mixin class for all pre-trained classes to inherit.

load_pretrained_config(pretrained_model_name=None, cache_dir=None, hparams=None)[source]

Load paths and configurations of the pre-trained model.

Parameters:
  • pretrained_model_name (optional) – A str with the name of a pre-trained model to load. If None, will use the model name in hparams.
  • cache_dir (optional) – The path to a folder in which the pre-trained models will be cached. If None (default), a default directory will be used.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameter will be set to default values. See default_hparams() for the hyperparameter structure and default values.
reset_parameters()[source]

Initialize parameters of the pre-trained model. This method is only called if pre-trained checkpoints are not loaded.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "pretrained_model_name": None,
    "name": "pretrained_base"
}
classmethod download_checkpoint(pretrained_model_name, cache_dir=None)[source]

Download the specified pre-trained checkpoint, and return the directory in which the checkpoint is cached.

Parameters:
  • pretrained_model_name (str) – Name of the model checkpoint.
  • cache_dir (str, optional) – Path to the cache directory. If None, uses the default directory (user’s home directory).
Returns:

Path to the cache directory.

classmethod _transform_config(pretrained_model_name, cache_dir)[source]

Load the official configuration file and transform it into Texar-style hyperparameters.

Parameters:
  • pretrained_model_name (str) – Name of the pre-trained model.
  • cache_dir (str) – Path to the cache directory.
Returns:

Texar module hyperparameters.

Return type:

dict

_init_from_checkpoint(pretrained_model_name, cache_dir, scope_name, **kwargs)[source]

Initialize model parameters from weights stored in the pre-trained checkpoint.

Parameters:
  • pretrained_model_name (str) – Name of the pre-trained model.
  • cache_dir (str) – Path to the cache directory.
  • scope_name – Variable scope.
  • **kwargs – Additional arguments for specific models.

PretrainedBERTMixin

class texar.tf.modules.PretrainedBERTMixin(hparams=None)[source]

A mixin class to support loading pre-trained checkpoints for modules that implement the BERT model.

The BERT model was proposed in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018), a bidirectional Transformer language model pre-trained on large text corpora. Available model names include:

  • bert-base-uncased: 12-layer, 768-hidden, 12-heads, 110M parameters.
  • bert-large-uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters.
  • bert-base-cased: 12-layer, 768-hidden, 12-heads, 110M parameters.
  • bert-large-cased: 24-layer, 1024-hidden, 16-heads, 340M parameters.
  • bert-base-multilingual-uncased: 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters.
  • bert-base-multilingual-cased: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters.
  • bert-base-chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters.

We provide the following BERT classes:

  • BERTEncoder for text encoding.
  • BERTClassifier for text classification and sequence tagging.

PretrainedXLNetMixin

class texar.tf.modules.PretrainedXLNetMixin(hparams=None)[source]

A mixin class to support loading pre-trained checkpoints for modules that implement the XLNet model.

The XLNet model was proposed in XLNet: Generalized Autoregressive Pretraining for Language Understanding by Yang et al. It is based on the Transformer-XL model, pre-trained on a large corpus using a language modeling objective that considers all permutations of the input sentence.

The available XLNet models are as follows:

  • xlnet-base-cased: 12-layer, 768-hidden, 12-heads. This model is trained on full data (different from the one in the paper).
  • xlnet-large-cased: 24-layer, 1024-hidden, 16-heads.

We provide the following XLNet classes:

  • XLNetEncoder for text encoding.
  • XLNetDecoder for text generation and decoding.
  • XLNetClassifier for text classification and sequence tagging.
  • XLNetRegressor for text regression.

Connectors

ConnectorBase

class texar.tf.modules.ConnectorBase(output_size, hparams=None)[source]

Base class inherited by all connector classes. A connector transforms inputs into outputs with any specified structure and shape, for example, transforming the final state of an encoder to the initial state of a decoder, or performing stochastic sampling in between as in Variational Autoencoders (VAEs).

Parameters:
  • output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of ints, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

output_size

The output size.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

ConstantConnector

class texar.tf.modules.ConstantConnector(output_size, hparams=None)[source]

Creates a constant Tensor or (nested) tuple of Tensors that contains a constant value.

Parameters:
  • output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of ints, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

This connector does not have trainable parameters. See _build() for the inputs and outputs of the connector.

Example

connector = ConstantConnector(cell.state_size)
zero_state = connector(batch_size=64, value=0.)
one_state = connector(batch_size=64, value=1.)
_build(batch_size, value=None)[source]

Creates output tensor(s) that has the given value.

Parameters:
  • batch_size – An int or int scalar Tensor, the batch size.
  • value (optional) – A scalar, the value that the output tensor(s) has. If None, “value” in hparams is used.
Returns:

A (structure of) tensor whose structure is the same as output_size, with value specified by value or hparams.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "value": 0.,
    "name": "constant_connector"
}

Here:

“value”: float
The constant scalar that the output tensor(s) has. Ignored if value is given to _build().
“name”: str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

ForwardConnector

class texar.tf.modules.ForwardConnector(output_size, hparams=None)[source]

Transforms inputs to have specified structure.

Parameters:
  • output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of ints, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

This connector does not have trainable parameters. See _build() for the inputs and outputs of the connector.

The input to the connector must have the same structure as output_size, or must have the same number of elements and be re-packable into the structure of output_size. Note that if input is or contains a dict instance, the keys will be sorted to pack in deterministic order (See pack_sequence_as for more details).

Example

cell = LSTMCell(num_units=256)
# cell.state_size == LSTMStateTuple(c=256, h=256)

connector = ForwardConnector(cell.state_size)
output = connector([tensor_1, tensor_2])
# output == LSTMStateTuple(c=tensor_1, h=tensor_2)
_build(inputs)[source]

Transforms inputs to have the same structure as output_size. Values of the inputs are not changed.

inputs must either have the same structure as output_size, or have the same number of elements as output_size.

Parameters:inputs – The input (structure of) tensor to pass forward.
Returns:A (structure of) tensors that re-packs inputs to have the specified structure of output_size.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "name": "forward_connector"
}

Here:

“name”: str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

MLPTransformConnector

class texar.tf.modules.MLPTransformConnector(output_size, hparams=None)[source]

Transforms inputs with an MLP layer and packs the results into the specified structure and size.

Parameters:
  • output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of ints, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs of the connector.

The input to the connector can have arbitrary structure and size.

Example

cell = LSTMCell(num_units=256)
# cell.state_size == LSTMStateTuple(c=256, h=256)

connector = MLPTransformConnector(cell.state_size)
inputs = tf.zeros([64, 10])
output = connector(inputs)
# output == LSTMStateTuple(c=tensor_of_shape_(64, 256),
#                          h=tensor_of_shape_(64, 256))
## Use to connect encoder and decoder with different state size
encoder = UnidirectionalRNNEncoder(...)
_, final_state = encoder(inputs=...)

decoder = BasicRNNDecoder(...)
connector = MLPTransformConnector(decoder.state_size)

_ = decoder(
    initial_state=connector(final_state),
    ...)
_build(inputs)[source]

Transforms inputs with an MLP layer and packs the results to have the same structure as specified by output_size.

Parameters:inputs – Input (structure of) tensors to be transformed. Must be a Tensor of shape [batch_size, …] or a (nested) tuple of such Tensors. That is, the first dimension of (each) tensor must be the batch dimension.
Returns:A Tensor or a (nested) tuple of Tensors of the same structure of output_size.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "activation_fn": "identity",
    "name": "mlp_connector"
}

Here:

“activation_fn”: str or callable
The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
“name”: str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

ReparameterizedStochasticConnector

class texar.tf.modules.ReparameterizedStochasticConnector(output_size, hparams=None)[source]

Samples from a distribution with reparameterization trick, and transforms samples into specified size.

Reparameterization allows gradients to be back-propagated through the stochastic samples. Used in, e.g., Variational Autoencoders (VAEs).

Parameters:
  • output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of ints, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

Example

cell = LSTMCell(num_units=256)
# cell.state_size == LSTMStateTuple(c=256, h=256)

connector = ReparameterizedStochasticConnector(cell.state_size)

kwargs = {
    'loc': tf.zeros([batch_size, 10]),
    'scale_diag': tf.ones([batch_size, 10])
}
output, sample = connector(distribution_kwargs=kwargs)
# output == LSTMStateTuple(c=tensor_of_shape_(batch_size, 256),
#                          h=tensor_of_shape_(batch_size, 256))
# sample == Tensor([batch_size, 10])


kwargs = {
    'loc': tf.zeros([10]),
    'scale_diag': tf.ones([10])
}
output_, sample_ = connector(distribution_kwargs=kwargs,
                             num_samples=batch_size_)
# output_ == LSTMStateTuple(c=tensor_of_shape_(batch_size_, 256),
#                           h=tensor_of_shape_(batch_size_, 256))
# sample_ == Tensor([batch_size_, 10])
_build(distribution='MultivariateNormalDiag', distribution_kwargs=None, transform=True, num_samples=None)[source]

Samples from a distribution and optionally performs transformation with an MLP layer.

The distribution must be reparameterizable, i.e., distribution.reparameterization_type = FULLY_REPARAMETERIZED.

Parameters:
  • distribution – An instance of a subclass of TF Distribution, or a tensorflow_probability Distribution. Can be a class, its name or module path, or a class instance.
  • distribution_kwargs (dict, optional) – Keyword arguments for the distribution constructor. Ignored if distribution is a class instance.
  • transform (bool) – Whether to perform MLP transformation of the distribution samples. If False, the structure/shape of a sample must match output_size.
  • num_samples (optional) – An int or int Tensor. Number of samples to generate. If not given, generate a single sample. Note that if batch size has already been included in distribution’s dimensionality, num_samples should be left as None.
Returns:

A tuple (output, sample), where

  • output: A Tensor or a (nested) tuple of Tensors with the same structure and size of output_size. The batch dimension equals num_samples if specified, or is determined by the distribution dimensionality. If transform is False, output will be equal to sample.
  • sample: The sample from the distribution, prior to transformation.

Raises:

ValueError – The output does not match output_size.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "activation_fn": "identity",
    "name": "reparameterized_stochastic_connector"
}

Here:

“activation_fn”: str
The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
“name”: str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

StochasticConnector

class texar.tf.modules.StochasticConnector(output_size, hparams=None)[source]

Samples from a distribution and transforms samples into specified size.

The connector is the same as ReparameterizedStochasticConnector, except that here reparameterization is disabled, and thus the gradients cannot be back-propagated through the stochastic samples.

Parameters:
  • output_size – Size of output excluding the batch dimension. For example, set output_size to dim to generate output of shape [batch_size, dim]. Can be an int, a tuple of ints, a TensorShape, or a tuple of TensorShapes. For example, to transform inputs to have decoder state size, set output_size=decoder.state_size.
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(distribution='MultivariateNormalDiag', distribution_kwargs=None, transform=True, num_samples=None)[source]

Samples from a distribution and optionally performs transformation with an MLP layer.

The inputs and outputs are the same as ReparameterizedStochasticConnector except that the distribution does not need to be reparameterizable, and gradients cannot be back-propagated through the samples.

Parameters:
  • distribution – An instance of a subclass of TF Distribution, or a tensorflow_probability Distribution. Can be a class, its name or module path, or a class instance.
  • distribution_kwargs (dict, optional) – Keyword arguments for the distribution constructor. Ignored if distribution is a class instance.
  • transform (bool) – Whether to perform MLP transformation of the distribution samples. If False, the structure/shape of a sample must match output_size.
  • num_samples (optional) – An int or int Tensor. Number of samples to generate. If not given, generate a single sample. Note that if batch size has already been included in distribution’s dimensionality, num_samples should be left as None.
Returns:

A tuple (output, sample), where

  • output: A Tensor or a (nested) tuple of Tensors with the same structure and size of output_size. The batch dimension equals num_samples if specified, or is determined by the distribution dimensionality. If transform is False, output will be equal to sample.
  • sample: The sample from the distribution, prior to transformation.

Raises:

ValueError – The output does not match output_size.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "activation_fn": "identity",
    "name": "stochastic_connector"
}

Here:

“activation_fn”: str
The activation function applied to the outputs of the MLP transformation layer. Can be a function, or its name or module path.
“name”: str
Name of the connector.
hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

output_size

The output size.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

Networks

FeedForwardNetworkBase

class texar.tf.modules.FeedForwardNetworkBase(hparams=None)[source]

Base class inherited by all feed-forward network classes.

Parameters:hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "name": "NN"
}
append_layer(layer)[source]

Appends a layer to the end of the network. The method is only feasible before _build is called.

Parameters:layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
has_layer(layer_name)[source]

Returns True if a layer with the name exists in the network. Returns False otherwise.

Parameters:layer_name (str) – Name of the layer.
layer_by_name(layer_name)[source]

Returns the layer with the name. Returns ‘None’ if the layer name does not exist.

Parameters:layer_name (str) – Name of the layer.
layers_by_name

A dictionary mapping layer names to the layers.

layers

A list of the layers.

layer_names

A list of uniquified layer names.

layer_outputs_by_name(layer_name)[source]

Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.

Parameters:layer_name (str) – Name of the layer.
layer_outputs

A list containing output tensors of each layer.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

FeedForwardNetwork

class texar.tf.modules.FeedForwardNetwork(layers=None, hparams=None)[source]

Feed-forward neural network that consists of a sequence of layers.

Parameters:
  • layers (list, optional) – A list of Layer instances composing the network. If not given, layers are created according to hparams.
  • hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() of FeedForwardNetworkBase for the inputs and outputs.

Example

hparams = { # Builds a two-layer dense NN
    "layers": [
        { "type": "Dense", "kwargs": { "units": 256 } },
        { "type": "Dense", "kwargs": { "units": 10 } }
    ]
}
nn = FeedForwardNetwork(hparams=hparams)

inputs = tf.random_uniform([64, 100])
outputs = nn(inputs)
# outputs == Tensor of shape [64, 10]
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "layers": [],
    "name": "NN"
}

Here:

“layers”: list
A list of layer hyperparameters. See get_layer() for the details of layer hyperparameters.
“name”: str
Name of the network.
append_layer(layer)

Appends a layer to the end of the network. The method is only feasible before _build is called.

Parameters:layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
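
For example, a sketch of building up a network layer by layer (the layer settings are illustrative assumptions):

nn = FeedForwardNetwork()                   # starts with no layers
nn.append_layer({"type": "Dense", "kwargs": {"units": 256, "activation": "relu"}})
nn.append_layer(tf.layers.Dense(units=10))  # a layer instance also works
outputs = nn(tf.random_uniform([64, 100]))  # further appends are not possible after this call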
has_layer(layer_name)

Returns True if a layer with the name exists in the network. Returns False otherwise.

Parameters:layer_name (str) – Name of the layer.
hparams

An HParams instance. The hyperparameters of the module.

layer_by_name(layer_name)

Returns the layer with the name. Returns ‘None’ if the layer name does not exist.

Parameters:layer_name (str) – Name of the layer.
layer_names

A list of uniquified layer names.

layer_outputs

A list containing output tensors of each layer.

layer_outputs_by_name(layer_name)

Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.

Parameters:layer_name (str) – Name of the layer.
layers

A list of the layers.

layers_by_name

A dictionary mapping layer names to the layers.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

Conv1DNetwork

class texar.tf.modules.Conv1DNetwork(hparams=None)[source]

Simple Conv-1D network which consists of a sequence of conv layers followed with a sequence of dense layers.

Parameters:hparams (dict, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.

See _build() for the inputs and outputs. The inputs must be a 3D Tensor of shape [batch_size, length, channels] (default), or [batch_size, channels, length] (if data_format is set to ‘channels_first’ through hparams). For example, for sequence classification, length corresponds to time steps, and channels corresponds to embedding dim.

Example

nn = Conv1DNetwork() # Use the default structure

inputs = tf.random_uniform([64, 20, 256])
outputs = nn(inputs)
# outputs == Tensor of shape [64, 128], because the final dense
# layer has size 128.
_build(inputs, sequence_length=None, dtype=None, mode=None)[source]

Feeds forward inputs through the network layers and returns outputs.

Parameters:
  • inputs – The inputs to the network, which is a 3D tensor.
  • sequence_length (optional) – An int tensor of shape [batch_size] containing the length of each element in inputs. If given, time steps beyond the length will first be masked out before feeding to the layers.
  • dtype (optional) – Type of the inputs. If not provided, infers from inputs automatically.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, texar.tf.global_mode() is used.
Returns:

The output of the final layer.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    # (1) Conv layers
    "num_conv_layers": 1,
    "filters": 128,
    "kernel_size": [3, 4, 5],
    "conv_activation": "relu",
    "conv_activation_kwargs": None,
    "other_conv_kwargs": None,
    # (2) Pooling layers
    "pooling": "MaxPooling1D",
    "pool_size": None,
    "pool_strides": 1,
    "other_pool_kwargs": None,
    # (3) Dense layers
    "num_dense_layers": 1,
    "dense_size": 128,
    "dense_activation": "identity",
    "dense_activation_kwargs": None,
    "final_dense_activation": None,
    "final_dense_activation_kwargs": None,
    "other_dense_kwargs": None,
    # (4) Dropout
    "dropout_conv": [1],
    "dropout_dense": [],
    "dropout_rate": 0.75,
    # (5) Others
    "name": "conv1d_network",
}

Here:

  1. For convolutional layers:

    “num_conv_layers”: int

    Number of convolutional layers.

    “filters”: int or list

    The number of filters in the convolution, i.e., the dimensionality of the output space. If “num_conv_layers” > 1, “filters” must be a list of “num_conv_layers” integers.

    “kernel_size”: int or list

    Lengths of 1D convolution windows.

    • If “num_conv_layers” == 1, this can be a list of arbitrary number of int denoting different sized conv windows. The number of filters of each size is specified by “filters”. For example, the default values will create 3 sets of filters, each of which has a kernel size of 3, 4, and 5, respectively, and has 128 filters.
    • If “num_conv_layers” > 1, this must be a list of length “num_conv_layers”. Each element can be an int or a list of arbitrary number of int denoting the kernel size of respective layer.
    “conv_activation”: str or callable

    Activation function applied to the output of the convolutional layers. Set to “identity” to maintain a linear activation. See get_activation_fn() for more details.

    “conv_activation_kwargs”: dict, optional

    Keyword arguments for conv layer activation functions. See get_activation_fn() for more details.

    “other_conv_kwargs”: dict, optional

    Other keyword arguments for tf.layers.Conv1D constructor, e.g., “data_format”, “padding”, etc.

  2. For pooling layers:

    “pooling”: str or class or instance

    Pooling layer after each of the convolutional layer(s). Can be a pooling layer class, its name or module path, or a class instance.

    “pool_size”: int or list, optional

    Size of the pooling window. If an int, all pooling layers will have the same pool size. If a list, the list length must equal “num_conv_layers”. If None and the pooling type is either MaxPooling or AveragePooling, the pool size will be set to the input size. That is, the output of the pooling layer is a single unit.

    “pool_strides”: int or list, optional

    Strides of the pooling operation. If an int, all pooling layers will have the same stride. If a list, the list length must equal “num_conv_layers”.

    “other_pool_kwargs”: dict, optional

    Other keyword arguments for pooling layer class constructor.

  3. For dense layers (note that here dense layers always follow conv and pooling layers):

    “num_dense_layers”: int

    Number of dense layers.

    “dense_size”: int or list

    Number of units of each dense layer. If an int, all dense layers will have the same size. If a list of int, the list length must equal “num_dense_layers”.

    “dense_activation”: str or callable

    Activation function applied to the output of the dense layers except the last dense layer. Set to “identity” to maintain a linear activation. See get_activation_fn() for more details.

    “dense_activation_kwargs”: dict, optional

    Keyword arguments for dense layer activation functions before the last dense layer. See get_activation_fn() for more details.

    “final_dense_activation”: str or callable

    Activation function applied to the output of the last dense layer. Set to None or “identity” to maintain a linear activation. See get_activation_fn() for more details.

    “final_dense_activation_kwargs”: dict, optional

    Keyword arguments for the activation function of last dense layer. See get_activation_fn() for more details.

    “other_dense_kwargs”: dict, optional

    Other keyword arguments for Dense layer class constructor.

  4. For dropouts:

    “dropout_conv”: int or list

    The indexes of conv layers (starting from 0) whose inputs have dropout applied. An index equal to num_conv_layers means dropout is applied to the output of the final conv layer. E.g.,

    {
        "num_conv_layers": 2,
        "dropout_conv": [0, 2]
    }
    

    will lead to a series of layers as -dropout-conv0-conv1-dropout-.

    The dropout mode (training or not) is controlled by the mode argument of _build().

    “dropout_dense”: int or list

    Same as “dropout_conv” but applied to dense layers (index starting from 0).

    “dropout_rate”: float

    The dropout rate, between 0 and 1. E.g., “dropout_rate”: 0.1 would drop out 10% of elements.

  5. Others:

    “name”: str

    Name of the network.
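
For example, a sketch of a customized configuration (all values are illustrative assumptions; unspecified hyperparameters keep their defaults, including the single conv layer):

nn = Conv1DNetwork(hparams={
    "filters": 256,
    "kernel_size": [3, 5],        # two window sizes for the single conv layer
    "num_dense_layers": 2,
    "dense_size": [128, 64],
    "dropout_dense": [2],         # dropout on the final dense output
    "dropout_rate": 0.5,
})
outputs = nn(tf.random_uniform([64, 20, 300]))   # [batch_size, length, channels]
# outputs: Tensor with final dimension 64 (the last dense layer size)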

append_layer(layer)

Appends a layer to the end of the network. The method is only feasible before _build is called.

Parameters:layer – A tf.layers.Layer instance, or a dict of layer hyperparameters.
has_layer(layer_name)

Returns True if a layer with the name exists in the network. Returns False otherwise.

Parameters:layer_name (str) – Name of the layer.
hparams

An HParams instance. The hyperparameters of the module.

layer_by_name(layer_name)

Returns the layer with the name. Returns ‘None’ if the layer name does not exist.

Parameters:layer_name (str) – Name of the layer.
layer_names

A list of uniquified layer names.

layer_outputs

A list containing output tensors of each layer.

layer_outputs_by_name(layer_name)

Returns the output tensors of the layer with the specified name. Returns None if the layer name does not exist.

Parameters:layer_name (str) – Name of the layer.
layers

A list of the layers.

layers_by_name

A dictionary mapping layer names to the layers.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

Memory

MemNetBase

class texar.tf.modules.MemNetBase(raw_memory_dim, input_embed_fn=None, output_embed_fn=None, query_embed_fn=None, hparams=None)[source]

Base class inherited by all memory network classes.

Parameters:
  • raw_memory_dim (int) – Dimension size of raw memory entries (before embedding). For example, if a raw memory entry is a word, this is the vocabulary size (imagine a one-hot representation of word). If a raw memory entry is a dense vector, this is the dimension size of the vector.
  • input_embed_fn (optional) – A callable that embeds raw memory entries as inputs. This corresponds to the A embedding operation in (Sukhbaatar et al.) If not provided, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details.
  • output_embed_fn (optional) – A callable that embeds raw memory entries as outputs. This corresponds to the C embedding operation in (Sukhbaatar et al.) If not provided, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details.
  • query_embed_fn (optional) – A callable that embeds query. This corresponds to the B embedding operation in (Sukhbaatar et al.). If not provided and “use_B” is True in hparams, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details. Notice: If you’d like to customize this callable, please follow the same number and style of dimensions as in input_embed_fn or output_embed_fn, and assume that the 2nd dimension of its input and output (which corresponds to memory_size) is 1.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
get_default_embed_fn(memory_size, embed_fn_hparams)[source]

Creates a default embedding function. Can be used for A, C, or B operation.

For B operation (i.e., query_embed_fn), memory_size must be 1.

The function is a combination of both memory embedding and temporal embedding, with the combination method specified by “combine_mode” in the embed_fn_hparams.

Parameters:embed_fn_hparams (dict or HParams) – Hyperparameter of the embedding function. See default_memnet_embed_fn() for details.
Returns:A tuple (embed_fn, memory_dim), where
  • `memory_dim` is the dimension of memory entry embedding, inferred from embed_fn_hparams.
    • If combine_mode == ‘add’, memory_dim is the embedder dimension.
    • If combine_mode == ‘concat’, memory_dim is the sum of the memory embedder dimension and the temporal embedder dimension.
  • `embed_fn` is an embedding function that takes in memory and returns the memory embedding. Specifically, the function has the signature memory_embedding = embed_fn(memory=None, soft_memory=None), where exactly one of memory and soft_memory is provided (but not both):

    • memory: An int Tensor of shape [batch_size, memory_size] containing memory indexes used for embedding lookup.
    • soft_memory: A Tensor of shape [batch_size, memory_size, raw_memory_dim] containing soft weights used to mix the embedding vectors.

    The function returns a Tensor of shape [batch_size, memory_size, memory_dim] containing the memory entry embeddings.

static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "n_hops": 1,
    "memory_dim": 100,
    "relu_dim": 50,
    "memory_size": 100,
    "A": default_embed_fn_hparams,
    "C": default_embed_fn_hparams,
    "B": default_embed_fn_hparams,
    "use_B": False,
    "use_H": False,
    "dropout_rate": 0,
    "variational": False,
    "name": "memnet",
}

Here:

“n_hops”: int
Number of hops.
“memory_dim”: int
Memory dimension, i.e., the dimension size of a memory entry embedding. Ignored if at least one of the embedding functions is created according to hparams. In this case memory_dim is inferred from the created embed_fn.
“relu_dim”: int
Number of elements in memory_dim that have relu at the end of each hop. Should be not less than 0 and not more than memory_dim.
“memory_size”: int

Number of entries in memory.

For example, the number of sentences {x_i} in Fig.1(a) of (Sukhbaatar et al.) End-To-End Memory Networks.

“use_B”: bool
Whether to create the query embedding function. Ignored if query_embed_fn is given to the constructor.
“use_H”: bool
Whether to perform a linear transformation with matrix H at the end of each A-C layer.
“dropout_rate”: float
The dropout rate to apply to the output of each hop. Should be between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the units.
“variational”: bool
Whether to share dropout masks after each hop.
memory_size

The memory size.

raw_memory_dim

The dimension of memory element (or vocabulary size).

memory_dim

The dimension of embedded memory and all vectors in hops.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

MemNetRNNLike

class texar.tf.modules.MemNetRNNLike(raw_memory_dim, input_embed_fn=None, output_embed_fn=None, query_embed_fn=None, hparams=None)[source]

An implementation of multi-layer end-to-end memory network, with RNN-like weight tying described in (Sukhbaatar et al.) End-To-End Memory Networks .

See get_default_embed_fn() for default embedding functions. Customized embedding functions must follow the same signature.

Parameters:
  • raw_memory_dim (int) – Dimension size of raw memory entries (before embedding). For example, if a raw memory entry is a word, this is the vocabulary size (imagine a one-hot representation of word). If a raw memory entry is a dense vector, this is the dimension size of the vector.
  • input_embed_fn (optional) – A callable that embeds raw memory entries as inputs. This corresponds to the A embedding operation in (Sukhbaatar et al.) If not provided, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details.
  • output_embed_fn (optional) – A callable that embeds raw memory entries as outputs. This corresponds to the C embedding operation in (Sukhbaatar et al.) If not provided, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details.
  • query_embed_fn (optional) – A callable that embeds query. This corresponds to the B embedding operation in (Sukhbaatar et al.). If not provided and “use_B” is True in hparams, a default embedding operation is created as specified in hparams. See get_default_embed_fn() for details. For customized query_embed_fn, note that the function must follow the signature of the default embed_fn where memory_size must be 1.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    "n_hops": 1,
    "memory_dim": 100,
    "relu_dim": 50,
    "memory_size": 100,
    "A": default_embed_fn_hparams,
    "C": default_embed_fn_hparams,
    "B": default_embed_fn_hparams,
    "use_B": False,
    "use_H": True,
    "dropout_rate": 0,
    "variational": False,
    "name": "memnet_rnnlike",
}

Here:

“n_hops”: int
Number of hops.
“memory_dim”: int
Memory dimension, i.e., the dimension size of a memory entry embedding. Ignored if at least one of the embedding functions is created according to hparams. In this case memory_dim is inferred from the created embed_fn.
“relu_dim”: int
Number of elements in memory_dim that have relu at the end of each hop. Should be not less than 0 and not more than memory_dim.
“memory_size”: int

Number of entries in memory.

For example, the number of sentences {x_i} in Fig.1(a) of (Sukhbaatar et al.) End-To-End Memory Networks.

“use_B”: bool
Whether to create the query embedding function. Ignored if query_embed_fn is given to the constructor.
“use_H”: bool
Whether to perform a linear transformation with matrix H at the end of each A-C layer.
“dropout_rate”: float
The dropout rate to apply to the output of each hop. Should be between 0 and 1. E.g., dropout_rate=0.1 would drop out 10% of the units.
“variational”: bool
Whether to share dropout masks after each hop.
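
For example, a minimal construction sketch (the raw_memory_dim value and hyperparameters below are illustrative only):

memnet = MemNetRNNLike(
    raw_memory_dim=10000,                      # e.g., vocabulary size of raw memory entries
    hparams={'n_hops': 3, 'memory_size': 50})  # 3 hops over a memory of 50 entries

print(memnet.memory_size)     # 50
print(memnet.raw_memory_dim)  # 10000
print(memnet.memory_dim)      # inferred from the created embedding functions (100 with defaults)
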
hparams

An HParams instance. The hyperparameters of the module.

memory_dim

The dimension of embedded memory and all vectors in hops.

memory_size

The memory size.

name

The uniquified name of the module.

raw_memory_dim

The dimension of a memory element (or the vocabulary size).

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

default_memnet_embed_fn_hparams

texar.tf.modules.default_memnet_embed_fn_hparams()[source]

Returns a dictionary of default hyperparameters for the default embedding function (see get_default_embed_fn()).

{
    "embedding": {
        "dim": 100
    },
    "temporal_embedding": {
        "dim": 100
    },
    "combine_mode": "add"
}

Here:

“embedding”: dict, optional
Hyperparameters for embedding operations. See default_hparams() of WordEmbedder for details. If None, the default hyperparameters are used.
“temporal_embedding”: dict, optional
Hyperparameters for temporal embedding operations. See default_hparams() of PositionEmbedder for details. If None, the default hyperparameters are used.
“combine_mode”: str
Either ‘add’ or ‘concat’. If ‘add’, the memory embedding and the temporal embedding are added; in this case the two embedders must have the same dimension. If ‘concat’, the two embeddings are concatenated.
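
For example, the following sketch (dimension values are illustrative) configures the A and C embedding operations of a MemNetRNNLike to concatenate an 80-dimensional entry embedding with a 20-dimensional temporal embedding; with ‘concat’, the memory entry dimension is presumably the sum of the two (here 100):

embed_fn_hparams = {
    'embedding': {'dim': 80},
    'temporal_embedding': {'dim': 20},
    'combine_mode': 'concat',
}
memnet = MemNetRNNLike(
    raw_memory_dim=10000,
    hparams={'A': embed_fn_hparams, 'C': embed_fn_hparams})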

Policy

PolicyNetBase

class texar.tf.modules.PolicyNetBase(network=None, network_kwargs=None, hparams=None)[source]

Policy net that takes in states and outputs actions.

Parameters:
  • network (optional) – A network that takes in state and returns outputs for generating actions. For example, an instance of subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams.
  • network_kwargs (dict, optional) – Keyword arguments for network constructor. Note that the hparams argument for network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    'network_type': 'FeedForwardNetwork',
    'network_hparams': {
        'layers': [
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
        ]
    },
    'distribution_kwargs': None,
    'name': 'policy_net',
}

Here:

“network_type”: str or class or instance
A network that takes in state and returns outputs for generating actions. This can be a class, its name or module path, or a class instance. Ignored if network is given to the constructor.
“network_hparams”: dict

Hyperparameters for the network. With the network_kwargs argument to the constructor, a network is created with network_class(**network_kwargs, hparams=network_hparams).

For example, the default values create a two-layer dense network.

“distribution_kwargs”: dict, optional
Keyword arguments for distribution constructor. A distribution would be created for action sampling.
“name”: str
Name of the policy.
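
For example, the default network can be replaced purely through hparams (a minimal sketch; layer sizes are illustrative, and the same hyperparameters apply to subclasses such as CategoricalPolicyNet):

policy = PolicyNetBase(hparams={
    'network_hparams': {
        'layers': [
            {'type': 'Dense', 'kwargs': {'units': 128, 'activation': 'tanh'}},
            {'type': 'Dense', 'kwargs': {'units': 128, 'activation': 'tanh'}},
        ]
    }
})
print(policy.network)  # the network created from the hyperparameters above
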
network

The network.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

CategoricalPolicyNet

class texar.tf.modules.CategoricalPolicyNet(action_space=None, network=None, network_kwargs=None, hparams=None)[source]

Policy net with Categorical distribution for discrete scalar actions.

This is a combination of a network with a top-layer distribution for action sampling.

Parameters:
  • action_space (optional) – An instance of Space specifying the action space. If not given, a discrete action space [0, high] is created with high specified in hparams.
  • network (optional) – A network that takes in state and returns outputs for generating actions. For example, an instance of subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams.
  • network_kwargs (dict, optional) – Keyword arguments for network constructor. Note that the hparams argument for network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, mode=None)[source]

Takes in states and outputs actions.

Parameters:
  • inputs – Inputs to the policy network with the first dimension the batch dimension.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, texar.tf.global_mode() is used.
Returns:

A dict including fields “logits”, “action”, and “dist”, where

  • “logits”: A Tensor of shape [batch_size] + action_space size used for categorical distribution sampling.
  • “action”: A Tensor of shape [batch_size] + action_space.shape.
  • “dist”: The Categorical based on the logits.
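
Example (a minimal sketch; the state shape and action_space value are illustrative):

states = tf.random_uniform(shape=[64, 8])                   # a batch of 64 states with 8 features
policy = CategoricalPolicyNet(hparams={'action_space': 3})

outputs = policy(states)      # calls _build()
logits = outputs['logits']    # parameters of the Categorical distribution
actions = outputs['action']   # actions sampled from the distribution, one per batch entry
dist = outputs['dist']        # the Categorical distribution itself
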
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    'network_type': 'FeedForwardNetwork',
    'network_hparams': {
        'layers': [
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
        ]
    },
    'distribution_kwargs': {
        'dtype': 'int32',
        'validate_args': False,
        'allow_nan_stats': True
    },
    'action_space': 2,
    'make_output_layer': True,
    'name': 'categorical_policy_net'
}

Here:

“distribution_kwargs”: dict
Keyword arguments for the Categorical distribution constructor. Arguments logits and probs should not be included as they are inferred from the inputs. Argument dtype can be a string (e.g., int32) and will be converted to a corresponding tf dtype.
“action_space”: int
Upper bound of the action space. The resulting action space is all discrete scalar numbers between 0 and the upper bound specified here (both inclusive).
“make_output_layer”: bool
Whether to append a dense layer to the network to transform features to logits for action sampling. If False, the final layer output of network must match the action space.

See default_hparams() of PolicyNetBase for details of the other hyperparameters.

action_space

An instance of Space specifying the action space.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

network

The network.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

Q-Nets

QNetBase

class texar.tf.modules.QNetBase(network=None, network_kwargs=None, hparams=None)[source]

Base class inherited by all Q net classes. A Q net takes in states and outputs Q values of actions.

Parameters:
  • network (optional) – A network that takes in state and returns Q values. For example, an instance of subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams.
  • network_kwargs (dict, optional) – Keyword arguments for network constructor. Note that the hparams argument for network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    'network_type': 'FeedForwardNetwork',
    'network_hparams': {
        'layers': [
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
        ]
    },
    'name': 'q_net',
}

Here:

“network_type”: str or class or instance
A network that takes in state and returns Q values of actions. This can be a class, its name or module path, or a class instance. Ignored if network is given to the constructor.
“network_hparams”: dict

Hyperparameters for the network. With the network_kwargs argument to the constructor, a network is created with network_class(**network_kwargs, hparams=network_hparams).

For example, the default values create a two-layer dense network.

“name”: str
Name of the Q net.
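
Instead of configuring the network through hparams, a pre-built network can be passed in directly. A minimal sketch, assuming FeedForwardNetwork from texar.tf.modules (layer sizes are illustrative):

net = FeedForwardNetwork(hparams={
    'layers': [
        {'type': 'Dense', 'kwargs': {'units': 128, 'activation': 'relu'}},
        {'type': 'Dense', 'kwargs': {'units': 128, 'activation': 'relu'}},
    ]
})
qnet = QNetBase(network=net)  # 'network_type' in hparams is ignored when a network is given
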
network

The network.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.

CategoricalQNet

class texar.tf.modules.CategoricalQNet(action_space=None, network=None, network_kwargs=None, hparams=None)[source]

Q net with categorical scalar action space.

Parameters:
  • action_space (optional) – An instance of Space specifying the action space. If not given, a discrete action space [0, high] is created with high specified in hparams.
  • network (optional) – A network that takes in state and returns Q values. For example, an instance of subclass of FeedForwardNetworkBase. If None, a network is created as specified in hparams.
  • network_kwargs (dict, optional) – Keyword arguments for network constructor. Note that the hparams argument for network constructor is specified in the “network_hparams” field of hparams and should not be included in network_kwargs. Ignored if network is given.
  • hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters will be set to default values. See default_hparams() for the hyperparameter structure and default values.
_build(inputs, mode=None)[source]

Takes in states and outputs Q values.

Parameters:
  • inputs – Inputs to the Q net with the first dimension the batch dimension.
  • mode (optional) – A tensor taking value in tf.estimator.ModeKeys, including TRAIN, EVAL, and PREDICT. If None, texar.tf.global_mode() is used.
Returns:

A dict including the field “qvalues”, where

  • “qvalues”: A Tensor of shape [batch_size] + action_space size containing Q values of all possible actions.
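
Example (a minimal sketch; the state shape and action_space value are illustrative):

states = tf.random_uniform(shape=[64, 8])           # a batch of 64 states with 8 features
qnet = CategoricalQNet(hparams={'action_space': 3})

outputs = qnet(states)         # calls _build()
qvalues = outputs['qvalues']   # Q values of all possible actions for each state
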
static default_hparams()[source]

Returns a dictionary of hyperparameters with default values.

{
    'network_type': 'FeedForwardNetwork',
    'network_hparams': {
        'layers': [
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
            {
                'type': 'Dense',
                'kwargs': {'units': 256, 'activation': 'relu'}
            },
        ]
    },
    'action_space': 2,
    'make_output_layer': True,
    'name': 'q_net'
}

Here:

“action_space”: int
Upper bound of the action space. The resulting action space is all discrete scalar numbers between 0 and the upper bound specified here (both inclusive).
“make_output_layer”: bool
Whether to append a dense layer to the network to transform features to Q values. If False, the final layer output of network must match the action space.

See default_hparams() of QNetBase for details of the other hyperparameters.

action_space

An instance of Space specifying the action space.

hparams

An HParams instance. The hyperparameters of the module.

name

The uniquified name of the module.

network

The network.

trainable_variables

The list of trainable variables of the module.

variable_scope

The variable scope of the module.