tianshou.utils¶

Utils package.

class tianshou.utils.MovAvg(size: int = 100)[source]

Bases: `object`

Class for moving average.

It automatically excludes infinity and NaN values. Usage:

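```
>>> import torch
>>> from tianshou.utils import MovAvg
>>> stat = MovAvg(size=66)
>>> stat.add(torch.tensor(5))
5.0
>>> stat.add(float('inf'))  # infinity is ignored
5.0
>>> stat.add([6, 7, 8])
6.5
>>> stat.get()
6.5
>>> print(f'{stat.mean():.2f}±{stat.std():.2f}')
6.50±1.12
```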
add(data_array: Union[Number, number, list, ndarray, Tensor]) float[source]

Add a scalar into `MovAvg`.

You can add a `torch.Tensor` with only one element, a Python scalar, or a list of Python scalars.

get() float[source]

Get the average.

mean() float[source]

Get the average. Same as `get()`.

std() float[source]

Get the standard deviation.
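The behaviour described above can be sketched in plain Python. This is only an illustration and not the library's implementation: `SimpleMovAvg` is a hypothetical stand-in, and the use of a fixed-size window with a population standard deviation is an assumption chosen to reproduce the numbers in the usage example.

```python
import math
from collections import deque


class SimpleMovAvg:
    """Hypothetical stand-in for MovAvg; not the library's implementation."""

    def __init__(self, size=100):
        # deque(maxlen=...) drops the oldest value once the window is full
        self.cache = deque(maxlen=size)

    def add(self, data):
        values = data if isinstance(data, (list, tuple)) else [data]
        for v in values:
            v = float(v)
            if math.isfinite(v):  # skip +/-inf and NaN
                self.cache.append(v)
        return self.mean()

    def mean(self):
        return sum(self.cache) / len(self.cache) if self.cache else 0.0

    def std(self):
        if not self.cache:
            return 0.0
        m = self.mean()
        # population standard deviation over the current window
        return math.sqrt(sum((v - m) ** 2 for v in self.cache) / len(self.cache))


stat = SimpleMovAvg(size=66)
stat.add(5)
stat.add(float("inf"))  # ignored
stat.add([6, 7, 8])
print(f"{stat.mean():.2f}±{stat.std():.2f}")  # 6.50±1.12
```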

class tianshou.utils.RunningMeanStd(mean: Union[float, ndarray] = 0.0, std: Union[float, ndarray] = 1.0, clip_max: Optional[float] = 10.0, epsilon: float = 1.1920928955078125e-07)[source]

Bases: `object`

Calculates the running mean and std of a data stream.

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

Parameters
• mean – the initial mean estimate for the data array. Default to 0.

• std – the initial standard deviation estimate for the data array. Default to 1.

• clip_max (float) – the maximum absolute value for data array. Default to 10.0.

• epsilon (float) – To avoid division by zero.

norm(data_array: Union[float, ndarray]) Union[float, ndarray][source]
update(data_array: ndarray) None[source]

Add a batch of items with the same shape to the running statistics, updating mean/var/count.
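The parallel algorithm linked above merges the statistics of a new batch into the running statistics without revisiting old data. The following is a scalar sketch under stated assumptions: `merge_stats` is a hypothetical helper, and starting with a count of zero (so that the initial mean and variance carry no weight) is an assumption rather than something this page specifies.

```python
import statistics


def merge_stats(mean, var, count, batch):
    """Fold a batch into running (mean, var, count) using the parallel update."""
    b_mean = statistics.fmean(batch)
    b_var = statistics.pvariance(batch)
    b_count = len(batch)

    delta = b_mean - mean
    total = count + b_count
    new_mean = mean + delta * b_count / total
    # combine both sums of squared deviations, plus a correction for the mean shift
    m2 = var * count + b_var * b_count + delta**2 * count * b_count / total
    return new_mean, m2 / total, total


mean, var, count = 0.0, 1.0, 0  # assumed start: the initial guess has zero weight
for batch in ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]):
    mean, var, count = merge_stats(mean, var, count, batch)
```

After both batches the running values match the mean and population variance of the seven numbers taken together.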

class tianshou.utils.DummyTqdm(total: int, **kwargs: Any)[source]

Bases: `object`

A dummy tqdm class that keeps stats but without progress bar.

It supports `__enter__` and `__exit__`, update and a dummy `set_postfix`, which is the interface that trainers use.

Note

Using `disable=True` in the tqdm config results in an infinite loop, so this class was created. See the discussion at #641 for details.

set_postfix(**kwargs: Any) None[source]
update(n: int = 1) None[source]
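A class with this interface can be very small. The sketch below is an illustration only: `QuietBar` is a hypothetical name, and tracking progress in an attribute called `n` is an assumption modelled on tqdm.

```python
class QuietBar:
    """Hypothetical no-op progress bar with a tqdm-like surface."""

    def __init__(self, total, **kwargs):
        self.total = total
        self.n = 0  # number of steps recorded so far

    def set_postfix(self, **kwargs):
        pass  # accept and discard display info

    def update(self, n=1):
        self.n += n

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False  # do not swallow exceptions


with QuietBar(total=10) as bar:
    while bar.n < bar.total:
        bar.update(3)
        bar.set_postfix(loss=0.1)
```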
class tianshou.utils.BaseLogger(train_interval: int = 1000, test_interval: int = 1, update_interval: int = 1000)[source]

Bases: `ABC`

The base class for any logger which is compatible with trainer.

Override the write() method to use your own writer.

Parameters
• train_interval (int) – the log interval in log_train_data(). Default to 1000.

• test_interval (int) – the log interval in log_test_data(). Default to 1.

• update_interval (int) – the log interval in log_update_data(). Default to 1000.
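One plausible reading of a log interval is "write only when at least that many steps have passed since the last write". The sketch below illustrates that reading. `PrintLogger` is hypothetical and does not subclass the real `BaseLogger`; the attribute name `last_log_train_step` and the namespace string are assumptions.

```python
class PrintLogger:
    """Hypothetical logger sketch; does not subclass the real BaseLogger."""

    def __init__(self, train_interval=1000):
        self.train_interval = train_interval
        self.last_log_train_step = -1
        self.records = []

    def write(self, step_type, step, data):
        # a real logger would forward this to its writer
        self.records.append((step_type, step, dict(data)))

    def log_train_data(self, collect_result, step):
        # only write when enough steps have passed since the last write
        if step - self.last_log_train_step >= self.train_interval:
            self.write("train/env_step", step, collect_result)
            self.last_log_train_step = step


logger = PrintLogger(train_interval=1000)
for step in range(0, 3001, 500):
    logger.log_train_data({"train/reward": float(step)}, step)
logged_steps = [s for _, s, _ in logger.records]  # [1000, 2000, 3000]
```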

abstract write(step_type: str, step: int, data: Dict[str, Union[int, Number, number, ndarray]]) None[source]

Specify how the writer is used to log data.

Parameters
• step_type (str) – namespace which the data dict belongs to.

• step (int) – the step at which the data dict is logged.

• data (dict) – the data to write with format `{key: value}`.

log_train_data(collect_result: dict, step: int) None[source]

Use writer to log statistics generated during training.

Parameters
• collect_result – a dict containing information about the data collected in the training stage, i.e., the return value of collector.collect().

• step (int) – the timestep at which collect_result is logged.

log_test_data(collect_result: dict, step: int) None[source]

Use writer to log statistics generated during evaluating.

Parameters
• collect_result – a dict containing information about the data collected in the evaluation stage, i.e., the return value of collector.collect().

• step (int) – the timestep at which collect_result is logged.

log_update_data(update_result: dict, step: int) None[source]

Use writer to log statistics generated during updating.

Parameters
• update_result – a dict containing information about the data collected in the updating stage, i.e., the return value of policy.update().

• step (int) – the timestep at which update_result is logged.

abstract save_data(epoch: int, env_step: int, gradient_step: int, save_checkpoint_fn: Optional[Callable[[int, int, int], str]] = None) None[source]

Use writer to log metadata when calling `save_checkpoint_fn` in trainer.

Parameters
• epoch (int) – the epoch in trainer.

• env_step (int) – the env_step in trainer.

• gradient_step (int) – the gradient_step in trainer.

• save_checkpoint_fn (function) – a hook defined by user, see trainer documentation for detail.

abstract restore_data() Tuple[int, int, int][source]

Return the metadata from existing log.

If it finds nothing or an error occurs during the recover process, it will return the default parameters.

Returns

epoch, env_step, gradient_step.
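The save/restore contract can be illustrated with a small round trip. This is a sketch under stated assumptions and not how any of the loggers below store their metadata: `save_meta` and `restore_meta` are hypothetical helpers, the JSON file is an arbitrary choice, and `(0, 0, 0)` is assumed as the default returned when nothing can be recovered.

```python
import json
import os
import tempfile


def save_meta(path, epoch, env_step, gradient_step):
    with open(path, "w") as f:
        json.dump({"epoch": epoch, "env_step": env_step,
                   "gradient_step": gradient_step}, f)


def restore_meta(path):
    try:
        with open(path) as f:
            meta = json.load(f)
        return meta["epoch"], meta["env_step"], meta["gradient_step"]
    except (OSError, ValueError, KeyError):
        return 0, 0, 0  # assumed defaults when nothing can be recovered


path = os.path.join(tempfile.mkdtemp(), "meta.json")
before = restore_meta(path)      # file does not exist yet
save_meta(path, 3, 30000, 1200)
after = restore_meta(path)
```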

class tianshou.utils.TensorboardLogger(writer: SummaryWriter, train_interval: int = 1000, test_interval: int = 1, update_interval: int = 1000, save_interval: int = 1, write_flush: bool = True)[source]

Bases: `BaseLogger`

A logger that relies on tensorboard SummaryWriter by default to visualize and log statistics.

Parameters
• writer (SummaryWriter) – the writer to log data.

• train_interval (int) – the log interval in log_train_data(). Default to 1000.

• test_interval (int) – the log interval in log_test_data(). Default to 1.

• update_interval (int) – the log interval in log_update_data(). Default to 1000.

• save_interval (int) – the save interval in save_data(). Default to 1 (save at the end of each epoch).

• write_flush (bool) – whether to flush tensorboard result after each add_scalar operation. Default to True.

write(step_type: str, step: int, data: Dict[str, Union[int, Number, number, ndarray]]) None[source]

Specify how the writer is used to log data.

Parameters
• step_type (str) – namespace which the data dict belongs to.

• step (int) – the step at which the data dict is logged.

• data (dict) – the data to write with format `{key: value}`.

save_data(epoch: int, env_step: int, gradient_step: int, save_checkpoint_fn: Optional[Callable[[int, int, int], str]] = None) None[source]

Use writer to log metadata when calling `save_checkpoint_fn` in trainer.

Parameters
• epoch (int) – the epoch in trainer.

• env_step (int) – the env_step in trainer.

• gradient_step (int) – the gradient_step in trainer.

• save_checkpoint_fn (function) – a hook defined by user, see trainer documentation for detail.

restore_data() Tuple[int, int, int][source]

Return the metadata from existing log.

If it finds nothing or an error occurs during the recover process, it will return the default parameters.

Returns

epoch, env_step, gradient_step.

class tianshou.utils.BasicLogger(*args: Any, **kwargs: Any)[source]

BasicLogger was renamed to TensorboardLogger in #427.

This class is kept for backward compatibility.

class tianshou.utils.LazyLogger[source]

Bases: `BaseLogger`

A logger that does nothing. Used as the placeholder in trainer.

write(step_type: str, step: int, data: Dict[str, Union[int, Number, number, ndarray]]) None[source]

The LazyLogger writes nothing.

save_data(epoch: int, env_step: int, gradient_step: int, save_checkpoint_fn: Optional[Callable[[int, int, int], str]] = None) None[source]

Use writer to log metadata when calling `save_checkpoint_fn` in trainer.

Parameters
• epoch (int) – the epoch in trainer.

• env_step (int) – the env_step in trainer.

• gradient_step (int) – the gradient_step in trainer.

• save_checkpoint_fn (function) – a hook defined by user, see trainer documentation for detail.

restore_data() Tuple[int, int, int][source]

Return the metadata from existing log.

If it finds nothing or an error occurs during the recover process, it will return the default parameters.

Returns

epoch, env_step, gradient_step.

class tianshou.utils.WandbLogger(train_interval: int = 1000, test_interval: int = 1, update_interval: int = 1000, save_interval: int = 1000, write_flush: bool = True, project: Optional[str] = None, name: Optional[str] = None, entity: Optional[str] = None, run_id: Optional[str] = None, config: Optional[Namespace] = None)[source]

Bases: `BaseLogger`

Weights and Biases logger that sends data to https://wandb.ai/.

This logger creates three panels with plots: train, test, and update. Make sure to select the correct access for each panel in Weights & Biases.

Example of usage:

```
logger = WandbLogger()
result = onpolicy_trainer(policy, train_collector, test_collector,
                          logger=logger)
```
Parameters
• train_interval (int) – the log interval in log_train_data(). Default to 1000.

• test_interval (int) – the log interval in log_test_data(). Default to 1.

• update_interval (int) – the log interval in log_update_data(). Default to 1000.

• save_interval (int) – the save interval in save_data(). Default to 1000.

• write_flush (bool) – whether to flush tensorboard result after each add_scalar operation. Default to True.

• project (str) – W&B project name. Default to “tianshou”.

• name (str) – W&B run name. Default to None. If None, random name is assigned.

• entity (str) – W&B team/organization name. Default to None.

• run_id (str) – run id of W&B run to be resumed. Default to None.

• config (argparse.Namespace) – experiment configurations. Default to None.

write(step_type: str, step: int, data: Dict[str, Union[int, Number, number, ndarray]]) None[source]

Specify how the writer is used to log data.

Parameters
• step_type (str) – namespace which the data dict belongs to.

• step (int) – the step at which the data dict is logged.

• data (dict) – the data to write with format `{key: value}`.

save_data(epoch: int, env_step: int, gradient_step: int, save_checkpoint_fn: Optional[Callable[[int, int, int], str]] = None) None[source]

Use writer to log metadata when calling `save_checkpoint_fn` in trainer.

Parameters
• epoch (int) – the epoch in trainer.

• env_step (int) – the env_step in trainer.

• gradient_step (int) – the gradient_step in trainer.

• save_checkpoint_fn (function) – a hook defined by user, see trainer documentation for detail.

restore_data() Tuple[int, int, int][source]

Return the metadata from existing log.

If it finds nothing or an error occurs during the recover process, it will return the default parameters.

Returns

epoch, env_step, gradient_step.

tianshou.utils.deprecation(msg: str) None[source]

Deprecation warning wrapper.

class tianshou.utils.MultipleLRSchedulers(*args: LambdaLR)[source]

Bases: `object`

A wrapper for multiple learning rate schedulers.

Every time `step()` is called, it calls the step() method of each of the schedulers that it contains. Example usage:

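```
from torch.optim.lr_scheduler import ConstantLR, ExponentialLR
from tianshou.utils import MultipleLRSchedulers
# opt1 and opt2 are existing torch optimizers
scheduler1 = ConstantLR(opt1, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(opt2, gamma=0.9)
scheduler = MultipleLRSchedulers(scheduler1, scheduler2)
policy = PPOPolicy(..., lr_scheduler=scheduler)
```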
step() None[source]

Take a step in each of the learning rate schedulers.

state_dict() List[Dict][source]

Get state_dict for each of the learning rate schedulers.

Returns

A list of state_dict of learning rate schedulers.

load_state_dict(state_dict: List[Dict]) None[source]

Load states from state_dict.

Parameters

state_dict (List[Dict]) – A list of learning rate scheduler state_dict, in the same order as the schedulers.
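The fan-out pattern behind such a wrapper can be sketched without PyTorch. Both classes below are hypothetical: `CountingScheduler` stands in for a real learning rate scheduler, and `SchedulerGroup` illustrates the wrapper idea without being the library's code. State is saved as a list and loaded back by position, matching the ordering requirement above.

```python
class CountingScheduler:
    """Hypothetical stand-in for a learning rate scheduler."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1

    def state_dict(self):
        return {"steps": self.steps}

    def load_state_dict(self, state):
        self.steps = state["steps"]


class SchedulerGroup:
    """Sketch of a wrapper that fans calls out to several schedulers."""

    def __init__(self, *schedulers):
        self.schedulers = schedulers

    def step(self):
        for s in self.schedulers:
            s.step()

    def state_dict(self):
        return [s.state_dict() for s in self.schedulers]

    def load_state_dict(self, state_dict):
        # states are matched to schedulers by position
        for s, sd in zip(self.schedulers, state_dict):
            s.load_state_dict(sd)


group = SchedulerGroup(CountingScheduler(), CountingScheduler())
group.step()
group.step()
saved = group.state_dict()

restored = SchedulerGroup(CountingScheduler(), CountingScheduler())
restored.load_state_dict(saved)
```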

Pre-defined Networks¶

Common¶

tianshou.utils.net.common.miniblock(input_size: int, output_size: int = 0, norm_layer: ~typing.Optional[~typing.Type[~torch.nn.modules.module.Module]] = None, norm_args: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Any, ...], ~typing.Dict[~typing.Any, ~typing.Any]]] = None, activation: ~typing.Optional[~typing.Type[~torch.nn.modules.module.Module]] = None, act_args: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Any, ...], ~typing.Dict[~typing.Any, ~typing.Any]]] = None, linear_layer: ~typing.Type[~torch.nn.modules.linear.Linear] = <class 'torch.nn.modules.linear.Linear'>) List[Module][source]

Construct a miniblock with given input/output-size, norm layer and activation.

class tianshou.utils.net.common.MLP(input_dim: int, output_dim: int = 0, hidden_sizes: ~typing.Sequence[int] = (), norm_layer: ~typing.Optional[~typing.Union[~typing.Type[~torch.nn.modules.module.Module], ~typing.Sequence[~typing.Type[~torch.nn.modules.module.Module]]]] = None, norm_args: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Any, ...], ~typing.Dict[~typing.Any, ~typing.Any], ~typing.Sequence[~typing.Tuple[~typing.Any, ...]], ~typing.Sequence[~typing.Dict[~typing.Any, ~typing.Any]]]] = None, activation: ~typing.Optional[~typing.Union[~typing.Type[~torch.nn.modules.module.Module], ~typing.Sequence[~typing.Type[~torch.nn.modules.module.Module]]]] = <class 'torch.nn.modules.activation.ReLU'>, act_args: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Any, ...], ~typing.Dict[~typing.Any, ~typing.Any], ~typing.Sequence[~typing.Tuple[~typing.Any, ...]], ~typing.Sequence[~typing.Dict[~typing.Any, ~typing.Any]]]] = None, device: ~typing.Optional[~typing.Union[str, int, ~torch.device]] = None, linear_layer: ~typing.Type[~torch.nn.modules.linear.Linear] = <class 'torch.nn.modules.linear.Linear'>, flatten_input: bool = True)[source]

Bases: `Module`

Simple MLP backbone.

Create an MLP of size input_dim * hidden_sizes[0] * hidden_sizes[1] * … * hidden_sizes[-1] * output_dim

Parameters
• input_dim (int) – dimension of the input vector.

• output_dim (int) – dimension of the output vector. If set to 0, there is no final linear layer.

• hidden_sizes – shape of MLP passed in as a list, not including input_dim and output_dim.

• norm_layer – which normalization to use before activation, e.g., `nn.LayerNorm` or `nn.BatchNorm1d`. You can also pass a list of normalization modules with the same length as hidden_sizes to use a different normalization module in each layer. Default to no normalization.

• activation – which activation to use after each layer. Pass a single nn.Module class to use the same activation for all layers, or a list to use a different activation in each layer. Default to nn.ReLU.

• device – which device to create this model on. Default to None.

• linear_layer – use this module as linear layer. Default to nn.Linear.

• flatten_input (bool) – whether to flatten input data. Default to True.
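The size rule above, including the `output_dim=0` case, can be made concrete by listing the linear layers it implies. `mlp_layer_shapes` is a hypothetical helper written for illustration; it only computes shapes and builds no network.

```python
def mlp_layer_shapes(input_dim, output_dim=0, hidden_sizes=()):
    """Return (in_features, out_features) for each linear layer of the sketch."""
    sizes = [input_dim, *hidden_sizes]
    shapes = list(zip(sizes[:-1], sizes[1:]))
    if output_dim > 0:
        # final projection from the last hidden size (or the input) to the output
        shapes.append((sizes[-1], output_dim))
    return shapes


print(mlp_layer_shapes(4, 2, (64, 64)))  # [(4, 64), (64, 64), (64, 2)]
print(mlp_layer_shapes(4, 0, (64, 64)))  # [(4, 64), (64, 64)]
```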

forward(obs: Union[ndarray, Tensor]) Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class tianshou.utils.net.common.Net(state_shape: ~typing.Union[int, ~typing.Sequence[int]], action_shape: ~typing.Union[int, ~typing.Sequence[int]] = 0, hidden_sizes: ~typing.Sequence[int] = (), norm_layer: ~typing.Optional[~typing.Union[~typing.Type[~torch.nn.modules.module.Module], ~typing.Sequence[~typing.Type[~torch.nn.modules.module.Module]]]] = None, norm_args: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Any, ...], ~typing.Dict[~typing.Any, ~typing.Any], ~typing.Sequence[~typing.Tuple[~typing.Any, ...]], ~typing.Sequence[~typing.Dict[~typing.Any, ~typing.Any]]]] = None, activation: ~typing.Optional[~typing.Union[~typing.Type[~torch.nn.modules.module.Module], ~typing.Sequence[~typing.Type[~torch.nn.modules.module.Module]]]] = <class 'torch.nn.modules.activation.ReLU'>, act_args: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Any, ...], ~typing.Dict[~typing.Any, ~typing.Any], ~typing.Sequence[~typing.Tuple[~typing.Any, ...]], ~typing.Sequence[~typing.Dict[~typing.Any, ~typing.Any]]]] = None, device: ~typing.Union[str, int, ~torch.device] = 'cpu', softmax: bool = False, concat: bool = False, num_atoms: int = 1, dueling_param: ~typing.Optional[~typing.Tuple[~typing.Dict[str, ~typing.Any], ~typing.Dict[str, ~typing.Any]]] = None, linear_layer: ~typing.Type[~torch.nn.modules.linear.Linear] = <class 'torch.nn.modules.linear.Linear'>)[source]

Bases: `Module`

Wrapper of MLP to support more specific DRL usage.

For advanced usage (how to customize the network), please refer to Build the Network.

Parameters
• state_shape – int or a sequence of int of the shape of state.

• action_shape – int or a sequence of int of the shape of action.

• hidden_sizes – shape of MLP passed in as a list.

• norm_layer – which normalization to use before activation, e.g., `nn.LayerNorm` or `nn.BatchNorm1d`. You can also pass a list of normalization modules with the same length as hidden_sizes to use a different normalization module in each layer. Default to no normalization.

• activation – which activation to use after each layer. Pass a single nn.Module class to use the same activation for all layers, or a list to use a different activation in each layer. Default to nn.ReLU.

• device – specify the device when the network actually runs. Default to “cpu”.

• softmax (bool) – whether to apply a softmax layer over the last layer’s output.

• concat (bool) – whether the input shape is concatenated by state_shape and action_shape. If it is True, `action_shape` is not the output shape, but affects the input shape only.

• num_atoms (int) – the number of atoms, used to extend the net to distributional RL. Default to 1 (not used).

• dueling_param (tuple) – whether to use a dueling network to calculate Q values (for Dueling DQN). To use the dueling option, pass a tuple of two dicts (the first for Q and the second for V) containing arguments as described in `MLP`. Default to None.

• linear_layer – use this module as linear layer. Default to nn.Linear.
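For readers unfamiliar with the dueling option, the usual Dueling DQN aggregation combines a state value with mean-centred advantages. The sketch below shows that standard formula on plain lists; that `Net` uses exactly this mean-subtracted form is an assumption, and `dueling_q` is a hypothetical helper.

```python
def dueling_q(value, advantages):
    """Combine a state value and per-action advantages into Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    # subtracting the mean keeps the value and advantage streams identifiable
    return [value + a - mean_adv for a in advantages]


print(dueling_q(1.0, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 2.0]
```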

Please refer to `MLP` for more detailed explanation on the usage of activation, norm_layer, etc.

You can also refer to `Actor`, `Critic`, etc., to see how it is meant to be used.

forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Any][source]

Mapping: obs -> flatten (inside MLP) -> logits.

training: bool
class tianshou.utils.net.common.Recurrent(layer_num: int, state_shape: Union[int, Sequence[int]], action_shape: Union[int, Sequence[int]], device: Union[str, int, device] = 'cpu', hidden_layer_size: int = 128)[source]

Bases: `Module`

Simple Recurrent network based on LSTM.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(obs: Union[ndarray, Tensor], state: Optional[Dict[str, Tensor]] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Dict[str, Tensor]][source]

Mapping: obs -> flatten -> logits.

In the evaluation mode, obs should be with shape `[bsz, dim]`; in the training mode, obs should be with shape `[bsz, len, dim]`. See the code and comment for more detail.

training: bool
class tianshou.utils.net.common.ActorCritic(actor: Module, critic: Module)[source]

Bases: `Module`

An actor-critic network for parsing parameters.

Use `actor_critic.parameters()` instead of set.union or list+list to avoid issue #449.

Parameters
• actor (nn.Module) – the actor network.

• critic (nn.Module) – the critic network.

training: bool
class tianshou.utils.net.common.DataParallelNet(net: Module)[source]

Bases: `Module`

DataParallel wrapper for training agent with multi-GPU.

This class does only the conversion of input data type, from numpy array to torch’s Tensor. If the input is a nested dictionary, the user should create a similar class to do the same thing.

Parameters

net (nn.Module) – the network to be distributed in different GPUs.

forward(obs: Union[ndarray, Tensor], *args: Any, **kwargs: Any) Tuple[Any, Any][source]

training: bool
class tianshou.utils.net.common.EnsembleLinear(ensemble_size: int, in_feature: int, out_feature: int, bias: bool = True)[source]

Bases: `Module`

Linear Layer of Ensemble network.

Parameters
• ensemble_size (int) – Number of subnets in the ensemble.

• in_feature (int) – dimension of the input vector.

• out_feature (int) – dimension of the output vector.

• bias (bool) – whether to include an additive bias. Default to True.

forward(x: Tensor) Tensor[source]

training: bool
class tianshou.utils.net.common.BranchingNet(state_shape: ~typing.Union[int, ~typing.Sequence[int]], num_branches: int = 0, action_per_branch: int = 2, common_hidden_sizes: ~typing.List[int] = [], value_hidden_sizes: ~typing.List[int] = [], action_hidden_sizes: ~typing.List[int] = [], norm_layer: ~typing.Optional[~typing.Type[~torch.nn.modules.module.Module]] = None, norm_args: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Any, ...], ~typing.Dict[~typing.Any, ~typing.Any], ~typing.Sequence[~typing.Tuple[~typing.Any, ...]], ~typing.Sequence[~typing.Dict[~typing.Any, ~typing.Any]]]] = None, activation: ~typing.Optional[~typing.Type[~torch.nn.modules.module.Module]] = <class 'torch.nn.modules.activation.ReLU'>, act_args: ~typing.Optional[~typing.Union[~typing.Tuple[~typing.Any, ...], ~typing.Dict[~typing.Any, ~typing.Any], ~typing.Sequence[~typing.Tuple[~typing.Any, ...]], ~typing.Sequence[~typing.Dict[~typing.Any, ~typing.Any]]]] = None, device: ~typing.Union[str, int, ~torch.device] = 'cpu')[source]

Bases: `Module`

Branching dual Q network.

Network for the BranchingDQNPolicy. It uses a common network module, a value module, and action “branches”, one for each dimension. It allows the Q-value output to scale linearly with the number of dimensions in the action space. For more info please refer to arXiv:1711.08946.

Parameters
• state_shape – int or a sequence of int of the shape of state.

• num_branches (int) – the number of action branches, one for each action dimension.

• action_per_branch (int) – the number of actions in each dimension.

• common_hidden_sizes – shape of the common MLP network passed in as a list.

• value_hidden_sizes – shape of the value MLP network passed in as a list.

• action_hidden_sizes – shape of the action MLP network passed in as a list.

• norm_layer – which normalization to use before activation, e.g., `nn.LayerNorm` or `nn.BatchNorm1d`. You can also pass a list of normalization modules with the same length as hidden_sizes to use a different normalization module in each layer. Default to no normalization.

• activation – which activation to use after each layer. Pass a single nn.Module class to use the same activation for all layers, or a list to use a different activation in each layer. Default to nn.ReLU.

• device – specify the device when the network actually runs. Default to “cpu”.

training: bool
forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Any][source]

Mapping: obs -> model -> logits.

tianshou.utils.net.common.get_dict_state_decorator(state_shape: Dict[str, Union[int, Sequence[int]]], keys: Sequence[str]) Tuple[Callable, int][source]

A helper function to make Net or equivalent classes (e.g. Actor, Critic) applicable to dict state.

The first return item, `decorator_fn`, alters the forward function of the given class by preprocessing the observation. The preprocessing flattens each entry of the observation and concatenates the results in the order given by `keys`. The batch dimension is preserved if present. The resulting observation shape equals `new_state_shape`, the second return item.

Parameters
• state_shape – a dictionary giving the shape of each state entry.

• keys – a list of state keys. The flattened observation follows the order of this list.

Returns

a 2-item tuple of `decorator_fn` and `new_state_shape`.
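The preprocessing step can be illustrated on plain nested lists. This is a sketch for a single observation without a batch dimension, and it does not show the decorator itself; `new_state_dim`, `flatten`, and `preprocess` are hypothetical helpers.

```python
import math


def new_state_dim(state_shape, keys):
    """Total flattened size of the selected entries."""
    total = 0
    for k in keys:
        shape = state_shape[k]
        total += shape if isinstance(shape, int) else math.prod(shape)
    return total


def flatten(x):
    """Recursively flatten nested lists into one flat list."""
    if isinstance(x, (list, tuple)):
        return [v for item in x for v in flatten(item)]
    return [x]


def preprocess(obs, keys):
    # concatenate the flattened entries in the order given by keys
    return [v for k in keys for v in flatten(obs[k])]


state_shape = {"pos": (2, 2), "vel": 3}
keys = ["vel", "pos"]
obs = {"pos": [[1, 2], [3, 4]], "vel": [7, 8, 9]}
flat = preprocess(obs, keys)  # [7, 8, 9, 1, 2, 3, 4]
```

The length of `flat` equals `new_state_dim(state_shape, keys)`, which plays the role of the second return item.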

Discrete¶

class tianshou.utils.net.discrete.Actor(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), softmax_output: bool = True, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, device] = 'cpu')[source]

Bases: `Module`

Simple actor network.

Creates an actor operating in a discrete action space with the structure preprocess_net -> action_shape.

Parameters
• preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

• action_shape – a sequence of int for the shape of action.

• hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

• softmax_output (bool) – whether to apply a softmax layer over the last layer’s output.

• preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to Build the Network.

See `Net` for an example of how preprocess_net can be defined.

forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Any][source]

Mapping: s -> Q(s, *).

training: bool
class tianshou.utils.net.discrete.Critic(preprocess_net: Module, hidden_sizes: Sequence[int] = (), last_size: int = 1, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, device] = 'cpu')[source]

Bases: `Module`

Simple critic network. Creates a critic operating in a discrete action space with the structure preprocess_net -> 1 (q value).

Parameters
• preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

• hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

• last_size (int) – the output dimension of Critic network. Default to 1.

• preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to Build the Network.

See `Net` for an example of how preprocess_net can be defined.

forward(obs: Union[ndarray, Tensor], **kwargs: Any) Tensor[source]

Mapping: s -> V(s).

training: bool
class tianshou.utils.net.discrete.CosineEmbeddingNetwork(num_cosines: int, embedding_dim: int)[source]

Bases: `Module`

Cosine embedding network for IQN. Convert a scalar in [0, 1] to a list of n-dim vectors.

Parameters
• num_cosines – the number of cosines used for the embedding.

• embedding_dim – the dimension of the embedding/output.

Note

From https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/master/fqf_iqn_qrdqn/network.py
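The cosine features at the heart of such an embedding can be sketched as follows. Only the feature step is shown; the learned layer that maps the features to `embedding_dim` is omitted. Using frequencies 1 through `num_cosines` is an assumption, and `cosine_features` is a hypothetical helper.

```python
import math


def cosine_features(tau, num_cosines):
    """Map a scalar tau in [0, 1] to num_cosines cosine features."""
    # assumption: frequencies i = 1..num_cosines
    return [math.cos(math.pi * i * tau) for i in range(1, num_cosines + 1)]


features = cosine_features(0.5, 4)  # approximately [0, -1, 0, 1]
```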

forward(taus: Tensor) Tensor[source]

training: bool
class tianshou.utils.net.discrete.ImplicitQuantileNetwork(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), num_cosines: int = 64, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, device] = 'cpu')[source]

Bases: `Critic`

Implicit Quantile Network.

Parameters
• preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

• action_shape – a sequence of int for the shape of action.

• hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

• num_cosines (int) – the number of cosines to use for cosine embedding. Default to 64.

• preprocess_net_output_dim (int) – the output dimension of preprocess_net.

Note

Although this class inherits Critic, it is actually a quantile Q-Network with output shape (batch_size, action_dim, sample_size).

The second item of the first return value is the tau vector.

forward(obs: Union[ndarray, Tensor], sample_size: int, **kwargs: Any) Tuple[Any, Tensor][source]

Mapping: s -> Q(s, *).

training: bool
class tianshou.utils.net.discrete.FractionProposalNetwork(num_fractions: int, embedding_dim: int)[source]

Bases: `Module`

Fraction proposal network for FQF.

Parameters
• num_fractions – the number of fractions to propose.

• embedding_dim – the dimension of the embedding/input.

forward(obs_embeddings: Tensor) Tuple[Tensor, Tensor, Tensor][source]

training: bool
class tianshou.utils.net.discrete.FullQuantileFunction(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), num_cosines: int = 64, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, device] = 'cpu')[source]

Full(y parameterized) Quantile Function.

Parameters
• preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

• action_shape – a sequence of int for the shape of action.

• hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).

• num_cosines (int) – the number of cosines to use for cosine embedding. Default to 64.

• preprocess_net_output_dim (int) – the output dimension of preprocess_net.

Note

The first return value is a tuple of (quantiles, fractions, quantiles_tau), where fractions is a Batch(taus, tau_hats, entropies).

forward(obs: Union[ndarray, Tensor], propose_model: FractionProposalNetwork, fractions: Optional[Batch] = None, **kwargs: Any) Tuple[Any, Tensor][source]

Mapping: s -> Q(s, *).

training: bool
class tianshou.utils.net.discrete.NoisyLinear(in_features: int, out_features: int, noisy_std: float = 0.5)[source]

Bases: `Module`

Implementation of Noisy Networks. arXiv:1706.10295.

Parameters
• in_features (int) – the number of input features.

• out_features (int) – the number of output features.

• noisy_std (float) – initial standard deviation of noisy linear layers.
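The Noisy Networks paper describes factorized Gaussian noise, where each noise sample is shaped by sign(x)·sqrt(|x|) and the weight noise is an outer product of input and output noise vectors. The sketch below illustrates that scheme in plain Python. That this class uses the factorized variant, and that its `f` method corresponds to the shaping function, are assumptions; `factorized_noise` is a hypothetical helper.

```python
import math
import random


def f(x):
    """Noise-shaping function sign(x) * sqrt(|x|)."""
    return math.copysign(math.sqrt(abs(x)), x)


def factorized_noise(in_features, out_features, rng):
    eps_in = [f(rng.gauss(0.0, 1.0)) for _ in range(in_features)]
    eps_out = [f(rng.gauss(0.0, 1.0)) for _ in range(out_features)]
    # weight noise is the outer product; bias noise reuses eps_out
    eps_w = [[o * i for i in eps_in] for o in eps_out]
    return eps_w, eps_out


eps_w, eps_b = factorized_noise(3, 2, random.Random(0))
```

Factorizing the noise needs only `in_features + out_features` random draws per sample instead of one per weight.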

reset() None[source]
f(x: Tensor) Tensor[source]
sample() None[source]
forward(x: Tensor) Tensor[source]

training: bool
tianshou.utils.net.discrete.sample_noise(model: Module) bool[source]

Sample the random noises of NoisyLinear modules in the model.

Parameters

model – a PyTorch module which may have NoisyLinear submodules.

Returns

True if model has at least one NoisyLinear submodule; otherwise, False.

class tianshou.utils.net.discrete.IntrinsicCuriosityModule(feature_net: Module, feature_dim: int, action_dim: int, hidden_sizes: Sequence[int] = (), device: Union[str, device] = 'cpu')[source]

Bases: `Module`

Implementation of Intrinsic Curiosity Module. arXiv:1705.05363.

Parameters
• feature_net (torch.nn.Module) – a self-defined feature_net which outputs a flattened hidden state.

• feature_dim (int) – input dimension of the feature net.

• action_dim (int) – dimension of the action space.

• hidden_sizes – hidden layer sizes for forward and inverse models.

• device – device for the module.

forward(s1: Union[ndarray, Tensor], act: Union[ndarray, Tensor], s2: Union[ndarray, Tensor], **kwargs: Any) Tuple[Tensor, Tensor][source]

Mapping: s1, act, s2 -> mse_loss, act_hat.

training: bool

Continuous¶

class tianshou.utils.net.continuous.Actor(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), max_action: float = 1.0, device: Union[str, int, device] = 'cpu', preprocess_net_output_dim: Optional[int] = None)[source]

Bases: `Module`

Simple actor network. Creates an actor operating in a continuous action space with the structure preprocess_net -> action_shape.

Parameters
• preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

• action_shape – a sequence of int for the shape of action.

• hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).

• max_action (float) – the scale for the final action logits. Default to 1.

• preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to Build the Network.

Please refer to `Net` for an example of how preprocess_net is suggested to be defined.

forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tensor, Any][source]

Mapping: obs -> logits -> action.
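
The mapping can be illustrated with a short NumPy sketch. It assumes that the final logits are squashed with tanh and then scaled by max_action, and it uses a single hypothetical linear layer (`weight`, `bias`) in place of the MLP that follows preprocess_net.

```python
import numpy as np


def actor_forward_sketch(hidden, weight, bias, max_action=1.0):
    """Map a flattened hidden state to a bounded continuous action."""
    logits = hidden @ weight + bias
    # Assumed squashing: tanh bounds the output, max_action rescales it.
    return max_action * np.tanh(logits)


rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 8))   # batch of 5 hidden states
weight = rng.normal(size=(8, 2))   # 2-dimensional action
bias = np.zeros(2)
action = actor_forward_sketch(hidden, weight, bias, max_action=2.0)
```

Under this assumption every action component lies in the interval [-max_action, max_action].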

training: bool
class tianshou.utils.net.continuous.Critic(preprocess_net: Module, hidden_sizes: Sequence[int] = (), device: Union[str, int, device] = 'cpu', preprocess_net_output_dim: Optional[int] = None, linear_layer: Type[Linear] = <class 'torch.nn.modules.linear.Linear'>, flatten_input: bool = True)[source]

Bases: `Module`

Simple critic network. Creates a critic operating in a continuous action space with the structure preprocess_net -> 1 (Q value).

Parameters
• preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

• hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).

• preprocess_net_output_dim (int) – the output dimension of preprocess_net.

• linear_layer – use this module as linear layer. Default to nn.Linear.

• flatten_input (bool) – whether to flatten input data for the last layer. Default to True.

For advanced usage (how to customize the network), please refer to Build the Network.

Please refer to `Net` for an example of how preprocess_net is suggested to be defined.

forward(obs: Union[ndarray, Tensor], act: Optional[Union[ndarray, Tensor]] = None, info: Dict[str, Any] = {}) Tensor[source]

Mapping: (s, a) -> logits -> Q(s, a).
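
A minimal NumPy sketch of this mapping, assuming that the observation and the optional action are flattened per sample and concatenated before being passed through the network. The single linear layer (`weight`, `bias`) is a hypothetical stand-in for preprocess_net plus the final MLP.

```python
import numpy as np


def critic_forward_sketch(obs, act, weight, bias):
    """Map (obs, act) to one Q value per sample."""
    x = obs.reshape(obs.shape[0], -1)
    if act is not None:
        # Flatten the action and append it to the observation features.
        x = np.concatenate([x, act.reshape(act.shape[0], -1)], axis=1)
    return x @ weight + bias


rng = np.random.default_rng(0)
obs = rng.normal(size=(4, 2, 3))   # 4 samples, each flattened to 6 values
act = rng.normal(size=(4, 2))      # 2-dimensional actions
weight = rng.normal(size=(8, 1))   # 6 observation + 2 action inputs
bias = np.zeros(1)
q_value = critic_forward_sketch(obs, act, weight, bias)
```

When `act` is None the same function acts as a state-value estimator, provided the weight shape matches the flattened observation alone.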

training: bool
class tianshou.utils.net.continuous.ActorProb(preprocess_net: Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), max_action: float = 1.0, device: Union[str, int, device] = 'cpu', unbounded: bool = False, conditioned_sigma: bool = False, preprocess_net_output_dim: Optional[int] = None)[source]

Bases: `Module`

Simple actor network whose output parameterizes a Gaussian distribution.

Parameters
• preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

• action_shape – a sequence of int for the shape of action.

• hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).

• max_action (float) – the scale for the final action logits. Default to 1.

• unbounded (bool) – whether to leave the final logits unbounded. If False, a tanh activation is applied to the final logits; if True, it is skipped. Default to False.

• conditioned_sigma (bool) – True when sigma is calculated from the input, False when sigma is an independent parameter. Default to False.

• preprocess_net_output_dim (int) – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to Build the Network.

Please refer to `Net` for an example of how preprocess_net is suggested to be defined.

forward(obs: Union[ndarray, Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) Tuple[Tuple[Tensor, Tensor], Any][source]

Mapping: obs -> logits -> (mu, sigma).
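
The sketch below illustrates, in NumPy, how the unbounded and conditioned_sigma options could shape the returned pair. It rests on several assumptions: tanh is skipped when unbounded is True, sigma is the exponential of a log-sigma value, and that value is clipped to [-20, 2] when it is computed from the input. The weight names and the clipping range are hypothetical.

```python
import numpy as np


def actor_prob_forward_sketch(hidden, w_mu, w_sigma, log_sigma_param,
                              max_action=1.0, unbounded=False,
                              conditioned_sigma=False):
    """Map a flattened hidden state to (mu, sigma) of a Gaussian policy."""
    mu = hidden @ w_mu
    if not unbounded:
        mu = max_action * np.tanh(mu)
    if conditioned_sigma:
        # Sigma depends on the input; clip for numerical stability (assumed range).
        log_sigma = np.clip(hidden @ w_sigma, -20.0, 2.0)
    else:
        # Sigma is an input-independent parameter shared by every sample.
        log_sigma = np.broadcast_to(log_sigma_param, mu.shape)
    return mu, np.exp(log_sigma)


rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 4))
w_mu = rng.normal(size=(4, 2))
w_sigma = rng.normal(size=(4, 2))
log_sigma_param = np.zeros((1, 2))

mu, sigma = actor_prob_forward_sketch(
    hidden, w_mu, w_sigma, log_sigma_param, max_action=2.0)
mu_free, sigma_cond = actor_prob_forward_sketch(
    hidden, w_mu, w_sigma, log_sigma_param,
    unbounded=True, conditioned_sigma=True)
```

Working in log space keeps sigma strictly positive without any explicit constraint on the parameters.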

training: bool
class tianshou.utils.net.continuous.RecurrentActorProb(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int], hidden_layer_size: int = 128, max_action: float = 1.0, device: Union[str, int, device] = 'cpu', unbounded: bool = False, conditioned_sigma: bool = False)[source]

Bases: `Module`

Recurrent version of ActorProb.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(obs: Union[ndarray, Tensor], state: Optional[Dict[str, Tensor]] = None, info: Dict[str, Any] = {}) Tuple[Tuple[Tensor, Tensor], Dict[str, Tensor]][source]

Almost the same as `Recurrent`.

training: bool
class tianshou.utils.net.continuous.RecurrentCritic(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int] = [0], device: Union[str, int, device] = 'cpu', hidden_layer_size: int = 128)[source]

Bases: `Module`

Recurrent version of Critic.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(obs: Union[ndarray, Tensor], act: Optional[Union[ndarray, Tensor]] = None, info: Dict[str, Any] = {}) Tensor[source]

Almost the same as `Recurrent`.

training: bool
class tianshou.utils.net.continuous.Perturbation(preprocess_net: Module, max_action: float, device: Union[str, int, device] = 'cpu', phi: float = 0.05)[source]

Bases: `Module`

Implementation of the perturbation network in the BCQ algorithm. Given a state and an action, it generates a perturbed action.

Parameters
• preprocess_net (torch.nn.Module) – a self-defined preprocess_net which outputs a flattened hidden state.

• max_action (float) – the maximum value of each dimension of action.

• device (Union[str, int, torch.device]) – which device to create this model on. Default to cpu.

• phi (float) – max perturbation parameter for BCQ. Default to 0.05.

For advanced usage (how to customize the network), please refer to Build the Network.

You can refer to examples/offline/offline_bcq.py to see how to use it.
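
As a rough illustration of how phi and max_action interact, the NumPy sketch below follows the usual BCQ perturbation rule: a tanh-squashed correction of magnitude at most phi * max_action is added to the action, and the result is clipped to the valid range. `net_output` is a hypothetical stand-in for the output of preprocess_net applied to the concatenated state and action.

```python
import numpy as np


def perturb_sketch(action, net_output, max_action, phi=0.05):
    """Add a small bounded correction to an action and clip to the valid range."""
    noise = phi * max_action * np.tanh(net_output)
    return np.clip(action + noise, -max_action, max_action)


rng = np.random.default_rng(0)
max_action, phi = 2.0, 0.05
action = rng.uniform(-max_action, max_action, size=(6, 3))
net_output = rng.normal(scale=5.0, size=(6, 3))
perturbed = perturb_sketch(action, net_output, max_action, phi)
```

A small phi keeps the perturbed action close to the one proposed by the generative model, which is the point of the constraint in BCQ.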

forward(state: Tensor, action: Tensor) Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class tianshou.utils.net.continuous.VAE(encoder: Module, decoder: Module, hidden_dim: int, latent_dim: int, max_action: float, device: Union[str, device] = 'cpu')[source]

Bases: `Module`

Implementation of a VAE. It models the distribution of actions. Given a state, it can generate actions similar to those in the batch. It is used in the BCQ algorithm.

Parameters
• encoder (torch.nn.Module) – the encoder in VAE. Its input_dim must be state_dim + action_dim, and output_dim must be hidden_dim.

• decoder (torch.nn.Module) – the decoder in VAE. Its input_dim must be state_dim + latent_dim, and output_dim must be action_dim.

• hidden_dim (int) – the size of the last linear-layer in encoder.

• latent_dim (int) – the size of latent layer.

• max_action (float) – the maximum value of each dimension of action.

• device (Union[str, torch.device]) – which device to create this model on. Default to “cpu”.

For advanced usage (how to customize the network), please refer to Build the Network.

You can refer to examples/offline/offline_bcq.py to see how to use it.
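
The dimension constraints listed above can be checked with a small NumPy sketch of a conditional VAE. It is an illustration under assumptions, not the library's code: the weight matrices are hypothetical linear stand-ins for the encoder, decoder, and the mean and log-std heads; the log-std clipping range and the clipping of sampled latents to [-0.5, 0.5] are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, hidden_dim, latent_dim = 3, 2, 5, 4
max_action = 1.0

# Hypothetical linear stand-ins; shapes follow the documented constraints.
W_enc = rng.normal(size=(state_dim + action_dim, hidden_dim))
W_mean = rng.normal(size=(hidden_dim, latent_dim))
W_log_std = rng.normal(size=(hidden_dim, latent_dim))
W_dec = rng.normal(size=(state_dim + latent_dim, action_dim))


def decode_sketch(state, latent_z=None):
    """Generate an action from a state and an optional latent vector."""
    if latent_z is None:
        # Assumed behaviour: sample a latent and clip it to a narrow range.
        latent_z = np.clip(
            rng.standard_normal((state.shape[0], latent_dim)), -0.5, 0.5)
    decoder_in = np.concatenate([state, latent_z], axis=-1)
    return max_action * np.tanh(decoder_in @ W_dec)


def vae_forward_sketch(state, action):
    """Return (reconstruction, mean, std) for a batch of (state, action)."""
    hidden = np.maximum(np.concatenate([state, action], axis=-1) @ W_enc, 0.0)
    mean = hidden @ W_mean
    std = np.exp(np.clip(hidden @ W_log_std, -4.0, 15.0))
    # Reparameterization trick: z = mean + std * eps with eps ~ N(0, I).
    latent_z = mean + std * rng.standard_normal(std.shape)
    return decode_sketch(state, latent_z), mean, std


state = rng.normal(size=(7, state_dim))
action = rng.uniform(-max_action, max_action, size=(7, action_dim))
reconstruction, mean, std = vae_forward_sketch(state, action)
generated = decode_sketch(state)
```

Calling `decode_sketch` without a latent vector corresponds to generating candidate actions for a state at evaluation time.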

forward(state: Tensor, action: Tensor) Tuple[Tensor, Tensor, Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

decode(state: Tensor, latent_z: Optional[Tensor] = None) Tensor[source]
training: bool