tianshou.utils
class tianshou.utils.MovAvg(size: int = 100)
Bases: object
Class for moving average. It automatically excludes infinity and NaN values. Usage:
>>> stat = MovAvg(size=66)
>>> stat.add(torch.tensor(5))
5.0
>>> stat.add(float('inf'))  # which will not add to stat
5.0
>>> stat.add([6, 7, 8])
6.5
>>> stat.get()
6.5
>>> print(f'{stat.mean():.2f}±{stat.std():.2f}')
6.50±1.12
class tianshou.utils.RunningMeanStd(mean: Union[float, numpy.ndarray] = 0.0, std: Union[float, numpy.ndarray] = 1.0)
Bases: object
Calculates the running mean and standard deviation of a data stream, following
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
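A minimal sketch of the intended usage; the update() method and the mean/var attribute names are assumptions here, so check the source for the exact interface:

import numpy as np

from tianshou.utils import RunningMeanStd

rms = RunningMeanStd()  # mean=0.0, std=1.0 by default
# Feed batches of samples; update() is assumed to merge batch statistics
# via the parallel algorithm linked above.
rms.update(np.random.randn(256, 4))
rms.update(np.random.randn(256, 4))
print(rms.mean, rms.var)  # running statistics (assumed attribute names)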
class tianshou.utils.BaseLogger(writer: Any)
Bases: abc.ABC
The base class for any logger that is compatible with the trainer.
abstract write(key: str, x: int, y: Union[int, numbers.Number, numpy.number, numpy.ndarray], **kwargs: Any) → None
Specify how the writer is used to log data.
- Parameters
key (str) – the namespace to which the input data tuple belongs.
x (int) – the abscissa (e.g., the step) of the input data tuple.
y – the ordinate (the value to log) of the input data tuple.
log_train_data(collect_result: dict, step: int) → None
Use writer to log statistics generated during training.
- Parameters
collect_result – a dict containing information on the data collected during the training stage, i.e., the return value of collector.collect().
step (int) – the timestep at which collect_result is logged.
log_update_data(update_result: dict, step: int) → None
Use writer to log statistics generated during updating.
- Parameters
update_result – a dict containing information on the data collected during the updating stage, i.e., the return value of policy.update().
step (int) – the timestep at which update_result is logged.
log_test_data(collect_result: dict, step: int) → None
Use writer to log statistics generated during evaluation.
- Parameters
collect_result – a dict containing information on the data collected during the evaluation stage, i.e., the return value of collector.collect().
step (int) – the timestep at which collect_result is logged.
save_data(epoch: int, env_step: int, gradient_step: int, save_checkpoint_fn: Optional[Callable[[int, int, int], None]] = None) → None
Use writer to log metadata when calling save_checkpoint_fn in trainer.
- Parameters
epoch (int) – the epoch in trainer.
env_step (int) – the env_step in trainer.
gradient_step (int) – the gradient_step in trainer.
save_checkpoint_fn (function) – a hook defined by the user; see the trainer documentation for details.
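Since write() is the only abstract method, a custom logger can be very small. A sketch (PrintLogger is a hypothetical name, not part of tianshou):

from tianshou.utils import BaseLogger

class PrintLogger(BaseLogger):
    """Hypothetical logger that prints every data point to stdout."""

    def __init__(self):
        super().__init__(writer=None)  # no underlying writer needed

    def write(self, key, x, y, **kwargs):
        print(f"{key} @ step {x}: {y}")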
class tianshou.utils.BasicLogger(writer: torch.utils.tensorboard.writer.SummaryWriter, train_interval: int = 1000, test_interval: int = 1, update_interval: int = 1000, save_interval: int = 1)
Bases: tianshou.utils.log_tools.BaseLogger
A logger that relies on the tensorboard SummaryWriter by default to visualize and log statistics.
You can also override write() to use your own writer.
- Parameters
writer (SummaryWriter) – the writer to log data.
train_interval (int) – the log interval in log_train_data(). Default to 1000.
test_interval (int) – the log interval in log_test_data(). Default to 1.
update_interval (int) – the log interval in log_update_data(). Default to 1000.
save_interval (int) – the save interval in save_data(). Default to 1 (save at the end of each epoch).
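A typical setup is sketched below; passing the logger to a trainer via a logger keyword argument is the usual wiring, but check your trainer's signature:

from torch.utils.tensorboard import SummaryWriter

from tianshou.utils import BasicLogger

writer = SummaryWriter("log/dqn")  # tensorboard event files go here
logger = BasicLogger(writer, train_interval=1000, update_interval=1000)
# then pass it to a trainer, e.g. offpolicy_trainer(..., logger=logger)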
write(key: str, x: int, y: Union[int, numbers.Number, numpy.number, numpy.ndarray], **kwargs: Any) → None
Specify how the writer is used to log data.
- Parameters
key (str) – the namespace to which the input data tuple belongs.
x (int) – the abscissa (e.g., the step) of the input data tuple.
y – the ordinate (the value to log) of the input data tuple.
log_train_data(collect_result: dict, step: int) → None
Use writer to log statistics generated during training.
- Parameters
collect_result – a dict containing information on the data collected during the training stage, i.e., the return value of collector.collect().
step (int) – the timestep at which collect_result is logged.
Note
collect_result will be modified in-place with "rew" and "len" keys.
log_test_data(collect_result: dict, step: int) → None
Use writer to log statistics generated during evaluation.
- Parameters
collect_result – a dict containing information on the data collected during the evaluation stage, i.e., the return value of collector.collect().
step (int) – the timestep at which collect_result is logged.
Note
collect_result will be modified in-place with "rew", "rew_std", "len", and "len_std" keys.
log_update_data(update_result: dict, step: int) → None
Use writer to log statistics generated during updating.
- Parameters
update_result – a dict containing information on the data collected during the updating stage, i.e., the return value of policy.update().
step (int) – the timestep at which update_result is logged.
save_data(epoch: int, env_step: int, gradient_step: int, save_checkpoint_fn: Optional[Callable[[int, int, int], None]] = None) → None
Use writer to log metadata when calling save_checkpoint_fn in trainer.
- Parameters
epoch (int) – the epoch in trainer.
env_step (int) – the env_step in trainer.
gradient_step (int) – the gradient_step in trainer.
save_checkpoint_fn (function) – a hook defined by the user; see the trainer documentation for details.
class tianshou.utils.LazyLogger
Bases: tianshou.utils.log_tools.BasicLogger
A logger that does nothing, used as a placeholder in the trainer.
Pre-defined Networks
Common
tianshou.utils.net.common.miniblock(input_size: int, output_size: int = 0, norm_layer: Optional[Type[torch.nn.modules.module.Module]] = None, activation: Optional[Type[torch.nn.modules.module.Module]] = None) → List[torch.nn.modules.module.Module]
Construct a miniblock with given input/output size, norm layer and activation.
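For illustration, a sketch that assembles one block into a model (the argument values are arbitrary, and the norm-before-activation ordering follows the MLP documentation below):

import torch.nn as nn

from tianshou.utils.net.common import miniblock

# Linear(4, 64) followed by LayerNorm and ReLU, returned as a list of modules
layers = miniblock(4, 64, norm_layer=nn.LayerNorm, activation=nn.ReLU)
model = nn.Sequential(*layers)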
class tianshou.utils.net.common.MLP(input_dim: int, output_dim: int = 0, hidden_sizes: Sequence[int] = (), norm_layer: Optional[Union[Type[torch.nn.modules.module.Module], Sequence[Type[torch.nn.modules.module.Module]]]] = None, activation: Optional[Union[Type[torch.nn.modules.module.Module], Sequence[Type[torch.nn.modules.module.Module]]]] = <class 'torch.nn.modules.activation.ReLU'>, device: Optional[Union[str, int, torch.device]] = None)
Bases: torch.nn.modules.module.Module
Simple MLP backbone.
Creates an MLP of size input_dim × hidden_sizes[0] × hidden_sizes[1] × … × hidden_sizes[-1] × output_dim.
- Parameters
input_dim (int) – dimension of the input vector.
output_dim (int) – dimension of the output vector. If set to 0, there is no final linear layer.
hidden_sizes – shape of the MLP passed in as a list, not including input_dim and output_dim.
norm_layer – the normalization to apply before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. Default to no normalization. You can also pass a list of normalization modules with the same length as hidden_sizes to use a different normalization module in each layer.
activation – the activation to apply after each layer; either a single module class used for all layers, or a list with one activation per layer. Default to nn.ReLU.
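For example, a minimal sketch following the signature above:

import torch

from tianshou.utils.net.common import MLP

# 4 -> 64 -> 64 -> 2 network with the default ReLU activations
net = MLP(input_dim=4, output_dim=2, hidden_sizes=[64, 64])
out = net(torch.randn(8, 4))  # shape: (8, 2)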
forward(x: Union[numpy.ndarray, torch.Tensor]) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
training: bool
class tianshou.utils.net.common.Net(state_shape: Union[int, Sequence[int]], action_shape: Union[int, Sequence[int]] = 0, hidden_sizes: Sequence[int] = (), norm_layer: Optional[Type[torch.nn.modules.module.Module]] = None, activation: Optional[Type[torch.nn.modules.module.Module]] = <class 'torch.nn.modules.activation.ReLU'>, device: Union[str, int, torch.device] = 'cpu', softmax: bool = False, concat: bool = False, num_atoms: int = 1, dueling_param: Optional[Tuple[Dict[str, Any], Dict[str, Any]]] = None)
Bases: torch.nn.modules.module.Module
Wrapper of MLP to support more specific DRL usage.
For advanced usage (how to customize the network), please refer to Build the Network.
- Parameters
state_shape – int or a sequence of int for the shape of state.
action_shape – int or a sequence of int for the shape of action.
hidden_sizes – shape of the MLP passed in as a list.
norm_layer – the normalization to apply before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. Default to no normalization. You can also pass a list of normalization modules with the same length as hidden_sizes to use a different normalization module in each layer.
activation – the activation to apply after each layer; either a single module class used for all layers, or a list with one activation per layer. Default to nn.ReLU.
device – specify the device when the network actually runs. Default to "cpu".
softmax (bool) – whether to apply a softmax layer over the last layer's output.
concat (bool) – whether the input shape is the concatenation of state_shape and action_shape. If True, action_shape does not determine the output shape; it only affects the input shape.
num_atoms (int) – expand to a net for distributional RL with this number of atoms. Default to 1 (not used).
dueling_param (Tuple[Dict, Dict]) – whether to use a dueling network to calculate Q values (for Dueling DQN). To use the dueling option, pass a tuple of two dicts (the first for Q and the second for V) with self-defined arguments as stated in MLP. Default to None.
See also
Please refer to MLP for a more detailed explanation of the usage of activation, norm_layer, etc.
You can also refer to Actor, Critic, etc., to see how it is suggested to be used.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> flatten (inside MLP) -> logits.
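A minimal sketch, e.g. as a Q-network for a 4-dimensional observation and 2 discrete actions:

import numpy as np

from tianshou.utils.net.common import Net

net = Net(state_shape=4, action_shape=2, hidden_sizes=[64, 64])
logits, state = net(np.random.randn(8, 4))  # logits shape: (8, 2)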
training: bool
class tianshou.utils.net.common.Recurrent(layer_num: int, state_shape: Union[int, Sequence[int]], action_shape: Union[int, Sequence[int]], device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Simple Recurrent network based on LSTM.
For advanced usage (how to customize the network), please refer to Build the Network.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Dict[str, torch.Tensor]] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Dict[str, torch.Tensor]]
Mapping: s -> flatten -> logits.
In evaluation mode, s should have shape [bsz, dim]; in training mode, s should have shape [bsz, len, dim]. See the code and comments for more detail.
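A sketch of the shape convention above (the constructor arguments are illustrative):

import torch

from tianshou.utils.net.common import Recurrent

net = Recurrent(layer_num=2, state_shape=4, action_shape=2)
# training-mode input of shape [bsz, len, dim]
logits, hidden = net(torch.randn(8, 5, 4))  # logits shape: (8, 2)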
training: bool
Discrete
class tianshou.utils.net.discrete.Actor(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), softmax_output: bool = True, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, torch.device] = 'cpu')
Bases: torch.nn.modules.module.Module
Simple actor network.
It creates an actor that operates in a discrete action space, with structure preprocess_net -> action_shape.
- Parameters
preprocess_net – a self-defined preprocess_net that outputs a flattened hidden state.
action_shape – a sequence of int for the shape of action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).
softmax_output (bool) – whether to apply a softmax layer over the last layer's output.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> Q(s, *).
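A sketch wiring a Net backbone into the actor (the dimensions are illustrative):

import numpy as np

from tianshou.utils.net.common import Net
from tianshou.utils.net.discrete import Actor

# a Net without action_shape serves as the preprocess network
preprocess = Net(state_shape=4, hidden_sizes=[64])
actor = Actor(preprocess, action_shape=2, preprocess_net_output_dim=64)
probs, state = actor(np.random.randn(8, 4))  # softmax output, shape (8, 2)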
training: bool
class tianshou.utils.net.discrete.Critic(preprocess_net: torch.nn.modules.module.Module, hidden_sizes: Sequence[int] = (), last_size: int = 1, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, torch.device] = 'cpu')
Bases: torch.nn.modules.module.Module
Simple critic network.
It creates a critic that operates in a discrete action space, with structure preprocess_net -> 1 (Q value).
- Parameters
preprocess_net – a self-defined preprocess_net that outputs a flattened hidden state.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).
last_size (int) – the output dimension of the Critic network. Default to 1.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], **kwargs: Any) → torch.Tensor
Mapping: s -> V(s).
training: bool
class tianshou.utils.net.discrete.CosineEmbeddingNetwork(num_cosines: int, embedding_dim: int)
Bases: torch.nn.modules.module.Module
Cosine embedding network for IQN. It converts a scalar in [0, 1] to a list of n-dim vectors.
- Parameters
num_cosines – the number of cosines used for the embedding.
embedding_dim – the dimension of the embedding/output.
Note
From https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/master/fqf_iqn_qrdqn/network.py.
forward(taus: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
training: bool
class tianshou.utils.net.discrete.ImplicitQuantileNetwork(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), num_cosines: int = 64, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, torch.device] = 'cpu')
Bases: tianshou.utils.net.discrete.Critic
Implicit Quantile Network.
- Parameters
preprocess_net – a self-defined preprocess_net that outputs a flattened hidden state.
action_shape – a sequence of int for the shape of action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).
num_cosines (int) – the number of cosines to use for the cosine embedding. Default to 64.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
Note
Although this class inherits Critic, it is actually a quantile Q-network with output shape (batch_size, action_dim, sample_size).
The second item of the first return value is the tau vector.
forward(s: Union[numpy.ndarray, torch.Tensor], sample_size: int, **kwargs: Any) → Tuple[Any, torch.Tensor]
Mapping: s -> Q(s, *).
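A sketch of a forward pass, following the note above on the return structure:

import numpy as np

from tianshou.utils.net.common import Net
from tianshou.utils.net.discrete import ImplicitQuantileNetwork

preprocess = Net(state_shape=4, hidden_sizes=[64])
iqn = ImplicitQuantileNetwork(preprocess, action_shape=2,
                              preprocess_net_output_dim=64)
(quantiles, taus), state = iqn(np.random.randn(8, 4), sample_size=32)
# quantiles shape: (batch_size, action_dim, sample_size) = (8, 2, 32)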
training: bool
class tianshou.utils.net.discrete.FractionProposalNetwork(num_fractions: int, embedding_dim: int)
Bases: torch.nn.modules.module.Module
Fraction proposal network for FQF.
- Parameters
num_fractions – the number of fractions to propose.
embedding_dim – the dimension of the embedding/input.
Note
Adapted from https://github.com/ku2482/fqf-iqn-qrdqn.pytorch/blob/master/fqf_iqn_qrdqn/network.py.
forward(state_embeddings: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
training: bool
class tianshou.utils.net.discrete.FullQuantileFunction(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), num_cosines: int = 64, preprocess_net_output_dim: Optional[int] = None, device: Union[str, int, torch.device] = 'cpu')
Bases: tianshou.utils.net.discrete.ImplicitQuantileNetwork
Full(y parameterized) Quantile Function.
- Parameters
preprocess_net – a self-defined preprocess_net that outputs a flattened hidden state.
action_shape – a sequence of int for the shape of action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).
num_cosines (int) – the number of cosines to use for the cosine embedding. Default to 64.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
Note
The first return value is a tuple of (quantiles, fractions, quantiles_tau), where fractions is a Batch(taus, tau_hats, entropies).
forward(s: Union[numpy.ndarray, torch.Tensor], propose_model: tianshou.utils.net.discrete.FractionProposalNetwork, fractions: Optional[tianshou.data.batch.Batch] = None, **kwargs: Any) → Tuple[Any, torch.Tensor]
Mapping: s -> Q(s, *).
training: bool
Continuous
class tianshou.utils.net.continuous.Actor(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', preprocess_net_output_dim: Optional[int] = None)
Bases: torch.nn.modules.module.Module
Simple actor network.
It creates an actor that operates in a continuous action space, with structure preprocess_net -> action_shape.
- Parameters
preprocess_net – a self-defined preprocess_net that outputs a flattened hidden state.
action_shape – a sequence of int for the shape of action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).
max_action (float) – the scale for the final action logits. Default to 1.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> logits -> action.
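A sketch for a 3-dimensional observation and a 1-dimensional action scaled by max_action (the dimensions are illustrative):

import numpy as np

from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import Actor

preprocess = Net(state_shape=3, hidden_sizes=[64])
actor = Actor(preprocess, action_shape=1, max_action=2.0,
              preprocess_net_output_dim=64)
action, state = actor(np.random.randn(8, 3))  # values scaled into [-2, 2]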
training: bool
class tianshou.utils.net.continuous.Critic(preprocess_net: torch.nn.modules.module.Module, hidden_sizes: Sequence[int] = (), device: Union[str, int, torch.device] = 'cpu', preprocess_net_output_dim: Optional[int] = None)
Bases: torch.nn.modules.module.Module
Simple critic network.
It creates a critic that operates in a continuous action space, with structure preprocess_net -> 1 (Q value).
- Parameters
preprocess_net – a self-defined preprocess_net that outputs a flattened hidden state.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], a: Optional[Union[numpy.ndarray, torch.Tensor]] = None, info: Dict[str, Any] = {}) → torch.Tensor
Mapping: (s, a) -> logits -> Q(s, a).
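A sketch using Net with concat=True so that the backbone consumes the concatenated (state, action) pair:

import numpy as np

from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import Critic

preprocess = Net(state_shape=3, action_shape=1, hidden_sizes=[64], concat=True)
critic = Critic(preprocess, preprocess_net_output_dim=64)
q = critic(np.random.randn(8, 3), np.random.randn(8, 1))  # shape: (8, 1)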
training: bool
class tianshou.utils.net.continuous.ActorProb(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', unbounded: bool = False, conditioned_sigma: bool = False, preprocess_net_output_dim: Optional[int] = None)
Bases: torch.nn.modules.module.Module
Simple actor network (outputs the parameters of a Gaussian distribution).
- Parameters
preprocess_net – a self-defined preprocess_net that outputs a flattened hidden state.
action_shape – a sequence of int for the shape of action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (in which case the MLP contains only a single linear layer).
max_action (float) – the scale for the final action logits. Default to 1.
unbounded (bool) – whether to leave the final logits unbounded, i.e., skip the tanh squashing. Default to False.
conditioned_sigma (bool) – True when sigma is calculated from the input, False when sigma is an independent parameter. Default to False.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[Tuple[torch.Tensor, torch.Tensor], Any]
Mapping: s -> logits -> (mu, sigma).
training: bool
class tianshou.utils.net.continuous.RecurrentActorProb(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int], hidden_layer_size: int = 128, max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', unbounded: bool = False, conditioned_sigma: bool = False)
Bases: torch.nn.modules.module.Module
Recurrent version of ActorProb.
For advanced usage (how to customize the network), please refer to Build the Network.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Dict[str, torch.Tensor]] = None, info: Dict[str, Any] = {}) → Tuple[Tuple[torch.Tensor, torch.Tensor], Dict[str, torch.Tensor]]
Almost the same as Recurrent.
training: bool
class tianshou.utils.net.continuous.RecurrentCritic(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int] = [0], device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
torch.nn.modules.module.Module
Recurrent version of Critic.
For advanced usage (how to customize the network), please refer to Build the Network.
forward(s: Union[numpy.ndarray, torch.Tensor], a: Optional[Union[numpy.ndarray, torch.Tensor]] = None, info: Dict[str, Any] = {}) → torch.Tensor
Almost the same as Recurrent.
training: bool