tianshou.utils

class tianshou.utils.MovAvg(size: int = 100)[source]

Bases: object

Class for moving average.

It automatically excludes infinity and NaN values. Usage:

>>> stat = MovAvg(size=66)
>>> stat.add(torch.tensor(5))
5.0
>>> stat.add(float('inf'))  # inf is excluded, not added to stat
5.0
>>> stat.add([6, 7, 8])
6.5
>>> stat.get()
6.5
>>> print(f'{stat.mean():.2f}±{stat.std():.2f}')
6.50±1.12
add(x: Union[numbers.Number, numpy.number, list, numpy.ndarray, torch.Tensor]) → numpy.number[source]

Add a scalar into MovAvg.

You can add a torch.Tensor with a single element, a Python scalar, or a list of Python scalars.

get() → numpy.number[source]

Get the average.

mean() → numpy.number[source]

Get the average. Same as get().

std() → numpy.number[source]

Get the standard deviation.

class tianshou.utils.net.common.Net(layer_num: int, state_shape: tuple, action_shape: Optional[Union[tuple, int]] = 0, device: Union[str, int, torch.device] = 'cpu', softmax: bool = False, concat: bool = False, hidden_layer_size: int = 128, dueling: Optional[Tuple[int, int]] = None, norm_layer: Optional[Callable[[int], torch.nn.modules.module.Module]] = None)[source]

Bases: torch.nn.modules.module.Module

Simple MLP backbone.

For advanced usage (how to customize the network), please refer to Build the Network.

Parameters
  • concat (bool) – whether the input shape is the concatenation of state_shape and action_shape. If True, action_shape no longer determines the output shape, but it still affects the input shape.

  • dueling (Optional[Tuple[int, int]]) – whether to use a dueling network to calculate Q values (for Dueling DQN); pass a tuple of the hidden layer numbers for the Q and V heads, defaults to None (no dueling).

  • norm_layer – which normalization layer to use before ReLU, e.g., nn.LayerNorm or nn.BatchNorm1d, defaults to None.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any][source]

Mapping: s -> flatten -> logits.

training: bool
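
Usage sketch for a discrete-action setting; the layer count and the CartPole-like shapes (4-dimensional observations, 2 actions) are illustrative assumptions, not part of the API:

>>> import numpy as np
>>> from tianshou.utils.net.common import Net
>>> net = Net(layer_num=3, state_shape=(4,), action_shape=2)
>>> logits, state = net(np.random.rand(8, 4))  # batch of 8 observations
>>> logits.shape
torch.Size([8, 2])
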
class tianshou.utils.net.common.Recurrent(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int], device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)[source]

Bases: torch.nn.modules.module.Module

Simple Recurrent network based on LSTM.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Dict[str, torch.Tensor]] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Dict[str, torch.Tensor]][source]

Mapping: s -> flatten -> logits.

In the evaluation mode, s should have shape [bsz, dim]; in the training mode, s should have shape [bsz, len, dim]. See the code and comments for more detail.

training: bool
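
Usage sketch showing both input shapes; all dimensions are illustrative assumptions:

>>> import numpy as np
>>> from tianshou.utils.net.common import Recurrent
>>> net = Recurrent(layer_num=2, state_shape=(4,), action_shape=(2,))
>>> logits, hidden = net(np.random.rand(8, 16, 4))  # training mode: [bsz, len, dim]
>>> net = net.eval()
>>> logits, hidden = net(np.random.rand(8, 4), state=hidden)  # evaluation mode: [bsz, dim]
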
tianshou.utils.net.common.miniblock(inp: int, oup: int, norm_layer: Optional[Callable[[int], torch.nn.modules.module.Module]]) → List[torch.nn.modules.module.Module][source]

Construct a miniblock with the given input/output size and normalization layer.
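
Usage sketch; choosing nn.LayerNorm and the 128/256 sizes is an illustrative assumption:

>>> from torch import nn
>>> from tianshou.utils.net.common import miniblock
>>> layers = miniblock(128, 256, nn.LayerNorm)  # [Linear, LayerNorm, ReLU]
>>> model = nn.Sequential(*layers)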

class tianshou.utils.net.discrete.Actor(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_layer_size: int = 128, softmax_output: bool = True)[source]

Bases: torch.nn.modules.module.Module

Simple actor network with MLP.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any][source]

Mapping: s -> Q(s, *).

training: bool
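
Usage sketch, pairing the actor with a Net preprocess network whose output size matches hidden_layer_size; the shapes are illustrative assumptions:

>>> import numpy as np
>>> from tianshou.utils.net.common import Net
>>> from tianshou.utils.net.discrete import Actor
>>> preprocess = Net(layer_num=2, state_shape=(4,), hidden_layer_size=128)
>>> actor = Actor(preprocess, action_shape=(2,), hidden_layer_size=128)
>>> probs, state = actor(np.random.rand(8, 4))  # softmax_output=True: each row sums to 1
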
class tianshou.utils.net.discrete.Critic(preprocess_net: torch.nn.modules.module.Module, hidden_layer_size: int = 128, last_size: int = 1)[source]

Bases: torch.nn.modules.module.Module

Simple critic network with MLP.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], **kwargs: Any) → torch.Tensor[source]

Mapping: s -> V(s).

training: bool
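
Usage sketch, reusing a Net preprocess network as above; the shapes are illustrative assumptions:

>>> import numpy as np
>>> from tianshou.utils.net.common import Net
>>> from tianshou.utils.net.discrete import Critic
>>> preprocess = Net(layer_num=2, state_shape=(4,), hidden_layer_size=128)
>>> critic = Critic(preprocess)
>>> v = critic(np.random.rand(8, 4))  # state values, shape [8, 1]
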
class tianshou.utils.net.discrete.DQN(c: int, h: int, w: int, action_shape: Sequence[int], device: Union[str, int, torch.device] = 'cpu')[source]

Bases: torch.nn.modules.module.Module

Reference: Human-level control through deep reinforcement learning.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(x: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any][source]

Mapping: x -> Q(x, *).

training: bool
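
Usage sketch with an Atari-style input; 4 stacked 84x84 frames and 6 actions are illustrative assumptions:

>>> import numpy as np
>>> from tianshou.utils.net.discrete import DQN
>>> net = DQN(c=4, h=84, w=84, action_shape=(6,))
>>> q, state = net(np.random.rand(2, 4, 84, 84))
>>> q.shape
torch.Size([2, 6])
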
class tianshou.utils.net.continuous.Actor(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)[source]

Bases: torch.nn.modules.module.Module

Simple actor network with MLP.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any][source]

Mapping: s -> logits -> action.

training: bool
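
Usage sketch for a Pendulum-like task (3-dimensional observations, a 1-dimensional action bounded by max_action=2.0); the shapes are illustrative assumptions:

>>> import numpy as np
>>> from tianshou.utils.net.common import Net
>>> from tianshou.utils.net.continuous import Actor
>>> preprocess = Net(layer_num=2, state_shape=(3,), hidden_layer_size=128)
>>> actor = Actor(preprocess, action_shape=(1,), max_action=2.0)
>>> act, state = actor(np.random.rand(8, 3))  # tanh-squashed, scaled to [-2, 2]
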
class tianshou.utils.net.continuous.ActorProb(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', unbounded: bool = False, hidden_layer_size: int = 128)[source]

Bases: torch.nn.modules.module.Module

Simple actor network (whose output parameterizes a Gaussian distribution) with MLP.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[Tuple[torch.Tensor, torch.Tensor], Any][source]

Mapping: s -> logits -> (mu, sigma).

training: bool
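
Usage sketch; the shapes mirror the Actor example above and are illustrative assumptions:

>>> import numpy as np
>>> from tianshou.utils.net.common import Net
>>> from tianshou.utils.net.continuous import ActorProb
>>> preprocess = Net(layer_num=2, state_shape=(3,), hidden_layer_size=128)
>>> actor = ActorProb(preprocess, action_shape=(1,), max_action=2.0)
>>> (mu, sigma), state = actor(np.random.rand(8, 3))  # Gaussian parameters, each [8, 1]
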
class tianshou.utils.net.continuous.Critic(preprocess_net: torch.nn.modules.module.Module, device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)[source]

Bases: torch.nn.modules.module.Module

Simple critic network with MLP.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], a: Optional[Union[numpy.ndarray, torch.Tensor]] = None, info: Dict[str, Any] = {}) → torch.Tensor[source]

Mapping: (s, a) -> logits -> Q(s, a).

training: bool
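
Usage sketch; note the preprocess Net is built with concat=True because the critic feeds the concatenation of s and a into it. The shapes are illustrative assumptions:

>>> import numpy as np
>>> from tianshou.utils.net.common import Net
>>> from tianshou.utils.net.continuous import Critic
>>> preprocess = Net(layer_num=2, state_shape=(3,), action_shape=(1,), concat=True, hidden_layer_size=128)
>>> critic = Critic(preprocess)
>>> q = critic(np.random.rand(8, 3), np.random.rand(8, 1))  # Q(s, a), shape [8, 1]
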
class tianshou.utils.net.continuous.RecurrentActorProb(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int], max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', unbounded: bool = False, hidden_layer_size: int = 128)[source]

Bases: torch.nn.modules.module.Module

Recurrent version of ActorProb.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Dict[str, torch.Tensor]] = None, info: Dict[str, Any] = {}) → Tuple[Tuple[torch.Tensor, torch.Tensor], Dict[str, torch.Tensor]][source]

Almost the same as Recurrent.

training: bool
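
Usage sketch in training mode; the shapes are illustrative assumptions:

>>> import numpy as np
>>> from tianshou.utils.net.continuous import RecurrentActorProb
>>> actor = RecurrentActorProb(layer_num=2, state_shape=(3,), action_shape=(1,), max_action=2.0)
>>> (mu, sigma), hidden = actor(np.random.rand(8, 16, 3))  # training mode: [bsz, len, dim]
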
class tianshou.utils.net.continuous.RecurrentCritic(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int] = [0], device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)[source]

Bases: torch.nn.modules.module.Module

Recurrent version of Critic.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], a: Optional[Union[numpy.ndarray, torch.Tensor]] = None, info: Dict[str, Any] = {}) → torch.Tensor[source]

Almost the same as Recurrent.

training: bool
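
Usage sketch; the shapes are illustrative assumptions (s must be [bsz, len, dim] here, while a is [bsz, a_dim]):

>>> import numpy as np
>>> from tianshou.utils.net.continuous import RecurrentCritic
>>> critic = RecurrentCritic(layer_num=2, state_shape=(3,), action_shape=(1,))
>>> q = critic(np.random.rand(8, 16, 3), np.random.rand(8, 1))  # Q(s, a), shape [8, 1]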