tianshou.utils
class tianshou.utils.MovAvg(size: int = 100)
Bases: object
Class for moving average.
Infinite and NaN values are excluded automatically. Usage:
>>> stat = MovAvg(size=66)
>>> stat.add(torch.tensor(5))
5.0
>>> stat.add(float('inf'))  # which will not add to stat
5.0
>>> stat.add([6, 7, 8])
6.5
>>> stat.get()
6.5
>>> print(f'{stat.mean():.2f}±{stat.std():.2f}')
6.50±1.12
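The filtering behaviour in the usage above can be approximated in plain Python. This is a minimal sketch, not the library's implementation: SimpleMovAvg is a hypothetical name, and the sketch assumes a fixed-size window that drops the oldest values, a population standard deviation (which agrees with the 1.12 in the usage example), and 0.0 as the result for an empty window. It handles plain numbers and iterables of numbers only, not tensors.

```python
import math
from collections import deque
from typing import Iterable, Union

Number = Union[int, float]


class SimpleMovAvg:
    """Hypothetical stand-in for MovAvg, for illustration only."""

    def __init__(self, size: int = 100) -> None:
        # A bounded deque discards the oldest value once it is full.
        self.window: deque = deque(maxlen=size)

    def add(self, x: Union[Number, Iterable[Number]]) -> float:
        values = [x] if isinstance(x, (int, float)) else list(x)
        for v in values:
            if math.isfinite(v):  # skips inf, -inf and NaN
                self.window.append(float(v))
        return self.get()

    def get(self) -> float:
        # Assumption: an empty window reports 0.0.
        return sum(self.window) / len(self.window) if self.window else 0.0

    def std(self) -> float:
        if not self.window:
            return 0.0
        mean = self.get()
        var = sum((v - mean) ** 2 for v in self.window) / len(self.window)
        return math.sqrt(var)


stat = SimpleMovAvg(size=66)
stat.add(5)
stat.add(float("inf"))  # ignored
stat.add([6, 7, 8])
```

The bounded deque keeps the sketch short; a real implementation may store values differently.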
class tianshou.utils.SummaryWriter(log_dir=None, comment='', purge_step=None, max_queue=10, flush_secs=120, filename_suffix='')
Bases: torch.utils.tensorboard.writer.SummaryWriter
A more convenient summary writer (a subclass of the TensorBoard SummaryWriter).
Once a writer has been created, you can get the same instance anywhere in your code.
>>> writer1 = SummaryWriter.get_instance(
...     key="first", log_dir="log/test_sw/first")
>>> writer2 = SummaryWriter.get_instance()
>>> writer1 is writer2
True
>>> writer4 = SummaryWriter.get_instance(
...     key="second", log_dir="log/test_sw/second")
>>> writer5 = SummaryWriter.get_instance(key="second")
>>> writer1 is not writer4
True
>>> writer4 is writer5
True
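The keyed lookup in the usage above can be reproduced with a small class-level registry. The sketch below is one way such a registry might be built, not the library's code: KeyedWriter is a hypothetical stand-in that writes no logs, and the rule that a call without a key returns the first registered instance is an assumption inferred from the example.

```python
from typing import Dict, Optional


class KeyedWriter:
    """Hypothetical stand-in; it does not write TensorBoard logs."""

    _instances: Dict[str, "KeyedWriter"] = {}
    _default_key: Optional[str] = None

    def __init__(self, log_dir: Optional[str] = None) -> None:
        self.log_dir = log_dir

    @classmethod
    def get_instance(
        cls, key: Optional[str] = None, log_dir: Optional[str] = None
    ) -> "KeyedWriter":
        if key is None:
            # Assumption: no key means "the first key ever registered".
            key = cls._default_key if cls._default_key is not None else "default"
        if cls._default_key is None:
            cls._default_key = key
        if key not in cls._instances:
            # Create the instance once; later calls reuse it.
            cls._instances[key] = cls(log_dir)
        return cls._instances[key]


writer1 = KeyedWriter.get_instance(key="first", log_dir="log/test_sw/first")
writer2 = KeyedWriter.get_instance()
writer4 = KeyedWriter.get_instance(key="second", log_dir="log/test_sw/second")
writer5 = KeyedWriter.get_instance(key="second")
```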
class tianshou.utils.net.common.MLP(input_dim: int, output_dim: int = 0, hidden_sizes: Sequence[int] = (), norm_layer: Optional[Union[Type[torch.nn.modules.module.Module], Sequence[Type[torch.nn.modules.module.Module]]]] = None, activation: Optional[Union[Type[torch.nn.modules.module.Module], Sequence[Type[torch.nn.modules.module.Module]]]] = <class 'torch.nn.modules.activation.ReLU'>, device: Optional[Union[str, int, torch.device]] = None)
Bases: torch.nn.modules.module.Module
Simple MLP backbone.
Create an MLP of size input_dim * hidden_sizes[0] * hidden_sizes[1] * … * hidden_sizes[-1] * output_dim.
- Parameters
input_dim (int) – dimension of the input vector.
output_dim (int) – dimension of the output vector. If set to 0, there is no final linear layer.
hidden_sizes – shape of the MLP passed in as a list, not including input_dim and output_dim.
norm_layer – which normalization to use before the activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use a different normalization module in each layer. Default to no normalization.
activation – which activation to use after each layer. Pass a single nn.Module type to use the same activation for all layers, or a list to use a different activation for each layer. Default to nn.ReLU.
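The size rule described above can be made concrete by listing the (in, out) dimensions of each linear layer. This is a hedged sketch in plain Python: mlp_layer_dims is a hypothetical helper, not part of the library, and it models only the linear layers (no normalization or activation).

```python
from typing import List, Sequence, Tuple


def mlp_layer_dims(
    input_dim: int, output_dim: int = 0, hidden_sizes: Sequence[int] = ()
) -> List[Tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer."""
    sizes = [input_dim, *hidden_sizes]
    pairs = list(zip(sizes[:-1], sizes[1:]))
    if output_dim > 0:
        # output_dim = 0 means there is no final linear layer.
        pairs.append((sizes[-1], output_dim))
    return pairs


dims = mlp_layer_dims(4, 2, hidden_sizes=[64, 64])
```

With input_dim=4, hidden_sizes=[64, 64] and output_dim=2, the helper yields three layers: 4 to 64, 64 to 64 and 64 to 2.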
forward(x: Union[numpy.ndarray, torch.Tensor]) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
training: bool
class tianshou.utils.net.common.Net(state_shape: Union[int, Sequence[int]], action_shape: Optional[Union[int, Sequence[int]]] = 0, hidden_sizes: Sequence[int] = (), norm_layer: Optional[Type[torch.nn.modules.module.Module]] = None, activation: Optional[Type[torch.nn.modules.module.Module]] = <class 'torch.nn.modules.activation.ReLU'>, device: Union[str, int, torch.device] = 'cpu', softmax: bool = False, concat: bool = False, num_atoms: int = 1, dueling_param: Optional[Tuple[Dict[str, Any], Dict[str, Any]]] = None)
Bases: torch.nn.modules.module.Module
Wrapper of MLP to support more specific DRL usage.
For advanced usage (how to customize the network), please refer to Build the Network.
- Parameters
state_shape – int or a sequence of int of the shape of state.
action_shape – int or a sequence of int of the shape of action.
hidden_sizes – shape of MLP passed in as a list.
norm_layer – which normalization to use before the activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use a different normalization module in each layer. Default to no normalization.
activation – which activation to use after each layer. Pass a single nn.Module type to use the same activation for all layers, or a list to use a different activation for each layer. Default to nn.ReLU.
device – specify the device when the network actually runs. Default to “cpu”.
softmax (bool) – whether to apply a softmax layer over the last layer’s output.
concat (bool) – whether the input shape is the concatenation of state_shape and action_shape. If True, action_shape is not the output shape, but affects the input shape only.
num_atoms (int) – used to extend the network to distributional RL. Defaults to 1 (not used).
dueling_param (tuple) – whether to use a dueling network to calculate Q values (for Dueling DQN). To enable the dueling option, pass a tuple of two dicts (first for Q and second for V) holding custom arguments as described in tianshou.utils.net.common.MLP. Defaults to None.
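The text above does not spell out how the two dueling heads are combined. In the standard Dueling DQN formulation, a state value V(s) and per-action advantages A(s, a) are merged as Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a). The sketch below illustrates that formula in plain Python; dueling_q_values is a hypothetical helper, and the assumption that Net uses exactly this aggregation should be checked against the source code.

```python
from typing import List


def dueling_q_values(value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) for one state."""
    mean_adv = sum(advantages) / len(advantages)
    # Subtracting the mean advantage makes V and A identifiable.
    return [value + a - mean_adv for a in advantages]


q = dueling_q_values(1.0, [1.0, 2.0, 3.0])
```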
See also
Please refer to MLP for a more detailed explanation of the usage of activation, norm_layer, etc.
You can also refer to Actor, Critic, etc., to see how it is suggested to be used.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> flatten (inside MLP) -> logits.
training: bool
class tianshou.utils.net.common.Recurrent(layer_num: int, state_shape: Union[int, Sequence[int]], action_shape: Union[int, Sequence[int]], device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Simple Recurrent network based on LSTM.
For advanced usage (how to customize the network), please refer to Build the Network.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Dict[str, torch.Tensor]] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Dict[str, torch.Tensor]]
Mapping: s -> flatten -> logits.
In evaluation mode, s should have shape [bsz, dim]; in training mode, s should have shape [bsz, len, dim]. See the code and comments for more detail.
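One plausible way to accept both input shapes is to add a time axis of length 1 to two-dimensional input, so that everything downstream sees [bsz, len, dim]. The sketch below shows the idea with nested Python lists instead of tensors; ensure_time_axis is a hypothetical helper, and whether the library handles the two shapes this way is an assumption to verify in the source.

```python
from typing import Any, List


def ensure_time_axis(s: List[Any]) -> List[Any]:
    """Return s as [bsz, len, dim], inserting len = 1 if it is missing."""
    if s and s[0] and not isinstance(s[0][0], list):
        # s is [bsz, dim]: wrap each observation in a length-1 sequence.
        return [[row] for row in s]
    return s  # already [bsz, len, dim]


eval_batch = [[1.0, 2.0], [3.0, 4.0]]        # [bsz=2, dim=2]
train_batch = [[[1.0, 2.0], [3.0, 4.0]]]     # [bsz=1, len=2, dim=2]
eval_out = ensure_time_axis(eval_batch)
train_out = ensure_time_axis(train_batch)
```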
training: bool
tianshou.utils.net.common.miniblock(input_size: int, output_size: int = 0, norm_layer: Optional[Type[torch.nn.modules.module.Module]] = None, activation: Optional[Type[torch.nn.modules.module.Module]] = None) → List[torch.nn.modules.module.Module]
Construct a miniblock with given input/output-size, norm layer and activation.
class tianshou.utils.net.discrete.Actor(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), softmax_output: bool = True, preprocess_net_output_dim: Optional[int] = None)
Bases: torch.nn.modules.module.Module
Simple actor network.
Creates an actor that operates in a discrete action space with the structure preprocess_net -> action_shape.
- Parameters
preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.
action_shape – a sequence of int for the shape of action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).
softmax_output (bool) – whether to apply a softmax layer over the last layer’s output.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
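The softmax_output option turns the last layer's raw outputs into a probability distribution over actions. The following is a generic, numerically stable softmax in plain Python, given only to illustrate the operation; the function name is hypothetical and this is not the library's code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Map raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
```

Larger logits receive larger probabilities, and equal logits receive equal probabilities.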
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> Q(s, *).
training: bool
class tianshou.utils.net.discrete.Critic(preprocess_net: torch.nn.modules.module.Module, hidden_sizes: Sequence[int] = (), last_size: int = 1, preprocess_net_output_dim: Optional[int] = None)
Bases: torch.nn.modules.module.Module
Simple critic network. Creates a critic that operates in a discrete action space with the structure preprocess_net -> 1 (q value).
- Parameters
preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).
last_size (int) – the output dimension of Critic network. Default to 1.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], **kwargs: Any) → torch.Tensor
Mapping: s -> V(s).
training: bool
class tianshou.utils.net.continuous.Actor(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', preprocess_net_output_dim: Optional[int] = None)
Bases: torch.nn.modules.module.Module
Simple actor network. Creates an actor that operates in a continuous action space with the structure preprocess_net -> action_shape.
- Parameters
preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.
action_shape – a sequence of int for the shape of action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).
max_action (float) – the scale for the final action logits. Default to 1.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> logits -> action.
training: bool
class tianshou.utils.net.continuous.ActorProb(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_sizes: Sequence[int] = (), max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', unbounded: bool = False, conditioned_sigma: bool = False, preprocess_net_output_dim: Optional[int] = None)
Bases: torch.nn.modules.module.Module
Simple actor network (outputs a Gaussian distribution).
- Parameters
preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.
action_shape – a sequence of int for the shape of action.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).
max_action (float) – the scale for the final action logits. Default to 1.
unbounded (bool) – whether to leave the final logits unbounded. If False, a tanh activation is applied to the final logits. Default to False.
conditioned_sigma (bool) – True when sigma is calculated from the input, False when sigma is an independent parameter. Default to False.
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
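The interaction between max_action and unbounded can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions, not the library's code: bound_mean is a hypothetical helper, and it assumes that with unbounded=False the mean is computed as max_action * tanh(logit), while with unbounded=True the logits pass through unchanged.

```python
import math
from typing import List


def bound_mean(
    logits: List[float], max_action: float = 1.0, unbounded: bool = False
) -> List[float]:
    """Squash logits into [-max_action, max_action] unless unbounded."""
    if unbounded:
        return list(logits)
    # tanh maps to (-1, 1); scaling stretches that to the action range.
    return [max_action * math.tanh(x) for x in logits]


mu = bound_mean([0.0, 100.0, -100.0], max_action=2.0)
```

Large positive or negative logits saturate near the ends of the action range, which keeps the mean of the distribution inside the valid interval.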
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[Tuple[torch.Tensor, torch.Tensor], Any]
Mapping: s -> logits -> (mu, sigma).
training: bool
class tianshou.utils.net.continuous.Critic(preprocess_net: torch.nn.modules.module.Module, hidden_sizes: Sequence[int] = (), device: Union[str, int, torch.device] = 'cpu', preprocess_net_output_dim: Optional[int] = None)
Bases: torch.nn.modules.module.Module
Simple critic network. Creates a critic that operates in a continuous action space with the structure preprocess_net -> 1 (q value).
- Parameters
preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.
hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Default to empty sequence (where the MLP now contains only a single linear layer).
preprocess_net_output_dim (int) – the output dimension of preprocess_net.
For advanced usage (how to customize the network), please refer to Build the Network.
See also
Please refer to Net as an example of how preprocess_net is suggested to be defined.
forward(s: Union[numpy.ndarray, torch.Tensor], a: Optional[Union[numpy.ndarray, torch.Tensor]] = None, info: Dict[str, Any] = {}) → torch.Tensor
Mapping: (s, a) -> logits -> Q(s, a).
training: bool
class tianshou.utils.net.continuous.RecurrentActorProb(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int], hidden_layer_size: int = 128, max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', unbounded: bool = False, conditioned_sigma: bool = False)
Bases: torch.nn.modules.module.Module
Recurrent version of ActorProb.
For advanced usage (how to customize the network), please refer to Build the Network.
forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Dict[str, torch.Tensor]] = None, info: Dict[str, Any] = {}) → Tuple[Tuple[torch.Tensor, torch.Tensor], Dict[str, torch.Tensor]]
Almost the same as Recurrent.
training: bool
class tianshou.utils.net.continuous.RecurrentCritic(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int] = [0], device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Recurrent version of Critic.
For advanced usage (how to customize the network), please refer to Build the Network.
forward(s: Union[numpy.ndarray, torch.Tensor], a: Optional[Union[numpy.ndarray, torch.Tensor]] = None, info: Dict[str, Any] = {}) → torch.Tensor
Almost the same as Recurrent.
training: bool