tianshou.utils

class tianshou.utils.MovAvg(size: int = 100)
Bases: object
Class for a moving average.
It automatically excludes infinity and NaN. Usage:
>>> import torch
>>> from tianshou.utils import MovAvg
>>> stat = MovAvg(size=66)
>>> stat.add(torch.tensor(5))
5.0
>>> stat.add(float('inf'))  # will not be added to stat
5.0
>>> stat.add([6, 7, 8])
6.5
>>> stat.get()
6.5
>>> print(f'{stat.mean():.2f}±{stat.std():.2f}')
6.50±1.12
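
To see the windowing logic in isolation, here is a minimal pure-Python sketch of a fixed-size moving average that skips non-finite values. It is not tianshou's implementation: the class name `MovingAverage` is hypothetical, it accepts plain numbers or iterables of numbers rather than tensors, and it assumes `std()` is the population standard deviation (which reproduces the 6.50±1.12 shown in the doctest).

```python
import math
from collections import deque
from collections.abc import Iterable
from typing import Deque, Union


class MovingAverage:
    """Fixed-window moving average ignoring +inf, -inf and NaN.

    Sketch under stated assumptions; not tianshou's MovAvg code.
    """

    def __init__(self, size: int = 100) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        # A bounded deque silently drops the oldest value once it is full.
        self.cache: Deque[float] = deque(maxlen=size)

    def add(self, x: Union[float, Iterable]) -> float:
        # Accept a single number or any iterable of numbers.
        values = x if isinstance(x, Iterable) else [x]
        for v in values:
            v = float(v)
            if math.isfinite(v):  # False for inf, -inf and NaN
                self.cache.append(v)
        return self.get()

    def get(self) -> float:
        return self.mean()

    def mean(self) -> float:
        if not self.cache:
            return 0.0
        return sum(self.cache) / len(self.cache)

    def std(self) -> float:
        # Population standard deviation (divide by N, not N - 1).
        if not self.cache:
            return 0.0
        m = self.mean()
        return math.sqrt(sum((v - m) ** 2 for v in self.cache) / len(self.cache))


stat = MovingAverage(size=66)
stat.add(5)
stat.add(float("inf"))  # ignored
stat.add([6, 7, 8])
print(f"{stat.mean():.2f}±{stat.std():.2f}")  # 6.50±1.12
```

The bounded `deque` keeps the window logic to one line; a list that is re-sliced after each append would behave the same but copy more.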

class tianshou.utils.net.common.Net(layer_num: int, state_shape: tuple, action_shape: Optional[Union[tuple, int]] = 0, device: Union[str, int, torch.device] = 'cpu', softmax: bool = False, concat: bool = False, hidden_layer_size: int = 128, dueling: Optional[Tuple[int, int]] = None, norm_layer: Optional[Callable[[int], torch.nn.modules.module.Module]] = None)
Bases: torch.nn.modules.module.Module
Simple MLP backbone.
For advanced usage (how to customize the network), please refer to Build the Network.
Parameters:
- concat (bool) – whether the input is the concatenation of state_shape and action_shape. If True, action_shape is not the output shape; instead, it contributes to the input shape.
- dueling (Optional[Tuple[int, int]]) – if given, a dueling network is used to calculate Q values (for Dueling DQN); defaults to None (no dueling).
- norm_layer – the normalization layer to apply before ReLU, e.g., nn.LayerNorm or nn.BatchNorm1d; defaults to None.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> flatten -> logits.
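
The "flatten" step turns each observation in the batch into one feature vector, so a batch of shape [bsz, *state_shape] becomes [bsz, prod(state_shape)] before it reaches the MLP. A minimal stdlib sketch on nested lists (the helper names `_flatten` and `flatten_batch` are hypothetical; the real method operates on tensors):

```python
from typing import Any, List


def _flatten(x: Any) -> List[float]:
    # Recursively unpack nested lists/tuples into one flat list of floats.
    if isinstance(x, (list, tuple)):
        out: List[float] = []
        for item in x:
            out.extend(_flatten(item))
        return out
    return [float(x)]


def flatten_batch(batch: List[Any]) -> List[List[float]]:
    """Keep the first (batch) dimension and flatten everything after it.

    Illustrative sketch only; not tianshou's code.
    """
    return [_flatten(sample) for sample in batch]


# Two observations of shape (2, 3) -> a batch of shape (2, 6).
obs = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
flat = flatten_batch(obs)
print(len(flat), len(flat[0]))  # 2 6
```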
training: bool

class tianshou.utils.net.common.Recurrent(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int], device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Simple Recurrent network based on LSTM.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Dict[str, torch.Tensor]] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Dict[str, torch.Tensor]]
Mapping: s -> flatten -> logits.
In evaluation mode, s should have shape [bsz, dim]; in training mode, s should have shape [bsz, len, dim]. See the source code and its comments for more detail.
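
One way to read the two shapes is that an evaluation-mode input [bsz, dim] can be handled as a length-1 sequence [bsz, 1, dim], so both modes share one code path. The stdlib sketch below only tracks shapes. It assumes, without confirmation from this page, that the output is computed from the last time step and that the returned LSTM state is stored batch-first as [bsz, layer_num, hidden_layer_size]; the function name `recurrent_shapes` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

Shape = Tuple[int, ...]


def recurrent_shapes(
    s_shape: Sequence[int], layer_num: int, hidden: int, action_dim: int
) -> Tuple[Shape, Shape, Dict[str, Shape]]:
    """Shape bookkeeping only (assumed conventions, not tianshou's code)."""
    if len(s_shape) == 2:  # evaluation mode: treat as a length-1 sequence
        seq_shape: Shape = (s_shape[0], 1, s_shape[1])
    elif len(s_shape) == 3:  # training mode: already [bsz, len, dim]
        seq_shape = tuple(s_shape)
    else:
        raise ValueError(f"expected 2 or 3 dimensions, got {tuple(s_shape)}")
    bsz = seq_shape[0]
    out_shape: Shape = (bsz, action_dim)  # assumed: logits from the last step only
    # Assumed: LSTM state stored batch-first as [bsz, layer_num, hidden].
    state = {"h": (bsz, layer_num, hidden), "c": (bsz, layer_num, hidden)}
    return seq_shape, out_shape, state


print(recurrent_shapes((32, 4), layer_num=2, hidden=128, action_dim=3))
# ((32, 1, 4), (32, 3), {'h': (32, 2, 128), 'c': (32, 2, 128)})
```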
training: bool

tianshou.utils.net.common.miniblock(inp: int, oup: int, norm_layer: Optional[Callable[[int], torch.nn.modules.module.Module]]) → List[torch.nn.modules.module.Module]
Construct a miniblock with the given input size, output size, and normalization layer.
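
A plausible reading of "miniblock", and an assumption here, is Linear(inp, oup), then norm_layer(oup) when one is given, then ReLU, which matches the "normalization before ReLU" wording in Net's norm_layer parameter. The stdlib sketch below describes those layers as tuples and counts their trainable parameters without building tensors. The names `miniblock_spec` and `count_params` are hypothetical, and the norm-layer count assumes an affine LayerNorm or BatchNorm1d (one scale and one shift per feature).

```python
from typing import List, Optional, Tuple

Layer = Tuple[str, int, int]  # (kind, in_features, out_features)


def miniblock_spec(inp: int, oup: int, norm: Optional[str] = None) -> List[Layer]:
    """Describe Linear -> optional norm -> ReLU (assumed layer order)."""
    layers: List[Layer] = [("Linear", inp, oup)]
    if norm is not None:
        layers.append((norm, oup, oup))
    layers.append(("ReLU", oup, oup))
    return layers


def count_params(layers: List[Layer]) -> int:
    total = 0
    for kind, n_in, n_out in layers:
        if kind == "Linear":
            total += n_in * n_out + n_out  # weight matrix + bias
        elif kind in ("LayerNorm", "BatchNorm1d"):
            total += 2 * n_out  # affine scale and shift
        # ReLU has no trainable parameters
    return total


block = miniblock_spec(4, 128, norm="LayerNorm")
print(block)  # [('Linear', 4, 128), ('LayerNorm', 128, 128), ('ReLU', 128, 128)]
print(count_params(block))  # 4*128 + 128 + 2*128 = 896
```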

class tianshou.utils.net.discrete.Actor(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], hidden_layer_size: int = 128, softmax_output: bool = True)
Bases: torch.nn.modules.module.Module
Simple actor network with MLP.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> Q(s, *).
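
With softmax_output=True, we assume the per-action scores are normalized into a probability distribution over the discrete actions. A minimal, numerically stable softmax in plain Python follows; the function name `softmax` is used here for illustration only, whereas the library would work on tensors.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of action scores (sketch)."""
    m = max(logits)  # shifting by the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1000.0, 1000.0, 999.0])
print([round(p, 4) for p in probs])  # [0.4223, 0.4223, 0.1554]
```

Without the max shift, `math.exp(1000.0)` would raise `OverflowError`; subtracting a constant does not change the result because it cancels in the ratio.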
training: bool

class tianshou.utils.net.discrete.Critic(preprocess_net: torch.nn.modules.module.Module, hidden_layer_size: int = 128, last_size: int = 1)
Bases: torch.nn.modules.module.Module
Simple critic network with MLP.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], **kwargs: Any) → torch.Tensor
Mapping: s -> V(s).
training: bool

class tianshou.utils.net.discrete.DQN(c: int, h: int, w: int, action_shape: Sequence[int], device: Union[str, int, torch.device] = 'cpu')
Bases: torch.nn.modules.module.Module
Reference: Human-level control through deep reinforcement learning.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(x: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: x -> Q(x, *).
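
The network in the referenced paper uses three unpadded convolutions (32 filters 8x8 stride 4, 64 filters 4x4 stride 2, 64 filters 3x3 stride 1) before a 512-unit fully connected layer. Assuming this class follows that stack, the width of its first linear layer depends on h and w through the convolution output formula floor((size - kernel) / stride) + 1. A stdlib sketch (the constant and function names are hypothetical):

```python
from typing import List, Tuple

# (kernel_size, stride, out_channels) per convolution; assumed Nature-DQN stack.
NATURE_CONVS: List[Tuple[int, int, int]] = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]


def conv_out(size: int, kernel: int, stride: int) -> int:
    # Output length of an unpadded convolution along one spatial axis.
    return (size - kernel) // stride + 1


def flattened_conv_size(h: int, w: int) -> int:
    """Number of features entering the first linear layer (sketch)."""
    channels = 0
    for kernel, stride, out_channels in NATURE_CONVS:
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
        channels = out_channels
    return h * w * channels


print(flattened_conv_size(84, 84))  # 7 * 7 * 64 = 3136
```

For the standard 84x84 Atari frame the spatial size shrinks 84 -> 20 -> 9 -> 7, so the channel count c only affects the first convolution, not this flattened width.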
training: bool

class tianshou.utils.net.continuous.Actor(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Simple actor network with MLP.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[torch.Tensor, Any]
Mapping: s -> logits -> action.
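
For a deterministic continuous actor, a common way to implement "logits -> action" is a tanh squashing scaled by max_action, which keeps every action component inside [-max_action, max_action]. This is an assumption about this class rather than something the page states; the stdlib sketch shows only that final step, and the function name is hypothetical.

```python
import math
from typing import List


def squash_to_action(logits: List[float], max_action: float = 1.0) -> List[float]:
    """Map unbounded outputs into [-max_action, max_action] via tanh.

    Assumed output transform; not taken from tianshou's source.
    """
    return [max_action * math.tanh(x) for x in logits]


print(squash_to_action([0.0, 100.0, -100.0], max_action=2.0))  # [0.0, 2.0, -2.0]
```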
training: bool

class tianshou.utils.net.continuous.ActorProb(preprocess_net: torch.nn.modules.module.Module, action_shape: Sequence[int], max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', unbounded: bool = False, hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Simple actor network with MLP that outputs the parameters of a Gaussian distribution.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Any] = None, info: Dict[str, Any] = {}) → Tuple[Tuple[torch.Tensor, torch.Tensor], Any]
Mapping: s -> logits -> (mu, sigma).
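
Because (mu, sigma) parameterize a diagonal Gaussian rather than a single action, a policy samples from that distribution and typically scores the sample by its log-probability. The stdlib sketch below assumes, for illustration only, that mu is squashed by max_action * tanh unless unbounded=True and that sigma is the exponential of a log-std; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def gaussian_params(raw_mu: List[float], log_std: List[float],
                    max_action: float = 1.0, unbounded: bool = False
                    ) -> Tuple[List[float], List[float]]:
    # Assumed convention: bound the mean with tanh unless `unbounded` is set.
    mu = list(raw_mu) if unbounded else [max_action * math.tanh(m) for m in raw_mu]
    sigma = [math.exp(s) for s in log_std]  # exp keeps sigma strictly positive
    return mu, sigma


def sample(mu: List[float], sigma: List[float], rng: random.Random) -> List[float]:
    # One independent normal draw per action dimension.
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]


def log_prob(a: List[float], mu: List[float], sigma: List[float]) -> float:
    """Log-density of a diagonal Gaussian: sum of independent 1-D terms."""
    return sum(
        -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for x, m, s in zip(a, mu, sigma)
    )


mu, sigma = gaussian_params([0.3, -1.2], [0.0, -1.0], max_action=2.0)
action = sample(mu, sigma, random.Random(0))
print(mu, sigma, log_prob(action, mu, sigma))
```

Note that bounding mu does not bound the sampled action itself; any clipping of samples would be a separate step.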
training: bool

class tianshou.utils.net.continuous.Critic(preprocess_net: torch.nn.modules.module.Module, device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Simple critic network with MLP.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], a: Optional[Union[numpy.ndarray, torch.Tensor]] = None, info: Dict[str, Any] = {}) → torch.Tensor
Mapping: (s, a) -> logits -> Q(s, a).
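
Q(s, a) needs both inputs, so a continuous critic commonly flattens the observation and the action and joins them feature-wise; the network consuming the result must then accept prod(state_shape) + prod(action_shape) inputs, which is the situation Net's concat=True option describes. Where exactly this Critic performs the join is not stated on this page, so the stdlib sketch below (hypothetical name `concat_state_action`) only illustrates the row-wise concatenation.

```python
from typing import List, Optional


def concat_state_action(
    s: List[List[float]], a: Optional[List[List[float]]] = None
) -> List[List[float]]:
    """Join flattened observations and actions per sample: [bsz, ds + da].

    Illustrative sketch; not tianshou's Critic code.
    """
    if a is None:  # no action given: keep the state features only
        return [list(row) for row in s]
    if len(s) != len(a):
        raise ValueError("state and action batches must have the same size")
    return [list(si) + list(ai) for si, ai in zip(s, a)]


batch = concat_state_action([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [[1.0], [-1.0]])
print(batch)  # [[0.1, 0.2, 0.3, 1.0], [0.4, 0.5, 0.6, -1.0]]
```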
training: bool

class tianshou.utils.net.continuous.RecurrentActorProb(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int], max_action: float = 1.0, device: Union[str, int, torch.device] = 'cpu', unbounded: bool = False, hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Recurrent version of ActorProb.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], state: Optional[Dict[str, torch.Tensor]] = None, info: Dict[str, Any] = {}) → Tuple[Tuple[torch.Tensor, torch.Tensor], Dict[str, torch.Tensor]]
Almost the same as Recurrent.
training: bool

class tianshou.utils.net.continuous.RecurrentCritic(layer_num: int, state_shape: Sequence[int], action_shape: Sequence[int] = [0], device: Union[str, int, torch.device] = 'cpu', hidden_layer_size: int = 128)
Bases: torch.nn.modules.module.Module
Recurrent version of Critic.
For advanced usage (how to customize the network), please refer to Build the Network.

forward(s: Union[numpy.ndarray, torch.Tensor], a: Optional[Union[numpy.ndarray, torch.Tensor]] = None, info: Dict[str, Any] = {}) → torch.Tensor
Almost the same as Recurrent.
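
The default action_shape=[0] is easiest to read through its product. If, as an assumption not stated on this page, the action is concatenated to the recurrent features just before the final linear layer, that layer's input width is hidden_layer_size + prod(action_shape), and prod([0]) == 0 leaves a state-only critic V(s). A stdlib check of that arithmetic (the helper name is hypothetical):

```python
import math
from typing import Sequence


def head_input_size(hidden_layer_size: int, action_shape: Sequence[int]) -> int:
    # Assumed layout: [recurrent features | flattened action].
    # math.prod([0]) is 0, so the default action_shape adds no action features.
    return hidden_layer_size + math.prod(action_shape)


print(head_input_size(128, [0]))  # 128, behaves like V(s)
print(head_input_size(128, [3]))  # 131, Q(s, a) for a 3-D action
```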
training: bool