common#


class ActorCritic(actor: Module, critic: Module)[source]#

An actor-critic network wrapper that groups an actor and a critic for joint parameter handling.

Use actor_critic.parameters() instead of set.union or list + list concatenation of the individual parameter lists to avoid issue #449.

Parameters:
  • actor (nn.Module) – the actor network.

  • critic (nn.Module) – the critic network.
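
A minimal usage sketch (the network shapes and the learning rate are illustrative only): the wrapper lets a single optimizer see the joint parameter set of both networks.

    import torch
    from tianshou.utils.net.common import ActorCritic, Net

    actor = Net(state_shape=4, action_shape=2, hidden_sizes=[64, 64])
    critic = Net(state_shape=4, action_shape=1, hidden_sizes=[64, 64])
    actor_critic = ActorCritic(actor, critic)

    # nn.Module.parameters() yields each parameter once, even if the
    # actor and critic share submodules.
    optim = torch.optim.Adam(actor_critic.parameters(), lr=1e-3)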

class BaseActor(*args, **kwargs)[source]#

abstract get_output_dim() int[source]#

abstract get_preprocess_net() Module[source]#
class BranchingNet(state_shape: int | ~collections.abc.Sequence[int], num_branches: int = 0, action_per_branch: int = 2, common_hidden_sizes: list[int] | None = None, value_hidden_sizes: list[int] | None = None, action_hidden_sizes: list[int] | None = None, norm_layer: type[~torch.nn.modules.module.Module] | None = None, norm_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, activation: type[~torch.nn.modules.module.Module] | None = <class 'torch.nn.modules.activation.ReLU'>, act_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, device: str | int | ~torch.device = 'cpu')[source]#

Branching dueling Q-network.

Network for the BranchingDQNPolicy. It uses a common network module, a value module, and one action “branch” per action dimension, which allows the Q-value output to scale linearly with the number of dimensions in the action space. For more info please refer to: arXiv:1711.08946.

Parameters:
  • state_shape – int or a sequence of int of the shape of state.

  • num_branches – the number of action dimensions (branches).

  • action_per_branch – the number of actions in each dimension.

  • common_hidden_sizes – shape of the common MLP network passed in as a list.

  • value_hidden_sizes – shape of the value MLP network passed in as a list.

  • action_hidden_sizes – shape of the action MLP network passed in as a list.

  • norm_layer – which normalization to use before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use different normalization modules in different layers. Default to no normalization.

  • activation – which activation to use after each layer; either a single nn.Module class for the same activation in all layers, or a list of classes for different activations in different layers. Default to nn.ReLU.

  • device – specify the device when the network actually runs. Default to “cpu”.

forward(obs: ndarray | Tensor, state: Any = None, **kwargs: Any) tuple[Tensor, Any][source]#

Mapping: obs -> model -> logits.
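
A brief construction sketch (the branch and action counts are illustrative, and the logits layout of one Q-value vector per branch is an assumption based on the description above):

    import numpy as np
    from tianshou.utils.net.common import BranchingNet

    # 3 action dimensions ("branches"), 4 discrete choices per dimension.
    net = BranchingNet(
        state_shape=8,
        num_branches=3,
        action_per_branch=4,
        common_hidden_sizes=[64, 64],
        value_hidden_sizes=[64],
        action_hidden_sizes=[64],
    )
    obs = np.random.rand(16, 8).astype(np.float32)
    logits, state = net(obs)  # logits: one Q-value vector per branch, e.g. (16, 3, 4)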

class DataParallelNet(net: Module)[source]#

DataParallel wrapper for training an agent with multiple GPUs.

This class only converts the input data type from numpy array to torch Tensor. If the input is a nested dictionary, the user should create a similar class to do the same thing.

Parameters:

net (nn.Module) – the network to be distributed across different GPUs.
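
A minimal sketch, assuming at least one CUDA device is available (shapes are illustrative):

    import numpy as np
    from tianshou.utils.net.common import DataParallelNet, Net

    net = Net(state_shape=4, action_shape=2, hidden_sizes=[128, 128], device=None)
    actor = DataParallelNet(net.cuda())

    obs = np.random.rand(16, 4)   # numpy input is converted to a Tensor internally
    logits, state = actor(obs)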

forward(obs: ndarray | Tensor, *args: Any, **kwargs: Any) tuple[Any, Any][source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class EnsembleLinear(ensemble_size: int, in_feature: int, out_feature: int, bias: bool = True)[source]#

Linear layer of an ensemble network.

Parameters:
  • ensemble_size – Number of subnets in the ensemble.

  • in_feature – dimension of the input vector.

  • out_feature – dimension of the output vector.

  • bias – whether to include an additive bias. Default to True.
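
A short sketch (the broadcasting of one shared input batch across all ensemble members, and the resulting output shape, are assumptions for illustration):

    import torch
    from tianshou.utils.net.common import EnsembleLinear

    layer = EnsembleLinear(ensemble_size=5, in_feature=16, out_feature=8)
    x = torch.rand(32, 16)   # one input batch shared by all ensemble members
    y = layer(x)             # e.g. shape (5, 32, 8): one output per member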

forward(x: Tensor) Tensor[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class MLP(input_dim: int, output_dim: int = 0, hidden_sizes: ~collections.abc.Sequence[int] = (), norm_layer: type[~torch.nn.modules.module.Module] | ~collections.abc.Sequence[type[~torch.nn.modules.module.Module]] | None = None, norm_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, activation: type[~torch.nn.modules.module.Module] | ~collections.abc.Sequence[type[~torch.nn.modules.module.Module]] | None = <class 'torch.nn.modules.activation.ReLU'>, act_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, device: str | int | ~torch.device | None = None, linear_layer: ~collections.abc.Callable[[int, int], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.linear.Linear'>, flatten_input: bool = True)[source]#

Simple MLP backbone.

Create an MLP of size input_dim * hidden_sizes[0] * hidden_sizes[1] * … * hidden_sizes[-1] * output_dim.

Parameters:
  • input_dim – dimension of the input vector.

  • output_dim – dimension of the output vector. If set to 0, there is no final linear layer.

  • hidden_sizes – shape of MLP passed in as a list, not including input_dim and output_dim.

  • norm_layer – which normalization to use before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use different normalization modules in different layers. Default to no normalization.

  • activation – which activation to use after each layer; either a single nn.Module class for the same activation in all layers, or a list of classes for different activations in different layers. Default to nn.ReLU.

  • device – which device to create this model on. Default to None.

  • linear_layer – use this module as linear layer. Default to nn.Linear.

  • flatten_input – whether to flatten input data. Default to True.
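
A minimal construction sketch (sizes are illustrative):

    import torch
    from tianshou.utils.net.common import MLP

    # 4 -> 64 -> 64 -> 2, with ReLU after each hidden layer.
    mlp = MLP(input_dim=4, output_dim=2, hidden_sizes=[64, 64])
    out = mlp(torch.rand(32, 4))   # -> shape (32, 2)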

forward(obs: ndarray | Tensor) Tensor[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class Net(state_shape: int | ~collections.abc.Sequence[int], action_shape: ~collections.abc.Sequence[int] | int | ~numpy.int64 = 0, hidden_sizes: ~collections.abc.Sequence[int] = (), norm_layer: type[~torch.nn.modules.module.Module] | ~collections.abc.Sequence[type[~torch.nn.modules.module.Module]] | None = None, norm_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, activation: type[~torch.nn.modules.module.Module] | ~collections.abc.Sequence[type[~torch.nn.modules.module.Module]] | None = <class 'torch.nn.modules.activation.ReLU'>, act_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, device: str | int | ~torch.device = 'cpu', softmax: bool = False, concat: bool = False, num_atoms: int = 1, dueling_param: tuple[dict[str, ~typing.Any], dict[str, ~typing.Any]] | None = None, linear_layer: ~collections.abc.Callable[[int, int], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.linear.Linear'>)[source]#

Wrapper of MLP to support more specific DRL usage.

For advanced usage (how to customize the network), please refer to Build the Network.

Parameters:
  • state_shape – int or a sequence of int of the shape of state.

  • action_shape – int or a sequence of int of the shape of action.

  • hidden_sizes – shape of MLP passed in as a list.

  • norm_layer – which normalization to use before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use different normalization modules in different layers. Default to no normalization.

  • activation – which activation to use after each layer; either a single nn.Module class for the same activation in all layers, or a list of classes for different activations in different layers. Default to nn.ReLU.

  • device – specify the device when the network actually runs. Default to “cpu”.

  • softmax – whether to apply a softmax layer over the last layer’s output.

  • concat – whether the input shape is concatenated by state_shape and action_shape. If it is True, action_shape is not the output shape, but affects the input shape only.

  • num_atoms – number of atoms, used to expand the network for distributional RL. Default to 1 (not used).

  • dueling_param – whether to use a dueling network to calculate Q values (for Dueling DQN). If you want to use the dueling option, you should pass a tuple of two dicts (the first for Q and the second for V) with custom keyword arguments as described in MLP. Default to None.

  • linear_layer – the module constructor to use for linear layers; it takes the input and output dimensions as arguments. Default to nn.Linear.

See also

Please refer to MLP for more detailed explanation on the usage of activation, norm_layer, etc.

You can also refer to Actor, Critic, etc., to see how it is suggested to be used.

forward(obs: ndarray | Tensor, state: Any = None, **kwargs: Any) tuple[Tensor, Any][source]#

Mapping: obs -> flatten (inside MLP) -> logits.

Parameters:
  • obs

  • state – unused and returned as is

  • kwargs – unused
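
A minimal usage sketch (shapes are illustrative):

    import numpy as np
    from tianshou.utils.net.common import Net

    net = Net(state_shape=4, action_shape=2, hidden_sizes=[64, 64])
    obs = np.random.rand(16, 4)
    logits, state = net(obs)   # logits: (16, 2); state is returned unchanged (None here)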

class NetBase(*args, **kwargs)[source]#

Interface for NNs used in policies.

abstract forward(obs: ndarray | Tensor, state: Any = None, **kwargs: Any) tuple[Tensor, Any][source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class Recurrent(layer_num: int, state_shape: int | Sequence[int], action_shape: Sequence[int] | int | int64, device: str | int | device = 'cpu', hidden_layer_size: int = 128)[source]#

Simple Recurrent network based on LSTM.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(obs: ndarray | Tensor, state: RecurrentStateBatch | dict[str, Tensor] | None = None, **kwargs: Any) tuple[Tensor, dict[str, Tensor]][source]#

Mapping: obs -> flatten -> logits.

In evaluation mode, obs should have shape [bsz, dim]; in training mode, obs should have shape [bsz, len, dim]. See the code and comments for more detail.

Parameters:
  • obs

  • state – either None or a dict with keys ‘hidden’ and ‘cell’

  • kwargs – unused

Returns:

predicted action, next state as dict with keys ‘hidden’ and ‘cell’
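
A usage sketch illustrating the evaluation-mode input layout and the recurrent state dict described above (shapes are illustrative):

    import numpy as np
    from tianshou.utils.net.common import Recurrent

    net = Recurrent(layer_num=1, state_shape=4, action_shape=2, hidden_layer_size=128)

    obs = np.random.rand(16, 4).astype(np.float32)   # evaluation mode: [bsz, dim]
    logits, rnn_state = net(obs)                     # rnn_state: {'hidden': ..., 'cell': ...}
    logits, rnn_state = net(obs, state=rnn_state)    # feed the state back on the next step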

get_dict_state_decorator(state_shape: dict[str, int | Sequence[int]], keys: Sequence[str]) tuple[Callable, int][source]#

A helper function to make Net or equivalent classes (e.g. Actor, Critic) applicable to dict state.

The first return item, decorator_fn, alters the implementation of the forward function of the given class by preprocessing the observation: the observations are flattened and concatenated according to the order of keys. The batch dimension is preserved if present. The resulting observation shape will be equal to new_state_shape, the second return item.

Parameters:
  • state_shape – A dictionary indicating each state’s shape

  • keys – A list of the state’s keys. The observations are flattened and concatenated according to this list order.

Returns:

a 2-item tuple (decorator_fn, new_state_shape)
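
A sketch of decorating Net for a dict observation space (the keys and shapes here are hypothetical):

    from tianshou.utils.net.common import Net, get_dict_state_decorator

    dec_fn, new_state_shape = get_dict_state_decorator(
        state_shape={"position": 3, "velocity": 3},
        keys=["position", "velocity"],
    )
    # The decorated class accepts the dict observation and flattens it internally.
    net = dec_fn(Net)(new_state_shape, action_shape=2, hidden_sizes=[64, 64])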

get_output_dim(module: Module, alt_value: int | None) int[source]#

Retrieves the value of the output_dim attribute of the given module, or uses the given alternative value if the attribute is not present. If both are present, they must match.

Parameters:
  • module – the module

  • alt_value – the alternative value

Returns:

the value
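
A small sketch of the matching rule, assuming a module (such as MLP) that exposes an output_dim attribute:

    from tianshou.utils.net.common import MLP, get_output_dim

    mlp = MLP(input_dim=4, output_dim=32, hidden_sizes=[64])
    get_output_dim(mlp, None)    # -> 32, taken from the module's output_dim attribute
    get_output_dim(mlp, 32)      # -> 32, the alternative value matches the attribute
    # get_output_dim(mlp, 64)    # error: the two values disagree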

getattr_with_matching_alt_value(obj: Any, attr_name: str, alt_value: T | None) T[source]#

Gets the given attribute from the given object or takes the alternative value if it is not present. If both are present, they are required to match.

Parameters:
  • obj – the object from which to obtain the attribute value

  • attr_name – the attribute name

  • alt_value – the alternative value to use when the attribute is not present; in that case it must not be None.

Returns:

the value

miniblock(input_size: int, output_size: int = 0, norm_layer: type[~torch.nn.modules.module.Module] | None = None, norm_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | None = None, activation: type[~torch.nn.modules.module.Module] | None = None, act_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | None = None, linear_layer: ~collections.abc.Callable[[int, int], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.linear.Linear'>) list[Module][source]#

Construct a miniblock with the given input/output size, norm layer, and activation.
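
A short sketch of composing the returned layers (the sizes are illustrative):

    import torch.nn as nn
    from tianshou.utils.net.common import miniblock

    # Linear(4 -> 64) followed by LayerNorm and ReLU, returned as a list of modules.
    layers = miniblock(4, 64, norm_layer=nn.LayerNorm, activation=nn.ReLU)
    block = nn.Sequential(*layers)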