common#


class ActorCritic(actor: Module, critic: Module)[source]#

An actor-critic network wrapper that groups an actor and a critic for joint parameter handling.

Use actor_critic.parameters() instead of set.union or list + list concatenation of the individual parameter lists to avoid issue #449.

Parameters:
  • actor (nn.Module) – the actor network.

  • critic (nn.Module) – the critic network.
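
A minimal usage sketch (the network shapes and the learning rate are illustrative only): the wrapper lets a single optimizer see the joint parameter set of both networks.

    import torch
    from tianshou.utils.net.common import ActorCritic, Net

    actor = Net(state_shape=4, action_shape=2, hidden_sizes=[64, 64])
    critic = Net(state_shape=4, action_shape=1, hidden_sizes=[64, 64])
    actor_critic = ActorCritic(actor, critic)

    # nn.Module.parameters() yields each parameter once, even if the
    # actor and critic share submodules.
    optim = torch.optim.Adam(actor_critic.parameters(), lr=1e-3)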

class BaseActor(*args, **kwargs)[source]#

abstract get_output_dim() int[source]#

abstract get_preprocess_net() Module[source]#
class BranchingNet(state_shape: int | ~collections.abc.Sequence[int], num_branches: int = 0, action_per_branch: int = 2, common_hidden_sizes: list[int] | None = None, value_hidden_sizes: list[int] | None = None, action_hidden_sizes: list[int] | None = None, norm_layer: type[~torch.nn.modules.module.Module] | None = None, norm_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, activation: type[~torch.nn.modules.module.Module] | None = <class 'torch.nn.modules.activation.ReLU'>, act_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, device: str | int | ~torch.device = 'cpu')[source]#

Branching dueling Q-network.

Network for the BranchingDQNPolicy. It uses a common network module, a value module, and one action “branch” per action dimension, which allows the Q-value output to scale linearly with the number of dimensions in the action space. For more info please refer to: arXiv:1711.08946.

Parameters:
  • state_shape – int or a sequence of int of the shape of state.

  • num_branches – the number of action dimensions (branches).

  • action_per_branch – the number of actions in each dimension.

  • common_hidden_sizes – shape of the common MLP network passed in as a list.

  • value_hidden_sizes – shape of the value MLP network passed in as a list.

  • action_hidden_sizes – shape of the action MLP network passed in as a list.

  • norm_layer – which normalization to use before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use different normalization modules in different layers. Default to no normalization.

  • activation – which activation to use after each layer; either a single nn.Module class for the same activation in all layers, or a list of classes for different activations in different layers. Default to nn.ReLU.

  • device – specify the device when the network actually runs. Default to “cpu”.

forward(obs: ndarray | Tensor, state: Any = None, **kwargs: Any) tuple[Tensor, Any][source]#

Mapping: obs -> model -> logits.
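
A brief construction sketch (the branch and action counts are illustrative, and the logits layout of one Q-value vector per branch is an assumption based on the description above):

    import numpy as np
    from tianshou.utils.net.common import BranchingNet

    # 3 action dimensions ("branches"), 4 discrete choices per dimension.
    net = BranchingNet(
        state_shape=8,
        num_branches=3,
        action_per_branch=4,
        common_hidden_sizes=[64, 64],
        value_hidden_sizes=[64],
        action_hidden_sizes=[64],
    )
    obs = np.random.rand(16, 8).astype(np.float32)
    logits, state = net(obs)  # logits: one Q-value vector per branch, e.g. (16, 3, 4)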

class DataParallelNet(net: Module)[source]#

DataParallel wrapper for training an agent with multiple GPUs.

This class only converts the input data type from numpy array to torch Tensor. If the input is a nested dictionary, the user should create a similar class to do the same thing.

Parameters:

net (nn.Module) – the network to be distributed across different GPUs.
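
A minimal sketch, assuming at least one CUDA device is available (shapes are illustrative):

    import numpy as np
    from tianshou.utils.net.common import DataParallelNet, Net

    net = Net(state_shape=4, action_shape=2, hidden_sizes=[128, 128], device=None)
    actor = DataParallelNet(net.cuda())

    obs = np.random.rand(16, 4)   # numpy input is converted to a Tensor internally
    logits, state = actor(obs)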

forward(obs: ndarray | Tensor, *args: Any, **kwargs: Any) tuple[Any, Any][source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class EnsembleLinear(ensemble_size: int, in_feature: int, out_feature: int, bias: bool = True)[source]#

Linear layer of an ensemble network.

Parameters:
  • ensemble_size – Number of subnets in the ensemble.

  • in_feature – dimension of the input vector.

  • out_feature – dimension of the output vector.

  • bias – whether to include an additive bias. Default to True.
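
A short sketch (the broadcasting of one shared input batch across all ensemble members, and the resulting output shape, are assumptions for illustration):

    import torch
    from tianshou.utils.net.common import EnsembleLinear

    layer = EnsembleLinear(ensemble_size=5, in_feature=16, out_feature=8)
    x = torch.rand(32, 16)   # one input batch shared by all ensemble members
    y = layer(x)             # e.g. shape (5, 32, 8): one output per member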

forward(x: Tensor) Tensor[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class MLP(input_dim: int, output_dim: int = 0, hidden_sizes: ~collections.abc.Sequence[int] = (), norm_layer: type[~torch.nn.modules.module.Module] | ~collections.abc.Sequence[type[~torch.nn.modules.module.Module]] | None = None, norm_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, activation: type[~torch.nn.modules.module.Module] | ~collections.abc.Sequence[type[~torch.nn.modules.module.Module]] | None = <class 'torch.nn.modules.activation.ReLU'>, act_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, device: str | int | ~torch.device | None = None, linear_layer: ~collections.abc.Callable[[int, int], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.linear.Linear'>, flatten_input: bool = True)[source]#

Simple MLP backbone.

Create an MLP of size input_dim * hidden_sizes[0] * hidden_sizes[1] * … * hidden_sizes[-1] * output_dim.

Parameters:
  • input_dim – dimension of the input vector.

  • output_dim – dimension of the output vector. If set to 0, there is no final linear layer.

  • hidden_sizes – shape of MLP passed in as a list, not including input_dim and output_dim.

  • norm_layer – which normalization to use before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use different normalization modules in different layers. Default to no normalization.

  • activation – which activation to use after each layer; either a single nn.Module class for the same activation in all layers, or a list of classes for different activations in different layers. Default to nn.ReLU.

  • device – which device to create this model on. Default to None.

  • linear_layer – use this module as linear layer. Default to nn.Linear.

  • flatten_input – whether to flatten input data. Default to True.
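
A minimal construction sketch (sizes are illustrative):

    import torch
    from tianshou.utils.net.common import MLP

    # 4 -> 64 -> 64 -> 2, with ReLU after each hidden layer.
    mlp = MLP(input_dim=4, output_dim=2, hidden_sizes=[64, 64])
    out = mlp(torch.rand(32, 4))   # -> shape (32, 2)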

forward(obs: ndarray | Tensor) Tensor[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class Net(state_shape: int | ~collections.abc.Sequence[int], action_shape: ~collections.abc.Sequence[int] | int | ~numpy.int64 = 0, hidden_sizes: ~collections.abc.Sequence[int] = (), norm_layer: type[~torch.nn.modules.module.Module] | ~collections.abc.Sequence[type[~torch.nn.modules.module.Module]] | None = None, norm_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, activation: type[~torch.nn.modules.module.Module] | ~collections.abc.Sequence[type[~torch.nn.modules.module.Module]] | None = <class 'torch.nn.modules.activation.ReLU'>, act_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | ~collections.abc.Sequence[tuple[~typing.Any, ...]] | ~collections.abc.Sequence[dict[~typing.Any, ~typing.Any]] | None = None, device: str | int | ~torch.device = 'cpu', softmax: bool = False, concat: bool = False, num_atoms: int = 1, dueling_param: tuple[dict[str, ~typing.Any], dict[str, ~typing.Any]] | None = None, linear_layer: ~collections.abc.Callable[[int, int], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.linear.Linear'>)[source]#

Wrapper of MLP to support more specific DRL usage.

For advanced usage (how to customize the network), please refer to Build the Network.

Parameters:
  • state_shape – int or a sequence of int of the shape of state.

  • action_shape – int or a sequence of int of the shape of action.

  • hidden_sizes – shape of MLP passed in as a list.

  • norm_layer – which normalization to use before activation, e.g., nn.LayerNorm and nn.BatchNorm1d. You can also pass a list of normalization modules with the same length as hidden_sizes to use different normalization modules in different layers. Default to no normalization.

  • activation – which activation to use after each layer; either a single nn.Module class for the same activation in all layers, or a list of classes for different activations in different layers. Default to nn.ReLU.

  • device – specify the device when the network actually runs. Default to “cpu”.

  • softmax – whether to apply a softmax layer over the last layer’s output.

  • concat – whether the input shape is concatenated by state_shape and action_shape. If it is True, action_shape is not the output shape, but affects the input shape only.

  • num_atoms – number of atoms, used to expand the network for distributional RL. Default to 1 (not used).

  • dueling_param – whether to use a dueling network to calculate Q values (for Dueling DQN). If you want to use the dueling option, you should pass a tuple of two dicts (the first for Q and the second for V) with custom keyword arguments as described in MLP. Default to None.

  • linear_layer – the module constructor to use for linear layers; it takes the input and output dimensions as arguments. Default to nn.Linear.

See also

Please refer to MLP for more detailed explanation on the usage of activation, norm_layer, etc.

You can also refer to Actor, Critic, etc., to see how it is suggested to be used.

forward(obs: ndarray | Tensor, state: Any = None, **kwargs: Any) tuple[Tensor, Any][source]#

Mapping: obs -> flatten (inside MLP) -> logits.

Parameters:
  • obs

  • state – unused and returned as is

  • kwargs – unused
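
A minimal usage sketch (shapes are illustrative):

    import numpy as np
    from tianshou.utils.net.common import Net

    net = Net(state_shape=4, action_shape=2, hidden_sizes=[64, 64])
    obs = np.random.rand(16, 4)
    logits, state = net(obs)   # logits: (16, 2); state is returned unchanged (None here)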

class NetBase(*args, **kwargs)[source]#

Interface for NNs used in policies.

abstract forward(obs: ndarray | Tensor, state: Any = None, **kwargs: Any) tuple[Tensor, Any][source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class Recurrent(layer_num: int, state_shape: int | Sequence[int], action_shape: Sequence[int] | int | int64, device: str | int | device = 'cpu', hidden_layer_size: int = 128)[source]#

Simple Recurrent network based on LSTM.

For advanced usage (how to customize the network), please refer to Build the Network.

forward(obs: ndarray | Tensor, state: RecurrentStateBatch | dict[str, Tensor] | None = None, **kwargs: Any) tuple[Tensor, dict[str, Tensor]][source]#

Mapping: obs -> flatten -> logits.

In evaluation mode, obs should have shape [bsz, dim]; in training mode, obs should have shape [bsz, len, dim]. See the code and comments for more detail.

Parameters:
  • obs

  • state – either None or a dict with keys ‘hidden’ and ‘cell’

  • kwargs – unused

Returns:

predicted action, next state as dict with keys ‘hidden’ and ‘cell’
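
A usage sketch illustrating the evaluation-mode input layout and the recurrent state dict described above (shapes are illustrative):

    import numpy as np
    from tianshou.utils.net.common import Recurrent

    net = Recurrent(layer_num=1, state_shape=4, action_shape=2, hidden_layer_size=128)

    obs = np.random.rand(16, 4).astype(np.float32)   # evaluation mode: [bsz, dim]
    logits, rnn_state = net(obs)                     # rnn_state: {'hidden': ..., 'cell': ...}
    logits, rnn_state = net(obs, state=rnn_state)    # feed the state back on the next step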

get_dict_state_decorator(state_shape: dict[str, int | Sequence[int]], keys: Sequence[str]) tuple[Callable, int][source]#

A helper function to make Net or equivalent classes (e.g. Actor, Critic) applicable to dict state.

The first return item, decorator_fn, alters the implementation of the forward function of the given class by preprocessing the observation: the observations are flattened and concatenated according to the order of keys. The batch dimension is preserved if present. The resulting observation shape will be equal to new_state_shape, the second return item.

Parameters:
  • state_shape – A dictionary indicating each state’s shape

  • keys – A list of the state’s keys. The observations are flattened and concatenated according to this list order.

Returns:

a 2-item tuple (decorator_fn, new_state_shape)
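
A sketch of decorating Net for a dict observation space (the keys and shapes here are hypothetical):

    from tianshou.utils.net.common import Net, get_dict_state_decorator

    dec_fn, new_state_shape = get_dict_state_decorator(
        state_shape={"position": 3, "velocity": 3},
        keys=["position", "velocity"],
    )
    # The decorated class accepts the dict observation and flattens it internally.
    net = dec_fn(Net)(new_state_shape, action_shape=2, hidden_sizes=[64, 64])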

get_output_dim(module: Module, alt_value: int | None) int[source]#

Retrieves the value of the output_dim attribute of the given module, or uses the given alternative value if the attribute is not present. If both are present, they must match.

Parameters:
  • module – the module

  • alt_value – the alternative value

Returns:

the value
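
A small sketch of the matching rule, assuming a module (such as MLP) that exposes an output_dim attribute:

    from tianshou.utils.net.common import MLP, get_output_dim

    mlp = MLP(input_dim=4, output_dim=32, hidden_sizes=[64])
    get_output_dim(mlp, None)    # -> 32, taken from the module's output_dim attribute
    get_output_dim(mlp, 32)      # -> 32, the alternative value matches the attribute
    # get_output_dim(mlp, 64)    # error: the two values disagree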

getattr_with_matching_alt_value(obj: Any, attr_name: str, alt_value: T | None) T[source]#

Gets the given attribute from the given object or takes the alternative value if it is not present. If both are present, they are required to match.

Parameters:
  • obj – the object from which to obtain the attribute value

  • attr_name – the attribute name

  • alt_value – the alternative value to use when the attribute is not present; in that case it must not be None.

Returns:

the value

miniblock(input_size: int, output_size: int = 0, norm_layer: type[~torch.nn.modules.module.Module] | None = None, norm_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | None = None, activation: type[~torch.nn.modules.module.Module] | None = None, act_args: tuple[~typing.Any, ...] | dict[~typing.Any, ~typing.Any] | None = None, linear_layer: ~collections.abc.Callable[[int, int], ~torch.nn.modules.module.Module] = <class 'torch.nn.modules.linear.Linear'>) list[Module][source]#

Construct a miniblock with the given input/output size, norm layer, and activation.
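
A short sketch of composing the returned layers (the sizes are illustrative):

    import torch.nn as nn
    from tianshou.utils.net.common import miniblock

    # Linear(4 -> 64) followed by LayerNorm and ReLU, returned as a list of modules.
    layers = miniblock(4, 64, norm_layer=nn.LayerNorm, activation=nn.ReLU)
    block = nn.Sequential(*layers)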