discrete#


class Actor(preprocess_net: Module, action_shape: Sequence[int] | int | int64, hidden_sizes: Sequence[int] = (), softmax_output: bool = True, preprocess_net_output_dim: int | None = None, device: str | int | device = 'cpu')[source]#

Simple actor network.

Creates an actor that operates in a discrete action space, with the structure preprocess_net -> action_shape.

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • action_shape – a sequence of int for the shape of the action space.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).

  • softmax_output – whether to apply a softmax layer over the last layer’s output.

  • preprocess_net_output_dim – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to Build the Network.

See also

Please refer to Net as an example of how preprocess_net is suggested to be defined.

forward(obs: ndarray | Tensor, state: Any = None, info: dict[str, Any] | None = None) tuple[Tensor, Any][source]#

Mapping: s -> Q(s, *).

get_output_dim() int[source]#
get_preprocess_net() Module[source]#
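Ignoring device handling and state, the computation Actor performs can be sketched in NumPy. Here `preprocess`, `weight`, and `bias` are hypothetical stand-ins for the wrapped preprocess_net and the final linear layer, not Tianshou names:

```python
import numpy as np

def actor_forward(obs, preprocess, weight, bias, softmax_output=True):
    # Sketch of Actor.forward: preprocess_net -> final linear layer -> softmax.
    hidden = preprocess(obs)              # flattened hidden state
    logits = hidden @ weight + bias       # maps hidden dim -> action_shape
    if softmax_output:
        e = np.exp(logits - logits.max(axis=-1, keepdims=True))
        logits = e / e.sum(axis=-1, keepdims=True)
    return logits

# Toy check: identity preprocess, 4-dim observations, 3 actions.
rng = np.random.default_rng(0)
w, b = rng.normal(size=(4, 3)), np.zeros(3)
probs = actor_forward(np.ones((2, 4)), lambda x: x, w, b)
```

With softmax_output=True the rows of probs are valid probability distributions over the 3 actions.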
class CosineEmbeddingNetwork(num_cosines: int, embedding_dim: int)[source]#

Cosine embedding network for IQN. Converts a scalar in [0, 1] to a list of n-dim vectors.

Parameters:
  • num_cosines – the number of cosines used for the embedding.

  • embedding_dim – the dimension of the embedding/output.

Note

Adapted from ku2482/fqf-iqn-qrdqn.pytorch, fqf_iqn_qrdqn/network.py.

forward(taus: Tensor) Tensor[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself afterwards instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.
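The embedding itself is simple to state: each tau is expanded into cos(i·pi·tau) features and passed through a ReLU-activated linear layer. A NumPy sketch, where `weight` and `bias` are hypothetical stand-ins for the layer parameters:

```python
import numpy as np

def cosine_embedding(taus, weight, bias, num_cosines=64):
    # Expand each scalar tau in [0, 1] to cos(i * pi * tau) for
    # i = 1..num_cosines, then apply a ReLU-activated linear layer
    # to obtain an embedding_dim-sized vector per tau.
    i_pi = np.pi * np.arange(1, num_cosines + 1)     # (num_cosines,)
    cosines = np.cos(taus[..., None] * i_pi)         # (..., num_cosines)
    return np.maximum(cosines @ weight + bias, 0.0)  # ReLU(linear)

rng = np.random.default_rng(0)
w, b = rng.normal(size=(64, 16)), np.zeros(16)
emb = cosine_embedding(np.array([0.1, 0.5, 0.9]), w, b)  # shape (3, 16)
```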

class Critic(preprocess_net: Module, hidden_sizes: Sequence[int] = (), last_size: int = 1, preprocess_net_output_dim: int | None = None, device: str | int | device = 'cpu')[source]#

Simple critic network.

Creates a critic that operates in a discrete action space, with the structure preprocess_net -> 1 (Q value).

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).

  • last_size – the output dimension of the Critic network. Defaults to 1.

  • preprocess_net_output_dim – the output dimension of preprocess_net.

For advanced usage (how to customize the network), please refer to Build the Network.

See also

Please refer to Net as an example of how preprocess_net is suggested to be defined.

forward(obs: ndarray | Tensor, **kwargs: Any) Tensor[source]#

Mapping: s -> V(s).
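The Critic's computation mirrors the Actor's, minus the softmax: a linear head of size last_size on top of the preprocessed hidden state. A minimal NumPy sketch (`preprocess`, `weight`, `bias` are hypothetical stand-ins):

```python
import numpy as np

def critic_forward(obs, preprocess, weight, bias):
    # Sketch of Critic.forward: preprocess_net -> linear head of size last_size.
    # With the default last_size=1 this maps s -> V(s).
    hidden = preprocess(obs)
    return hidden @ weight + bias

rng = np.random.default_rng(0)
w, b = rng.normal(size=(4, 1)), np.zeros(1)             # last_size = 1
v = critic_forward(np.ones((2, 4)), lambda x: x, w, b)  # shape (2, 1)
```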

class FractionProposalNetwork(num_fractions: int, embedding_dim: int)[source]#

Fraction proposal network for FQF.

Parameters:
  • num_fractions – the number of fractions to propose.

  • embedding_dim – the dimension of the embedding/input.

Note

Adapted from ku2482/fqf-iqn-qrdqn.pytorch, fqf_iqn_qrdqn/network.py.

forward(obs_embeddings: Tensor) tuple[Tensor, Tensor, Tensor][source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself afterwards instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.
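The three return values can be derived in a few lines: a linear layer plus softmax yields fraction widths, their cumulative sum gives monotone taus on [0, 1], the tau_hats are the interval midpoints, and the softmax entropy feeds the entropy regularizer. A NumPy sketch with hypothetical `weight`/`bias` parameters:

```python
import numpy as np

def propose_fractions(obs_embeddings, weight, bias):
    # Sketch of FractionProposalNetwork.forward.
    logits = obs_embeddings @ weight + bias
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    q = e / e.sum(axis=-1, keepdims=True)                 # fraction widths
    zeros = np.zeros((*q.shape[:-1], 1))
    taus = np.concatenate([zeros, np.cumsum(q, axis=-1)], axis=-1)
    tau_hats = (taus[..., :-1] + taus[..., 1:]) / 2.0     # interval midpoints
    entropies = -np.sum(q * np.log(q + 1e-12), axis=-1)   # for regularization
    return taus, tau_hats, entropies

rng = np.random.default_rng(0)
w, b = rng.normal(size=(16, 8)), np.zeros(8)              # 8 fractions
taus, tau_hats, ent = propose_fractions(rng.normal(size=(2, 16)), w, b)
```

For num_fractions = N this yields N + 1 taus per sample, starting at 0 and ending at 1, with N midpoints in between.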

class FullQuantileFunction(preprocess_net: Module, action_shape: Sequence[int] | int | int64, hidden_sizes: Sequence[int] = (), num_cosines: int = 64, preprocess_net_output_dim: int | None = None, device: str | int | device = 'cpu')[source]#

Full(y parameterized) Quantile Function.

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • action_shape – a sequence of int for the shape of the action space.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).

  • num_cosines – the number of cosines to use for cosine embedding. Defaults to 64.

  • preprocess_net_output_dim – the output dimension of preprocess_net.

Note

The first return value is a tuple of (quantiles, fractions, quantiles_tau), where fractions is a Batch(taus, tau_hats, entropies).

forward(obs: ndarray | Tensor, propose_model: FractionProposalNetwork, fractions: Batch | None = None, **kwargs: Any) tuple[Any, Tensor][source]#

Mapping: s -> Q(s, *).
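The core of the forward pass can be sketched as follows: the tau_hat midpoints proposed by FractionProposalNetwork are cosine-embedded, mixed multiplicatively with the state embedding, and mapped to per-action quantile values. `embed_w` and `head_w` below are hypothetical single-layer stand-ins for the embedding and output networks:

```python
import numpy as np

def quantiles_at(hidden, tau_hats, embed_w, head_w):
    # Sketch of the quantile computation inside FullQuantileFunction.forward.
    num_cosines = embed_w.shape[0]
    i_pi = np.pi * np.arange(1, num_cosines + 1)
    tau_emb = np.maximum(np.cos(tau_hats[..., None] * i_pi) @ embed_w, 0.0)
    mixed = hidden[:, None, :] * tau_emb    # (batch, n_fractions, hidden_dim)
    return mixed @ head_w                   # (batch, n_fractions, n_actions)

rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 16))
tau_hats = np.linspace(0.05, 0.95, 8)[None, :].repeat(2, axis=0)
q = quantiles_at(hidden, tau_hats,
                 rng.normal(size=(64, 16)), rng.normal(size=(16, 4)))
```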

class ImplicitQuantileNetwork(preprocess_net: Module, action_shape: Sequence[int] | int | int64, hidden_sizes: Sequence[int] = (), num_cosines: int = 64, preprocess_net_output_dim: int | None = None, device: str | int | device = 'cpu')[source]#

Implicit Quantile Network.

Parameters:
  • preprocess_net – a self-defined preprocess_net which outputs a flattened hidden state.

  • action_shape – a sequence of int for the shape of the action space.

  • hidden_sizes – a sequence of int for constructing the MLP after preprocess_net. Defaults to an empty sequence (in which case the MLP contains only a single linear layer).

  • num_cosines – the number of cosines to use for cosine embedding. Defaults to 64.

  • preprocess_net_output_dim – the output dimension of preprocess_net.

Note

Although this class inherits from Critic, it is actually a quantile Q-network with output shape (batch_size, action_dim, sample_size).

The second item of the return value is the tau vector.

forward(obs: ndarray | Tensor, sample_size: int, **kwargs: Any) tuple[Any, Tensor][source]#

Mapping: s -> Q(s, *).
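Unlike FQF, IQN draws its taus uniformly at random rather than learning them. A NumPy sketch of the forward pass, with hypothetical single-layer stand-ins for the embedding and output networks:

```python
import numpy as np

def iqn_forward(hidden, sample_size, embed_w, head_w, rng):
    # Sketch of ImplicitQuantileNetwork.forward: sample_size taus are drawn
    # from U(0, 1), cosine-embedded, mixed multiplicatively with the state
    # embedding, and mapped to per-action quantile values.
    batch = hidden.shape[0]
    taus = rng.uniform(size=(batch, sample_size))
    i_pi = np.pi * np.arange(1, embed_w.shape[0] + 1)
    tau_emb = np.maximum(np.cos(taus[..., None] * i_pi) @ embed_w, 0.0)
    mixed = hidden[:, None, :] * tau_emb     # (batch, sample_size, hidden_dim)
    q = mixed @ head_w                       # (batch, sample_size, n_actions)
    return q.transpose(0, 2, 1), taus        # (batch, n_actions, sample_size)

rng = np.random.default_rng(0)
q, taus = iqn_forward(rng.normal(size=(2, 16)), 32,
                      rng.normal(size=(64, 16)), rng.normal(size=(16, 4)), rng)
```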

class IntrinsicCuriosityModule(feature_net: Module, feature_dim: int, action_dim: int, hidden_sizes: Sequence[int] = (), device: str | device = 'cpu')[source]#

Implementation of Intrinsic Curiosity Module. arXiv:1705.05363.

Parameters:
  • feature_net – a self-defined feature_net which outputs a flattened hidden state.

  • feature_dim – input dimension of the feature net.

  • action_dim – dimension of the action space.

  • hidden_sizes – hidden layer sizes for forward and inverse models.

  • device – device for the module.

forward(s1: ndarray | Tensor, act: ndarray | Tensor, s2: ndarray | Tensor, **kwargs: Any) tuple[Tensor, Tensor][source]#

Mapping: s1, act, s2 -> mse_loss, act_hat.
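The two outputs come from the module's two heads: the forward model predicts the next feature vector from the current features and a one-hot action, and its per-sample MSE serves as the curiosity (intrinsic reward) signal; the inverse model predicts action logits from the feature pair. A NumPy sketch with hypothetical single-layer stand-ins `fwd_w`/`inv_w` for the forward/inverse MLPs:

```python
import numpy as np

def icm_forward(phi1, phi2, act, action_dim, fwd_w, inv_w):
    # Sketch of IntrinsicCuriosityModule.forward.
    one_hot = np.eye(action_dim)[act]                       # (batch, action_dim)
    phi2_hat = np.concatenate([phi1, one_hot], axis=-1) @ fwd_w
    mse_loss = 0.5 * ((phi2_hat - phi2) ** 2).sum(axis=-1)  # intrinsic reward
    act_hat = np.concatenate([phi1, phi2], axis=-1) @ inv_w # action logits
    return mse_loss, act_hat

rng = np.random.default_rng(0)
phi1, phi2 = rng.normal(size=(2, 8)), rng.normal(size=(2, 8))
mse, act_hat = icm_forward(phi1, phi2, np.array([0, 3]), 4,
                           rng.normal(size=(12, 8)), rng.normal(size=(16, 4)))
```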

class NoisyLinear(in_features: int, out_features: int, noisy_std: float = 0.5)[source]#

Implementation of Noisy Networks. arXiv:1706.10295.

Parameters:
  • in_features – the number of input features.

  • out_features – the number of output features.

  • noisy_std – initial standard deviation of noisy linear layers.

Note

Adapted from ku2482/fqf-iqn-qrdqn.pytorch, fqf_iqn_qrdqn/network.py.

f(x: Tensor) Tensor[source]#
forward(x: Tensor) Tensor[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself afterwards instead of this method, since the former takes care of running the registered hooks while the latter silently ignores them.

reset() None[source]#
sample() None[source]#
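The factorized-Gaussian noise scheme of NoisyNet can be sketched in NumPy. The f method is the noise-scaling function sign(x)·sqrt(|x|); the weight noise is the outer product of two scaled noise vectors, and the bias uses the output-side vector. In the real module the noise is stored by sample() and reused until the next call; here it is drawn inline, and all parameters are hypothetical stand-ins for the registered mu/sigma tensors:

```python
import numpy as np

def f(x):
    # NoisyLinear.f: the noise-scaling function sign(x) * sqrt(|x|).
    return np.sign(x) * np.sqrt(np.abs(x))

def noisy_linear(x, w_mu, w_sigma, b_mu, b_sigma, rng):
    # Sketch of a factorized-Gaussian noisy linear layer (arXiv:1706.10295).
    eps_in = f(rng.normal(size=w_mu.shape[1]))     # input-side noise
    eps_out = f(rng.normal(size=w_mu.shape[0]))    # output-side noise
    w = w_mu + w_sigma * np.outer(eps_out, eps_in) # perturbed weights
    b = b_mu + b_sigma * eps_out                   # perturbed bias
    return x @ w.T + b

rng = np.random.default_rng(0)
in_f, out_f = 4, 3
y = noisy_linear(np.ones((2, in_f)),
                 rng.normal(size=(out_f, in_f)), np.full((out_f, in_f), 0.5),
                 np.zeros(out_f), np.full(out_f, 0.5), rng)
```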