td3

Source code: tianshou/policy/modelfree/td3.py

class TD3Policy(*, actor: Module, actor_optim: Optimizer, critic: Module, critic_optim: Optimizer, action_space: Space, critic2: Module | None = None, critic2_optim: Optimizer | None = None, tau: float = 0.005, gamma: float = 0.99, exploration_noise: BaseNoise | Literal['default'] | None = 'default', policy_noise: float = 0.2, update_actor_freq: int = 2, noise_clip: float = 0.5, estimation_step: int = 1, observation_space: Space | None = None, action_scaling: bool = True, action_bound_method: Literal['clip'] | None = 'clip', lr_scheduler: LRScheduler | MultipleLRSchedulers | None = None)

Implementation of TD3 (Twin Delayed Deep Deterministic policy gradient), arXiv:1802.09477.
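For orientation, the one-step critic target described in the TD3 paper can be written as follows. This is a sketch in the paper's notation, not a transcription of this class: the mapping of \sigma to policy_noise, c to noise_clip, and \gamma to gamma is an assumption based on the parameter names, and \theta'_i and \phi' (target critic and target actor parameters) are paper symbols, not identifiers from the source.

```latex
% Target for a non-terminal transition (s, a, r, s'), assuming estimation_step = 1
\begin{aligned}
\epsilon &\sim \operatorname{clip}\bigl(\mathcal{N}(0, \sigma^2),\, -c,\, c\bigr), \\
\tilde{a} &= \pi_{\phi'}(s') + \epsilon, \\
y &= r + \gamma \min_{i \in \{1, 2\}} Q_{\theta'_i}(s', \tilde{a}).
\end{aligned}
```

Both critics regress toward the same target y; taking the minimum over the two target critics is what counteracts overestimation of Q-values.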

Parameters:

actor – the actor network following the rules in BasePolicy. (s -> logits)
actor_optim – the optimizer for actor network.
critic – the first critic network. (s, a -> Q(s, a))
critic_optim – the optimizer for the first critic network.
action_space – Env’s action space. Should be gym.spaces.Box.
critic2 – the second critic network. (s, a -> Q(s, a)). If None, a deep copy of critic is used.
critic2_optim – the optimizer for the second critic network. If None, critic_optim is cloned and applied to critic2.parameters().
tau – interpolation coefficient for the soft update of the target networks.
gamma – discount factor, in [0, 1].
exploration_noise – add noise to action for exploration. This is useful when solving “hard exploration” problems. “default” is equivalent to GaussianNoise(sigma=0.1).
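policy_noise – the standard deviation of the Gaussian noise added to the target policy's action when computing the critic target (target policy smoothing).
update_actor_freq – the actor update interval: the actor is updated once every update_actor_freq critic updates.
noise_clip – the clipping bound for that noise; the noise is clipped to [-noise_clip, noise_clip].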
estimation_step – the number of steps to look ahead when computing n-step returns.
observation_space – Env’s observation space.
action_scaling – if True, scale the action from [-1, 1] to the range of action_space. Only used if the action_space is continuous.
action_bound_method – method to bound action to range [-1, 1]. Only used if the action_space is continuous.
lr_scheduler – a learning rate scheduler that adjusts the optimizer's learning rate on each policy.update() call.
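The roles of policy_noise, noise_clip, gamma, tau, and update_actor_freq described above can be illustrated with a small framework-free sketch. It uses plain Python floats rather than tensors, and every function name in it (smoothed_target_action, td3_target, soft_update, actor_update_steps) is hypothetical and not part of Tianshou. It shows the general TD3 mechanics under those assumptions, not the library's actual implementation.

```python
import random


def smoothed_target_action(target_action, policy_noise=0.2, noise_clip=0.5, rng=random):
    """Add clipped Gaussian noise to each component of a target action."""
    noisy = []
    for a in target_action:
        eps = rng.gauss(0.0, policy_noise)  # noise with std = policy_noise
        if noise_clip > 0.0:
            # Keep the perturbation inside [-noise_clip, noise_clip].
            eps = max(-noise_clip, min(noise_clip, eps))
        noisy.append(a + eps)
    return noisy


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """One-step target using the smaller of the two target-critic estimates."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def actor_update_steps(num_updates, update_actor_freq=2):
    """Indices (0-based) of critic updates on which the actor is also updated.

    Assumes the counter starts at 0; the exact offset is an illustration choice.
    """
    return [i for i in range(num_updates) if i % update_actor_freq == 0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(smoothed_target_action([0.1, -0.4], rng=rng))
    print(td3_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0))
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))
    print(actor_update_steps(6, update_actor_freq=2))
```

With the default update_actor_freq of 2, the critics in this sketch are updated on every step while the actor is updated on every second one, which is the "delayed" part of the algorithm's name.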
