ddpg


Source code: tianshou/policy/modelfree/ddpg.py

class DDPGPolicy(*, actor: Module, actor_optim: Optimizer, critic: Module, critic_optim: Optimizer, action_space: Space, tau: float = 0.005, gamma: float = 0.99, exploration_noise: BaseNoise | Literal['default'] | None = 'default', estimation_step: int = 1, observation_space: Space | None = None, action_scaling: bool = True, action_bound_method: Literal['clip'] | None = 'clip', lr_scheduler: LRScheduler | MultipleLRSchedulers | None = None)

Implementation of Deep Deterministic Policy Gradient. arXiv:1509.02971.

Parameters:

actor – The actor network, following the rules in BasePolicy. (s -> model_output)
actor_optim – The optimizer for the actor network.
critic – The critic network. (s, a -> Q(s, a))
critic_optim – The optimizer for the critic network.
action_space – The environment’s action space.
tau – Interpolation coefficient for the soft update of the target network.
gamma – Discount factor, in [0, 1].
exploration_noise – The exploration noise added to the action. The default value 'default' corresponds to GaussianNoise(sigma=0.1).
estimation_step – The number of steps to look ahead when estimating returns.
observation_space – The environment’s observation space.
action_scaling – If True, scale the action from [-1, 1] to the range of action_space. Only used if action_space is continuous.
action_bound_method – The method used to bound the action to the range [-1, 1]. Only used if action_space is continuous.
lr_scheduler – A learning rate scheduler. If not None, it is called in policy.update().
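
To make the roles of tau, gamma, estimation_step, action_bound_method and action_scaling more concrete, here is a minimal pure-Python sketch of the arithmetic these parameters usually control. It is not taken from ddpg.py: the helper names soft_update, bound_and_scale and n_step_target are hypothetical, the networks are replaced by plain lists of floats, and the n-step target assumes no episode termination within the look-ahead window.

```python
def soft_update(target, online, tau=0.005):
    """Soft (Polyak) update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def bound_and_scale(action, low, high):
    """Clip a raw action to [-1, 1] ('clip' bounding), then map it
    linearly onto [low, high] (action scaling)."""
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    return [
        lo + (a + 1.0) * (hi - lo) / 2.0
        for a, lo, hi in zip(clipped, low, high)
    ]


def n_step_target(rewards, bootstrap_q, gamma=0.99):
    """Discounted sum of n rewards plus gamma**n times a bootstrap value.

    Assumption: no terminal state occurs within the n steps.
    len(rewards) plays the role of estimation_step.
    """
    ret = bootstrap_q
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


if __name__ == "__main__":
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))
    print(bound_and_scale([2.0, -1.0, 0.0], [0.0, 0.0, -2.0], [10.0, 10.0, 2.0]))
    print(n_step_target([1.0, 1.0], bootstrap_q=4.0, gamma=0.5))
```

With a small tau such as the default 0.005, the target parameters move only slightly toward the online parameters on each update, which is the usual motivation for soft updates. With estimation_step = 1, the target reduces to the familiar one-step form r + gamma * Q.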
