c51


class C51Policy(*, model: Module, optim: Optimizer, action_space: Discrete, discount_factor: float = 0.99, num_atoms: int = 51, v_min: float = -10.0, v_max: float = 10.0, estimation_step: int = 1, target_update_freq: int = 0, reward_normalization: bool = False, is_double: bool = True, clip_loss_grad: bool = False, observation_space: Space | None = None, lr_scheduler: LRScheduler | MultipleLRSchedulers | None = None)

Implementation of Categorical Deep Q-Network. arXiv:1707.06887.

Parameters:
  • model – a model following the mapping (s_B -> action_values_BA); for C51, the raw network output is a probability distribution over the num_atoms support atoms for each action.

  • optim – a torch.optim.Optimizer for optimizing the model.

  • discount_factor – in [0, 1].

  • num_atoms – the number of atoms in the support set of the value distribution. Defaults to 51.

  • v_min – the value of the smallest atom in the support set. Defaults to -10.0.

  • v_max – the value of the largest atom in the support set. Defaults to 10.0.

  • estimation_step – the number of steps to look ahead (n-step return estimation).

  • target_update_freq – the target network update frequency (0 if you do not use the target network).

  • reward_normalization – normalize the returns to Normal(0, 1). TODO: rename to return_normalization?

  • is_double – whether to use double DQN.

  • clip_loss_grad – clip the gradient of the loss in accordance with nature14236; this amounts to using the Huber loss instead of the MSE loss.

  • observation_space – Env’s observation space.

  • lr_scheduler – if not None, will be called in policy.update().

See also

Please refer to DQNPolicy for more detailed explanation.
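A minimal construction sketch, assuming a Gymnasium CartPole environment and tianshou.utils.net.common.Net as the backbone (the hyperparameter values below are illustrative, not prescribed by this page):

    import gymnasium as gym
    import torch

    from tianshou.policy import C51Policy
    from tianshou.utils.net.common import Net

    env = gym.make("CartPole-v1")
    num_atoms = 51

    # Net with num_atoms > 1 and softmax=True outputs a probability
    # distribution of shape (batch, num_actions, num_atoms).
    net = Net(
        state_shape=env.observation_space.shape,
        action_shape=env.action_space.n,
        hidden_sizes=[128, 128],
        softmax=True,
        num_atoms=num_atoms,
    )
    optim = torch.optim.Adam(net.parameters(), lr=1e-3)

    policy = C51Policy(
        model=net,
        optim=optim,
        action_space=env.action_space,
        discount_factor=0.99,
        num_atoms=num_atoms,
        v_min=-10.0,
        v_max=10.0,
        estimation_step=3,
        target_update_freq=320,
    )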

compute_q_value(logits: Tensor, mask: ndarray | None) → Tensor

Compute the q value based on the network’s raw output and action mask.
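In C51, the expected Q-value is recovered from the categorical distribution by weighting each support atom by its predicted probability and summing over the atom dimension. A small illustrative sketch of that reduction (tensor shapes and values are made up for illustration; this is not the method's exact source):

    import torch

    batch_size, num_actions, num_atoms = 32, 6, 51
    v_min, v_max = -10.0, 10.0

    # Per-action probability mass over the fixed support atoms z_1 .. z_N.
    probs = torch.softmax(torch.randn(batch_size, num_actions, num_atoms), dim=-1)
    support = torch.linspace(v_min, v_max, num_atoms)

    # Expected Q(s, a) = sum_i p_i(s, a) * z_i, reducing over the atom dimension.
    q_values = (probs * support).sum(dim=-1)  # shape: (batch_size, num_actions)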

learn(batch: RolloutBatchProtocol, *args: Any, **kwargs: Any) → TC51TrainingStats

Update policy with a given batch of data.

Returns:

A dataclass object containing the data to be logged (e.g., loss).

Note

To distinguish between the collecting, updating, and testing states, you can check the policy state via self.training and self.updating. Please refer to States for policy for a more detailed explanation.

Warning

If you use torch.distributions.Normal and torch.distributions.Categorical to calculate the log_prob, please be careful about the shape: the Categorical distribution gives a “[batch_size]” shape while the Normal distribution gives a “[batch_size, 1]” shape. The auto-broadcasting of numerical operations on torch tensors will amplify this error.
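A hedged usage sketch of a single training update, assuming a policy built as in the construction example above and a ReplayBuffer named buffer that has already been filled with transitions (e.g., by a Collector); the sample size is illustrative:

    # `buffer` is assumed to be a tianshou.data.ReplayBuffer containing transitions.
    stats = policy.update(sample_size=64, buffer=buffer)
    print(stats.loss)        # scalar loss of this gradient step
    print(stats.train_time)  # wall-clock time spent inside the update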

class C51TrainingStats(*, train_time: float = 0.0, smoothed_loss: dict = <factory>, loss: float)