pg


class PGPolicy(*, actor: Module, optim: Optimizer, dist_fn: Callable[[...], Distribution], action_space: Space, discount_factor: float = 0.99, reward_normalization: bool = False, deterministic_eval: bool = False, observation_space: Space | None = None, action_scaling: bool = True, action_bound_method: Literal['clip', 'tanh'] | None = 'clip', lr_scheduler: LRScheduler | MultipleLRSchedulers | None = None)

Implementation of the REINFORCE algorithm.

Parameters:
  • actor – the actor network mapping observations to model output (s -> model_output); it should follow the rules in BasePolicy.

  • optim – optimizer for actor network.

  • dist_fn – distribution class for computing the action. Maps model_output -> distribution. Typically a Gaussian distribution taking model_output=(mean, std) as input for continuous action spaces, or a categorical distribution taking model_output=logits for discrete action spaces. Note that, as the user, you are responsible for ensuring that the distribution is compatible with the action space.

  • action_space – env’s action space.

  • discount_factor – discount factor, in [0, 1].

  • reward_normalization – if True, will normalize the returns by subtracting the running mean and dividing by the running standard deviation. Can be detrimental to performance! See TODO in process_fn.

  • deterministic_eval – if True, will use the deterministic action (the dist’s mode) instead of a stochastic one during evaluation. Does not affect training.

  • observation_space – env’s observation space.

  • action_scaling – if True, scale the action from [-1, 1] to the range of action_space. Only used if the action_space is continuous.

  • action_bound_method – method to bound action to range [-1, 1]. Only used if the action_space is continuous.

  • lr_scheduler – if not None, will be called in policy.update().

See also

Please refer to BasePolicy for a more detailed explanation.
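
A minimal construction sketch for a discrete-action task. The environment, network architecture, and learning rate below are illustrative assumptions, not prescribed by this class:

    import gymnasium as gym
    import torch

    from tianshou.policy import PGPolicy
    from tianshou.utils.net.common import Net

    # Illustrative setup: CartPole with a small MLP whose softmax output
    # feeds torch.distributions.Categorical (probabilities as first argument).
    env = gym.make("CartPole-v1")
    net = Net(
        state_shape=env.observation_space.shape,
        action_shape=env.action_space.n,
        hidden_sizes=[64, 64],
        softmax=True,
    )
    optim = torch.optim.Adam(net.parameters(), lr=1e-3)

    policy = PGPolicy(
        actor=net,
        optim=optim,
        dist_fn=torch.distributions.Categorical,  # model_output -> distribution
        action_space=env.action_space,
        discount_factor=0.99,
        action_scaling=False,  # only relevant for continuous action spaces
    )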

forward(batch: ObsBatchProtocol, state: dict | BatchProtocol | ndarray | None = None, **kwargs: Any) → DistBatchProtocol

Compute action over the given batch data by applying the actor.

Will sample from the dist_fn, if appropriate. Returns a new object representing the processed batch data (unlike other methods, which modify the input batch in place).

See also

Please refer to forward() for a more detailed explanation.
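
A sketch of a single forward call, reusing the policy and environment constructed above. Wrapping the observation in a Batch with a leading batch dimension is an assumption about how the caller prepares the data:

    import numpy as np

    from tianshou.data import Batch

    obs, _ = env.reset()
    # forward expects batched observations, hence the added leading dimension.
    result = policy(Batch(obs=np.expand_dims(obs, axis=0), info={}))
    print(result.act)   # sampled action (or the dist's mode in deterministic eval)
    print(result.dist)  # the distribution returned by dist_fn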

learn(batch: BatchWithReturnsProtocol, batch_size: int | None, repeat: int, *args: Any, **kwargs: Any) → TPGTrainingStats

Update policy with a given batch of data.

Returns:

A dataclass object, including the data needed to be logged (e.g., loss).

Note

To distinguish between the collecting, updating, and testing states, you can check the policy state via self.training and self.updating. Please refer to States for policy for a more detailed explanation.

Warning

If you use torch.distributions.Normal and torch.distributions.Categorical to calculate the log_prob, please be careful about the shape: the Categorical distribution gives a “[batch_size]” shape while the Normal distribution gives a “[batch_size, 1]” shape. The auto-broadcasting of numerical operations on torch tensors will amplify this error.
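
A standalone illustration of the shape mismatch described above (the batch size and event sizes are arbitrary):

    import torch
    from torch.distributions import Categorical, Normal

    batch_size = 4
    cat = Categorical(logits=torch.zeros(batch_size, 3))
    norm = Normal(torch.zeros(batch_size, 1), torch.ones(batch_size, 1))

    print(cat.log_prob(cat.sample()).shape)    # torch.Size([4])
    print(norm.log_prob(norm.sample()).shape)  # torch.Size([4, 1])

    # Multiplying a [4] tensor by a [4, 1] tensor broadcasts to [4, 4],
    # silently inflating a loss such as -(log_prob * returns).mean().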

process_fn(batch: RolloutBatchProtocol, buffer: ReplayBuffer, indices: ndarray) → BatchWithReturnsProtocol

Compute the discounted returns (Monte Carlo estimates) for each transition.

They are added to the batch under the field returns. Note: this function will modify the input batch!

\[G_t = \sum_{i=t}^T \gamma^{i-t}r_i\]

where \(T\) is the terminal time step, \(\gamma\) is the discount factor, \(\gamma \in [0, 1]\).

Parameters:
  • batch – a data batch which contains several episodes of data in sequential order. Mind that the end of each finished episode should be marked by a done flag; unfinished (still-collecting) episodes are recognized by buffer.unfinished_index().

  • buffer – the corresponding replay buffer.

  • indices (numpy.ndarray) – the indices of the batch in the buffer, i.e., batch is equal to buffer[indices].
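
A plain-Python illustration of the return formula above for a single finished episode. Tianshou's actual implementation operates on the buffer in a vectorized way; this sketch only mirrors the math:

    import numpy as np

    def discounted_returns(rewards: np.ndarray, gamma: float = 0.99) -> np.ndarray:
        """Compute G_t = sum_{i=t}^T gamma^(i-t) * r_i with a backward pass."""
        returns = np.zeros_like(rewards, dtype=np.float64)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    print(discounted_returns(np.array([1.0, 1.0, 1.0])))  # [2.9701 1.99   1.    ]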

class PGTrainingStats(*, train_time: float = 0.0, smoothed_loss: dict = <factory>, loss: tianshou.data.stats.SequenceSummaryStats)

loss: SequenceSummaryStats
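
A hedged sketch of inspecting this stats object after an update; the buffer variable and the update arguments are assumptions about the surrounding training code, reusing the policy from the construction sketch above:

    # Assuming `buffer` is a populated ReplayBuffer collected with `policy`:
    stats = policy.update(sample_size=0, buffer=buffer, batch_size=64, repeat=1)
    print(stats.loss.mean, stats.loss.std)  # summary over the per-minibatch losses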