collector#


class AsyncCollector(policy: BasePolicy, env: BaseVectorEnv, buffer: ReplayBuffer | None = None, exploration_noise: bool = False)[source]#

Async Collector handles async vector environment.

The arguments are exactly the same as for Collector; please refer to Collector for a more detailed explanation.

collect(n_step: int | None = None, n_episode: int | None = None, random: bool = False, render: float | None = None, no_grad: bool = True, reset_before_collect: bool = False, gym_reset_kwargs: dict[str, Any] | None = None) CollectStats[source]#

Collect a specified number of steps or episodes with async env setting.

This function does not collect an exact number of transitions specified by n_step or n_episode. Instead, to support the asynchronous setting, it may collect more transitions than requested by n_step or n_episode and save them into the buffer.

Parameters:
  • n_step – how many steps you want to collect.

  • n_episode – how many episodes you want to collect.

  • random – whether to use a random policy for collecting data. Defaults to False.

  • render – the sleep time between rendering consecutive frames. Defaults to None (no rendering).

  • no_grad – whether to run policy.forward() without tracking gradients. Defaults to True (no gradients are retained).

  • reset_before_collect – whether to reset the environment before collecting data. It only has an effect if n_episode is not None, i.e. if one wants to collect a fixed number of episodes. (The collector needs the initial obs and info to function properly.)

  • gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments are passed).

Note

One and only one collection number specification is permitted, either n_step or n_episode.

Returns:

A CollectStats dataclass containing the statistics of the collection.
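The over-collection behaviour described above can be illustrated with a small standalone sketch (plain Python, no Tianshou dependency, and not the actual implementation): transitions arrive in batches from whichever envs happen to be ready, and the whole batch is stored before the n_step target is re-checked, so the total may exceed n_step.

```python
def async_collect_sketch(n_step: int, env_batch_sizes: list[int]) -> int:
    """Simulate async collection: each iteration, a batch of transitions
    arrives from the subset of envs that finished a step. The batch is
    stored whole before the n_step target is re-checked, so the final
    count can overshoot n_step."""
    collected = 0
    i = 0
    while collected < n_step:
        # a batch from the envs that are currently ready
        collected += env_batch_sizes[i % len(env_batch_sizes)]
        i += 1
    return collected

# With 3 envs ready per iteration, asking for 10 steps stores 12.
print(async_collect_sketch(10, [3]))  # -> 12
```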

reset(reset_buffer: bool = True, reset_stats: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) None[source]#

Reset the environment, statistics, and data needed to start the collection.

Parameters:
  • reset_buffer – if true, reset the replay buffer attached to the collector.

  • reset_stats – if true, reset the statistics attached to the collector.

  • gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments are passed).

class CollectStats(*, n_collected_episodes: int = 0, n_collected_steps: int = 0, collect_time: float = 0.0, collect_speed: float = 0.0, returns: ndarray, returns_stat: SequenceSummaryStats | None, lens: ndarray, lens_stat: SequenceSummaryStats | None)[source]#

A data structure for storing the statistics of rollouts.

collect_speed: float = 0.0#

The speed of collecting (env_step per second).

collect_time: float = 0.0#

The time for collecting transitions.

lens: ndarray#

The collected episode lengths.

lens_stat: SequenceSummaryStats | None#

Stats of the collected episode lengths.

returns: ndarray#

The collected episode returns.

returns_stat: SequenceSummaryStats | None#

Stats of the collected returns.

classmethod with_autogenerated_stats(returns: ndarray, lens: ndarray, n_collected_episodes: int = 0, n_collected_steps: int = 0, collect_time: float = 0.0, collect_speed: float = 0.0) Self[source]#

Return a new instance with the stats autogenerated from the given arrays.
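As a rough illustration of what with_autogenerated_stats does — the exact fields of SequenceSummaryStats are an assumption here — the summary statistics can be derived from the raw returns and lens arrays like so:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SummaryStatsSketch:
    """Stand-in for SequenceSummaryStats: mean/std/max/min of a sequence.
    The field names are assumed for illustration."""
    mean: float
    std: float
    max: float
    min: float

    @classmethod
    def from_sequence(cls, seq: np.ndarray) -> "SummaryStatsSketch":
        # Reduce the raw per-episode array to its summary statistics.
        return cls(float(seq.mean()), float(seq.std()),
                   float(seq.max()), float(seq.min()))


returns = np.array([1.0, 2.0, 3.0])
stats = SummaryStatsSketch.from_sequence(returns)
print(stats.mean, stats.min, stats.max)  # -> 2.0 1.0 3.0
```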

class CollectStatsBase(*, n_collected_episodes: int = 0, n_collected_steps: int = 0)[source]#

The most basic stats, often used for offline learning.

n_collected_episodes: int = 0#

The number of collected episodes.

n_collected_steps: int = 0#

The number of collected steps.

class Collector(policy: BasePolicy, env: Env | BaseVectorEnv, buffer: ReplayBuffer | None = None, exploration_noise: bool = False)[source]#

Collector enables the policy to interact with different types of envs for an exact number of steps or episodes.

Parameters:
  • policy – an instance of the BasePolicy class.

  • env – a gym.Env environment or an instance of the BaseVectorEnv class.

  • buffer – an instance of the ReplayBuffer class. If set to None, will instantiate a VectorReplayBuffer as the default buffer.

  • exploration_noise – whether the action needs to be modified with the corresponding policy’s exploration noise. If so, policy.exploration_noise(act, batch) will be called automatically to add the exploration noise to the action. Defaults to False.

Note

Please make sure the given environment has a time limit if using the n_episode collect option.

Note

In past versions of Tianshou, the replay buffer passed to __init__ was automatically reset. This is not done in the current implementation.

close() None[source]#

Close the collector and the environment.

collect(n_step: int | None = None, n_episode: int | None = None, random: bool = False, render: float | None = None, no_grad: bool = True, reset_before_collect: bool = False, gym_reset_kwargs: dict[str, Any] | None = None) CollectStats[source]#

Collect a specified number of steps or episodes.

To ensure an unbiased sampling result with the n_episode option, this function first collects n_episode - env_num episodes, then collects the remaining env_num episodes evenly, one from each env.

Parameters:
  • n_step – how many steps you want to collect.

  • n_episode – how many episodes you want to collect.

  • random – whether to use random policy for collecting data.

  • render – the sleep time between rendering consecutive frames.

  • no_grad – whether to run policy.forward() without tracking gradients.

  • reset_before_collect – whether to reset the environment before collecting data. (The collector needs the initial obs and info to function properly.)

  • gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Only used if reset_before_collect is True.

Note

One and only one collection number specification is permitted, either n_step or n_episode.

Returns:

The collected stats
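The two-phase scheme behind the unbiased n_episode sampling can be sketched as a simple split (a standalone illustration, not Tianshou’s actual implementation):

```python
def split_episode_phases(n_episode: int, env_num: int) -> tuple[int, int]:
    """Split the requested episodes into the two collection phases:
    a first phase of n_episode - env_num episodes taken from whichever
    env finishes first, and a final phase of env_num episodes collected
    evenly, one per env. If n_episode <= env_num, everything is collected
    in the even phase."""
    free = max(n_episode - env_num, 0)
    even = n_episode - free
    return free, even


print(split_episode_phases(10, 4))  # -> (6, 4)
print(split_episode_phases(3, 4))   # -> (0, 3)
```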

property is_closed: bool#

Return True if the collector is closed.

reset(reset_buffer: bool = True, reset_stats: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) None[source]#

Reset the environment, statistics, and data needed to start the collection.

Parameters:
  • reset_buffer – if true, reset the replay buffer attached to the collector.

  • reset_stats – if true, reset the statistics attached to the collector.

  • gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments are passed).

reset_buffer(keep_statistics: bool = False) None[source]#

Reset the data buffer.

reset_env(gym_reset_kwargs: dict[str, Any] | None = None) None[source]#

Reset the environments and the initial obs, info, and hidden state of the collector.

reset_stat() None[source]#

Reset the statistic variables.