collector#


class AsyncCollector(policy: BasePolicy, env: BaseVectorEnv, buffer: ReplayBuffer | None = None, exploration_noise: bool = False)[source]#

Async Collector handles async vector environment.

The arguments are exactly the same as for Collector; please refer to Collector for a more detailed explanation.

collect(n_step: int | None = None, n_episode: int | None = None, random: bool = False, render: float | None = None, no_grad: bool = True, reset_before_collect: bool = False, gym_reset_kwargs: dict[str, Any] | None = None) CollectStats[source]#

Collect a specified number of steps or episodes with async env setting.

This function does not collect an exact number of transitions specified by n_step or n_episode. Instead, to support the asynchronous setting, it may collect more transitions than requested by n_step or n_episode and save them into the buffer.

Parameters:
  • n_step – how many steps you want to collect.

  • n_episode – how many episodes you want to collect.

  • random – whether to use a random policy for collecting data. Defaults to False.

  • render – the sleep time between rendering consecutive frames. Defaults to None (no rendering).

  • no_grad – whether to run policy.forward() without tracking gradients. Defaults to True (no gradients are retained).

  • reset_before_collect – whether to reset the environment before collecting data. It only has an effect if n_episode is not None, i.e. if one wants to collect a fixed number of episodes. (The collector needs the initial obs and info to function properly.)

  • gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments are passed).

Note

One and only one collection number specification is permitted, either n_step or n_episode.

Returns:

A CollectStats dataclass containing the statistics of the collection.
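The over-collection behaviour described above can be illustrated with a small standalone sketch (plain Python, no Tianshou dependency, and not the actual implementation): transitions arrive in batches from whichever envs happen to be ready, and the whole batch is stored before the n_step target is re-checked, so the total may exceed n_step.

```python
def async_collect_sketch(n_step: int, env_batch_sizes: list[int]) -> int:
    """Simulate async collection: each iteration, a batch of transitions
    arrives from the subset of envs that finished a step. The batch is
    stored whole before the n_step target is re-checked, so the final
    count can overshoot n_step."""
    collected = 0
    i = 0
    while collected < n_step:
        # a batch from the envs that are currently ready
        collected += env_batch_sizes[i % len(env_batch_sizes)]
        i += 1
    return collected

# With 3 envs ready per iteration, asking for 10 steps stores 12.
print(async_collect_sketch(10, [3]))  # -> 12
```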

reset(reset_buffer: bool = True, reset_stats: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) None[source]#

Reset the environment, statistics, and data needed to start the collection.

Parameters:
  • reset_buffer – if true, reset the replay buffer attached to the collector.

  • reset_stats – if true, reset the statistics attached to the collector.

  • gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments are passed).

class CollectStats(*, n_collected_episodes: int = 0, n_collected_steps: int = 0, collect_time: float = 0.0, collect_speed: float = 0.0, returns: ndarray, returns_stat: SequenceSummaryStats | None, lens: ndarray, lens_stat: SequenceSummaryStats | None)[source]#

A data structure for storing the statistics of rollouts.

collect_speed: float = 0.0#

The speed of collecting (env_step per second).

collect_time: float = 0.0#

The time for collecting transitions.

lens: ndarray#

The collected episode lengths.

lens_stat: SequenceSummaryStats | None#

Stats of the collected episode lengths.

returns: ndarray#

The collected episode returns.

returns_stat: SequenceSummaryStats | None#

Stats of the collected returns.

classmethod with_autogenerated_stats(returns: ndarray, lens: ndarray, n_collected_episodes: int = 0, n_collected_steps: int = 0, collect_time: float = 0.0, collect_speed: float = 0.0) Self[source]#

Return a new instance with the stats autogenerated from the given arrays.
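As a rough illustration of what with_autogenerated_stats does — the exact fields of SequenceSummaryStats are an assumption here — the summary statistics can be derived from the raw returns and lens arrays like so:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SummaryStatsSketch:
    """Stand-in for SequenceSummaryStats: mean/std/max/min of a sequence.
    The field names are assumed for illustration."""
    mean: float
    std: float
    max: float
    min: float

    @classmethod
    def from_sequence(cls, seq: np.ndarray) -> "SummaryStatsSketch":
        # Reduce the raw per-episode array to its summary statistics.
        return cls(float(seq.mean()), float(seq.std()),
                   float(seq.max()), float(seq.min()))


returns = np.array([1.0, 2.0, 3.0])
stats = SummaryStatsSketch.from_sequence(returns)
print(stats.mean, stats.min, stats.max)  # -> 2.0 1.0 3.0
```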

class CollectStatsBase(*, n_collected_episodes: int = 0, n_collected_steps: int = 0)[source]#

The most basic stats, often used for offline learning.

n_collected_episodes: int = 0#

The number of collected episodes.

n_collected_steps: int = 0#

The number of collected steps.

class Collector(policy: BasePolicy, env: Env | BaseVectorEnv, buffer: ReplayBuffer | None = None, exploration_noise: bool = False)[source]#

Collector enables the policy to interact with different types of envs for an exact number of steps or episodes.

Parameters:
  • policy – an instance of the BasePolicy class.

  • env – a gym.Env environment or an instance of the BaseVectorEnv class.

  • buffer – an instance of the ReplayBuffer class. If set to None, will instantiate a VectorReplayBuffer as the default buffer.

  • exploration_noise – whether the action needs to be modified with the corresponding policy’s exploration noise. If so, policy.exploration_noise(act, batch) will be called automatically to add the exploration noise to the action. Defaults to False.

Note

Please make sure the given environment has a time limit if using the n_episode collect option.

Note

In past versions of Tianshou, the replay buffer passed to __init__ was automatically reset. This is not done in the current implementation.

close() None[source]#

Close the collector and the environment.

collect(n_step: int | None = None, n_episode: int | None = None, random: bool = False, render: float | None = None, no_grad: bool = True, reset_before_collect: bool = False, gym_reset_kwargs: dict[str, Any] | None = None) CollectStats[source]#

Collect a specified number of steps or episodes.

To ensure an unbiased sampling result with the n_episode option, this function first collects n_episode - env_num episodes, then collects the remaining env_num episodes evenly, one from each env.

Parameters:
  • n_step – how many steps you want to collect.

  • n_episode – how many episodes you want to collect.

  • random – whether to use random policy for collecting data.

  • render – the sleep time between rendering consecutive frames.

  • no_grad – whether to run policy.forward() without tracking gradients.

  • reset_before_collect – whether to reset the environment before collecting data. (The collector needs the initial obs and info to function properly.)

  • gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Only used if reset_before_collect is True.

Note

One and only one collection number specification is permitted, either n_step or n_episode.

Returns:

The collected stats
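The two-phase scheme behind the unbiased n_episode sampling can be sketched as a simple split (a standalone illustration, not Tianshou’s actual implementation):

```python
def split_episode_phases(n_episode: int, env_num: int) -> tuple[int, int]:
    """Split the requested episodes into the two collection phases:
    a first phase of n_episode - env_num episodes taken from whichever
    env finishes first, and a final phase of env_num episodes collected
    evenly, one per env. If n_episode <= env_num, everything is collected
    in the even phase."""
    free = max(n_episode - env_num, 0)
    even = n_episode - free
    return free, even


print(split_episode_phases(10, 4))  # -> (6, 4)
print(split_episode_phases(3, 4))   # -> (0, 3)
```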

property is_closed: bool#

Return True if the collector is closed.

reset(reset_buffer: bool = True, reset_stats: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) None[source]#

Reset the environment, statistics, and data needed to start the collection.

Parameters:
  • reset_buffer – if true, reset the replay buffer attached to the collector.

  • reset_stats – if true, reset the statistics attached to the collector.

  • gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments are passed).

reset_buffer(keep_statistics: bool = False) None[source]#

Reset the data buffer.

reset_env(gym_reset_kwargs: dict[str, Any] | None = None) None[source]#

Reset the environments and the initial obs, info, and hidden state of the collector.

reset_stat() None[source]#

Reset the statistic variables.