collector#
Source code: tianshou/data/collector.py
- class AsyncCollector(policy: BasePolicy, env: BaseVectorEnv, buffer: ReplayBuffer | None = None, exploration_noise: bool = False)[source]#
Async Collector handles asynchronous vector environments.
The arguments are exactly the same as Collector; please refer to Collector for a more detailed explanation.
- collect(n_step: int | None = None, n_episode: int | None = None, random: bool = False, render: float | None = None, no_grad: bool = True, reset_before_collect: bool = False, gym_reset_kwargs: dict[str, Any] | None = None) → CollectStats [source]#
Collect a specified number of steps or episodes with async env setting.
This function does not collect an exact number of transitions specified by n_step or n_episode. Instead, to support the asynchronous setting, it may collect more transitions than requested by n_step or n_episode and save them into the buffer.
- Parameters:
n_step – how many steps you want to collect.
n_episode – how many episodes you want to collect.
random – whether to use a random policy for collecting data. Default to False.
render – the sleep time between rendering consecutive frames. Default to None (no rendering).
no_grad – whether to compute actions in policy.forward() without gradient tracking. Default to True (no gradient retained).
reset_before_collect – whether to reset the environment before collecting data. It has only an effect if n_episode is not None, i.e. if one wants to collect a fixed number of episodes. (The collector needs the initial obs and info to function properly.)
gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments).
Note
One and only one collection number specification is permitted, either n_step or n_episode.
- Returns:
A CollectStats dataclass object.
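The over-collection behavior described above can be illustrated with a minimal, stdlib-only sketch (an assumption-laden simplification, not Tianshou's actual implementation): in the async setting only the currently ready envs are stepped each iteration, and the stop condition is checked only between iterations, so the total can overshoot n_step.

```python
# Simplified model of async step collection (hypothetical helper; the
# real logic lives in AsyncCollector.collect). Steps arrive in batches
# from whichever envs are ready, so the loop can only stop after a full
# batch, possibly collecting more transitions than requested.

def async_collect_steps(n_step: int, ready_batches: list[int]) -> int:
    """Accumulate steps batch-by-batch until at least n_step is reached."""
    collected = 0
    for batch_size in ready_batches:
        if collected >= n_step:
            break
        collected += batch_size  # all transitions are saved to the buffer
    return collected

# Request 10 steps, but ready envs deliver steps in batches of 4:
total = async_collect_steps(10, [4, 4, 4, 4])
print(total)  # 12 — more than the requested 10 steps
```

All overshoot transitions are still stored in the buffer, which is why the returned statistics may report more steps than were asked for.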
- reset(reset_buffer: bool = True, reset_stats: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) None [source]#
Reset the environment, statistics, and data needed to start the collection.
- Parameters:
reset_buffer – if true, reset the replay buffer attached to the collector.
reset_stats – if true, reset the statistics attached to the collector.
gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments).
- class CollectStats(*, n_collected_episodes: int = 0, n_collected_steps: int = 0, collect_time: float = 0.0, collect_speed: float = 0.0, returns: ndarray, returns_stat: SequenceSummaryStats | None, lens: ndarray, lens_stat: SequenceSummaryStats | None)[source]#
A data structure for storing the statistics of rollouts.
- collect_speed: float = 0.0#
The speed of collecting (env_step per second).
- collect_time: float = 0.0#
The time for collecting transitions.
- lens: ndarray#
The collected episode lengths.
- lens_stat: SequenceSummaryStats | None#
Stats of the collected episode lengths.
- returns: ndarray#
The collected episode returns.
- returns_stat: SequenceSummaryStats | None#
Stats of the collected returns.
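To make the field semantics concrete, here is a stdlib-only stand-in that mirrors the shape of CollectStats (an illustrative assumption: the real class uses numpy arrays and SequenceSummaryStats; this sketch uses plain lists and a simple mean):

```python
# Illustrative mirror of CollectStats (hypothetical class; not the real
# tianshou.data implementation). Shows how the summary fields relate to
# the raw per-episode arrays.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class MiniCollectStats:
    n_collected_episodes: int = 0
    n_collected_steps: int = 0
    collect_time: float = 0.0
    collect_speed: float = 0.0  # env steps per second
    returns: list[float] = field(default_factory=list)  # per-episode returns
    lens: list[int] = field(default_factory=list)       # per-episode lengths

    @property
    def returns_mean(self) -> float:
        # Stand-in for the mean inside returns_stat (SequenceSummaryStats).
        return mean(self.returns) if self.returns else 0.0

# Example: 3 episodes, 30 steps total, collected in 0.5 s.
stats = MiniCollectStats(
    n_collected_episodes=3,
    n_collected_steps=30,
    collect_time=0.5,
    collect_speed=30 / 0.5,  # 60.0 steps/s
    returns=[1.0, 2.0, 3.0],
    lens=[10, 10, 10],
)
print(stats.returns_mean)  # 2.0
```

Note how collect_speed is derivable as n_collected_steps / collect_time, while returns_stat and lens_stat summarize the raw returns and lens arrays.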
- class CollectStatsBase(*, n_collected_episodes: int = 0, n_collected_steps: int = 0)[source]#
The most basic stats, often used for offline learning.
- n_collected_episodes: int = 0#
The number of collected episodes.
- n_collected_steps: int = 0#
The number of collected steps.
- class Collector(policy: BasePolicy, env: Env | BaseVectorEnv, buffer: ReplayBuffer | None = None, exploration_noise: bool = False)[source]#
Collector enables the policy to interact with different types of envs for an exact number of steps or episodes.
- Parameters:
policy – an instance of the BasePolicy class.
env – a gym.Env environment or an instance of the BaseVectorEnv class.
buffer – an instance of the ReplayBuffer class. If set to None, will instantiate a VectorReplayBuffer as the default buffer.
exploration_noise – determines whether the action needs to be modified with the corresponding policy’s exploration noise. If so, “policy.exploration_noise(act, batch)” will be called automatically to add the exploration noise into the action. Default to False.
Note
Please make sure the given environment has a time limit when using the n_episode collect option.
Note
In past versions of Tianshou, the replay buffer passed to __init__ was automatically reset. This is not done in the current implementation.
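As a hedged sketch of the exploration_noise hook described above (the class, the Gaussian noise model, and the single-argument signature below are assumptions invented for illustration; the real hook is BasePolicy.exploration_noise(act, batch)):

```python
# Hypothetical policy illustrating the exploration-noise hook. When the
# collector is built with exploration_noise=True, the deterministic action
# from forward() is passed through exploration_noise() before being sent
# to the env.
import random

class NoisyPolicy:
    def __init__(self, sigma: float = 0.1, seed: int = 0) -> None:
        self.sigma = sigma
        self.rng = random.Random(seed)

    def forward(self, obs: list[float]) -> list[float]:
        # Deterministic "greedy" action: here just a copy of the observation.
        return list(obs)

    def exploration_noise(self, act: list[float]) -> list[float]:
        # Perturb each action dimension with Gaussian noise (illustrative).
        return [a + self.rng.gauss(0.0, self.sigma) for a in act]

policy = NoisyPolicy(sigma=0.1)
act = policy.forward([0.5, -0.2])
noisy_act = policy.exploration_noise(act)  # perturbed copy of act
print(len(noisy_act))  # 2
```

The key design point is that the noise lives in the policy, not the collector: the collector merely invokes the hook, so each policy can define exploration appropriate to its action space.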
- collect(n_step: int | None = None, n_episode: int | None = None, random: bool = False, render: float | None = None, no_grad: bool = True, reset_before_collect: bool = False, gym_reset_kwargs: dict[str, Any] | None = None) CollectStats [source]#
Collect a specified number of steps or episodes.
To ensure an unbiased sampling result with the n_episode option, this function will first collect n_episode - env_num episodes; the last env_num episodes will then be collected evenly from each env.
- Parameters:
n_step – how many steps you want to collect.
n_episode – how many episodes you want to collect.
random – whether to use random policy for collecting data.
render – the sleep time between rendering consecutive frames.
no_grad – whether to compute actions in policy.forward() without gradient tracking.
reset_before_collect – whether to reset the environment before collecting data. (The collector needs the initial obs and info to function properly.)
gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Only used if reset_before_collect is True.
Note
One and only one collection number specification is permitted, either n_step or n_episode.
- Returns:
The collected stats
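The two-phase episode split described above can be sketched as plain arithmetic (the helper name is invented; this mirrors the docstring's description, not Tianshou's exact code):

```python
# Sketch of the unbiased n_episode allocation (hypothetical helper).
# The first n_episode - env_num episodes are taken freely from whichever
# env finishes first; the final env_num episodes are taken evenly, one
# from each env, so faster envs cannot dominate the tail statistics.

def episode_quota(n_episode: int, env_num: int) -> tuple[int, list[int]]:
    """Return (free-running episode count, per-env quota for the final phase)."""
    free_running = max(n_episode - env_num, 0)
    final_phase = [1] * min(n_episode, env_num)  # one episode per env
    return free_running, final_phase

free, final = episode_quota(n_episode=10, env_num=4)
print(free, sum(final))  # 6 4 -> 6 + 4 = 10 episodes total
```

Without the final evenly-split phase, envs with shorter episodes would complete more of them, biasing the return and length statistics toward short episodes.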
- property is_closed: bool#
Return True if the collector is closed.
- reset(reset_buffer: bool = True, reset_stats: bool = True, gym_reset_kwargs: dict[str, Any] | None = None) None [source]#
Reset the environment, statistics, and data needed to start the collection.
- Parameters:
reset_buffer – if true, reset the replay buffer attached to the collector.
reset_stats – if true, reset the statistics attached to the collector.
gym_reset_kwargs – extra keyword arguments to pass into the environment’s reset function. Defaults to None (no extra keyword arguments).