tianshou.env
class tianshou.env.BaseVectorEnv(env_fns: List[Callable[[], gym.core.Env]], worker_fn: Callable[[Callable[[], gym.core.Env]], tianshou.env.worker.base.EnvWorker], wait_num: Optional[int] = None, timeout: Optional[float] = None)

Bases: gym.core.Env

Base class for vectorized environment wrappers. Usage:

    env_num = 8
    envs = DummyVectorEnv([lambda: gym.make(task) for _ in range(env_num)])
    assert len(envs) == env_num
It accepts a list of environment generators. In other words, an environment generator efn of a specific task means that efn() returns the environment of the given task, for example, gym.make(task).

All VectorEnv classes must inherit BaseVectorEnv. Here are some other usages:

    envs.seed(2)  # which is equal to the next line
    envs.seed([2, 3, 4, 5, 6, 7, 8, 9])  # set a specific seed for each env
    obs = envs.reset()  # reset all environments
    obs = envs.reset([0, 5, 7])  # reset 3 specific environments
    obs, rew, done, info = envs.step([1] * 8)  # step synchronously
    envs.render()  # render all environments
    envs.close()  # close all environments
Warning

If you use your own environment, please make sure the seed method is set up properly, e.g.,

    def seed(self, seed):
        np.random.seed(seed)

Otherwise, the outputs of these envs may be the same as each other.
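To see why per-env seeding matters, here is a self-contained toy sketch (ToyEnv is hypothetical, and the stdlib random module stands in for np.random): giving each copy a different seed makes their observation streams diverge.

```python
import random

class ToyEnv:
    """Hypothetical environment with its own RNG (stands in for a gym.Env)."""

    def __init__(self):
        self.rng = random.Random()

    def seed(self, seed):
        # A properly set-up seed method: reseed this env's private RNG.
        self.rng.seed(seed)

    def reset(self):
        # Initial observation drawn from the private RNG.
        return self.rng.random()

envs = [ToyEnv() for _ in range(3)]
for i, env in enumerate(envs):
    env.seed(2 + i)  # mirrors envs.seed(2) expanding to [2, 3, 4]
obs = [env.reset() for env in envs]
assert len(set(obs)) == 3  # different seeds, different observations
```

Seeding the same env with the same value twice reproduces the same observation, which is the reproducibility the warning is after.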
Parameters

    - env_fns – a list of callable envs; env_fns[i]() generates the i-th env.
    - worker_fn – a callable worker; worker_fn(env_fns[i]) generates a worker which contains this env.
    - wait_num (int) – used in asynchronous simulation if the time cost of env.step varies and synchronously waiting for all environments to finish a step is time-wasting. In that case, we can return when wait_num environments finish a step and keep on simulating in these environments. If None, asynchronous simulation is disabled; else, 1 <= wait_num <= env_num.
    - timeout (float) – used in asynchronous simulation, same as above; in each vectorized step it only deals with those environments spending time within timeout seconds.
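Since env_fns is a list of zero-argument callables, a common pitfall when building it in a loop is Python's late binding of loop variables. A minimal sketch (plain dicts stand in for real envs here):

```python
# Hypothetical tasks; each entry of env_fns must be a zero-argument callable.
tasks = ["task-a", "task-b", "task-c"]

# Bind the loop variable as a default argument so each closure keeps its
# own task; a bare `lambda: {"task": t}` would capture the final t only.
env_fns = [lambda t=t: {"task": t} for t in tasks]

# env_fns[i]() generates the i-th env.
assert env_fns[1]()["task"] == "task-b"
```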
__getattr__(key: str) → Any

    Try to retrieve an attribute from each individual wrapped environment, if it does not belong to the wrapping vector environment class.
close() → None

    Close all of the environments. This function will be called only once (if not, it will be called during garbage collection). This way, close of all workers can be assured.
reset(id: Optional[Union[int, List[int]]] = None) → numpy.ndarray

    Reset the state of all the environments and return initial observations if id is None; otherwise reset the specific environments with the given id, either an int or a list.
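The id semantics can be sketched with a plain array standing in for per-env state (reset here is a hypothetical helper, not the real method):

```python
import numpy as np

state = np.array([5.0, 6.0, 7.0, 8.0])  # fake per-env state, one slot per env

def reset(id=None):
    # id=None resets every env; an int or a list resets only those envs.
    ids = np.arange(len(state)) if id is None else np.atleast_1d(id)
    state[ids] = 0.0
    return state[ids]  # "initial observations" of the reset envs

assert list(reset([0, 2])) == [0.0, 0.0]
assert list(state) == [0.0, 6.0, 0.0, 8.0]  # envs 1 and 3 untouched
```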
seed(seed: Optional[Union[int, List[int]]] = None) → List[List[int]]

    Set the seed for all environments.

    Accepts None, an int (which will extend i to [i, i + 1, i + 2, ...]), or a list.

    Returns
        The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer passes to "seed".
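The int-to-list expansion could look like the following sketch (expand_seed is a hypothetical helper, not a tianshou function):

```python
from typing import List, Optional, Union

def expand_seed(seed: Optional[Union[int, List[int]]],
                env_num: int) -> List[Optional[int]]:
    """Expand a scalar seed into one seed per environment."""
    if seed is None:
        return [None] * env_num          # let each env seed itself
    if isinstance(seed, int):
        return [seed + i for i in range(env_num)]  # i -> [i, i+1, i+2, ...]
    return list(seed)                    # explicit per-env seeds

assert expand_seed(2, 4) == [2, 3, 4, 5]
assert expand_seed([7, 8], 2) == [7, 8]
```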
step(action: Optional[numpy.ndarray], id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]

    Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for some environments with the given id, either an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset this environment's state.

    Accepts a batch of actions and returns a tuple (obs, rew, done, info).

    Parameters
        - action (numpy.ndarray) – a batch of actions provided by the agent.

    Returns
        A tuple including four items:

        - obs – a numpy.ndarray, the agent's observation of the current environments
        - rew – a numpy.ndarray, the amount of reward returned after the previous actions
        - done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
        - info – a numpy.ndarray, contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

    For the async simulation:

        Provide the given actions to the environments. The action sequence should correspond to the id argument, and the id argument should be a subset of the env_id in the last returned info (initially they are the env_ids of all the environments). If action is None, fetch unfinished step() calls instead.
class tianshou.env.DummyVectorEnv(env_fns: List[Callable[[], gym.core.Env]], wait_num: Optional[int] = None, timeout: Optional[float] = None)

Bases: tianshou.env.venvs.BaseVectorEnv

Dummy vectorized environment wrapper, implemented with a for-loop.

See also

    Please refer to BaseVectorEnv for a more detailed explanation.
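The for-loop idea can be sketched in a few lines (MiniDummyVectorEnv and CounterEnv are illustrative toys, not tianshou classes): each call simply iterates over the wrapped envs in the current process.

```python
class CounterEnv:
    """Toy environment for the sketch: state is a running counter."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += action
        return self.t, float(action), False, {}

class MiniDummyVectorEnv:
    """Minimal for-loop vectorization; not the real DummyVectorEnv."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def __len__(self):
        return len(self.envs)

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        # Step each env in turn, then transpose the per-env tuples
        # into per-field batches.
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rew, done, info = zip(*results)
        return list(obs), list(rew), list(done), list(info)

venv = MiniDummyVectorEnv([CounterEnv for _ in range(3)])
assert len(venv) == 3
obs, rew, done, info = venv.step([1, 2, 3])
assert obs == [1, 2, 3]
```

Because everything runs sequentially in one process, this variant has no parallelism overhead, which is why it suits cheap environments.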
class tianshou.env.MultiAgentEnv(**kwargs)

Bases: abc.ABC, gym.core.Env

The interface for multi-agent environments. Multi-agent environments must be wrapped as MultiAgentEnv. Here is the usage:

    env = MultiAgentEnv(...)
    # obs is a dict containing obs, agent_id, and mask
    obs = env.reset()
    action = policy(obs)
    obs, rew, done, info = env.step(action)
    env.close()

The mask entry is set to 1 for available actions and 0 otherwise. Further usage can be found at Multi-Agent Reinforcement Learning.
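A hypothetical observation dict shows how the mask encodes action availability (the shapes and values here are made up for illustration):

```python
import numpy as np

# Turn-based game with 4 actions; entries 0 and 2 are currently legal
# (mask value 1 = available, 0 = unavailable).
obs = {
    "obs": np.zeros(4),              # the actual observation for agent 1
    "agent_id": 1,                   # whose turn it is
    "mask": np.array([1, 0, 1, 0]),  # per-action availability
}

# A policy can restrict itself to the legal actions via the mask.
legal_actions = np.flatnonzero(obs["mask"])
assert list(legal_actions) == [0, 2]
```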
abstract reset() → dict

    Reset the state. Return the initial state, first agent_id, and the initial action set, for example, {'obs': obs, 'agent_id': agent_id, 'mask': mask}.
abstract step(action: numpy.ndarray) → Tuple[dict, numpy.ndarray, numpy.ndarray, numpy.ndarray]

    Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment's state.

    Accepts an action and returns a tuple (obs, rew, done, info).

    Parameters
        - action (numpy.ndarray) – action provided by an agent.

    Returns
        A tuple including four items:

        - obs – a dict containing obs, agent_id, and mask, which means that it is the agent_id player's turn to play with observation obs and mask mask.
        - rew – a numpy.ndarray, the amount of reward returned after the previous actions. Depending on the specific environment, this can be either a scalar reward for the current agent or a vector reward for all the agents.
        - done – a numpy.ndarray, whether the episode has ended, in which case further step() calls will return undefined results
        - info – a numpy.ndarray, contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class tianshou.env.RayVectorEnv(env_fns: List[Callable[[], gym.core.Env]], wait_num: Optional[int] = None, timeout: Optional[float] = None)

Bases: tianshou.env.venvs.BaseVectorEnv

Vectorized environment wrapper based on ray. This is a choice for running distributed environments in a cluster.

See also

    Please refer to BaseVectorEnv for a more detailed explanation.
class tianshou.env.ShmemVectorEnv(env_fns: List[Callable[[], gym.core.Env]], wait_num: Optional[int] = None, timeout: Optional[float] = None)

Bases: tianshou.env.venvs.BaseVectorEnv

Optimized version of SubprocVectorEnv that uses shared variables to communicate observations. ShmemVectorEnv has exactly the same API as SubprocVectorEnv.

See also

    Please refer to SubprocVectorEnv for a more detailed explanation.
class tianshou.env.SubprocVectorEnv(env_fns: List[Callable[[], gym.core.Env]], wait_num: Optional[int] = None, timeout: Optional[float] = None)

Bases: tianshou.env.venvs.BaseVectorEnv

Vectorized environment wrapper based on subprocess.

See also

    Please refer to BaseVectorEnv for a more detailed explanation.
class tianshou.env.worker.DummyEnvWorker(env_fn: Callable[[], gym.core.Env])

Bases: tianshou.env.worker.base.EnvWorker

Dummy worker used in sequential vector environments.

static wait(workers: List[DummyEnvWorker], wait_num: int, timeout: Optional[float] = None) → List[tianshou.env.worker.dummy.DummyEnvWorker]

    Given a list of workers, return the ready ones.
class tianshou.env.worker.EnvWorker(env_fn: Callable[[], gym.core.Env])

Bases: abc.ABC

An abstract worker for an environment.

step(action: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]

    send_action and get_result are coupled in sync simulation, so typically users only call the step function. But they can be called separately in async simulation, i.e., someone calls send_action first, and calls get_result later.
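The coupling can be sketched as follows (SketchWorker and its echo dynamics are illustrative, not the real EnvWorker):

```python
class SketchWorker:
    """Illustrative decoupling of send_action/get_result; not tianshou code."""

    def __init__(self):
        self._pending = None

    def send_action(self, action):
        # A real async worker would dispatch this to a subprocess or actor.
        self._pending = action

    def get_result(self):
        # Fake transition: echo the action back as the observation.
        obs, rew, done, info = self._pending, 1.0, False, {}
        self._pending = None
        return obs, rew, done, info

    def step(self, action):
        # Sync simulation couples the two calls into one.
        self.send_action(action)
        return self.get_result()

w = SketchWorker()
# Async style: send now, collect later.
w.send_action(5)
obs, rew, done, info = w.get_result()
assert obs == 5
```

Splitting the two calls is what lets a vectorized env dispatch actions to several workers and then collect whichever results are ready first.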
class
tianshou.env.worker.
RayEnvWorker
(env_fn: Callable[], gym.core.Env])[source]¶ Bases:
tianshou.env.worker.base.EnvWorker
Ray worker used in RayVectorEnv.
-
static
wait
(workers: List[RayEnvWorker], wait_num: int, timeout: Optional[float] = None) → List[tianshou.env.worker.ray.RayEnvWorker][source]¶ Given a list of workers, return those ready ones.
class tianshou.env.worker.SubprocEnvWorker(env_fn: Callable[[], gym.core.Env], share_memory=False)

Bases: tianshou.env.worker.base.EnvWorker

Subprocess worker used in SubprocVectorEnv and ShmemVectorEnv.

static wait(workers: List[SubprocEnvWorker], wait_num: int, timeout: Optional[float] = None) → List[tianshou.env.worker.subproc.SubprocEnvWorker]

    Given a list of workers, return the ready ones.