tianshou.env
class tianshou.env.BaseVectorEnv(env_fns)
Bases: abc.ABC, gym.core.Wrapper
Base class for vectorized environment wrappers. Usage:
    env_num = 8
    envs = VectorEnv([lambda: gym.make(task) for _ in range(env_num)])
    assert len(envs) == env_num
It accepts a list of environment generators. An environment generator efn of a specific task is a callable such that efn() returns the environment of the given task, for example, gym.make(task).
Every vectorized environment class must inherit from BaseVectorEnv. Here are some other usages:
    envs.seed(2)  # which is equal to the next line
    envs.seed([2, 3, 4, 5, 6, 7, 8, 9])  # set specific seed for each env
    obs = envs.reset()  # reset all environments
    obs = envs.reset([0, 5, 7])  # reset 3 specific environments
    obs, rew, done, info = envs.step([1] * 8)  # step synchronously
    envs.render()  # render all environments
    envs.close()  # close all environments
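To illustrate what an environment generator is, here is a minimal sketch that does not depend on gym. ToyEnv and make_env are hypothetical stand-ins for a real environment and for gym.make(task); they are not part of tianshou. The point is only that each efn() call builds a fresh, independent environment.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym environment (not part of tianshou)."""

    def __init__(self, task):
        self.task = task
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t


def make_env(task):
    """Hypothetical stand-in for gym.make(task)."""
    return ToyEnv(task)


env_num = 4
# A list of environment generators: zero-argument callables, not environments.
env_fns = [lambda: make_env("toy-task") for _ in range(env_num)]

# A vectorized wrapper would call each generator once to build its own copy.
envs = [efn() for efn in env_fns]
```

Passing callables rather than ready-made environments lets the wrapper decide where each environment is constructed, for example inside a worker process.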
abstract reset(id=None)
Reset the state of all the environments and return initial observations if id is None; otherwise reset only the environments with the given id, which is either an int or a list.
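The three accepted forms of id can be reduced to one list of indices. The helper below is a hypothetical sketch of that convention as described above, not a function provided by tianshou.

```python
def normalize_ids(id, env_num):
    """Return the list of environment indices selected by ``id``.

    Assumed convention, following the description above:
    None -> all environments, int -> that one environment, list -> as given.
    """
    if id is None:
        return list(range(env_num))
    if isinstance(id, int):
        return [id]
    return list(id)
```

With 8 environments, normalize_ids(None, 8) selects every index, while normalize_ids([0, 5, 7], 8) selects the same three environments as envs.reset([0, 5, 7]).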
abstract seed(seed=None)
Set the seed for all environments. Accepts None, an int (which will extend i to [i, i + 1, i + 2, ...]) or a list.
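The int-to-list extension can be sketched as follows. expand_seed is a hypothetical helper written for illustration; in particular, passing None through to every environment is an assumption, since the text above only says that None is accepted.

```python
def expand_seed(seed, env_num):
    """Return one seed per environment.

    An int i is extended to [i, i + 1, i + 2, ...]; a list is used as given.
    Assumption: None is forwarded unchanged to every environment.
    """
    if seed is None:
        return [None] * env_num
    if isinstance(seed, int):
        return [seed + i for i in range(env_num)]
    return list(seed)
```

For 8 environments, expand_seed(2, 8) gives [2, 3, 4, 5, 6, 7, 8, 9], matching the envs.seed(2) usage shown for BaseVectorEnv.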
abstract step(action)
Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.
Accepts a batch of actions and returns a tuple (obs, rew, done, info).
- Parameters
    action (numpy.ndarray) – a batch of actions provided by the agent.
- Returns
    A tuple including four items:
    obs – a numpy.ndarray, the agent’s observation of the current environments
    rew – a numpy.ndarray, the amount of reward returned after the previous actions
    done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
    info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
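Because step() does not reset finished environments, the calling loop has to do it. The sketch below shows one way to write such a loop. CountdownEnv and ToyVectorEnv are hypothetical stand-ins defined here for illustration, and they return plain lists rather than numpy arrays to keep the example dependency-free.

```python
class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


class ToyVectorEnv:
    """Hypothetical for-loop wrapper exposing reset(id) and step(action)."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self, id=None):
        if id is None:
            id = range(len(self.envs))
        elif isinstance(id, int):
            id = [id]
        return [self.envs[i].reset() for i in id]

    def step(self, action):
        results = [env.step(a) for env, a in zip(self.envs, action)]
        obs, rew, done, info = map(list, zip(*results))
        return obs, rew, done, info


envs = ToyVectorEnv([lambda n=n: CountdownEnv(n) for n in (2, 3)])
obs = envs.reset()
episodes = [0, 0]
for _ in range(6):
    obs, rew, done, info = envs.step([0, 0])
    finished = [i for i, d in enumerate(done) if d]
    if finished:
        # The caller, not step(), resets the environments whose episode ended.
        for i, o in zip(finished, envs.reset(finished)):
            obs[i] = o
            episodes[i] += 1
```

Resetting only the finished indices keeps the other environments in the middle of their episodes.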
-
abstract
class tianshou.env.VectorEnv(env_fns)
Bases: tianshou.env.vecenv.BaseVectorEnv
Dummy vectorized environment wrapper, implemented as a for-loop.
See also: please refer to BaseVectorEnv for a more detailed explanation.
reset(id=None)
Reset the state of all the environments and return initial observations if id is None; otherwise reset only the environments with the given id, which is either an int or a list.
seed(seed=None)
Set the seed for all environments. Accepts None, an int (which will extend i to [i, i + 1, i + 2, ...]) or a list.
step(action)
Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.
Accepts a batch of actions and returns a tuple (obs, rew, done, info).
- Parameters
    action (numpy.ndarray) – a batch of actions provided by the agent.
- Returns
    A tuple including four items:
    obs – a numpy.ndarray, the agent’s observation of the current environments
    rew – a numpy.ndarray, the amount of reward returned after the previous actions
    done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
    info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class tianshou.env.SubprocVectorEnv(env_fns)
Bases: tianshou.env.vecenv.BaseVectorEnv
Vectorized environment wrapper based on subprocesses.
See also: please refer to BaseVectorEnv for a more detailed explanation.
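As a rough illustration of the subprocess approach, the sketch below runs each environment in its own process and exchanges commands over a pipe. It shows one common pattern and is not taken from tianshou's source: CounterEnv, worker, and run_demo are hypothetical names, and the "fork" start method (POSIX only) is assumed so that the lambda generators need not be picklable.

```python
import multiprocessing as mp


class CounterEnv:
    """Hypothetical toy environment used only for this sketch."""

    def __init__(self, start):
        self.start = start
        self.t = start

    def reset(self):
        self.t = self.start
        return self.t

    def step(self, action):
        self.t += action
        return self.t, float(action), False, {}


def worker(conn, env_fn):
    """Own one environment and serve commands received over the pipe."""
    env = env_fn()
    while True:
        cmd, data = conn.recv()
        if cmd == "step":
            conn.send(env.step(data))
        elif cmd == "reset":
            conn.send(env.reset())
        elif cmd == "close":
            conn.close()
            break


def run_demo():
    # Assumption: "fork" start method, so env_fns are inherited, not pickled.
    ctx = mp.get_context("fork")
    env_fns = [lambda s=s: CounterEnv(s) for s in (0, 10)]
    pipes, procs = [], []
    for fn in env_fns:
        parent, child = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(child, fn), daemon=True)
        proc.start()
        child.close()  # the parent keeps only its own end of the pipe
        pipes.append(parent)
        procs.append(proc)

    for p in pipes:
        p.send(("reset", None))
    obs = [p.recv() for p in pipes]

    # Send every action first, then collect, so the workers step concurrently.
    for p, a in zip(pipes, [1, 2]):
        p.send(("step", a))
    results = [p.recv() for p in pipes]

    for p in pipes:
        p.send(("close", None))
    for proc in procs:
        proc.join()
    return obs, results
```

Sending all actions before reading any result is what lets the environments advance in parallel; a for-loop wrapper would instead wait for each step() to return before starting the next.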
reset(id=None)
Reset the state of all the environments and return initial observations if id is None; otherwise reset only the environments with the given id, which is either an int or a list.
seed(seed=None)
Set the seed for all environments. Accepts None, an int (which will extend i to [i, i + 1, i + 2, ...]) or a list.
step(action)
Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.
Accepts a batch of actions and returns a tuple (obs, rew, done, info).
- Parameters
    action (numpy.ndarray) – a batch of actions provided by the agent.
- Returns
    A tuple including four items:
    obs – a numpy.ndarray, the agent’s observation of the current environments
    rew – a numpy.ndarray, the amount of reward returned after the previous actions
    done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
    info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class tianshou.env.RayVectorEnv(env_fns)
Bases: tianshou.env.vecenv.BaseVectorEnv
Vectorized environment wrapper based on ray. However, according to our test, it is about two times slower than SubprocVectorEnv.
See also: please refer to BaseVectorEnv for a more detailed explanation.
reset(id=None)
Reset the state of all the environments and return initial observations if id is None; otherwise reset only the environments with the given id, which is either an int or a list.
seed(seed=None)
Set the seed for all environments. Accepts None, an int (which will extend i to [i, i + 1, i + 2, ...]) or a list.
step(action)
Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.
Accepts a batch of actions and returns a tuple (obs, rew, done, info).
- Parameters
    action (numpy.ndarray) – a batch of actions provided by the agent.
- Returns
    A tuple including four items:
    obs – a numpy.ndarray, the agent’s observation of the current environments
    rew – a numpy.ndarray, the amount of reward returned after the previous actions
    done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
    info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)