tianshou.env
class tianshou.env.BaseVectorEnv(env_fns: List[Callable[[], gym.core.Env]])

    Bases: abc.ABC, gym.core.Env

    Base class for vectorized environment wrappers. Usage:

        env_num = 8
        envs = VectorEnv([lambda: gym.make(task) for _ in range(env_num)])
        assert len(envs) == env_num

    It accepts a list of environment generators. In other words, an environment generator efn of a specific task means that efn() returns the environment of the given task, for example, gym.make(task).

    All of the VectorEnv classes must inherit BaseVectorEnv. Here are some other usages:

        envs.seed(2)  # which is equal to the next line
        envs.seed([2, 3, 4, 5, 6, 7, 8, 9])  # set a specific seed for each env
        obs = envs.reset()  # reset all environments
        obs = envs.reset([0, 5, 7])  # reset 3 specific environments
        obs, rew, done, info = envs.step([1] * 8)  # step synchronously
        envs.render()  # render all environments
        envs.close()  # close all environments
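    Since env_fns is a list of zero-argument callables, building it in a loop has a classic Python pitfall: a plain lambda captures the loop variable by reference, not by value, so every generator ends up building the last task. A minimal sketch of the fix, using a hypothetical make_env stand-in for gym.make:

```python
def make_env(task):
    # stand-in for gym.make(task); returns a dict instead of a real gym.Env
    return {"task": task}

tasks = ["CartPole-v0", "MountainCar-v0"]

# Wrong: each lambda closes over the loop variable `t`, so after the loop
# all of them build the *last* task.
wrong_fns = [lambda: make_env(t) for t in tasks]

# Right: bind the current value of `t` as a default argument.
env_fns = [lambda t=t: make_env(t) for t in tasks]

assert [fn()["task"] for fn in wrong_fns] == ["MountainCar-v0"] * 2
assert [fn()["task"] for fn in env_fns] == tasks
```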
    abstract __getattr__(key: str)

        Try to retrieve an attribute from each individual wrapped environment if it does not belong to the wrapping vector environment class.
    abstract close() → None

        Close all of the environments. Environments will automatically close() themselves when garbage collected or when the program exits.
    abstract reset(id: Optional[Union[int, List[int]]] = None)

        Reset the state of all the environments and return initial observations if id is None; otherwise reset the specific environments with the given id, either an int or a list.
    abstract seed(seed: Optional[Union[int, List[int]]] = None) → List[int]

        Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, i + 2, ...]), or a list of ints.

        Returns
            The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer should pass to "seed".
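    The int-to-list extension can be sketched as a small helper (a hypothetical expand_seed mirroring the documented behavior; the real logic lives inside BaseVectorEnv.seed):

```python
from typing import List, Optional, Union

def expand_seed(seed: Optional[Union[int, List[int]]],
                env_num: int) -> List[Optional[int]]:
    # None   -> no explicit seed for any env
    # int i  -> [i, i + 1, ..., i + env_num - 1]
    # list   -> used as-is, one seed per env
    if seed is None:
        return [None] * env_num
    if isinstance(seed, int):
        return [seed + j for j in range(env_num)]
    return list(seed)

# for 8 envs, seed(2) is equivalent to seed([2, 3, 4, 5, 6, 7, 8, 9])
assert expand_seed(2, 8) == [2, 3, 4, 5, 6, 7, 8, 9]
assert expand_seed(None, 3) == [None, None, None]
```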
    abstract step(action: numpy.ndarray, id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]

        Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for some environments with the given id, either an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment's state.

        Accepts a batch of actions and returns a tuple (obs, rew, done, info).

        Parameters
            action (numpy.ndarray) – a batch of actions provided by the agent.

        Returns
            A tuple including four items:

            obs – a numpy.ndarray, the agent's observation of the current environments
            rew – a numpy.ndarray, the amount of reward returned after the previous actions
            done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
            info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class tianshou.env.MultiAgentEnv(**kwargs)

    Bases: abc.ABC, gym.core.Env

    The interface for multi-agent environments. A multi-agent environment must be wrapped as MultiAgentEnv. Here is the usage:

        env = MultiAgentEnv(...)
        # obs is a dict containing obs, agent_id, and mask
        obs = env.reset()
        action = policy(obs)
        obs, rew, done, info = env.step(action)
        env.close()

    The mask is set to 1 for available actions and to 0 otherwise. Further usage can be found at Multi-Agent Reinforcement Learning.
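    A policy is expected to choose only among actions whose mask entry is 1. A minimal sketch of that selection (a hypothetical masked_argmax helper, using plain Python lists in place of numpy arrays):

```python
def masked_argmax(q_values, mask):
    # return the index of the highest-valued action whose mask entry is 1;
    # illegal actions (mask == 0) are never selected
    best_a = None
    for a, (q, m) in enumerate(zip(q_values, mask)):
        if m and (best_a is None or q > q_values[best_a]):
            best_a = a
    return best_a

# action 0 has the highest value but is masked out, so action 2 wins
assert masked_argmax([0.9, 0.1, 0.7], [0, 1, 1]) == 2
```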
    abstract reset() → dict

        Reset the state. Return the initial state, the first agent_id, and the initial action set, for example, {'obs': obs, 'agent_id': agent_id, 'mask': mask}.
    abstract step(action: numpy.ndarray) → Tuple[dict, numpy.ndarray, numpy.ndarray, numpy.ndarray]

        Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment's state.

        Accepts an action and returns a tuple (obs, rew, done, info).

        Parameters
            action (numpy.ndarray) – action provided by an agent.

        Returns
            A tuple including four items:

            obs – a dict containing obs, agent_id, and mask, which means that it is the agent_id player's turn to play with observation obs and action mask mask.
            rew – a numpy.ndarray, the amount of reward returned after the previous actions. Depending on the specific environment, this can be either a scalar reward for the current agent or a vector reward for all the agents.
            done – a numpy.ndarray, whether the episode has ended, in which case further step() calls will return undefined results
            info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
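    The dict-based protocol above can be illustrated with a toy two-player turn-taking environment (a hypothetical sketch; a real environment would subclass MultiAgentEnv and return numpy arrays):

```python
class ToyTurnEnv:
    """Two players alternate turns for a fixed number of steps."""

    def __init__(self, max_steps=4):
        self.max_steps = max_steps

    def reset(self):
        self.t = 0
        # observation dict: current state, whose turn it is, legal actions
        return {"obs": 0, "agent_id": 1, "mask": [1, 1]}

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps
        agent_id = self.t % 2 + 1  # players 1 and 2 alternate
        obs = {"obs": self.t, "agent_id": agent_id, "mask": [1, 1]}
        return obs, 0.0, done, {}

env = ToyTurnEnv()
obs = env.reset()
assert obs["agent_id"] == 1          # player 1 moves first
obs, rew, done, info = env.step(0)
assert obs["agent_id"] == 2 and not done  # now it is player 2's turn
```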
class tianshou.env.RayVectorEnv(env_fns: List[Callable[[], gym.core.Env]])

    Bases: tianshou.env.basevecenv.BaseVectorEnv

    Vectorized environment wrapper based on ray. However, according to our test, it is about two times slower than SubprocVectorEnv.

    See also

    Please refer to BaseVectorEnv for a more detailed explanation.
    __getattr__(key)

        Try to retrieve an attribute from each individual wrapped environment if it does not belong to the wrapping vector environment class.
    close() → List[Any]

        Close all of the environments. Environments will automatically close() themselves when garbage collected or when the program exits.
    reset(id: Optional[Union[int, List[int]]] = None) → numpy.ndarray

        Reset the state of all the environments and return initial observations if id is None; otherwise reset the specific environments with the given id, either an int or a list.
    seed(seed: Optional[Union[int, List[int]]] = None) → List[int]

        Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, i + 2, ...]), or a list of ints.

        Returns
            The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer should pass to "seed".
    step(action: numpy.ndarray, id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]

        Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for some environments with the given id, either an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment's state.

        Accepts a batch of actions and returns a tuple (obs, rew, done, info).

        Parameters
            action (numpy.ndarray) – a batch of actions provided by the agent.

        Returns
            A tuple including four items:

            obs – a numpy.ndarray, the agent's observation of the current environments
            rew – a numpy.ndarray, the amount of reward returned after the previous actions
            done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
            info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class tianshou.env.SubprocVectorEnv(env_fns: List[Callable[[], gym.core.Env]])

    Bases: tianshou.env.basevecenv.BaseVectorEnv

    Vectorized environment wrapper based on subprocess.

    See also

    Please refer to BaseVectorEnv for a more detailed explanation.
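    The subprocess pattern can be sketched with the standard library: each environment lives in a worker process and talks to the wrapper over a pipe. This is a hypothetical miniature of the idea, with a counter standing in for a real environment, not SubprocVectorEnv's actual implementation:

```python
from multiprocessing import Pipe, Process

def worker(conn):
    # toy "environment": state is a counter; 'step' adds the action to it
    state = 0
    while True:
        cmd, data = conn.recv()
        if cmd == "reset":
            state = 0
            conn.send(state)        # initial observation
        elif cmd == "step":
            state += data
            conn.send(state)        # observation after the step
        elif cmd == "close":
            conn.close()
            break

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send(("reset", None))
    assert parent.recv() == 0
    parent.send(("step", 3))
    assert parent.recv() == 3
    parent.send(("close", None))
    p.join()
```

    A real wrapper would start one such worker per env_fn and broadcast "step"/"reset" commands to all pipes before collecting the replies, so the environments run in parallel.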
    __getattr__(key)

        Try to retrieve an attribute from each individual wrapped environment if it does not belong to the wrapping vector environment class.
    close() → List[Any]

        Close all of the environments. Environments will automatically close() themselves when garbage collected or when the program exits.
    reset(id: Optional[Union[int, List[int]]] = None) → numpy.ndarray

        Reset the state of all the environments and return initial observations if id is None; otherwise reset the specific environments with the given id, either an int or a list.
    seed(seed: Optional[Union[int, List[int]]] = None) → List[int]

        Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, i + 2, ...]), or a list of ints.

        Returns
            The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer should pass to "seed".
    step(action: numpy.ndarray, id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]

        Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for some environments with the given id, either an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment's state.

        Accepts a batch of actions and returns a tuple (obs, rew, done, info).

        Parameters
            action (numpy.ndarray) – a batch of actions provided by the agent.

        Returns
            A tuple including four items:

            obs – a numpy.ndarray, the agent's observation of the current environments
            rew – a numpy.ndarray, the amount of reward returned after the previous actions
            done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
            info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class tianshou.env.VectorEnv(env_fns: List[Callable[[], gym.core.Env]])

    Bases: tianshou.env.basevecenv.BaseVectorEnv

    Dummy vectorized environment wrapper, implemented with a for-loop.

    See also

    Please refer to BaseVectorEnv for a more detailed explanation.
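    The for-loop idea can be sketched as follows. This is a hypothetical miniature with a toy environment standing in for gym.Env; the real class also supports the id argument, per-env seeding, and rendering:

```python
class ToyEnv:
    # counts steps; the episode ends after 3 steps
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}

class ForLoopVectorEnv:
    """Dummy vectorization: call each wrapped env sequentially in a for-loop."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def __len__(self):
        return len(self.envs)

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rew, done, info = zip(*results)  # transpose per-env tuples into batches
        return list(obs), list(rew), list(done), list(info)

envs = ForLoopVectorEnv([ToyEnv for _ in range(4)])
assert len(envs) == 4
obs = envs.reset()
obs, rew, done, info = envs.step([0] * 4)
assert obs == [1, 1, 1, 1] and done == [False] * 4
```

    There is no parallelism here, which is exactly why this wrapper is called "dummy": it trades speed for simplicity and is handy for debugging.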
    __getattr__(key)

        Try to retrieve an attribute from each individual wrapped environment if it does not belong to the wrapping vector environment class.
    close() → List[Any]

        Close all of the environments. Environments will automatically close() themselves when garbage collected or when the program exits.
    reset(id: Optional[Union[int, List[int]]] = None) → numpy.ndarray

        Reset the state of all the environments and return initial observations if id is None; otherwise reset the specific environments with the given id, either an int or a list.
    seed(seed: Optional[Union[int, List[int]]] = None) → List[int]

        Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, i + 2, ...]), or a list of ints.

        Returns
            The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer should pass to "seed".
    step(action: numpy.ndarray, id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]

        Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for some environments with the given id, either an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment's state.

        Accepts a batch of actions and returns a tuple (obs, rew, done, info).

        Parameters
            action (numpy.ndarray) – a batch of actions provided by the agent.

        Returns
            A tuple including four items:

            obs – a numpy.ndarray, the agent's observation of the current environments
            rew – a numpy.ndarray, the amount of reward returned after the previous actions
            done – a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results
            info – a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)