tianshou.env

class tianshou.env.BaseVectorEnv(env_fns: List[Callable[[], gym.core.Env]])[source]

Bases: abc.ABC, gym.core.Env

Base class for vectorized environment wrappers. Usage:

env_num = 8
envs = VectorEnv([lambda: gym.make(task) for _ in range(env_num)])
assert len(envs) == env_num

It accepts a list of environment generators. In other words, an environment generator efn of a specific task means that efn() returns an environment of that task, for example, gym.make(task).

All vectorized environment classes must inherit from BaseVectorEnv. Here are some other usages:

envs.seed(2)  # which is equivalent to the next line
envs.seed([2, 3, 4, 5, 6, 7, 8, 9])  # set specific seed for each env
obs = envs.reset()  # reset all environments
obs = envs.reset([0, 5, 7])  # reset 3 specific environments
obs, rew, done, info = envs.step([1] * 8)  # step synchronously
envs.render()  # render all environments
envs.close()  # close all environments
abstract __getattr__(key: str)[source]

Try to retrieve an attribute from each individual wrapped environment, if it does not belong to the wrapping vector environment class.
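This forwarding pattern can be sketched in plain Python. The classes below (DummyEnv, MiniVectorEnv) are hypothetical stand-ins, not tianshou's actual implementation:

```python
# A minimal sketch of the attribute-forwarding pattern: lookups that
# miss on the wrapper are delegated to every wrapped environment.
class DummyEnv:
    action_space = "Discrete(2)"  # stand-in for a real gym space

class MiniVectorEnv:
    def __init__(self, envs):
        self.envs = envs

    def __getattr__(self, key):
        # Only called when `key` is not found on the wrapper itself,
        # so regular attributes such as `envs` resolve normally.
        return [getattr(env, key) for env in self.envs]

envs = MiniVectorEnv([DummyEnv(), DummyEnv()])
print(envs.action_space)  # one entry per wrapped environment
```

Because `__getattr__` only fires on failed lookups, the wrapper's own attributes and methods are unaffected.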

__len__() → int[source]

Return len(self), which is the number of environments.

abstract close() → None[source]

Close all of the environments.

Environments will automatically close() themselves when garbage collected or when the program exits.

abstract render(**kwargs) → None[source]

Render all of the environments.

abstract reset(id: Optional[Union[int, List[int]]] = None)[source]

Reset the state of all the environments and return initial observations if id is None; otherwise reset only the environments with the given id, which can be an int or a list.

abstract seed(seed: Optional[Union[int, List[int]]] = None) → List[int][source]

Set the seed for all environments.

Accepts None, an int i (which will be extended to [i, i + 1, i + 2, ...], one seed per environment), or a list.

Returns

The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer passes to seed().
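The int-extension rule can be sketched in plain Python. The helper name below is hypothetical, not part of tianshou's API:

```python
# Hypothetical helper illustrating the documented seed rule: a single
# int i is extended to [i, i + 1, i + 2, ...], one seed per environment.
def broadcast_seed(seed, env_num):
    if seed is None:
        return [None] * env_num
    if isinstance(seed, int):
        return [seed + i for i in range(env_num)]
    return list(seed)  # assume a per-environment list was given

print(broadcast_seed(2, 8))  # [2, 3, 4, 5, 6, 7, 8, 9]
```

This is why, for eight environments, envs.seed(2) and envs.seed([2, 3, 4, 5, 6, 7, 8, 9]) are equivalent.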

abstract step(action: numpy.ndarray, id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for the environments with the given id, which can be an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment's state.

Accepts a batch of actions and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agents' observations of the current environments

  • rew a numpy.ndarray, the rewards returned after the previous actions

  • done a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
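The id-selection semantics can be sketched independently of tianshou. The classes and functions below (CounterEnv, vector_step) are hypothetical illustrations, not the library's implementation:

```python
import numpy as np

# Conceptual sketch of stepping a subset of environments: only the
# selected environments advance, and results are stacked per field.
class CounterEnv:
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += action
        return self.t, float(action), self.t >= 3, {}

def vector_step(envs, actions, id=None):
    if id is None:
        id = list(range(len(envs)))
    elif isinstance(id, int):
        id = [id]
    results = [envs[i].step(a) for i, a in zip(id, actions)]
    obs, rew, done, info = zip(*results)
    return np.stack(obs), np.stack(rew), np.stack(done), np.array(info)

envs = [CounterEnv() for _ in range(3)]
obs, rew, done, info = vector_step(envs, [1, 1], id=[0, 2])
print(obs, envs[1].t)  # envs[1] was not stepped
```

Note how the untouched environment keeps its state, which is why the caller must track which environments are done and reset them with reset(id).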

class tianshou.env.MultiAgentEnv(**kwargs)[source]

Bases: abc.ABC, gym.core.Env

The interface for multi-agent environments. Multi-agent environments must be wrapped as MultiAgentEnv. Here is the usage:

env = MultiAgentEnv(...)
# obs is a dict containing obs, agent_id, and mask
obs = env.reset()
action = policy(obs)
obs, rew, done, info = env.step(action)
env.close()

In the mask, each available action is set to 1 and each unavailable action to 0. Further usage can be found at Multi-Agent Reinforcement Learning.
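A concrete (hypothetical) instance of the observation dict described above might look like this:

```python
import numpy as np

# Hypothetical observation dict of the shape described above: it is
# agent 2's turn, and only actions 0 and 3 are currently legal.
obs = {
    "obs": np.zeros((3, 3)),         # raw observation for the player to move
    "agent_id": 2,                   # whose turn it is
    "mask": np.array([1, 0, 0, 1]),  # 1 = available action, 0 = unavailable
}
legal = np.flatnonzero(obs["mask"])  # indices of available actions
print(legal)  # [0 3]
```

A policy can use the mask to restrict its action selection to the legal actions only.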

abstract reset() → dict[source]

Reset the state. Return the initial state, the first agent_id, and the initial action mask, for example, {'obs': obs, 'agent_id': agent_id, 'mask': mask}.

abstract step(action: numpy.ndarray) → Tuple[dict, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of the environment's dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset the environment's state.

Accepts an action and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – an action provided by an agent.

Returns

A tuple including four items:

  • obs a dict containing obs, agent_id, and mask, meaning that it is the agent_id player's turn to play with observation obs and action mask mask.

  • rew a numpy.ndarray, the reward returned after the previous action. Depending on the specific environment, this can be either a scalar reward for the current agent or a vector reward for all the agents.

  • done a numpy.ndarray, whether the episode has ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)
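The contract above can be sketched with a minimal two-player toy. The TurnEnv class below is hypothetical (and deliberately not a full gym.Env): players 1 and 2 alternate, the reward is a length-2 vector with one entry per agent, and the episode ends after four moves.

```python
import numpy as np

# Minimal two-player sketch of the multi-agent step contract.
class TurnEnv:
    def _obs(self):
        return {"obs": np.zeros(1), "agent_id": self.turn,
                "mask": np.ones(2, dtype=int)}

    def reset(self):
        self.turn, self.moves = 1, 0
        return self._obs()

    def step(self, action):
        self.moves += 1
        rew = np.zeros(2)
        rew[self.turn - 1] = float(action)  # reward for the acting player
        self.turn = 3 - self.turn           # pass the turn: 1 <-> 2
        done = np.array(self.moves >= 4)
        return self._obs(), rew, done, {}

env = TurnEnv()
obs = env.reset()
obs, rew, done, info = env.step(1)
print(obs["agent_id"], rew, bool(done))
```

After each step, the returned obs announces whose turn it is next, so a single policy loop can serve both players.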

class tianshou.env.RayVectorEnv(env_fns: List[Callable[[], gym.core.Env]])[source]

Bases: tianshou.env.basevecenv.BaseVectorEnv

Vectorized environment wrapper based on ray. However, according to our tests, it is about two times slower than SubprocVectorEnv.

See also

Please refer to BaseVectorEnv for more detailed explanation.

__getattr__(key)[source]

Try to retrieve an attribute from each individual wrapped environment, if it does not belong to the wrapping vector environment class.

close() → List[Any][source]

Close all of the environments.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(**kwargs) → List[Any][source]

Render all of the environments.

reset(id: Optional[Union[int, List[int]]] = None) → numpy.ndarray[source]

Reset the state of all the environments and return initial observations if id is None; otherwise reset only the environments with the given id, which can be an int or a list.

seed(seed: Optional[Union[int, List[int]]] = None) → List[int][source]

Set the seed for all environments.

Accepts None, an int i (which will be extended to [i, i + 1, i + 2, ...], one seed per environment), or a list.

Returns

The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer passes to seed().

step(action: numpy.ndarray, id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for the environments with the given id, which can be an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment's state.

Accepts a batch of actions and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agents' observations of the current environments

  • rew a numpy.ndarray, the rewards returned after the previous actions

  • done a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class tianshou.env.SubprocVectorEnv(env_fns: List[Callable[[], gym.core.Env]])[source]

Bases: tianshou.env.basevecenv.BaseVectorEnv

Vectorized environment wrapper based on subprocess.
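The idea can be sketched independently of tianshou. The worker protocol, CounterEnv, and the command names below are hypothetical illustrations, not the library's actual code: each environment lives in its own process and talks to the wrapper over a Pipe, and stepping sends every command before collecting any reply, so the environments run in parallel.

```python
import multiprocessing as mp

# Toy stateful environment; lives inside a worker process.
class CounterEnv:
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += action
        return self.t

def worker(conn):
    # Serve commands from the parent process until told to close.
    env = CounterEnv()
    while True:
        cmd, data = conn.recv()
        if cmd == "step":
            conn.send(env.step(data))
        elif cmd == "close":
            conn.close()
            break

if __name__ == "__main__":
    parents, procs = [], []
    for _ in range(2):
        parent, child = mp.Pipe()
        p = mp.Process(target=worker, args=(child,))
        p.start()
        parents.append(parent)
        procs.append(p)
    # Step both workers in parallel: send all commands, then receive.
    for conn in parents:
        conn.send(("step", 1))
    print([conn.recv() for conn in parents])  # [1, 1]
    for conn, p in zip(parents, procs):
        conn.send(("close", None))
        p.join()
```

The send-all-then-receive-all pattern is what makes the subprocess wrapper faster than a plain for-loop for expensive environments: each worker computes its step while the others do the same.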

See also

Please refer to BaseVectorEnv for more detailed explanation.

__getattr__(key)[source]

Try to retrieve an attribute from each individual wrapped environment, if it does not belong to the wrapping vector environment class.

close() → List[Any][source]

Close all of the environments.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(**kwargs) → List[Any][source]

Render all of the environments.

reset(id: Optional[Union[int, List[int]]] = None) → numpy.ndarray[source]

Reset the state of all the environments and return initial observations if id is None; otherwise reset only the environments with the given id, which can be an int or a list.

seed(seed: Optional[Union[int, List[int]]] = None) → List[int][source]

Set the seed for all environments.

Accepts None, an int i (which will be extended to [i, i + 1, i + 2, ...], one seed per environment), or a list.

Returns

The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer passes to seed().

step(action: numpy.ndarray, id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for the environments with the given id, which can be an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment's state.

Accepts a batch of actions and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agents' observations of the current environments

  • rew a numpy.ndarray, the rewards returned after the previous actions

  • done a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class tianshou.env.VectorEnv(env_fns: List[Callable[[], gym.core.Env]])[source]

Bases: tianshou.env.basevecenv.BaseVectorEnv

Dummy vectorized environment wrapper, implemented with a for-loop.

See also

Please refer to BaseVectorEnv for more detailed explanation.

__getattr__(key)[source]

Try to retrieve an attribute from each individual wrapped environment, if it does not belong to the wrapping vector environment class.

close() → List[Any][source]

Close all of the environments.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(**kwargs) → List[Any][source]

Render all of the environments.

reset(id: Optional[Union[int, List[int]]] = None) → numpy.ndarray[source]

Reset the state of all the environments and return initial observations if id is None; otherwise reset only the environments with the given id, which can be an int or a list.

seed(seed: Optional[Union[int, List[int]]] = None) → List[int][source]

Set the seed for all environments.

Accepts None, an int i (which will be extended to [i, i + 1, i + 2, ...], one seed per environment), or a list.

Returns

The list of seeds used in this env's random number generators. The first value in the list should be the "main" seed, or the value which a reproducer passes to seed().

step(action: numpy.ndarray, id: Optional[Union[int, List[int]]] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of all the environments' dynamics if id is None; otherwise run one timestep for the environments with the given id, which can be an int or a list. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment's state.

Accepts a batch of actions and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agents' observations of the current environments

  • rew a numpy.ndarray, the rewards returned after the previous actions

  • done a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes learning)