tianshou.env

class tianshou.env.BaseVectorEnv(env_fns: List[Callable[[], gym.core.Env]])[source]

Bases: abc.ABC, gym.core.Env

Base class for vectorized environment wrappers. Usage:

env_num = 8
envs = VectorEnv([lambda: gym.make(task) for _ in range(env_num)])
assert len(envs) == env_num

It accepts a list of environment generators. In other words, an environment generator efn for a specific task is a callable such that efn() returns an environment for that task, for example gym.make(task).

All vectorized environment classes must inherit from BaseVectorEnv. Here are some other usages:

envs.seed(2)  # equivalent to the next line
envs.seed([2, 3, 4, 5, 6, 7, 8, 9])  # set specific seed for each env
obs = envs.reset()  # reset all environments
obs = envs.reset([0, 5, 7])  # reset 3 specific environments
obs, rew, done, info = envs.step([1] * 8)  # step synchronously
envs.render()  # render all environments
envs.close()  # close all environments
__len__() → int[source]

Return len(self), which is the number of environments.

abstract close() → None[source]

Close all of the environments.

Environments will automatically close() themselves when garbage collected or when the program exits.

abstract render(**kwargs) → None[source]

Render all of the environments.

abstract reset(id: Union[int, List[int], None] = None)[source]

Reset the state of all the environments and return the initial observations if id is None; otherwise, reset the specific environments with the given id, which can be an int or a list.

abstract seed(seed: Union[int, List[int], None] = None) → List[int][source]

Set the seed for all environments.

Accept None, an int i (which will be extended to [i, i + 1, i + 2, ...], one seed per environment), or a list of ints.

Returns

The list of seeds used in this env’s random number generators.

The first value in the list should be the “main” seed, or the value which a reproducer should pass to seed().
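The int-to-list extension described above can be sketched as follows. This is an illustrative helper, not tianshou's internal code; expand_seed and env_num are hypothetical names:

```python
from typing import List, Optional, Union


def expand_seed(seed: Union[int, List[int], None],
                env_num: int) -> List[Optional[int]]:
    """Expand a scalar seed into one seed per environment (a sketch)."""
    if seed is None:
        return [None] * env_num  # let each environment seed itself
    if isinstance(seed, int):
        # an int i is extended to [i, i + 1, ..., i + env_num - 1]
        return [seed + i for i in range(env_num)]
    assert len(seed) == env_num, "need one seed per environment"
    return list(seed)
```

With env_num = 8, expand_seed(2, 8) yields [2, 3, 4, 5, 6, 7, 8, 9], which is why the two envs.seed calls in the usage example above are equivalent.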

abstract step(action: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.

Accept a batch of actions and return a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agent’s observations of the current environments

  • rew a numpy.ndarray, the amount of rewards returned after previous actions

  • done a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
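The step/reset contract above can be sketched with a pure-Python for-loop wrapper. ToyEnv and DummyVectorEnv are illustrative stand-ins, not tianshou's internals; tianshou's real classes stack the per-env results into numpy arrays, while this sketch uses plain lists to stay self-contained:

```python
class ToyEnv:
    """Minimal stand-in for a gym.Env: the episode ends after 3 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


class DummyVectorEnv:
    """For-loop sketch of the BaseVectorEnv contract (lists instead of
    numpy arrays, to keep the example dependency-free)."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def __len__(self):
        return len(self.envs)

    def reset(self, id=None):
        if id is None:
            ids = range(len(self.envs))
        elif isinstance(id, int):
            ids = [id]
        else:
            ids = id
        return [self.envs[i].reset() for i in ids]

    def step(self, action):
        results = [e.step(a) for e, a in zip(self.envs, action)]
        obs, rew, done, info = (list(x) for x in zip(*results))
        return obs, rew, done, info


envs = DummyVectorEnv([ToyEnv for _ in range(4)])
envs.reset()
for _ in range(3):
    obs, rew, done, info = envs.step([1] * len(envs))
    if any(done):
        # the caller is responsible for resetting finished environments
        envs.reset([i for i, d in enumerate(done) if d])
```

Note the explicit reset of finished environments in the loop: step() itself never resets, matching the docstring above.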

class tianshou.env.VectorEnv(env_fns: List[Callable[[], gym.core.Env]])[source]

Bases: tianshou.env.vecenv.BaseVectorEnv

Dummy vectorized environment wrapper, implemented with a for-loop: all environments run sequentially in the current process.

See also

Please refer to BaseVectorEnv for more detailed explanation.

close() → None[source]

Close all of the environments.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(**kwargs) → None[source]

Render all of the environments.

reset(id: Union[int, List[int], None] = None) → numpy.ndarray[source]

Reset the state of all the environments and return the initial observations if id is None; otherwise, reset the specific environments with the given id, which can be an int or a list.

seed(seed: Union[int, List[int], None] = None) → List[int][source]

Set the seed for all environments.

Accept None, an int i (which will be extended to [i, i + 1, i + 2, ...], one seed per environment), or a list of ints.

Returns

The list of seeds used in this env’s random number generators.

The first value in the list should be the “main” seed, or the value which a reproducer should pass to seed().

step(action: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.

Accept a batch of actions and return a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agent’s observations of the current environments

  • rew a numpy.ndarray, the amount of rewards returned after previous actions

  • done a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class tianshou.env.SubprocVectorEnv(env_fns: List[Callable[], gym.core.Env]])[source]

Bases: tianshou.env.vecenv.BaseVectorEnv

Vectorized environment wrapper based on subprocess.

See also

Please refer to BaseVectorEnv for more detailed explanation.
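A minimal sketch of the worker-and-pipe pattern such a subprocess-based wrapper can use: each child process owns one environment and serves commands over a pipe, so all environments step concurrently. ToyEnv, _worker, and run_subproc_demo are illustrative names, not tianshou's internals, and the "fork" start method is assumed (POSIX):

```python
import multiprocessing as mp


class ToyEnv:
    """Minimal stand-in for a gym.Env: the episode ends after 3 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


def _worker(conn, env_fn):
    """Own one environment in a child process; serve commands over a pipe."""
    env = env_fn()
    env.reset()
    while True:
        cmd, data = conn.recv()
        if cmd == "step":
            conn.send(env.step(data))
        else:  # "close"
            conn.close()
            break


def run_subproc_demo(env_num=2):
    ctx = mp.get_context("fork")  # keeps the sketch simple on POSIX systems
    conns, procs = [], []
    for _ in range(env_num):
        parent, child = ctx.Pipe()
        p = ctx.Process(target=_worker, args=(child, ToyEnv), daemon=True)
        p.start()
        conns.append(parent)
        procs.append(p)
    # send all actions first, then collect: the envs step concurrently
    for conn in conns:
        conn.send(("step", 1))
    results = [conn.recv() for conn in conns]
    for conn in conns:
        conn.send(("close", None))
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_subproc_demo())  # [(1, 1.0, False, {}), (1, 1.0, False, {})]
```

The send-all-then-receive-all order in run_subproc_demo is what buys the parallelism: sending an action does not block on the environment finishing its step.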

close() → None[source]

Close all of the environments.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(**kwargs) → None[source]

Render all of the environments.

reset(id: Union[int, List[int], None] = None) → numpy.ndarray[source]

Reset the state of all the environments and return the initial observations if id is None; otherwise, reset the specific environments with the given id, which can be an int or a list.

seed(seed: Union[int, List[int], None] = None) → List[int][source]

Set the seed for all environments.

Accept None, an int i (which will be extended to [i, i + 1, i + 2, ...], one seed per environment), or a list of ints.

Returns

The list of seeds used in this env’s random number generators.

The first value in the list should be the “main” seed, or the value which a reproducer should pass to seed().

step(action: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.

Accept a batch of actions and return a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agent’s observations of the current environments

  • rew a numpy.ndarray, the amount of rewards returned after previous actions

  • done a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class tianshou.env.RayVectorEnv(env_fns: List[Callable[[], gym.core.Env]])[source]

Bases: tianshou.env.vecenv.BaseVectorEnv

Vectorized environment wrapper based on ray. According to our tests, however, it is about two times slower than SubprocVectorEnv.

See also

Please refer to BaseVectorEnv for more detailed explanation.

close() → None[source]

Close all of the environments.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(**kwargs) → None[source]

Render all of the environments.

reset(id: Union[int, List[int], None] = None) → numpy.ndarray[source]

Reset the state of all the environments and return the initial observations if id is None; otherwise, reset the specific environments with the given id, which can be an int or a list.

seed(seed: Union[int, List[int], None] = None) → List[int][source]

Set the seed for all environments.

Accept None, an int i (which will be extended to [i, i + 1, i + 2, ...], one seed per environment), or a list of ints.

Returns

The list of seeds used in this env’s random number generators.

The first value in the list should be the “main” seed, or the value which a reproducer should pass to seed().

step(action: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Run one timestep of all the environments’ dynamics. When the end of an episode is reached, you are responsible for calling reset(id) to reset that environment’s state.

Accept a batch of actions and return a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agent’s observations of the current environments

  • rew a numpy.ndarray, the amount of rewards returned after previous actions

  • done a numpy.ndarray, whether these episodes have ended, in which case further step() calls will return undefined results

  • info a numpy.ndarray, contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)