tianshou.env

class tianshou.env.BaseVectorEnv(env_fns)[source]

Bases: abc.ABC, gym.core.Wrapper

Base class for vectorized environment wrappers. Usage:

env_num = 8
envs = VectorEnv([lambda: gym.make(task) for _ in range(env_num)])
assert len(envs) == env_num

It accepts a list of environment generators. That is, an environment generator efn for a specific task is a callable such that efn() returns an environment of that task; for example, efn could be lambda: gym.make(task).

All vectorized environments must inherit from BaseVectorEnv. Here are some other usages:

envs.seed(2)  # equivalent to the next line
envs.seed([2, 3, 4, 5, 6, 7, 8, 9])  # set a specific seed for each env
obs = envs.reset()  # reset all environments
obs = envs.reset([0, 5, 7])  # reset 3 specific environments
obs, rew, done, info = envs.step([1] * 8)  # step synchronously
envs.render()  # render all environments
envs.close()  # close all environments
__len__()[source]

Return len(self), which is the number of environments.

abstract close()[source]

Close all of the environments.

abstract render(**kwargs)[source]

Render all of the environments.

abstract reset(id=None)[source]

Reset all environments and return their initial observations if id is None; otherwise reset only the environments specified by id (an int or a list of ints).

abstract seed(seed=None)[source]

Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, ..., i + n - 1] for n environments), or a list of per-environment seeds.

abstract step(action)[source]

Run one timestep of every environment’s dynamics. When an episode ends, the caller is responsible for calling reset(id) to restart that environment; step() does not reset it automatically.

Accepts a batch of actions and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions, one per environment, provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agent’s observations of the current environments

  • rew a numpy.ndarray, the rewards returned by the previous actions

  • done a numpy.ndarray, whether each episode has ended; once an episode ends, further step() calls on that environment return undefined results until it is reset

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes for learning)
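For example, a minimal rollout sketch that restarts finished episodes by id (the CartPole-v0 task and the random policy here are illustrative assumptions, not part of the API):

import gym
import numpy as np
from tianshou.env import VectorEnv

env_num = 4
envs = VectorEnv([lambda: gym.make('CartPole-v0') for _ in range(env_num)])
envs.seed(0)
obs = envs.reset()
for _ in range(100):
    actions = np.random.randint(2, size=env_num)  # random policy, for illustration
    obs, rew, done, info = envs.step(actions)
    if done.any():
        # step() does not auto-reset: restart finished episodes by id
        obs = envs.reset(np.where(done)[0].tolist())
envs.close()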

class tianshou.env.VectorEnv(env_fns)[source]

Bases: tianshou.env.vecenv.BaseVectorEnv

Dummy vectorized environment wrapper, implemented with a plain for-loop that steps each environment sequentially in the current process. The general usage is documented in BaseVectorEnv.
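
A minimal usage sketch (the CartPole-v0 task is an illustrative assumption). Since everything runs in the caller’s process, this class is a reasonable default for cheap environments or for debugging:

import gym
from tianshou.env import VectorEnv

envs = VectorEnv([lambda: gym.make('CartPole-v0') for _ in range(4)])
obs = envs.reset()                         # one observation per environment
obs, rew, done, info = envs.step([0] * 4)  # one action per environment
envs.close()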

close()[source]

Close all of the environments.

render(**kwargs)[source]

Render all of the environments.

reset(id=None)[source]

Reset all environments and return their initial observations if id is None; otherwise reset only the environments specified by id (an int or a list of ints).

seed(seed=None)[source]

Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, ..., i + n - 1] for n environments), or a list of per-environment seeds.

step(action)[source]

Run one timestep of every environment’s dynamics. When an episode ends, the caller is responsible for calling reset(id) to restart that environment; step() does not reset it automatically.

Accepts a batch of actions and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions, one per environment, provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agent’s observations of the current environments

  • rew a numpy.ndarray, the rewards returned by the previous actions

  • done a numpy.ndarray, whether each episode has ended; once an episode ends, further step() calls on that environment return undefined results until it is reset

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes for learning)

class tianshou.env.SubprocVectorEnv(env_fns)[source]

Bases: tianshou.env.vecenv.BaseVectorEnv

Vectorized environment wrapper that runs each environment in its own subprocess. The general usage is documented in BaseVectorEnv.
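
A minimal usage sketch (again assuming the CartPole-v0 task). Because each environment lives in a worker process, this class pays off when a single step() is expensive; guarding construction with the usual multiprocessing idiom is a good habit on platforms that spawn processes:

import gym
from tianshou.env import SubprocVectorEnv

if __name__ == '__main__':
    # each environment steps in its own subprocess
    envs = SubprocVectorEnv(
        [lambda: gym.make('CartPole-v0') for _ in range(8)])
    obs = envs.reset()
    obs, rew, done, info = envs.step([0] * 8)
    envs.close()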

close()[source]

Close all of the environments.

render(**kwargs)[source]

Render all of the environments.

reset(id=None)[source]

Reset all environments and return their initial observations if id is None; otherwise reset only the environments specified by id (an int or a list of ints).

seed(seed=None)[source]

Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, ..., i + n - 1] for n environments), or a list of per-environment seeds.

step(action)[source]

Run one timestep of every environment’s dynamics. When an episode ends, the caller is responsible for calling reset(id) to restart that environment; step() does not reset it automatically.

Accepts a batch of actions and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions, one per environment, provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agent’s observations of the current environments

  • rew a numpy.ndarray, the rewards returned by the previous actions

  • done a numpy.ndarray, whether each episode has ended; once an episode ends, further step() calls on that environment return undefined results until it is reset

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes for learning)

class tianshou.env.RayVectorEnv(env_fns)[source]

Bases: tianshou.env.vecenv.BaseVectorEnv

Vectorized environment wrapper based on ray. Note that in our tests it is about twice as slow as SubprocVectorEnv. The general usage is documented in BaseVectorEnv.
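
A minimal usage sketch, assuming the ray package is installed. Whether ray must be initialized by hand before constructing the wrapper is an assumption here, so the sketch calls ray.init() defensively:

import gym
import ray
from tianshou.env import RayVectorEnv

ray.init(ignore_reinit_error=True)  # explicit init; harmless if ray is already up
envs = RayVectorEnv([lambda: gym.make('CartPole-v0') for _ in range(4)])
obs = envs.reset()
obs, rew, done, info = envs.step([0] * 4)
envs.close()
ray.shutdown()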

close()[source]

Close all of the environments.

render(**kwargs)[source]

Render all of the environments.

reset(id=None)[source]

Reset all environments and return their initial observations if id is None; otherwise reset only the environments specified by id (an int or a list of ints).

seed(seed=None)[source]

Set the seed for all environments. Accepts None, an int i (which is extended to [i, i + 1, ..., i + n - 1] for n environments), or a list of per-environment seeds.

step(action)[source]

Run one timestep of every environment’s dynamics. When an episode ends, the caller is responsible for calling reset(id) to restart that environment; step() does not reset it automatically.

Accepts a batch of actions and returns a tuple (obs, rew, done, info).

Parameters

action (numpy.ndarray) – a batch of actions, one per environment, provided by the agent.

Returns

A tuple including four items:

  • obs a numpy.ndarray, the agent’s observations of the current environments

  • rew a numpy.ndarray, the rewards returned by the previous actions

  • done a numpy.ndarray, whether each episode has ended; once an episode ends, further step() calls on that environment return undefined results until it is reset

  • info a numpy.ndarray, containing auxiliary diagnostic information (helpful for debugging, and sometimes for learning)