mapolicy#


class MAPRolloutBatchProtocol(*args, **kwargs)[source]#

A rollout batch protocol for MARL: maps agent_ids to the rollout batches of the individual agents.

class MapTrainingStats(agent_id_to_stats: dict[str | int, TrainingStats], train_time_aggregator: Literal['min', 'max', 'mean'] = 'max')[source]#
get_loss_stats_dict() dict[str, float][source]#

Collects loss_stats_dicts from all agents, prepends agent_id to all keys, and joins results.

class MultiAgentPolicyManager(*, policies: list[BasePolicy], env: PettingZooEnv, action_scaling: bool = False, action_bound_method: Literal['clip', 'tanh'] | None = 'clip', lr_scheduler: LRScheduler | MultipleLRSchedulers | None = None)[source]#

Multi-agent policy manager for MARL.

This multi-agent policy manager accepts a list of BasePolicy instances. When forward is called, it dispatches the batch data to each of these policies; the same holds for process_fn and learn, where the data is split per agent and fed to the corresponding policy. The figure in Multi-Agent Reinforcement Learning illustrates this procedure. A minimal construction sketch follows the parameter list.

Parameters:
  • policies – a list of policies.

  • env – a PettingZooEnv.

  • action_scaling – if True, scale the action from [-1, 1] to the range of action_space. Only used if the action_space is continuous.

  • action_bound_method – method to bound action to range [-1, 1]. Only used if the action_space is continuous.

  • lr_scheduler – if not None, will be called in policy.update().
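
A minimal construction sketch (not part of the API reference), assuming PettingZoo's tictactoe_v3 environment and Tianshou's RandomPolicy as stand-ins; exact import paths and constructor arguments may differ between versions:

from pettingzoo.classic import tictactoe_v3

from tianshou.env import PettingZooEnv
from tianshou.policy import MultiAgentPolicyManager, RandomPolicy

# Wrap a two-player PettingZoo environment.
env = PettingZooEnv(tictactoe_v3.env())

# One policy per agent, in the order given by env.agents.
policies = [RandomPolicy(action_space=env.action_space) for _ in env.agents]

manager = MultiAgentPolicyManager(policies=policies, env=env)

# The manager maps each agent_id to its sub-policy.
print(manager.policies)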

exploration_noise(act: ndarray | BatchProtocol, batch: RolloutBatchProtocol) ndarray | BatchProtocol[source]#

Add exploration noise from each agent's sub-policy onto the corresponding actions in act.

forward(batch: Batch, state: dict | Batch | None = None, **kwargs: Any) Batch[source]#

Dispatch batch data from obs.agent_id to every policy’s forward.

Parameters:
  • batch – the input batch; its obs must contain an agent_id field, which is used to dispatch each row to the matching sub-policy (TODO upstream: document this fully and make a BatchProtocol for it)

  • state – if None, it means all agents have no state. If not None, it should contain keys of “agent_1”, “agent_2”, …

Returns:

a Batch with the following contents (TODO: establish a BatchProtocol for this):

{
    "act": actions corresponding to the input
    "state": {
        "agent_1": output state of agent_1's policy for the state
        "agent_2": xxx
        ...
        "agent_n": xxx
    }
    "out": {
        "agent_1": output of agent_1's policy for the input
        "agent_2": xxx
        ...
        "agent_n": xxx
    }
}
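
A hedged sketch of calling forward on collected data, continuing the assumptions above (tictactoe_v3, RandomPolicy) and additionally using Tianshou's Collector; the agent keys in the result depend on the environment:

from pettingzoo.classic import tictactoe_v3

from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv, PettingZooEnv
from tianshou.policy import MultiAgentPolicyManager, RandomPolicy

env = PettingZooEnv(tictactoe_v3.env())
policies = [RandomPolicy(action_space=env.action_space) for _ in env.agents]
manager = MultiAgentPolicyManager(policies=policies, env=env)

# Collect a few transitions; the buffer stores obs with an "agent_id" field.
buffer = VectorReplayBuffer(total_size=100, buffer_num=1)
collector = Collector(manager, DummyVectorEnv([lambda: PettingZooEnv(tictactoe_v3.env())]), buffer)
collector.reset()
collector.collect(n_step=8)

batch, _ = buffer.sample(4)
result = manager(batch)   # dispatches rows to sub-policies via obs.agent_id
print(result.act)         # actions aligned with the sampled batch
print(result.out)         # per-agent raw outputs, keyed by agent_id
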
learn(batch: MAPRolloutBatchProtocol, *args: Any, **kwargs: Any) MapTrainingStats[source]#

Dispatch the data to all policies for learning.

Parameters:

batch – must map agent_ids to rollout batches
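
Purely for illustration (a hedged sketch; field names and shapes are placeholders, and in practice this structure is produced by process_fn rather than built by hand), the expected layout is one rollout batch per agent_id:

import numpy as np

from tianshou.data import Batch

def dummy_rollout(n: int = 4) -> Batch:
    # Placeholder per-agent rollout fields; real data comes from a ReplayBuffer.
    return Batch(
        obs=np.zeros((n, 3)),
        act=np.zeros(n, dtype=np.int64),
        rew=np.zeros(n),
        terminated=np.zeros(n, dtype=bool),
        truncated=np.zeros(n, dtype=bool),
        obs_next=np.zeros((n, 3)),
    )

# Top-level keys are agent_ids; each maps to that agent's rollout batch.
map_batch = Batch(agent_1=dummy_rollout(), agent_2=dummy_rollout())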

policies: dict[str | int, BasePolicy]#

Maps agent_id to policy.

process_fn(batch: MAPRolloutBatchProtocol, buffer: ReplayBuffer, indice: ndarray) MAPRolloutBatchProtocol[source]#

Dispatch batch data from obs.agent_id to every policy’s process_fn.

Save the original multi-dimensional rew in “save_rew”, set rew to each agent’s own reward while that agent’s “process_fn” runs, and restore the original reward afterwards.

replace_policy(policy: BasePolicy, agent_id: int) None[source]#

Replace the policy registered for the given agent_id in this manager.
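
A hedged sketch of swapping one agent’s sub-policy at runtime. The tic-tac-toe environment and RandomPolicy are illustrative assumptions (see the construction sketch above); depending on the environment, agent_id may be a string such as “player_1” rather than an int:

from pettingzoo.classic import tictactoe_v3

from tianshou.env import PettingZooEnv
from tianshou.policy import MultiAgentPolicyManager, RandomPolicy

env = PettingZooEnv(tictactoe_v3.env())
manager = MultiAgentPolicyManager(
    policies=[RandomPolicy(action_space=env.action_space) for _ in env.agents],
    env=env,
)

# Swap in a fresh policy for the first agent; any BasePolicy works here.
new_policy = RandomPolicy(action_space=env.action_space)
manager.replace_policy(new_policy, agent_id=env.agents[0])
assert manager.policies[env.agents[0]] is new_policy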

train(mode: bool = True) Self[source]#

Set each internal policy to training mode.