utils


gather_info(start_time: float, policy_update_time: float, gradient_step: int, best_reward: float, best_reward_std: float, train_collector: Collector | None = None, test_collector: Collector | None = None) -> InfoStats

A simple wrapper for gathering information from collectors.

Returns:

A dataclass object with the following members (depending on available collectors):

  • gradient_step: the total number of gradient steps;

  • best_reward: the best reward over the test results;

  • best_reward_std: the standard deviation of the best reward over the test results;

  • train_step: the total number of steps collected by the training collector;

  • train_episode: the total number of episodes collected by the training collector;

  • test_step: the total number of steps collected by the test collector;

  • test_episode: the total number of episodes collected by the test collector;

  • timing: the timing statistics, with the following members:

      • total_time: the total time elapsed;

      • train_time: the total time spent on training (collecting samples plus updating the model);

      • train_time_collect: the time spent collecting transitions in the training collector;

      • train_time_update: the time spent training models;

      • test_time: the time spent testing;

      • update_speed: the speed of updating (env_step per second).
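To make the relationship between the timing members concrete, here is a minimal, self-contained sketch. It does not use the library: `SketchTimingStats` and `sketch_timing` are hypothetical names, and the formulas are one reading of the descriptions above (training time as collection time plus update time, update speed as training steps per second of training time). The library itself may compute these values differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchTimingStats:
    """Hypothetical stand-in mirroring the timing members listed above."""

    total_time: float = 0.0
    train_time: float = 0.0
    train_time_collect: float = 0.0
    train_time_update: float = 0.0
    test_time: float = 0.0
    update_speed: float = 0.0


def sketch_timing(
    start_time: float,
    now: float,
    policy_update_time: float,
    train_collect_time: Optional[float] = None,
    train_step: int = 0,
    test_collect_time: Optional[float] = None,
) -> SketchTimingStats:
    """Derive timing statistics from raw timestamps (assumed formulas)."""
    total_time = max(0.0, now - start_time)
    # A missing collector contributes no time.
    test_time = test_collect_time if test_collect_time is not None else 0.0
    collect = train_collect_time if train_collect_time is not None else 0.0
    # Assumption: training time = sample collection + model update.
    train_time = collect + policy_update_time
    # Assumption: speed = collected training steps per second of training.
    speed = 0.0
    if train_collect_time is not None and train_time > 0:
        speed = train_step / train_time
    return SketchTimingStats(
        total_time=total_time,
        train_time=train_time,
        train_time_collect=collect,
        train_time_update=policy_update_time,
        test_time=test_time,
        update_speed=speed,
    )


stats = sketch_timing(
    start_time=100.0,
    now=160.0,
    policy_update_time=20.0,
    train_collect_time=30.0,
    train_step=5000,
    test_collect_time=10.0,
)
print(stats)
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the sketch deterministic. With no training collector time supplied, `update_speed` stays at 0.0, matching the "depending on available collectors" note above.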

test_episode(policy: BasePolicy, collector: Collector, test_fn: Callable[[int, int | None], None] | None, epoch: int, n_episode: int, logger: BaseLogger | None = None, global_step: int | None = None, reward_metric: Callable[[ndarray], ndarray] | None = None) -> CollectStats

A simple wrapper for testing a policy with a collector.
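The signature suggests a flow along these lines: reset the collector, call the optional `test_fn` hook with the epoch and global step, collect `n_episode` episodes, then optionally post-process the rewards. The sketch below illustrates that flow only. `StubCollector` and `sketch_test_episode` are hypothetical, the policy and logger arguments are left out, and plain Python lists stand in for `ndarray`. The order of steps is an assumption, not a description of the library's implementation.

```python
from typing import Callable, List, Optional


class StubCollector:
    """Hypothetical stand-in for a collector; returns fixed episode returns."""

    def __init__(self) -> None:
        self.was_reset = False

    def reset(self) -> None:
        self.was_reset = True

    def collect(self, n_episode: int) -> List[float]:
        # Pretend that episode i finishes with a return of i + 1.
        return [float(i + 1) for i in range(n_episode)]


def sketch_test_episode(
    collector: StubCollector,
    test_fn: Optional[Callable[[int, Optional[int]], None]],
    epoch: int,
    n_episode: int,
    global_step: Optional[int] = None,
    reward_metric: Optional[Callable[[List[float]], List[float]]] = None,
) -> List[float]:
    """Run one evaluation round (assumed order of operations)."""
    collector.reset()  # start from a clean state
    if test_fn is not None:
        test_fn(epoch, global_step)  # hook called before testing
    returns = collector.collect(n_episode=n_episode)
    if reward_metric is not None:
        returns = reward_metric(returns)  # optional reward post-processing
    return returns


calls = []
collector = StubCollector()
result = sketch_test_episode(
    collector,
    test_fn=lambda epoch, step: calls.append((epoch, step)),
    epoch=3,
    n_episode=4,
    global_step=1200,
    reward_metric=lambda rs: [r * 0.5 for r in rs],
)
print(result, calls)
```

Both `test_fn` and `reward_metric` may be `None`, as in the signature above, in which case the corresponding step is skipped.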