Training Configuration

This documentation explains how to configure arguments for the training process. Several pre-defined training examples using specific algorithms can be found in the example_train/algorithm directory. These examples can serve as a reference for creating a new training configuration based on your requirements.

GOPS uses the argparse package to pass and parse arguments. These arguments are passed to the init_args() function in gops/utils/init_args.py to create the corresponding components, such as samplers or algorithms.
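
As a minimal sketch of this flow, the snippet below builds a parser, parses an explicit argument list, and converts the result to a dictionary. The argument names come from this page; the default values and the init_args call shape mentioned in the final comment are assumptions for illustration, not the exact GOPS signature.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a small parser using argument names from this page."""
    parser = argparse.ArgumentParser()
    # Default values below are placeholders chosen for illustration.
    parser.add_argument("--env_id", type=str, default="gym_pendulum")
    parser.add_argument("--algorithm", type=str, default="DSAC")
    parser.add_argument("--max_iteration", type=int, default=1000)
    return parser


# Parse an explicit list so the sketch does not depend on sys.argv.
args = vars(build_parser().parse_args(["--max_iteration", "500"]))
print(args["algorithm"], args["max_iteration"])  # DSAC 500

# In a training script this dictionary would then be handed to GOPS,
# for example with a call shaped like init_args(env, **args) (assumed).
```

Working with a plain dictionary keeps every setting in one place, which makes it easier to inspect the full configuration before any component is created.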

Important

Please note that certain arguments are interdependent, and modifying them separately may result in errors or inaccurate results. Please read this documentation carefully before making any changes.

Environment Variables

OMP_NUM_THREADS: This environment variable controls the number of threads each process uses when the ray package is used for parallel computing. The default value is 1.
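
One way to set this variable from Python is through os.environ, before any numerical library is imported. This is only a sketch: the helper name set_omp_threads is hypothetical, and exporting the variable in the shell before launching the script works as well.

```python
import os


def set_omp_threads(num_threads: int) -> str:
    """Set OMP_NUM_THREADS for the current process and return the stored value."""
    if num_threads < 1:
        raise ValueError("num_threads must be a positive integer")
    # Environment variables are always strings.
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    return os.environ["OMP_NUM_THREADS"]


# Call this early, before importing libraries that read the variable.
set_omp_threads(1)
```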

User Parameters

Key user-level parameters:

  • env_id (str): ID of the environment

  • algorithm (str): name of the reinforcement learning algorithm to use

  • enable_cuda (bool): whether to use CUDA for computation

  • seed (int, optional): global seed for training; a random value is used by default
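
The optional seed can be sketched as follows: the argument defaults to None, and a random value is drawn when none is given. The helper resolve_seed and the upper bound on the random seed are assumptions for illustration; GOPS may choose its random seed differently.

```python
import argparse
import random
from typing import Optional


def resolve_seed(seed: Optional[int]) -> int:
    """Return the given seed, or draw a random one when none is provided."""
    if seed is None:
        # The upper bound is an arbitrary choice for this sketch.
        return random.randint(0, 2**31 - 1)
    return seed


parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=None)

fixed = resolve_seed(parser.parse_args(["--seed", "12345"]).seed)
drawn = resolve_seed(parser.parse_args([]).seed)
print(fixed)  # 12345
```

Recording the resolved seed alongside the results makes a run reproducible even when the seed was drawn at random.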

Environment Parameters

Basic and extra parameters for the environment.

  • action_type (str): type of environment action: ‘continu’ or ‘discret’

  • is_render (bool): whether to render the environment during evaluation

Note

To standardize different types of environments, GOPS uses some additional environment wrappers by default. You can also add or remove specific wrappers by configuring the corresponding parameters here. Refer to Environment Wrapping Utils for more information.

Note

Some environments may require extra parameters, which should be added here.

Approximate Function Parameters

Basic and extra parameters for the value and policy functions.

  • value_func_name (str): value function structure, which depends on the algorithm used: StateValue, ActionValue, ActionValueDis, ActionValueDistri

  • value_func_type (str): type of value function, which depends on the algorithm used: MLP, CNN, CNN_SHARED, RNN, POLY, GAUSS

  • policy_func_name (str): policy function structure, which depends on the algorithm used: None, DetermPolicy, FiniteHorizonPolicy, StochaPolicy

  • policy_func_type (str): type of policy function, which depends on the algorithm used: MLP, CNN, CNN_SHARED, RNN, POLY, GAUSS

  • policy_act_distribution (str): type of distribution for policy actions: default, TanGaussDistribution, GaussDistribution

Note

Please note that some arguments are interdependent. Changing them separately may cause errors or incorrect results.

There are three main ways to check for such errors.

  • You can check the init_args() function in gops/utils/init_args.py, since all arguments are passed there to create the corresponding components.

  • For each type of function, you can find the complete configuration in gops/appfunc.

  • Different choices of the function type require specific parameters to be set. Details can be found in the get_appfunc_dict function in gops/utils/common_utils.

For example, if the function type is MLP or RNN, the following parameters need to be set:

  • hidden_sizes (list): size of hidden layers in value or policy function.

  • hidden_activation (str): activation function for hidden layers in value or policy function: relu, gelu, elu, selu, sigmoid, tanh

  • output_activation (str): activation function for output in value or policy function: linear, tanh
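
The dependency between the function type and its required parameters can be checked before training starts. The sketch below is not the GOPS implementation of get_appfunc_dict; the table REQUIRED_BY_TYPE and the helper missing_appfunc_params are hypothetical and cover only the MLP and RNN case described above.

```python
from typing import Dict, List

# Parameters this page lists as required for the MLP and RNN function types.
REQUIRED_BY_TYPE: Dict[str, List[str]] = {
    "MLP": ["hidden_sizes", "hidden_activation", "output_activation"],
    "RNN": ["hidden_sizes", "hidden_activation", "output_activation"],
}


def missing_appfunc_params(func_type: str, params: dict) -> List[str]:
    """List required parameters that are absent for the given function type."""
    # Types not covered by the table are treated as having no requirements.
    required = REQUIRED_BY_TYPE.get(func_type, [])
    return [name for name in required if name not in params]


value_params = {"hidden_sizes": [64, 64], "hidden_activation": "relu"}
print(missing_appfunc_params("MLP", value_params))  # ['output_activation']
```

Running a check like this right after parsing reports a missing setting immediately, rather than when the network is constructed.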

RL Algorithm Parameters

Basic and extra parameters for the algorithm.

  • value_learning_rate (float): learning rate of value iteration

  • policy_learning_rate (float): learning rate of policy iteration

Note

For some RL algorithms, additional parameters need to be set. Please refer to the algorithm module for detailed information.

Take DSAC as an example:

import argparse

parser = argparse.ArgumentParser()
# learning rates listed under the basic parameters above
parser.add_argument("--value_learning_rate", type=float, default=1e-3)
parser.add_argument("--policy_learning_rate", type=float, default=1e-3)
# DSAC-specific parameters
parser.add_argument("--alpha_learning_rate", type=float, default=1e-3)
parser.add_argument("--gamma", type=float, default=0.99)
parser.add_argument("--tau", type=float, default=0.2)
parser.add_argument("--alpha", type=float, default=0.2)
parser.add_argument("--auto_alpha", type=bool, default=True)
parser.add_argument("--delay_update", type=int, default=2)
parser.add_argument("--TD_bound", type=float, default=10)
parser.add_argument("--bound", default=True)
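
One point worth noting about boolean flags such as auto_alpha: argparse applies type to the raw command-line string, and bool("False") is True in Python, so type=bool cannot switch a flag off from the command line. A possible workaround is a small converter; str2bool below is a hypothetical helper, not part of GOPS.

```python
import argparse


def str2bool(text: str) -> bool:
    """Convert common textual spellings of a boolean into a bool."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--auto_alpha", type=str2bool, default=True)

args = parser.parse_args(["--auto_alpha", "False"])
print(args.auto_alpha)  # False
```

With the converter in place the default still applies when the flag is omitted, so existing launch commands keep their behaviour.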

Trainer Parameters

Basic and extra parameters for the trainer.

  • trainer (str): type of trainer: off_serial_trainer, off_async_trainer, off_sync_trainer, on_serial_trainer, on_sync_trainer

  • max_iteration (int): maximum number of iterations

  • ini_network_dir (str): path to the initial networks

  • num_algs (int): number of algorithms, used when an async trainer is selected

  • num_samplers (int): number of samplers to use

  • sample_interval (int): period of sampling

Buffer Parameters

Basic and extra parameters for the buffer.

  • buffer_name (str): name of buffer to use: replay_buffer, prioritized_replay_buffer

  • buffer_warm_size (int): number of samples to collect before training starts

  • buffer_max_size (int): maximum size of the replay buffer

  • replay_batch_size (int): batch size of samples replayed from the buffer
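
The buffer sizes are related to each other: the warm-up phase cannot collect more samples than the buffer can hold. The following sketch checks this before training. The helper check_buffer_config is hypothetical, and the rule that the batch size should not exceed the warm-up size is an assumption made for illustration rather than a documented GOPS constraint.

```python
from typing import List


def check_buffer_config(warm_size: int, max_size: int, batch_size: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if warm_size > max_size:
        problems.append("buffer_warm_size exceeds buffer_max_size")
    if batch_size > warm_size:
        # Assumed rule: the first replay batch should fit in the warm-up data.
        problems.append("replay_batch_size exceeds buffer_warm_size")
    return problems


print(check_buffer_config(warm_size=1000, max_size=100000, batch_size=256))  # []
```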

Sampler Parameters

Basic and extra parameters for the sampler.

  • sampler_name (str): name of sampler to use: off_sampler, on_sampler

  • sample_batch_size (int): batch size of samples collected by the sampler for storage in the buffer

  • noise_params (dict): noise added to actions for better exploration; only used for continuous action spaces
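
The trainer and sampler names on this page share the prefixes off_ and on_, which suggests that an off-policy trainer is paired with off_sampler and an on-policy trainer with on_sampler. That pairing is an assumption drawn from the naming, not a rule stated here, and the helpers policy_mode and is_consistent below are hypothetical.

```python
def policy_mode(name: str) -> str:
    """Return 'off' or 'on' based on the prefix of a trainer or sampler name."""
    prefix = name.split("_", 1)[0]
    if prefix not in ("off", "on"):
        raise ValueError(f"cannot infer mode from {name!r}")
    return prefix


def is_consistent(trainer: str, sampler: str) -> bool:
    """Check that the trainer and the sampler share the same prefix (assumed rule)."""
    return policy_mode(trainer) == policy_mode(sampler)


print(is_consistent("off_serial_trainer", "off_sampler"))  # True
print(is_consistent("on_sync_trainer", "off_sampler"))     # False
```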

Evaluator Parameters

Basic and extra parameters for the evaluator.

  • evaluator_name (str): name of evaluator to use: evaluator, evaluator_filter

  • num_eval_episode (int): number of episodes for evaluation

  • eval_interval (int): interval between successive evaluations

  • eval_save (bool): whether to save evaluation data: True, False

Data Saving Parameters

Basic and extra parameters for data saving.

  • save_folder (str): directory in which to save data

  • appfunc_save_interval (int): save value/policy every N updates

  • log_save_interval (int): save key information every N updates
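
The interval parameters follow an "every N updates" pattern. A minimal sketch of that pattern is shown below. The helper is_due is hypothetical, the interval values are illustrative, and the sketch assumes update counts start at 1, which this page does not specify.

```python
def is_due(iteration: int, interval: int) -> bool:
    """Return True when a periodic action should run at this iteration."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return iteration % interval == 0


appfunc_save_interval = 500  # illustrative value
log_save_interval = 100      # illustrative value

due_saves = [i for i in range(1, 1001) if is_due(i, appfunc_save_interval)]
due_logs = [i for i in range(1, 301) if is_due(i, log_save_interval)]
print(due_saves)  # [500, 1000]
print(due_logs)   # [100, 200, 300]
```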