Training Configuration

This documentation explains how to configure arguments for the training process. Several pre-defined training examples using specific algorithms can be found in the example_train/algorithm directory. These examples can serve as a reference for creating a new training configuration based on your requirements.

GOPS uses the argparse package to pass and parse arguments. These arguments are passed to the init_args() function in gops/utils/init_args.py, which creates the corresponding components, such as samplers or algorithms.

Important

Please note that certain arguments are interdependent, and modifying them separately may result in errors or inaccurate results. Please read this documentation carefully before making any change.

Environment Variables

OMP_NUM_THREADS : This environment variable controls the number of threads used by each process when using the ray package for parallel computing. The default value is 1.
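The variable can be exported in the shell before launching training, or set from Python at the very top of the training script. The following is a minimal sketch of the second option. It assumes the variable has to be in place before numerical libraries are imported, which is the usual behavior for OpenMP-backed libraries but is not something this page states about GOPS itself.

```python
import os

# Keep a value the user already exported; otherwise fall back to 1,
# the default described above. Run this before importing numpy, torch,
# or ray, since thread pools are usually sized at import time.
os.environ.setdefault("OMP_NUM_THREADS", "1")

num_threads = int(os.environ["OMP_NUM_THREADS"])
print(f"OMP_NUM_THREADS = {num_threads}")
```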

User Parameters

Key user-level parameters:

env_id (str): ID of the environment
algorithm (str): name of the reinforcement learning algorithm to use
enable_cuda (bool): whether to use CUDA for computation
seed (int, optional): global seed for training; a random value is used by default
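As an illustration, the user-level parameters could be declared with argparse roughly as follows. This is a sketch rather than the exact code of any GOPS example: the default values are placeholders, and declaring enable_cuda as a store_true flag is an assumption made here for clarity.

```python
import argparse

parser = argparse.ArgumentParser()
# Placeholder defaults; substitute the environment and algorithm you need.
parser.add_argument("--env_id", type=str, default="gym_pendulum")
parser.add_argument("--algorithm", type=str, default="DSAC")
# Assumption: a flag that is False unless passed on the command line.
parser.add_argument("--enable_cuda", action="store_true")
# None stands for "no seed given, pick a random one".
parser.add_argument("--seed", type=int, default=None)

# An empty list is parsed here so the sketch runs without a command line.
args = vars(parser.parse_args([]))
print(args)
```

The resulting dictionary holds the values that would then be handed to init_args(). That call is omitted because its exact signature is not described on this page.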

Environment Parameters

Basic and extra parameters for the environment.

action_type (str): type of environment action: ‘continu’ (continuous) or ‘discret’ (discrete)
is_render (bool): whether to render the environment during evaluation

Note

To standardize different types of environments, GOPS uses some additional environment wrappers by default. You can also add or remove specific wrappers by configuring the corresponding parameters here. Refer to Environment Wrapping Utils for more information.

Note

Some environments may require extra parameters, which should be added here.

Approximate Function Parameters

Basic and extra parameters for the value and policy functions.

value_func_name (str): value function structure, which depends on the algorithm used: StateValue, ActionValue, ActionValueDis, ActionValueDistri
value_func_type (str): type of value function, which depends on the algorithm used: MLP, CNN, CNN_SHARED, RNN, POLY, GAUSS
policy_func_name (str): policy function structure, which depends on the algorithm used: None, DetermPolicy, FiniteHorizonPolicy, StochaPolicy
policy_func_type (str): type of policy function, which depends on the algorithm used: MLP, CNN, CNN_SHARED, RNN, POLY, GAUSS
policy_act_distribution (str): type of distribution for policy actions: default, TanGaussDistribution, GaussDistribution

Note

Please note that some arguments are interdependent. Changing them separately may cause errors or incorrect results.

There are three main ways to check for such errors.

You can check the init_args() function in gops/utils/init_args.py as all arguments are passed here to create corresponding components.
For each type of function, you can find the complete configuration in gops/appfunc.
Different choices of the function type require specific parameters to be set. Details can be found in the get_appfunc_dict function in gops/utils/common_utils.

For example, if the function type is MLP or RNN, the following parameters need to be set:

hidden_sizes (list): size of hidden layers in value or policy function.
hidden_activation (str): activation function for hidden layers in value or policy function: relu, gelu, elu, selu, sigmoid, tanh
output_activation (str): activation function for output in value or policy function: linear, tanh
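A dependency of this kind can be checked before training starts. The sketch below uses a hypothetical helper, missing_appfunc_args, which is not part of GOPS. It encodes only the rule stated above for MLP and RNN; the requirements of other function types are not listed here and would need to be taken from get_appfunc_dict.

```python
# Hypothetical lookup table, built only from the MLP/RNN rule above.
REQUIRED_BY_FUNC_TYPE = {
    "MLP": ("hidden_sizes", "hidden_activation", "output_activation"),
    "RNN": ("hidden_sizes", "hidden_activation", "output_activation"),
}


def missing_appfunc_args(func_type, args):
    """Return the required parameter names that are absent from args."""
    required = REQUIRED_BY_FUNC_TYPE.get(func_type, ())
    return [name for name in required if name not in args]


args = {
    "value_func_type": "MLP",
    "hidden_sizes": [64, 64],
    "hidden_activation": "relu",
}
missing = missing_appfunc_args(args["value_func_type"], args)
print(missing)  # ['output_activation']
```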

RL Algorithm Parameters

Basic and extra parameters for the algorithm.

value_learning_rate (float): learning rate of value iteration
policy_learning_rate (float): learning rate of policy iteration

Note

For some RL algorithms, additional parameters need to be set. Please refer to the algorithm module for detailed information.

Take DSAC as an example:

import argparse

parser = argparse.ArgumentParser()
# learning rates for the value and policy functions
parser.add_argument("--value_learning_rate", type=float, default=1e-3)
parser.add_argument("--policy_learning_rate", type=float, default=1e-3)
# DSAC-specific parameters
parser.add_argument("--alpha_learning_rate", type=float, default=1e-3)
parser.add_argument("--gamma", type=float, default=0.99)
parser.add_argument("--tau", type=float, default=0.2)
parser.add_argument("--alpha", type=float, default=0.2)
parser.add_argument("--auto_alpha", type=bool, default=True)
parser.add_argument("--delay_update", type=int, default=2)
parser.add_argument("--TD_bound", type=float, default=10)
parser.add_argument("--bound", default=True)
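One detail to keep in mind when overriding boolean arguments such as --auto_alpha from the command line: argparse applies type=bool to the raw string, and any non-empty string, including "False", converts to True. The sketch below shows the effect and one possible workaround. The str2bool helper is hypothetical and not part of GOPS.

```python
import argparse


def str2bool(text):
    """Hypothetical converter that parses common boolean spellings."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


# With type=bool, the string "False" is truthy and becomes True.
naive = argparse.ArgumentParser()
naive.add_argument("--auto_alpha", type=bool, default=True)
naive_value = naive.parse_args(["--auto_alpha", "False"]).auto_alpha

# With the explicit converter, "False" is parsed as intended.
strict = argparse.ArgumentParser()
strict.add_argument("--auto_alpha", type=str2bool, default=True)
strict_value = strict.parse_args(["--auto_alpha", "False"]).auto_alpha

print(naive_value, strict_value)  # True False
```

Editing the default in the script, rather than passing the value on the command line, sidesteps the issue entirely.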

Trainer Parameters

Basic and extra parameters for the trainer.

trainer (str): type of trainer: off_serial_trainer, off_async_trainer, off_sync_trainer, on_serial_trainer, on_sync_trainer
max_iteration (int): maximum number of iterations
ini_network_dir (str): path to the initial networks
num_algs (int): number of algorithms if async trainer is used
num_samplers (int): number of samplers to use
sample_interval (int): period of sampling
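Because trainer accepts only a fixed set of names, a misspelled value can be caught at parse time with the choices option of argparse. This is a sketch of that idea, not necessarily how the GOPS examples declare the argument, and the defaults shown are placeholders.

```python
import argparse

# The trainer names listed above.
TRAINERS = (
    "off_serial_trainer",
    "off_async_trainer",
    "off_sync_trainer",
    "on_serial_trainer",
    "on_sync_trainer",
)

parser = argparse.ArgumentParser()
parser.add_argument("--trainer", type=str, default="off_serial_trainer", choices=TRAINERS)
parser.add_argument("--max_iteration", type=int, default=1000)  # placeholder default

args = parser.parse_args(["--trainer", "on_sync_trainer"])
print(args.trainer, args.max_iteration)  # on_sync_trainer 1000

# An unknown name makes argparse report an error and exit.
rejected = False
try:
    parser.parse_args(["--trainer", "off_trainer"])
except SystemExit:
    rejected = True
print(rejected)  # True
```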

Buffer Parameters

Basic and extra parameters for the buffer.

buffer_name (str): name of buffer to use: replay_buffer, prioritized_replay_buffer
buffer_warm_size (int): number of samples collected before training starts
buffer_max_size (int): maximum size of the replay buffer
replay_batch_size (int): batch size of samples drawn from the buffer
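The buffer sizes are a simple case of interdependent arguments. The sketch below uses a hypothetical helper, check_buffer_args, which is not part of GOPS. The first check follows from the definitions above, since the warm-up cannot collect more samples than the buffer holds. The second check, that a replay batch should not exceed the warm-up size, is an assumption about sensible settings rather than a documented GOPS requirement.

```python
def check_buffer_args(buffer_warm_size, buffer_max_size, replay_batch_size):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    if buffer_warm_size > buffer_max_size:
        problems.append("buffer_warm_size exceeds buffer_max_size")
    # Assumption: the first batch should be drawable once warm-up ends.
    if replay_batch_size > buffer_warm_size:
        problems.append("replay_batch_size exceeds buffer_warm_size")
    return problems


ok = check_buffer_args(buffer_warm_size=1000, buffer_max_size=100000, replay_batch_size=256)
bad = check_buffer_args(buffer_warm_size=1000, buffer_max_size=500, replay_batch_size=256)
print(ok)   # []
print(bad)  # ['buffer_warm_size exceeds buffer_max_size']
```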

Sampler Parameters

Basic and extra parameters for the sampler.

sample_name (str): name of sampler to use: off_sampler, on_sampler
sample_batch_size (int): number of samples the sampler collects per batch for storage in the buffer
noise_params (dict): noise added to actions for better exploration; only used for continuous action spaces

Evaluator Parameters

Basic and extra parameters for the evaluator.

evaluator_name (str): name of evaluator to use: evaluator, evaluator_filter
num_eval_episode (int): number of episodes for evaluation
eval_interval (int): interval between evaluations
eval_save (bool): whether to save evaluation data: True, False

Data Saving Parameters

Basic and extra parameters for data saving.

save_folder (str): directory in which to save data
appfunc_save_interval (int): save value/policy every N updates
log_save_interval (int): save key information every N updates
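To make "every N updates" concrete, the sketch below lists the update counts at which a save would occur under one plausible reading: updates are counted from 1 and a save happens whenever the count is a multiple of the interval. Whether GOPS counts from 0 or 1, or also saves at the final update, is not stated here, so treat the exact indices as an assumption. The helper due_updates is hypothetical.

```python
def due_updates(max_iteration, interval):
    """Update counts (starting at 1) that are multiples of interval."""
    return [i for i in range(1, max_iteration + 1) if i % interval == 0]


# Small illustrative numbers, not recommended settings.
appfunc_saves = due_updates(max_iteration=10, interval=5)
log_saves = due_updates(max_iteration=10, interval=2)
print(appfunc_saves)  # [5, 10]
print(log_saves)      # [2, 4, 6, 8, 10]
```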