Training Configuration
This documentation explains how to configure arguments for the training process. Several pre-defined training examples using specific algorithms can be found in the example_train/algorithm directory. These examples can serve as a reference for creating a new training configuration based on your requirements.
GOPS uses the argparse package to define and parse arguments. The parsed arguments are passed to the init_args() function in gops/utils/init_args.py, which creates the corresponding components, such as samplers and algorithms.
Important
Please note that certain arguments are interdependent, and modifying them separately may result in errors or inaccurate results. Please read this documentation carefully before making any change.
Environment Variables
OMP_NUM_THREADS : This environment variable controls the number of threads used by each process when using the ray package for parallel computing. The default value is 1.
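As a minimal sketch, the variable can also be set from Python at the top of a training script. OpenMP-backed libraries typically read it when they are first loaded, so it should be set before they are imported. The helper name set_omp_threads is hypothetical and not part of GOPS.

```python
import os


def set_omp_threads(num_threads: int = 1) -> str:
    """Set OMP_NUM_THREADS for the current process and return the stored value."""
    if num_threads < 1:
        raise ValueError("num_threads must be a positive integer")
    # Environment variable values are always strings.
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    return os.environ["OMP_NUM_THREADS"]


# Call this before importing libraries that start an OpenMP thread pool.
set_omp_threads(1)
```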
User Parameters
Key user-level parameters:
env_id (str): ID of the environment
algorithm (str): name of the reinforcement learning algorithm to use
enable_cuda (bool): whether to use CUDA for computation
seed (int, optional): global seed for training; a random value is used by default
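The user-level parameters could be declared with argparse roughly as follows. This is a sketch under assumptions: the function name parse_user_args, the default values, and the use of store_true for enable_cuda are illustrative choices, not the definitions used in the GOPS example scripts.

```python
import argparse
import random


def parse_user_args(argv=None):
    """Parse the user-level parameters and return them as a plain dict."""
    parser = argparse.ArgumentParser()
    # Placeholder defaults chosen for illustration only.
    parser.add_argument("--env_id", type=str, default="my_env")
    parser.add_argument("--algorithm", type=str, default="DSAC")
    # store_true makes the flag False unless it is given on the command line.
    parser.add_argument("--enable_cuda", action="store_true")
    parser.add_argument("--seed", type=int, default=None)
    args = vars(parser.parse_args(argv))
    if args["seed"] is None:
        # No seed given: fall back to a random one.
        args["seed"] = random.randint(0, 2**31 - 1)
    return args


args = parse_user_args(["--env_id", "my_env", "--seed", "42"])
print(args["seed"])  # 42
```

Returning a dict via vars() keeps the result easy to pass on as keyword arguments to whatever builds the components.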
Environment Parameters
Basic and extra parameters for the environment.
action_type (str): type of environment action: ‘continu’ or ‘discret’
is_render (bool): whether to render the environment during evaluation
Note
To standardize different types of environments, GOPS uses some additional environment wrappers by default. You can also add or remove specific wrappers by configuring the corresponding parameters here. Refer to Environment Wrapping Utils for more information.
Note
Some environments may require extra parameters, which should be added here.
Approximate Function Parameters
Basic and extra parameters for the value and policy functions.
value_func_name (str): value function structure, which depends on the algorithm used: StateValue, ActionValue, ActionValueDis, ActionValueDistri
value_func_type (str): type of value function, which depends on the algorithm used: MLP, CNN, CNN_SHARED, RNN, POLY, GAUSS
policy_func_name (str): policy function structure, which depends on the algorithm used: None, DetermPolicy, FiniteHorizonPolicy, StochaPolicy
policy_func_type (str): type of policy function, which depends on the algorithm used: MLP, CNN, CNN_SHARED, RNN, POLY, GAUSS
policy_act_distribution (str): type of distribution for policy actions: default, TanGaussDistribution, GaussDistribution
Note
Please note that some arguments are interdependent. Changing them separately may cause errors or incorrect results.
There are three main ways to check for such errors:
You can check the init_args() function in gops/utils/init_args.py, as all arguments are passed there to create the corresponding components.
For each type of function, you can find the complete configuration in gops/appfunc.
Different choices of the function type require specific parameters to be set. Details can be found in the get_appfunc_dict function in gops/utils/common_utils.
For example, if the function type is MLP or RNN, the following parameters need to be set:
hidden_sizes (list): sizes of the hidden layers in the value or policy function
hidden_activation (str): activation function for the hidden layers in the value or policy function: relu, gelu, elu, selu, sigmoid, tanh
output_activation (str): activation function for the output layer in the value or policy function: linear, tanh
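The dependency between the function type and its required parameters can be checked before training starts. The sketch below is illustrative only: the table REQUIRED_APPFUNC_KEYS and the helper missing_appfunc_keys are hypothetical names, and the table covers only the MLP and RNN case described above, not what get_appfunc_dict actually requires for the other types.

```python
# Required keys per function type; only MLP and RNN are documented here.
REQUIRED_APPFUNC_KEYS = {
    "MLP": ("hidden_sizes", "hidden_activation", "output_activation"),
    "RNN": ("hidden_sizes", "hidden_activation", "output_activation"),
}


def missing_appfunc_keys(func_type, args):
    """Return the required keys that are absent from args for func_type."""
    required = REQUIRED_APPFUNC_KEYS.get(func_type, ())
    return [key for key in required if key not in args]


value_args = {"hidden_sizes": [64, 64], "hidden_activation": "relu"}
print(missing_appfunc_keys("MLP", value_args))  # ['output_activation']
```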
RL Algorithm Parameters
Basic and extra parameters for the algorithm.
value_learning_rate (float): learning rate of value iteration
policy_learning_rate (float): learning rate of policy iteration
Note
For some RL algorithms, additional parameters need to be set. Please refer to the algorithm module for detailed information.
Take DSAC as an example:
import argparse

# In a training script, this parser also holds the other parameters.
parser = argparse.ArgumentParser()
# Learning rates shared by most algorithms
parser.add_argument("--value_learning_rate", type=float, default=1e-3)
parser.add_argument("--policy_learning_rate", type=float, default=1e-3)
# DSAC-specific parameters
parser.add_argument("--alpha_learning_rate", type=float, default=1e-3)
parser.add_argument("--gamma", type=float, default=0.99)
parser.add_argument("--tau", type=float, default=0.2)
parser.add_argument("--alpha", type=float, default=0.2)
parser.add_argument("--auto_alpha", type=bool, default=True)
parser.add_argument("--delay_update", type=int, default=2)
parser.add_argument("--TD_bound", type=float, default=10)
parser.add_argument("--bound", default=True)
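One caveat applies to flags declared with type=bool, such as --auto_alpha above: argparse calls bool() on the raw command-line string, so any non-empty string, including "False", is parsed as True. The defaults are unaffected; the issue only appears when the flag is overridden on the command line. The sketch below demonstrates the behavior and one common workaround; str2bool is a hypothetical helper, not part of GOPS.

```python
import argparse


def str2bool(text):
    """Convert common textual booleans to bool (hypothetical helper)."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


naive = argparse.ArgumentParser()
naive.add_argument("--auto_alpha", type=bool, default=True)

strict = argparse.ArgumentParser()
strict.add_argument("--auto_alpha", type=str2bool, default=True)

print(naive.parse_args(["--auto_alpha", "False"]).auto_alpha)   # True
print(strict.parse_args(["--auto_alpha", "False"]).auto_alpha)  # False
```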
Trainer Parameters
Basic and extra parameters for the trainer.
trainer (str): type of trainer: off_serial_trainer, off_async_trainer, off_sync_trainer, on_serial_trainer, on_sync_trainer
max_iteration (int): maximum number of iterations
ini_network_dir (str): path to the initial networks
num_algs (int): number of algorithms, used when an async trainer is selected
num_samplers (int): number of samplers to use
sample_interval (int): sampling period
Buffer Parameters
Basic and extra parameters for the buffer.
buffer_name (str): name of the buffer to use: replay_buffer, prioritized_replay_buffer
buffer_warm_size (int): number of samples collected before training starts
buffer_max_size (int): maximum size of the replay buffer
replay_batch_size (int): batch size of samples replayed from the buffer
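The buffer sizes constrain each other: a warm-up size larger than the buffer capacity, for instance, could never be reached. The following sanity check is a sketch under assumptions. The helper buffer_arg_problems is hypothetical, and the rules it applies are plausible consistency conditions rather than checks that GOPS is known to perform.

```python
def buffer_arg_problems(buffer_warm_size, buffer_max_size, replay_batch_size):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if buffer_max_size <= 0:
        problems.append("buffer_max_size must be positive")
    if buffer_warm_size > buffer_max_size:
        # The buffer could never hold enough samples for training to start.
        problems.append("buffer_warm_size exceeds buffer_max_size")
    if replay_batch_size > buffer_max_size:
        problems.append("replay_batch_size exceeds buffer_max_size")
    return problems


print(buffer_arg_problems(1000, 100000, 256))  # []
print(buffer_arg_problems(5000, 1000, 256))    # ['buffer_warm_size exceeds buffer_max_size']
```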
Sampler Parameters
Basic and extra parameters for the sampler.
sample_name (str): name of the sampler to use: off_sampler, on_sampler
sample_batch_size (int): batch size of samples collected by the sampler for storage in the buffer
noise_params (dict): noise added to actions for better exploration; only used for continuous action spaces
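Because noise_params only applies to continuous action spaces, it depends on action_type. A minimal sketch of that dependency follows; the helper resolve_noise_params and the dictionary keys mean and std are hypothetical and may differ from what GOPS expects.

```python
def resolve_noise_params(action_type, noise_params):
    """Return noise_params for continuous actions, otherwise None."""
    if action_type == "continu":
        return noise_params
    # Exploration noise is only used for continuous action spaces.
    return None


noise = {"mean": [0.0], "std": [0.2]}  # hypothetical keys and values
print(resolve_noise_params("continu", noise))  # {'mean': [0.0], 'std': [0.2]}
print(resolve_noise_params("discret", noise))  # None
```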
Evaluator Parameters
Basic and extra parameters for the evaluator.
evaluator_name (str): name of the evaluator to use: evaluator, evaluator_filter
num_eval_episode (int): number of episodes used for each evaluation
eval_interval (int): interval between evaluations
eval_save (bool): whether to save evaluation data: True, False
Data Saving Parameters
Basic and extra parameters for data saving.
save_folder (str): directory in which data is saved
appfunc_save_interval (int): save the value and policy functions every N updates
log_save_interval (int): save key information every N updates
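The "every N updates" semantics can be illustrated with a simple modulo test. This is a sketch of one plausible interpretation, not the trainer's actual logic; the helper is_due and the interval values are assumptions made for the example.

```python
def is_due(iteration, interval):
    """Return True when iteration is a positive multiple of interval."""
    return interval > 0 and iteration > 0 and iteration % interval == 0


appfunc_save_interval = 500  # illustrative values
log_save_interval = 100

for iteration in (100, 250, 500):
    print(
        iteration,
        is_due(iteration, appfunc_save_interval),
        is_due(iteration, log_save_interval),
    )
# 100 False True
# 250 False False
# 500 True True
```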