# Training Configuration This documentation explains how to configure arguments for the training process. Several pre-defined training examples using specific algorithms can be found in the `example_train/algorithm` directory. These examples can serve as a reference for creating a new training configuration based on your requirements. GOPS use the `argparse` package to pass and parse arguments. These arguments will be passed to ```init_args() ``` function in `gops/utils/init_args.py` to create corresponding components such like samplers or algorithms. :::{important} Please note that certain arguments are interdependent, and modifying them separately may result in errors or inaccurate results. **Please read this documentation carefully before making any change.** ::: ## Environment Variables `OMP_NUM_THREADS` : This environment variable controls the number of threads used by each process when using the `ray` package for parallel computing. The default value is `1`. ## User Parameters Key parameters in user level: - `env_id` (str): ID of the environment - `algorithm` (str): name of the reinforcement learning algorithm to use - `enable_cuda` (bool): whether to use CUDA for computation - `seed` (int): (Optional): assign the global seed for training, using a random value by default ## Environment Parameters Basic and extra parameters for environment. - `action_type` (str): type of environment action: 'continu' or 'discret' - `is_render` (bool): whether render the env when evaluation :::{note} To standardize different types of environments, GOPS uses some additional environment wrappers by default. **You can also add or remove specific wrappers by configuring the corresponding parameters here.** Refer to {ref}`wrapping_utils` for more information. ::: :::{note} Some environments may require extra parameters, which should be added here. ::: ## Approximate Function Parameters Basic and extra parameters for value and policy function. - `value_func_name` (str): value function structure, depended on the used algorithm: `StateValue`, `ActionValue`, `ActionValueDis`, `ActionValueDistri` - `value_func_type` (str): type of value function, depended on the used algorithm: `MLP`, `CNN`, `CNN_SHARED`, `RNN`, `POLY`, `GAUSS` - `policy_func_name` (str): policy function structure, depended on the used algorithm: `None`, `DetermPolicy`, `FiniteHorizonPolicy`, `StochaPolicy` - `policy_func_type` (str): type of policy function, depended on the used algorithm: `MLP`, `CNN`, `CNN_SHARED`, `RNN`, `POLY`, `GAUSS` - `policy_act_distribution` (str): type of distribution for policy actions: `default`, `TanGaussDistribution`, `GaussDistribution` :::{note} Please note that some arguments are interdependent. Changing them separately may cause errors or incorrect results. ::: There are three main ways to check for such errors. - You can check the `init_args()` function in `gops/utils/init_args.py` as all arguments are passed here to create corresponding components. - For each type of function, you can find the complete configuration in `gops/appfunc`. - Different choices of the function type require specific parameters to be set. Details can be found in the `get_appfunc_dict` function in `gops/utils/common_utils`. For example, if the function type is `MLP` or `RNN`, the following parameters need to be set: - `hidden_sizes` (list): size of hidden layers in value or policy function. - `hidden_activation` (str): activation function for hidden layers in value or policy function: `relu`, `gelu`, `elu`, `selu`, `sigmoid`, `tanh` - `output_activation` (str): activation function for output in value or policy function: `linear`, `tanh` ## RL Algorithm Parameters Basic and extra parameters for algorithm. - `value_learning_rate` (float): learning rate of value iteration - `policy_learning_rate` (float): learning rate of policy iteration :::{note} For some RL algorithms, additional parameters need to be set. Please refer to the `algorithm` module for detailed information. ::: Take DSAC as an example: ```bash parser.add_argument("--value_learning_rate", type=float, default=1e-3) parser.add_argument("--policy_learning_rate", type=float, default=1e-3) # special parameter parser.add_argument("--alpha_learning_rate", type=float, default=1e-3) parser.add_argument("--gamma", type=float, default=0.99) parser.add_argument("--tau", type=float, default=0.2) parser.add_argument("--alpha", type=float, default=0.2) parser.add_argument("--auto_alpha", type=bool, default=True) parser.add_argument("--delay_update", type=int, default=2) parser.add_argument("--TD_bound", type=float, default=10) parser.add_argument("--bound", default=True) ``` ## Trainer Parameters Basic and extra parameters for trainer. - `trainer` (str): type of trainer: `off_serial_trainer`, `off_async_trainer`, `off_sync_trainer`, `on_serial_trainer`, `on_sync_trainer` - `max_iteration` (int): number of max iteration - `ini_network_dir` (str): path of initial networks - `num_algs` (int): number of algorithms if async trainer is used - `num_samplers` (int): number of samplers to use - `sample_interval` (int): period of sampling ## Buffer Parameters Basic and extra parameters for buffer. - `buffer_name` (str): name of buffer to use: `replay_buffer`, `prioritized_replay_buffer` - `buffer_warm_size` (int): size of collected samples before training - `buffer_max_size` (int): max size of replay buffer - `replay_batch_size` (int): batch size of replay samples from buffer ## Sampler Parameters Basic and extra parameters for sampler. - `sample_name` (str): name of sampler to use: `off_sampler`, `on_sampler` - `sample_batch_size` (int): batch size of sampler for buffer store - `noise_params` (dict): add noise to action for better exploration, only used for continuous action space ## Evaluator Parameters Basic and extra parameters for evaluator. - `evaluator_name` (str): name of evaluator to use: `evaluator`, `evaluator_filter` - `num_eval_episode` (int): number of episodes for evaluation - `eval_interval` (int): period of every evaluation episode - `eval_save` (bool): whether to save evaluation data: `True`, `False` ## Data Saving Parameters Basic and extra parameters for data saving. - `save_folder` (str): directory of data to save - `appfunc_save_interval` (int): save value/policy every N updates - `log_save_interval` (int): save key information every N updates