Merged

69 commits
b4210c1
Merge py file changes from benchmark-algs
taufeeque9 Jan 4, 2023
97bc063
Clean parallel script
taufeeque9 Jan 10, 2023
9291225
Undo the changes from #653 to the dagger benchmark config files.
ernestum Jan 26, 2023
276d863
Improve readability and interpretability of benchmarking tests.
ernestum Jan 25, 2023
37eb914
Add exponential beta scheduler for dagger
taufeeque9 Mar 1, 2023
877383b
Ignore coverage for unknown algorithms.
ernestum Feb 2, 2023
c8e55cb
Cleanup and extend tests for beta schedules in dagger.
ernestum Feb 2, 2023
6b9b306
Merge branch 'master' into benchmark-pr
taufeeque9 Feb 6, 2023
8576465
Fix test cases
taufeeque9 Feb 8, 2023
d81eb68
Add optuna to dependencies
taufeeque9 Feb 8, 2023
27467d3
Fix test case
taufeeque9 Feb 8, 2023
b59a768
Merge branch 'master' into benchmark-pr
taufeeque9 Feb 8, 2023
1a3b6b8
Clean up the scripts
taufeeque9 Feb 9, 2023
7a438da
Remove reporter(done) since mean_return is reported by the runs
taufeeque9 Feb 9, 2023
5bc5835
Merge branch 'master' into benchmark-pr
taufeeque9 Feb 20, 2023
2e56de8
Add beta_schedule parameter to dagger script
taufeeque9 Feb 23, 2023
84e854a
Merge branch 'master' into benchmark-pr
taufeeque9 Mar 16, 2023
73d8576
Update config policy kwargs
taufeeque9 Mar 16, 2023
9fdf878
Changes from review
taufeeque9 May 16, 2023
1c1dbc4
Fix errors with some configs
taufeeque9 May 16, 2023
3467af2
Merge branch 'master' into benchmark-pr
taufeeque9 May 16, 2023
44c4e97
Updates based on review
taufeeque9 Jun 14, 2023
4d493ae
Merge branch 'master' into benchmark-pr
taufeeque9 Jun 14, 2023
ab01269
Change metric everywhere
taufeeque9 Jun 14, 2023
f64580e
Merge branch 'master' into benchmark-pr
taufeeque9 Jul 11, 2023
e896d7d
Separate tuning code from parallel.py
taufeeque9 Jul 11, 2023
64c3a8d
Fix docstring
taufeeque9 Jul 11, 2023
8fba0d3
Removing resume option as it is getting tricky to correctly implement
taufeeque9 Jul 11, 2023
12ab31c
Minor fixes
taufeeque9 Jul 11, 2023
19b0f2c
Updates from review
taufeeque9 Jul 16, 2023
046b8d9
fix lint error
taufeeque9 Jul 16, 2023
8eee082
Add documentation for using the tuning script
taufeeque9 Jul 16, 2023
5ce7658
Fix lint error
taufeeque9 Jul 17, 2023
a8be331
Updates from the review
taufeeque9 Jul 18, 2023
4ff006d
Fix file name test errors
taufeeque9 Jul 18, 2023
6933afa
Add tune_run_kwargs in parallel script
taufeeque9 Jul 19, 2023
77f9d9b
Fix test errors
taufeeque9 Jul 19, 2023
54eb8a6
Fix test
taufeeque9 Jul 19, 2023
d50238f
Fix lint
taufeeque9 Jul 19, 2023
3fe22d4
Updates from review
taufeeque9 Jul 19, 2023
c50aa20
Simplify few lines of code
taufeeque9 Jul 20, 2023
000af61
Updates from review
taufeeque9 Aug 4, 2023
8b55134
Fix test
taufeeque9 Aug 4, 2023
f3ba2b5
Revert "Fix test"
taufeeque9 Aug 4, 2023
f8251c7
Fix test
taufeeque9 Aug 4, 2023
664fc37
Convert Dict to Mapping in input argument
taufeeque9 Aug 7, 2023
8690e1d
Ignore coverage in script configurations.
ernestum Aug 30, 2023
dd9eb6a
Pin huggingface_sb3 version.
ernestum Aug 30, 2023
b3930f4
Merge branch 'master' into benchmark-pr
ernestum Sep 26, 2023
40d87ef
Update to the newest seals environment versions.
ernestum Sep 26, 2023
71f6c92
Push gymnasium dependency to 0.29 to ensure mujoco envs work.
ernestum Sep 27, 2023
747ad32
Incorporate review comments
taufeeque9 Oct 4, 2023
691e759
Fix test errors
taufeeque9 Oct 4, 2023
2038e60
Move benchmarking/ to scripts/ and add named configs for tuned hyperp…
taufeeque9 Oct 4, 2023
35c7265
Bump cache version & remove unnecessary files
taufeeque9 Oct 5, 2023
fdf4f49
Include tuned hyperparam json files in package data
taufeeque9 Oct 5, 2023
5f9a4e6
Update storage hash
taufeeque9 Oct 5, 2023
91bb785
Update search space of bc
taufeeque9 Oct 5, 2023
3d93c84
Merge branch 'master' of github.com:HumanCompatibleAI/imitation into …
ZiyueWang25 Oct 5, 2023
f59fea2
update benchmark and hyper parameter tuning readme
ZiyueWang25 Oct 5, 2023
95110dc
Update README.md
taufeeque9 Oct 5, 2023
75f3477
Incorporate reviewer's comments in benchmarking readme
taufeeque9 Oct 6, 2023
77c1115
Merge branch 'master' into benchmark-pr
taufeeque9 Oct 6, 2023
1ba2b00
Update gymnasium version and render mode in eval policy
taufeeque9 Oct 7, 2023
ba4b693
Fix error
taufeeque9 Oct 7, 2023
bb76ee1
Merge branch 'update-gymnasium-dep' into benchmark-pr
taufeeque9 Oct 7, 2023
278f225
Merge branch 'master' into benchmark-pr
taufeeque9 Oct 8, 2023
01755a2
Update commands.py hex strings
taufeeque9 Oct 9, 2023
fdcef92
Merge branch 'master' into benchmark-pr
taufeeque9 Oct 9, 2023
35 changes: 18 additions & 17 deletions benchmarking/README.md
@@ -1,41 +1,42 @@
# Benchmarking imitation

This directory contains sacred configuration files for benchmarking imitation's algorithms. For v0.3.2, these correspond to the hyperparameters used in the paper [imitation: Clean Imitation Learning Implementations](https://www.rocamonde.com/publication/gleave-imitation-2022/).
The `src/imitation/scripts/config/tuned_hps` directory provides the tuned hyperparameter configs for benchmarking imitation. For v0.4.0, these correspond to the hyperparameters used in the paper [imitation: Clean Imitation Learning Implementations](https://www.rocamonde.com/publication/gleave-imitation-2022/).

Configuration files can be loaded either from the CLI or from the Python API. The examples below assume that your current working directory is the root of the `imitation` repository. This is not necessarily the case and you should adjust your paths accordingly.
Configuration files can be loaded either from the CLI or from the Python API.

## CLI

```bash
python -m imitation.scripts.<train_script> <algo> with benchmarking/<config_name>.json
python -m imitation.scripts.<train_script> <algo> with <algo>_<env>
```
`train_script` can be either 1) `train_imitation` with `algo` as `bc` or `dagger` or 2) `train_adversarial` with `algo` as `gail` or `airl`.
`train_script` can be either 1) `train_imitation` with `algo` as `bc` or `dagger`, or 2) `train_adversarial` with `algo` as `gail` or `airl`. The `env` can be one of `seals_ant`, `seals_half_cheetah`, `seals_hopper`, `seals_swimmer`, or `seals_walker`. The hyperparameters for other environments have not been tuned yet; you can either reuse the tuned hyperparameters from one of the environments above or tune the hyperparameters yourself using the `tuning` script.

## Python

```python
...
from imitation.scripts.<train_script> import <train_ex>
<train_ex>.run(command_name="<algo>", named_configs=["benchmarking/<config_name>.json"])
<train_ex>.run(command_name="<algo>", named_configs=["<algo>_<env>"])
```

# Tuning Hyperparameters

The hyperparameters of any algorithm in imitation can be tuned using the `tuning.py` script.
The hyperparameters of any algorithm in imitation can be tuned using the `scripts/tuning.py` script.
The benchmarking hyperparameter configs were generated by tuning the hyperparameters using
the search space defined in the `tuning_config.py` script. The tuning script proceeds in two
phases: 1) The hyperparameters are tuned using the search space provided, and 2) the best
hyperparameter config found in the first phase based on the maximum mean return is
re-evaluated on a separate set of seeds, and the mean and standard deviation of these trials
are reported.
the search space defined in `scripts/config/tuning.py`.

To tune the hyperparameters of an algorithm using the default search space provided:
The tuning script proceeds in two phases:
1. Tune the hyperparameters using the search space provided.
2. Re-evaluate the best config from the first phase (selected by maximum mean return) on a separate set of seeds, and report the mean and standard deviation of these trials.
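The two-phase procedure above can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual implementation: the real script drives trials through the `parallel.py` Ray Tune machinery, whereas here `tune` does plain random sampling and `evaluate` is a hypothetical stand-in for "train the algorithm, roll out the policy, return its mean return":

```python
import random
import statistics

def tune(search_space, evaluate, n_trials=10, eval_seeds=(0, 1, 2)):
    """Two-phase tuning sketch: random search, then re-evaluation of the winner."""
    # Phase 1: sample configs from the search space; track the best mean return.
    best_cfg, best_return = None, float("-inf")
    for trial in range(n_trials):
        rng = random.Random(trial)
        cfg = {name: rng.choice(values) for name, values in search_space.items()}
        mean_return = evaluate(cfg, seed=trial)
        if mean_return > best_return:
            best_cfg, best_return = cfg, mean_return
    # Phase 2: re-evaluate the winning config on a separate, held-out set of
    # seeds and report the mean and standard deviation across those runs.
    returns = [evaluate(best_cfg, seed=s) for s in eval_seeds]
    return best_cfg, statistics.mean(returns), statistics.pstdev(returns)

# Usage with a toy evaluation function (a real run would train the algorithm
# and roll out the learned policy instead):
space = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [64, 256]}
cfg, mean_ret, std_ret = tune(space, lambda cfg, seed: cfg["lr"] * 1e4 + seed * 0.01)
print(cfg, mean_ret, std_ret)
```

Separating the evaluation seeds from the search seeds (phase 2) is what guards against reporting a config that merely got lucky during the search.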

To use it with the default search space:
```bash
python tuning.py with {algo} 'parallel_run_config.base_named_configs=["{env}"]'
python -m imitation.scripts.tuning with <algo> 'parallel_run_config.base_named_configs=["<env>"]'
```

In this command, `{algo}` provides the default search space and settings to be used for
the specific algorithm, which is defined in the `tuning_config.py` script and
`'parallel_run_config.base_named_configs=["{env}"]'` sets the environment to tune the algorithm in.
See the documentation of `tuning.py` and `parallel.py` scripts for many other arguments that can be
In this command:
- `<algo>` provides the default search space and settings for the specific algorithm, as defined in `scripts/config/tuning.py`.
- `<env>` sets the environment to tune the algorithm in. The environments are defined in the algorithm-specific `scripts/config/train_[adversarial/imitation/preference_comparisons/rl].py` files. For the already-tuned environments, use the `<algo>_<env>` named configs here.

See the documentation of `scripts/tuning.py` and `scripts/parallel.py` for many other arguments that can be
provided through the command line to change the tuning behavior.
17 changes: 10 additions & 7 deletions experiments/commands.py
@@ -12,23 +12,24 @@

For example, we can run:

TUNED_HPS_DIR=../src/imitation/scripts/config/tuned_hps
python commands.py \
--name=run0 \
--cfg_pattern=../benchmarking/*ai*_seals_walker_*.json \
--cfg_pattern=$TUNED_HPS_DIR/*ai*_seals_walker_*.json \
--output_dir=output

And get the following commands printed out:

python -m imitation.scripts.train_adversarial airl \
--capture=sys --name=run0 \
--file_storage=output/sacred/$USER-cmd-run0-airl-0-a3531726 \
with ../benchmarking/airl_seals_walker_best_hp_eval.json \
with ../src/imitation/scripts/config/tuned_hps/airl_seals_walker_best_hp_eval.json \
seed=0 logging.log_root=output

python -m imitation.scripts.train_adversarial gail \
--capture=sys --name=run0 \
--file_storage=output/sacred/$USER-cmd-run0-gail-0-a1ec171b \
with ../benchmarking/gail_seals_walker_best_hp_eval.json \
with $TUNED_HPS_DIR/gail_seals_walker_best_hp_eval.json \
seed=0 logging.log_root=output

We can execute commands in parallel by piping them to GNU parallel:
@@ -40,9 +41,10 @@

For example, we can run:

TUNED_HPS_DIR=../src/imitation/scripts/config/tuned_hps
python commands.py \
--name=run0 \
--cfg_pattern=../benchmarking/bc_seals_half_cheetah_best_hp_eval.json \
--cfg_pattern=$TUNED_HPS_DIR/bc_seals_half_cheetah_best_hp_eval.json \
--output_dir=/data/output \
--remote

@@ -51,8 +53,9 @@
ctl job run --name $USER-cmd-run0-bc-0-72cb1df3 \
--command "python -m imitation.scripts.train_imitation bc \
--capture=sys --name=run0 \
--file_storage=/data/output/sacred/$USER-cmd-run0-bc-0-72cb1df3 \
with /data/imitation/benchmarking/bc_seals_half_cheetah_best_hp_eval.json \
--file_storage=/data/output/sacred/$USER-cmd-run0-bc-0-72cb1df3 with \
/data/imitation/src/imitation/scripts/config/tuned_hps/\
bc_seals_half_cheetah_best_hp_eval.json \
seed=0 logging.log_root=/data/output" \
--container hacobe/devbox:imitation \
--login --force-pull --never-restart --gpu 0 --shared-host-dir-mount /data
@@ -220,7 +223,7 @@ def parse() -> argparse.Namespace:
parser.add_argument(
"--remote_cfg_dir",
type=str,
default="/data/imitation/benchmarking",
default="/data/imitation/src/imitation/scripts/config/tuned_hps",
help="""Path to a directory storing config files \
accessible from each container. """,
)
1 change: 0 additions & 1 deletion setup.cfg
@@ -7,7 +7,6 @@ per-file-ignores =
# F841 local variable unused [for Sacred config scopes]
src/imitation/scripts/config/*.py:F841
../src/imitation/scripts/config/*.py:F841
benchmarking/tuning_config.py:F841
src/imitation/envs/examples/airl_envs/*.py:D

[darglint]
2 changes: 1 addition & 1 deletion setup.py
@@ -182,7 +182,7 @@ def get_local_version(version: "ScmVersion", time_format="%Y%m%d") -> str:
python_requires=">=3.8.0",
packages=find_packages("src"),
package_dir={"": "src"},
package_data={"imitation": ["py.typed", "envs/examples/airl_envs/assets/*.xml"]},
package_data={"imitation": ["py.typed", "scripts/config/tuned_hps/*.json"]},
# Note: while we are strict with our test and doc requirement versions, we try to
# impose as little restrictions on the install requirements as possible. Try to
# encode only known incompatibilities here. This prevents nasty dependency issues
11 changes: 4 additions & 7 deletions src/imitation/scripts/analyze.py
@@ -268,13 +268,10 @@ def analyze_imitation(
Returns:
The DataFrame generated from the Sacred logs.
"""
if table_verbosity == 3:
# Get column names for which we have get value using make_entry_fn
# These are same across Level 2 & 3. In Level 3, we additionally add remaining
# config columns.
table_entry_fns_subset = _get_table_entry_fns_subset(2)
else:
table_entry_fns_subset = _get_table_entry_fns_subset(table_verbosity)
# Get column names for which we have get value using make_entry_fn
# These are same across Level 2 & 3. In Level 3, we additionally add remaining
# config columns.
table_entry_fns_subset = _get_table_entry_fns_subset(min(table_verbosity, 2))

output_table = pd.DataFrame()
for sd in _gather_sacred_dicts():
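The `analyze.py` change above collapses an if/else into a clamp: verbosity levels 2 and 3 share the same entry functions, and level 3 only appends the remaining config columns afterwards. A minimal illustration of the equivalence (function names are hypothetical, for demonstration only):

```python
def entry_fn_level(table_verbosity: int) -> int:
    # Levels 2 and 3 share the same entry functions; level 3 only adds the
    # remaining config columns afterwards, so the lookup level is clamped at 2.
    return min(table_verbosity, 2)

def entry_fn_level_old(table_verbosity: int) -> int:
    # The replaced branch, for comparison.
    if table_verbosity == 3:
        return 2
    return table_verbosity

# Both formulations agree on every verbosity level.
assert all(entry_fn_level(v) == entry_fn_level_old(v) for v in range(4))
```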
2 changes: 1 addition & 1 deletion src/imitation/scripts/config/parallel.py
@@ -8,7 +8,7 @@
search spaces to the config like `"seed": tune.choice([0, 1, 2, 3])`.

For tuning hyperparameters of an algorithm on a given environment,
check out the benchmarking/tuning.py script.
check out the imitation/scripts/tuning.py script.
"""

import numpy as np
157 changes: 22 additions & 135 deletions src/imitation/scripts/config/train_adversarial.py
@@ -1,7 +1,8 @@
"""Configuration for imitation.scripts.train_adversarial."""

import pathlib

import sacred
from torch import nn

from imitation.rewards import reward_nets
from imitation.scripts.ingredients import demonstrations, environment, expert
@@ -101,29 +102,6 @@ def pendulum():
# Standard MuJoCo Gym environment named configs


@train_adversarial_ex.named_config
def seals_ant():
locals().update(**MUJOCO_SHARED_LOCALS)
locals().update(**ANT_SHARED_LOCALS)
environment = dict(gym_id="seals/Ant-v0")
demonstrations = dict(rollout_type="ppo-huggingface")
rl = dict(
batch_size=2048,
rl_kwargs=dict(
batch_size=16,
clip_range=0.3,
ent_coef=3.1441389214159857e-06,
gae_lambda=0.8,
gamma=0.995,
learning_rate=0.00017959211641976886,
max_grad_norm=0.9,
n_epochs=10,
# policy_kwargs are same as the defaults
vf_coef=0.4351450387648799,
),
)


CHEETAH_SHARED_LOCALS = dict(
MUJOCO_SHARED_LOCALS,
rl=dict(batch_size=16384, rl_kwargs=dict(batch_size=1024)),
@@ -158,117 +136,6 @@ def half_cheetah():
environment = dict(gym_id="HalfCheetah-v2")


@train_adversarial_ex.named_config
def seals_half_cheetah():
environment = dict(gym_id="seals/HalfCheetah-v0")
demonstrations = dict(rollout_type="ppo-huggingface")
rl = dict(
batch_size=512,
rl_kwargs=dict(
batch_size=64,
clip_range=0.1,
ent_coef=3.794797423594763e-06,
gae_lambda=0.95,
gamma=0.95,
learning_rate=0.0003286871805949382,
max_grad_norm=0.8,
n_epochs=5,
vf_coef=0.11483689492120866,
),
)
algorithm_kwargs = dict(
# Number of discriminator updates after each round of generator updates
n_disc_updates_per_round=16,
# Equivalent to no replay buffer if batch size is the same
gen_replay_buffer_capacity=512,
demo_batch_size=8192,
)


@train_adversarial_ex.named_config
def seals_hopper():
environment = dict(gym_id="seals/Hopper-v0")
demonstrations = dict(rollout_type="ppo-huggingface")
policy = dict(
policy_cls="MlpPolicy",
policy_kwargs=dict(
activation_fn=nn.ReLU,
net_arch=[dict(pi=[64, 64], vf=[64, 64])],
),
)
rl = dict(
batch_size=2048,
rl_kwargs=dict(
batch_size=512,
clip_range=0.1,
ent_coef=0.0010159833764878474,
gae_lambda=0.98,
gamma=0.995,
learning_rate=0.0003904770450788824,
max_grad_norm=0.9,
n_epochs=20,
vf_coef=0.20315938606555833,
),
)


@train_adversarial_ex.named_config
def seals_swimmer():
environment = dict(gym_id="seals/Swimmer-v0")
total_timesteps = int(2e6)
demonstrations = dict(rollout_type="ppo-huggingface")
policy = dict(
policy_cls="MlpPolicy",
policy_kwargs=dict(
activation_fn=nn.ReLU,
net_arch=[dict(pi=[64, 64], vf=[64, 64])],
),
)
rl = dict(
batch_size=2048,
rl_kwargs=dict(
batch_size=64,
clip_range=0.1,
ent_coef=5.167107294612664e-08,
gae_lambda=0.95,
gamma=0.999,
learning_rate=0.000414936134792374,
max_grad_norm=2,
n_epochs=5,
# policy_kwargs are same as the defaults
vf_coef=0.6162112311062333,
),
)


@train_adversarial_ex.named_config
def seals_walker():
environment = dict(gym_id="seals/Walker2d-v0")
demonstrations = dict(rollout_type="ppo-huggingface")
policy = dict(
policy_cls="MlpPolicy",
policy_kwargs=dict(
activation_fn=nn.ReLU,
net_arch=[dict(pi=[64, 64], vf=[64, 64])],
),
)
rl = dict(
batch_size=8192,
rl_kwargs=dict(
batch_size=128,
clip_range=0.4,
ent_coef=0.00013057334805552262,
gae_lambda=0.92,
gamma=0.98,
learning_rate=0.000138575372312869,
max_grad_norm=0.6,
n_epochs=20,
# policy_kwargs are same as the defaults
vf_coef=0.6167177795726859,
),
)


@train_adversarial_ex.named_config
def seals_humanoid():
locals().update(**MUJOCO_SHARED_LOCALS)
@@ -296,3 +163,23 @@ def fast():
demo_batch_size=1,
n_disc_updates_per_round=4,
)


hyperparam_dir = pathlib.Path(__file__).absolute().parent / "tuned_hps"
tuned_alg_envs = [
"airl_seals_ant",
"airl_seals_half_cheetah",
"airl_seals_hopper",
"airl_seals_swimmer",
"airl_seals_walker",
"gail_seals_ant",
"gail_seals_half_cheetah",
"gail_seals_hopper",
"gail_seals_swimmer",
"gail_seals_walker",
]

for tuned_alg_env in tuned_alg_envs:
config_file = hyperparam_dir / f"{tuned_alg_env}_best_hp_eval.json"
assert config_file.is_file(), f"{config_file} does not exist"
train_adversarial_ex.add_named_config(tuned_alg_env, str(config_file))
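The registration loop added at the end of `train_adversarial.py` can be exercised in isolation. A self-contained sketch, with a temporary directory standing in for `tuned_hps/` and a plain dict standing in for sacred's `Experiment.add_named_config` (the `register_tuned_configs` helper is hypothetical):

```python
import json
import pathlib
import tempfile

def register_tuned_configs(hyperparam_dir: pathlib.Path, tuned_alg_envs, register):
    """Register each <algo>_<env>_best_hp_eval.json file as a named config."""
    for tuned_alg_env in tuned_alg_envs:
        config_file = hyperparam_dir / f"{tuned_alg_env}_best_hp_eval.json"
        # Fail loudly at import time if a tuned config file is missing.
        assert config_file.is_file(), f"{config_file} does not exist"
        register(tuned_alg_env, str(config_file))

# Usage with a stand-in registry instead of a sacred Experiment:
with tempfile.TemporaryDirectory() as tmp:
    hp_dir = pathlib.Path(tmp)
    names = ["airl_seals_walker", "gail_seals_walker"]
    for name in names:
        (hp_dir / f"{name}_best_hp_eval.json").write_text(json.dumps({}))
    registry = {}
    register_tuned_configs(hp_dir, names, lambda n, p: registry.setdefault(n, p))
    print(sorted(registry))  # the registered named-config names
```

The assert mirrors the one in the diff: because registration runs at module import, a missing JSON file surfaces immediately rather than at experiment run time.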