Merged

69 commits
b4210c1
Merge py file changes from benchmark-algs
taufeeque9 Jan 4, 2023
97bc063
Clean parallel script
taufeeque9 Jan 10, 2023
9291225
Undo the changes from #653 to the dagger benchmark config files.
ernestum Jan 26, 2023
276d863
Improve readability and interpretability of benchmarking tests.
ernestum Jan 25, 2023
37eb914
Add exponential beta scheduler for dagger
taufeeque9 Mar 1, 2023
877383b
Ignore coverage for unknown algorithms.
ernestum Feb 2, 2023
c8e55cb
Cleanup and extend tests for beta schedules in dagger.
ernestum Feb 2, 2023
6b9b306
Merge branch 'master' into benchmark-pr
taufeeque9 Feb 6, 2023
8576465
Fix test cases
taufeeque9 Feb 8, 2023
d81eb68
Add optuna to dependencies
taufeeque9 Feb 8, 2023
27467d3
Fix test case
taufeeque9 Feb 8, 2023
b59a768
Merge branch 'master' into benchmark-pr
taufeeque9 Feb 8, 2023
1a3b6b8
Clean up the scripts
taufeeque9 Feb 9, 2023
7a438da
Remove reporter(done) since mean_return is reported by the runs
taufeeque9 Feb 9, 2023
5bc5835
Merge branch 'master' into benchmark-pr
taufeeque9 Feb 20, 2023
2e56de8
Add beta_schedule parameter to dagger script
taufeeque9 Feb 23, 2023
84e854a
Merge branch 'master' into benchmark-pr
taufeeque9 Mar 16, 2023
73d8576
Update config policy kwargs
taufeeque9 Mar 16, 2023
9fdf878
Changes from review
taufeeque9 May 16, 2023
1c1dbc4
Fix errors with some configs
taufeeque9 May 16, 2023
3467af2
Merge branch 'master' into benchmark-pr
taufeeque9 May 16, 2023
44c4e97
Updates based on review
taufeeque9 Jun 14, 2023
4d493ae
Merge branch 'master' into benchmark-pr
taufeeque9 Jun 14, 2023
ab01269
Change metric everywhere
taufeeque9 Jun 14, 2023
f64580e
Merge branch 'master' into benchmark-pr
taufeeque9 Jul 11, 2023
e896d7d
Separate tuning code from parallel.py
taufeeque9 Jul 11, 2023
64c3a8d
Fix docstring
taufeeque9 Jul 11, 2023
8fba0d3
Removing resume option as it is getting tricky to correctly implement
taufeeque9 Jul 11, 2023
12ab31c
Minor fixes
taufeeque9 Jul 11, 2023
19b0f2c
Updates from review
taufeeque9 Jul 16, 2023
046b8d9
fix lint error
taufeeque9 Jul 16, 2023
8eee082
Add documentation for using the tuning script
taufeeque9 Jul 16, 2023
5ce7658
Fix lint error
taufeeque9 Jul 17, 2023
a8be331
Updates from the review
taufeeque9 Jul 18, 2023
4ff006d
Fix file name test errors
taufeeque9 Jul 18, 2023
6933afa
Add tune_run_kwargs in parallel script
taufeeque9 Jul 19, 2023
77f9d9b
Fix test errors
taufeeque9 Jul 19, 2023
54eb8a6
Fix test
taufeeque9 Jul 19, 2023
d50238f
Fix lint
taufeeque9 Jul 19, 2023
3fe22d4
Updates from review
taufeeque9 Jul 19, 2023
c50aa20
Simplify few lines of code
taufeeque9 Jul 20, 2023
000af61
Updates from review
taufeeque9 Aug 4, 2023
8b55134
Fix test
taufeeque9 Aug 4, 2023
f3ba2b5
Revert "Fix test"
taufeeque9 Aug 4, 2023
f8251c7
Fix test
taufeeque9 Aug 4, 2023
664fc37
Convert Dict to Mapping in input argument
taufeeque9 Aug 7, 2023
8690e1d
Ignore coverage in script configurations.
ernestum Aug 30, 2023
dd9eb6a
Pin huggingface_sb3 version.
ernestum Aug 30, 2023
b3930f4
Merge branch 'master' into benchmark-pr
ernestum Sep 26, 2023
40d87ef
Update to the newest seals environment versions.
ernestum Sep 26, 2023
71f6c92
Push gymnasium dependency to 0.29 to ensure mujoco envs work.
ernestum Sep 27, 2023
747ad32
Incorporate review comments
taufeeque9 Oct 4, 2023
691e759
Fix test errors
taufeeque9 Oct 4, 2023
2038e60
Move benchmarking/ to scripts/ and add named configs for tuned hyperp…
taufeeque9 Oct 4, 2023
35c7265
Bump cache version & remove unnecessary files
taufeeque9 Oct 5, 2023
fdf4f49
Include tuned hyperparam json files in package data
taufeeque9 Oct 5, 2023
5f9a4e6
Update storage hash
taufeeque9 Oct 5, 2023
91bb785
Update search space of bc
taufeeque9 Oct 5, 2023
3d93c84
Merge branch 'master' of github.com:HumanCompatibleAI/imitation into …
ZiyueWang25 Oct 5, 2023
f59fea2
update benchmark and hyper parameter tuning readme
ZiyueWang25 Oct 5, 2023
95110dc
Update README.md
taufeeque9 Oct 5, 2023
75f3477
Incorporate reviewer's comments in benchmarking readme
taufeeque9 Oct 6, 2023
77c1115
Merge branch 'master' into benchmark-pr
taufeeque9 Oct 6, 2023
1ba2b00
Update gymnasium version and render mode in eval policy
taufeeque9 Oct 7, 2023
ba4b693
Fix error
taufeeque9 Oct 7, 2023
bb76ee1
Merge branch 'update-gymnasium-dep' into benchmark-pr
taufeeque9 Oct 7, 2023
278f225
Merge branch 'master' into benchmark-pr
taufeeque9 Oct 8, 2023
01755a2
Update commands.py hex strings
taufeeque9 Oct 9, 2023
fdcef92
Merge branch 'master' into benchmark-pr
taufeeque9 Oct 9, 2023
35 changes: 18 additions & 17 deletions benchmarking/README.md
@@ -1,41 +1,42 @@
# Benchmarking imitation

This directory contains sacred configuration files for benchmarking imitation's algorithms. For v0.3.2, these correspond to the hyperparameters used in the paper [imitation: Clean Imitation Learning Implementations](https://www.rocamonde.com/publication/gleave-imitation-2022/).
The `src/imitation/scripts/config/tuned_hps` directory provides the tuned hyperparameter configs for benchmarking imitation. For v0.4.0, these correspond to the hyperparameters used in the paper [imitation: Clean Imitation Learning Implementations](https://www.rocamonde.com/publication/gleave-imitation-2022/).

Configuration files can be loaded either from the CLI or from the Python API. The examples below assume that your current working directory is the root of the `imitation` repository. This is not necessarily the case and you should adjust your paths accordingly.
Configuration files can be loaded either from the CLI or from the Python API.

## CLI

```bash
python -m imitation.scripts.<train_script> <algo> with benchmarking/<config_name>.json
python -m imitation.scripts.<train_script> <algo> with <algo>_<env>
```
`train_script` can be either 1) `train_imitation` with `algo` as `bc` or `dagger` or 2) `train_adversarial` with `algo` as `gail` or `airl`.
`train_script` can be either 1) `train_imitation` with `algo` as `bc` or `dagger`, or 2) `train_adversarial` with `algo` as `gail` or `airl`. The `env` can be one of `seals_ant`, `seals_half_cheetah`, `seals_hopper`, `seals_swimmer`, or `seals_walker`. The hyperparameters for other environments have not been tuned yet; you can either reuse the tuned hyperparameters from one of the environments above or tune the hyperparameters yourself using the `tuning` script.

## Python

```python
...
from imitation.scripts.<train_script> import <train_ex>
<train_ex>.run(command_name="<algo>", named_configs=["benchmarking/<config_name>.json"])
<train_ex>.run(command_name="<algo>", named_configs=["<algo>_<env>"])
```

# Tuning Hyperparameters

The hyperparameters of any algorithm in imitation can be tuned using the `tuning.py` script.
The hyperparameters of any algorithm in imitation can be tuned using the `scripts/tuning.py` script.
The benchmarking hyperparameter configs were generated by tuning the hyperparameters using
the search space defined in the `tuning_config.py` script. The tuning script proceeds in two
phases: 1) The hyperparameters are tuned using the search space provided, and 2) the best
hyperparameter config found in the first phase based on the maximum mean return is
re-evaluated on a separate set of seeds, and the mean and standard deviation of these trials
are reported.
the search space defined in `scripts/config/tuning.py`.

To tune the hyperparameters of an algorithm using the default search space provided:
The tuning script proceeds in two phases:
1. Tune the hyperparameters using the search space provided.
2. Re-evaluate the best config from the first phase (selected by maximum mean return) on a separate set of seeds, and report the mean and standard deviation of these trials.
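The two-phase procedure above can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual implementation: the real script drives trials through the `parallel.py` Ray Tune machinery, whereas here `tune` does plain random sampling and `evaluate` is a hypothetical stand-in for "train the algorithm, roll out the policy, return its mean return":

```python
import random
import statistics

def tune(search_space, evaluate, n_trials=10, eval_seeds=(0, 1, 2)):
    """Two-phase tuning sketch: random search, then re-evaluation of the winner."""
    # Phase 1: sample configs from the search space; track the best mean return.
    best_cfg, best_return = None, float("-inf")
    for trial in range(n_trials):
        rng = random.Random(trial)
        cfg = {name: rng.choice(values) for name, values in search_space.items()}
        mean_return = evaluate(cfg, seed=trial)
        if mean_return > best_return:
            best_cfg, best_return = cfg, mean_return
    # Phase 2: re-evaluate the winning config on a separate, held-out set of
    # seeds and report the mean and standard deviation across those runs.
    returns = [evaluate(best_cfg, seed=s) for s in eval_seeds]
    return best_cfg, statistics.mean(returns), statistics.pstdev(returns)

# Usage with a toy evaluation function (a real run would train the algorithm
# and roll out the learned policy instead):
space = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [64, 256]}
cfg, mean_ret, std_ret = tune(space, lambda cfg, seed: cfg["lr"] * 1e4 + seed * 0.01)
print(cfg, mean_ret, std_ret)
```

Separating the evaluation seeds from the search seeds (phase 2) is what guards against reporting a config that merely got lucky during the search.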

To use it with the default search space:
```bash
python tuning.py with {algo} 'parallel_run_config.base_named_configs=["{env}"]'
python -m imitation.scripts.tuning with <algo> 'parallel_run_config.base_named_configs=["<env>"]'
```

In this command, `{algo}` provides the default search space and settings to be used for
the specific algorithm, which is defined in the `tuning_config.py` script and
`'parallel_run_config.base_named_configs=["{env}"]'` sets the environment to tune the algorithm in.
See the documentation of `tuning.py` and `parallel.py` scripts for many other arguments that can be
In this command:
- `<algo>` provides the default search space and settings for the specific algorithm, as defined in `scripts/config/tuning.py`.
- `<env>` sets the environment to tune the algorithm in. The environments are defined in the algorithm-specific `scripts/config/train_[adversarial/imitation/preference_comparisons/rl].py` files. For the already-tuned environments, use the `<algo>_<env>` named configs here.

See the documentation of `scripts/tuning.py` and `scripts/parallel.py` for many other arguments that can be
provided through the command line to change the tuning behavior.
17 changes: 10 additions & 7 deletions experiments/commands.py
@@ -12,23 +12,24 @@

For example, we can run:

TUNED_HPS_DIR=../src/imitation/scripts/config/tuned_hps
python commands.py \
--name=run0 \
--cfg_pattern=../benchmarking/*ai*_seals_walker_*.json \
--cfg_pattern=$TUNED_HPS_DIR/*ai*_seals_walker_*.json \
--output_dir=output

And get the following commands printed out:

python -m imitation.scripts.train_adversarial airl \
--capture=sys --name=run0 \
--file_storage=output/sacred/$USER-cmd-run0-airl-0-a3531726 \
with ../benchmarking/airl_seals_walker_best_hp_eval.json \
with ../src/imitation/scripts/config/tuned_hps/airl_seals_walker_best_hp_eval.json \
seed=0 logging.log_root=output

python -m imitation.scripts.train_adversarial gail \
--capture=sys --name=run0 \
--file_storage=output/sacred/$USER-cmd-run0-gail-0-a1ec171b \
with ../benchmarking/gail_seals_walker_best_hp_eval.json \
with $TUNED_HPS_DIR/gail_seals_walker_best_hp_eval.json \
seed=0 logging.log_root=output

We can execute commands in parallel by piping them to GNU parallel:
@@ -40,9 +41,10 @@

For example, we can run:

TUNED_HPS_DIR=../src/imitation/scripts/config/tuned_hps
python commands.py \
--name=run0 \
--cfg_pattern=../benchmarking/bc_seals_half_cheetah_best_hp_eval.json \
--cfg_pattern=$TUNED_HPS_DIR/bc_seals_half_cheetah_best_hp_eval.json \
--output_dir=/data/output \
--remote

@@ -51,8 +53,9 @@
ctl job run --name $USER-cmd-run0-bc-0-72cb1df3 \
--command "python -m imitation.scripts.train_imitation bc \
--capture=sys --name=run0 \
--file_storage=/data/output/sacred/$USER-cmd-run0-bc-0-72cb1df3 \
with /data/imitation/benchmarking/bc_seals_half_cheetah_best_hp_eval.json \
--file_storage=/data/output/sacred/$USER-cmd-run0-bc-0-72cb1df3 with \
/data/imitation/src/imitation/scripts/config/tuned_hps/\
bc_seals_half_cheetah_best_hp_eval.json \
seed=0 logging.log_root=/data/output" \
--container hacobe/devbox:imitation \
--login --force-pull --never-restart --gpu 0 --shared-host-dir-mount /data
@@ -220,7 +223,7 @@ def parse() -> argparse.Namespace:
parser.add_argument(
"--remote_cfg_dir",
type=str,
default="/data/imitation/benchmarking",
default="/data/imitation/src/imitation/scripts/config/tuned_hps",
help="""Path to a directory storing config files \
accessible from each container. """,
)
1 change: 0 additions & 1 deletion setup.cfg
@@ -7,7 +7,6 @@ per-file-ignores =
# F841 local variable unused [for Sacred config scopes]
src/imitation/scripts/config/*.py:F841
../src/imitation/scripts/config/*.py:F841
benchmarking/tuning_config.py:F841
src/imitation/envs/examples/airl_envs/*.py:D

[darglint]
2 changes: 1 addition & 1 deletion setup.py
@@ -182,7 +182,7 @@ def get_local_version(version: "ScmVersion", time_format="%Y%m%d") -> str:
python_requires=">=3.8.0",
packages=find_packages("src"),
package_dir={"": "src"},
package_data={"imitation": ["py.typed", "envs/examples/airl_envs/assets/*.xml"]},
package_data={"imitation": ["py.typed", "scripts/config/tuned_hps/*.json"]},
# Note: while we are strict with our test and doc requirement versions, we try to
# impose as little restrictions on the install requirements as possible. Try to
# encode only known incompatibilities here. This prevents nasty dependency issues
11 changes: 4 additions & 7 deletions src/imitation/scripts/analyze.py
@@ -268,13 +268,10 @@ def analyze_imitation(
Returns:
The DataFrame generated from the Sacred logs.
"""
if table_verbosity == 3:
# Get column names for which we have get value using make_entry_fn
# These are same across Level 2 & 3. In Level 3, we additionally add remaining
# config columns.
table_entry_fns_subset = _get_table_entry_fns_subset(2)
else:
table_entry_fns_subset = _get_table_entry_fns_subset(table_verbosity)
# Get column names for which we have get value using make_entry_fn
# These are same across Level 2 & 3. In Level 3, we additionally add remaining
# config columns.
table_entry_fns_subset = _get_table_entry_fns_subset(min(table_verbosity, 2))

output_table = pd.DataFrame()
for sd in _gather_sacred_dicts():
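The `analyze.py` change above collapses an if/else into a clamp: verbosity levels 2 and 3 share the same entry functions, and level 3 only appends the remaining config columns afterwards. A minimal illustration of the equivalence (function names are hypothetical, for demonstration only):

```python
def entry_fn_level(table_verbosity: int) -> int:
    # Levels 2 and 3 share the same entry functions; level 3 only adds the
    # remaining config columns afterwards, so the lookup level is clamped at 2.
    return min(table_verbosity, 2)

def entry_fn_level_old(table_verbosity: int) -> int:
    # The replaced branch, for comparison.
    if table_verbosity == 3:
        return 2
    return table_verbosity

# Both formulations agree on every verbosity level.
assert all(entry_fn_level(v) == entry_fn_level_old(v) for v in range(4))
```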
2 changes: 1 addition & 1 deletion src/imitation/scripts/config/parallel.py
@@ -8,7 +8,7 @@
search spaces to the config like `"seed": tune.choice([0, 1, 2, 3])`.

For tuning hyperparameters of an algorithm on a given environment,
check out the benchmarking/tuning.py script.
check out the imitation/scripts/tuning.py script.
"""

import numpy as np
157 changes: 22 additions & 135 deletions src/imitation/scripts/config/train_adversarial.py
@@ -1,7 +1,8 @@
"""Configuration for imitation.scripts.train_adversarial."""

import pathlib

import sacred
from torch import nn

from imitation.rewards import reward_nets
from imitation.scripts.ingredients import demonstrations, environment, expert
@@ -101,29 +102,6 @@ def pendulum():
# Standard MuJoCo Gym environment named configs


@train_adversarial_ex.named_config
def seals_ant():
locals().update(**MUJOCO_SHARED_LOCALS)
locals().update(**ANT_SHARED_LOCALS)
environment = dict(gym_id="seals/Ant-v0")
demonstrations = dict(rollout_type="ppo-huggingface")
rl = dict(
batch_size=2048,
rl_kwargs=dict(
batch_size=16,
clip_range=0.3,
ent_coef=3.1441389214159857e-06,
gae_lambda=0.8,
gamma=0.995,
learning_rate=0.00017959211641976886,
max_grad_norm=0.9,
n_epochs=10,
# policy_kwargs are same as the defaults
vf_coef=0.4351450387648799,
),
)


CHEETAH_SHARED_LOCALS = dict(
MUJOCO_SHARED_LOCALS,
rl=dict(batch_size=16384, rl_kwargs=dict(batch_size=1024)),
@@ -158,117 +136,6 @@ def half_cheetah():
environment = dict(gym_id="HalfCheetah-v2")


@train_adversarial_ex.named_config
def seals_half_cheetah():
environment = dict(gym_id="seals/HalfCheetah-v0")
demonstrations = dict(rollout_type="ppo-huggingface")
rl = dict(
batch_size=512,
rl_kwargs=dict(
batch_size=64,
clip_range=0.1,
ent_coef=3.794797423594763e-06,
gae_lambda=0.95,
gamma=0.95,
learning_rate=0.0003286871805949382,
max_grad_norm=0.8,
n_epochs=5,
vf_coef=0.11483689492120866,
),
)
algorithm_kwargs = dict(
# Number of discriminator updates after each round of generator updates
n_disc_updates_per_round=16,
# Equivalent to no replay buffer if batch size is the same
gen_replay_buffer_capacity=512,
demo_batch_size=8192,
)


@train_adversarial_ex.named_config
def seals_hopper():
environment = dict(gym_id="seals/Hopper-v0")
demonstrations = dict(rollout_type="ppo-huggingface")
policy = dict(
policy_cls="MlpPolicy",
policy_kwargs=dict(
activation_fn=nn.ReLU,
net_arch=[dict(pi=[64, 64], vf=[64, 64])],
),
)
rl = dict(
batch_size=2048,
rl_kwargs=dict(
batch_size=512,
clip_range=0.1,
ent_coef=0.0010159833764878474,
gae_lambda=0.98,
gamma=0.995,
learning_rate=0.0003904770450788824,
max_grad_norm=0.9,
n_epochs=20,
vf_coef=0.20315938606555833,
),
)


@train_adversarial_ex.named_config
def seals_swimmer():
environment = dict(gym_id="seals/Swimmer-v0")
total_timesteps = int(2e6)
demonstrations = dict(rollout_type="ppo-huggingface")
policy = dict(
policy_cls="MlpPolicy",
policy_kwargs=dict(
activation_fn=nn.ReLU,
net_arch=[dict(pi=[64, 64], vf=[64, 64])],
),
)
rl = dict(
batch_size=2048,
rl_kwargs=dict(
batch_size=64,
clip_range=0.1,
ent_coef=5.167107294612664e-08,
gae_lambda=0.95,
gamma=0.999,
learning_rate=0.000414936134792374,
max_grad_norm=2,
n_epochs=5,
# policy_kwargs are same as the defaults
vf_coef=0.6162112311062333,
),
)


@train_adversarial_ex.named_config
def seals_walker():
environment = dict(gym_id="seals/Walker2d-v0")
demonstrations = dict(rollout_type="ppo-huggingface")
policy = dict(
policy_cls="MlpPolicy",
policy_kwargs=dict(
activation_fn=nn.ReLU,
net_arch=[dict(pi=[64, 64], vf=[64, 64])],
),
)
rl = dict(
batch_size=8192,
rl_kwargs=dict(
batch_size=128,
clip_range=0.4,
ent_coef=0.00013057334805552262,
gae_lambda=0.92,
gamma=0.98,
learning_rate=0.000138575372312869,
max_grad_norm=0.6,
n_epochs=20,
# policy_kwargs are same as the defaults
vf_coef=0.6167177795726859,
),
)


@train_adversarial_ex.named_config
def seals_humanoid():
locals().update(**MUJOCO_SHARED_LOCALS)
@@ -296,3 +163,23 @@ def fast():
demo_batch_size=1,
n_disc_updates_per_round=4,
)


hyperparam_dir = pathlib.Path(__file__).absolute().parent / "tuned_hps"
tuned_alg_envs = [
"airl_seals_ant",
"airl_seals_half_cheetah",
"airl_seals_hopper",
"airl_seals_swimmer",
"airl_seals_walker",
"gail_seals_ant",
"gail_seals_half_cheetah",
"gail_seals_hopper",
"gail_seals_swimmer",
"gail_seals_walker",
]

for tuned_alg_env in tuned_alg_envs:
config_file = hyperparam_dir / f"{tuned_alg_env}_best_hp_eval.json"
assert config_file.is_file(), f"{config_file} does not exist"
train_adversarial_ex.add_named_config(tuned_alg_env, str(config_file))
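The registration loop added at the end of `train_adversarial.py` can be exercised in isolation. A self-contained sketch, with a temporary directory standing in for `tuned_hps/` and a plain dict standing in for sacred's `Experiment.add_named_config` (the `register_tuned_configs` helper is hypothetical):

```python
import json
import pathlib
import tempfile

def register_tuned_configs(hyperparam_dir: pathlib.Path, tuned_alg_envs, register):
    """Register each <algo>_<env>_best_hp_eval.json file as a named config."""
    for tuned_alg_env in tuned_alg_envs:
        config_file = hyperparam_dir / f"{tuned_alg_env}_best_hp_eval.json"
        # Fail loudly at import time if a tuned config file is missing.
        assert config_file.is_file(), f"{config_file} does not exist"
        register(tuned_alg_env, str(config_file))

# Usage with a stand-in registry instead of a sacred Experiment:
with tempfile.TemporaryDirectory() as tmp:
    hp_dir = pathlib.Path(tmp)
    names = ["airl_seals_walker", "gail_seals_walker"]
    for name in names:
        (hp_dir / f"{name}_best_hp_eval.json").write_text(json.dumps({}))
    registry = {}
    register_tuned_configs(hp_dir, names, lambda n, p: registry.setdefault(n, p))
    print(sorted(registry))  # the registered named-config names
```

The assert mirrors the one in the diff: because registration runs at module import, a missing JSON file surfaces immediately rather than at experiment run time.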