To set up a Python environment (use the dev tools of your choice; in our workflow we use conda and Python 3.8), install all the requirements:

```bash
pip install -r requirements.txt
```
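For example, with conda (as in our workflow) the environment can be set up like this; the environment name is arbitrary:

```bash
# Create and activate an isolated environment with Python 3.8, then install deps
conda create -n rebrac python=3.8 -y
conda activate rebrac
pip install -r requirements.txt
```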
However, in this setup you must install the mujoco210 binaries by hand. This is not always straightforward, but the following recipe can help:

```bash
mkdir -p /root/.mujoco \
    && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
    && tar -xf mujoco.tar.gz -C /root/.mujoco \
    && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}
```
You may also need to install additional dependencies for mujoco_py. We recommend following the official guide from mujoco_py.
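On a fresh Ubuntu image, mujoco_py typically also needs a few system packages; the list below is a common set taken from the mujoco_py documentation, and the exact names may differ on other distributions:

```bash
# System libraries commonly required to build mujoco_py (Ubuntu/Debian)
apt-get update && apt-get install -y \
    libgl1-mesa-dev libgl1-mesa-glx libglew-dev \
    libosmesa6-dev libglfw3 patchelf
```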
We also provide a more straightforward option: a Dockerfile that is already set up to work. All you have to do is build and run it :)

```bash
docker build -t rebrac .
```
To run, mount the current directory:

```bash
docker run -it \
    --gpus=all \
    --rm \
    --volume "<PATH_TO_THE_REPO>:/workspace/" \
    --name rebrac \
    rebrac bash
```
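For example, when launching from the repository root, `$(pwd)` can be substituted for the placeholder:

```bash
# Mount the current checkout into the container and drop into a shell
docker run -it --gpus=all --rm --volume "$(pwd):/workspace/" --name rebrac rebrac bash
```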
To reproduce the V-D4RL results, you need to download the corresponding datasets. The easiest way is probably to run the `download_vd4rl.sh` script we provide. You can also do it manually with the following links to the dataset archives. Note that the provided links contain only the datasets reported in the paper, without the distracting and multitask variants.
After downloading the datasets, you must put the data into the `vd4rl` directory.
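A minimal sketch of this step, assuming you run everything from the repository root (how the script stores the data may differ, so adjust the final move accordingly):

```bash
# Fetch the V-D4RL datasets used in the paper via the provided script
bash download_vd4rl.sh

# If you downloaded the archives manually, place the extracted data under vd4rl/
mkdir -p vd4rl
# mv <downloaded_dataset_files> vd4rl/   # keep the original file names
```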
Configs for the main experiments are stored in `configs/rebrac/<task_type>` and `configs/rebrac-vis/<task_type>`.
All available hyperparameters are listed in `src/algorithms/rebrac.py` for D4RL and `src/algorithms/rebrac_torch_vis.py` for V-D4RL.
For example, to start the ReBRAC training process with the D4RL `halfcheetah-medium-v2` dataset, run the following:

```bash
PYTHONPATH=. python3 src/algorithms/rebrac.py --config_path="configs/rebrac/halfcheetah/halfcheetah_medium.yaml"
```
For the V-D4RL `walker_walk-expert-v2` dataset, run the following:

```bash
PYTHONPATH=. python3 src/algorithms/rebrac_torch_vis.py --config_path="configs/rebrac-vis/walker_walk/expert.yaml"
```
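If the scripts parse their config with pyrallis (which the `--config_path` flag suggests), individual fields can usually be overridden from the command line on top of the YAML file; the field name below is illustrative, check the config dataclass in `rebrac.py` for the exact names:

```bash
# Hypothetical single-field override on top of a YAML config (pyrallis-style CLI)
PYTHONPATH=. python3 src/algorithms/rebrac.py \
    --config_path="configs/rebrac/halfcheetah/halfcheetah_medium.yaml" \
    --train_seed=42  # illustrative field name
```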
For better transparency and replication, we release all the experiments (5k+) in the form of Weights & Biases reports.
If you want to replicate results from our work, you can use the configs for Weights & Biases Sweeps provided in the `configs/sweeps` directory. Note that we do not provide the code for IQL and SAC-RND; in our work we relied upon these implementations: IQL (CORL), SAC-RND (original implementation).
| Paper element | Sweeps to run from `configs/sweeps/` |
|---|---|
| Tables 2, 3, 4 | `eval/rebrac_d4rl_sweep.yaml`, `eval/td3_bc_d4rl_sweep.yaml` |
| Table 5 | `eval/rebrac_visual_sweep.yaml` |
| Table 6 | All sweeps from `ablations` |
| Figure 2 | All sweeps from `network_sizes` |
| Hyperparameters tuning | All sweeps from `tuning` |
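These are standard Weights & Biases sweep configs, so, assuming you are logged in to W&B, a sweep can typically be created and run like this (entity, project, and sweep ID are placeholders):

```bash
# Register the sweep with W&B; this prints a sweep ID
wandb sweep configs/sweeps/eval/rebrac_d4rl_sweep.yaml
# Start an agent that pulls runs from that sweep
wandb agent <entity>/<project>/<sweep_id>
```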
We also provide a notebook for reconstructing the graphs in our paper, `eop/ReBRAC_ploting.ipynb`, including performance profiles, probability of improvement, and expected online performance. For your convenience, we repacked the results into `.pickle` files, so you can reuse them for further research and head-to-head comparisons.
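As a rough sketch of how the repacked results can be loaded (the file name below is hypothetical; see the notebook for the actual paths and data layout):

```bash
# Peek into one of the repacked result files (illustrative path and file name)
python3 -c "
import pickle
with open('eop/rebrac_results.pickle', 'rb') as f:  # hypothetical file name
    results = pickle.load(f)
print(type(results))
"
```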
If you use this code for your research, please consider citing the following bibtex:

```bibtex
@article{tarasov2023revisiting,
  title={Revisiting the Minimalist Approach to Offline Reinforcement Learning},
  author={Denis Tarasov and Vladislav Kurenkov and Alexander Nikulin and Sergey Kolesnikov},
  journal={arXiv preprint arXiv:2305.09836},
  year={2023}
}
```