All of the algorithms with offline-to-online finetuning log training regret (the regret accrued by the online interactions used for training) under both `train/regret` and `eval/regret`. So we effectively report only train regret, which differs from the Cal-QL paper, where the authors report eval regret. Reporting eval regret is strange in any case, since the quantity we actually want to minimize in practice is train regret, so this bug is not critical, but it should be kept in mind. I will fix it, but without rerunning all of the algorithms due to compute limitations (we may rerun them later).
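For reference, a minimal sketch of what the fix could look like, assuming a wandb-based logging loop similar to the ones in this repo; the project name, score values, and variable names are all illustrative, not the repo's actual code:

```python
import wandb

# Hedged sketch of the intended fix: regret from the online interactions
# used for training is logged under train/regret only, instead of being
# duplicated under eval/regret. All names below are illustrative.

wandb.init(project="corl-finetune", mode="disabled")  # disabled for a dry run

optimal_score = 1.0                    # best achievable normalized score
online_scores = [0.2, 0.4, 0.7, 0.9]   # toy per-episode normalized scores
train_regret = 0.0

for step, score in enumerate(online_scores):
    train_regret += optimal_score - score  # cumulative training regret
    wandb.log({"train/regret": train_regret}, step=step)
    # eval/regret would instead come from separate evaluation rollouts,
    # as reported in the Cal-QL paper, rather than aliasing this value.
```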