idea 1: generate more suggestions and only send the top n_suggestions ranked by value.
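A minimal sketch of idea 1, assuming a hypothetical controller object with a `suggest_one()` sampler and a `value()` critic estimate (both names are illustrative, not an existing API):

```python
# Sketch of idea 1: oversample candidates, return only the top
# n_suggestions ranked by the controller's value estimate.
def suggest(controller, n_suggestions, oversample=4):
    # Generate more candidates than the benchmark asks for.
    candidates = [controller.suggest_one()
                  for _ in range(oversample * n_suggestions)]
    # Rank by value estimate and keep only the top n_suggestions.
    ranked = sorted(candidates, key=controller.value, reverse=True)
    return ranked[:n_suggestions]
```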
idea 2: generate n_suggestions and set the reward for the lowest m suggestions to -1 (this should happen in the observe step)
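A sketch of idea 2, assuming a hypothetical `controller.update(suggestions, rewards)` batch policy update:

```python
# Sketch of idea 2: in the observe step, overwrite the rewards of the
# m lowest-reward suggestions with -1 before updating the controller.
def observe(controller, suggestions, rewards, m):
    # Indices of the m worst suggestions in this batch.
    worst = sorted(range(len(rewards)), key=rewards.__getitem__)[:m]
    shaped = list(rewards)
    for i in worst:
        shaped[i] = -1.0  # hard penalty for the worst m suggestions
    controller.update(suggestions, shaped)
```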
idea 3: combine ideas 1 and 2 - generate an x factor more suggestions than n_suggestions and send the top n_suggestions ranked by value. Use all of the suggestions to update the controller, setting the rewards for the suggestions that didn't make it to -1 (this should happen in the suggest step)
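A sketch of idea 3, reusing the hypothetical `suggest_one`/`value`/`update` names from above; note the -1 rewards for the dropped candidates are assigned in the suggest step, as the note says, while the kept suggestions get their real reward later in observe:

```python
# Sketch of idea 3 (ideas 1 + 2 combined): oversample by a factor x,
# forward only the top n_suggestions, and immediately penalize the rest.
def suggest(controller, n_suggestions, x=4):
    candidates = [controller.suggest_one()
                  for _ in range(x * n_suggestions)]
    ranked = sorted(candidates, key=controller.value, reverse=True)
    kept, dropped = ranked[:n_suggestions], ranked[n_suggestions:]
    # Update the controller on the dropped candidates with reward -1.
    controller.update(dropped, [-1.0] * len(dropped))
    return kept  # these get their real reward in the observe step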
idea 4: index actions by algorithm/hyperparameter name. This enables the addition of arbitrary new hyperparameters while preserving the weights of the old hyperparameters (see [metalearn] support arbitrary expansion of action space in metalearn_controller #24).
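One way to realize idea 4 in PyTorch is a per-hyperparameter action head registered under a stable string key; the metalearn_controller internals may differ, and `make_head` is a hypothetical factory:

```python
import torch.nn as nn

# Sketch of idea 4: one small action head per hyperparameter, keyed by
# "algorithm/hyperparameter" name.
class NamedActionSpace(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleDict registers parameters under stable string keys.
        self.heads = nn.ModuleDict()

    def add(self, name, make_head):
        # e.g. name = "rf/max_depth"; new hyperparameters get fresh
        # heads while existing heads keep their trained weights.
        if name not in self.heads:
            self.heads[name] = make_head()

    def forward(self, name, state):
        return self.heads[name](state)
```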
idea 5: reward function engineering: keep track of the running min and max reward over the entire run, normalizing the reward for each batch to be between -1 (min) and 1 (max)
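A small sketch of idea 5 in plain Python:

```python
# Sketch of idea 5: map each batch of rewards into [-1, 1] using the
# running min/max observed over the entire run.
class RewardNormalizer:
    def __init__(self):
        self.run_min = float("inf")
        self.run_max = float("-inf")

    def __call__(self, rewards):
        self.run_min = min(self.run_min, min(rewards))
        self.run_max = max(self.run_max, max(rewards))
        span = self.run_max - self.run_min
        if span == 0.0:
            return [0.0] * len(rewards)  # all rewards identical so far
        # Linear map: run_min -> -1, run_max -> +1.
        return [2.0 * (r - self.run_min) / span - 1.0 for r in rewards]
```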
idea 6: sample hyperparameters within the bounds specified by api_config, e.g. with a torch Normal distribution: https://pytorch.org/docs/stable/distributions.html#normal
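A sketch of idea 6, where `mu`/`sigma` would come from the controller's output head; the `"range"` key follows the bayesmark-style api_config and is an assumption here, and clamping (rather than truncating the distribution) is one simple way to respect the bounds:

```python
import torch
from torch.distributions import Normal

# Sketch of idea 6: sample a continuous hyperparameter from a Normal
# distribution and clamp it into the bounds declared in api_config.
def sample_hyperparameter(mu, sigma, api_config, name):
    lo, hi = api_config[name]["range"]
    value = Normal(mu, sigma).rsample()  # reparameterized, keeps gradients
    return torch.clamp(value, lo, hi)
```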
Noting these down for the NeurIPS BBO challenge.