Name		Name	Last commit message	Last commit date
parent directory ..
03a-ranking-bandit-local-train.ipynb		03a-ranking-bandit-local-train.ipynb
03x-baseline-train-ranking-bandit.ipynb		03x-baseline-train-ranking-bandit.ipynb
03x_restore_ranking_agents.ipynb		03x_restore_ranking_agents.ipynb
README.md		README.md
vocab_dict.pkl		vocab_dict.pkl

README.md

Bandit Rankers

TODO

Objectives

baseline-ranking-agents.ipynb - build baseline ranking bandits: score vector and cascading feedback
TODO - build ranking image
TODO - submit ranking train job to Vertex AI
TODO - serving ranking bandit with Vertex AI

Overview

The contextual bandits approach is classified as an extension of multi-armed bandits
a contextual multi-armed bandit problem is a simplified reinforcement learning algorithm where the agent takes an action from a set of possible actions

The Bandit Ranking agent will be similar to the `NeuralEpsilonGreedy` agent.

Main differences:

The item features are stored in the per_arm part of the observation, in the order of how they are recommended
Since this ordered list of items expresses what action was taken by the policy, the action value of the trajectory is not used by the agent.

Note: difference between the "per-arm" observation recieved by the policy vs the agent:

While the agent receives the items in the recommendation slots, the policy receives the items that are available for recommendation.

The user is responsible for converting the observation to the syntax required by the agent.

Training data structure

The training observation contains the global features and the features of the items in the recommendation slots

The item features are stored in the per_arm part of the observation, in the order of how they are recommended
Note: since this ordered list of items expresses what action was taken by the policy, the action value of the trajectory is not used by the agent

TODO

Understanding model variables and gradients during training with TensorBoard

Historgrams

TODO

Distributions

TODO

References

Cascading Linear Submodular Bandits: Accounting for Position Bias and Diversity in Online Learning to Rank, G. Hiranandani, H. Singh, P. Gupta, I. A. Burhanuddin, Z. Wen and B. Kveton, 35th Conference on Uncertainty in Artificial Intelligence (2019)

account for both position bias and diversity in forming the list of items to recommend

Contextual Combinatorial Cascading Bandits, , S. Li, B. Wang, S. Zhang, W. Chen, Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1245-1253, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

03-ranking

03-ranking

README.md

Bandit Rankers

Objectives

Overview

The Bandit Ranking agent will be similar to the `NeuralEpsilonGreedy` agent.

Training data structure

Understanding model variables and gradients during training with TensorBoard

Historgrams

Distributions

References

Files

03-ranking

Directory actions

More options

Directory actions

More options

Latest commit

History

03-ranking

Folders and files

parent directory

README.md

Bandit Rankers

Objectives

Overview

The Bandit Ranking agent will be similar to the NeuralEpsilonGreedy agent.

Training data structure

Understanding model variables and gradients during training with TensorBoard

Historgrams

Distributions

References

The Bandit Ranking agent will be similar to the `NeuralEpsilonGreedy` agent.