Mastering Reinforcement Learning with Python
In our online advertising examples so far, we have assumed a fixed set of ads (actions/arms) to choose from. However, in many applications of contextual bandits, the set of available actions changes over time. Take the example of a modern advertising network that uses an ad server to match ads to websites/apps. This is a very dynamic operation that involves, leaving the pricing aside, three major components:
Previously, we considered only the user profile as the context. An ad server must additionally take the website/app content into account, but this does not really change the structure of the problem we solved before. However, we can no longer use a separate model for each ad, since the ad inventory is dynamic. We handle this by using a single model to which we feed ad features. This is illustrated in Figure 3.5.
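To make the single-model idea concrete, here is a minimal sketch in which one click model scores any (context, ad) pair from their concatenated features, so ads can enter or leave the inventory without training a new model. The `SharedAdModel` class, its logistic form, the epsilon-greedy selection, and the toy feature vectors are all assumptions made for illustration; they are not the model or exploration strategy used in this chapter.

```python
import numpy as np


class SharedAdModel:
    """Hypothetical single click model over concatenated (context, ad) features."""

    def __init__(self, n_context, n_ad, lr=0.1, seed=0):
        # One weight vector shared by every ad; the last entry is a bias term.
        self.w = np.zeros(n_context + n_ad + 1)
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def _features(self, context, ad):
        # The ad is described by its features, not by a dedicated model.
        return np.concatenate([context, ad, [1.0]])

    def predict(self, context, ad):
        # Estimated click probability for this (context, ad) pair.
        z = self._features(context, ad) @ self.w
        return 1.0 / (1.0 + np.exp(-z))

    def select(self, context, available_ads, epsilon=0.1):
        # available_ads maps ad id -> feature vector and may change between calls.
        ad_ids = list(available_ads)
        if self.rng.random() < epsilon:
            return ad_ids[self.rng.integers(len(ad_ids))]  # explore
        scores = [self.predict(context, available_ads[a]) for a in ad_ids]
        return ad_ids[int(np.argmax(scores))]  # exploit

    def update(self, context, ad, clicked):
        # One stochastic gradient step on the logistic log-likelihood.
        x = self._features(context, ad)
        self.w += self.lr * (clicked - self.predict(context, ad)) * x


# Toy usage with made-up features: ad_a gets clicks, ad_b does not.
model = SharedAdModel(n_context=2, n_ad=2)
context = np.array([1.0, 0.0])
inventory = {"ad_a": np.array([1.0, 0.0]), "ad_b": np.array([0.0, 1.0])}
for _ in range(200):
    model.update(context, inventory["ad_a"], 1)
    model.update(context, inventory["ad_b"], 0)

# The inventory changes: ad_a leaves and a new ad with similar features arrives.
del inventory["ad_a"]
inventory["ad_c"] = np.array([1.0, 0.0])
choice = model.select(context, inventory, epsilon=0.0)
```

Because the model generalizes through ad features, the newly added `ad_c` inherits a sensible click estimate from what was learned about `ad_a`, which a separate model per ad could not provide.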
Figure 3.5 –...