Let's evaluate the knowledge we gained in this chapter by answering the following questions:
- What is a multi-armed bandit (MAB) problem?
- How does the epsilon-greedy policy select an arm?
- What is the significance of the temperature parameter T in softmax exploration?
- How do we compute the upper confidence bound?
- What happens to the beta distribution when the value of alpha is greater than the value of beta?
- What are the steps involved in Thompson sampling?
- What are contextual bandits?