To compare the two algorithms, we'll run each one repeatedly to build a distribution of its total payoffs, then do a quick test to see whether RPMBandit is, in fact, better than SimpleBandit.
The following is a simulation harness that I've built to compare the two:
def run_bandit_sim(bandit_algorithm):
    simulated_experiment = BanditScenario({
        'A': {'conversion_rate': 1, 'order_average': 35.00},
        'B': {'conversion_rate': 1, 'order_average': 50.00},
    })
    bandit = bandit_algorithm
    for visitor_i in range(500):
        treatment = bandit.choose_treatment()
        payout = simulated_experiment.next_visitor(treatment)
        bandit.log_payout(treatment, payout)
    return sum(simulated_experiment._bandit_payoffs)

simple_bandit_results = np.array(
    [run_bandit_sim(SimpleBandit(['A', 'B'])) for i in range(300)])
rpm_bandit_results = np.array(
    [run_bandit_sim(RPMBandit(['A', 'B'])) for i in range(300)])
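The "quick test" on the two payoff distributions can be sketched as a permutation test with plain NumPy. Note this is one reasonable choice, not necessarily the test used in the original text, and the two result arrays below are synthetic stand-ins so the sketch is self-contained; in practice you would pass in simple_bandit_results and rpm_bandit_results from the harness above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the harness output (assumed shapes/scales,
# not real simulation results): 300 runs of total payoff per algorithm.
simple_bandit_results = rng.normal(21000, 800, size=300)
rpm_bandit_results = rng.normal(22500, 800, size=300)

# Observed advantage of RPMBandit over SimpleBandit.
observed = rpm_bandit_results.mean() - simple_bandit_results.mean()

# Permutation test: pool all payoffs, shuffle, and count how often a
# random split produces a mean gap at least as large as the observed one.
pooled = np.concatenate([simple_bandit_results, rpm_bandit_results])
n = len(simple_bandit_results)
n_perms = 2000
count = 0
for _ in range(n_perms):
    rng.shuffle(pooled)
    gap = pooled[n:].mean() - pooled[:n].mean()
    if gap >= observed:
        count += 1
p_value = (count + 1) / (n_perms + 1)

print(f"mean payoff gap: {observed:.2f}, p = {p_value:.4f}")
```

A small p-value here says the gap between the two distributions is unlikely to be due to simulation noise alone, which is exactly the question the comparison is asking.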