15.2 Multi-Armed Bandit

Inspired by a bank of poker machines, where the gambler must choose which arm to pull in order to optimise outcomes.

Setting up a bandit problem requires identifying the objectives, rewards, and arms.

Applications include biological design and recommendations.

Objectives: best arm identification (a separate exploration stage with a fixed budget N is used to identify the best m items) versus regret minimisation (no separate exploration stage; items are recommended sequentially so as to minimise cumulative regret).
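
The contrast between the two objectives can be sketched in a small simulation. This is an illustrative sketch only, not a method taken from these notes: it assumes Bernoulli rewards with known true means (used only to simulate pulls and to measure regret), uses plain uniform (round-robin) allocation for the fixed-budget stage, and uses epsilon-greedy as one simple sequential strategy. The function names (pull, best_arms_fixed_budget, epsilon_greedy_regret) and all parameter values are hypothetical.

```python
import random


def pull(means, arm, rng):
    """Simulated Bernoulli reward: 1 with probability means[arm], else 0."""
    return 1 if rng.random() < means[arm] else 0


def best_arms_fixed_budget(means, budget, m, rng):
    """Best arm identification: spend a fixed budget exploring, then pick m arms.

    Assumption: uniform round-robin allocation, chosen for simplicity.
    """
    k = len(means)
    totals = [0] * k
    counts = [0] * k
    for t in range(budget):
        arm = t % k  # round-robin over the arms
        totals[arm] += pull(means, arm, rng)
        counts[arm] += 1
    # Empirical mean reward per arm (0.0 if an arm was never pulled).
    estimates = [totals[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
    ranked = sorted(range(k), key=lambda a: estimates[a], reverse=True)
    return ranked[:m]


def epsilon_greedy_regret(means, rounds, epsilon, rng):
    """Regret minimisation: no separate exploration stage.

    Each round recommends one arm and accumulates expected regret, i.e. the
    gap between the best true mean and the chosen arm's true mean.
    Assumption: epsilon-greedy, one of many possible strategies.
    """
    k = len(means)
    totals = [0] * k
    counts = [0] * k
    best_mean = max(means)
    regret = 0.0
    for _ in range(rounds):
        # Explore at random with probability epsilon, or while any arm is untried.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: totals[a] / counts[a])
        totals[arm] += pull(means, arm, rng)
        counts[arm] += 1
        regret += best_mean - means[arm]
    return regret


if __name__ == "__main__":
    true_means = [0.1, 0.9, 0.2]  # hypothetical arms
    print(best_arms_fixed_budget(true_means, budget=3000, m=2, rng=random.Random(0)))
    print(epsilon_greedy_regret(true_means, rounds=1000, epsilon=0.1, rng=random.Random(1)))
```

In the first function the quality of the answer is judged only once, after the budget N is spent; in the second, every pull counts towards the cumulative regret, so exploration itself has a cost.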

Best Arm Identification with Fixed Budget.

