Agents learn about each arm's μ by observing the payouts yielded by that arm.
Agents update their beliefs using Bayes' Theorem.
Agents' initial beliefs are given by the improper flat prior (i.e., p(μ) is proportional to a constant).
Agents choose the arm with the greatest E(μ).
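
The four assumptions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes payouts are normally distributed with a known variance (the assumptions above say only that μ is learned), and the names posterior_mu_known_variance, choose_arm, sigma2, and history are hypothetical. Under that assumption, a flat prior on μ combined with n observed payouts gives a normal posterior centered on the sample mean with variance σ²/n, so E(μ) is simply the sample mean.

```python
from statistics import fmean


def posterior_mu_known_variance(payouts, sigma2):
    """Posterior of mu under a flat prior p(mu) ∝ const.

    Assumes (hypothetically) normal payouts with known variance sigma2.
    Returns (posterior mean, posterior variance) = (sample mean, sigma2 / n).
    """
    n = len(payouts)
    if n == 0:
        # The flat prior is improper, so E(mu) is undefined before any data.
        raise ValueError("need at least one observed payout")
    return fmean(payouts), sigma2 / n


def choose_arm(history, sigma2):
    """Pick the arm with the greatest posterior E(mu).

    history maps an arm label to the list of payouts observed from it.
    """
    return max(
        history,
        key=lambda arm: posterior_mu_known_variance(history[arm], sigma2)[0],
    )
```

Because the prior is improper, the sketch requires at least one observation per arm before E(μ) is defined; how agents act before that point is not specified by the assumptions above.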
Agents learn about each arm's μ and σ² by observing the payouts yielded by that arm.
Agents update their beliefs using Bayes' Theorem.
Agents' initial beliefs are given by the improper prior p(μ, σ²) ∝ (σ²)⁻¹.
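
For the unknown-variance case, the update can be sketched as follows. Again this is an illustrative sketch under the assumption of normally distributed payouts, and the function name posterior_mu_unknown_variance is hypothetical. With the prior p(μ, σ²) ∝ (σ²)⁻¹ and n ≥ 2 observations, the marginal posterior of μ is a Student t distribution with n − 1 degrees of freedom, centered on the sample mean, with scale s/√n, where s² is the sample variance.

```python
from math import sqrt
from statistics import fmean, variance


def posterior_mu_unknown_variance(payouts):
    """Marginal posterior of mu under the prior p(mu, sigma^2) ∝ 1/sigma^2.

    Assumes (hypothetically) normal payouts. Returns the parameters of the
    Student t posterior: degrees of freedom, location, and scale.
    """
    n = len(payouts)
    if n < 2:
        # With fewer than two payouts the posterior is still improper.
        raise ValueError("need at least two observed payouts")
    xbar = fmean(payouts)
    s2 = variance(payouts)  # sample variance with n - 1 in the denominator
    return {"df": n - 1, "loc": xbar, "scale": sqrt(s2 / n)}
```

The location parameter equals E(μ) only when the t distribution has a finite mean, that is, when df > 1 (three or more payouts from the arm).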