
# Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference

https://doi.org/10.1186/s40294-014-0004-x

• Accepted: 7 August 2014

The original article was published in Complex Adaptive Systems Modeling 2014 2:2


## Decisions in the presence of latent variables

We correct errors in equations (14), (15) and (19) of the main text.

### Equations (14) and (15)

Nature’s probability of flipping either coin does not actually depend on the agent’s prediction, so we can replace the conditional probabilities $p_0(\theta|x)$ by $p_0(\theta)$. We then have an inner variational problem:
$\arg\max_{\tilde{p}(\theta|x)} \sum_{\theta} \tilde{p}(\theta|x)\left[-\frac{1}{\beta}\log\frac{\tilde{p}(\theta|x)}{p_0(\theta)} + U(x,\theta)\right]$
(14)
with the solution
$p(\theta|x) = \frac{1}{Z_\beta(x)}\, p_0(\theta)\, \exp\big(\beta\, U(x,\theta)\big)$
(15)

where the normalization constant is $Z_\beta(x) = \sum_{\theta} p_0(\theta)\, \exp\big(\beta\, U(x,\theta)\big)$, together with an outer variational problem as described by equation (16) in the main text. Note that deliberation renders the two variables x and θ dependent.
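Equation (15) is a softmax (Boltzmann-weighted) posterior over θ. As a minimal sketch, assuming a hypothetical 2×2 utility table, uniform prior, and β = 2 (none of these numbers are from the paper), it can be computed directly:

```python
import numpy as np

# Hypothetical toy setup (not from the paper): two predictions x and two
# latent coins theta, with an illustrative utility table U[x, theta].
U = np.array([[1.0, 0.0],
              [0.0, 1.0]])
p0_theta = np.array([0.5, 0.5])          # prior p0(theta)
beta = 2.0                               # illustrative inverse temperature

def posterior_theta_given_x(x):
    """Equation (15): p(theta|x) = p0(theta) exp(beta U(x,theta)) / Z_beta(x)."""
    w = p0_theta * np.exp(beta * U[x])
    return w / w.sum()                   # w.sum() is the normalizer Z_beta(x)

p = posterior_theta_given_x(0)           # concentrates on the theta favored by x = 0
```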

### Equation (19)

In the case of α = β and a uniform prior $p_0(x) = \mathcal{U}(x)$, equation (17) reduces to
$p(x) = \sum_{\theta} p_0(\theta)\, \frac{e^{\alpha U(x,\theta)}}{Z_{\alpha}},$
(19)
where $Z_{\alpha} = \sum_{x} \sum_{\theta} p_0(\theta)\, e^{\alpha U(x,\theta)}$. Note that $e^{\alpha U(x,\theta)}/Z_{\alpha}$ is in general not a conditional distribution. However, equation (19) can be equivalently rewritten as
$p(x) = \sum_{\theta} \frac{p_0(\theta) \sum_{x'} e^{\alpha U(x',\theta)}}{Z_{\alpha}}\, \frac{e^{\alpha U(x,\theta)}}{\sum_{x'} e^{\alpha U(x',\theta)}} = \sum_{\theta} p(\theta)\, p(x|\theta),$

where we have expanded the fraction by $\sum_{x'} e^{\alpha U(x',\theta)}$.
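The identity above can be checked numerically. The sketch below, using a hypothetical random utility table and prior (illustrative assumptions, not values from the paper), verifies that the direct form of equation (19) and the mixture $\sum_{\theta} p(\theta)\, p(x|\theta)$ coincide:

```python
import numpy as np

# Hypothetical check (alpha = beta, uniform p0(x)): the direct form of equation
# (19) equals the mixture sum_theta p(theta) p(x|theta). U and p0 are illustrative.
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 2))              # utilities U[x, theta]: 3 actions, 2 coins
p0_theta = np.array([0.3, 0.7])
alpha = 1.5

w = p0_theta * np.exp(alpha * U)         # p0(theta) e^{alpha U(x,theta)}
Z = w.sum()                              # Z_alpha, summed over both x and theta
p_x_direct = w.sum(axis=1) / Z           # equation (19), direct form

col = np.exp(alpha * U).sum(axis=0)      # sum_x' e^{alpha U(x',theta)}, per theta
p_theta = p0_theta * col / Z             # first factor: p(theta)
p_x_given_theta = np.exp(alpha * U) / col   # second factor: p(x|theta)
p_x_mixture = (p_theta * p_x_given_theta).sum(axis=1)
```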

This last equality can also be obtained by stating the same variational problem in reverse causal order of x and θ, which is the natural statement of the Thompson sampling problem. The nested variational problem then becomes
$\arg\max_{\tilde{p}(x,\theta)} \sum_{\theta} \tilde{p}(\theta)\left[-\frac{1}{\beta}\log\frac{\tilde{p}(\theta)}{p_0(\theta)} + \sum_{x}\tilde{p}(x|\theta)\left[U(x,\theta) - \frac{1}{\alpha}\log\frac{\tilde{p}(x|\theta)}{p_0(x)}\right]\right]$
with the solutions
$p(x|\theta) = \frac{p_0(x)\, e^{\alpha U(x,\theta)}}{\sum_{x'} p_0(x')\, e^{\alpha U(x',\theta)}}$
(i)
and
$p(\theta) = \frac{1}{Z_{\beta\alpha}}\, p_0(\theta)\, \exp\left(\frac{\beta}{\alpha}\log\sum_{x} p_0(x)\, e^{\alpha U(x,\theta)}\right)$
(ii)

with normalization constant $Z_{\beta\alpha} = \sum_{\theta} p_0(\theta)\, \exp\left(\frac{\beta}{\alpha}\log\sum_{x} p_0(x)\, e^{\alpha U(x,\theta)}\right)$. In the limit α → ∞ and β → 0, the Thompson sampling agent is determined by the solutions $p(x|\theta) = \delta\left(x - \arg\max_{x'} U(x',\theta)\right)$ and $p(\theta) = p_0(\theta)$. Sampling an action from $p(x) = \sum_{\theta} p(\theta)\, p(x|\theta)$ is much cheaper than sampling an action from equation (18): the reversed causal order in θ and x implies an exponent β/α → 0 in equation (ii), instead of the diverging ratio α/β as in equation (17).
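To illustrate solutions (i) and (ii), here is a minimal sketch on a hypothetical 3×2 utility table (all numerical values are assumptions, not from the paper). With large α and small β the agent becomes near-greedy per θ while p(θ) stays close to the prior, matching the limit described above:

```python
import numpy as np

# Hypothetical sketch of solutions (i) and (ii); U, priors, alpha, beta are
# illustrative assumptions, not values from the paper.
U = np.array([[1.0, 0.2],
              [0.4, 0.9],
              [0.1, 0.5]])               # U[x, theta]: 3 actions, 2 coins
p0_x = np.full(3, 1.0 / 3.0)             # uniform prior over actions
p0_theta = np.array([0.4, 0.6])          # prior over theta

def agent(alpha, beta):
    ew = p0_x[:, None] * np.exp(alpha * U)   # p0(x) e^{alpha U(x,theta)}
    Zx = ew.sum(axis=0)                      # per-theta normalizer in (i)
    p_x_given_theta = ew / Zx                # equation (i)
    w = p0_theta * Zx ** (beta / alpha)      # exp((beta/alpha) log sum ...) = Zx^(beta/alpha)
    p_theta = w / w.sum()                    # equation (ii)
    return p_x_given_theta, p_theta

# Large alpha, small beta: near-greedy action choice, p(theta) close to the prior.
p_xgt, p_t = agent(alpha=50.0, beta=0.01)
```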

In the case of α = β, the solutions for the two different causal orders of x and θ are equivalent. Assuming again a uniform prior $p_0(x) = \mathcal{U}(x)$, we can compute the Thompson sampling agent from equations (i) and (ii) for α = β to be
$p(x) = \sum_{\theta} p(\theta)\, p(x|\theta) = \sum_{\theta} \frac{p_0(\theta) \sum_{x'} e^{\alpha U(x',\theta)}}{\sum_{x'} \sum_{\theta'} p_0(\theta')\, e^{\alpha U(x',\theta')}}\, \frac{e^{\alpha U(x,\theta)}}{\sum_{x'} e^{\alpha U(x',\theta)}},$

which is exactly equivalent to p(x) in equation (19). To sample from equation (19), we draw $\theta \sim p_0(\theta)$ and $x \sim p_0(x) = \mathcal{U}(x)$, and accept x if $u \le e^{\alpha U(x,\theta)}/e^{\alpha T}$, where $u \sim \mathcal{U}[\,0;1]$ and T is an upper bound on the utility, $T \ge \max_{x,\theta} U(x,\theta)$.
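One reading of this acceptance rule is a rejection sampler over the pair (x, θ): on rejection both are redrawn, so accepted pairs follow the joint proportional to $p_0(\theta)\, p_0(x)\, e^{\alpha U(x,\theta)}$, whose marginal over x is equation (19). A hedged sketch, with hypothetical utilities, prior, and α = 2 (illustrative only):

```python
import numpy as np

# Hypothetical toy problem (not from the paper): 2 actions x, 2 coins theta.
rng = np.random.default_rng(1)
U = np.array([[1.0, 0.2],
              [0.4, 0.9]])               # illustrative utilities U[x, theta]
p0_theta = np.array([0.5, 0.5])          # prior over theta
alpha = 2.0
T = U.max()                              # upper bound on U, so e^{alpha(U - T)} <= 1

def sample_action():
    """Rejection-sample x from equation (19); theta and x are redrawn together."""
    while True:
        theta = rng.choice(len(p0_theta), p=p0_theta)  # theta ~ p0(theta)
        x = rng.integers(len(U))                       # x ~ uniform p0(x)
        u = rng.uniform()                              # u ~ U[0, 1]
        if u <= np.exp(alpha * (U[x, theta] - T)):     # accept w.p. e^{aU}/e^{aT}
            return x

samples = [sample_action() for _ in range(20000)]
counts = np.bincount(samples, minlength=len(U))
freq = counts / counts.sum()             # should approximate equation (19)
```

The exponent is written as `alpha * (U - T)` rather than a ratio of two exponentials purely for numerical stability; the acceptance probability is identical.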

## Authors’ Affiliations

(1)
GRASP Laboratory, Electrical and Systems Engineering Department, University of Pennsylvania, Philadelphia, PA 19104, USA
(2)
Max Planck Institute for Biological Cybernetics and Max Planck Institute for Intelligent Systems, Spemannstrasse 38, Tübingen, 72076, Germany
