Open Access

# Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference

DOI: 10.1186/s40294-014-0004-x

Accepted: 7 August 2014

Published: 1 October 2014

The original article was published in Complex Adaptive Systems Modeling 2014 2:2


## Decisions in the presence of latent variables

We correct errors in equations (14), (15) and (19) of the main text.

### Equations (14) and (15)

Nature’s probability of flipping either coin does not actually depend on the agent’s prediction, so we can replace the conditional probabilities $p_0(\theta|x)$ by $p_0(\theta)$. We then have an inner variational problem:
$$\arg\max_{\tilde{p}(\theta|x)} \sum_{\theta} \tilde{p}(\theta|x)\left[-\frac{1}{\beta}\log\frac{\tilde{p}(\theta|x)}{p_0(\theta)} + U(x,\theta)\right]$$
(14)
with the solution
$$p(\theta|x)=\frac{1}{Z_\beta(x)}\,p_0(\theta)\exp\big(\beta U(x,\theta)\big)$$
(15)

with normalization constant $Z_\beta(x)=\sum_{\theta} p_0(\theta)\exp\big(\beta U(x,\theta)\big)$, and an outer variational problem as described by equation (16) in the main text. Note that deliberation renders the two variables $x$ and $\theta$ dependent.
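Equation (15) is a prior-tilted softmax over $\theta$, one distribution per observed $x$. The following minimal sketch (with made-up utilities and a helper name, `posterior_theta_given_x`, of our own choosing) shows the computation:

```python
import numpy as np

def posterior_theta_given_x(p0_theta, U, beta):
    """Equation (15): p(theta|x) = p0(theta) exp(beta U(x,theta)) / Z_beta(x).

    U has shape (n_x, n_theta); each row of the result is the
    distribution p(.|x), normalized by Z_beta(x)."""
    w = p0_theta[None, :] * np.exp(beta * U)   # unnormalized weights
    return w / w.sum(axis=1, keepdims=True)    # divide by Z_beta(x)

# Toy values: uniform prior over two coins, utility favoring a match.
p0_theta = np.array([0.5, 0.5])
U = np.array([[1.0, 0.0],
              [0.0, 1.0]])
p = posterior_theta_given_x(p0_theta, U, beta=1.0)
```

Each row sums to one, and higher-utility parameters receive exponentially more posterior mass as $\beta$ grows.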

### Equation (19)

In the case of $\alpha=\beta$ and a uniform prior $p_0(x)=\mathcal{U}(x)$, equation (17) reduces to
$$p(x)=\sum_{\theta} p_0(\theta)\,\frac{e^{\alpha U(x,\theta)}}{Z_\alpha},$$
(19)
where $Z_\alpha=\sum_{x}\sum_{\theta} p_0(\theta)\,e^{\alpha U(x,\theta)}$. Note that $e^{\alpha U(x,\theta)}/Z_\alpha$ is in general not a conditional distribution. However, equation (19) can be equivalently rewritten as
$$p(x)=\sum_{\theta}\frac{p_0(\theta)\sum_{x'} e^{\alpha U(x',\theta)}}{Z_\alpha}\,\frac{e^{\alpha U(x,\theta)}}{\sum_{x'} e^{\alpha U(x',\theta)}}=\sum_{\theta} p(\theta)\,p(x|\theta),$$

where we have expanded the fraction by $\sum_{x'} e^{\alpha U(x',\theta)}$.
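The identity above is easy to verify numerically. The sketch below (toy utilities and variable names of our own) computes the direct form of equation (19) and the mixture form $\sum_\theta p(\theta)p(x|\theta)$ and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.5
U = rng.normal(size=(4, 3))                  # toy utility U(x, theta)
p0_theta = np.full(3, 1 / 3)                 # prior over theta

# Direct form: p(x) = sum_theta p0(theta) e^{alpha U(x,theta)} / Z_alpha
w = p0_theta[None, :] * np.exp(alpha * U)    # shape (n_x, n_theta)
p_direct = w.sum(axis=1) / w.sum()           # w.sum() is Z_alpha

# Mixture form: p(theta) weights each theta by its total exponentiated
# utility; p(x|theta) is the per-theta softmax over actions.
col = np.exp(alpha * U).sum(axis=0)          # sum_{x'} e^{alpha U(x',theta)}
p_theta = p0_theta * col / w.sum()
p_x_given_theta = np.exp(alpha * U) / col[None, :]
p_mixture = (p_theta[None, :] * p_x_given_theta).sum(axis=1)

assert np.allclose(p_direct, p_mixture)
```

Both marginals coincide term by term, since the expansion factor $\sum_{x'} e^{\alpha U(x',\theta)}$ cancels inside the sum over $\theta$.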

This last equality can also be obtained by stating the same variational problem in the reverse causal order of $x$ and $\theta$, which is the natural statement of the Thompson sampling problem. The nested variational problem then becomes
$$\arg\max_{\tilde{p}(x,\theta)} \sum_{\theta} \tilde{p}(\theta)\left[-\frac{1}{\beta}\log\frac{\tilde{p}(\theta)}{p_0(\theta)} + \sum_{x}\tilde{p}(x|\theta)\left[U(x,\theta)-\frac{1}{\alpha}\log\frac{\tilde{p}(x|\theta)}{p_0(x)}\right]\right]$$
with the solutions
$$p(x|\theta)=\frac{p_0(x)\,e^{\alpha U(x,\theta)}}{\sum_{x'} p_0(x')\,e^{\alpha U(x',\theta)}}$$
(i)
and
$$p(\theta)=\frac{1}{Z_{\beta\alpha}}\,p_0(\theta)\exp\!\left(\frac{\beta}{\alpha}\log\sum_{x} p_0(x)\,e^{\alpha U(x,\theta)}\right)$$
(ii)

with normalization constant $Z_{\beta\alpha}=\sum_{\theta} p_0(\theta)\exp\!\left(\frac{\beta}{\alpha}\log\sum_{x} p_0(x)\,e^{\alpha U(x,\theta)}\right)$. In the limit $\alpha\to\infty$ and $\beta\to 0$, the Thompson sampling agent is determined by the solutions $p(x|\theta)=\delta\big(x-\arg\max_{x'} U(x',\theta)\big)$ and $p(\theta)=p_0(\theta)$. Sampling an action from $p(x)=\sum_{\theta} p(\theta)\,p(x|\theta)$ is much cheaper than sampling an action from equation (18) because of the reversed causal order in $\theta$ and $x$, which implies that $\beta/\alpha\to 0$ in equation (ii) instead of $\alpha/\beta$ as in equation (17).
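The limiting behavior of solutions (i) and (ii) can be checked directly. In this sketch (toy priors, utilities, and a function name of our own), a large $\alpha$ with fixed $\beta$ drives $\beta/\alpha\to 0$, so $p(\theta)$ approaches the prior while $p(x|\theta)$ concentrates on the utility-maximizing action:

```python
import numpy as np

def thompson_solutions(p0_x, p0_theta, U, alpha, beta):
    """Equations (i) and (ii): return p(x|theta) and p(theta).

    U has shape (n_x, n_theta)."""
    w = p0_x[:, None] * np.exp(alpha * U)              # p0(x) e^{alpha U}
    p_x_given_theta = w / w.sum(axis=0, keepdims=True) # equation (i)
    # Equation (ii): p0(theta) * (sum_x p0(x) e^{alpha U})^(beta/alpha)
    f = p0_theta * w.sum(axis=0) ** (beta / alpha)
    return p_x_given_theta, f / f.sum()

p0_x = np.full(4, 0.25)                    # uniform prior over actions
p0_theta = np.array([0.3, 0.7])            # toy prior over parameters
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [0.0, 0.0]])

# Large alpha, fixed beta: beta/alpha -> 0, so p(theta) -> p0(theta)
# and p(x|theta) -> delta at argmax_x U(x, theta).
pxt, pt = thompson_solutions(p0_x, p0_theta, U, alpha=200.0, beta=1.0)
```

Here `pt` is numerically indistinguishable from `p0_theta`, and each column of `pxt` puts essentially all its mass on the best action for that $\theta$.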

In the case of $\alpha=\beta$, the solutions for the two different causal orders of $x$ and $\theta$ are equivalent. Assuming again a uniform prior $p_0(x)=\mathcal{U}(x)$, we can compute the Thompson sampling agent from equations (i) and (ii) for $\alpha=\beta$ to be
$$p(x)=\sum_{\theta} p(\theta)\,p(x|\theta)=\sum_{\theta}\frac{p_0(\theta)\sum_{x'} e^{\alpha U(x',\theta)}}{\sum_{x'}\sum_{\theta'} p_0(\theta')\,e^{\alpha U(x',\theta')}}\,\frac{e^{\alpha U(x,\theta)}}{\sum_{x'} e^{\alpha U(x',\theta)}},$$

which is exactly equivalent to $p(x)$ in equation (19). To sample from equation (19), we draw $\theta\sim p_0(\theta)$ and accept $x\sim p_0(x)=\mathcal{U}(x)$ if $u\le e^{\alpha U(x,\theta)}/e^{\alpha T}$, where $u\sim\mathcal{U}[\,0;1]$.
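A minimal sketch of this rejection sampler, with made-up utilities and our own helper name `sample_action`. We read the undefined constant $T$ as an upper bound on the utility, so the acceptance probability stays in $[0,1]$; the empirical action frequencies are then compared against the marginal of equation (19):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.0
U = np.array([[1.0, 0.2],                  # toy utility U(x, theta)
              [0.1, 0.9],
              [0.4, 0.4]])
p0_theta = np.array([0.5, 0.5])            # prior over theta
T = U.max()                                # upper bound on the utility

def sample_action():
    """Draw theta ~ p0(theta), propose x uniformly, accept with
    probability e^{alpha U(x,theta)} / e^{alpha T}."""
    while True:
        theta = rng.choice(len(p0_theta), p=p0_theta)
        x = rng.integers(U.shape[0])
        if rng.uniform() <= np.exp(alpha * (U[x, theta] - T)):
            return x

counts = np.bincount([sample_action() for _ in range(20000)], minlength=3)
empirical = counts / counts.sum()

# Target marginal from equation (19): p(x) ~ sum_theta p0(theta) e^{alpha U}
w = p0_theta[None, :] * np.exp(alpha * U)
target = w.sum(axis=1) / w.sum()
```

Averaging the acceptance over $\theta\sim p_0(\theta)$ recovers exactly the mixture $\sum_\theta p_0(\theta)\,e^{\alpha U(x,\theta)}/Z_\alpha$ of equation (19).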

## Authors’ Affiliations

1. GRASP Laboratory, Electrical and Systems Engineering Department, University of Pennsylvania
2. Max Planck Institute for Biological Cybernetics and Max Planck Institute for Intelligent Systems

## References

1. Ortega PA, Braun DA: Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling 2014, 2:2.