# Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference

The Original Article was published on 14 March 2014

## Decisions in the presence of latent variables

We correct errors in equations (14), (15) and (19) of the main text.

### Equations (14) and (15)

Nature's probability of flipping either coin does not actually depend on the agent's prediction, so we can replace the conditional probabilities $p_0(\theta|x)$ by $p_0(\theta)$. We then have an inner variational problem:

$$\arg\max_{\tilde{p}(\theta|x)} \sum_{\theta} \tilde{p}(\theta|x)\left[-\frac{1}{\beta}\log\frac{\tilde{p}(\theta|x)}{p_0(\theta)} + U(x,\theta)\right] \tag{14}$$

with the solution

$$p(\theta|x) = \frac{1}{Z_\beta(x)}\, p_0(\theta)\exp\big(\beta U(x,\theta)\big) \tag{15}$$

where the normalization constant is $Z_\beta(x) = \sum_\theta p_0(\theta)\exp\big(\beta U(x,\theta)\big)$; the outer variational problem is as described by equation (16) in the main text. Note that deliberation renders the two variables $x$ and $\theta$ dependent.
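As a numerical illustration of equations (14) and (15), the following Python sketch computes the inner solution for a finite set of parameters and checks that it attains the value $(1/\beta)\log Z_\beta(x)$ of objective (14). The coin-prediction utility (1 for a correct prediction, 0 otherwise), the inverse temperature $\beta = 2$, and the function names `gibbs_posterior` and `objective` are hypothetical choices made for this example only.

```python
import math

# Minimal sketch of the inner solution (15); the coin example and all
# function names below are hypothetical illustrations.


def gibbs_posterior(p0_theta, U, x, beta):
    """Return p(theta|x) from equation (15) and the constant Z_beta(x).

    p0_theta: dict mapping theta to its prior probability p0(theta)
    U:        utility function U(x, theta)
    beta:     inverse temperature of the inner problem (beta > 0)
    """
    weights = {th: p * math.exp(beta * U(x, th)) for th, p in p0_theta.items()}
    z = sum(weights.values())  # Z_beta(x) = sum_theta p0(theta) exp(beta U(x, theta))
    return {th: w / z for th, w in weights.items()}, z


def objective(q, p0_theta, U, x, beta):
    """Objective of equation (14) evaluated at a candidate q(theta|x)."""
    return sum(q[th] * (-math.log(q[th] / p0_theta[th]) / beta + U(x, th))
               for th in q if q[th] > 0)


def utility(x, th):
    """Hypothetical utility: 1 for a correct prediction, 0 otherwise."""
    return 1.0 if x == th else 0.0


p0 = {"heads": 0.5, "tails": 0.5}  # fair coin, independent of the prediction

post, z = gibbs_posterior(p0, utility, "heads", beta=2.0)
print(post)  # probability mass shifts toward "heads"
# At the optimum the objective equals (1/beta) log Z_beta(x), up to rounding.
print(objective(post, p0, utility, "heads", 2.0), math.log(z) / 2.0)
```

Any other distribution over $\theta$ gives a strictly smaller value of the objective, which is a quick way to sanity-check an implementation of (15).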

### Equation (19)

In the case of $\alpha = \beta$ and a uniform prior $p_0(x) = \mathcal{U}(x)$, equation (17) reduces to

$$p(x) = \sum_\theta p_0(\theta)\,\frac{e^{\alpha U(x,\theta)}}{Z_\alpha}, \tag{19}$$

where $Z_\alpha = \sum_x \sum_\theta p_0(\theta)\, e^{\alpha U(x,\theta)}$. Note that $e^{\alpha U(x,\theta)}/Z_\alpha$ is in general not a conditional distribution, since for fixed $\theta$ it does not sum to one over $x$. However, equation (19) can be equivalently rewritten as

$$p(x) = \sum_\theta \frac{p_0(\theta)\sum_{x'} e^{\alpha U(x',\theta)}}{Z_\alpha}\,\frac{e^{\alpha U(x,\theta)}}{\sum_{x'} e^{\alpha U(x',\theta)}} = \sum_\theta p(\theta)\,p(x|\theta),$$

where we have multiplied and divided each term by $\sum_{x'} e^{\alpha U(x',\theta)}$.
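The rewrite can be checked numerically. The sketch below, assuming a small hypothetical utility table with three actions and two parameter values, evaluates equation (19) directly and in the factored form $\sum_\theta p(\theta)p(x|\theta)$, and also shows that $e^{\alpha U(x,\theta)}/Z_\alpha$ alone does not sum to one over $x$. The table, the prior over $\theta$, $\alpha = 3$, and the function names are illustrative, not taken from the main text.

```python
import math

# Hypothetical toy problem: three actions, two parameter values.
XS = [0, 1, 2]
THETAS = [0, 1]
P0_THETA = {0: 0.3, 1: 0.7}
TABLE = {(0, 0): 1.0, (1, 0): 0.2, (2, 0): 0.0,
         (0, 1): 0.0, (1, 1): 0.5, (2, 1): 1.0}


def U(x, theta):
    return TABLE[(x, theta)]


def p_x_direct(alpha):
    """Equation (19): p(x) = sum_theta p0(theta) e^{alpha U(x, theta)} / Z_alpha."""
    z_alpha = sum(P0_THETA[th] * math.exp(alpha * U(x, th)) for x in XS for th in THETAS)
    return {x: sum(P0_THETA[th] * math.exp(alpha * U(x, th)) for th in THETAS) / z_alpha
            for x in XS}


def p_x_factored(alpha):
    """The same marginal written as sum_theta p(theta) p(x|theta)."""
    # s[theta] = sum_x' e^{alpha U(x', theta)}, the factor used to expand each term
    s = {th: sum(math.exp(alpha * U(x, th)) for x in XS) for th in THETAS}
    z_alpha = sum(P0_THETA[th] * s[th] for th in THETAS)
    p_theta = {th: P0_THETA[th] * s[th] / z_alpha for th in THETAS}
    p_x_given_theta = {(x, th): math.exp(alpha * U(x, th)) / s[th] for x in XS for th in THETAS}
    return {x: sum(p_theta[th] * p_x_given_theta[(x, th)] for th in THETAS) for x in XS}


alpha = 3.0
z_alpha = sum(P0_THETA[th] * math.exp(alpha * U(x, th)) for x in XS for th in THETAS)
# For fixed theta = 0 this ratio is not a normalized distribution over x.
print(sum(math.exp(alpha * U(x, 0)) for x in XS) / z_alpha)
# Both forms of p(x) agree.
print(p_x_direct(alpha))
print(p_x_factored(alpha))
```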

This last equality can also be obtained by stating the same variational problem with the causal order of $x$ and $\theta$ reversed, which is the natural statement of the Thompson sampling problem. The nested variational problem then becomes

$$\arg\max_{\tilde{p}(x,\theta)} \sum_\theta \tilde{p}(\theta)\left[-\frac{1}{\beta}\log\frac{\tilde{p}(\theta)}{p_0(\theta)} + \sum_x \tilde{p}(x|\theta)\left[U(x,\theta) - \frac{1}{\alpha}\log\frac{\tilde{p}(x|\theta)}{p_0(x)}\right]\right]$$

with the solutions

$$p(x|\theta) = \frac{p_0(x)\,e^{\alpha U(x,\theta)}}{\sum_{x'} p_0(x')\,e^{\alpha U(x',\theta)}} \tag{i}$$

and

$$p(\theta) = \frac{1}{Z_{\beta\alpha}}\, p_0(\theta)\exp\left(\frac{\beta}{\alpha}\log\sum_x p_0(x)\,e^{\alpha U(x,\theta)}\right) \tag{ii}$$

with normalization constant $Z_{\beta\alpha} = \sum_\theta p_0(\theta)\exp\left(\frac{\beta}{\alpha}\log\sum_x p_0(x)\,e^{\alpha U(x,\theta)}\right)$. In the limit $\alpha \to \infty$ and $\beta \to 0$, the Thompson sampling agent is determined by the solutions $p(x|\theta) = \delta\left(x - \arg\max_{x'} U(x',\theta)\right)$ and $p(\theta) = p_0(\theta)$, so that an action is obtained by drawing $\theta$ from the prior and choosing the action that maximizes $U(x,\theta)$. Sampling an action from $p(x) = \sum_\theta p(\theta)\,p(x|\theta)$ is much cheaper than sampling an action from equation (18) because of the reversed causal order of $\theta$ and $x$, which implies that $\beta/\alpha \to 0$ in equation (ii) instead of $\beta/\alpha \to \infty$ as in equation (17).
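A minimal sketch of solutions (i) and (ii) for finite action and parameter sets follows, together with the two-step rule obtained in the limit $\alpha \to \infty$, $\beta \to 0$ (draw $\theta$ from the prior, then take the maximizing action). The utility table, the uniform $p_0(x)$, the values $\alpha = 50$ and $\beta = 0.01$ used to approximate the limit, and all function names are assumptions for illustration; ties in the argmax are broken by Python's `max`, which returns the first maximizer.

```python
import math
import random

# Hypothetical toy problem: actions 0..2, parameters 0..1, uniform p0(x).
XS = [0, 1, 2]
THETAS = [0, 1]
P0_X = {x: 1.0 / len(XS) for x in XS}
P0_THETA = {0: 0.3, 1: 0.7}
TABLE = {(0, 0): 1.0, (1, 0): 0.2, (2, 0): 0.0,
         (0, 1): 0.0, (1, 1): 0.5, (2, 1): 1.0}


def U(x, theta):
    return TABLE[(x, theta)]


def nested_solution(alpha, beta):
    """Solutions (i) and (ii) when theta precedes x in the causal order."""
    # s[theta] = sum_x p0(x) e^{alpha U(x, theta)}, the denominator of (i)
    s = {th: sum(P0_X[x] * math.exp(alpha * U(x, th)) for x in XS) for th in THETAS}
    p_x_given_theta = {(x, th): P0_X[x] * math.exp(alpha * U(x, th)) / s[th]
                       for x in XS for th in THETAS}
    # (ii): p(theta) proportional to p0(theta) exp((beta / alpha) log s[theta])
    w = {th: P0_THETA[th] * math.exp(beta / alpha * math.log(s[th])) for th in THETAS}
    z_beta_alpha = sum(w.values())
    p_theta = {th: w[th] / z_beta_alpha for th in THETAS}
    return p_theta, p_x_given_theta


def thompson_action(rng):
    """Limit alpha -> inf, beta -> 0: draw theta from p0, then act greedily."""
    theta = rng.choices(THETAS, weights=[P0_THETA[th] for th in THETAS])[0]
    return max(XS, key=lambda x: U(x, theta))  # ties go to the first maximizer


# Large alpha, small beta: p(theta) stays close to p0(theta) and p(x|theta)
# concentrates on argmax_x U(x, theta).
p_theta, p_x_given_theta = nested_solution(alpha=50.0, beta=0.01)
print(p_theta)
print(p_x_given_theta[(0, 0)], p_x_given_theta[(2, 1)])

rng = random.Random(0)
draws = [thompson_action(rng) for _ in range(10000)]
print({x: draws.count(x) / len(draws) for x in XS})  # roughly {0: 0.3, 1: 0.0, 2: 0.7}
```

In this limit each action costs one prior draw and one argmax, with no normalization over $x$ or $\theta$ and no rejection step.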

In the case of $\alpha = \beta$, the solutions for the two different causal orders of $x$ and $\theta$ are equivalent. Assuming again a uniform prior $p_0(x) = \mathcal{U}(x)$, we can use equations (i) and (ii) with $\alpha = \beta$ to compute the Thompson sampling agent as

$$p(x) = \sum_\theta p(\theta)\,p(x|\theta) = \sum_\theta \frac{p_0(\theta)\sum_{x'} e^{\alpha U(x',\theta)}}{\sum_{x'}\sum_{\theta'} p_0(\theta')\,e^{\alpha U(x',\theta')}}\,\frac{e^{\alpha U(x,\theta)}}{\sum_{x'} e^{\alpha U(x',\theta)}},$$

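which is exactly equivalent to $p(x)$ in equation (19). To sample from equation (19), we draw $\theta \sim p_0(\theta)$ and $x \sim p_0(x) = \mathcal{U}(x)$, and accept $x$ if $u \le e^{\alpha U(x,\theta)}/e^{\alpha T}$, where $u \sim \mathcal{U}[0;1]$; otherwise we redraw both $\theta$ and $x$. This requires $T \ge U(x,\theta)$ for all $x$ and $\theta$, so that the acceptance probability never exceeds one.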
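The rejection sampler for equation (19) can be sketched as follows, assuming a uniform action prior, a hypothetical utility table whose entries are bounded by $T = 1$, and $\alpha = 3$. It compares empirical action frequencies with the closed form of equation (19); the function names are illustrative only.

```python
import math
import random

# Hypothetical toy problem: actions 0..2, parameters 0..1.
XS = [0, 1, 2]
THETAS = [0, 1]
P0_THETA = {0: 0.3, 1: 0.7}
TABLE = {(0, 0): 1.0, (1, 0): 0.2, (2, 0): 0.0,
         (0, 1): 0.0, (1, 1): 0.5, (2, 1): 1.0}


def U(x, theta):
    return TABLE[(x, theta)]


def sample_eq19(alpha, T, rng):
    """Rejection sampler for equation (19) with a uniform action prior.

    Assumes T >= U(x, theta) for all x, theta, so that the acceptance
    probability exp(alpha U) / exp(alpha T) never exceeds one.
    """
    weights = [P0_THETA[th] for th in THETAS]
    while True:
        theta = rng.choices(THETAS, weights=weights)[0]  # theta ~ p0(theta)
        x = rng.choice(XS)                                # x ~ uniform p0(x)
        u = rng.random()                                  # u ~ U[0, 1)
        if u <= math.exp(alpha * U(x, theta)) / math.exp(alpha * T):
            return x
        # rejected: loop again, redrawing both theta and x


def exact_eq19(alpha):
    """Closed form of equation (19), used to check the sampler."""
    z_alpha = sum(P0_THETA[th] * math.exp(alpha * U(x, th)) for x in XS for th in THETAS)
    return {x: sum(P0_THETA[th] * math.exp(alpha * U(x, th)) for th in THETAS) / z_alpha
            for x in XS}


rng = random.Random(0)
alpha, T = 3.0, 1.0  # T = 1 bounds every entry of TABLE
samples = [sample_eq19(alpha, T, rng) for _ in range(20000)]
print({x: samples.count(x) / len(samples) for x in XS})
print(exact_eq19(alpha))  # the two dictionaries should be close
```

Since the acceptance probability is $e^{\alpha(U(x,\theta) - T)}$, the expected number of proposals per accepted action grows when $\alpha$ is large and most utilities lie well below $T$.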

## References

1. Ortega PA, Braun DA: Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling 2014, 2:2.

## Author information

### Corresponding authors

Correspondence to Pedro A Ortega or Daniel A Braun.

The online version of the original article can be found at 10.1186/2194-3206-2-2
