Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference

Ortega, Pedro A; Braun, Daniel A

doi:10.1186/s40294-014-0004-x

Erratum
Open access
Published: 01 October 2014

Pedro A Ortega¹ &
Daniel A Braun²

Complex Adaptive Systems Modeling volume 2, Article number: 4 (2014)

The Original Article was published on 14 March 2014

Abstract

No abstract.

Decisions in the presence of latent variables

We correct errors in equations (14), (15) and (19) of the main text.

Equations (14) and (15)

Nature’s probability of flipping either coin does not actually depend on the agent’s prediction, so we can replace the conditional probabilities $p_{0}(\theta \mid x)$ by $p_{0}(\theta)$. We then have an inner variational problem:

$$\arg\max_{\tilde{p}(\theta \mid x)} \sum_{\theta} \tilde{p}(\theta \mid x) \left[ -\frac{1}{\beta} \log \frac{\tilde{p}(\theta \mid x)}{p_{0}(\theta)} + U(x, \theta) \right]$$

(14)

with the solution

$$p(\theta \mid x) = \frac{1}{Z_{\beta}(x)} \, p_{0}(\theta) \exp\left(\beta U(x, \theta)\right)$$

(15)

and the normalization constant $Z_{\beta}(x) = \sum_{\theta} p_{0}(\theta) \exp\left(\beta U(x, \theta)\right)$, together with an outer variational problem as described by equation (16) in the main text. Note that deliberation renders the two variables x and θ dependent.
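As a sanity check on equations (14) and (15), the following minimal sketch evaluates the inner objective at the closed-form solution and at another candidate distribution. The two-coin prior, the utility table, and the value of β are hypothetical numbers chosen for illustration, not values from the paper. At the solution, the objective should equal $\frac{1}{\beta} \log Z_{\beta}(x)$.

```python
import math

def posterior_theta_given_x(prior, utility, x, beta):
    """Equation (15): p(theta|x) = p0(theta) * exp(beta * U(x, theta)) / Z_beta(x)."""
    weights = {t: p * math.exp(beta * utility[(x, t)]) for t, p in prior.items()}
    z = sum(weights.values())  # normalization constant Z_beta(x)
    return {t: w / z for t, w in weights.items()}

def inner_objective(q, prior, utility, x, beta):
    """Objective of equation (14) evaluated at a candidate distribution q(theta|x)."""
    return sum(
        q[t] * (-(1.0 / beta) * math.log(q[t] / prior[t]) + utility[(x, t)])
        for t in q
        if q[t] > 0.0
    )

# Hypothetical toy problem: theta is the coin Nature flips, x is the prediction.
prior = {"coin_a": 0.5, "coin_b": 0.5}
utility = {
    ("heads", "coin_a"): 0.9, ("heads", "coin_b"): 0.2,
    ("tails", "coin_a"): 0.1, ("tails", "coin_b"): 0.8,
}
beta = 2.0

post = posterior_theta_given_x(prior, utility, "heads", beta)
best = inner_objective(post, prior, utility, "heads", beta)

# Any other distribution, here the prior itself, should score no higher.
at_prior = inner_objective(prior, prior, utility, "heads", beta)
print(post, best, at_prior)
```

Because the posterior depends on x through the utility, the sketch also illustrates why deliberation makes x and θ dependent even though the prior over θ does not depend on x.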

Equation (19)

In the case of α = β and uniform prior $p_{0}(x) = \mathcal{U}(x)$, equation (17) reduces to

$$p(x) = \sum_{\theta} p_{0}(\theta) \frac{e^{\alpha U(x, \theta)}}{Z_{\alpha}},$$

(19)

where $Z_{\alpha} = \sum_{x} \sum_{\theta} p_{0}(\theta) e^{\alpha U(x, \theta)}$. Note that $e^{\alpha U(x, \theta)} / Z_{\alpha}$ is in general not a conditional distribution. However, equation (19) can be equivalently rewritten as

$$p(x) = \sum_{\theta} \frac{p_{0}(\theta) \sum_{x'} e^{\alpha U(x', \theta)}}{Z_{\alpha}} \, \frac{e^{\alpha U(x, \theta)}}{\sum_{x'} e^{\alpha U(x', \theta)}} = \sum_{\theta} p(\theta) \, p(x \mid \theta),$$

where we have expanded the fraction by $\sum_{x'} e^{\alpha U(x', \theta)}$.
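The equivalence of the two forms of equation (19) can be checked numerically. The sketch below uses a hypothetical prior, action set, utility table, and α (none taken from the paper) and computes p(x) both directly and as the mixture $\sum_{\theta} p(\theta) p(x \mid \theta)$. It also shows that $e^{\alpha U(x, \theta)} / Z_{\alpha}$ does not sum to one over x for a fixed θ, which is why it is not a conditional distribution.

```python
import math

# Hypothetical toy problem used only for illustration.
prior = {"coin_a": 0.3, "coin_b": 0.7}   # p0(theta)
actions = ["heads", "tails"]             # possible values of x
utility = {
    ("heads", "coin_a"): 0.9, ("heads", "coin_b"): 0.2,
    ("tails", "coin_a"): 0.1, ("tails", "coin_b"): 0.8,
}
alpha = 2.0

def boltzmann(x, theta):
    """Unnormalized weight exp(alpha * U(x, theta))."""
    return math.exp(alpha * utility[(x, theta)])

# Z_alpha = sum_x sum_theta p0(theta) * exp(alpha * U(x, theta))
z_alpha = sum(prior[t] * boltzmann(x, t) for x in actions for t in prior)

# Direct form of equation (19).
p_direct = {x: sum(prior[t] * boltzmann(x, t) for t in prior) / z_alpha
            for x in actions}

# Mixture form: p(theta) and p(x|theta) obtained by expanding the fraction.
p_theta = {t: prior[t] * sum(boltzmann(xp, t) for xp in actions) / z_alpha
           for t in prior}
p_x_given_theta = {
    t: {x: boltzmann(x, t) / sum(boltzmann(xp, t) for xp in actions)
        for x in actions}
    for t in prior
}
p_mixture = {x: sum(p_theta[t] * p_x_given_theta[t][x] for t in prior)
             for x in actions}

# Sum over x of exp(alpha*U)/Z_alpha for each theta; generally not equal to 1.
column_sums = {t: sum(boltzmann(x, t) for x in actions) / z_alpha for t in prior}
print(p_direct, p_mixture, column_sums)
```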

This last equality can also be obtained by stating the same variational problem in reverse causal order of x and θ, which is the natural statement of the Thompson sampling problem. The nested variational problem then becomes

$$\arg\max_{\tilde{p}(x, \theta)} \sum_{\theta} \tilde{p}(\theta) \left[ -\frac{1}{\beta} \log \frac{\tilde{p}(\theta)}{p_{0}(\theta)} + \sum_{x} \tilde{p}(x \mid \theta) \left[ U(x, \theta) - \frac{1}{\alpha} \log \frac{\tilde{p}(x \mid \theta)}{p_{0}(x)} \right] \right]$$

with the solutions

$$p(x \mid \theta) = \frac{p_{0}(x) \, e^{\alpha U(x, \theta)}}{\sum_{x'} p_{0}(x') \, e^{\alpha U(x', \theta)}}$$

(i)

and

$$p(\theta) = \frac{1}{Z_{\beta \alpha}} \, p_{0}(\theta) \exp\left( \frac{\beta}{\alpha} \log \sum_{x} p_{0}(x) \, e^{\alpha U(x, \theta)} \right)$$

(ii)

with normalization constant $Z_{\beta \alpha} = \sum_{\theta} p_{0}(\theta) \exp\left( \frac{\beta}{\alpha} \log \sum_{x} p_{0}(x) \, e^{\alpha U(x, \theta)} \right)$. In the limit α → ∞ and β → 0, the Thompson sampling agent is determined by the solutions $p(x \mid \theta) = \delta\left(x - \arg\max_{x'} U(x', \theta)\right)$ and $p(\theta) = p_{0}(\theta)$. Sampling an action from $p(x) = \sum_{\theta} p(\theta) \, p(x \mid \theta)$ is much cheaper than sampling an action from equation (18) because of the reversed causal order in θ and x, which implies that β/α → 0 in equation (ii) instead of β/α → ∞ as in equation (17).
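The limiting behaviour of equations (i) and (ii) can be illustrated with a small numerical sketch. The priors, utilities, and the particular values of α and β below are hypothetical, and the max-shift inside the exponentials is only a numerical-stability device that leaves both distributions unchanged. With large α and small β, p(x | θ) should concentrate on the utility-maximizing action and p(θ) should stay close to the prior.

```python
import math

def policy_given_theta(prior_x, utility, theta, alpha):
    """Equation (i): p(x|theta) proportional to p0(x) * exp(alpha * U(x, theta))."""
    u_max = max(utility[(x, theta)] for x in prior_x)  # shift to avoid overflow
    w = {x: p * math.exp(alpha * (utility[(x, theta)] - u_max))
         for x, p in prior_x.items()}
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}

def belief_over_theta(prior_theta, prior_x, utility, alpha, beta):
    """Equation (ii): p(theta) proportional to
    p0(theta) * exp((beta/alpha) * log sum_x p0(x) * exp(alpha * U(x, theta)))."""
    def log_partition(theta):
        u_max = max(utility[(x, theta)] for x in prior_x)
        s = sum(p * math.exp(alpha * (utility[(x, theta)] - u_max))
                for x, p in prior_x.items())
        return alpha * u_max + math.log(s)

    log_w = {t: math.log(p) + (beta / alpha) * log_partition(t)
             for t, p in prior_theta.items()}
    m = max(log_w.values())
    w = {t: math.exp(v - m) for t, v in log_w.items()}
    z = sum(w.values())  # plays the role of Z_{beta alpha} up to the shift
    return {t: v / z for t, v in w.items()}

# Hypothetical toy problem used only for illustration.
prior_theta = {"coin_a": 0.3, "coin_b": 0.7}
prior_x = {"heads": 0.5, "tails": 0.5}  # uniform p0(x)
utility = {
    ("heads", "coin_a"): 0.9, ("heads", "coin_b"): 0.2,
    ("tails", "coin_a"): 0.1, ("tails", "coin_b"): 0.8,
}

# Approximate the limit alpha -> infinity, beta -> 0 with finite values.
alpha, beta = 200.0, 1e-6
policy_a = policy_given_theta(prior_x, utility, "coin_a", alpha)
belief = belief_over_theta(prior_theta, prior_x, utility, alpha, beta)
print(policy_a, belief)
```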

In the case of α = β the solutions for the two different causal orders of x and θ are equivalent. Assuming again a uniform prior $p_{0}(x) = \mathcal{U}(x)$, we can compute the Thompson sampling agent from equation (i) and equation (ii) for α = β to be

$$p(x) = \sum_{\theta} p(\theta) \, p(x \mid \theta) = \sum_{\theta} \frac{p_{0}(\theta) \sum_{x'} e^{\alpha U(x', \theta)}}{\sum_{x'} \sum_{\theta'} p_{0}(\theta') \, e^{\alpha U(x', \theta')}} \, \frac{e^{\alpha U(x, \theta)}}{\sum_{x'} e^{\alpha U(x', \theta)}},$$

which is exactly equivalent to p(x) in equation (19). To sample from equation (19), we draw $\theta \sim p_{0}(\theta)$ and accept $x \sim p_{0}(x) = \mathcal{U}(x)$ if $u \le e^{\alpha U(x, \theta)} / e^{\alpha T}$, where $u \sim \mathcal{U}[0, 1]$.
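The sampling rule can be sketched as a rejection sampler. Two assumptions are made here that the erratum does not spell out: T is taken to be an upper bound on the utility (so the acceptance ratio never exceeds one), and on rejection both θ and x are redrawn, which is what makes the accepted actions follow the marginal in equation (19). The toy prior, utilities, α, and the helper name `sample_action` are hypothetical.

```python
import math
import random

def sample_action(prior_theta, actions, utility, alpha, t_max, rng):
    """Draw theta ~ p0(theta) and a uniform proposal x, then accept x if
    u <= exp(alpha * U(x, theta)) / exp(alpha * T); otherwise redraw both."""
    thetas = list(prior_theta)
    weights = [prior_theta[t] for t in thetas]
    while True:
        theta = rng.choices(thetas, weights=weights)[0]
        x = rng.choice(actions)  # uniform proposal p0(x)
        u = rng.random()         # u ~ Uniform[0, 1)
        if u <= math.exp(alpha * (utility[(x, theta)] - t_max)):
            return x

# Hypothetical toy problem used only for illustration.
prior_theta = {"coin_a": 0.5, "coin_b": 0.5}
actions = ["heads", "tails"]
utility = {
    ("heads", "coin_a"): 0.9, ("heads", "coin_b"): 0.2,
    ("tails", "coin_a"): 0.1, ("tails", "coin_b"): 0.8,
}
alpha = 2.0
t_max = max(utility.values())  # assumed meaning of T: an upper bound on U

rng = random.Random(0)
n_draws = 20000
counts = {x: 0 for x in actions}
for _ in range(n_draws):
    counts[sample_action(prior_theta, actions, utility, alpha, t_max, rng)] += 1
empirical = {x: c / n_draws for x, c in counts.items()}

# Exact p(x) from equation (19) for comparison.
z_alpha = sum(prior_theta[t] * math.exp(alpha * utility[(x, t)])
              for x in actions for t in prior_theta)
exact = {x: sum(prior_theta[t] * math.exp(alpha * utility[(x, t)])
                for t in prior_theta) / z_alpha
         for x in actions}
print(empirical, exact)
```

The sampler never evaluates a normalization constant, only a single utility value per proposal. Its cost is governed by the acceptance rate, which shrinks as α grows or as T becomes a looser bound.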

References

Ortega PA, Braun DA: Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling 2014, 2: 2.

Author information

Authors and Affiliations

GRASP Laboratory, Electrical and Systems Engineering Department, University of Pennsylvania, Philadelphia, PA, 19104, USA
Pedro A Ortega
Max Planck Institute for Biological Cybernetics and Max Planck Institute for Intelligent Systems, Speemanstrasse 38, Tübingen, 72076, Germany
Daniel A Braun

Corresponding authors

Correspondence to Pedro A Ortega or Daniel A Braun.

Additional information

The online version of the original article can be found at 10.1186/2194-3206-2-2

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Ortega, P.A., Braun, D.A. Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adapt Syst Model 2, 4 (2014). https://doi.org/10.1186/s40294-014-0004-x

Received: 24 June 2014
Accepted: 07 August 2014
Published: 01 October 2014
DOI: https://doi.org/10.1186/s40294-014-0004-x
