# Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference

• The original article was published in Complex Adaptive Systems Modeling 2014 2:2

## Decisions in the presence of latent variables

We correct errors in equations (14), (15) and (19) of the main text.

### Equations (14) and (15)

Nature’s probability of flipping either coin does not actually depend on the agent’s prediction, so we can replace the conditional probabilities $p_0(\theta|x)$ by $p_0(\theta)$. We then have an inner variational problem:

$\underset{\tilde{p}(\theta|x)}{\arg\max} \sum_{\theta} \tilde{p}(\theta|x) \left[ -\frac{1}{\beta} \log \frac{\tilde{p}(\theta|x)}{p_0(\theta)} + U(x,\theta) \right]$
(14)

with the solution

$p(\theta|x) = \frac{1}{Z_\beta(x)} \, p_0(\theta) \exp\left( \beta U(x,\theta) \right)$
(15)

where the normalization constant is $Z_\beta(x) = \sum_{\theta} p_0(\theta) \exp\left( \beta U(x,\theta) \right)$. The outer variational problem is as described by equation (16) in the main text. Note that deliberation renders the two variables $x$ and $\theta$ dependent.
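As a minimal numerical sketch of equation (15), the snippet below evaluates $p(\theta|x)$ for a small discrete problem. The utility table `U[x][theta]`, the prior `p0_theta`, and the value of `beta` are hypothetical illustration values, not taken from the main text.

```python
import math

def posterior_theta_given_x(p0_theta, U, x, beta):
    """Equation (15): p(theta|x) = p0(theta) * exp(beta * U(x, theta)) / Z_beta(x)."""
    weights = [p * math.exp(beta * U[x][t]) for t, p in enumerate(p0_theta)]
    Z = sum(weights)  # normalization constant Z_beta(x)
    return [w / Z for w in weights]

# Hypothetical example: two actions x and two latent states theta.
p0_theta = [0.5, 0.5]
U = [[1.0, 0.0],   # U[x][theta]
     [0.0, 1.0]]

post = posterior_theta_given_x(p0_theta, U, x=0, beta=2.0)
prior_like = posterior_theta_given_x(p0_theta, U, x=0, beta=0.0)
print(post, prior_like)
```

With `beta = 0` the result coincides with the prior $p_0(\theta)$, while a positive `beta` shifts mass toward the $\theta$ with higher utility for the given $x$, which illustrates how deliberation makes $x$ and $\theta$ dependent.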

### Equation (19)

In the case of $\alpha = \beta$ and a uniform prior $p_0(x) = \mathcal{U}(x)$, equation (17) reduces to

$p(x) = \sum_{\theta} p_0(\theta) \frac{e^{\alpha U(x,\theta)}}{Z_\alpha},$
(19)

where $Z_\alpha = \sum_{x} \sum_{\theta} p_0(\theta) e^{\alpha U(x,\theta)}$. Note that $e^{\alpha U(x,\theta)} / Z_\alpha$ is in general not a conditional distribution. However, equation (19) can be equivalently rewritten as

$p(x) = \sum_{\theta} \frac{p_0(\theta) \sum_{x'} e^{\alpha U(x',\theta)}}{Z_\alpha} \cdot \frac{e^{\alpha U(x,\theta)}}{\sum_{x'} e^{\alpha U(x',\theta)}} = \sum_{\theta} p(\theta) \, p(x|\theta),$

where we have expanded the fraction by $\sum_{x'} e^{\alpha U(x',\theta)}$.
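The equivalence of the two forms of equation (19) can be checked numerically. The following sketch computes $p(x)$ once directly and once as the mixture $\sum_\theta p(\theta)\,p(x|\theta)$; the utility table and prior are hypothetical values chosen only for illustration.

```python
import math

def p_x_direct(p0_theta, U, alpha):
    """Equation (19): p(x) = sum_theta p0(theta) * exp(alpha * U(x, theta)) / Z_alpha."""
    n_x, n_t = len(U), len(p0_theta)
    unnorm = [sum(p0_theta[t] * math.exp(alpha * U[x][t]) for t in range(n_t))
              for x in range(n_x)]
    Z_alpha = sum(unnorm)
    return [w / Z_alpha for w in unnorm]

def p_x_mixture(p0_theta, U, alpha):
    """Rewritten form: p(x) = sum_theta p(theta) * p(x | theta)."""
    n_x, n_t = len(U), len(p0_theta)
    # Per-theta sums: S[theta] = sum_x' exp(alpha * U(x', theta))
    S = [sum(math.exp(alpha * U[x][t]) for x in range(n_x)) for t in range(n_t)]
    Z_alpha = sum(p0_theta[t] * S[t] for t in range(n_t))
    p_theta = [p0_theta[t] * S[t] / Z_alpha for t in range(n_t)]
    p_x_given_theta = [[math.exp(alpha * U[x][t]) / S[t] for x in range(n_x)]
                       for t in range(n_t)]
    p_x = [sum(p_theta[t] * p_x_given_theta[t][x] for t in range(n_t))
           for x in range(n_x)]
    return p_x, p_theta

# Hypothetical utilities U[x][theta]: three actions, two latent states.
U = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.2]]
p0_theta = [0.7, 0.3]

direct = p_x_direct(p0_theta, U, alpha=2.0)
mixture, p_theta = p_x_mixture(p0_theta, U, alpha=2.0)
print(direct, mixture, p_theta)
```

Both routes give the same $p(x)$ up to floating-point error, and the intermediate `p_theta` is a properly normalized distribution over $\theta$.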

This last equality can also be obtained by stating the same variational problem in reverse causal order of x and θ, which is the natural statement of the Thompson sampling problem. The nested variational problem then becomes

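$\underset{\tilde{p}(x,\theta)}{\arg\max} \sum_{\theta} \tilde{p}(\theta) \left[ -\frac{1}{\beta} \log \frac{\tilde{p}(\theta)}{p_0(\theta)} + \sum_{x} \tilde{p}(x|\theta) \left[ U(x,\theta) - \frac{1}{\alpha} \log \frac{\tilde{p}(x|\theta)}{p_0(x)} \right] \right]$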

with the solutions

$p(x|\theta) = \frac{p_0(x) \, e^{\alpha U(x,\theta)}}{\sum_{x'} p_0(x') \, e^{\alpha U(x',\theta)}}$
(i)

and

$p(\theta) = \frac{1}{Z_{\beta\alpha}} \, p_0(\theta) \exp\left( \frac{\beta}{\alpha} \log \sum_{x} p_0(x) \, e^{\alpha U(x,\theta)} \right)$
(ii)

with normalization constant $Z_{\beta\alpha} = \sum_{\theta} p_0(\theta) \exp\left( \frac{\beta}{\alpha} \log \sum_{x} p_0(x) \, e^{\alpha U(x,\theta)} \right)$. In the limit $\alpha \to \infty$ and $\beta \to 0$, the Thompson sampling agent is determined by the solutions $p(x|\theta) = \delta\left( x - \arg\max_{x'} U(x',\theta) \right)$ and $p(\theta) = p_0(\theta)$. Sampling an action from $p(x) = \sum_{\theta} p(\theta) \, p(x|\theta)$ is much cheaper than sampling an action from equation (18) because of the reversed causal order of $\theta$ and $x$: the exponent $\beta/\alpha \to 0$ in equation (ii), in contrast to the corresponding exponent in equation (17).
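The solutions (i) and (ii) can be evaluated directly for a small discrete problem. The sketch below is illustrative only: the utility table, the priors, and the particular values of `alpha` and `beta` are assumptions. It shows that for large $\alpha$ the conditional $p(x|\theta)$ concentrates on the maximizing action, and that $p(\theta)$ approaches $p_0(\theta)$ when $\beta/\alpha$ is small.

```python
import math

def thompson_solutions(p0_x, p0_theta, U, alpha, beta):
    """Return (p_theta, p_x_given_theta) from equations (ii) and (i)."""
    n_x, n_t = len(p0_x), len(p0_theta)
    # Z[theta] = sum_x p0(x) * exp(alpha * U(x, theta))
    Z = [sum(p0_x[x] * math.exp(alpha * U[x][t]) for x in range(n_x))
         for t in range(n_t)]
    # Equation (i): p(x | theta), indexed as p_x_given_theta[theta][x]
    p_x_given_theta = [[p0_x[x] * math.exp(alpha * U[x][t]) / Z[t]
                        for x in range(n_x)] for t in range(n_t)]
    # Equation (ii): p(theta) proportional to p0(theta) * exp((beta/alpha) * log Z[theta])
    w = [p0_theta[t] * math.exp((beta / alpha) * math.log(Z[t]))
         for t in range(n_t)]
    Z_beta_alpha = sum(w)
    p_theta = [wi / Z_beta_alpha for wi in w]
    return p_theta, p_x_given_theta

# Hypothetical problem: two actions, two latent states, uniform priors.
U = [[1.0, 0.0],   # U[x][theta]
     [0.0, 0.5]]
p0_x = [0.5, 0.5]
p0_theta = [0.5, 0.5]

# Large alpha, small beta: close to the limiting Thompson sampling agent.
p_theta, p_x_given_theta = thompson_solutions(p0_x, p0_theta, U,
                                              alpha=50.0, beta=0.01)
print(p_theta)
print(p_x_given_theta)
```

In this regime, sampling an action amounts to drawing $\theta$ from (approximately) the prior and then picking the action that maximizes $U(x,\theta)$ for that $\theta$.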

In the case of $\alpha = \beta$, the solutions for the two different causal orders of $x$ and $\theta$ are equivalent. Assuming again a uniform prior $p_0(x) = \mathcal{U}(x)$, we can compute the Thompson sampling agent from equation (i) and equation (ii) for $\alpha = \beta$ to be

$p(x) = \sum_{\theta} p(\theta) \, p(x|\theta) = \sum_{\theta} \frac{p_0(\theta) \sum_{x'} e^{\alpha U(x',\theta)}}{\sum_{x'} \sum_{\theta'} p_0(\theta') \, e^{\alpha U(x',\theta')}} \cdot \frac{e^{\alpha U(x,\theta)}}{\sum_{x'} e^{\alpha U(x',\theta)}},$

which is exactly equivalent to $p(x)$ in equation (19). To sample from equation (19), we draw $\theta \sim p_0(\theta)$ and accept $x \sim p_0(x) = \mathcal{U}(x)$ if $u \le e^{\alpha U(x,\theta)} / e^{\alpha T}$, where $u \sim \mathcal{U}[0,1]$.
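The acceptance rule can be sketched as a simple rejection sampler. This is a minimal illustration under stated assumptions, not the authors' implementation: `T` is taken here to be an upper bound on the utilities so that the acceptance ratio never exceeds one, the pair $(\theta, x)$ is redrawn jointly after every rejection, and the utility table and prior are hypothetical.

```python
import math
import random

def sample_action(p0_theta, U, alpha, T, rng):
    """Draw one action x by rejection sampling (assumes U[x][theta] <= T)."""
    n_x = len(U)
    thetas = list(range(len(p0_theta)))
    while True:
        theta = rng.choices(thetas, weights=p0_theta)[0]  # theta ~ p0(theta)
        x = rng.randrange(n_x)                            # x ~ uniform prior
        u = rng.random()                                  # u ~ uniform on [0, 1)
        # Accept if u <= exp(alpha * U(x, theta)) / exp(alpha * T)
        if u <= math.exp(alpha * (U[x][theta] - T)):
            return x

def p_x_eq19(p0_theta, U, alpha):
    """Exact p(x) from equation (19), used only for comparison."""
    unnorm = [sum(p * math.exp(alpha * U[x][t]) for t, p in enumerate(p0_theta))
              for x in range(len(U))]
    Z_alpha = sum(unnorm)
    return [w / Z_alpha for w in unnorm]

# Hypothetical problem: two actions, two latent states.
U = [[1.0, 0.0],   # U[x][theta]
     [0.0, 1.0]]
p0_theta = [0.7, 0.3]
alpha, T = 2.0, 1.0   # T assumed to be the largest utility in the table

rng = random.Random(0)
n = 20000
counts = [0] * len(U)
for _ in range(n):
    counts[sample_action(p0_theta, U, alpha, T, rng)] += 1
freqs = [c / n for c in counts]
exact = p_x_eq19(p0_theta, U, alpha)
print(freqs, exact)
```

Redrawing both $\theta$ and $x$ after a rejection is what makes the accepted actions follow equation (19); keeping $\theta$ fixed until an $x$ is accepted would instead sample from $\sum_\theta p_0(\theta)\,p(x|\theta)$.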

## References

1. Ortega PA, Braun DA: Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling 2014, 2:2.

## Author information

### Corresponding authors

Correspondence to Pedro A Ortega or Daniel A Braun. 