
Inference in Haskell for Approximate Bayesian Computation

1 Approximate Bayesian Computation

1.1 Bayesian Inference

Bayes’ theorem is often written with hypotheses \(θ\) and evidence \(\mathbf x\):

\[ \overbrace{p(θ \mid \mathbf x)}^{\text{posterior}} \propto \overbrace{p(\mathbf x \mid θ)}^{\text{likelihood}} \; \overbrace{p(θ)}^{\text{prior}}. \]

  • The posterior is the target
  • The likelihood is the model
  • The prior is an assumption

1.2 Example

Suppose \(\mathbf x = (3~1~7~5~1~2~3~4~3~2)\) comes from \(Poi(θ)\) assuming nothing about \(θ\).

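A short sketch of the analytic posterior for this example, assuming a flat (improper) prior \(p(θ) \propto 1\) for \(θ > 0\):

\[ p(θ \mid \mathbf x) \propto \prod_{i=1}^{10} \frac{θ^{x_i} e^{-θ}}{x_i!} \propto θ^{\sum_i x_i} e^{-10θ} = θ^{31} e^{-10θ}, \]

i.e. \(θ \mid \mathbf x \sim \mathrm{Gamma}(32, 10)\), with posterior mean \(3.2\). This exact posterior is a useful reference point for the approximate methods below.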

1.3 Motivation

In this example we used a simple distribution. In reality, the likelihood may be unavailable (e.g. Tukey’s g-and-k distribution1) or intractable (e.g. many convolutions).


Avoid likelihoods! Instead use simulation with numerical and Monte-Carlo methods.


We write a program to simulate data. This requires a source of randomness.

type Gen = MWC.GenIO

-- GeneralizedNewtypeDeriving lifts the ReaderT instances onto Sampler.
newtype Sampler ω = Sampler { runSampler :: ReaderT Gen IO ω }
  deriving (Functor, Applicative, Monad)

sample :: Sampler ω -> Gen -> IO ω
sample = runReaderT . runSampler

1.4 Example Samplers

We can define distributions directly from MWC:

random :: Sampler Double
random = Sampler $ ask >>= lift . MWC.uniform -- U(0, 1); lift from Control.Monad.Trans.Class

Or we can reuse samplers to make new distributions.

uniform :: Double -> Double -> Sampler Double
uniform a b | a <= b = (\x -> (b - a) * x + a) <$> random

bernoulli :: Double -> Sampler Bool
bernoulli p | 0 <= p && p <= 1 = (<=p) <$> random

gaussian :: Double -> Double -> Sampler Double
gaussian μ σ = (\u1 u2 ->
   μ + σ * sqrt (-2 * log u1) * cos (2 * pi * u2)) -- Box–Muller transform
  <$> random <*> random
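
A minimal usage sketch for running these samplers, assuming mwc-random’s createSystemRandom (replicateM is from Control.Monad; the name demo is illustrative):

demo :: IO ()
demo = do
  gen <- MWC.createSystemRandom
  xs  <- sample (replicateM 5 (gaussian 0 1)) gen
  print xs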

1.5 Approximate Bayesian Computation

ABC is a likelihood-free method usable with Sampler.


To use ABC we need:

  • A generative model \(μ : θ → \texttt{Sampler}~ω\)
  • Observations \(\mathbf y : ω\)

To approximate \(p(θ | \mathbf y)\), we consider a candidate \(θ_0\), take \(\mathbf x ← μ (θ_0)\), and compare \(\mathbf x\) with \(\mathbf y\) to apply a weight to \(θ_0\).

To approximate the posterior, this is repeated many times for different \(θ\) using a Monte-Carlo method.
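
A minimal sketch of a single weighting step under these assumptions (all names here are illustrative; the handlers in the following sections make this concrete):

abcStep :: (θ -> Sampler ω) -> ω -> (ω -> ω -> Double) -> θ -> Sampler (θ, Double)
abcStep μ y weight θ0 = do
  x <- μ θ0               -- simulate data from the model at θ0
  return (θ0, weight x y) -- weight θ0 by how close x is to y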

2 Rejection Sampling

2.1 Rejection Sampling

A simple Monte-Carlo method. Sample a candidate from a proposal distribution (here, the prior) and see how “good” it is.

Each candidate is either fully accepted or rejected, i.e. there are no weights besides 0 and 1.

The key operations are abstracted into a “handler” class, RSKernel.
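
The class itself is not shown here; a minimal sketch consistent with the algorithm and the instances below (mirroring MHKernel in section 4) would be:

class RSKernel k a | k -> a where
  propose :: k -> Sampler a            -- draw a candidate
  accepts :: k -> a -> Sampler Bool    -- decide whether to keep it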

The algorithm can then be written generically:

rs :: RSKernel k a => Int -> k -> Sampler [a]
rs 0 _ = return []
rs n k = do
  x <- propose k
  a <- k `accepts` x
  if a
    then (x:) <$> rs (n-1) k
    else rs (n-1) k

2.2 Approximate Bayesian Computation

To enable ABC via rejection sampling, we just need to provide a handler:

data RSABC θ ω = RSABC
  { observations :: ω
  , model :: θ -> Sampler ω
  , prior :: Sampler θ }

instance Eq ω => RSKernel (RSABC θ ω) θ where
  propose :: RSABC θ ω -> Sampler θ
  propose RSABC{..} = prior

  accepts :: RSABC θ ω -> θ -> Sampler Bool
  accepts RSABC{..} θ = do
    x <- model θ
    return $ x == observations
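
A hypothetical usage sketch (the data and names are illustrative): recovering a coin’s bias from ten flips, with replicateM from Control.Monad.

coinABC :: RSABC Double [Bool]
coinABC = RSABC
  { observations = [True, True, False, True, False, True, True, False, True, True]
  , model        = \p -> replicateM 10 (bernoulli p)
  , prior        = uniform 0 1 }

approxPosterior :: Sampler [Double]
approxPosterior = rs 10000 coinABC

Exact equality makes acceptances rare here, which motivates the tolerance introduced next.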

3 Necessary Improvements

3.1 Tolerance

To increase the acceptance rate, we usually use a weaker condition, that \(|| \mathbf x - \mathbf y || \leq ε\).

data RSABC θ ω = RSABC
  { distance :: ω -> ω -> Double
  , ϵ :: Double
  , ... } -- plus the fields from before

instance RSKernel (RSABC θ ω) θ where
  ...
  accepts RSABC{..} θ = do
    x <- model θ
    return $ x `distance` observations <= ϵ

This is only strictly necessary for continuous distributions.
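
For reference, a consolidated sketch of the tolerance-based handler, filling in the fields elided above:

data RSABC θ ω = RSABC
  { observations :: ω
  , model        :: θ -> Sampler ω
  , prior        :: Sampler θ
  , distance     :: ω -> ω -> Double
  , ϵ            :: Double }

instance RSKernel (RSABC θ ω) θ where
  propose RSABC{..} = prior
  accepts RSABC{..} θ = do
    x <- model θ
    return $ x `distance` observations <= ϵ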

3.2 Dimension Reduction

We rarely use only one observation, so \(\mathbf y\) is a long vector.

This is affected by the “Curse of Dimensionality”: as the dimension grows, \(\mathbf x\) and \(\mathbf y\) are almost always far apart.

We mitigate this with dimension reduction methods2, e.g. summary statistics.

3.3 Summary Statistics

Ideally, summary \(S\) is “sufficient”, i.e. \(p(θ|S(\mathbf y)) = p(θ|\mathbf y)\).


Sufficient summaries are hard to find. In practice \(S\) is “informative”.


Often \(S\) maps raw data to e.g. mean, variance, quantiles…
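
For example, a small summary mapping raw data to its mean and variance might be written as follows (the name summarise is illustrative):

summarise :: [Double] -> (Double, Double)
summarise xs = (mean, var)
  where
    n    = fromIntegral (length xs)
    mean = sum xs / n
    var  = sum [(x - mean) ^ 2 | x <- xs] / n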

4 Metropolis-Hastings

4.1 Metropolis-Hastings

Metropolis-Hastings improves on rejection sampling by staying near accepted samples.

Since we stay in “good” regions and move out of “bad” regions, we approximate the posterior sooner.

class MHKernel k a | k -> a where
  perturb :: k -> a -> Sampler a
  accepts :: k -> a -> a -> Sampler Bool

mh :: MHKernel k a => Int -> k -> a -> Sampler [a]
mh 0 _ _ = return []
mh n k last = do
  proposed <- k `perturb` last
  a <- accepts k last proposed
  if a
    then (proposed:) <$> mh (n-1) k proposed
    else (last:) <$> mh (n-1) k last

4.2 Approximate Bayesian Computation

data MHABC θ ω = MHABC
  { observations :: ω
  , model :: θ -> Sampler ω
  , priorD :: θ -> Double -- ^ density
  , transition :: θ -> Sampler θ -- ^ assumed symmetrical
  , distance :: ω -> ω -> Double
  , ϵ :: Double }

instance MHKernel (MHABC θ ω) θ where
  perturb :: MHABC θ ω -> θ -> Sampler θ
  perturb MHABC{..} = transition

  accepts :: MHABC θ ω -> θ -> θ -> Sampler Bool
  accepts MHABC{..} θ θ' = do
    x <- model θ'
    if x `distance` observations <= ϵ
      then bernoulli $ min 1 (priorD θ' / priorD θ)
      else return False
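
A hypothetical usage sketch: inferring the mean of a Gaussian from a summary statistic (the sample mean), with a symmetric random-walk proposal. The data and tuning values are illustrative, and replicateM is from Control.Monad.

mhExample :: Sampler [Double]
mhExample = do
  θ0 <- uniform 0 10
  mh 10000 kernel θ0
  where
    obs    = [3.1, 4.0, 4.4, 3.7, 4.9]
    kernel = MHABC
      { observations = sum obs / 5                          -- summary: sample mean
      , model        = \θ -> (/ 5) . sum <$> replicateM 5 (gaussian θ 1)
      , priorD       = \θ -> if 0 <= θ && θ <= 10 then 0.1 else 0
      , transition   = \θ -> gaussian θ 0.5                 -- symmetric proposal
      , distance     = \x y -> abs (x - y)
      , ϵ            = 0.2 }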

5 Conclusion

5.1 Summary

  • Implemented Approximate Bayesian Computation, particularly for the Tukey g-and-k distribution.
  • Learnt some Monte-Carlo methods and tried to implement them generally.

5.2 Possible Developments

  • Parallelisation4


  • Adaptive Metropolis
  • Particle Filter and other algorithms
  • Allow p.d.f-based distributions3


  • Use kernel density estimate with approximated posterior
  • Find peaks with AD/stochastic gradient ascent

6 Further Reading

6.1 Further Reading

7 Footnotes

1 Like the Gaussian distribution, but with additional skew and kurtosis parameters.

2 There are also full-data approaches, which have attracted recent interest.

3 Possible via Monte-Carlo methods.

4 Attempts were made… anyone?