
Inference in Haskell for Approximate Bayesian Computation

1 Approximate Bayesian Computation

1.1 Bayesian Inference

Bayes’ theorem is often written with hypotheses \(θ\) and evidence \(\mathbf x\):

\[ \overbrace{p(θ \mid \mathbf x)}^{\text{posterior}} \propto \overbrace{p(\mathbf x \mid θ)}^{\text{likelihood}} \; \overbrace{p(θ)}^{\text{prior}}. \]

  • The posterior is the target
  • The likelihood is the model
  • The prior is an assumption

1.2 Example

Suppose \(\mathbf x = (3~1~7~5~1~2~3~4~3~2)\) comes from \(Poi(θ)\) assuming nothing about \(θ\).

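A short sketch of the analytic posterior for this example, assuming a flat (improper) prior \(p(θ) \propto 1\) for \(θ > 0\):

\[ p(θ \mid \mathbf x) \propto \prod_{i=1}^{10} \frac{θ^{x_i} e^{-θ}}{x_i!} \propto θ^{\sum_i x_i} e^{-10θ} = θ^{31} e^{-10θ}, \]

i.e. \(θ \mid \mathbf x \sim \mathrm{Gamma}(32, 10)\), with posterior mean \(3.2\). This exact posterior is a useful reference point for the approximate methods below.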

1.3 Motivation

In this example we used a simple distribution. In reality, the likelihood may be unavailable (e.g. Tukey’s g-and-k distribution1) or intractable (e.g. many convolutions).


Avoid likelihoods! Instead use simulation with numerical and Monte-Carlo methods.


We write a program to simulate data. This requires a source of randomness.

type Gen = MWC.GenIO

-- GeneralizedNewtypeDeriving lifts the ReaderT instances onto Sampler.
newtype Sampler ω = Sampler { runSampler :: ReaderT Gen IO ω }
  deriving (Functor, Applicative, Monad)

sample :: Sampler ω -> Gen -> IO ω
sample = runReaderT . runSampler

1.4 Example Samplers

We can define distributions directly from MWC:

random :: Sampler Double
random = Sampler $ ask >>= lift . MWC.uniform -- U(0, 1); lift from Control.Monad.Trans.Class

Or we can reuse samplers to make new distributions.

uniform :: Double -> Double -> Sampler Double
uniform a b | a <= b = (\x -> (b - a) * x + a) <$> random

bernoulli :: Double -> Sampler Bool
bernoulli p | 0 <= p && p <= 1 = (<=p) <$> random

gaussian :: Double -> Double -> Sampler Double
gaussian μ σ = (\u1 u2 ->
   μ + σ * sqrt (-2 * log u1) * cos (2 * pi * u2)) -- Box–Muller transform
  <$> random <*> random
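
A minimal usage sketch for running these samplers, assuming mwc-random’s createSystemRandom (replicateM is from Control.Monad; the name demo is illustrative):

demo :: IO ()
demo = do
  gen <- MWC.createSystemRandom
  xs  <- sample (replicateM 5 (gaussian 0 1)) gen
  print xs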

1.5 Approximate Bayesian Computation

ABC is a likelihood-free method usable with Sampler.


To use ABC we need:

  • A generative model \(μ : θ → \texttt{Sampler}~ω\)
  • Observations \(\mathbf y : ω\)

To approximate \(p(θ | \mathbf y)\), we consider a candidate \(θ_0\), take \(\mathbf x ← μ (θ_0)\), and compare \(\mathbf x\) with \(\mathbf y\) to apply a weight to \(θ_0\).

To approximate the posterior, this is repeated many times for different \(θ\) using a Monte-Carlo method.
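
A minimal sketch of a single weighting step under these assumptions (all names here are illustrative; the handlers in the following sections make this concrete):

abcStep :: (θ -> Sampler ω) -> ω -> (ω -> ω -> Double) -> θ -> Sampler (θ, Double)
abcStep μ y weight θ0 = do
  x <- μ θ0               -- simulate data from the model at θ0
  return (θ0, weight x y) -- weight θ0 by how close x is to y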

2 Rejection Sampling

2.1 Rejection Sampling

A simple Monte-Carlo method. Sample a candidate from a proposal distribution (here, the prior) and see how “good” it is.

Each candidate is either fully accepted or rejected, i.e. there are no weights besides 0 and 1.

The key operations are abstracted into a “handler” class, RSKernel.
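
The class itself is not shown here; a minimal sketch consistent with the algorithm and the instances below (mirroring MHKernel in section 4) would be:

class RSKernel k a | k -> a where
  propose :: k -> Sampler a            -- draw a candidate
  accepts :: k -> a -> Sampler Bool    -- decide whether to keep it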

The algorithm can then be written generically:

rs :: RSKernel k a => Int -> k -> Sampler [a]
rs 0 _ = return []
rs n k = do
  x <- propose k
  a <- k `accepts` x
  if a
    then (x:) <$> rs (n-1) k
    else rs (n-1) k

2.2 Approximate Bayesian Computation

To enable ABC via rejection sampling, we just need to provide a handler:

data RSABC θ ω = RSABC
  { observations :: ω
  , model :: θ -> Sampler ω
  , prior :: Sampler θ }

instance Eq ω => RSKernel (RSABC θ ω) θ where
  propose :: RSABC θ ω -> Sampler θ
  propose RSABC{..} = prior

  accepts :: RSABC θ ω -> θ -> Sampler Bool
  accepts RSABC{..} θ = do
    x <- model θ
    return $ x == observations
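
A hypothetical usage sketch (the data and names are illustrative): recovering a coin’s bias from ten flips, with replicateM from Control.Monad.

coinABC :: RSABC Double [Bool]
coinABC = RSABC
  { observations = [True, True, False, True, False, True, True, False, True, True]
  , model        = \p -> replicateM 10 (bernoulli p)
  , prior        = uniform 0 1 }

approxPosterior :: Sampler [Double]
approxPosterior = rs 10000 coinABC

Exact equality makes acceptances rare here, which motivates the tolerance introduced next.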

3 Necessary Improvements

3.1 Tolerance

To increase the acceptance rate, we usually use a weaker condition, that \(|| \mathbf x - \mathbf y || \leq ε\).

data RSABC θ ω = RSABC
  { distance :: ω -> ω -> Double
  , ϵ :: Double
  , ... } -- plus the fields from before

instance RSKernel (RSABC θ ω) θ where
  ...
  accepts RSABC{..} θ = do
    x <- model θ
    return $ x `distance` observations <= ϵ

This is only strictly necessary for continuous distributions.
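
For reference, a consolidated sketch of the tolerance-based handler, filling in the fields elided above:

data RSABC θ ω = RSABC
  { observations :: ω
  , model        :: θ -> Sampler ω
  , prior        :: Sampler θ
  , distance     :: ω -> ω -> Double
  , ϵ            :: Double }

instance RSKernel (RSABC θ ω) θ where
  propose RSABC{..} = prior
  accepts RSABC{..} θ = do
    x <- model θ
    return $ x `distance` observations <= ϵ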

3.2 Dimension Reduction

We rarely use only one observation, so \(\mathbf y\) is a long vector.

This is affected by the “Curse of Dimensionality”: as the dimension grows, \(\mathbf x\) and \(\mathbf y\) are almost always far apart.

We mitigate this with dimension reduction methods2, e.g. summary statistics.

3.3 Summary Statistics

Ideally, summary \(S\) is “sufficient”, i.e. \(p(θ|S(\mathbf y)) = p(θ|\mathbf y)\).


Sufficient summaries are hard to find. In practice \(S\) is “informative”.


Often \(S\) maps raw data to e.g. mean, variance, quantiles…
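
For example, a small summary mapping raw data to its mean and variance might be written as follows (the name summarise is illustrative):

summarise :: [Double] -> (Double, Double)
summarise xs = (mean, var)
  where
    n    = fromIntegral (length xs)
    mean = sum xs / n
    var  = sum [(x - mean) ^ 2 | x <- xs] / n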

4 Metropolis-Hastings

4.1 Metropolis-Hastings

Metropolis-Hastings improves on rejection sampling by staying near accepted samples.

Since we stay in “good” regions and move out of “bad” regions, we approximate the posterior sooner.

class MHKernel k a | k -> a where
  perturb :: k -> a -> Sampler a
  accepts :: k -> a -> a -> Sampler Bool

mh :: MHKernel k a => Int -> k -> a -> Sampler [a]
mh 0 _ _ = return []
mh n k last = do
  proposed <- k `perturb` last
  a <- accepts k last proposed
  if a
    then (proposed:) <$> mh (n-1) k proposed
    else (last:) <$> mh (n-1) k last

4.2 Approximate Bayesian Computation

data MHABC θ ω = MHABC
  { observations :: ω
  , model :: θ -> Sampler ω
  , priorD :: θ -> Double -- ^ density
  , transition :: θ -> Sampler θ -- ^ assumed symmetrical
  , distance :: ω -> ω -> Double
  , ϵ :: Double }

instance MHKernel (MHABC θ ω) θ where
  perturb :: MHABC θ ω -> θ -> Sampler θ
  perturb MHABC{..} = transition

  accepts :: MHABC θ ω -> θ -> θ -> Sampler Bool
  accepts MHABC{..} θ θ' = do
    x <- model θ'
    if x `distance` observations <= ϵ
      then bernoulli $ min 1 (priorD θ' / priorD θ)
      else return False
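
A hypothetical usage sketch: inferring the mean of a Gaussian from a summary statistic (the sample mean), with a symmetric random-walk proposal. The data and tuning values are illustrative, and replicateM is from Control.Monad.

mhExample :: Sampler [Double]
mhExample = do
  θ0 <- uniform 0 10
  mh 10000 kernel θ0
  where
    obs    = [3.1, 4.0, 4.4, 3.7, 4.9]
    kernel = MHABC
      { observations = sum obs / 5                          -- summary: sample mean
      , model        = \θ -> (/ 5) . sum <$> replicateM 5 (gaussian θ 1)
      , priorD       = \θ -> if 0 <= θ && θ <= 10 then 0.1 else 0
      , transition   = \θ -> gaussian θ 0.5                 -- symmetric proposal
      , distance     = \x y -> abs (x - y)
      , ϵ            = 0.2 }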

5 Conclusion

5.1 Summary

  • Implemented Approximate Bayesian Computation, particularly for the Tukey g-and-k distribution.
  • Learnt some Monte-Carlo methods and tried to implement them generally.

5.2 Possible Developments

  • Parallelisation4


  • Adaptive Metropolis
  • Particle Filter and other algorithms
  • Allow p.d.f-based distributions3


  • Use kernel density estimate with approximated posterior
  • Find peaks with AD/stochastic gradient ascent

6 Further Reading

6.1 Further Reading

7 Footnotes

1 Like the Gaussian distribution, but with additional skew and kurtosis parameters.

2 There are also full-data approaches, which have attracted recent interest.

3 Possible via Monte-Carlo methods.

4 Attempts were made… anyone?