Bayes' theorem is often rewritten with hypotheses \(θ\) and evidence \(\mathbf x\):
\[ \overbrace{p(θ \mid \mathbf x)}^{\text{posterior}} \propto \overbrace{p(\mathbf x \mid θ)}^{\text{likelihood}} \; \overbrace{p(θ)}^{\text{prior}}. \]
- The posterior is the target
- The likelihood is the model
- The prior is an assumption
Suppose \(\mathbf x = (3~1~7~5~1~2~3~4~3~2)\) comes from \(\mathrm{Poi}(θ)\), assuming nothing about \(θ\).
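As a quick sanity check (assuming a flat prior \(p(θ) \propto 1\), which the slide leaves unstated), this example has a closed-form posterior:
\[ p(θ \mid \mathbf x) \;\propto\; \prod_{i=1}^{10} \frac{θ^{x_i} e^{-θ}}{x_i!} \;\propto\; θ^{31} e^{-10θ}, \]
i.e. a \(\mathrm{Gamma}(32, 10)\) density.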
In this example we used a simple distribution. In reality, the likelihood may be unavailable (e.g. Tukey's g-and-k distribution¹) or intractable (e.g. involving many convolutions).
\hfill
Avoid likelihoods! Instead use simulation with numerical and Monte-Carlo methods.
\hfill
Write a program to simulate data. Requires a source of randomness.
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import qualified System.Random.MWC as MWC
type Gen = MWC.GenIO
newtype Sampler ω = Sampler { runSampler :: ReaderT Gen IO ω }
  deriving (Functor, Applicative, Monad)
sample :: Sampler ω -> Gen -> IO ω
sample = runReaderT . runSampler
Primitive distributions can be defined straight from MWC:
random :: Sampler Double
random = Sampler $ ask >>= lift . MWC.uniform -- U(0, 1)
Or we can reuse samplers to make new distributions.
uniform :: Double -> Double -> Sampler Double
uniform a b | a <= b = (\x -> (b - a) * x + a) <$> random

bernoulli :: Double -> Sampler Bool
bernoulli p | 0 <= p && p <= 1 = (<= p) <$> random

-- Box-Muller transform; σ is the standard deviation.
gaussian :: Double -> Double -> Sampler Double
gaussian μ σ = (\u1 u2 ->
    μ + σ * sqrt (-2 * log u1) * cos (2 * pi * u2))
  <$> random <*> random
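A minimal usage sketch (the demo name and the choice of createSystemRandom for seeding are illustrative, not from the slides):
import Control.Monad (replicateM)

demo :: IO ()
demo = do
  g  <- MWC.createSystemRandom                -- fresh generator from system entropy
  xs <- sample (replicateM 5 (gaussian 0 1)) g
  print xs                                    -- five draws from N(0, 1)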
ABC (Approximate Bayesian Computation) is a likelihood-free method usable with Sampler.
\hfill
To use ABC we need:
- A generative model \(μ : θ → \texttt{Sampler}~ω\)
- Observations \(\mathbf y : ω\)
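For the running Poisson example, the generative model could be sketched as below; the poisson sampler (Knuth's multiplication method) and poissonModel are illustrative additions, not from the slides.
-- Hypothetical Poisson sampler: multiply uniforms until the product
-- drops below e^(-θ); the number of factors drawn, minus one, is the result.
poisson :: Double -> Sampler Int
poisson θ = go 0 1
  where
    limit = exp (-θ)
    go k p = do
      u <- random
      let p' = p * u
      if p' <= limit then return k else go (k + 1) p'

-- Generative model μ : θ → Sampler ω for ten i.i.d. counts.
poissonModel :: Double -> Sampler [Int]
poissonModel θ = replicateM 10 (poisson θ)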
\hfill
To approximate \(p(θ \mid \mathbf y)\), we consider a candidate \(θ_0\), take \(\mathbf x ← μ(θ_0)\), and compare \(\mathbf x\) with \(\mathbf y\) to apply a weight to \(θ_0\).
\hfill
To approximate the posterior, this is repeated many times for different \(θ\) using a Monte-Carlo method.
A simple Monte-Carlo method: sample \(θ\) from the prior, simulate \(\mathbf x\), and see how “good” it is.
Either \(\mathbf x\) is fully accepted or fully rejected, i.e. there are no weights besides 0 and 1.
\hfill
We abstract the key operations into a “handler” class, RSKernel.
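The class itself is not shown on the slide; a minimal definition consistent with rs and the instances below (assuming the FunctionalDependencies extension) would be:
class RSKernel k a | k -> a where
  propose :: k -> Sampler a          -- draw a candidate
  accepts :: k -> a -> Sampler Bool  -- decide whether to keep it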
Now the algorithm can be written generically:
rs :: RSKernel k a => Int -> k -> Sampler [a]
rs 0 _ = return []
rs n k = do
  x <- propose k
  a <- k `accepts` x
  if a
    then (x:) <$> rs (n-1) k
    else rs (n-1) k
To enable ABC via rejection sampling, we just need to provide a handler:
data RSABC θ ω = RSABC
  { observations :: ω
  , model        :: θ -> Sampler ω
  , prior        :: Sampler θ }

instance Eq ω => RSKernel (RSABC θ ω) θ where
  propose :: RSABC θ ω -> Sampler θ
  propose RSABC{..} = prior

  accepts :: RSABC θ ω -> θ -> Sampler Bool
  accepts RSABC{..} θ = do
    x <- model θ
    return $ x == observations
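A hypothetical instantiation for the running Poisson example, using the poissonModel sketch from earlier; exact matching of ten counts almost never succeeds, which motivates the relaxation below.
poissonRS :: RSABC Double [Int]
poissonRS = RSABC
  { observations = [3, 1, 7, 5, 1, 2, 3, 4, 3, 2]
  , model        = poissonModel
  , prior        = uniform 0 10 }  -- an assumed prior range, for illustration

-- posterior samples: sample (rs 10000 poissonRS) g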
To increase the acceptance rate, we usually use a weaker condition: \(\lVert \mathbf x - \mathbf y \rVert \leq ε\).
data RSABC θ ω = RSABC
  { distance :: ω -> ω -> Double
  , ϵ        :: Double
  , ... }

instance RSKernel (RSABC θ ω) θ where
  ...
  accepts RSABC{..} θ = do
    x <- model θ
    return $ x `distance` observations <= ϵ
This is only strictly necessary for continuous distributions.
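The relaxed rule means rejection ABC targets an approximate posterior (a standard characterisation, not stated on the slide):
\[ p_{\mathrm{ABC}}(θ \mid \mathbf y) \;\propto\; p(θ)\, \Pr\bigl(\lVert \mathbf x - \mathbf y \rVert \leq ε \mid θ\bigr), \]
which recovers the true posterior as \(ε → 0\).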
We rarely use only one observation, so \(\mathbf y\) is a long vector.
\hfill
Affected by the “Curse of Dimensionality”: as the dimension grows, \(\mathbf x\) and \(\mathbf y\) are almost always far apart.
\hfill
Solve this with dimension reduction methods², e.g. summary statistics.
Ideally, summary \(S\) is “sufficient”, i.e. \(p(θ|S(\mathbf y)) = p(θ|\mathbf y)\).
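For the running Poisson example the sample sum is sufficient (a standard fact, added for concreteness): the posterior depends on the data only through \(\sum_i y_i\),
\[ p(θ \mid \mathbf y) \;\propto\; θ^{\sum_i y_i} e^{-nθ}\, p(θ), \]
so \(S(\mathbf y) = \sum_i y_i\) (equivalently the sample mean) loses no information about \(θ\).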
\hfill
Sufficient summaries are hard to find. In practice \(S\) is “informative”.
\hfill
Often \(S\) maps raw data to e.g. mean, variance, quantiles…
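A sketch of such a summary and a distance on summaries (the names summarise and summaryDistance are illustrative):
-- Map raw data to (mean, variance).
summarise :: [Double] -> (Double, Double)
summarise xs = (m, v)
  where
    n = fromIntegral (length xs)
    m = sum xs / n
    v = sum [ (x - m) ^ 2 | x <- xs ] / n

-- Euclidean distance between summaries, usable as a `distance` field.
summaryDistance :: [Double] -> [Double] -> Double
summaryDistance xs ys =
  let (mx, vx) = summarise xs
      (my, vy) = summarise ys
  in  sqrt ((mx - my) ^ 2 + (vx - vy) ^ 2)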
Metropolis-Hastings improves on rejection sampling by staying near accepted samples.
Since we stay in “good” regions and move out of “bad” regions, we will approximate the posterior sooner.
class MHKernel k a | k -> a where
  perturb :: k -> a -> Sampler a
  accepts :: k -> a -> a -> Sampler Bool

mh :: MHKernel k a => Int -> k -> a -> Sampler [a]
mh 0 _ _    = return []
mh n k last = do
  proposed <- k `perturb` last
  a <- accepts k last proposed
  if a
    then (proposed:) <$> mh (n-1) k proposed
    else (last:)     <$> mh (n-1) k last
data MHABC θ ω = MHABC
  { observations :: ω
  , model        :: θ -> Sampler ω
  , priorD       :: θ -> Double      -- ^ density
  , transition   :: θ -> Sampler θ   -- ^ assumed symmetrical
  , distance     :: ω -> ω -> Double
  , ϵ            :: Double }

instance MHKernel (MHABC θ ω) θ where
  perturb :: MHABC θ ω -> θ -> Sampler θ
  perturb MHABC{..} = transition

  accepts :: MHABC θ ω -> θ -> θ -> Sampler Bool
  accepts MHABC{..} θ θ' = do
    x <- model θ'
    if x `distance` observations <= ϵ
      then bernoulli $ min 1 (priorD θ' / priorD θ)
      else return False
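A hypothetical MHABC setup for the running Poisson example; the prior density, random-walk step and ε values are illustrative, and DuplicateRecordFields is assumed since RSABC reuses the field names.
poissonMH :: MHABC Double [Int]
poissonMH = MHABC
  { observations = [3, 1, 7, 5, 1, 2, 3, 4, 3, 2]
  , model        = poissonModel
  , priorD       = \θ -> if 0 <= θ && θ <= 10 then 0.1 else 0  -- U(0, 10) density
  , transition   = \θ -> gaussian θ 0.5                        -- symmetric random walk
  , distance     = \xs ys -> abs (mean xs - mean ys)           -- summarise by the mean
  , ϵ            = 0.5 }
  where
    mean zs = fromIntegral (sum zs) / fromIntegral (length zs)

-- posterior samples: sample (mh 10000 poissonMH 5.0) g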
- Implemented Approximate Bayesian Computation, particularly for the Tukey g-and-k distribution.
- Learnt some Monte-Carlo methods and tried to implement them generally.
- Parallelisation⁴
\hfill
- Adaptive Metropolis
- Particle Filter and other algorithms
- Allow p.d.f.-based distributions³
\hfill
- Use kernel density estimate with approximated posterior
- Find peaks with AD/stochastic gradient ascent
- Marjoram et al
- Umberto Picchini’s slides
- Fearnhead and Prangle — Constructing Summary Statistics
- Blum et al — Comparative Review of Dimension Reduction Methods
- Drovandi and Frazier — Comparison with Full Data Methods
1 Like a Gaussian distribution with skew and kurtosis.
2 Some full-data approaches exist, with recent interest
3 Possible via Monte-Carlo methods
4 Attempts were made… anyone?