Replies: 1 comment 1 reply
The VI loss looks like it hasn't stabilized. Can you try reducing the learning rate (e.g., to 0.01 or 0.001)? It may just be adapting too much between iterations.
On Thu, Feb 13, 2025 at 1:39 PM, katesjef wrote:
Hi all!
As described in several previous posts (e.g., #527, #323), I have quite a complex hierarchical model and have had trouble achieving convergence for non-decision time (the distribution runs up against a ceiling). One of the workarounds I am trying is to use variational inference instead.
When I run VI, even for long periods of time, I see that the loss is stuck at a very high value (~200k). Running a posterior predictive check also reveals opposite behavior for observed and predicted RT.
My questions are: Is it worth it to continue pursuing VI in this case? If so, are there ways to improve optimization and/or efficiency? Or might there be better alternatives to this approach?
Thank you for this great package!
HSSM v0.2.4

```python
param_t = {
    'name': 't',
    'formula': 't ~ 1',
    'prior': {
        'Intercept': {'name': 'Gamma', 'mu': 0.15, 'sigma': 0.13, 'initval': 0.01},
    },
    'bounds': (0, 2),
}
```
When using NUTS:
![NUTS_t_intercept_trace](https://github.com/user-attachments/assets/dce830ac-f94f-4e6d-96f5-c65a056f837a)
![NUTS_t_intercept_summary](https://github.com/user-attachments/assets/a5f2de91-f034-4ba9-a119-a0d4f82b6aa4)
And a posterior predictive plot of:
![NUTS_ppc](https://github.com/user-attachments/assets/bb53acbe-8049-42e3-b76f-0e3af4966042)
When using VI:
(from the tutorial)
```python
import pymc as pm
from pymc.blocking import DictToArrayBijection, RaveledVars

with model_6.pymc_model:
    advi = pm.FullRankADVI()
    start = model_6.pymc_model.initial_point()
    vars_dict = {var.name: var for var in model_6.pymc_model.continuous_value_vars}
    x0 = DictToArrayBijection.map(
        {var_name: value for var_name, value in start.items() if var_name in vars_dict}
    )
    # Record the approximation's mean and std at every iteration
    tracker = pm.callbacks.Tracker(
        mean=lambda: DictToArrayBijection.rmap(
            RaveledVars(advi.approx.mean.eval(), x0.point_map_info), start
        ),
        std=lambda: DictToArrayBijection.rmap(
            RaveledVars(advi.approx.std.eval(), x0.point_map_info), start
        ),
    )
    approx = advi.fit(n=1000000, callbacks=[tracker])
    vi_posterior_samples = approx.sample(1000)
```
![VI_means](https://github.com/user-attachments/assets/8b4a4506-f83a-4add-8594-2e1555c670a8)
![VI_loss](https://github.com/user-attachments/assets/a6ebb760-4c40-4a28-ab5c-e8c180ef9dc0)
A "t" posterior of:
![VI_t_intercept_posterior](https://github.com/user-attachments/assets/b9a8bcaf-7534-422e-9304-03e32bd40f61)
And the posterior predictive plot:
![VI_ppc](https://github.com/user-attachments/assets/5df16d2e-cbca-4319-9c65-095108600b47)