Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Levy distribution #317

Merged
merged 3 commits into from
Jan 30, 2025
Merged

Conversation

Qazalbash
Copy link
Contributor

This PR adds the implementation of Levy distribution. There is some numerical error in checking PDF integrate to be equal to cdf. I revisited the implementation several times and found no ambiguity. Therefore, I am pushing to seek some guidance.

@YeungOnion
Copy link
Contributor

Hmm, while our check for consistency being an integrator is likely susceptible to the usual nuance of general forward integrators.

The error message, below, indicates that two steps into the integration it's already off by 20%. Since it checks at each step this is pretty significant.

failures:

---- distribution::levy::tests::test_continuous stdout ----
Integral of pdf doesn't equal cdf!
Integration from 1 by 0.003182932192553111 to 1.0063658643851063 = 0.006119699097797403
cdf = 0.005069737329462489

I'm not certain this is the case, but the implementations appear correct to me as well.

Options I expect could be

  • inappropriate step size for that part of the function, which I don't think is the case considering it's not terribly steep near the mean
  • the erfc implementation isn't sufficiently accurate, but we use that elsewhere, so we would have expected to see it elsewhere
  • my first thought, the integrator is insufficient. The current one integrates a linear approximation to the function, and perhaps we need better (but again the function isn't very steep here, so I'm unsure this is the case).

We may need to update the check for continuous distributions to be more robust integration or rely on other constraints. I can open this part as a different issue.

@Qazalbash
Copy link
Contributor Author

Hi, I am also contributing a little bit to NumPyro, a Python-based PPL. Their test suites are big and quite good. Looking at them may help.

@YeungOnion
Copy link
Contributor

YeungOnion commented Jan 19, 2025

That's a very cool project, along with JAX. I think I may have seen JAX before and forgot about it.

I took a pass through by skimming, searching, and with an LLM. Couldn't find one in that test suite.


I've also been thinking, perhaps this is a little unnecessary: testing values for the function will isolate precision effects to and precision checking (done by approx).

Perhaps if we had a default impl of PDF for a CDF then this may have been a stronger requirement. But instead, an alternative sanity check might be going the other way and verifying a finite difference in the derivative over some "well-chosen" interval.

@Qazalbash would you be willing to write an internal function analogous to the integrator for this and use that test instead? You could use it standalone in the test and for later development we could see if it works replacing the integrator within internal::check_continuous_distribution

@Qazalbash
Copy link
Contributor Author

Perhaps if we had a default impl of PDF for a CDF then this may have been a stronger requirement. But instead, an alternative sanity check might be going the other way and verifying a finite difference in the derivative over some "well-chosen" interval.

I am in favor of checking whether the gradient of a CDF equals a PDF. Is there any good autograd we can use?

Qazalbash would you be willing to write an internal function analogous to the integrator for this and use that test instead? You could use it standalone in the test and for later development we could see if it works replacing the integrator within internal::check_continuous_distribution

I will experiment with some, Just to let you know it will not be soon.

@Qazalbash
Copy link
Contributor Author

@YeungOnion check #320 for derivatives of cdf is equal to pdf.

Copy link

codecov bot commented Jan 29, 2025

Codecov Report

Attention: Patch coverage is 97.37991% with 12 lines in your changes missing coverage. Please review.

Project coverage is 94.40%. Comparing base (71e010b) to head (d426d4b).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/distribution/levy.rs 97.37% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #317      +/-   ##
==========================================
+ Coverage   94.27%   94.40%   +0.13%     
==========================================
  Files          58       59       +1     
  Lines       12974    13432     +458     
==========================================
+ Hits        12231    12681     +450     
- Misses        743      751       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Qazalbash
Copy link
Contributor Author

On Windows, there is a difference at I guess, the 12th decimal place, should we ignore it?

@YeungOnion
Copy link
Contributor

Was harder to find than I expected, but the toolchain is using MSVC per this workflow
so it's targeting a different math library on windows. In some sense, it's nice to know that things aren't exactly the same. If we switched to using gcc on windows, and there's still a difference that would be really useful to know.

I'm open to

  • windows workflows for both MSVC and GCC, and then we can modify the MSVC toolchain to run last of the 4 OS's and not fail fast, so if anyone wants to use MSVC then they'll be able to see how different things are.
  • or excluding MSVC from scope and making it clear on the README. Besides, people should be running unit tests on their machines.

I think I like the second idea better, but I want to know if anyone is using Rust with MSVC. nalgebra has it as a supported platform on i686, but on cursory glance, couldn't find where they test that.


Aside: It's honestly a little surprising that this is the first time the difference has been large enough to catch. The only clear distinctions appears to be the float power, since add, divide, and multiply are well defined by IEEE, but perhaps one is using fused multiply-add when not explicit and the other is not?

@YeungOnion
Copy link
Contributor

What I can say is that this is not to do with a correct implementation on the Levy distribution. Also, can you tell me how you generated test cases? I've been wondering about people's different approaches.

@Qazalbash
Copy link
Contributor Author

Qazalbash commented Jan 29, 2025

I think I like the second idea better, but I want to know if anyone is using Rust with MSVC. nalgebra has it as a supported platform on i686, but on cursory glance, couldn't find where they test that.

I also think this is right!

What I can say is that this is not to do with a correct implementation on the Levy distribution. Also, can you tell me how you generated test cases? I've been wondering about people's different approaches.

They are mathematically correct, I manually computed the equations in a separate Rust program.

@YeungOnion
Copy link
Contributor

Oh that's good! I know we've got a fair bit against scipy. What's the other library or libraries you use?

@YeungOnion YeungOnion merged commit dd45aed into statrs-dev:master Jan 30, 2025
9 of 10 checks passed
@Qazalbash Qazalbash deleted the levy-dist branch January 30, 2025 05:02
@Qazalbash
Copy link
Contributor Author

Oh that's good! I know we've got a fair bit against scipy. What's the other library or libraries you use?

For my research work, I predominantly use the JAX python framework. I also make open-source contributions to that ecosystem. As I mentioned before, I use numpyro for stats and writing a lightweight one for my personal research with almost similar APIs.

I saw this project quite a while ago and was learning rust, so I found it a sweet place to apply my rust knowledge while contributing to open-source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants