Gamma Hurdle DWPC model #123

Open
zietzm opened this issue Jun 21, 2018 · 2 comments

@zietzm
Collaborator

zietzm commented Jun 21, 2018

DWPCs computed across multiple permutations for a single (source, target) node pair have a zero-inflated distribution.
image

Meanwhile, the nonzero values roughly follow a gamma distribution.
image

Along a single metapath, DWPC values for a given (source degree, target degree) combination have similar distributions across all permutations. Below, orange is the distribution of nonzero values and blue is the distribution of all values, including zeros.

image

image

image

Full pdf

Previously, we modeled the distribution of permuted DWPC values for a single (source, target) combination as a normal distribution, with the mean and variance calculated from the permuted values. For example, the plot above shows the distribution of permuted DWPC values for the (source degree, target degree) combination (448, 6) across 25 permutations.
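The earlier normal approach can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a one-sided p-value (the probability that a permuted DWPC is at least as large as the observed one) and the population variance of all permuted values, zeros included. `normal_pvalue` and its parameters are hypothetical names.

```python
import math
from statistics import NormalDist


def normal_pvalue(dwpc: float, n_perms: int, perm_sum: float, perm_sum_sq: float) -> float:
    """One-sided p-value for an observed DWPC under a normal null.

    Hypothetical helper: the mean and population variance are recovered from
    the sum and sum of squares of all permuted DWPCs (zeros included).
    """
    mean = perm_sum / n_perms
    variance = perm_sum_sq / n_perms - mean ** 2
    if variance <= 0:
        # With only zeros (or identical values) the standard deviation is 0.
        raise ValueError("need at least one nonzero permuted DWPC")
    std = math.sqrt(variance)
    # Upper-tail probability of the fitted normal at the observed DWPC.
    return 1.0 - NormalDist(mean, std).cdf(dwpc)
```

For permuted values [0, 0, 1, 3] (sum 4, sum of squares 10), an observed DWPC equal to the mean of 1 gives a p-value of 0.5. Note that this normal puts probability mass on negative DWPCs, which cannot occur.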

In general, the gamma hurdle model should fit the nonzero data more precisely than a cut-off normal distribution.
image
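For reference, a standard way to write a gamma hurdle model is given below. It assumes that p is the probability that a permuted DWPC is nonzero, and that k and θ are the shape and scale of the gamma distribution fit to the nonzero values. These symbols are our own and may not match the notation in the formulas discussed in this thread. The last line shows a tail probability formed by multiplying two probabilities: the chance of a nonzero value and the gamma tail.

```latex
\begin{aligned}
\Pr(X = 0) &= 1 - p \\
f_X(x) &= p \cdot \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \quad x > 0 \\
\Pr(X \ge x) &= p \cdot \Pr(Y \ge x), \quad Y \sim \operatorname{Gamma}(k, \theta),\ x > 0
\end{aligned}
```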

@dhimmel
Collaborator

dhimmel commented Jun 26, 2018

This is brilliant work by @zietzm! As I understand it, we can calculate the gamma hurdle parameters from the same summary values that we already compute efficiently: the sum, the sum of squares, and the number of nonzero DWPCs.
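A minimal method-of-moments sketch of that idea follows. The estimator is our assumption because the thread does not name one, and `fit_gamma_hurdle` and `GammaHurdle` are hypothetical names. Zeros add nothing to the sum or the sum of squares, so those totals are also the totals of the nonzero DWPCs alone. The nonzero count is therefore enough to recover the mean and variance of the gamma part.

```python
from typing import NamedTuple, Optional


class GammaHurdle(NamedTuple):
    p_nonzero: float  # probability that a permuted DWPC is nonzero
    shape: float      # gamma shape parameter k
    scale: float      # gamma scale parameter theta


def fit_gamma_hurdle(n_perms: int, n_nonzero: int, perm_sum: float,
                     perm_sum_sq: float) -> Optional[GammaHurdle]:
    """Hypothetical method-of-moments fit of a gamma hurdle model.

    perm_sum and perm_sum_sq may include zeros; zeros do not change them.
    Returns None when the gamma part cannot be estimated.
    """
    if n_nonzero < 2:
        return None  # a variance needs at least two nonzero values
    mean = perm_sum / n_nonzero
    # Sample variance of the nonzero values (n_nonzero - 1 denominator).
    variance = (perm_sum_sq - n_nonzero * mean ** 2) / (n_nonzero - 1)
    if variance <= 0:
        return None  # all nonzero values identical: no gamma fit
    shape = mean ** 2 / variance  # moment match: mean = k * theta
    scale = variance / mean       # moment match: variance = k * theta^2
    return GammaHurdle(n_nonzero / n_perms, shape, scale)
```

For example, permuted DWPCs [0, 0, 1, 3] give `n_perms=4`, `n_nonzero=2`, a sum of 4, and a sum of squares of 10. The fit returns p = 0.5, shape 2, and scale 1. The `None` branches reflect the two-distinct-nonzero-values requirement raised below.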

Some random thoughts I have:

  • In the third formula, is there a missing plus sign?
  • Previously, we needed one nonzero value to compute the standard deviation of the normal distribution. Now we will need two distinct nonzero DWPCs, since the gamma distribution is not fit to any of the zero values.

It would be great to fit these gamma hurdle models and see how well they approximate actual permuted DWPC distributions, both statistically and visually. Ideally, we would evaluate permuted DWPC distributions that span a range of nonzero percentages and skewness.
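One quick statistical check is sketched below. It is not a formal goodness-of-fit test and assumes access to the raw permuted DWPCs for a node pair, not just the summaries. A gamma distribution with shape k has skewness 2/√k. Comparing the sample skewness of the nonzero values against 2/√k̂ from a moment fit therefore shows whether the gamma shape is plausible. The function also reports the nonzero fraction, so results can be binned by both quantities. `hurdle_diagnostics` is a hypothetical name.

```python
import math
from typing import Dict, Sequence


def hurdle_diagnostics(perm_dwpcs: Sequence[float]) -> Dict[str, float]:
    """Hypothetical check: observed skewness of the nonzero permuted DWPCs
    versus the skewness implied by a moment-fitted gamma (2 / sqrt(shape))."""
    nonzero = [x for x in perm_dwpcs if x > 0]
    n = len(nonzero)
    if n < 3:
        raise ValueError("need at least three nonzero DWPCs to estimate skewness")
    mean = sum(nonzero) / n
    m2 = sum((x - mean) ** 2 for x in nonzero) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in nonzero) / n  # third central moment
    if m2 == 0:
        raise ValueError("nonzero DWPCs are all identical")
    observed_skew = m3 / m2 ** 1.5
    shape = mean ** 2 / m2  # method-of-moments gamma shape
    return {
        "nonzero_fraction": n / len(perm_dwpcs),
        "observed_skewness": observed_skew,
        "gamma_implied_skewness": 2 / math.sqrt(shape),
    }
```

A large gap between the observed and gamma-implied skewness would flag distributions where the gamma part fits poorly.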

@zietzm
Collaborator Author

zietzm commented Jun 28, 2018

In the third formula, is there a missing plus sign?

No. The two probabilities are being multiplied rather than added.
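As a sketch of how the two multiplied probabilities could yield a p-value: the formula image is not reproduced here, so this is our reading. The p-value is P(nonzero) times the gamma upper-tail probability. `gamma_sf` evaluates the regularized upper incomplete gamma function with the usual series / continued-fraction split. In practice, `scipy.stats.gamma.sf(x, shape, scale=scale)` computes the same quantity. Both function names here are hypothetical.

```python
import math


def gamma_sf(x: float, shape: float, scale: float) -> float:
    """P(Y >= x) for Y ~ Gamma(shape, scale), i.e. Q(shape, x / scale)."""
    if x <= 0:
        return 1.0
    z = x / scale
    log_prefactor = -z + shape * math.log(z) - math.lgamma(shape)
    eps, tiny = 1e-14, 1e-300
    if z < shape + 1:
        # Series for the lower regularized gamma P(shape, z); Q = 1 - P.
        term = total = 1.0 / shape
        a = shape
        for _ in range(10_000):
            a += 1
            term *= z / a
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(shape, z), modified Lentz method.
    b = z + 1 - shape
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - shape)
        b += 2
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return math.exp(log_prefactor) * h


def hurdle_pvalue(dwpc: float, p_nonzero: float, shape: float, scale: float) -> float:
    """P(X >= dwpc) under a gamma hurdle null: P(nonzero) * P(gamma >= dwpc)."""
    if dwpc <= 0:
        return 1.0  # every permuted DWPC is >= 0
    return p_nonzero * gamma_sf(dwpc, shape, scale)
```

Because the hurdle probability only scales the gamma tail, an observed DWPC can reach a small p-value either through a rare nonzero permuted value (small `p_nonzero`) or a far gamma tail.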

Now we will need two distinct nonzero DWPCs

That's correct. Hopefully this won't be a problem now that we have 200 permutations, but it may be something we have to deal with.
