DWPCs across multiple permutations for a single source, target node combination have a zero-inflated distribution.
Meanwhile, the nonzero values roughly follow a gamma distribution.
The distributions of DWPC values are similar for a single source, target degree combination along a single metapath over all permutations. Below, orange is the distribution of nonzero values, blue is the distribution of values including zeros.
In the past, we have modeled the distribution of permuted DWPC values for a single source, target combination across permutations as a simple normal distribution with the mean and variance calculated from the various permuted values. For example, in the plot above, we show the distribution of permuted DWPC values for the source, target degree combination (448, 6) across 25 permutations.
In general, the gamma hurdle model will more precisely fit the nonzero data than a cut-off normal distribution.
This is brilliant work by @zietzm! As I understand it, we can calculate the gamma hurdle parameters from the same summary values that we are already computing efficiently: sum, sum of squares, and number of nonzero DWPCs.
Some random thoughts I have:
In the third formula, is there a missing plus sign?
Previously, we needed one nonzero value to compute the standard deviation of the normal. Now we will need two distinct nonzero DWPCs, since the gamma distribution will not be fit to any of the zero values.
It would be great to fit these gamma hurdle models and see how well they approximate actual permuted DWPC distributions. We could evaluate this statistically and visually. It would be nice to evaluate a range of permuted DWPC distributions in terms of nonzero percentage and skewness.
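As a rough illustration of the point about summary values, here is a minimal sketch of how the hurdle parameters could be recovered from the sum, sum of squares, and nonzero count. It assumes a method-of-moments fit for the gamma part, which this thread does not confirm as the estimator actually used. The names `fit_gamma_hurdle` and `GammaHurdleFit` are hypothetical.

```python
from typing import NamedTuple, Optional


class GammaHurdleFit(NamedTuple):
    """Hypothetical container for the fitted parameters."""
    p_nonzero: float  # fraction of permuted DWPCs that are nonzero
    shape: float      # gamma shape parameter (k)
    scale: float      # gamma scale parameter (theta)


def fit_gamma_hurdle(total: float, total_sq: float,
                     n_nonzero: int, n_values: int) -> Optional[GammaHurdleFit]:
    """Method-of-moments fit (an assumption, not necessarily the thread's estimator).

    total and total_sq are the sum and sum of squares of the permuted DWPCs.
    Zeros add nothing to either sum, so both describe the nonzero values only.
    """
    # The gamma part needs at least two nonzero values to have a variance.
    if n_values <= 0 or n_nonzero < 2:
        return None
    mean = total / n_nonzero
    # Unbiased sample variance of the nonzero values.
    var = (total_sq - n_nonzero * mean ** 2) / (n_nonzero - 1)
    # Identical nonzero values give zero variance, so no gamma can be fit.
    # With real floating-point sums, a small tolerance may be safer than 0.
    if mean <= 0 or var <= 0:
        return None
    return GammaHurdleFit(
        p_nonzero=n_nonzero / n_values,
        shape=mean ** 2 / var,
        scale=var / mean,
    )
```

For example, four permuted values of which two are nonzero (1.0 and 3.0) have sum 4.0 and sum of squares 10.0, so `fit_gamma_hurdle(4.0, 10.0, 2, 4)` gives a nonzero fraction of 0.5, shape 2.0, and scale 1.0. A single nonzero value, or two identical ones, returns `None`, which is the "two distinct nonzero DWPCs" requirement noted above.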
> In the third formula, is there a missing plus sign?
No. The two probabilities are being multiplied rather than added.
> Now we will need two distinct nonzero DWPCs
That's correct. Hopefully that should not be a problem, now that we have 200 permutations, but potentially this could be something we have to deal with.
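To make the point about multiplication concrete: the formulas themselves are not reproduced in this thread text, so the following is only a sketch of how a hurdle-model tail probability is typically written. It assumes the formula in question is the probability of a permuted DWPC at least as large as an observed value $x > 0$. The symbols $n_{\text{nonzero}}$, $n$, $k$, and $\theta$ are introduced here for illustration.

```latex
P(X \ge x) \;=\; P(X > 0)\,\cdot\,P(X \ge x \mid X > 0),
\qquad x > 0,
\qquad\text{with}\quad
P(X > 0) \approx \frac{n_{\text{nonzero}}}{n},
\quad
X \mid X > 0 \;\sim\; \mathrm{Gamma}(k, \theta).
```

The two factors multiply because the value must first clear the hurdle (be nonzero) and then, given that it is nonzero, exceed $x$ under the gamma distribution.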