Skip to content

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.

Notifications You must be signed in to change notification settings

CharlesCara/poweRlaw

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The poweRlaw package

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cut-off for the scaling region.

The code developed in this package has been heavily influenced from the python and R code found at: http://tuvalu.santafe.edu/~aaronc/powerlaws/ . In particular, the R code of Laurent Dubroca.

Installation

Currently, this package can only be installed from github and requires R >=2-15.0. The easiest way to install from github is to use the devtools package:

install.packages("devtools")
library(devtools)
install_github('poweRlaw', 'csgillespie')

Note Windows users have to first install Rtools.

The pl_data object

First we load the package and an example data set:

require(poweRlaw)
data(moby_sample)

The moby_sample data set is described in its associated help file - ?moby_sample. This dataset contains a subset of the word frequency from the novel Moby Dick. The entire data set can be loaded using data(moby).

We then create a pl_data object. This is an R reference object that contains the data set.

pl_d = pl_data$new(moby_sample)

which can be plotted using the plot function:

plot(pl_d)
#Alternatively, all plot & line methods return the data silently, so
d = plot(pl_d)
head(d)

The distribution object

There are currently two distributions available in this package; the discrete and continuous power-law. To fit and manipulate discrete power-laws, we use the displ object. To fit and manipulate continuous power-laws, we use conpl objects. Regardless of the distribution type, the interface to plotting and fitting is identical. See demo(moby) and demo(blackout) for further examples.

Example: discrete power-law

To create a discrete power-law object, we use the displ method:

m = displ$new(pl_d)

Initially, xmin is set to the minimum value of x and the scaling parameter alpha is set to NULL:

m$getXmin()
m$getPars()

To estimate the lower cut-off xmin, we use

(xmin = estimate_xmin(m))

This function returns the Kolomogorov-statistic, xmin value and the alpha scaling parameter. We can then set the xmin value in m: value

m$setXmin(xmin[2])
m$getXmin()

Next we estimate the scaling parameter using the mle method:

pars = estimate_pars(m)

and set the parameters in the m object:

m$setPars(pars)
m$getPars()

We can get a handle on the uncertainty when estimating xmin using bootstrapping:

#This function runs in parallel.
#Set the number threads to the number of CPU cores
#This can take a while
bs = bootstrap_xmin(m, no_of_sims=1000, threads=1)

Example results are included in this package:

##Moby_bootstrap relates to the entire moby data set
##See ?bootstrap_moby
data(bootstrap_moby)
bs = bootstrap_moby

We can plot the range of plausible xmin and alpha values using:

hist(bs$bootstraps[,2])
hist(bs$bootstraps[,3])

The values in the first row of table 6.1 in Clauset et al, can be obtained via:

##p-value
bs[[1]]
##sd of xmin. In the paper, it's 2
sd(bs[[3]][,2])
##sd of alpha. In the paper, it's 0.02
sd(bs[[3]][,3])

Plotting distribution objects

Once a distribution object has been created, standard plotting functions can be applied to it:

m$setXmin(7)
m$setPars(1.95)
##Plot the data, remove point before the threshold
plot(m)
##Add in the fitted distribution
lines(m, col=2)

The data used in the plots can be extracted by assignment:

d = plot(m)
pl = lines(m, col=2)

About

This package implements both the discrete and continuous maximum likelihood estimators for fitting the power-law distribution to data. Additionally, a goodness-of-fit based approach is used to estimate the lower cutoff for the scaling region.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • R 100.0%