Power-law Distributions in Empirical Data
This page is a companion for the review article on power-law distributions
in empirical data, written by
Aaron Clauset (me),
Cosma R. Shalizi and
M.E.J. Newman. The intention
is that this page will host implementations of the methods we describe in the
article. For now, these are simply the versions we wrote (in Matlab and R), but
our hope is to eventually host versions in a variety of languages. In general,
we want to make the methods as accessible to the community as possible. (Please note that we cannot provide any support for code not written by ourselves.)
Journal Reference
A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM Review 51(4), 661-703 (2009).
(arXiv:0706.1062, doi:10.1137/070710111)
Random number generators
This function generates continuous values randomly distributed according to one of the
five distributions discussed in the article (power law, exponential, log-normal,
stretched exponential, and power law with cutoff). Usage information is included in the
file; type 'help randht' at the Matlab prompt for more information.
randht.m (Matlab)
randht.py (Python, by Joel Ornstein)
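For readers curious about what a generator of this kind does in the power-law case, here is a minimal sketch of inverse-transform sampling for a continuous power law, using the transformation x = xmin (1 - u)^(-1/(alpha - 1)) described in the article. The function name rand_powerlaw and its arguments are hypothetical; this is not the interface of randht.m or randht.py, and it covers only one of the five distributions.

```python
import random

def rand_powerlaw(n, xmin, alpha, seed=None):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= xmin.

    Hypothetical helper for illustration only; not the randht interface.
    """
    if alpha <= 1 or xmin <= 0:
        raise ValueError("need alpha > 1 and xmin > 0")
    rng = random.Random(seed)
    # Inverse-transform sampling: if u is uniform on [0, 1), then
    # xmin * (1 - u)^(-1 / (alpha - 1)) follows the power law.
    exponent = -1.0 / (alpha - 1.0)
    return [xmin * (1.0 - rng.random()) ** exponent for _ in range(n)]

samples = rand_powerlaw(10000, xmin=1.0, alpha=2.5, seed=42)
```

No sample can fall below xmin, and the sample median should sit close to xmin * 2^(1/(alpha - 1)), which gives a quick sanity check on the output.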
Fitting a power-law distribution
This function implements both the discrete and continuous maximum likelihood estimators
for fitting the power-law distribution to data, along with the goodness-of-fit based
approach to estimating the lower cutoff for the scaling region. Usage information is
included in the file; type 'help plfit' at the Matlab prompt for more information.
plfit.m (Matlab)
plfit.r (R, by Laurent Dubroca)
plfit.py (Python, by Adam Ginsburg)
plfit.c (C++, by Wim Otte; includes plvar.c)
plfit.py (Python, by Joel Ornstein)
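The following is a minimal sketch of the continuous case only, assuming the estimator alpha = 1 + n / sum(ln(x_i / xmin)) from the article and choosing xmin as the candidate that minimizes the Kolmogorov-Smirnov distance between the data and the fit. The function names (fit_alpha, ks_distance, fit_powerlaw) are hypothetical, and the sketch omits the discrete estimator, the options, and the efficiency improvements of the hosted implementations.

```python
import math
import random

def fit_alpha(tail, xmin):
    """Continuous maximum likelihood estimate of alpha for values >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical and fitted CDFs of the tail."""
    tail = sorted(tail)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        fitted = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(fitted - i / n), abs(fitted - (i + 1) / n))
    return d

def fit_powerlaw(data):
    """Return (alpha, xmin, D) for the xmin candidate with the smallest D.

    Returns None if the data contain fewer than two distinct values.
    """
    best = None
    for xmin in sorted(set(data))[:-1]:  # skip the largest value
        tail = [x for x in data if x >= xmin]
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (alpha, xmin, d)
    return best

# Synthetic continuous power-law data (alpha = 2.5, xmin = 1) for a quick trial.
rng = random.Random(1)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(500)]
alpha_hat, xmin_hat, d_hat = fit_powerlaw(data)
```

This scan refits every candidate from scratch, so it is quadratic in the number of distinct values; it is meant to show the logic, not to be fast.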
Visualizing the fitted distribution
After several requests, I've written this function, which plots (on log-log axes) the
empirical distribution along with the fitted power-law distribution. Usage information is
included in the file; type 'help plplot' at the Matlab prompt for more information.
(This function is now also included in the full package download below.)
plplot.m (Matlab)
plplot.py (Python, by Joel Ornstein)
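As a rough sketch of the quantities such a plot shows, the code below computes the empirical complementary CDF and a fitted continuous power-law curve rescaled to meet the empirical curve at xmin. The function names are hypothetical and nothing is drawn here; the resulting (x, y) pairs could be passed to any plotting library with log-log axes. This is an illustration under those assumptions, not a port of plplot.m.

```python
def ccdf_points(data):
    """Empirical complementary CDF: pairs (x, fraction of data >= x)."""
    xs = sorted(data)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:  # first occurrence of each distinct value
            points.append((x, (n - i) / n))
    return points

def fitted_ccdf_points(data, xmin, alpha):
    """Fitted continuous power-law CCDF, scaled by the fraction of data >= xmin."""
    n = len(data)
    tail = [x for x in data if x >= xmin]
    frac_tail = len(tail) / n
    return [(x, frac_tail * (x / xmin) ** (1.0 - alpha))
            for x in sorted(set(tail))]

# Small made-up data set, with xmin and alpha chosen by hand for illustration.
data = [1, 1, 2, 2, 2, 3, 4, 5, 8, 13]
empirical = ccdf_points(data)
fitted = fitted_ccdf_points(data, xmin=2, alpha=2.5)
```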
Estimating uncertainty in the fitted parameters
This function implements the nonparametric approach for estimating the uncertainty in
the estimated parameters for the power-law fit found by the plfit function. It too
implements both continuous and discrete versions. Usage information is included in the
file; type 'help plvar' at the Matlab prompt for more information.
plvar.m (Matlab)
plvar.c (C++, by Wim Otte; includes plfit.c)
plvar.py (Python, by Joel Ornstein)
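The idea behind a nonparametric (bootstrap) uncertainty estimate can be sketched as follows: resample the data with replacement, refit each resample, and report the spread of the estimates. To stay short, this sketch holds xmin fixed and handles only the continuous case, which is a simplification of the approach described above; the function names are hypothetical.

```python
import math
import random
import statistics

def fit_alpha(data, xmin):
    """Continuous maximum likelihood estimate of alpha using values >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def bootstrap_alpha_sd(data, xmin, reps=200, seed=0):
    """Standard deviation of alpha over bootstrap resamples (xmin held fixed)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(reps):
        resample = rng.choices(data, k=n)  # draw n points with replacement
        estimates.append(fit_alpha(resample, xmin))
    return statistics.stdev(estimates)

# Synthetic continuous power-law data (alpha = 2.5, xmin = 1) for a quick trial.
rng = random.Random(7)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
sd = bootstrap_alpha_sd(data, xmin=1.0)
```

For data of this kind the result should be comparable to the asymptotic standard error (alpha - 1) / sqrt(n), roughly 0.05 here.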
Calculating a p-value for the fitted power-law model
This function implements the Kolmogorov-Smirnov goodness-of-fit test, which computes a
p-value for the estimated power-law fit to the data. Like the functions above, it
implements both continuous and discrete versions of the test. Usage information is
included in the file; type 'help plpva' at the Matlab prompt for more information.
plpva.m (Matlab)
parplpva2.m (Matlab, by Casper Peterson, uses Parallel Toolbox)
plpva.py (Python, by Joel Ornstein)
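A minimal sketch of the semi-parametric bootstrap behind such a p-value, for continuous data only, might look like this: fit the data, generate synthetic data sets that follow the fitted power law above xmin and resample the observed values below it, refit each synthetic set, and report the fraction whose KS distance is at least as large as the empirical one. The function names are hypothetical, the sketch assumes data with many distinct values, and it makes no attempt at the efficiency of plpva.m.

```python
import math
import random

def fit_powerlaw(data):
    """Continuous MLE of alpha with KS-minimizing xmin; returns (alpha, xmin, D)."""
    best = None
    for xmin in sorted(set(data))[:-1]:  # skip the largest value
        tail = sorted(x for x in data if x >= xmin)
        n = len(tail)
        alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
        d = 0.0
        for i, x in enumerate(tail):
            cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
            d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
        if best is None or d < best[2]:
            best = (alpha, xmin, d)
    return best

def powerlaw_pvalue(data, reps=100, seed=0):
    """Fraction of synthetic data sets that fit worse than the real data."""
    rng = random.Random(seed)
    alpha, xmin, d_emp = fit_powerlaw(data)
    body = [x for x in data if x < xmin]  # observations below the fitted xmin
    n = len(data)
    p_tail = 1.0 - len(body) / n
    exceed = 0
    for _ in range(reps):
        synth = []
        for _ in range(n):
            if rng.random() < p_tail:
                # Tail: draw from the fitted power law.
                u = rng.random()
                synth.append(xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0)))
            else:
                # Body: resample from the observed values below xmin.
                synth.append(rng.choice(body))
        if fit_powerlaw(synth)[2] >= d_emp:
            exceed += 1
    return exceed / reps

# Synthetic continuous power-law data (alpha = 2.5, xmin = 1) for a quick trial.
rng = random.Random(3)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(150)]
p_value = powerlaw_pvalue(data, reps=40)
```

The small values of reps and n here only keep the example quick to run; the number of synthetic data sets determines how finely the p-value is resolved.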
Riemann Zeta function
The discrete estimator needs to calculate the Riemann Zeta function for normalization.
Matlab includes this function in the Symbolic Math Toolbox, but there are free versions
available if you don't have this toolbox. For instance, Paul Godfrey's
special functions library (via Matlab Central File Exchange) gives one, which we
mirror here (note: you need both of these files; hat tip to Will Tracy).
deta.m (Matlab)
zeta.m (Matlab)
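To illustrate the normalization itself, here is a minimal sketch that approximates the generalized (Hurwitz) zeta function, zeta(alpha, xmin) = sum over k >= 0 of (k + xmin)^(-alpha), by summing the leading terms directly and adding an Euler-Maclaurin estimate of the remainder. The function names and the choice of 1000 direct terms are assumptions made for this example; it is unrelated to the deta.m and zeta.m files above and is not tuned for alpha very close to 1.

```python
import math

def hurwitz_zeta(s, q, terms=1000):
    """Approximate sum_{k>=0} (k + q)^(-s) for s > 1 and q > 0."""
    if s <= 1 or q <= 0:
        raise ValueError("need s > 1 and q > 0")
    # Direct sum of the first `terms` terms.
    head = sum((k + q) ** -s for k in range(terms))
    # Euler-Maclaurin estimate of the remaining terms k >= terms:
    # integral + half of the first omitted term + first derivative correction.
    a = terms + q
    tail = a ** (1.0 - s) / (s - 1.0) + 0.5 * a ** -s + s * a ** (-s - 1.0) / 12.0
    return head + tail

def discrete_powerlaw_pmf(x, alpha, xmin):
    """p(x) = x^(-alpha) / zeta(alpha, xmin) for integer x >= xmin."""
    return x ** -alpha / hurwitz_zeta(alpha, xmin)

zeta_2 = hurwitz_zeta(2.0, 1)  # the ordinary Riemann zeta function at 2
```

With q = 1 this reduces to the Riemann zeta function, so known values such as zeta(2) = pi^2 / 6 give a convenient check.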
Calculating likelihood-ratio test results
The functions necessary to compute the log likelihood ratio tests are implemented in
the statistical programming language R.
Documentation of these functions is given in a separate file, and the R functions
themselves are in a downloadable tgz file (note: this is not a proper R package, yet).
Documentation
R code
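For readers who want to see the shape of the calculation, the following is a minimal sketch of a normalized log likelihood ratio (Vuong) test comparing a continuous power law against a shifted exponential on the same tail. It is written in Python rather than R to match the other examples on this page, the function name is hypothetical, and it is not a translation of the R functions above. A positive ratio favors the power law, and a small p-value indicates that the sign of the ratio is unlikely to be a chance fluctuation.

```python
import math
import random

def vuong_powerlaw_vs_exponential(tail, xmin):
    """Return (R, p): log likelihood ratio and its two-sided p-value.

    `tail` must contain only values >= xmin. R > 0 favors the power law.
    """
    n = len(tail)
    # Maximum likelihood fits of both models on the tail.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    lam = n / sum(x - xmin for x in tail)
    # Pointwise log likelihood differences (power law minus exponential).
    diffs = [
        (math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin))
        - (math.log(lam) - lam * (x - xmin))
        for x in tail
    ]
    ratio = sum(diffs)
    mean = ratio / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    p = math.erfc(abs(ratio) / (math.sqrt(2.0 * n) * sigma))
    return ratio, p

# Synthetic continuous power-law data (alpha = 2.5, xmin = 1) for a quick trial.
rng = random.Random(11)
tail = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]
ratio, p = vuong_powerlaw_vs_exponential(tail, xmin=1.0)
```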
Download all files
To make getting the most up-to-date versions of the original files (in Matlab and R) even
easier, they're available as a downloadable zip file here.
Download all Matlab and R files
Download Python package (by Jeff Alstott)
A note about Matlab compatibility
All of the Matlab functions here were designed to be compatible with Matlab v7. They are
not necessarily compatible with older versions of Matlab. That being said, it
should be possible to make them compatible as the core functionality does not depend
on v7 features.
A note about bugs and alternative implementations
The code here is provided as-is, with no warranty, with no guarantees of
technical support or maintenance, etc. If you experience problems while using the code,
please let us know via email. We are also happy to host (or link to)
implementations of any of these functions in other programming languages,
in the interest of facilitating their more widespread use. That being said,
all such code also comes with no warranties, etc. If you do have questions
about any of the implementations, please contact the respective function's
author; the pl* functions in Matlab were written by Aaron Clauset and the LRT
functions in R were written by Cosma Shalizi; all other language
implementations were written by members of the wider community.
Finally, if you use our code in an academic publication, it would be courteous of you to
thank me (Aaron) and Cosma in your acknowledgements for providing you with
implementations of the methods.
A note about the data sets
The 24 data sets we studied in the paper were drawn from the literature, and the proper
citations are given in the paper. You can find much more detailed information, including
links to download many of the data sets, here.
A note about method tutorials
We do not currently have any tutorial information for installing or using these methods,
beyond what we describe in the paper and what is contained in the help files that go
with the Matlab and R files themselves. That being said, the InterSciWiki at UC Irvine
has a good overview tutorial page that may be of some use. (Note, 03.08.2011: the InterSciWiki seems to be down.)
Updates
17 January 2012: fixed a minor bug in the way plfit, plvar, plpva parse the nowarn
and nosmall arguments.
24 August 2011: posted updated version of plfit.r, at the request of
its author Laurent Dubroca.
4 August 2011: posted Joel Ornstein's Python ports of plfit, plvar, plpva and plplot.
8 October 2010: replaced plfit.r with new version, at request of its author Laurent Dubroca.
24 January 2010: fixed a minor bug in how plfit.m reports the log-likelihood of the fitted data, for the discrete case, after the selection of xmin is done; posted updated version of Wim Otte's code with the same fix.
27 November 2009: posted Wim Otte's C++ implementation of plfit and plvar.
1 October 2009: added the option in plfit, plvar and plpva to 'lock' xmin to a specific value (thanks to Paul Willems for the suggestion).
27 August 2009: posted Adam Ginsburg's Python implementation of plfit.
13 August 2009: created a new page with detailed information
about obtaining copies of the 24 empirical data sets we studied.
12 August 2009: fixed a minor bug in the R version of plfit that would cause its
results to disagree slightly with the results from the Matlab version (thanks to Naoki
Masuda for pointing it out).
17 March 2009: fixed a minor bug in the R version of plfit that would cause the
returned KS statistic to be incorrect when xmin=1 (thanks to Jeff Stuckman for
pointing it out).
7 February 2009: try-catch block in integer portion of plfit now defaults to
iterative version if the try block ever fails (thanks to Rajiv Das for the
suggestion).
25 April 2008: changed randht, plpva and plvar to only initialize the
pseudo-random number generator on the first time they are called.
5 March 2008: in the integer routines, plfit, plvar and plpva now automatically
switch to a slower but more memory efficient estimation routine when the vectorized
default routine fails (e.g., Out of Memory error when max(x) is extremely large).
29 February 2008: posted Laurent Dubroca's R implementation of plfit.
17 February 2008: posted the plplot.m function for plotting the fitted
power-law distributions against the empirical data.
30 January 2008: corrected typo in plpva when using hidden 'sample' option, and
reordered the commands for 'limit' and 'sample' throughout (thanks to Klaas Dellschaft
for suggestions).
28 September 2007: corrected typo in argument parsing for randht.m, significant
efficiency improvements to xmin estimation routine in plfit.m, plpva.m and plvar.m
(thanks to Jim Bagrow for suggestions).
7 September 2007: corrected interim reporting in plpva.m; changed plfit.m,
plvar.m and plpva.m to reshape input vector to column format, and to prevent using
continuous approximation in small-sample regime for discrete data.
25 July 2007: corrected a typo in plvar.m, typo in pareto.R, typo in
log-likelihood for discrete cut-off powerlaw and fixed small bug in a plotting
routine.
29 June 2007: corrected a typo in plpva.m, typo in pareto.R and updated
compilation instructions in discpowerexp.R.