Estimating the probability of rare events
This page is a companion for the
article on estimating the historical and future probability of rare events,
by Aaron Clauset (me) and Ryan Woodard.
This page hosts implementations of the methods we describe in the article.
For now, these are simply the versions we wrote (in Matlab and Python), but
in the future the page may also include implementations by others.
Note that we cannot provide support for any code not written by ourselves.
A. Clauset and R. Woodard, "Estimating the historical and future probabilities of large terrorist events." Annals of Applied Statistics 7(4), 1838-1865 (2013).
(Subject of a special session at ASA Joint Statistical Meetings, Montreal Canada, 5 August 2013)
The Matlab implementation requires access to the plfit.m procedure and an implementation of the zeta function, both of which can be found here.
The Python code requires NumPy and mpmath (for the zeta function). Both are available as standard packages (on Ubuntu, for example). SciPy is needed for some plotting niceties, but not for the main tasks.
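To illustrate why a zeta function is needed, here is a minimal sketch of the discrete power-law distribution, whose normalizing constant is the Hurwitz zeta function. The function names discrete_powerlaw_pmf and discrete_powerlaw_ccdf are hypothetical and are not part of the code hosted on this page; only the call mpmath.zeta(s, a) is a real library function.

```python
from mpmath import zeta


def discrete_powerlaw_pmf(x, alpha, xmin=1):
    """P(X = x) for a discrete power law on the integers x >= xmin.

    Hypothetical helper for illustration only.
    """
    if x < xmin:
        return 0.0
    # zeta(alpha, xmin) is the Hurwitz zeta function,
    # i.e. the sum over k >= 0 of (k + xmin) ** (-alpha).
    return float(x ** (-alpha) / zeta(alpha, xmin))


def discrete_powerlaw_ccdf(x, alpha, xmin=1):
    """P(X >= x) for the same distribution (hypothetical helper)."""
    if x <= xmin:
        return 1.0
    return float(zeta(alpha, x) / zeta(alpha, xmin))


# Probability that a single event has size at least 100 when alpha = 2.5.
print(discrete_powerlaw_ccdf(100, 2.5, xmin=1))
```

The tail probability is a ratio of two zeta values, so an accurate zeta implementation matters most for large x, where both the numerator and the probability itself are very small.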
Power law outlier detection (single variable)
This function implements both the discrete and continuous power-law outlier detection algorithms described in the paper. Usage information is included in the file; type 'help plout' at the Matlab prompt for more information.
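For readers who want a feel for the general idea before opening the file, the following is a simplified Python sketch of a bootstrap estimate for the probability of at least one large event. It is not the plout code. It assumes the continuous case only, a fixed and known xmin (the Matlab implementation relies on plfit.m for fitting), and synthetic data. All function and variable names are hypothetical.

```python
import numpy as np


def fit_alpha(tail, xmin):
    """Continuous power-law maximum-likelihood estimate of alpha for values >= xmin."""
    tail = np.asarray(tail, dtype=float)
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))


def bootstrap_tail_probability(data, x, xmin, n_boot=1000, seed=0):
    """For each bootstrap resample, estimate P(at least one tail event >= x).

    Simplifying assumption: xmin is held fixed rather than re-estimated.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    probs = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(data, size=data.size, replace=True)
        tail = sample[sample >= xmin]
        # Skip resamples with no usable tail; they are recorded as NaN.
        if tail.size == 0 or not np.any(tail > xmin):
            probs[b] = np.nan
            continue
        alpha = fit_alpha(tail, xmin)
        p_single = (x / xmin) ** (-(alpha - 1.0))  # P(X >= x) for one tail event
        probs[b] = 1.0 - (1.0 - p_single) ** tail.size
    return probs


# Synthetic power-law data drawn by inverse-transform sampling (not real event data).
rng = np.random.default_rng(42)
true_alpha, xmin = 2.5, 1.0
synthetic = xmin * (1.0 - rng.random(2000)) ** (-1.0 / (true_alpha - 1.0))

probs = bootstrap_tail_probability(synthetic, x=100.0, xmin=xmin, n_boot=200)
estimate = np.nanmean(probs)
lo, hi = np.nanpercentile(probs, [5, 95])
print(estimate, lo, hi)
```

The mean over bootstrap resamples serves as the point estimate, and percentiles of the same ensemble give an interval around it. A discrete version would replace the continuous formulas with their zeta-normalized counterparts.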
Power law outlier detection (with discrete covariates)
This function implements the same method as plout but for integer-valued event covariates. The function applies the method to each of the marginal distributions and combines the results across them. Type 'help ploutm' at the Matlab prompt for more information.
Visualizing the ensemble of fitted distributions
This function takes the output of plout (or ploutm) and plots a portion of the ensemble of fitted models against the empirical data on log-log axes. Usage information is included in the file; type 'help pleplot' at the Matlab prompt for more information.
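As a rough illustration of what such a plot contains, the sketch below computes the two ingredients in Python: the empirical complementary CDF of the data and one model curve per fitted exponent. It assumes continuous power-law fits with a shared xmin. The array of fitted alpha values is a hypothetical stand-in, since the output format of plout is documented in the file itself rather than on this page.

```python
import numpy as np


def empirical_ccdf(data):
    """Return the sorted unique values and the fraction of observations >= each one."""
    x = np.sort(np.asarray(data, dtype=float))
    values, first_idx = np.unique(x, return_index=True)
    # first_idx counts the observations strictly below each unique value.
    return values, 1.0 - first_idx / x.size


def model_ccdf_curves(alphas, xmin, grid, tail_fraction=1.0):
    """One continuous power-law CCDF curve per alpha, evaluated on grid (>= xmin).

    tail_fraction rescales the curves so they attach to the empirical CCDF at xmin.
    """
    alphas = np.asarray(alphas, dtype=float)[:, None]
    grid = np.asarray(grid, dtype=float)[None, :]
    return tail_fraction * (grid / xmin) ** (-(alphas - 1.0))


data = [1, 2, 2, 5, 10, 10, 40]
values, ccdf = empirical_ccdf(data)

# Hypothetical ensemble of fitted exponents, e.g. from bootstrap fits.
alphas = [2.2, 2.4, 2.6]
grid = np.logspace(0, 2, 50)
curves = model_ccdf_curves(alphas, xmin=1.0, grid=grid)
```

Each row of curves can then be drawn as a faint line on log-log axes (for example with matplotlib.pyplot.loglog), with the empirical points from values and ccdf overlaid.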
Matlab compatibility issues
All of the Matlab functions here were designed to be compatible with Matlab v7. They are not necessarily compatible with older versions of Matlab. That being said, it should be possible to make them compatible as the core functionality does not depend on v7 features.
A note about bugs and alternative implementations
The code here is provided as-is, with no warranty and no guarantee of technical support or maintenance. If you experience problems while using the code, please let us know via email. We are also happy to host (or link to) implementations of any of these functions in other programming languages. If you have questions about any of the implementations, please contact the respective function's author; the Matlab functions were written by Aaron Clauset and the Python functions were written by Ryan Woodard.
Finally, if you use our code in an academic publication, it would be courteous of you to thank us in your acknowledgements for providing you with implementations of the methods.
Data for replication purposes
To facilitate the replication of our results, we are providing access to the empirical data used to make our calculations. (Due to licensing restrictions, we cannot provide access to the original databases.) If you use these data in a publication, please cite the original source. The accompanying README file explains the file format and provides additional information.
National Memorial Institute for the Prevention of Terrorism, (2008) "Terrorism Knowledge Base." http://www.tkb.org (accessed 29 January 2008).
Download the data
9 March 2012: version 1.0.1 of plout and ploutm (Matlab) functions posted; these versions better document the bootstrap features and describe how to extract confidence intervals from the results.
3 January 2012: initial version of Matlab functions posted.