Open data release to accompany
The Evolution of Wikipedia's Norm Network
Bradi Heaberlin & Simon DeDeo, Future Internet 2016, 8(2), 14; doi:10.3390/fi8020014
• the hyperlink network for 1976 nodes describing the social norms of Wikipedia
• page properties: creation date, number of edits, page views
• LDA / topic-modeled page semantics

== README ==
Release 0.1, 17 April 2016

The data release contains four files:

1. nodes.csv
a TAB-delimited file containing node properties. Louvain Communities are numbered as in Table G.1; nodes outside of the giant component are labelled as belonging to community "-1".

2. topics.csv
a COMMA-delimited file containing the topic distribution for each page. Topics are ordered as in Table G.2.

3. links.csv
a COMMA-delimited file, easily readable by Gephi, containing the hyperlink network.

4. README.txt
this file.
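
The two delimiters above are the main thing to get right when loading the data. The following is a minimal Python sketch, not part of the release itself. The helper name read_rows and the sample rows are illustrative assumptions; the actual column layout and the presence of header rows should be checked against the files.

```python
import csv
import io


def read_rows(handle, delimiter):
    """Return every row of a delimited text file as a list of strings."""
    return [row for row in csv.reader(handle, delimiter=delimiter)]


# Placeholder content standing in for the real files (layout and values are invented).
nodes_sample = io.StringIO("1020\tWikipedia:Neutral point of view\t3\n")
links_sample = io.StringIO("1020,17\n")

node_rows = read_rows(nodes_sample, "\t")  # nodes.csv is TAB-delimited
link_rows = read_rows(links_sample, ",")   # topics.csv and links.csv are COMMA-delimited

# With the real files, something like:
# with open("nodes.csv", newline="", encoding="utf-8") as f:
#     node_rows = read_rows(f, "\t")
```

All values come back as strings, so numeric columns need an explicit conversion after loading.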


Column 6 contains data derived from StatsGrok; the remaining data were either gathered directly via Wikipedia's API or produced by the data processing described in the paper. Note that Louvain community detection on a network of this size varies from run to run: if you re-run Louvain, you may find small shifts in cluster membership. We report community membership for the giant component only.
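
Because nodes outside the giant component carry the community label "-1", analyses of community structure will usually want to set them aside first. A minimal Python sketch follows; the key names "page_id" and "community" and the sample records are assumptions for illustration, not the column names used in nodes.csv.

```python
from collections import Counter


def split_by_giant_component(nodes, community_key="community"):
    """Separate nodes with a Louvain community from those labelled "-1"."""
    inside = [n for n in nodes if str(n[community_key]) != "-1"]
    outside = [n for n in nodes if str(n[community_key]) == "-1"]
    return inside, outside


# Invented records; real ones would be built from the rows of nodes.csv.
nodes = [
    {"page_id": "1", "community": "0"},
    {"page_id": "2", "community": "3"},
    {"page_id": "3", "community": "-1"},
]

inside, outside = split_by_giant_component(nodes)

# Community sizes, counted over the giant component only.
sizes = Counter(n["community"] for n in inside)
```

If you re-run Louvain yourself, compare the resulting sizes with the released labels rather than expecting an exact match.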

A small number of pages (ten) do not have associated topics, usually because they were too small (no text remaining after stopwords), or because the pages were deleted as cruft by Wikipedia editors (making the original text unrecoverable).
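
When joining node properties to topics, it can help to list the pages that lack a topic distribution before doing anything else. The sketch below is one way to do that in Python. How the missing pages appear in topics.csv (absent rows or empty rows) is not stated here, so the function treats both cases as missing; the names and sample values are hypothetical.

```python
def pages_without_topics(node_ids, topics_by_page):
    """Return the node IDs that have no usable topic distribution."""
    covered = {pid for pid, weights in topics_by_page.items() if weights}
    return sorted(set(node_ids) - covered)


# Invented example: page "11" has no row at all, page "12" has an empty one.
node_ids = ["10", "11", "12"]
topics_by_page = {"10": [0.7, 0.3], "12": []}

missing = pages_without_topics(node_ids, topics_by_page)
```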

Links are highly filtered: as much as possible, we reject links in infoboxes, in auto-generated tables at the bottom of pages, and links generated by templates (see paper text). We do synonym resolution, so that a link to WP:NPOV is correctly registered as a link to Page Id 1020.
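
The idea behind synonym resolution can be sketched as a two-step lookup: map a shortcut to its canonical page title, then map the title to a Page Id. This is an illustration only, not the code used for the paper; the alias table, the function name resolve_link, and the assumption that WP:NPOV resolves through the title "Wikipedia:Neutral point of view" are ours.

```python
def resolve_link(target, aliases, page_ids):
    """Map a link target to a Page Id, following a shortcut if one is known."""
    canonical = aliases.get(target, target)
    return page_ids.get(canonical)  # None if the target is not a node


# Hypothetical lookup tables; only the pairing WP:NPOV -> 1020 comes from this README.
aliases = {"WP:NPOV": "Wikipedia:Neutral point of view"}
page_ids = {"Wikipedia:Neutral point of view": 1020}

via_shortcut = resolve_link("WP:NPOV", aliases, page_ids)
via_title = resolve_link("Wikipedia:Neutral point of view", aliases, page_ids)
unknown = resolve_link("WP:NOT_A_REAL_SHORTCUT", aliases, page_ids)
```

Both the shortcut and the full title end up on the same node, so the link is counted once against Page Id 1020.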

=== BibTeX ===

@Article{fi8020014,
AUTHOR = {Heaberlin, Bradi and DeDeo, Simon},
TITLE = {The Evolution of {W}ikipedia's Norm Network},
JOURNAL = {Future Internet},
VOLUME = {8},
YEAR = {2016},
NUMBER = {2},
PAGES = {14},
URL = {},
ISSN = {1999-5903},
DOI = {10.3390/fi8020014}
}