**Network Analysis and ModelingCSCI 5352, Fall 2014**

Time: Tuesday and Thursday, 12:30pm - 1:45pm

Room: ECCS 1B12

Instructor: Aaron Clauset

Office: ECOT 743

Office hours: Tuesday, 2:00-3:30pm

Email: look it up

Teaching Assistant: Henry Romero

TA Office Hours: Monday, 9:30-10:30am in ECEE 150

Syllabus

**Description**

**Course work and grading**

**Schedule and lecture notes**

**Problem sets**

**Supplemental readings**

**Description**

Network science is a thriving and increasingly important cross-disciplinary
domain that focuses on the representation, analysis and modeling of complex
social, biological and technological systems as networks or graphs. Modern
data sets often include some kind of network. Nodes can have locations,
directions, memory, demographic characteristics, content, and preferences.
Edges can have lengths, directions, capacities, costs, durations, and types.
And, these variables and the network structure itself can vary, with edges and
nodes appearing, disappearing and changing their characteristics over time.
Capturing, modeling and understanding networks and rich data requires
understanding both the mathematics of networks and the computational tools for
identifying and explaining the patterns they contain.

This graduate-level course will examine modern techniques for analyzing and
modeling the structure and dynamics of complex networks. The focus will be on
statistical algorithms and methods, and both lectures and assignments will
emphasize model interpretability and understanding the processes that
generate real data. Applications will be drawn from computational biology and
computational social science. No biological or social science training is
required. (Note: this is not a scientific computing course, but there will be
plenty of computing for science.)

**Prerequisites**

Recommended: CSCI 3104 (undergraduate algorithms) and APPM 3570 (applied
probability), or equivalent preparation.

Note: An adequate mathematical and programming background is mandatory. The
concepts and techniques covered in this course depend heavily on basic
statistics (distributions, Monte Carlo techniques), scientific programming,
and calculus (integration and differentiation). Students without sufficient
preparation will struggle to keep up with the lectures and assignments.
Students without proper preparation may audit the course.

**Text**

Required (available at the CU Bookstore):

1. Networks: An Introduction by M.E.J. Newman

2. Pattern Recognition and Machine Learning by C.M. Bishop.

Optional:

1. All of Statistics by L. Wasserman

2. Numerical Recipes

3. Networks, Crowds and Markets by D. Easley and J. Kleinberg

4. Error and the Growth of Experimental Knowledge by D.G. Mayo.

**Course work and grading**

Attendance to the lectures is required.

Most of the class will be standard graduate-style lectures by the instructor.
These will be supplemented by guest lectures on special or advanced topics,
and class discussions of selected papers drawn from the networks literature.
Problem sets will develop and extend topics presented in class, and will
introduce additional topics not covered in class. Performance on the problem
sets will be the major component of evaluation. There are no written
examinations in the course, and thus students are expected to spend serious
quality time on the problem sets. Additional details are given in the syllabus.

*Problem sets*: There will be 6 problem sets. Each will include some
mathematical and some computational problems. Problem sets will be due roughly
every two weeks. Programming components of the problem sets may be completed
in any reasonable imperative language, and students are not expected to code
everything from scratch (using available network libraries is okay).

See the syllabus for more details about formatting and submitting your
solutions, for advice about how to get maximum points, and for the class
policies on collaboration and on late submissions.
Students that are
unsure about whether something is permitted under the policies described in
the syllabus should consult with the instructor well before a particular
deadline.

*Class project*: The purpose of the class project is to
formulate and explore a research question of the student's devising related
to network analysis and modeling. Students may work in teams of 1-2 people.
The deliverables are a short (10 minute) in-class presentation of the project
results, and a 10-page writeup. See the syllabus for more details.

*Grading* Grades will be assigned based on (section 001) attendance
(20%), problem sets (50%), and class project (30%), or (section 740)
problem sets (63%) and class project (37%).

**Tentative Schedule**

Week 1 : Introduction and overview (Lecture 1)

Week 2 : Random graphs I: the Erdos-Renyi model (Lecture 2)

Week 3 : Measures of centrality (Lecture 3)

Week 4 : Random graphs II: the configuration model (Lecture 4)

Week 5 : Large-scale structure I: assortativity and modularity (Lecture 5)

Week 6 : Large-scale structure II: stochastic block models (Lecture 6)

Week 7 : Large-scale structure III: more block models (Lecture 7 and Lecture 7 supplement)

Week 8 : Wrangling network data I: from raw data to networks (Lecture 8)

Week 9 : Wrangling network data II: sampled networks (Lecture 9)

Week 10 : Spatial networks (Lecture 10)

Week 11 : Growing networks (Lecture 11)

Week 12 : Dynamic networks (Lecture 12)

Week 13 : Advanced topics

Week 14 : Fall break

Weeks 15-16 : Project presentations and Wrap up

**Problem Sets**

**Problem set 1** (assigned Aug 28; due Sept 10) [data files: see class Dropbox]

**Problem set 2** (assigned Sept 11; due Sept 24) [data file: see class Dropbox]

**Problem set 3** (assigned Sept 25; due Oct 8) [data files: see class Dropbox]

**Problem set 4** (assigned Oct 9; due Oct 22) [data files; see class Dropbox]

**Problem set 5** (assigned Oct 25; due Nov 5)

**Problem set 6** (assigned Nov 9; due Nov 19)

Week 1:

- M.E.J. Newman, "The structure and function of complex networks."
*SIAM Review***45**, 167-256 (2003). - L. Breiman, "Statistical Modeling: The Two Cultures."
*Statistical Science***16**, 199-231 (2001).

Week 2:

- M.E.J. Newman, "Power laws, Pareto distributions and Zipf's law."
*Contemporary Physics***46**(5), 323-351 (2005). - M. Mitzenmacher, "A Brief History of Generative Models for Power Law and Lognormal Distributions."
*Internet Mathematics***1**(2), 226-251 (2004). - A. Clauset, C.R. Shalizi and M.E.J. Newman, "Power-law distributions in empirical data."
*SIAM Review***51**(4), 661-703 (2009).

Week 3:

- S. Borgati, "Centrality and network flow."
*Social Networks***27**, 55-71 (2005). - B. Ball and M.E.J. Newman, "Friendship networks and social status."
*Network Science***1**, 16-30 (2013).

Week 4:

- M.E.J. Newman, S.H. Strogatz and D.J. Watts, "Random graphs with arbitrary degree distributions and their applications."
*Physical Review E***64**, 026118 (2001). - D.S. Callaway et al., "Are randomly grown graphs really random?."
*Physical Review E***64**, 041902 (2001).

Week 5:

- M. McPherson, L. Smith-Lovin and J.M. Cook, "Birds of a Feather: Homophily in Social Networks."
*Annual Reviews of Sociology***27**, 415-444 (2001). - M.E.J. Newman, "Mixing patterns in networks."
*Physical Review E***67**, 026126 (2003). - A. Clauset, M.E.J. Newman and C. Moore, "Finding community structure in very large networks."
*Physical Review E***70**, 066111 (2004). - B.H. Good, Y.-A. de Montjoye and A. Clauset, "Performance of modularity maximization in practical contexts."
*Physical Review E***81**, 046106 (2010).

Weeks 6-7:

- B. Karrer and M.E.J. Newman, "Stochastic blockmodels and community structure in networks."
*Physical Review E***83**, 016107 (2011). - A. Clauset, C. Moore, and M.E.J. Newman, "
Hierarchical structure and the prediction of missing links in networks."
*Nature***453**, 98 - 101 (2008). - B. Ball, B. Karrer and M.E.J. Newman, "An efficient and principled method for detecting communities in networks."
*Physical Review E***84**, 036103 (2011).

Week 10:

- S. Milgram, "The Small-World Problem."
*Psychology Today***1**(1), 61-67 (1967). - D. Watts and S. Strogatz, "Collective dynamics of 'small-world' networks."
*Nature***393**, 440-442 (1998). - J. Kleinberg, "The Small-World Phenomenon, an Algorithmic Perspective."
*Proc. 32nd ACM Symposium on Theory of Computing*(2000). - D. Liben-Nowell et al., "Geographic routing in social networks."
*PNAS***102**(33), 11623-11628 (2005). - V. Kalapala et al., Scale invariance in road networks.
*Physical Review E***73**, 026130 (2006).

Week 11:

- S. Redner, "Citation statistics from 110 years of
*Physical Review*."*Physics Today***58**, 49-54 (2005). - M.E.J. Newman, "The first-mover advantage in scientific publication."
*European Physics Letters***86**, 68001 (2009). - A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani,
"Modeling of protein interaction networks."
*Complexus***1**, 38-44 (2003). - M. Middendorf, E. Ziv, and C. H. Wiggins,
"Inferring network mechanisms: The Drosophila melanogaster protein interaction network."
*PNAS***102**, 3192–3197 (2005).

**Network Tools**

NetworkX, network analysis package (Python)

igraph, network analysis tools (Python, C++, R)

graph-tool, network analysis and visualization software (Python, C++)

GraphLab, scalable network analysis (Python, C++)

**Network Visualization**

Cytoscape, network
visualization software

yEd Graph Editor, network visualization software

Graphviz,
network visualization software

Gephi, network visualization software

graph-tool, network analysis and visualization software

webweb, network visualization tool joining Matlab and d3

MuxViz, multilayer analysis and visualization platform

**Network Data Sets**

Mark Newman's network data sets (social, biological, technological)

Stanford Network Analysis Project repository (social and online networks)

KONECT, The Koblenz Network Collection (social, biological, online networks)

Interaction Web DataBase (ecological networks)

Open Connectome database (brain networks)

Lazega lawyers network (social)

US Census Education-Employment network (social, bipartite, weighted)

APS Physical Review bibliographic network (informational, directed, acyclic)

Internet Movie Database (social, bipartite, temporal)

Wikipedia (informational, temporal, big)

**Other Courses on Networks**

Network Theory (University of Michigan)

Statistical Network Analysis (Purdue University)

Networks (Cornell University)

Networks (Harvard University)

Social and Economic Networks: Models and Analysis (Coursera / Stanford)

Social Network Analysis (Coursera / University of Michigan)

Social and Information Network Analysis (Stanford)

Graphs and Networks (Yale)

Spectral Graph Theory (Yale)

The Structure of Social Data (Stanford)

**Resources**

LaTeX (general) and TeXShop (Mac)

Matlab license for CU staff (includes student employees)

Mathematica license for CU students

NumPy/SciPy libraries for Python (similar to Matlab)

GNU Octave (similar to Matlab)

Wolfram Alpha (Web interface for simple integration and differentiation)

Introduction to the Modeling and Analysis of Complex Systems, by Hiroki Sayama (free online textbook)

**Things Worth Reading**

Everything you wanted to know about Data Analysis and Fitting but were afraid to ask, by Peter Young

Machine Learning, Statistical Inference and Induction Notebook (by Cosma Shalizi)

Power Law distributions, etc. Notebook (by Cosma Shalizi)

Statistics Done Wrong, The woefully complete guide (by Alex Reinhart)

Some Advice on Process for
[Research Projects]