Network Analysis and Modeling
CSCI 5352, Fall 2016
Time: Monday and Wednesday, 9:30am - 10:45am
Room: ECCS 1B12
Instructor: Aaron Clauset
Office: ECOT 743
Office hours: Monday, 1:15-2:45pm
Email: firstname.lastname@example.org (an Atbash cipher)
Network science is a thriving and increasingly important cross-disciplinary domain that focuses on the representation, analysis and modeling of complex social, biological and technological systems as networks or graphs. Modern data sets often include some kind of network. Nodes can have locations, directions, memory, demographic characteristics, content, and preferences. Edges can have lengths, directions, capacities, costs, durations, and types. Moreover, these variables and the network structure itself can vary, with edges and nodes appearing, disappearing and changing their characteristics over time. Capturing, modeling and understanding networks and rich data requires understanding both the mathematics of networks and the computational tools for identifying and explaining the patterns they contain.
This graduate-level course will examine modern techniques for analyzing and modeling the structure and dynamics of complex networks. The focus will be on statistical algorithms and methods, and both lectures and assignments will emphasize model interpretability and understanding the processes that generate real data. Applications will be drawn from computational biology and computational social science. No biological or social science training is required. (Note: this is not a scientific computing course, but there will be plenty of computing for science.)
Recommended: CSCI 3104 (undergraduate algorithms) and APPM 3570 (applied probability), or equivalent preparation.
Note: An adequate mathematical and programming background is mandatory. The concepts and techniques covered in this course depend heavily on basic statistics (distributions, Monte Carlo techniques), scientific programming, and calculus (integration and differentiation). Students without sufficient preparation will struggle to keep up with the lectures and assignments. Students without proper preparation may audit the course.
Required (available at the CU Bookstore):
1. Networks: An Introduction by M.E.J. Newman
2. Pattern Recognition and Machine Learning by C.M. Bishop.
1. All of Statistics by L. Wasserman
2. Numerical Recipes
3. Networks, Crowds and Markets by D. Easley and J. Kleinberg
4. Error and the Growth of Experimental Knowledge by D.G. Mayo.
Course work and grading
Attendance at the lectures is required.
Most of the class will be standard graduate-style lectures by the instructor. These will be supplemented by guest lectures on special or advanced topics, and class discussions of selected papers drawn from the networks literature. Problem sets will develop and extend topics presented in class, and will introduce additional topics not covered in class. Performance on the problem sets will be the major component of evaluation. There are no written examinations in the course, and thus students are expected to spend serious quality time on the problem sets. Additional details are given in the syllabus.
Problem sets: There will be 6 problem sets. Each will include some mathematical and some computational problems. Problem sets will be due roughly every two weeks. Programming components of the problem sets may be completed in any reasonable imperative language, and students are not expected to code everything from scratch (using available network libraries is okay).
See the syllabus for more details about formatting and submitting your solutions, for advice about how to get maximum points, and for the class policies on collaboration and on late submissions. Students who are unsure about whether something is permitted under the policies described in the syllabus should consult with the instructor well before a particular deadline.
Class project: The purpose of the class project is to formulate and explore a research question of the student's devising related to network analysis and modeling. Students may work in small teams. The deliverables are (i) a short (10 minute) in-class presentation of the project results, and (ii) a 10-page writeup. See the syllabus for more details.
Grading: See the syllabus.
Week 1 : Introduction and overview (Lecture 0 and Lecture 1)
Week 2 : Measures of structural importance (Lecture 2)
Week 3 : Random graphs I: homogeneous degrees (Lecture 3)
Week 4 : Random graphs II: heterogeneous degrees (Lecture 4)
Week 5 : Large-scale structure I: assortativity and modularity (Lecture 5)
Week 6 : Large-scale structure II: stochastic block models (Lecture 6 and supplement)
Week 7 : Large-scale structure III: generalizations and theorems (Lecture 7)
Week 8 : Wrangling network data I: from raw data to networks, and sampling (Lecture 8b)
Week 9 : TBD (Lecture 9a and Lecture 9b)
Week 10 : Spatial networks (Lecture 10)
Week 11 : Growing networks (Lecture 11)
Week 12 : Dynamic networks (Lecture 12)
Week 13 : Advanced topics
Week 14 : Fall break
Weeks 15-16 : Project presentations and Wrap up
Problem set 2 [data file in the class Dropbox]
Problem set 3 [data files in the class Dropbox]
Problem set 6
- M.E.J. Newman, "The structure and function of complex networks." SIAM Review 45, 167-256 (2003).
- L. Breiman, "Statistical Modeling: The Two Cultures." Statistical Science 16, 199-231 (2001).
- M.E.J. Newman, "Power laws, Pareto distributions and Zipf's law." Contemporary Physics 46(5), 323-351 (2005).
- M. Mitzenmacher, "A Brief History of Generative Models for Power Law and Lognormal Distributions." Internet Mathematics 1(2), 226-251 (2004).
- A. Clauset, C.R. Shalizi and M.E.J. Newman, "Power-law distributions in empirical data." SIAM Review 51(4), 661-703 (2009).
- S. Borgatti, "Centrality and network flow." Social Networks 27, 55-71 (2005).
- B. Ball and M.E.J. Newman, "Friendship networks and social status." Network Science 1, 16-30 (2013).
- B.K. Fosdick et al., "Configuring Random Graph Models with Fixed Degree Sequences." Preprint, arxiv:1608.00607 (2016).
- D.S. Callaway et al., "Are randomly grown graphs really random?" Physical Review E 64, 041902 (2001).
- J. Hackl and B.T. Adey, "Random representation of spatially embedded complex transportation networks." Preprint, arxiv:1609.03324 (2016).
- M. McPherson, L. Smith-Lovin and J.M. Cook, "Birds of a Feather: Homophily in Social Networks." Annual Review of Sociology 27, 415-444 (2001).
- M.E.J. Newman, "Mixing patterns in networks." Physical Review E 67, 026126 (2003).
- A. Clauset, M.E.J. Newman and C. Moore, "Finding community structure in very large networks." Physical Review E 70, 066111 (2004).
- B.H. Good, Y.-A. de Montjoye and A. Clauset, "Performance of modularity maximization in practical contexts." Physical Review E 81, 046106 (2010).
- B. Karrer and M.E.J. Newman, "Stochastic blockmodels and community structure in networks." Physical Review E 83, 016107 (2011).
- A. Clauset, C. Moore and M.E.J. Newman, "Hierarchical structure and the prediction of missing links in networks." Nature 453, 98-101 (2008).
- B. Ball, B. Karrer and M.E.J. Newman, "An efficient and principled method for detecting communities in networks." Physical Review E 84, 036103 (2011).
- T.P. Peixoto, "Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models." Physical Review E 89, 012804 (2014).
- S. Goel et al., "The Structural Virality of Online Diffusion." Management Science 62(1), 180-196 (2015).
- J. Cheng et al., "Can cascades be predicted?" Proc. WWW (2014).
- D.J. Watts and P.S. Dodds, "Influentials, Networks, and Public Opinion Formation." J. Consumer Research 34, 441-458 (2007).
- M. Gastner and M.E.J. Newman, "Shape and efficiency in spatial distribution networks." J. Stat. Mech., P01015 (2006).
- J. Blumenstock et al., "Predicting poverty and wealth from mobile phone metadata." Science 350(6264), 1073-1076 (2015).
- R. Milo et al., "Network Motifs: Simple Building Blocks of Complex Networks." Science 298, 824-827 (2002).
- M. Middendorf et al., "Inferring network mechanisms: The Drosophila melanogaster protein interaction network." Proc. Natl. Acad. Sci. USA 102(9), 3192-3197 (2005).
- B. Karrer and M.E.J. Newman, "Random graphs containing arbitrary distributions of subgraphs." Physical Review E 82, 066118 (2010).
NetworkX, network analysis package (Python)
igraph, network analysis tools (Python, C++, R)
graph-tool, network analysis and visualization software (Python, C++)
GraphLab, scalable network analysis (Python, C++)
Cytoscape, network visualization software
yEd Graph Editor, network visualization software
Graphviz, network visualization software
Gephi, network visualization software
graph-tool, network analysis and visualization software
webweb, network visualization tool joining Matlab and d3
MuxViz, multilayer analysis and visualization platform
Other Courses on Networks
Network Theory (University of Michigan)
Statistical Network Analysis (Purdue University)
Networks (Cornell University)
Networks (Harvard University)
Social and Economic Networks: Models and Analysis (Coursera / Stanford)
Social Network Analysis (Coursera / University of Michigan)
Social and Information Network Analysis (Stanford)
Graphs and Networks (Yale)
Spectral Graph Theory (Yale)
The Structure of Social Data (Stanford)
LaTeX (general) and TeXShop (Mac)
Matlab license for CU staff (includes student employees)
Mathematica license for CU students
NumPy/SciPy libraries for Python (similar to Matlab)
GNU Octave (similar to Matlab)
Wolfram Alpha (Web interface for simple integration and differentiation)
Introduction to the Modeling and Analysis of Complex Systems, by Hiroki Sayama (free online textbook)
Things Worth Reading
Everything you wanted to know about Data Analysis and Fitting but were afraid to ask, by Peter Young
Machine Learning, Statistical Inference and Induction Notebook (by Cosma Shalizi)
Power Law distributions, etc. Notebook (by Cosma Shalizi)
Statistics Done Wrong, The woefully complete guide (by Alex Reinhart)
Some Advice on Process for [Research Projects]