Extracting insight from large networks: small-scale structures, large-scale structures, and their implications for machine learning and data analysis
Presenter
October 25, 2011
Keywords:
- Data analysis
MSC:
- 62-07
Abstract
Recent empirical work has demonstrated that, although there often exists
meaningful "small scale" structure (e.g., clustering structure around a
single individual at the size-scale of roughly 100 individuals) in large
social and information networks, analogous "large scale" structure (e.g.,
meaningful or statistically significant properties of tens or hundreds of
thousands of individuals) either is lacking entirely or is of a form that
is extremely difficult for traditional machine learning and data analysis
tools to identify reliably. For example, there are often small clusters
that provide a "bottleneck" to diffusions (e.g., diffusion-based dynamic
processes of the kind of interest in viral marketing applications and
tipping point models of network dynamics); on the other hand, there are
typically no large clusters that have analogous bottlenecks, and thus
diffusion-based metrics (and the associated machine learning and data
analysis tools) are simply much less meaningful (or discriminative or
useful) if one is interested in analyzing the network at large sizes.
This empirical work will be reviewed in some detail, its implications for
extracting insight from large networks with popular machine learning and
data analysis tools will be discussed, and several examples of novel
machine learning and data analysis tools developed in response to these
observations will be presented.
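
As an illustration of what a diffusion "bottleneck" can mean, the sketch
below scores a set of nodes by its conductance: the number of edges leaving
the set divided by the smaller of the total degree inside and outside the
set. The abstract does not name a specific metric, so the choice of
conductance is an assumption, as are the toy graph and the helper names
build_graph and conductance. A low score means a random walk or diffusion
started inside the set tends to stay there.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def conductance(adj, nodes):
    """Conductance of a node set: cut edges / min(volume inside, volume outside).

    Assumed definition for illustration; volume is the sum of node degrees.
    """
    s = set(nodes)
    # Count edges with exactly one endpoint inside the set.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_inside = sum(len(adj[u]) for u in s)
    vol_outside = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_inside, vol_outside)
    return cut / denom if denom else float("inf")


# Hypothetical toy network: two 4-node cliques joined by a single edge.
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
adj = build_graph(clique_a + clique_b + [(3, 4)])

tight = conductance(adj, {0, 1, 2, 3})   # one clique: a clear bottleneck
mixed = conductance(adj, {0, 1, 4, 5})   # half of each clique: no bottleneck

print(f"clique conductance: {tight:.3f}")
print(f"mixed-set conductance: {mixed:.3f}")
```

In this toy case the clique scores far lower than the mixed set, which is
the small-scale pattern the abstract describes. The large-scale claim is
that, in real networks, no set of tens of thousands of nodes attains a
comparably low score.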