Extracting insight from large networks: small-scale structures, large-scale structures, and their implications for machine learning and data analysis

Presenter
October 25, 2011
Keywords:
  • Data analysis
MSC:
  • 62-07
Abstract
Recent empirical work has demonstrated that large social and information networks often contain meaningful "small-scale" structure (e.g., clustering structure around a single individual at a size scale of roughly 100 individuals). Analogous "large-scale" structure (e.g., meaningful or statistically significant properties of tens or hundreds of thousands of individuals), however, either is lacking entirely or takes a form that is extremely difficult for traditional machine learning and data analysis tools to identify reliably. For example, there are often small clusters that provide a "bottleneck" to diffusions (e.g., diffusion-based dynamic processes of the kind of interest in viral marketing applications and tipping-point models of network dynamics). There are typically no large clusters with analogous bottlenecks, so diffusion-based metrics (and the associated machine learning and data analysis tools) are much less meaningful (or discriminative or useful) when one is interested in analyzing the network at large size scales. This empirical work will be reviewed in some detail; its implications for extracting insight from large networks with popular machine learning and data analysis tools will be discussed; and several examples of novel machine learning and data analysis tools that were developed in response to these observations will be presented.
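
To make the notion of a diffusion "bottleneck" concrete, here is a minimal sketch that scores a node set by its conductance: the number of edges leaving the set divided by the smaller of the total degree inside and outside the set. The abstract does not name a specific metric, so the choice of conductance is an assumption, as are the function names and the toy graph (two triangles joined by one edge). A low score means a random walk started inside the set tends to stay there.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def conductance(adj, subset):
    """Conductance of `subset`: cut size / min(volume inside, volume outside)."""
    subset = set(subset)
    # Edges with exactly one endpoint in the subset.
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    # Volume is the sum of node degrees on each side of the cut.
    vol_inside = sum(len(adj[u]) for u in subset)
    vol_outside = sum(len(adj[u]) for u in adj if u not in subset)
    denom = min(vol_inside, vol_outside)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = build_adjacency(edges)

tight_cluster = conductance(adj, {0, 1, 2})  # one edge leaves a set of volume 7
single_node = conductance(adj, {0})          # every edge of node 0 leaves the set
print(tight_cluster, single_node)
```

In this toy graph the triangle scores far lower than the single node, which is the small-scale bottleneck the abstract describes. The abstract's claim is that sets of this quality are typically not found at large size scales in real networks.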