
Tutorial - Large-Scale Inference

March 26, 2012
Statistical inference for large-scale factorization and latent variable model problems is challenging. It requires the ability to partition the state space, to keep copies of shared state synchronized, and to perform distributed updates. Such problems arise in very large-scale topic models with 500 million documents and in graph factorization problems with 200 million vertices. This talk describes basic tools from systems research for distributing data and computation across hundreds of computers and for synchronizing updates efficiently. We argue in favor of asynchronous updates from both a systems-design and an experimental point of view. In particular, we show how a distributed approximate Gibbs sampler can be implemented for time-dependent latent variable models, and how the method of multipliers can be adapted to large-scale graph factorization.
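The sketch below illustrates one common way to use multipliers for keeping distributed copies consistent. It is an assumption about the general idea, not the formulation used in the talk. Each machine holds its own copy x_k of a shared variable, such as a vertex factor that is replicated across partitions, and a constraint x_k = z ties the copies together. The code uses the alternating (ADMM-style) variant of the method of multipliers on a toy problem: scalar variables with hypothetical per-machine quadratic losses 0.5 * a[k] * (x_k - b[k])**2. The function name consensus_admm and all parameters are illustrative.

```python
def consensus_admm(a, b, rho=1.0, iters=200):
    """Toy consensus via an ADMM-style method of multipliers (illustrative only).

    Hypothetical setup: machine k owns the loss 0.5 * a[k] * (x_k - b[k])**2
    on its local copy x_k of a shared scalar, subject to x_k == z.
    """
    z = 0.0                      # consensus value (the "global" copy)
    u = [0.0] * len(a)           # scaled Lagrange multipliers, one per local copy
    x = list(b)
    for _ in range(iters):
        # Local step (parallel across machines): closed-form minimizer of
        # 0.5*a*(x - b)**2 + 0.5*rho*(x - z + u)**2.
        x = [(ak * bk + rho * (z - uk)) / (ak + rho) for ak, bk, uk in zip(a, b, u)]
        # Synchronization step: combine the copies (shifted by their multipliers).
        z = sum(xk + uk for xk, uk in zip(x, u)) / len(x)
        # Multiplier step: accumulate each copy's remaining disagreement with z.
        u = [uk + xk - z for uk, xk in zip(u, x)]
    return z, x


z, copies = consensus_admm([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(z, copies)
```

For these quadratic losses, the agreed value should approach the weighted mean sum(a*b)/sum(a), which is 14/6 here, and all local copies should converge to it. In a real graph factorization, x_k would be a vector of vertex factors, and the local step would be an inexact gradient update on that machine's edges rather than a closed form.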