Abstract
Big Data at the LHC
Frank Wuerthwein
University of California, San Diego (UCSD)
The Large Hadron Collider is one of the largest and most complicated pieces of scientific apparatus ever constructed. The detectors along the LHC ring see as many as 800 million proton-proton collisions per second. An event in 10 to the 11th power is new physics and there is a hierarchical series of steps to extract a tiny signal from an enormous background. High energy physics (HEP) has long been a driver in managing and processing enormous scientific datasets and the largest scale high throughput computing centers. HEP developed one of the first scientific computing grids that now regularly operates 500k processor cores and half of an exabyte of disk storage located on 5 continents including hundred of connected facilities. In this presentation I will discuss the techniques used to extract scientific discovery from a large and complicated dataset. While HEP has developed many tools and techniques for handling big datasets, there is an increasing desire within the field to make more effective use of additional industry developments. I will discuss some of the ongoing work to adopt industry techniques in big data analytics to improve the discovery potential of the LHC and the effectiveness of the scientists who work on it.