Scalable deep learning: using parallel algorithms and HPC systems to train large models on big data sets
Presenter: Brian Van Essen
Lawrence Livermore National Laboratory
September 17, 2018
Abstract
This talk will present the major approaches to parallelizing deep learning training and how they are applied in current deep learning toolkits. We will introduce the Livermore Big Artificial Neural Network toolkit (LBANN), which is specifically optimized to combine multiple levels of parallelism on HPC systems. Additionally, we will discuss how these techniques scale on HPC systems, and how hardware architectures optimized for deep learning workloads impact these approaches. Finally, we will discuss some of the major challenges around communication and I/O.
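To make the most common of these approaches concrete, here is a minimal, plain-Python sketch of data-parallel training: each worker computes a gradient on its shard of a batch, the local gradients are averaged (the "allreduce" step that real HPC toolkits implement with MPI or NCCL), and the averaged gradient drives the weight update. All function names here are illustrative, not LBANN's API, and the single-process simulation stands in for actual distributed workers.

```python
# Hypothetical sketch of data-parallel SGD for the model y = w * x.
# Real systems run each worker as a separate process/GPU; here the
# workers are simulated in one process to keep the example runnable.

def grad(w, xs, ys):
    """Gradient of mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, batch_x, batch_y, n_workers, lr=0.01):
    # Shard the global batch evenly across workers.
    shards = [(batch_x[i::n_workers], batch_y[i::n_workers])
              for i in range(n_workers)]
    # Each worker computes a local gradient on its own shard.
    local_grads = [grad(w, xs, ys) for xs, ys in shards]
    # "Allreduce": average the local gradients; with equal shard sizes
    # this equals the gradient over the full batch.
    g = sum(local_grads) / n_workers
    return w - lr * g

# Fit y = 3x; the data-parallel steps converge to w ~= 3.
xs = [0.5 * i for i in range(1, 9)]
ys = [3.0 * x for x in xs]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, xs, ys, n_workers=4)
```

With equal-sized shards, one data-parallel step is mathematically identical to one serial gradient step on the whole batch; the scaling challenge the talk addresses is the communication cost of that averaging step as worker counts grow.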