Thinking parallel: sparse iterative solvers with CUDA

Presenter
January 11, 2011
Keywords:
  • Sparse matrices
MSC:
  • 65F50
Abstract
Iterative sparse linear solvers are a critical component of a scientific computing platform. Effective preconditioning is the main challenge in developing iterative sparse solvers on massively parallel systems. As computing systems become increasingly power-constrained, memory hierarchies for massively parallel systems will become deeper. Parallel algorithms with all-to-all communication patterns that assume uniform memory access times will be inefficient on these systems. In this talk, I will outline the challenges of developing good parallel preconditioners, and demonstrate that domain decomposition methods have communication patterns that match emerging parallel platforms. I will present recent work to develop restricted additive Schwarz (RAS) preconditioners as part of the open-source 'cusp' library of sparse parallel algorithms. On 2D Poisson problems, a RAS preconditioner is consistently faster than diagonal preconditioning in time-to-solution. Detailed analysis demonstrates that the communication pattern of RAS matches the on-chip bandwidths of a Fermi GPU. Line smoothing, which requires solving a large number of small tridiagonal linear systems in local memory, is another preconditioning approach with similar communication patterns. I will conclude with a roadmap for developing a range of preconditioners, smoothers, and linear solvers on massively parallel hardware based on the domain decomposition and line smoothing approaches.
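
As a rough illustration of the two building blocks named in the abstract, the following is a minimal serial sketch in plain Python. It is not the cusp implementation and not the method presented in the talk. It assumes a 1D Poisson matrix tridiag(-1, 2, -1) instead of the 2D problem, uses a simple Richardson iteration where a Krylov method would normally be used, and solves each overlapping subdomain with the Thomas algorithm, the same kind of small tridiagonal solve that line smoothing relies on. All function names and parameters (`thomas_solve`, `poisson_apply`, `ras_apply`, `ras_richardson`, `num_subdomains`, `overlap`) are hypothetical and chosen only for this example.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def poisson_apply(x):
    """Multiply by the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = 2.0 * x[i] - left - right
    return y


def ras_apply(r, num_subdomains, overlap):
    """Apply a restricted additive Schwarz preconditioner to residual r.

    Each subdomain solves on its overlapping (extended) index range but
    writes back only the entries it owns, so the subdomain solves are
    independent and no summation over the overlap is needed.
    """
    n = len(r)
    z = [0.0] * n
    for k in range(num_subdomains):
        own_lo = k * n // num_subdomains
        own_hi = (k + 1) * n // num_subdomains
        ext_lo = max(0, own_lo - overlap)
        ext_hi = min(n, own_hi + overlap)
        m = ext_hi - ext_lo
        # Local problem: the Poisson matrix restricted to the extended range.
        local = thomas_solve([-1.0] * m, [2.0] * m, [-1.0] * m, r[ext_lo:ext_hi])
        for i in range(own_lo, own_hi):  # restricted write-back
            z[i] = local[i - ext_lo]
    return z


def ras_richardson(b, num_subdomains=4, overlap=2, tol=1e-10, max_iters=1000):
    """Preconditioned Richardson iteration: x <- x + M^{-1} (b - A x)."""
    x = [0.0] * len(b)
    b_norm = max(abs(v) for v in b)
    for it in range(max_iters):
        r = [bi - yi for bi, yi in zip(b, poisson_apply(x))]
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        z = ras_apply(r, num_subdomains, overlap)
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iters
```

Calling `ras_richardson([1.0] * 32)` returns the approximate solution together with the number of iterations used. The loop over subdomains in `ras_apply` touches only a contiguous, slightly extended slice of the residual per subdomain, which is the locality property the abstract attributes to domain decomposition. On a GPU each subdomain solve would plausibly map to one thread block working in on-chip memory, although that mapping is an assumption of this sketch and not something the abstract specifies.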