MathInstitutes.org

Scalable Data Science and Apache Flink: Key Challenges and (Some) Solutions

Presenter

Volker Markl

February 2, 2017

Abstract

Volker Markl (Technische Universität Berlin)

Big data holds great promise. However, today’s job market has too few qualified data scientists. This shortage keeps big data from fully realizing its potential to deliver insight and value for scientists, business analysts, and society as a whole. We therefore believe that novel technologies drawing on declarative languages, query optimization, automatic parallelization, and hardware adaptation are necessary to resolve this human resource bottleneck. In this talk, we discuss several aspects of our research in this area, including results on optimizing iterative data flow programs, optimistic fault tolerance, and steps toward a deep language embedding of advanced data analysis programs. We also discuss how our research activities led to Apache Flink, an open-source big data analytics system that is today a major data processing engine in the Apache Big Data Stack, used in a variety of applications by academia and industry.
