Juliana Freire - New York University
The growing number of available structured datasets, from Web tables and open-data portals to enterprise data, opens up new opportunities to enrich analytics and improve machine learning models through data augmentation. While dataset search engines for the Web and enterprises provide a first step towards improving dataset findability, their query interfaces are limited, supporting only simple, keyword-based queries and faceted search. In this talk, I will discuss a new class of dataset search queries that uncover relationships between datasets and support data augmentation. Concretely, given as input a dataset D, in the context of an analytics question A or a predictive model M, a data augmentation query returns a ranked list of datasets that are related to D and that answer A or enhance the performance of M. I will present our ongoing research on techniques to support the efficient evaluation of augmentation queries, as well as to present search results so that users can make sense of the data and effectively judge its suitability for a given task.
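To make the definition of a data augmentation query concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method presented in the talk: datasets are modeled as plain dictionaries from a join key to a numeric value, "related to D" is approximated by key overlap, and "enhances the performance of M" is approximated by the absolute Pearson correlation between a candidate column and the target column of D. The names `augmentation_query`, `pearson`, and `min_overlap` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def augmentation_query(d, candidates, min_overlap=2):
    """Rank candidate datasets by how strongly they correlate with D's target.

    d          -- input dataset D as {join_key: target_value}
    candidates -- {dataset_name: {join_key: feature_value}}
    Assumption: correlation with the target stands in for "improves model M".
    """
    ranked = []
    for name, table in candidates.items():
        shared = sorted(set(d) & set(table))  # keys on which D and the candidate join
        if len(shared) < min_overlap:
            continue  # too little overlap to count as related to D
        feature = [table[k] for k in shared]
        target = [d[k] for k in shared]
        ranked.append((name, abs(pearson(feature, target))))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Toy data, invented purely for illustration.
D = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
CANDIDATES = {
    "rainfall": {"a": 2.0, "b": 4.0, "c": 6.0, "d": 8.0},
    "noise": {"a": 1.0, "b": -1.0, "c": 1.0, "d": -1.0},
    "unrelated": {"x": 1.0},
}
RESULT = augmentation_query(D, CANDIDATES)
```

On this toy input, `RESULT` lists `rainfall` before `noise`, and `unrelated` is dropped because it shares no join keys with D. A real system would need joinability discovery and a task-specific measure of model improvement in place of these stand-ins.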