Learning with Imperfect Labels
Presenter
September 14, 2016
Keywords:
- Rare class prediction, imperfect labels, weak supervision, insufficient labeled data
Abstract
Many real-world problems involve learning predictive models for rare classes when no gold standard labels are available for training samples but imperfect labels are available for all instances. We present RAre class Prediction in absence of True labels (RAPT), a three-step predictive modeling framework for classifying the rare class in such problem settings. The first step of the RAPT framework learns a classifier that optimizes both precision and recall using only imperfectly labeled training samples. We also show that, under certain assumptions on the imperfect labels, the quality of this classifier is almost as good as that of one constructed using gold standard labels. The second and third steps of the framework make use of the fact that imperfect labels are available for all instances to further improve the precision and recall of the rare class.
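To make the idea of optimizing precision and recall against imperfect labels concrete, here is a minimal sketch. It is a generic illustration, not the RAPT procedure itself: the choice of the product of precision and recall as the objective, the function names, and the toy scores and labels are all assumptions made for this example.

```python
# Illustrative sketch only: pick a score threshold that maximizes
# precision * recall, where both are measured against imperfect labels.
# The objective, names, and data are assumptions, not taken from RAPT.

def precision_recall(predicted, labels):
    """Precision and recall of boolean predictions against 0/1 labels."""
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y)
    pred_pos = sum(1 for p in predicted if p)
    actual_pos = sum(1 for y in labels if y)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

def best_threshold(scores, imperfect_labels):
    """Return (threshold, objective) maximizing precision * recall."""
    best_t, best_obj = None, -1.0
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        p, r = precision_recall(predicted, imperfect_labels)
        if p * r > best_obj:
            best_t, best_obj = t, p * r
    return best_t, best_obj

# Hypothetical classifier scores and imperfect (noisy) labels.
scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
imperfect_labels = [0, 0, 1, 0, 1, 1]
threshold, objective = best_threshold(scores, imperfect_labels)
print(threshold, objective)
```

The sketch never consults gold standard labels; whether a threshold chosen this way is close to the one gold labels would give depends on assumptions about the label noise, which is the kind of condition the abstract refers to.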
We applied the RAPT framework to generate a new burned area product for the tropical forests of South America and Southeast Asia using Moderate Resolution Imaging Spectroradiometer (MODIS) multispectral surface reflectance data and Active Fire hotspots. The total burned area detected in these regions between 2000 and 2014 is 2,286,385 MODIS pixels (approximately 571 K sq. km.), more than three times the estimate from NASA's state-of-the-art product, MODIS MCD64A1 (742,886 MODIS pixels). Our validation results, obtained using multiple lines of evidence, indicate that the events reported in our product are true burn events that are missed by state-of-the-art burned area products.
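The pixel counts above can be checked with a short calculation. The sketch assumes a nominal MODIS pixel of 500 m x 500 m (0.25 sq. km.); that pixel size is an assumption of this example, chosen because it is consistent with the stated figure of approximately 571 K sq. km.

```python
# Assumption: each MODIS pixel covers a nominal 500 m x 500 m = 0.25 sq. km.
PIXEL_AREA_SQ_KM = 0.25

rapt_pixels = 2_286_385      # burned pixels reported by the RAPT product
mcd64a1_pixels = 742_886     # burned pixels reported by MODIS MCD64A1

rapt_area_sq_km = rapt_pixels * PIXEL_AREA_SQ_KM
ratio = rapt_pixels / mcd64a1_pixels

print(f"RAPT burned area: {rapt_area_sq_km:,.0f} sq. km.")
print(f"RAPT / MCD64A1 pixel ratio: {ratio:.2f}")
```

Under that assumption the area comes to roughly 571 thousand sq. km., and the ratio of pixel counts is slightly above three, matching the "more than three times" statement.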