
Mining in the Presence of Class Imbalance: Precision-Recall Curves and the F-Measure

November 15, 2014
Jacqueline Hughes-Oliver, Statistics Department, North Carolina State University
Abstract
Algorithms for anomaly detection and information retrieval are designed to identify and characterize “unusual” subjects. As a result, they are typically applied where class membership is imbalanced, sometimes highly so. Assessment of such algorithms has increasingly moved away from overall accuracy or error rates, because these measures cannot distinguish between different types of errors. Even the popular receiver operating characteristic (ROC) curve is being pushed aside because it is insensitive to class imbalance. To assess an algorithm with respect to both its accuracy (measured by the sensitivity, also known as the true positive rate or recall) and its utility (measured by the positive predictive value, also known as precision), the precision-recall (PR) curve is gaining popularity. In this work, we investigate properties of the PR curve and some related summary measures. Discussion is aided by application to real and simulated datasets.
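The contrast the abstract draws between ROC quantities and precision can be illustrated with a small sketch. It is not taken from the talk: the confusion-matrix counts are invented for illustration, the names `Confusion`, `balanced`, and `imbalanced` are hypothetical, and the F-measure is assumed to be the standard F-beta score (a weighted harmonic mean of precision and recall), which the abstract does not define explicitly.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary classifier at one decision threshold."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives


def recall(c: Confusion) -> float:
    """Sensitivity / true positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: Confusion) -> float:
    """FP / (FP + TN); the x-axis of an ROC curve."""
    return c.fp / (c.fp + c.tn)


def precision(c: Confusion) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp)


def f_measure(c: Confusion, beta: float = 1.0) -> float:
    """Assumed standard F-beta: (1 + b^2) * P * R / (b^2 * P + R)."""
    p, r = precision(c), recall(c)
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Illustrative counts only: the same per-class error rates
# (recall 0.8, false positive rate 0.1) under two class ratios.
balanced = Confusion(tp=80, fn=20, fp=10, tn=90)          # 100 pos, 100 neg
imbalanced = Confusion(tp=80, fn=20, fp=1000, tn=9000)    # 100 pos, 10000 neg

for name, c in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(
        f"{name}: recall={recall(c):.3f} fpr={false_positive_rate(c):.3f} "
        f"precision={precision(c):.3f} F1={f_measure(c):.3f}"
    )
```

Both scenarios share the same recall and false positive rate, so they map to the same point on an ROC curve, while precision and F1 fall sharply once negatives outnumber positives 100 to 1. The sketch assumes nonzero denominators; a fuller version would need to handle thresholds at which nothing is predicted positive.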
Supplementary Materials