In our data mining/ML work we often face domains in which class distributions are greatly skewed and classification error costs are unequal. This makes classifiers very difficult to evaluate, because classification accuracy, the metric by which most evaluation is currently done, is completely inadequate for such situations: it treats every error as equally costly, and under heavy skew it is dominated by performance on the majority class. Furthermore, class distributions in these domains often drift over time, and error costs may be known only within an approximate range.
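To make the accuracy problem concrete, here is a small illustrative sketch with invented numbers. It assumes a hypothetical test set of 1000 cases with only 10 positives (1%), a hypothetical learned classifier `classifier_x`, and an assumed cost of 100 for a false negative versus 1 for a false positive, with correct predictions costing nothing.

```python
# Hypothetical test set: 1000 cases, only 10 of them positive (1%).
# Confusion-matrix counts are given as (TP, FN, FP, TN).
always_negative = (0, 10, 0, 990)  # trivial classifier: predicts "negative" for every case
classifier_x = (8, 2, 50, 940)     # hypothetical learned classifier


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Fraction of cases classified correctly; every error counts the same."""
    return (tp + tn) / (tp + fn + fp + tn)


def total_cost(tp: int, fn: int, fp: int, tn: int,
               cost_fn: float, cost_fp: float) -> float:
    """Total misclassification cost; correct predictions (tp, tn) are assumed free."""
    return fn * cost_fn + fp * cost_fp


for name, counts in [("always_negative", always_negative), ("classifier_x", classifier_x)]:
    print(name, accuracy(*counts), total_cost(*counts, cost_fn=100.0, cost_fp=1.0))
# always_negative 0.99 1000.0
# classifier_x 0.948 250.0
```

Accuracy ranks the trivial classifier first, while total cost under these assumed error costs favors `classifier_x` by a factor of four.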
Motivated by these difficulties, we've developed a robust framework for evaluating learned classifiers. The framework is based on ROC analysis and lets us analyze and visualize classifier performance separately from assumptions about class distributions and error costs. Our method, which uses the ROC convex hull, allows us to:
- analyze classifier performance over a broad range of performance conditions,
- easily determine the best available classifier(s) for a given set of assumptions, and
- determine the range of conditions under which a given classifier will be best.
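The three capabilities listed above can be illustrated with a minimal Python sketch of the convex-hull idea. This is not the released program; it assumes each classifier is summarized by a hypothetical `(name, FP rate, TP rate)` tuple, builds the upper convex hull with Andrew's monotone chain algorithm, and assumes expected cost is `p_pos*(1-TP)*cost_fn + (1-p_pos)*FP*cost_fp`, so that equal-cost (iso-performance) lines have slope `m = cost_fp*(1-p_pos) / (cost_fn*p_pos)`. All function names, and the `AllNeg`/`AllPos` labels for the trivial classifiers, are illustrative.

```python
from math import inf
from typing import Dict, List, Tuple

# Hypothetical representation of a classifier: (name, FP rate, TP rate).
Point = Tuple[str, float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o) in ROC space; >= 0 means no clockwise turn."""
    return (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])


def roc_convex_hull(classifiers: List[Point]) -> List[Point]:
    """Upper convex hull of the ROC points, ordered by increasing FP rate."""
    # The trivial classifiers "always negative" (0, 0) and "always positive" (1, 1)
    # are always available, so they anchor both ends of the hull.
    points = classifiers + [("AllNeg", 0.0, 0.0), ("AllPos", 1.0, 1.0)]
    # At a given FP rate only the point with the highest TP rate can lie on the hull.
    best_at_fpr: Dict[float, Point] = {}
    for p in points:
        if p[1] not in best_at_fpr or p[2] > best_at_fpr[p[1]][2]:
            best_at_fpr[p[1]] = p
    hull: List[Point] = []
    # Monotone chain, upper half only: drop any vertex that does not make a
    # strict clockwise turn (it lies on or below the hull).
    for p in sorted(best_at_fpr.values(), key=lambda q: q[1]):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def iso_slope(p_pos: float, cost_fp: float, cost_fn: float) -> float:
    """Slope m of iso-performance (equal expected cost) lines in ROC space."""
    return (cost_fp * (1.0 - p_pos)) / (cost_fn * p_pos)


def best_classifier(hull: List[Point], p_pos: float,
                    cost_fp: float, cost_fn: float) -> Point:
    """Hull vertex with minimum expected cost under the given conditions."""
    m = iso_slope(p_pos, cost_fp, cost_fn)
    # Minimizing p_pos*(1-TP)*cost_fn + (1-p_pos)*FP*cost_fp
    # is equivalent to maximizing TP - m*FP.
    return max(hull, key=lambda q: q[2] - m * q[1])


def optimal_slope_ranges(hull: List[Point]) -> Dict[str, Tuple[float, float]]:
    """For each hull vertex, the interval (lo, hi) of slopes m where it is optimal."""
    ranges = {}
    for i, q in enumerate(hull):
        # Slope of the hull segment entering q (infinite for the first vertex)...
        hi = inf if i == 0 else (q[2] - hull[i - 1][2]) / (q[1] - hull[i - 1][1])
        # ...and of the segment leaving q (0 for the last vertex).
        lo = 0.0 if i == len(hull) - 1 else (hull[i + 1][2] - q[2]) / (hull[i + 1][1] - q[1])
        ranges[q[0]] = (lo, hi)
    return ranges


# Usage with four hypothetical classifiers.
classifiers = [("A", 0.1, 0.5), ("B", 0.3, 0.8), ("C", 0.5, 0.85), ("D", 0.2, 0.4)]
hull = roc_convex_hull(classifiers)
print([q[0] for q in hull])                       # ['AllNeg', 'A', 'B', 'AllPos']
print(best_classifier(hull, 0.5, 1.0, 1.0)[0])    # B  (balanced classes, equal costs)
print(best_classifier(hull, 0.1, 1.0, 1.0)[0])    # AllNeg  (10% positives, equal costs)
print(best_classifier(hull, 0.1, 1.0, 10.0)[0])   # B  (10% positives, FN costs 10x FP)
```

The slope ranges address the last item in the list: a hull classifier is optimal exactly when the operating slope `m` falls inside its interval, and classifiers strictly below the hull (C and D here) are never the unique best choice under these cost assumptions. Because class priors and costs enter only through `m`, an approximately known cost range or a drifting class distribution translates into a range of slopes that can be checked against these intervals.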
We now use this method extensively in our work, and other applied researchers have begun using it as well. We've decided to release the program we use under the GNU General Public License (GPL) and make it publicly available to the ML and data mining communities.
- Changes to previous version:
Initial Announcement on mloss.org.