Plot a Receiver Operating Characteristic (ROC) curve
nan_rocplot(scores, classes)
nan_rocplot(scores, classes, plottype)
nan_rocplot(scores, classes, plottype, Nthr)
[AUC AUH] = nan_rocplot(scores, classes)
[AUC AUH acc0 accM thrM thr acc sens speci hull] = nan_rocplot(...)
Inputs:
  scores   - output of the classifier
  classes  - Boolean vector of the same size as scores
  plottype - controls what is plotted; defaults to 1, where:
               0 gives no plot (useful to get AUC without creating a plot)
               1 gives a standard ROC curve, with sensitivity vs (1 - specificity)
               2 gives my preferred convention, with sensitivity vs specificity
  Nthr     - optional number of thresholds to consider (points in the ROC)

Outputs:
  AUC   - area under the ROC curve
  AUH   - area under the convex hull
  acc0  - accuracy at a threshold of zero
  accM  - max accuracy from the tested set of thresholds
  thrM  - threshold which leads to the max accuracy from the tested set of thresholds
  thr   - considered thresholds
  acc   - corresponding accuracies
  sens  - corresponding sensitivities
  speci - corresponding specificities
  hull  - indices (into sens and speci) giving the convex hull
ROC curves illustrate the performance on a binary classification problem where classification is based simply on thresholding a set of scores at varying levels. Low thresholds give high sensitivity but low specificity; high thresholds give high specificity but low sensitivity. The ROC curve plots this trade-off over a range of thresholds.
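For intuition, the threshold sweep behind the curve can be sketched as follows (an illustrative outline only, not necessarily how nan_rocplot computes it internally; scores and classes are as described above):

  thrs  = sort(unique(scores));                        % candidate thresholds
  sens  = zeros(size(thrs));  speci = zeros(size(thrs));
  for i = 1:numel(thrs)
      pred     = scores >= thrs(i);                    % classify by thresholding
      sens(i)  = sum( pred &  classes) / sum( classes);    % true positive rate
      speci(i) = sum(~pred & ~classes) / sum(~classes);    % true negative rate
  end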
classes is a Boolean vector of the same size as scores, which should be true for the instances where thresholding the scores (scores >= threshold) should return true, e.g. true for patients and false for healthy control subjects in medical diagnosis. If you have two vectors of scores, e.g. patient and control, first do: scores = [control(:); patient(:)]; classes = [false(numel(control), 1); true(numel(patient), 1)];
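A complete toy example, with simulated data (the distributions below are arbitrary, chosen only so that the two groups overlap):

  control = randn(50, 1);                     % scores for healthy controls
  patient = randn(50, 1) + 1;                 % patients tend to score higher
  scores  = [control(:); patient(:)];
  classes = [false(numel(control), 1); true(numel(patient), 1)];
  [AUC, AUH] = nan_rocplot(scores, classes);  % default plottype, standard ROC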
AUC is the area under the ROC curve, a measure of overall accuracy, which gives the probability that the classifier would rank a randomly chosen true instance (e.g. patient) higher than a random false one (e.g. control subject).
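This pairwise interpretation can be checked directly; the brute-force computation below is illustrative, not the method used inside nan_rocplot (ties count as half):

  pos = scores(classes);  neg = scores(~classes);
  AUCcheck = mean(mean( bsxfun(@gt, pos(:), neg(:)') ...
                      + 0.5 * bsxfun(@eq, pos(:), neg(:)') ));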
AUH is the area under the convex hull of the ROC curve. This is of interest because it is theoretically possible to operate at any point on the convex hull of the points in an ROC curve, by randomly selecting, in some proportion, between the two classifiers that operate at the points defining the relevant segment of the hull. This is just a more complex version of the logic that gives the null line for an ROC plot: a threshold of -inf gives (specificity, sensitivity) = (0, 1) and +inf gives (1, 0), so using -inf for half of your data and +inf for the other half is expected to give (0.5, 0.5) if you have equal numbers of true and false instances.
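The mixing argument in a couple of lines (the operating points and mixing proportion here are placeholders): applying threshold A to a random proportion p of the data and threshold B to the rest gives an expected operating point on the line segment joining the two.

  p   = 0.5;                        % proportion of cases handled at threshold A
  opA = [0 1];                      % (specificity, sensitivity) at threshold -inf
  opB = [1 0];                      % (specificity, sensitivity) at threshold +inf
  expectedOp = p*opA + (1-p)*opB;   % here (0.5, 0.5), the null-line example above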
Also recorded in the plot legend, and optionally returned, are acc0, the accuracy at a threshold of zero, and accM, the max accuracy from the tested set of thresholds, which occurs at the threshold thrM. acc0 is of special importance in some algorithms, e.g. if your scores come from a linear classifier such as a Support Vector Machine, which can give scores(i) = w'*x(:, i) - b, so that thresholding at zero is equivalent to testing w'*x(:, i) > b.
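A sketch of that linear-classifier case (w, b and X here are illustrative placeholders, not quantities produced by nan_rocplot):

  X = randn(2, 100);                 % 2 features x 100 examples
  w = [1; -0.5];  b = 0.1;           % some learned weights and bias
  scoresLin = w' * X - b;            % scores(i) = w'*x(:, i) - b
  predLin   = scoresLin > 0;         % identical to testing w'*x(:, i) > b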
The function can also return a vector of all considered thresholds, along with the corresponding accuracies, sensitivities and specificities, in the variables thr, acc, sens and speci. The output hull contains the indices into sens and speci that give the convex hull. These outputs can then be used to plot multiple ROC curves and/or convex hulls on the same axes, e.g. after calling nan_rocplot twice (with plottype 0), you could do: plot(speci1,sens1,'b', speci2,sens2,'r', [0 1],[1 0],'g');
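Something along these lines, for example (scores1/classes1 and scores2/classes2 stand for two classifiers evaluated on the same problem; this is a sketch rather than a prescribed recipe):

  [~,~,~,~,~,~,~, sens1, speci1] = nan_rocplot(scores1, classes1, 0);
  [~,~,~,~,~,~,~, sens2, speci2] = nan_rocplot(scores2, classes2, 0);
  plot(speci1, sens1, 'b', speci2, sens2, 'r', [0 1], [1 0], 'g');
  xlabel('Specificity');  ylabel('Sensitivity');
  legend('Classifier 1', 'Classifier 2', 'Chance');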