nan_rocplot

plot a Receiver Operating Characteristic (ROC) curve

Calling Sequence

nan_rocplot(scores, classes);
nan_rocplot(scores, classes, plottype);
nan_rocplot(scores, classes, plottype, Nthr);
[AUC AUH] = nan_rocplot(scores, classes);
[AUC AUH acc0 accM thrM thr acc sens speci hull] = nan_rocplot(...);

Parameters

scores :

vector of classifier outputs (one score per instance)

classes :

Boolean vector of the same size as scores

plottype:

controls what is plotted; it defaults to 1, where:

0 :

gives no plot (useful to get AUC without creating a plot)

1 :

gives a standard ROC curve, with sensitivity vs (1 - specificity)

2 :

gives my preferred convention, with sensitivity vs specificity

Nthr :

optional number of thresholds to consider (points on the ROC curve)

AUC :

area under the ROC curve

AUH :

area under the convex hull

acc0 :

accuracy at a threshold of zero

accM :

max accuracy from the tested set of thresholds

thrM :

threshold which leads to the max accuracy from the tested set of thresholds

thr :

considered thresholds

acc :

corresponding accuracies

sens :

sensitivities

speci :

specificities

hull:

convex hull

Description

ROC curves illustrate the performance on a binary classification problem where classification is based on simply thresholding a set of scores at varying levels. Low thresholds give high sensitivity but low specificity; high thresholds give high specificity but low sensitivity. The ROC curve plots this trade-off over a range of thresholds.
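At a single threshold, the two quantities being traded off can be computed directly from scores and classes (a minimal sketch for illustration; thr0 is an arbitrary threshold, not a toolbox variable):

  pred  = scores >= thr0;                           % thresholded decisions
  sens  = sum(pred  &  classes) / sum( classes);    % true positive rate (sensitivity)
  speci = sum(~pred & ~classes) / sum(~classes);    % true negative rate (specificity)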

classes is a Boolean vector of the same size as scores, which should be true for the instances that thresholding (scores >= threshold) should label as true, e.g. true for patients and false for healthy control subjects in medical diagnosis. If you have two separate vectors of scores, e.g. patient and control, first do: scores = [control(:); patient(:)]; classes = [false(numel(control), 1); true(numel(patient), 1)];
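With made-up data, a complete call might then look like this (an illustrative sketch; the randn-generated scores and the sample sizes are assumptions, not part of the toolbox):

  control = randn(50, 1);                        % scores for healthy controls
  patient = randn(60, 1) + 1;                    % scores for patients, shifted upwards
  scores  = [control(:); patient(:)];
  classes = [false(numel(control), 1); true(numel(patient), 1)];
  [AUC, AUH] = nan_rocplot(scores, classes);     % standard ROC plot (plottype defaults to 1)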

AUC is the area under the ROC curve, a measure of overall accuracy, which gives the probability that the classifier would rank a randomly chosen true instance (e.g. patient) higher than a random false one (e.g. control subject).
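This interpretation can be checked by brute force over all (true, false) score pairs (a sketch for illustration only, not how nan_rocplot computes the area; ties are counted as one half):

  t = scores(classes);  f = scores(~classes);              % true / false scores
  pairs = bsxfun(@gt, t(:), f(:)') + 0.5 * bsxfun(@eq, t(:), f(:)');
  AUC_pairwise = mean(pairs(:));                           % Mann-Whitney statistic, matches the ROC area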

AUH is the area under the convex hull of the ROC curve. It is of interest because it is theoretically possible to operate at any point on the convex hull of the points in an ROC curve, by proportionally mixing the two classifiers that operate at the points defining the relevant section of the hull. This is just a more complex version of the logic that gives the null line of an ROC plot: if a threshold of -inf gives (speci=0, sens=1) and +inf gives (1, 0), then using -inf for half of your data and +inf for the other half is expected to give (0.5, 0.5) if you have equal numbers of true and false instances.
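The mixing argument can be written out explicitly (a sketch for illustration only; sens1, speci1, sens2, speci2 and the mixing proportion p are assumed operating points taken from the sens/speci outputs, not toolbox variables):

  % expected operating point when threshold 1 is applied to a random
  % fraction p of the data and threshold 2 to the rest
  p = 0.5;
  sens_mix  = p * sens1  + (1 - p) * sens2;
  speci_mix = p * speci1 + (1 - p) * speci2;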

Also recorded in the plot legend, and optionally returned, are acc0 and accM. acc0 is the accuracy at a threshold of zero, which is of special importance in some algorithms: if your scores come from a linear classifier such as a Support Vector Machine, which can give scores(i) = w'*x(:, i) - b, then thresholding at zero is equivalent to testing w'*x(:, i) > b. accM is the maximum accuracy over the tested set of thresholds, which occurs at the threshold thrM.
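For a linear classifier this means the decision values can be fed in directly (a hedged sketch; w, b, X and labels are hypothetical quantities from some earlier training step, not toolbox variables):

  scores  = w' * X - b;              % decision values, one column of X per instance
  classes = (labels == 1);           % assumed: label +1 marks the true class
  [AUC, AUH, acc0] = nan_rocplot(scores(:), classes(:));
  % acc0 now reports the accuracy of the classifier's own rule w'*x > b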

The function can also return a vector of all considered thresholds, together with the corresponding accuracies, sensitivities and specificities, in the variables thr, acc, sens, speci. The output hull contains the indices into sens and speci that give the convex hull. These outputs can then be used to plot multiple ROC curves and/or convex hulls on the same axes, e.g. after calling nan_rocplot twice (with plottype 0), you could do: plot(speci1,sens1,'b', speci2,sens2,'r', [0 1],[1 0],'g');
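Put together, overlaying two curves might look like this (a sketch; scoresA, classesA, scoresB, classesB are assumed to be two score/class pairs prepared as above):

  [a1, h1, acc01, accM1, thrM1, thr1, acc1, sens1, speci1] = nan_rocplot(scoresA, classesA, 0);
  [a2, h2, acc02, accM2, thrM2, thr2, acc2, sens2, speci2] = nan_rocplot(scoresB, classesB, 0);
  plot(speci1, sens1, 'b', speci2, sens2, 'r', [0 1], [1 0], 'g');
  xlabel('specificity'); ylabel('sensitivity');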

Authors
