Basic outlier tests for normal distributions
[outlierfree] = ST_outlier(v)
[outlierfree] = ST_outlier(v, mod)
[outlierfree, outlier] = ST_outlier(v)
[outlierfree, outlier] = ST_outlier(v, mod)
v: n-by-1 or 1-by-m matrix of doubles, numerical values (n > 10, better n > 25)
mod: 1-by-1 matrix of strings, mode: "sd", "iqr15" or "iqr30"
outlierfree: n-by-1 or 1-by-m matrix of doubles, outlier-free data
outlier: n-by-1 or 1-by-m matrix of doubles, outliers
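Performs basic outlier tests on v. The outlier-free values are returned in outlierfree and, optionally, the detected outliers in outlier.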
SD-MODE: If your data follow a normal, symmetric and unimodal distribution, you can use the "sd" mode (population standard deviation, S.D. or sigma). In this mode a value is reported as an outlier when it deviates from the arithmetic mean by more than 2.5×S.D. in either direction.
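The "sd" rule can be illustrated with a minimal Python sketch. It is not the toolbox's Scilab implementation; the function name `sd_outliers` and the sample data are hypothetical. It follows the description above: population standard deviation (divide by n) and a strict "more than 2.5×S.D." threshold on both sides of the mean.

```python
from statistics import fmean, pstdev


def sd_outliers(values, k=2.5):
    """Split values into (outlier_free, outliers) with a k-sigma rule.

    Hypothetical helper illustrating the "sd" mode: a value is an outlier
    when |x - mean| > k * sigma, where sigma is the population standard
    deviation (normalised by n, not n - 1).
    """
    mu = fmean(values)
    sigma = pstdev(values, mu)  # population S.D. around the given mean
    limit = k * sigma
    outlier_free = [x for x in values if abs(x - mu) <= limit]
    outliers = [x for x in values if abs(x - mu) > limit]
    return outlier_free, outliers


# 13 values (n > 10), one of them clearly off
data = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.7, 10.3, 10.0, 9.9, 10.1, 14.0]
clean, out = sd_outliers(data)
print(out)  # → [14.0]
```

Note that the outlier itself inflates the mean and S.D. it is tested against, which is one reason the rule is reliable only for reasonably large, normally distributed samples.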
IQR-MODES: Testing for outliers by their distance in interquartile ranges (IQR) is recommended primarily for skewed data, but it is also applicable to normally distributed data.
IQR15-MODE: It is common to consider a value an outlier when it lies more than 1.5×IQR (interquartile range) below the lower quartile or above the upper quartile. The "iqr15" mode uses this criterion.
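The quartile-fence rule can be sketched in Python as follows. This is an illustration under assumptions, not the toolbox's code: the function name `iqr_outliers` is hypothetical, and the quartiles come from Python's `statistics.quantiles` (default "exclusive" method), which may differ from the quartile definition SampleSTAT uses. The `factor` parameter gives 1.5 for the "iqr15" rule and 3.0 for the "iqr30" rule.

```python
from statistics import quantiles


def iqr_outliers(values, factor=1.5):
    """Split values into (outlier_free, outliers) using quartile fences.

    Hypothetical helper: a value is an outlier when it lies below
    Q1 - factor*IQR or above Q3 + factor*IQR. Quartiles use the
    'exclusive' method of statistics.quantiles (an assumption).
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    outlier_free = [x for x in values if lower <= x <= upper]
    outliers = [x for x in values if x < lower or x > upper]
    return outlier_free, outliers


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 25]
# Q1 = 3.25, Q3 = 9.75, IQR = 6.5
print(iqr_outliers(data, 1.5)[1])  # → [25]  (above 9.75 + 9.75 = 19.5)
print(iqr_outliers(data, 3.0)[1])  # → []    (below 9.75 + 19.5 = 29.25)
```

In this example 25 is a weak outlier: it lies outside the 1.5×IQR fence but inside the 3.0×IQR fence.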
IQR30-MODE: However, with a fence at 1.5×IQR, about 0.7% of a normal distribution is flagged as outliers automatically. Consequently, a sample of about 143 values or more (1/0.007 ≈ 143) can be expected to contain at least one such "outlier" even when the data are perfectly normal. To avoid this, values between 1.5×IQR and 3.0×IQR beyond the lower or upper quartile are called extreme values or weak outliers, and only values beyond 3.0×IQR are strong outliers. The SampleSTAT toolbox accounts for this by introducing the "iqr30" mode.
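The 0.7% and 143 figures can be checked with a short Python calculation on the standard normal distribution. The helper name `normal_tail_fraction` is hypothetical; because the fences scale with the IQR, the fraction does not depend on the mean or S.D.

```python
from statistics import NormalDist


def normal_tail_fraction(factor):
    """Fraction of a normal distribution outside Q1 - factor*IQR .. Q3 + factor*IQR."""
    z = NormalDist()  # standard normal; the result is scale-free
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    iqr = q3 - q1
    upper_fence = q3 + factor * iqr  # symmetric, so double one tail
    return 2 * (1 - z.cdf(upper_fence))


p15 = normal_tail_fraction(1.5)
p30 = normal_tail_fraction(3.0)
print(f"{p15:.2%}, about 1 in {1 / p15:.0f} values")  # → 0.70%, about 1 in 143 values
print(f"{p30:.6%}")  # a few values per million
```

The 1.5×IQR fence lies at about 2.7 S.D. from the mean, and the 3.0×IQR fence lies at about 4.7 S.D. So "iqr30" practically never flags values from a truly normal sample.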
Warning: Use the ST_outlier "sd" mode ONLY with NORMALLY distributed data and with more than 10, or better more than 25, values! For samples with fewer values, use ST_deandixon (or ST_nalimov).
Lohninger, H., "Grundlagen der Statistik" [Fundamentals of Statistics], October 10, 2012, http://www.statistics4u.info/fundstat_germ/cc_outlier_tests_4sigma.html