Evaluate the quality of univariate data
The goal of this document is to illustrate practical uses of routines regarding measures of variation in the SampleSTAT toolbox.
If you are collecting data on a process, it is important to determine not only the location of the mean, but also to look at the variation within the data. If you are, for example, interpreting the results of a chemical analysis, you may put much more emphasis on the obtained average value if you know that the individual samples vary only very little in comparison to the mean.
In general, the spread of a distribution, both in absolute and in relative terms, is a good measure of the variability (and hence reliability) of the data. There are several ways to specify the variation in the data. SampleSTAT provides routines to give you more information of your data as mean, median and standard deviation can do.
These functions are good to extend the built-in functions mean(), stdev(), max(), min(), median().
The toolbox SampleSTAT provide the following functions/macros for measuring variation (details see example below).
Calculates the stray area (range of dispersion of the values) of univariate data for a statistical confidence level (95%, 99%, 99.9%) and level of significance (0.5, 0.01, 0.001), resp. This is similar to the sample standard deviation but gives more confidence as the sample standard deviation (68%) can provide.
Calculates the trust area (range of dispersion of the mean) of univariate data and for a statistical confidence level (95%, 99%, 99.9%) and level of significance (0.5, 0.01, 0.001), resp. and is the sample standard deviation of the mean. Because specifying the sample standard deviation (s) is more or less useless without the additional specification of the mean (and of course the type of distribution). It makes a big difference if s = 5 with a mean of = 100, with a mean of = 3. Relating the sample standard deviation to the mean resolves this problem.
Determines the student factor for an amount of numbers and for a statistical confidence level (95%, 99%, 99.9%) and level of significance (0.5, 0.01, 0.001), resp.- service function for ST_strayarea and ST_trustarea
// Sample data v = [ .. 9.999; .. 9.998; .. 10.002; .. 10.000; .. 10.001; .. 10.000 .. ]; // Statistical confidence level p = "95%"; // Calculate statistical results n = length(v); // Number of values x = mean(v); // Arithmetic mean s = stdev (v); // Sample standard deviation sa = ST_strayarea(v, p); // Range of dispersion of the values (stray area) ta = ST_trustarea(v, p); // Range of dispersion of the mean (trust area) mi = min(v); // Minimal value ma = max(v); // Maximal value // Output clc; mprintf("\nDemo for range of dispersion (stray- and trustarea)\n\n"); mprintf("Values:\n"); disp(v); // Display data mprintf("\n"); // blank line mprintf(.. "Number of Values : %i\n" + .. "Arithmetic Mean : %f\n" + .. "Sample Standard Deviation : %f\n" + .. "Confidence Level : %s\n" + .. "Range of Dispersion (values): %f\n" + .. "Range of Dispersion (mean) : %f\n" + .. "Minimum : %f\n" + .. "Maximum : %f\n", .. n, x, s, p, sa, ta, mi, ma); mprintf("\n"); // blank line mprintf( .. "68 percent of the values will stray around %.3f +/- %.3f (sample standard \n" + .. "deviation). %s of the values will be expected around %.3f +/- %.3f (Range \n" + .. "of disp. of the values, stray area).\n" + .. "With a propability of %s the mean of %.3f will stray around %.3f +/- %.3f \n" + .. "(Rage of dispersion of the mean, trust area).\n", x, s, p, x, sa, p, x, x, ta); | ![]() | ![]() |
Output:
Demo for range of dispersion (stray- and trustarea) Values: 9.999 9.998 10.002 10. 10.001 10. Number of Values : 6 Arithmetic Mean : 10.000000 Sample Standard Deviation : 0.001414 Confidence Level : 95% Range of Dispersion (values): 0.003635 Range of Dispersion (mean) : 0.001484 Minimum : 9.998000 Maximum : 10.002000 68 percent of the values will stray around 10.000 +/- 0.001 (sample standard deviation). 95% of the values will be expected around 10.000 +/- 0.004 (Range of disp. of the values, stray area). With a propability of 95% the mean of 10.000 will stray around 10.000 +/- 0.001 (Rage of dispersion of the mean, trust area)
R. Kaiser, G. Gottschalk; "Elementare Tests zur Beurteilung von Meßdaten", BI Hochschultaschenbücher, Bd. 774, Mannheim 1972.
Hani A. Ibrahim - hani.ibrahim@gmx.de