nan_corrcoef

calculates the correlation matrix from pairwise correlations.

Calling Sequence

[...] = nan_corrcoef(X);    calculates the (auto-)correlation matrix of X
[...] = nan_corrcoef(X,Y);  calculates the cross-correlation between X and Y
[...] = nan_corrcoef(..., Mode);
[...] = nan_corrcoef(..., param1, value1, param2, value2, ... );
[R,p,ci1,ci2,nan_sig] = nan_corrcoef(...);

Parameters

Mode='Pearson' or 'parametric' [default]:

gives the correlation coefficient

Mode='Spearman' :

gives Spearman's rank correlation coefficient

Mode='Rank' :

gives a nonparametric Rank Correlation Coefficient

param = 'Mode':

type of correlation

param = 'rows':

how to deal with missing values encoded as NaN.

value = 'complete':

remove all rows with at least one NaN

value = 'pairwise':

for each pair of columns, use all rows in which both values are present [default]

param = 'alpha':

significance level used to compute the confidence interval [default = 0.01]

R :

is the correlation matrix

R(i,j) :

is the correlation coefficient r between X(:,i) and Y(:,j)

p :

gives the significance of R

p > alpha:

do not reject the Null hypothesis: 'R is zero'.

p < alpha:

reject the Null hypothesis 'R is zero' at significance level alpha, i.e. R is significantly different from zero.

ci1 :

lower bound of the (1-alpha) confidence interval

ci2 :

upper bound of the (1-alpha) confidence interval

nan_sig :

p-value for testing H0: 'the occurrence of NaN's is not correlated with the data'
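
The meaning of ci1 and ci2 can be illustrated with a small Python sketch. It assumes that the interval is built with the Fisher transformation cited in [21]; this help page does not state how the toolbox computes it, so the helper name fisher_confidence_interval and its formula are illustrative assumptions, not the toolbox code.

```python
import math
from statistics import NormalDist


def fisher_confidence_interval(r, n, alpha=0.01):
    """Approximate (1 - alpha) confidence interval for a Pearson correlation.

    r     : sample correlation coefficient, strictly between -1 and 1
    n     : number of paired observations used to estimate r
    alpha : significance level (0.01 mirrors the default named above)

    Hypothetical helper for illustration; not part of the NaN Toolbox.
    """
    if n <= 3:
        raise ValueError("need more than 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                    # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    zcrit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    lower = math.tanh(z - zcrit * se)    # back-transform to the r scale
    upper = math.tanh(z + zcrit * se)
    return lower, upper


ci1, ci2 = fisher_confidence_interval(0.5, 50, alpha=0.01)
print(ci1, ci2)
```

Under this approximation the interval narrows as n grows and widens as alpha shrinks. With pairwise deletion, n can differ for every pair of columns, so each entry of ci1 and ci2 would be based on its own sample size.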

Description

The input data can contain missing values encoded as NaN. Missing data (NaN's) are handled by pairwise deletion [15]. To avoid possible pitfalls, use case-wise deletion or check the correlation of NaN's with your data (see below). A significance test of the hypothesis 'the correlation coefficient R is significantly different from zero' is included.

The result is only valid if the occurrence of NaN's is uncorrelated with the data. To avoid this pitfall, check the correlation of NaN's or apply case-wise deletion. Case-wise deletion can be implemented with

ix = ~or(isnan([X,Y]),2);
[...] = nan_corrcoef(X(ix,:),Y(ix,:),...);
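
The difference between pairwise and case-wise deletion can be made concrete with a small Python sketch. It is a conceptual illustration only: the functions pearson and corr_matrix and the sample columns are hypothetical, and the toolbox implementation may differ in detail.

```python
import math

NAN = float("nan")


def pearson(x, y):
    """Pearson correlation of two equal-length lists, skipping pairs with NaN."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return NAN
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return NAN
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(columns, rows="pairwise"):
    """Correlation matrix of a list of equal-length columns.

    rows='pairwise': each entry uses every row where both columns are present.
    rows='complete': rows containing any NaN are dropped before correlating.
    """
    if rows == "complete":
        keep = [i for i in range(len(columns[0]))
                if not any(math.isnan(c[i]) for c in columns)]
        columns = [[c[i] for i in keep] for c in columns]
    elif rows != "pairwise":
        raise ValueError("rows must be 'pairwise' or 'complete'")
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]


# Hypothetical sample data: b and c are each missing one value, in different rows.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0, NAN]
c = [NAN, 1.0, 0.0, 2.0, 1.0, 5.0]

pairwise = corr_matrix([a, b, c], rows="pairwise")
complete = corr_matrix([a, b, c], rows="complete")
print(pairwise[0][1], complete[0][1])
```

In this sketch the pairwise entry for a and b uses five rows, while the case-wise entry uses only the four rows in which all three columns are present, so the two estimates differ. Pairwise deletion keeps more data per entry, but each entry is then based on a different subset of rows.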

Correlation (non-random distribution) of NaN's can be checked with

[nan_R,nan_sig] = nan_corrcoef(X,isnan(X))
or
[nan_R,nan_sig] = nan_corrcoef([X,Y],isnan([X,Y]))
or
[R,p,ci1,ci2,nan_sig] = nan_corrcoef(...);
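
The idea behind this check can be sketched in Python: replace one variable by a 0/1 indicator of where it is missing and correlate that indicator with the observed values of another variable. The function names and sample data below are hypothetical, and the sketch returns only the correlation, not the p-value nan_sig that the toolbox reports.

```python
import math

NAN = float("nan")


def pearson(x, y):
    """Pearson correlation of two equal-length lists, skipping pairs with NaN."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return NAN
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return NAN
    return sxy / math.sqrt(sxx * syy)


def missingness_correlation(x, y):
    """Correlate the values of x with a 0/1 indicator of where y is NaN."""
    indicator = [1.0 if math.isnan(v) else 0.0 for v in y]
    return pearson(x, indicator)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# y is missing only where x is largest: missingness depends on x.
y_biased = [0.3, 0.1, 0.4, 0.1, 0.5, 0.9, NAN, NAN]

# y is missing at one small and one large x: no linear dependence on x.
y_scattered = [0.3, NAN, 0.4, 0.1, 0.5, 0.9, NAN, 0.6]

r_biased = missingness_correlation(x, y_biased)
r_scattered = missingness_correlation(x, y_scattered)
print(r_biased, r_scattered)
```

A clearly non-zero value, as for y_biased here, suggests that the NaN's are not randomly distributed, in which case results based on pairwise deletion should be treated with caution.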

Further recommendations related to the correlation coefficient:

+ LOOK AT THE SCATTERPLOTS to make sure that the relationship is linear.

+ Correlation is not causation: it is not clear which parameter is 'cause' and which is 'effect', and the observed correlation between two variables might be due to the action of other, unobserved variables.

See also

Bibliography

on the correlation coefficient

[ 1] http://mathworld.wolfram.com/CorrelationCoefficient.html

[ 2] http://www.geography.btinternet.co.uk/spearman.htm

[ 3] Hogg, R. V. and Craig, A. T. Introduction to Mathematical Statistics, 5th ed. New York: Macmillan, pp. 338 and 400, 1995.

[ 4] Lehmann, E. L. and D'Abrera, H. J. M. Nonparametrics: Statistical Methods Based on Ranks, rev. ed. Englewood Cliffs, NJ: Prentice-Hall, pp. 292, 300, and 323, 1998.

[ 5] Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T. Numerical Recipes in FORTRAN: The Art of Scientific Computing, 2nd ed. Cambridge, England: Cambridge University Press, pp. 634-637, 1992.

[ 6] http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html

on the significance test of the correlation coefficient

[11] http://www.met.rdg.ac.uk/cag/STATS/corr.html

[12] http://www.janda.org/c10/Lectures/topic06/L24-significanceR.htm

[13] http://faculty.vassar.edu/lowry/ch4apx.html

[14] http://davidmlane.com/hyperstat/B134689.html

[15] http://www.statsoft.com/textbook/stbasic.html//Correlations

others

[20] http://www.tufts.edu/~gdallal/corr.htm

[21] Fisher transformation http://en.wikipedia.org/wiki/Fisher_transformation

Authors
