Univariate linear regression
Calling Sequence

B = scidoe_simplelinreg(X,Y)
B = scidoe_simplelinreg(X,Y,level)
[B,bint] = scidoe_simplelinreg(...)
[B,bint,r] = scidoe_simplelinreg(...)
[B,bint,r,rint] = scidoe_simplelinreg(...)
[B,bint,r,rint,stats] = scidoe_simplelinreg(...)
[B,bint,r,rint,stats,fullstats] = scidoe_simplelinreg(...)
Parameters

X : a m-by-1 matrix of doubles, the input
Y : a m-by-1 matrix of doubles, the output
level : a 1-by-1 matrix of doubles, the confidence level (default level=0.05, i.e. a 95% confidence level). level is expected to be in the range [0.,0.5].
B : a 2-by-1 matrix of doubles, where B(1) is the intercept and B(2) is the slope
bint : a 2-by-2 matrix of doubles, the confidence intervals for B, with confidence level 1-level. The column bint(:,1) contains the lower bounds and bint(:,2) the upper bounds.
r : a m-by-1 matrix of doubles, the residuals Y-B(1)-B(2)*X
rint : a m-by-2 matrix of doubles, the confidence intervals of the residuals. The column rint(:,1) contains the lower bounds and rint(:,2) the upper bounds.
stats : a 4-by-1 matrix of doubles, the statistics: stats(1) is the R-squared statistic, stats(2) is the F statistic, stats(3) is the p-value of the F statistic and stats(4) is an estimate of the error variance.
fullstats : a struct, the statistics, see below for details.
Description

This function fits the data to the linear model

Y = B(1) + B(2)*X + e

where B(1) is the y-intercept, B(2) is the slope and e is a normally distributed random error with zero mean and variance sigma2.
The coefficients B are computed so that they are the best fit of the data in the least squares sense. The system must have at least as many equations as unknowns, that is, m must be greater than or equal to 2; otherwise an error is generated.
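As an illustration, the least-squares coefficients have a well-known closed form. The following Python/NumPy sketch (illustrative only, not the toolbox source) reproduces the fit on the data of the first example below:

```python
import numpy as np

# Closed-form least-squares fit of Y = B(1) + B(2)*X,
# using the data of the first example below.
X = np.array([57., 64., 69., 82., 92., 111., 114., 132., 144., 146.])
Y = np.array([121., 129., 140., 164., 188., 217., 231., 264., 289., 294.])

# slope = Sxy/Sxx, intercept = ybar - slope*xbar
slope = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
intercept = Y.mean() - slope * X.mean()
print(intercept, slope)  # approximately 4.8222564 and 1.9671389
```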
The entries of bint are confidence intervals for the entries of B. In other words, B(1) lies in [bint(1,1),bint(1,2)] with probability 1-level, and B(2) lies in [bint(2,1),bint(2,2)] with probability 1-level.
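Under the usual normality assumption, such intervals are typically built as B(i) plus or minus the Student t quantile at probability 1-level/2, with m-2 degrees of freedom, times the standard deviation of B(i). A hedged Python/NumPy/SciPy sketch on the Height/Mass data of the examples below (the toolbox implementation may differ in detail):

```python
import numpy as np
from scipy import stats

# t-based confidence intervals for the intercept and slope
# (Height/Mass data from the Wikipedia simple-linear-regression example).
X = np.array([1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
              1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83])
Y = np.array([52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
              63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46])
m = X.size
level = 0.05

Sxx = np.sum((X - X.mean()) ** 2)
slope = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
intercept = Y.mean() - slope * X.mean()
r = Y - intercept - slope * X
sigma2 = np.sum(r ** 2) / (m - 2)  # error-variance estimate
sd_intercept = np.sqrt(sigma2 * (1.0 / m + X.mean() ** 2 / Sxx))
sd_slope = np.sqrt(sigma2 / Sxx)
t = stats.t.ppf(1 - level / 2, m - 2)
bint = np.array([[intercept - t * sd_intercept, intercept + t * sd_intercept],
                 [slope - t * sd_slope, slope + t * sd_slope]])
# rows of bint: [intercept bounds; slope bounds]
```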
The fields in fullstats are:
fullstats.RegressionSS: the sum of squares of the regression
fullstats.RegressionDof: the number of degrees of freedom of the regression
fullstats.RegressionMean: the mean of the sum of squares of the regression
fullstats.ResidualSS: the sum of squares of the residuals
fullstats.ResidualDof: the number of degrees of freedom of the residuals
fullstats.ResidualMean: the mean of the sum of squares of the residuals
fullstats.F: the F statistic
fullstats.pval: the p-value corresponding to the F statistic
fullstats.Bstddev: a 2-by-1 matrix of doubles, the standard deviations of the entries of B
fullstats.R2: the R-squared statistic
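These quantities follow from the standard analysis-of-variance decomposition of the total sum of squares. The following Python/NumPy sketch (illustrative, not the toolbox source) computes them on the data of the first example below:

```python
import numpy as np

# ANOVA decomposition for a simple linear regression:
# total SS about the mean = regression SS + residual SS.
X = np.array([57., 64., 69., 82., 92., 111., 114., 132., 144., 146.])
Y = np.array([121., 129., 140., 164., 188., 217., 231., 264., 289., 294.])
m = X.size

slope = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
intercept = Y.mean() - slope * X.mean()
Yhat = intercept + slope * X  # fitted values

RegressionSS = np.sum((Yhat - Y.mean()) ** 2)
ResidualSS = np.sum((Y - Yhat) ** 2)
RegressionDof = 1       # one predictor
ResidualDof = m - 2
RegressionMean = RegressionSS / RegressionDof
ResidualMean = ResidualSS / ResidualDof
F = RegressionMean / ResidualMean
R2 = RegressionSS / (RegressionSS + ResidualSS)
```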
The residual r is:

r = Y - B(1) - B(2)*X
The statistic fullstats.R2 is in the range [0,1]. This coefficient of determination measures the proportion of the variation in the response that is explained by the input values. A value of fullstats.R2 near 1 indicates a good fit, while a value near zero indicates a poor fit.
The F statistic tests the null hypothesis that the slope B(2) is zero against the alternative hypothesis that it is nonzero, that is, that the input X is linearly related to the response. When the p-value of the F statistic is small (e.g. less than 0.001), this provides evidence against the null hypothesis: the slope is significantly different from zero.
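The p-value is the upper-tail probability of the F distribution with (RegressionDof, ResidualDof) degrees of freedom, evaluated at the observed statistic. A small SciPy sketch (the F value below is hypothetical, chosen only for illustration):

```python
from scipy import stats

# Upper-tail probability of F(1, 8) at a hypothetical observed statistic.
F_observed = 4000.0             # illustrative value of the F statistic
dof_regression, dof_residual = 1, 8
pval = stats.f.sf(F_observed, dof_regression, dof_residual)
# A p-value this small is strong evidence against the null hypothesis.
```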
Examples

// Fit a simple linear model
X = [57. 64. 69. 82. 92. 111. 114. 132. 144. 146.];
Y = [121. 129. 140. 164. 188. 217. 231. 264. 289. 294.];
B = scidoe_simplelinreg(X,Y)
expected = [4.8222564 1.9671389]
// Visually check the result:
scf();
plot(X,Y,"bo");
L = B(1)+B(2)*X;
plot(X,L,"r-")
legend(["Data","Linear Fit"]);
xtitle("Linear Regression","X","Y");

// Compute a 95% confidence interval.
// Reference:
// http://en.wikipedia.org/wiki/Simple_linear_regression
Height = [
1.47 1.50 1.52 1.55 1.57 ..
1.60 1.63 1.65 1.68 1.70 ..
1.73 1.75 1.78 1.80 1.83
];
Mass = [
52.21 53.12 54.48 55.84 57.20 ..
58.57 59.93 61.29 63.11 64.47 ..
66.28 68.10 69.92 72.19 74.46
];
[B,bint] = scidoe_simplelinreg(Height,Mass);
Bexpected = [-39.062 61.272];
Bintexpected = [
-45.4 -32.7
 57.4  65.1
];
// Visually check the result:
scf();
Height = Height(:);
Mass = Mass(:);
plot(Height,Mass,"bo");
L = B(1)+B(2)*Height;
plot(Height,L,"r-")
legend(["Data","Linear Fit"]);
xtitle("Linear Regression","Height","Mass");
// Compute the residuals
[B,bint,r] = scidoe_simplelinreg(Height,Mass);
// Visually check the residuals:
scf();
plot(Height,r,"bo");
xtitle("Linear Regression","Height","Residuals");
// A quadratic fit would be better:
// we see a quadratic pattern in the residuals.
// See the standardized residuals
[B,bint,r,rint,stats] = scidoe_simplelinreg(Height,Mass);
disp(stats)
scf();
// stats(4) is the estimate of the error variance
plot(Height,r/sqrt(stats(4)),"bo");
xtitle("Linear Regression","Height","Standardized Residuals");

// Create a dataset
grand("setsd",0);
// The number of observations
N = 100;
// The exact coefficients
Bexact = [2;3]
// The input
Xmin = 50;
Xmax = 150;
X = grand(N,1,"uin",Xmin,Xmax);
X = gsort(X,"g","i");
// Compute the output from the input
Y = Bexact(1)+Bexact(2)*X;
// Add a normal noise to the output
sigma = 5;
e = grand(N,1,"nor",0,sigma);
Y = int(Y + e);
// Linear regression
B = scidoe_simplelinreg(X,Y)
// Compare Bexact (exact) with B (estimate)
[Bexact,B]
// Visually check the result:
scf();
plot(X,Y,"bo");
L = B(1)+B(2)*X;
plot(X,L,"r-")
legend(["Data","Linear Fit"]);
xtitle("Linear Regression","X","Y");
"Introduction to probability and statistics for engineers and scientists.", Third Edition, Sheldon Ross, Elsevier Academic Press, 2004
http://en.wikipedia.org/wiki/Simple_linear_regression