Univariate linear regression
B = scidoe_simplelinreg(X,Y) B = scidoe_simplelinreg(X,Y,level) [B,bint] = scidoe_simplelinreg(...) [B,bint,r] = scidoe_simplelinreg(...)
a 1-by-1 matrix of doubles, the confidence level (default level=0.05, meaning that the default level is 95%). level is expected to be in the range [0.,0.5]
a 1-by-2 matrix of doubles, where B(1) is the intercept and B(2) is the slope
a 2-by-2 matrix of doubles, intervals with confidence level. The column bin(:,1) are the lower bounds and bin(:,2) are the upper bounds.
a m-by-1 matrix of doubles, the residuals Y-B(1)-B(2)*x
a struct, the statistics, see below for details.
This function fits data in linear model
Y=B(1)+B(2)*X + e
where B(1) is the y-intercept, B(2) is the slope and e is a random normal error with mean zero and variance sigma2.
The coefficients B are so that they are the best fit of the data in the least squares sense.
The system should be overdetermined, meaning having more equations than unknowns, otherwise the system doesn't have a solution. In other words, m must be greater or equal than 2. Otherwise, an error is generated.
The entries in bint are a confidence interval for the entries in B. In other words, B(1) in [bint(1,1),bint(1,2)] with probability 1-level and B(2) in [bint(2,1),bint(2,2)] with probability 1-level.
The fields in stats
are :
stats.SSR : the sum of squares of the residuals,
stats.sigma2 : the estimate the variance of the random normal error,
stats.R2 : the coefficient of determination.
On output, the sum of squares of the residuals stats.SSR is such that
stats.SSR == sum(r.^2)
where the residual r is
r == Y-X*B
The statistics stats.R2 is in the range [0,1]. On output, the coefficient of determination stats.R2 measures the proportion of the variation in the response variables that is explained by the different input values. A value of stats.R2 near 1 indicates a good fit, while a value near zero indicates a poor fit.
The standardized residuals are
r/sqrt(stats.sigma2)
When the model is correct, the standardized residuals are approximately randomly distributed with mean zero and 95% of their values in the range [-1.96,1.96].
X = [ 57. 64. 69. 82. 92. 111. 114. 132. 144. 146. ] Y = [ 121. 129. 140. 164. 188. 217. 231. 264. 289. 294. ] B = scidoe_simplelinreg(X,Y) expected = [4.8222564 1.9671389] // Visually check the result: scf(); plot(X,Y,"bo"); L = B(1)+B(2)*X; plot(X,L,"r-") legend(["Data","Linear Fit"]); xtitle("Linear Regression","X","Y"); // Compute a 95% confidence interval. // Visually check the results. // Visually check the residuals. // Reference : // http://en.wikipedia.org/wiki/Simple_linear_regression Height = [ 1.47 1.50 1.52 1.55 1.57 .. 1.60 1.63 1.65 1.68 1.70 .. 1.73 1.75 1.78 1.80 1.83 ]; Mass =[ 52.21 53.12 54.48 55.84 57.20 .. 58.57 59.93 61.29 63.11 64.47 .. 66.28 68.10 69.92 72.19 74.46 ]; [B,bint] = scidoe_simplelinreg(Height,Mass); Bexpected = [ -39.062 61.272 ]; Bintexpected = [ -45.4 -32.7 57.4 65.1 ]; // Visually check the result: scf(); Height = Height(:); Mass = Mass(); plot(Height,Mass,"bo"); L = B(1)+B(2)*Height; plot(Height,L,"r-") legend(["Data","Linear Fit"]); xtitle("Linear Regression","Height","Mass"); // Compute the residuals [B,bint,r] = scidoe_simplelinreg(Height,Mass); // Visually see the residuals scf(); plot(Height,r,"bo"); xtitle("Linear Regression","Height","Residuals"); // A quadratic fit would be better: // we see a quadratic pattern in the residuals. // See the standardized residuals [B,bint,r,stats] = scidoe_simplelinreg(Height,Mass); disp(stats) scf(); plot(Height,r/sqrt(stats.sigma2),"bo"); xtitle("Linear Regression","Height","Standard Residuals"); // Create a dataset grand("setsd",0); // The number of observations N = 100; // The exact coefficients B = [2;3] // The input Xmin = 50; Xmax = 150; X=grand(N,1,"uin",Xmin,Xmax); X=gsort(X,"g","i"); // Compute the ouput from the input Y = B(1)+B(2)*X; // Add a normal noise to the output sigma = 5; e = grand(N,1,"nor",0,sigma); Y = int(Y + e); // Linear regression B = scidoe_simplelinreg(X,Y) // Compare B (exact) with B (estimate) [B,B] scf(); plot(X,Y,"bo"); L = B(1)+B(2)*X; plot(X,L,"r-") legend(["Data","Linear Fit"]); xtitle("Linear Regression","X","Y"); | ![]() | ![]() |
"Introduction to probability and statistics for engineers and scientists.", Third Edition, Sheldon Ross, Elsevier Academic Press, 2004
http://en.wikipedia.org/wiki/Simple_linear_regression