Multiple linear regression
B = regres(y,X) B = regres(y,X,level) [B,bint] = regres(...) [B,bint,r] = regres(...) [B,bint,r,rint] = regres(...) [B,bint,r,rint,stats] = regres(...) [B,bint,r,rint,stats,fullstats] = regres(...)
a m-by-1 matrix of doubles, the responses.
a m-by-n matrix of doubles, the inputs, where m is the number of observations and n is the number of variables.
a 1-by-1 matrix of doubles, the confidence level (default level=0.05, meaning that the default level is 95%). level is expected to be in the range [0.,0.5]
a n-by-1 matrix of doubles, the coefficients of the linear model.
a 2-by-2 matrix of doubles, intervals with confidence level. The column bin(:,1) are the lower bounds and bin(:,2) are the upper bounds.
a m-by-1 matrix of doubles, the residuals y-B(1)-B(2)*x
a 4-by-1 matrix of doubles, the statistics. stats(1) is the R2 statistics, stats(2) is the F statistics, stats(3) is the p-value of the F statistics, stats(4) is an estimate of the error variance.
a struct, the statistics, see below for details.
This function computes a linear model with n independent variables x1, x2, ..., xn which best fit the data in the least squares sense.
The linear model we compute is
y = B(1)*x1 + B(2)*x2 +...+ B(n)*xn.
The fields in fullstats
are :
fullstats.RegressionSS: the sum of squares of the regression
fullstats.RegressionDof: the number of degrees of freedom of the regression
fullstats.RegressionMean: the mean of the sum of squares of the regression
fullstats.ResidualSS: the sum of squares of the residuals
fullstats.ResidualDof: the number of degrees of freedom of the residuals
fullstats.ResidualMean: the mean of the sum of squares of the residuals
fullstats.F: the F statistics
fullstats.pval : the p-value corresponding to the F statistics
fullstats.Bstddev: a (n+1)-by-1 matrix of doubles, the standard deviation of B
fullstats.R2: the R-squared statistics
The residual r is :
r == y-X*B
Assuming that X and y are column vectors, the simplest way to perform a simple linear regression is to use the statement
B = regres(y,[ones(X),X])
The statistics fullstats.R2 is in the range [0,1]. On output, the coefficient of determination fullstats.R2 measures the proportion of the variation in the response variables that is explained by the different input values. A value of fullstats.R2 near 1 indicates a good fit, while a value near zero indicates a poor fit.
The F statistics tests the null hypothesis B(1)=B(2)=...=B(n)=0 against the alternative hypothesis that there is one nonzero B(i), for some i=1,2,...,n. The F test does not tell which of the B(i) is nonzero, but only that one of them is linearly related to the response. When the P-value for the F test statistic is small (e.g. less than 0.001), this provides an evidence against the null hypothesis. In other words, if the P-value is small, therefore one of the parameters B(i) is linearly related to the response.
Caution! If we are interested in F-statistics and its p-value, then the X input argument should contain a column of 1s, so that the model has a constant term.
Compatibility note
This function is mostly compatible with Matlab,
except that Matlab has no fullstats
output argument,
which allows to have much more outputs than
the entries in stat
.
In R, the feature is provided by the lm
function and
the function anova
prints the analysis of
variance table.
In this toolbox, this feature is provided by the
regresprint
function,
which uses the fields in the fullstats
variable.
This would not have been possible with the stats
variable alone.
The regres
function name was chosen because
regress
already exists in Scilab, but Scilab's
regress
is not Matlab-compatible, and has much
less features.
// A simple linear regression X = [ 57. 64. 69. 82. 92. 111. 114. 132. 144. 146. ]; Y = [ 121. 129. 140. 164. 188. 217. 231. 264. 289. 294. ]; B = regres(Y,[ones(X),X]) expected = [4.8222564;1.9671389]; // Visually check the result: scf(); plot(X,Y,"bo"); L = B(1)+B(2)*X; plot(X,L,"r-") legend(["Data","Linear Fit"]); xtitle("Linear Regression","X","Y"); // Compute a 95% confidence interval. // Visually check the results. // Visually check the residuals. // Reference : [2] Height = [ 1.47 1.50 1.52 1.55 1.57 .. 1.60 1.63 1.65 1.68 1.70 .. 1.73 1.75 1.78 1.80 1.83 ]; Mass =[ 52.21 53.12 54.48 55.84 57.20 .. 58.57 59.93 61.29 63.11 64.47 .. 66.28 68.10 69.92 72.19 74.46 ]; Height = Height(:); Mass = Mass(:); [B,bint] = regres(Mass,[ones(Height),Height]) Bexpected = [ -39.062 61.272 ]; Bintexpected = [ -45.4 -32.7 57.4 65.1 ]; // Visually check the result: scf(); plot(Height,Mass,"bo"); L = B(1)+B(2)*Height; plot(Height,L,"r-") legend(["Data","Linear Fit"]); xtitle("Linear Regression","Height","Mass"); // Compute the residuals [B,bint,r] = regres(Mass,[ones(Height),Height]); // Visually see the residuals scf(); plot(Height,r,"bo"); xtitle("Linear Regression","Height","Residuals"); // A quadratic fit would be better: // we see a quadratic pattern in the residuals. // Longley.dat contains 1 Response Variable y, // 6 Predictor Variables x and 16 Observations. // Source : [4] X = [ 83.0 234289 2356 1590 107608 1947 88.5 259426 2325 1456 108632 1948 88.2 258054 3682 1616 109773 1949 89.5 284599 3351 1650 110929 1950 96.2 328975 2099 3099 112075 1951 98.1 346999 1932 3594 113270 1952 99.0 365385 1870 3547 115094 1953 100.0 363112 3578 3350 116219 1954 101.2 397469 2904 3048 117388 1955 104.6 419180 2822 2857 118734 1956 108.4 442769 2936 2798 120445 1957 110.8 444546 4681 2637 121950 1958 112.6 482704 3813 2552 123366 1959 114.2 502601 3931 2514 125368 1960 115.7 518173 4806 2572 127852 1961 116.9 554894 4007 2827 130081 1962 ]; y = [ 60323 61122 60171 61187 63221 63639 64989 63761 66019 67857 68169 66513 68655 69564 69331 70551 ]; [B,bint,r,rint,stats,fullstats] = regres(y,X) // Print an analysis of variance table scidoe_regressprint(fullstats) // Create an example with known coefficients grand("setsd",0); // The number of observations N = 100; // The exact coefficients Bexact = [2000;3;4] // The input 1 X1min = 50; X1max = 150; X1=grand(N,1,"uin",X1min,X1max); X1=gsort(X1,"g","i"); // The input 2 X2min = -500; X2max = -200; X2=grand(N,1,"uin",X2min,X2max); X2=gsort(X2,"g","i"); // Make the observations X = [ones(X1),X1,X2]; // Compute the ouput from the input y = Bexact(1)+Bexact(2)*X1+Bexact(3)*X2; // Add a normal noise to the output sigma = 50; e = grand(N,1,"nor",0,sigma); y = int(y + e); // Linear regression B = regres(y,X) // Compare B (exact) with B (estimate) [B,Bexact] L = B(1)+B(2)*X(:,2)+B(3)*X(:,3); // Check visually scf(); plot(y,L,"bo"); plot(y,y,"r-"); xtitle("Linear Regression","Data","Linear Fit"); // A short example of univariate linear regression X = rand(30,1,"n"); Y = 2+3*X+rand(30,1,"n"); scf(); plot(X,Y,"bo") B = scidoe_regress(Y,[ones(X),X]); plot(X,B(1)*ones(X)+B(2)*X,"r-") // To see which points are there, please use: // identify(X,Y,1) // Click middle button to leave. | ![]() | ![]() |
[1] "Introduction to probability and statistics for engineers and scientists.", Third Edition, Sheldon Ross, Elsevier Academic Press, 2004
[2] http://en.wikipedia.org/wiki/Linear_regression
[3] Octave's regres, William Poetra Yoga Hadisoeseno
[4] http://www.itl.nist.gov/div898/strd/lls/data/LINKS/DATA/Longley.dat