Stepwise model selection based on linear regression
mdl = stepwiselm(y,x)
mdl = stepwiselm(y,x,direction)
mdl = stepwiselm(y,x,direction,criteria)
mdl = stepwiselm(y,x,direction,criteria,nitermax)
mdl = stepwiselm(y,x,direction,criteria,nitermax,verbose)
y : an m-by-1 matrix of doubles, the responses.
x : an m-by-n matrix of doubles, the inputs, where m is the number of observations and n is the number of variables.
direction : a string, the direction of search. Available values are "forward", "backward" and "both" (default direction = "forward").
criteria : a string, the selection criterion. Available values are "AIC" and "BIC" (default criteria = "BIC").
nitermax : a double, integer value, greater than or equal to 1, the maximum number of iterations (default nitermax = 100).
verbose : a boolean, %t if messages are to be printed (default verbose = %f).
mdl : a struct. See below for details.
This function tries to find the model which maximizes the given criterion.
The fields in mdl are the following.
"selection" : a 1-by-n matrix of doubles, zeros or ones. Column i of x is selected if selection(i)==1.
"Z" : an m-by-nbselected matrix of doubles, the columns of the selected inputs, where nbselected is the number of selected inputs.
"selectionHistory" : a 1-by-nbselected matrix of doubles, the indices of the selected inputs, in the order in which the variables were selected.
"criteriaHistory" : a 1-by-nbselected matrix of doubles, the values of the criterion for the selected inputs, in the order in which the variables were selected.
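The relationship between these fields can be illustrated with a small hand-built struct. This is only a sketch: the data, the selection order and the variable names are made up for illustration, the struct is assembled manually rather than returned by stepwiselm, and it assumes that the columns of Z keep their original order in x.

```scilab
// Hypothetical inputs: 4 observations, 3 candidate columns (made-up values).
X = [1 10 0.5
     1 20 0.1
     1 30 0.9
     1 40 0.3];

// Assumed outcome of a search: column 3 was selected first, then column 1.
selectionHistory = [3 1];

// Build the 0/1 mask described for the "selection" field.
selection = zeros(1, size(X, "c"));
selection(selectionHistory) = 1;

// Columns of the selected inputs, as described for the "Z" field.
// Assumption: the columns are kept in their original order in X.
Z = X(:, find(selection == 1));

// Hand-built struct that mimics the documented fields (not a stepwiselm result).
mdl = struct("selection", selection, "Z", Z, "selectionHistory", selectionHistory);

// Number of selected inputs and their column indices.
nbselected = sum(mdl.selection);
disp(nbselected)
disp(find(mdl.selection == 1))
```

With a struct actually returned by stepwiselm, the same expressions, sum(mdl.selection) and find(mdl.selection==1), give the number of selected inputs and their column indices.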
path = stixbox_getpath();
csvfile = fullfile(path,"tests","unit_tests","regression","data_Linthurst-1979.csv");
separator = " ";
decimal = ".";
// BIO SAL pH K Na Zn
data = csvRead(csvfile, separator, decimal);
m = size(data,"r");
BIO = data(:,1);
SAL = data(:,2);
pH = data(:,3);
K = data(:,4);
Na = data(:,5);
Zn = data(:,6);
X = [ones(m,1) SAL pH K Na Zn];
y = BIO;
// With default parameters
mdl = stepwiselm(y,X)
// With "backward" and "AIC"
mdl = stepwiselm(y,X,"backward","AIC");
// With "both"
mdl = stepwiselm(y,X,"both");
// Print outputs
mdl = stepwiselm(y,X,"both",[],[],%t);
// Search an order 2 model
modelspec = 2;
[X,multiindices] = stepwiselm_generate([SAL pH K Na Zn],modelspec);
mdl = stepwiselm(y,X,"forward","AIC");
// Print the selection
inputlabels = ["SAL" "pH" "K" "Na" "Zn"];
// Apply the selection on the model
selectionindices = find(mdl.selection==1)';
multiindices = multiindices(selectionindices,:);
str = stepwiselm_print(multiindices,inputlabels);
disp(str)
// Use this selection to make a prediction
[B,bint,r,rint,stats,fullstats] = regress(y,mdl.Z);
yPredicted = mdl.Z*B;
// Make a plot to see the quality
scf();
plot(y,yPredicted,"b.")
plot(y,y,"r-")
xlabel("Observations")
ylabel("Predictions")
title(msprintf("R2=%.2f%%",100*fullstats.R2))