Snappy Stepwise Regression

Stepwise regression, the technique that attempts to select a smaller subset of variables from a larger set by choosing the ‘best’ or dropping the ‘worst’ variable at each step, was developed back in the late 1950s by applied statisticians in the petroleum and automotive industries. With an ancestry like that, it’s no wonder it is often regarded as the statistical version of the early 60s Chev Corvair: at best ‘driveable’ only by expert, careful users, or, in the immortal words of Ralph Nader’s 1966 book title, ‘Unsafe at Any Speed’.

Well, maybe. But used with cross-validation and good sense, it’s an old-tech standby alongside the later-model ‘lasso’ and ‘elastic net’ techniques. And there’s an easy way to give the old stepwise routine a bit of a soft-shoe shuffle: see how well forward entry with just one, maybe two, or at most three variables does, preferably on a fresh set of data, compared with larger models. (SAS and SPSS allow the number of steps to be specified, and the sketch below shows one way to do it in R.)
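As a rough sketch of the limited-steps idea in R (the data frame dat, the outcome y and the predictors x1 to x5 are hypothetical stand-ins for your own data), base R’s step() function takes a steps argument that caps how many variables are allowed to enter:

# Forward entry, stopped after at most two variables have entered
null_model <- lm(y ~ 1, data = dat)                   # intercept-only starting point
small_model <- step(null_model,
                    scope = ~ x1 + x2 + x3 + x4 + x5, # candidate predictors
                    direction = "forward",
                    steps = 2)                        # cap the number of steps
summary(small_model)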

Or, if you’d like to do some slightly fancier steps in two-tone spats, try a best subset regression (available in SAS, in SPSS through Automatic Linear Modeling, and in Minitab, R and so on) of all one-variable combinations, then two variables, then three.
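If R is your dance floor, the leaps package offers one way to do this (again, dat and the variable names are hypothetical); nvmax = 3 restricts the search to models with at most three variables:

# All-subsets search over one-, two- and three-variable models
library(leaps)
subsets <- regsubsets(y ~ x1 + x2 + x3 + x4 + x5, data = dat, nvmax = 3)
subset_summary <- summary(subsets)
subset_summary$which    # variables in the best 1-, 2- and 3-variable models
subset_summary$adjr2    # adjusted R-squared for each model size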

The inspiration for this comes partly from Gerd Gigerenzer’s ‘take the best’ heuristic: taking the single best cue or clue often beats more complex techniques, including multiple regression. ‘Take the best’ is described in Prof Gigerenzer’s great new general book Risk Savvy: How to Make Good Decisions (Penguin, 2014) http://www.penguin.co.uk/nf/Book/BookDisplay/0,,9781846144745,00.html as well as in his earlier academic books such as Simple Heuristics That Make Us Smart (Oxford University Press, 1999).

See if a good little model can do as well as a good (or bad) big ‘un!
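One rough way to run that little-versus-big comparison on fresh data (again with hypothetical names) is to fit both models on a training portion and compare prediction error on a held-out portion:

# Compare a one-variable model with the full model on held-out data
set.seed(1)
in_train <- sample(nrow(dat), size = floor(0.7 * nrow(dat)))
train <- dat[in_train, ]
test  <- dat[-in_train, ]
little <- lm(y ~ x1, data = train)                      # the good little model
big    <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = train)  # the big 'un
rmse <- function(m) sqrt(mean((test$y - predict(m, newdata = test))^2))
c(little = rmse(little), big = rmse(big))               # lower is better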

 

Further Future Reading

Draper NR, Smith H (1966). Applied Regression Analysis (and later editions). Wiley, New York.


Author: Dr Dean McKenzie

I hold a BA (Honours) in Psychology from Deakin University and, much more recently, a PhD in Psychiatric Epidemiology (Classification & Regression Trees) from Monash University (2009). I have many years’ experience applying classical (e.g. ANOVA), contemporary (e.g. quantile regression) and data mining (e.g. trees, bagging, boosting, random forests) techniques to psychological, medical and health data using Stata, IBM SPSS, Salford CART and open source Weka, as well as in statistical consulting and advising people at many different levels of stats experience.