Snappy Stepwise Regression

Stepwise regression, the technique that attempts to select a smaller subset of variables from a larger set by, at each step, choosing the ‘best’ or dropping the ‘worst’, was developed back in the late 1950’s by applied statisticians in the petroleum and automotive industries. With an ancestry like this, it’s no wonder that it is often regarded as the statistical version of the early 60’s Chevy Corvair: at best only ‘driveable’ by careful expert users or, in the immortal words of the title of Ralph Nader’s 1965 book, ‘Unsafe at Any Speed’.

Well, maybe. But if used with cross-validation and good sense, it’s an old-tech standby alongside the later-model ‘lasso’ and ‘elastic net’ techniques. There’s also an easy way to put a bit of a soft-shoe shuffle into the old stepwise routine: see how well (preferably on a fresh set of data) forward entry with just one, maybe two, or at most three variables does compared with larger models. (SAS and SPSS allow the number of steps to be specified.)
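A minimal sketch of that comparison, in Python with NumPy on simulated data. Everything here is made up for illustration (the data, the 50/50 split, and the helper names design, fit_ols, mse and forward_select), and it is not what SAS or SPSS do internally; it simply caps forward entry at a chosen number of steps and scores each model on data it has not seen.

```python
import numpy as np

def design(X):
    """Prepend an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(X.shape[0]), X])

def fit_ols(X, y):
    """Ordinary least-squares coefficients (intercept first)."""
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return beta

def mse(X, y, beta):
    """Mean squared error of the fitted model on (X, y)."""
    resid = y - design(X) @ beta
    return float(np.mean(resid ** 2))

def forward_select(X, y, max_steps):
    """Forward entry: at each step add the variable that most reduces training MSE."""
    chosen = []
    for _ in range(min(max_steps, X.shape[1])):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]

        def train_mse(j):
            cols = chosen + [j]
            return mse(X[:, cols], y, fit_ols(X[:, cols], y))

        chosen.append(min(remaining, key=train_mse))
    return chosen

# Simulated data (an assumption): only columns 0 and 3 carry any signal.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(size=n)

# First half for fitting, second half kept back as the 'fresh' data.
X_train, y_train = X[:100], y[:100]
X_fresh, y_fresh = X[100:], y[100:]

fresh_mse = {}
for k in (1, 2, 3, p):
    cols = forward_select(X_train, y_train, k)
    beta = fit_ols(X_train[:, cols], y_train)
    fresh_mse[k] = mse(X_fresh[:, cols], y_fresh, beta)
    print(k, cols, round(fresh_mse[k], 3))
```

If the one, two or three variable models score about as well on the fresh data as the full model, that is an argument for the little model.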

Or, if you’d like to do some slightly fancier steps in two-tone spats, try a best subset regression (available in SAS, in SPSS through Automatic Linear Modeling, and in Minitab, R, etc.) of all one-variable, two-variable and three-variable combinations.
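For small numbers of candidate variables, best subsets can be sketched by brute force. The Python below (NumPy plus itertools) tries every one, two and three variable combination and keeps the one with the smallest residual sum of squares at each size. The simulated data and the helper names rss and best_subsets are assumptions for illustration; the packages mentioned above use their own, more efficient, algorithms and selection criteria.

```python
from itertools import combinations

import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for an OLS fit on the given columns plus an intercept."""
    A = np.column_stack([np.ones(X.shape[0]), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def best_subsets(X, y, max_size=3):
    """For each size 1..max_size, return the column tuple with the smallest RSS."""
    winners = {}
    for k in range(1, max_size + 1):
        candidates = combinations(range(X.shape[1]), k)
        winners[k] = min(candidates, key=lambda cols: rss(X, y, cols))
    return winners

# Simulated data (an assumption): only columns 1 and 4 carry any signal.
rng = np.random.default_rng(1)
n, p = 150, 8
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(size=n)

winners = best_subsets(X, y, max_size=3)
for k, cols in winners.items():
    print(k, cols, round(rss(X, y, cols), 1))
```

The best RSS can only go down as variables are added, so the sizes should be compared on fresh data or with a penalised criterion rather than on the training RSS alone.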

The inspiration for this comes partly from Gerd Gigerenzer’s ‘take the best’ heuristic: taking the single best cue or clue often beats more complex techniques, including multiple regression. ‘Take the best’ is described in Prof Gigerenzer’s great new general book Risk Savvy: How to Make Good Decisions (Penguin, 2014) http://www.penguin.co.uk/nf/Book/BookDisplay/0,,9781846144745,00.html as well as in his earlier academic books such as Simple Heuristics That Make Us Smart (Oxford University Press, 1999).

See if a good little model can do as well as a good (or bad) big ‘un!

 

Further Future Reading

Draper NR, Smith H (1966) Applied regression analysis (and later editions). Wiley: New York.

John and Betty’s Journey into Statistics Packages*

In past days of our lives, those who wanted to learn a stats package would attend courses and bail up/bake cakes for statisticians, but would mainly raise the drawbridge, lock the computer lab door and settle down with the VT100 terminal or Apple II or IBM PC and a copy of the brown or update blue SPSS Manual, or whatever.

Nowadays, folks tend to look things up on the web, something of a mixed blessing, and so maybe software consultants will now say LIUOTFW (‘Look It Up On The Flipping Web’) rather than the late, great RYFM (‘Read Your Flipping Manual’).

And yes, there are some great websites, and great online documentation supplied by the software vendors, but there are also some great books, available in electronic and print form. A list of three of the many wonderful texts available for each package (IBM SPSS, SAS, Stata, R and Minitab) can be downloaded from the Downloadables section on this site.

IBM SPSS (in particular), R (ever growing), and to a slightly lesser extent SAS, seem to have the best range of primers and introductory texts.
IMHO though, Stata could do with a new colourful, fun primer (not necessarily a Dummies Guide, although there’s Roberto Pedace’s Econometrics for Dummies (Wiley, New York, 2013) which features Stata), perhaps one by Andy Field, who has already done superb books on SPSS, R and SAS.

While up on the soapbox, I reckon Minitab could do with a new primer for Psychologists / Social Scientists, much like that early ripsnorter by Ray Watson, Pip Pattison and Sue Finch, Beginning Statistics for Psychology (Prentice Hall, Sydney, 1993).

Anyway, in memory of days gone by, brew a pot of coffee or tea, unplug email, turn off the phone and the mobile/cell, and settle in for an initial night’s journey, on a set or two of real and interesting data, with a good stats package book, or two!

*(The title of this post riffs off the improbably boring and stereotyped 1950’s early readers still used in Victorian primary (grade) schools in the 1960’s
http://nla.gov.au/nla.aus-vn4738114 (think Dick and Jane, or Alice and Jerry), as well as the far more entertaining and recent John and Betty’s Journey into Complex Numbers by Matt Bower http://www.slideshare.net/aus_autarch/john-and-betty )

Minitab 17: think Mini Cooper, not Minnie Mouse

As it has been 3 or 4 years since the previous version, the new release of the Minitab 17 statistical package is surely cause for rejoicing, merriment, and an extra biscuit with a strong cup of tea.

At one of the centres where I work, the data analysts sit at the same lunch table, but are known by their packages: the Stata people, the SAS person, the R person, the SPSS person and so on. No Minitab person as yet, but maybe there should be. Not only for its easy-to-use graphics, mentioned in a previous post, but for its all-round interface, programmability (Minitab syntax looks a little like BASIC, that great Kemeny-Kurtz language from Dartmouth College in 1964, but more powerful), a few new features (Poisson regression for relative risks & counted data, although alas no negative binomial regression for trickier counted data), and even better graphics.

There are bubble plots, outlier tests, and the Box-Cox transformation (another great collaboration from 1964). Minitab was also one of the first packages to include Exploratory Data Analysis (e.g. box plots and smoothed regression), for when the data are about as well-behaved as the next door neighbours strung out on espresso coffee mixed with red cordial.

Not as much cachet for when the R and SAS programmers come a-swaggering in, but still worth recommending for those who may not be getting as much as they should be out of SPSS, particularly for graphics, yet find the other packages a little too high to climb.

http://www.minitab.com/en-us/

SecretSource: of Minitab and Dataviz

When the goers go and the stayers stay, when shirts loosen and tattoos glisten, it’s time for the statisticians and the miners and the data scientists to talk, and walk, Big Iron.

R. S-Plus. SAS. Tableau. Stata. GnuPlot. Mondrian. DataDesk. Minitab.   MINITAB?????? Okay, we’ll leave the others to get back to their arm wrasslin’, but if you want to produce high quality graphs, simply, readily and quickly, then Minitab could be for you.

A commercialized version of Omnitab, Minitab appeared at Pennsylvania State University in 1972 and has long been associated with students learning stats, but also now with business, industrial and medical/health quality management, six sigma, etc. There are some other real ‘rough and tumble’ applications involving Minitab: Dennis R. Helsel’s ‘Statistics for Censored Environmental Data Using Minitab and R’ (Wiley, 2012), for instance.

IBM SPSS and Microsoft Excel can produce good graphs (‘good’ in the ‘good sense’ of John Tukey, Edward Tufte, William Cleveland, Howard Wainer, Stephen Few, Nathan Yau, etc.), with the soft pedal down and ‘caution switches’ on, but Minitab is probably going to be easier.

For example, the Statistical Consulting Centre at the University of Melbourne uses Minitab for most of its graphs (R for the trickiest ones). As well as general short courses on Minitab, R, SPSS and GenStat there’s a one day course in Minitab graphics in November, which I’ve done and can recommend.

More details on the Producing Excellent Graphics Simply (PEGS) course using Minitab at Melbourne are at

http://www.scc.ms.unimelb.edu.au/pegs.html

Student and academic pricing for Minitab is at http://onthehub.com/

What, I wonder, would Florence Nightingale have used for graphic software if she were alive today?