“AIC/BIC not p”: comparing means using information criteria

A basic principle in science is that of parsimony, or reducing complexity where possible, as typified by the application of Occam’s Razor.

William of Occam (or Ockham), a philosopher monk named after the English village he came from, said something to the effect of ‘pluralitas non est ponenda sine necessitate’ (‘plurality should not be posited without necessity’). In other words, don’t increase, beyond what is necessary, the number of entities needed to explain something.

Occam’s Razor doesn’t necessarily mean that ‘less is always better’; it merely suggests that more complex models shouldn’t be used unless they are required, for example to increase model performance. As the saying commonly, but probably mistakenly, attributed to Albert Einstein goes: ‘everything should be made as simple as possible, but not simpler’.


Common methods of measuring performance or ‘bang’, taking into account the cost, complexity or ‘buck’, are the Akaike Information Criterion (AIC), Bayesian or Schwarz Information Criterion (BIC), Minimum Message Length (MML) and Minimum Description Length (MDL).

Unlike the standard AIC, the latter three techniques take sample size into account, while MDL and MML also take the precision of the model estimates into account; but let’s keep to the comparatively simpler AIC/BIC here.

An excellent new book by Thom Baguley, ‘Serious Stats’ (serious in this case meaning powerful rather than scary) http://seriousstats.wordpress.com/, shows how to do a t-test using AIC/BIC in SPSS and R.

I’ll do it here using Stata regression, the idea being to compare a null model (i.e. just the constant) with a model including the group. In this case we’re looking at the difference in headroom between American and ‘Foreign’ cars in 1978 (well, it’s Thursday night!).

Here are the t-test results:

(1978 Automobile Data)

   Group |   Obs        Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+------------------------------------------------------------------
Domestic |    52    3.153846    .1269928    .9157578    2.898898    3.408795
 Foreign |    22    2.613636     .103676    .4862837     2.39803    2.829242


Domestic cars have a slightly larger mean headroom (but also larger variation!). The p value is 0.011, indicating that the probability of getting a difference in means as large as or larger than the one above (0.540), IF the null hypothesis that the population means are actually identical holds, is around 1 in 100.
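As a cross-check, the pooled (equal-variance) two-sample t statistic can be recomputed directly from the summary statistics in the table above. Here’s a sketch in Python rather than Stata; the p value itself needs a t-distribution CDF, so only the t statistic and degrees of freedom are computed:

```python
import math

# Summary statistics from the Stata table (1978 Automobile Data)
n1, m1, s1 = 52, 3.153846, 0.9157578   # Domestic
n2, m2, s2 = 22, 2.613636, 0.4862837   # Foreign

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Standard error of the difference in means
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = (m1 - m2) / se
df = n1 + n2 - 2

print(f"difference = {m1 - m2:.3f}, t = {t:.2f}, df = {df}")
# difference = 0.540, t = 2.61, df = 72  (two-sided p is around 0.011)
```

With 72 degrees of freedom, a t of about 2.61 corresponds to the two-sided p of 0.011 reported by Stata.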


Using the method shown in Dr Thom’s book (Stata implementation on my Downloadables page) we get


Akaike’s information criterion and Bayesian information criterion

     Model |  Obs    ll(null)   ll(model)   df        AIC        BIC
-----------+---------------------------------------------------------
 nullmodel |   74   -92.12213   -92.12213    1   186.2443   188.5483
groupmodel |   74   -92.12213   -88.78075    2   181.5615   186.1696


AIC and BIC values are lower for the model including group, suggesting in this case that the increase in complexity (the two groups) buys a commensurate increase in performance: we need to take into account the two group means for US and non-US cars, rather than assuming there’s just one common mean, or universal headroom.
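The AIC and BIC figures in the table follow directly from the log-likelihoods, via AIC = -2·ll + 2k and BIC = -2·ll + k·ln(n), where k is the number of estimated parameters (the df column). A quick Python check on the numbers above:

```python
import math

def aic(ll, k):
    """Akaike Information Criterion: penalty of 2 per parameter."""
    return -2 * ll + 2 * k

def bic(ll, k, n):
    """Bayesian (Schwarz) Information Criterion: penalty grows with log sample size."""
    return -2 * ll + k * math.log(n)

n = 74
# Log-likelihoods and parameter counts from the Stata output above
ll_null, k_null = -92.12213, 1    # constant only
ll_group, k_group = -88.78075, 2  # constant + group indicator

print(aic(ll_null, k_null), bic(ll_null, k_null, n))      # ~ 186.2443, 188.5483
print(aic(ll_group, k_group), bic(ll_group, k_group, n))  # ~ 181.5615, 186.1696
```

Note that BIC’s penalty per parameter, ln(74) ≈ 4.3, is steeper than AIC’s flat 2, which is why the gap between the two models is smaller on BIC than on AIC.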

Of course, things get a little more complex when comparing several means, when variances differ, and so on (as they actually do in the example above, although the means remain ‘significantly’ different when the difference in variances is taken into account using a separate-variance t-test). Something to think about; more information on applying AIC/BIC to a variety of statistical methods can be found in the references below, particularly 3 and 5.


Further Reading (refs 2, 3, 4 and 5 are the most approachable, with Thom Baguley’s book, referred to above, more approachable still)

      1. Akaike, H., A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974. 19: p. 716-723.
      2. Anderson, D.R., Model based inference in the life sciences: a primer on evidence. 2007, New York: Springer.
      3. Dayton, C.M., Information criteria for the paired-comparisons problem. American Statistician, 1998. 52: p. 144-151.
      4. Forsyth, R.S., D.D. Clarke, and R.L. Wright, Overfitting revisited: an information-theoretic approach to simplifying discrimination trees. Journal of Experimental & Theoretical Artificial Intelligence, 1994. 6: p. 289-302.
      5. Sakamoto, Y., M. Ishiguro, and G. Kitagawa, Akaike information criterion statistics. 1986, Dordrecht: D. Reidel.
      6. Schwarz, G., Estimating the dimension of a model. Annals of Statistics, 1978. 6: p. 461-464.
      7. Wallace, C.S., Statistical and inductive inference by minimum message length. 2005, New York: Springer.
      8. Wallace, C.S. and D.M. Boulton, An information measure for classification. Computer Journal, 1968. 11: p. 185-194.







Hot Cross Buns: How Much Bang for the Buck?

Good Friday and Easter Monday are public holidays in Australia and the UK (the former is a holiday in 12 US states). For many down here, including those who don’t pay much nevermind to symbols, Good Friday is traditionally the day to eat Hot Cross Buns. For the last few years, the Melbourne Age newspaper has rated a dozen such buns for quality, as well as listing their price.




We would expect quality to increase with price, to some extent, although it would eventually flatten out (e.g. thrice as expensive doesn’t always mean thrice as good). Graphing programs such as GraphPad, KaleidaGraph and SigmaPlot, as well as R and most stats packages, can readily fit a plethora of polynomial and other nonlinearities, but I used Stata to perform a preliminary scatterplot of the relationship between tasters’ score (out of 10) and price per bun (A$), smoothed using Bill Cleveland’s locally weighted least squares Lowess/Loess algorithm: http://en.wikipedia.org/wiki/Lowess
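For the curious, the idea behind lowess can be sketched in a few lines: at each point, fit a straight line to its nearest neighbours, weighted by Cleveland’s tricube function so that closer points count more. This is a drastically simplified one-pass version in Python, without the robustness iterations of the full algorithm, purely to show the mechanics:

```python
def lowess_smooth(xs, ys, frac=0.67):
    """Very simplified one-pass lowess: for each x value, fit a tricube-weighted
    straight line to its nearest neighbours and return the fitted values.
    (No robustness iterations, unlike Cleveland's full algorithm.)"""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest neighbour of x0
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        # Tricube weights: (1 - |d/h|^3)^3, zero beyond the bandwidth
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx if sxx else 0.0  # local weighted slope
        fitted.append(my + b * (x0 - mx))  # local line evaluated at x0
    return fitted
```

With frac near 1 the curve approaches a single straight-line fit; smaller fractions follow the data more closely, which is how the smoother can reveal the flattening-out we’d expect at the expensive end.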



The relationship shown above is vaguely linear or, rather, ‘monotonic’, at least until I can have a better go with some nonlinear routines.

A simple linear regression model accounts for around 42% of the variation in taste in this small and hardly random sample, returning the equation y = 1.71*unitprice + 1.98, suggesting (at best) that subjective taste, not necessarily representative of anyone in particular, increases by about 1.7 points with every dollar increase in unit price.
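The fitting itself is just ordinary least squares, which any stats package will do; the mechanics can be sketched in Python. Note the (price, score) pairs below are made up purely for illustration, NOT the Age’s actual tasting data:

```python
# Hypothetical (price, score) pairs -- for illustration only
prices = [1.00, 1.50, 2.00, 2.70, 3.00, 3.50]
scores = [3.5, 4.8, 5.2, 6.6, 7.4, 7.9]

n = len(prices)
mean_x = sum(prices) / n
mean_y = sum(scores) / n

# Least squares: slope = covariance(x, y) / variance(x), intercept from the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, scores))
sxx = sum((x - mean_x) ** 2 for x in prices)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"score = {slope:.2f} * unitprice + {intercept:.2f}")
```

Running this on the real tasting data (or in Stata via `regress`) is what produced the 1.71 and 1.98 quoted above.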

But the fun really begins when looking at the residuals: the difference between the actual taste score and that predicted using the above model. Some buns had negative residuals, indicating (surprise, surprise!) that their taste was (much) lower than expected, given their price. I won’t mention the negatives.

As to the positives, buns from two bakeries, Woodfrog Bakery in St. Kilda (Melbourne, Australia) and Candied Bakery in Spotswood (ditto), each cost $2.70 and so were predicted to have a taste score (out of 10) of 6.6, yet Woodfrog hopped in with an actual score of 8.5, and Candied with an actual score of 8.
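Plugging the $2.70 price into the published equation reproduces the predicted score and the two residuals just mentioned. A quick Python check, with the scores as reported above:

```python
# Predicted taste score from the fitted line y = 1.71 * unitprice + 1.98
def predicted_score(unit_price):
    return 1.71 * unit_price + 1.98

# Actual scores for the two $2.70 buns mentioned in the text
actual = {"Woodfrog Bakery": 8.5, "Candied Bakery": 8.0}

for bakery, score in actual.items():
    pred = predicted_score(2.70)
    print(f"{bakery}: predicted {pred:.1f}, actual {score}, residual {score - pred:+.1f}")
# Woodfrog Bakery: predicted 6.6, actual 8.5, residual +1.9
# Candied Bakery: predicted 6.6, actual 8.0, residual +1.4
```

A residual of nearly two points out of ten is, in bun terms, a serious over-delivery on the price.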


The results can’t be generalised, prove nothing at all, and mean extremely little, except to suggest that regression residuals can perhaps be put to interesting uses. But please take care in trying this at home! Tread softly and carry a big (regression) book, e.g. Tabachnick and Fidell’s ‘Using Multivariate Statistics’

(or the Manga Guide to Regression, when published!  http://www.nostarch.com/mg_regressionanalysis.htm)


A Probability Book your Gran & Grandad could read: David Hand’s “The Improbability Principle”

Most people have heard of, or have actually experienced, ‘strange coincidences’, of the ‘losing your wedding ring on honeymoon in a coastal village and then, years later, when fishing, finding the ring in the belly of a trout’ variety. Sometimes the story is helped along a little over the years, such as the 1911 demise of Green, Berry and Hill, who’d murdered Sir Edmund Berry Godfrey on *Greenberry* Hill, as used in the opening sequence of the 1999 movie Magnolia featuring the late, great Philip Seymour Hoffman. The murder, however, actually took place in the 17th century, and on *Primrose* Hill, which was later renamed Greenberry Hill.

Still, odd things do happen, leading many to wonder ‘wow, what’s the probability of that!’. Strange events can, however, occur without the need for ghostly theremin music to suddenly play in the background: they’re often merely examples of coincidence, helped along by human foibles.

Coincidences and foibles are entertainingly and educationally examined in Professor David Hand’s excellent new 2014 book ‘The Improbability Principle: why coincidences, miracles and rare events happen every day’.


Prof Hand is an Emeritus Professor of Mathematics at Imperial College London who, like fellow British statistician Brian ‘Chance Rules OK’ Everitt, has been writing instructive as well as readable texts and general books for nigh on forty years.

The book is not scarily mathematical at all, illustrating its ideas using cards, dice, marbles in urns and the like, although it might have been fun, in the book or at least on the book’s website, to have some actual exercises that more active readers could undertake, using dice, cards or electronic versions thereof, such as the free Java version of Simon and Bruce’s classic Resampling Stats software, known as Statistics101 http://www.statistics101.net/ (commercial Excel version available at http://www.resample.com/excel/)


All in all, though, The Improbability Principle is not only highly readable, entertaining and inexpensive, it is an absolute snorter of a book for a wide audience, including uncles, aunties, grandmamas and grandpapas, and is thoroughly recommended!