“AIC/BIC not p” : comparing means using information criteria

A basic principle in science is parsimony: reducing complexity where possible, as typified by the application of Occam’s Razor.

William of Occam (or Ockham), a philosopher monk named after the English town that he came from, said something to the effect of ‘pluralitas non est ponenda sine necessitate’ (‘plurality should not be posited without necessity’). In other words, don’t increase, beyond what is necessary, the number of entities needed to explain something.

Occam’s Razor doesn’t necessarily mean that ‘less is always better’; it merely suggests that more complex models shouldn’t be used unless they are required, for example to increase model performance. As the saying commonly, but probably mistakenly, attributed to Albert Einstein goes, ‘everything should be made as simple as possible, but not simpler’.

 

Common methods of measuring performance or ‘bang’, taking into account the cost, complexity or ‘buck’, are the Akaike Information Criterion (AIC), Bayesian or Schwarz Information Criterion (BIC), Minimum Message Length (MML) and Minimum Description Length (MDL).

Unlike the standard AIC, the latter three techniques take sample size into account, while MDL and MML also take the precision of the model estimates into account, but let’s just keep to the comparatively simpler AIC/BIC here.
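For reference, the usual textbook definitions of the two criteria are given below. The symbols are my own shorthand rather than anything from the sources above: \hat{L} is the maximised likelihood of the model, k the number of estimated parameters and n the sample size. The ln n term is how BIC brings sample size into the penalty, whereas the AIC penalty is a flat 2 per parameter.

```latex
\mathrm{AIC} = -2\ln\hat{L} + 2k
\qquad\qquad
\mathrm{BIC} = -2\ln\hat{L} + k\ln n
```

Lower values are better for both, and only differences between models fitted to the same data are meaningful.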

An excellent new book by Thom Baguley, ‘Serious Stats’ (serious in this case meaning powerful rather than scary), http://seriousstats.wordpress.com/ shows how to do a t-test using AIC/BIC in SPSS and R.

I’ll do it here using Stata regression, the idea being to compare a null model (e.g. just the constant) with a model including the group. In this case we’re looking at the difference between headroom in American and ‘Foreign’ cars in 1978 (well, it’s Thursday night!).

Here are the t-test results

(1978 Automobile Data)
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic |      52    3.153846    .1269928    .9157578    2.898898    3.408795
 Foreign |      22    2.613636     .103676    .4862837     2.39803    2.829242

 

Domestic cars have a slightly bigger mean headroom (but also larger variation!). The p value is 0.011, indicating that the probability of getting a difference in means as large as or larger than the one above (0.540), IF the null hypothesis that the population means are actually identical holds, is around 1 in 100.
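As a rough check on where that p value comes from, here is a minimal Python sketch that rebuilds the t statistics from nothing but the summary table above. The variable names are mine, and I am assuming the quoted p value comes from the ordinary pooled (equal-variance) test. The separate-variance (Welch) version is included because the two groups clearly differ in spread. Turning t into a p value needs the t distribution, which the Python standard library doesn’t provide, so that last step is left out.

```python
import math

# Summary statistics copied from the t-test table above.
n_dom, mean_dom, sd_dom = 52, 3.153846, 0.9157578   # Domestic
n_for, mean_for, sd_for = 22, 2.613636, 0.4862837   # Foreign

diff = mean_dom - mean_for  # difference in mean headroom

# Pooled (equal-variance) t-test: one common variance estimate.
df_pooled = n_dom + n_for - 2
pooled_var = ((n_dom - 1) * sd_dom**2 + (n_for - 1) * sd_for**2) / df_pooled
se_pooled = math.sqrt(pooled_var * (1 / n_dom + 1 / n_for))
t_pooled = diff / se_pooled

# Separate-variance (Welch) t-test: each group keeps its own variance,
# and the degrees of freedom come from the Welch-Satterthwaite formula.
v_dom = sd_dom**2 / n_dom
v_for = sd_for**2 / n_for
se_welch = math.sqrt(v_dom + v_for)
t_welch = diff / se_welch
df_welch = (v_dom + v_for) ** 2 / (
    v_dom**2 / (n_dom - 1) + v_for**2 / (n_for - 1)
)

print(f"difference = {diff:.3f}")
print(f"pooled: t = {t_pooled:.3f} on {df_pooled} df")
print(f"Welch:  t = {t_welch:.3f} on {df_welch:.1f} df")
```

The Welch statistic comes out larger than the pooled one here, because the smaller group (Foreign) is also the less variable one.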

 

Using the method shown in Dr Thom’s book (Stata implementation on my Downloadables page) we get

 

Akaike’s information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
   nullmodel |     74   -92.12213   -92.12213      1     186.2443    188.5483
  groupmodel |     74   -92.12213   -88.78075      2     181.5615    186.1696

 

AIC and BIC values are lower for the model including group (by about 4.7 and 2.4 respectively), suggesting in this case that the increase in complexity (the two groups) is repaid by a commensurate increase in performance, i.e. we need to take into account the two group means for US and non-US cars, rather than assuming there’s just one common mean, or universal headroom.
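The AIC/BIC table can be rebuilt by hand from the t-test summary statistics alone, which shows there’s nothing mysterious going on. The Python sketch below is my own illustration, not the Stata code from the Downloadables page. It assumes a normal-errors regression with the error variance estimated by maximum likelihood (residual sum of squares divided by n), and it counts only the regression coefficients as parameters (1 for the null model, 2 for the group model), which is what the df column in the table appears to do.

```python
import math

def gaussian_loglik(rss, n):
    """Maximised log-likelihood of a normal-errors regression,
    with the error variance estimated as rss / n."""
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1)

def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

# Summary statistics copied from the t-test table.
n_dom, mean_dom, sd_dom = 52, 3.153846, 0.9157578   # Domestic
n_for, mean_for, sd_for = 22, 2.613636, 0.4862837   # Foreign
n = n_dom + n_for

# Group model: each group has its own mean, so the residual sum of
# squares is just the within-group variation.
rss_group = (n_dom - 1) * sd_dom**2 + (n_for - 1) * sd_for**2

# Null model: one common mean, so the between-group variation is
# added back into the residuals.
grand_mean = (n_dom * mean_dom + n_for * mean_for) / n
rss_null = (rss_group
            + n_dom * (mean_dom - grand_mean) ** 2
            + n_for * (mean_for - grand_mean) ** 2)

ll_null = gaussian_loglik(rss_null, n)
ll_group = gaussian_loglik(rss_group, n)

# Parameter counts follow the df column: coefficients only.
aic_null, bic_null = aic(ll_null, 1), bic(ll_null, 1, n)
aic_group, bic_group = aic(ll_group, 2), bic(ll_group, 2, n)

print(f"null : ll={ll_null:.5f}  AIC={aic_null:.4f}  BIC={bic_null:.4f}")
print(f"group: ll={ll_group:.5f}  AIC={aic_group:.4f}  BIC={bic_group:.4f}")
print(f"AIC difference = {aic_null - aic_group:.2f}")
print(f"BIC difference = {bic_null - bic_group:.2f}")
```

Counting the error variance as an extra parameter would add the same constant to both models, so the differences between them, which are what matter, would not change.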

Of course, things get a little more complex when comparing several means, when variances differ, and so on. (The example above actually does have different variances, although the means are still “significantly” different when this is taken into account using a separate-variance t-test.) Something to think about; more info on applying AIC/BIC to a variety of statistical methods can be found in the refs below, particularly 3 and 5.

 

Further Reading (refs 2,3,4 and 5 are the most approachable, with Thom Baguley’s book referred to above, more approachable still)

      1. Akaike, H., A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974. 19: p. 716-723.
      2. Anderson, D.R., Model based inference in the life sciences: a primer on evidence. 2007, New York: Springer.
      3. Dayton, C.M., Information criteria for the paired-comparisons problem. American Statistician, 1998. 52: p. 144-151.
      4. Forsyth, R.S., D.D. Clarke, and R.L. Wright, Overfitting revisited: an information-theoretic approach to simplifying discrimination trees. Journal of Experimental & Theoretical Artificial Intelligence, 1994. 6: p. 289-302.
      5. Sakamoto, Y., M. Ishiguro, and G. Kitagawa, Akaike information criterion statistics. 1986, Dordrecht: D. Reidel.
      6. Schwarz, G., Estimating the dimension of a model. Annals of Statistics, 1978. 6: p. 461-464.
      7. Wallace, C.S., Statistical and inductive inference by minimum message length. 2005, New York: Springer.
      8. Wallace, C.S. and D.M. Boulton, An information measure for classification. Computer Journal, 1968. 11: p. 185-194.

 

      —————————————————————————–

 

 

 

 

Simple Stats: Food, Friends, Families and F values

Way back when I was a young data analyst, there were limitations to the techniques available for analysing certain types of data. If the data involved counts, for example, there were certain types of transformation, and for repeated measurements over time one needed ‘fiddle factors’ such as the G-G (Greenhouse-Geisser) and H-F (Huynh-Feldt) corrections, or ‘scattergun’ mighty MANOVA approaches that made up in firepower what they lacked in statistical power.

These days, even dear old SPSS has some sophisticated regression models. But whereas once there was a ‘trees not forest’ approach, running a whole lot of basic tests and looking for ‘significant’ p values rather than practical effect sizes and generality, now there are complex ‘forest’ tests, run without understanding the output, or even the question.

When talking about simplicity, analysts often recall the monk William of Occam and his “razor” (‘vain to do with more what can be done with fewer’), or misquote Albert Einstein, who probably never actually said ‘everything should be made as simple as possible, but not simpler’.

I like the ancient Greek, Epicurus of Athens, who was big on simple things like food, friends and families (although his name has come to be associated with a sort of false, hoggish hedonism, which defeats the purpose). I reckon we need to get a wooden table, some nice fresh food and jugs of (unfermented & fermented) grape. Then, once the important things like art, sport and the latest clips on Rage night music have been discussed, we’d talk about research questions and how they are to be answered in a sensible but creative manner, so as to get back to other things.

We’d begin with graphical techniques, with the purpose of saying ‘aha’ or ‘Eureka’, not ‘gosh’ or ‘wow’ or ‘huh?’. Building up from fundamental methods to more complex methods only if needed, we’d test our models on fresh samples, and look at the results in terms of effect sizes as well as confidence intervals and p values. I reckon that’s the sort of data party that even old Epicurus might have attended! http://textpublishing.com.au/books-and-authors/book/travels-with-epicurus/

http://www.dkstatisticalconsulting.com/practical-statistics/  <great book for analysing counts etc using SPSS & Stata>