Hot Cross Buns: How Much Bang for the Buck?

Good Friday and Easter Monday are public holidays in Australia and UK (the former day is holiday in US  in 12 states). For many down here, including those who don’t pay much nevermind to symbols, Good Friday is traditionally the day to eat Hot Cross Buns. For the last few years, the Melbourne Age newspaper has rated a dozen such  buns for quality, as well as listing their price.

http://www.goodfood.com.au/good-food/search.html?ss=Good+Food&text=bunfight&type=

 

 

We would expect that, quality would increase, to some extent with price, although it would eventually flatten out (e.g. thrice as expensive doesn’t always mean thrice as good). Graphing programs such as Graphpad, Kaleidagraph and SigmaPlot, as well as R and most Stats packages, can readily fit a plethora of polynomial and other nonlinearities, but I used Stata to perform a preliminary scatterplot of the relationship between tasters’ score (out of 10)  and price per bun (A$), smoothed using Bill Cleveland’s locally weighted least squares Lowess/Loess algorithm. http://en.wikipedia.org/wiki/Lowess

easterbun14_lowess

 

The relationship shown above is vaguely linear or, rather, ‘monotonic’, at least until I can have a better go with some nonlinear routines.

A simple linear regression model accounts for around 42% of the variation in taste, in this small and hardly random sample, returning the equation y=1.71*unitprice+1.98, suggesting (at best) that subjective taste, not necessarily representing anyone in particular, increases by 1.7 with every dollar increase in unit price.

But the fun really begins when looking at the residuals, the difference between the actual taste score, and that predicted using the above model. Some buns had negative residuals, indicating (surprise surprise!) that their taste was (much) lower than expected, given their price. I won’t mention the negatives.

As to the positives, two bakeries, Woodfrog Bakery in St. Kilda (Melbourne, Australia) and  Candied Bakery in Spotswood (ditto), both cost $2.70 each and so were predicted to have a taste score out of 10 of 6.6, yet Woodfrog hopped in with an actual score 8.5 and Candied with an actual score of 8.

 

The results can’t be generalised, prove nothing at all, and mean extremely little, except to suggest that regression residuals can perhaps  be put to interesting uses, but please take care in trying this at home! Tread softly and carry a big (regression) book e.g Tabachnick and Fidell’s  Using Multivariate Statistics

(or the Manga Guide to Regression, when published!  http://www.nostarch.com/mg_regressionanalysis.htm)

 

Expected Unexpected: Power bands, performance curves, rogue waves and black swans

Many years ago, I had a ride of a Kawasaki 500 Mach III 2-stroke motorcycle, which along with its even more horrendous 750cc version was known as the ‘widow-maker’. It was incredibly fast in a straight line, but if it went around corners at all, the rider had long since fallen (or jumped) off!

It also had a very narrow ‘power band’ http://en.wikipedia.org/wiki/Power_band, in that it would have no real power until about 7,000 revs per minute, and then all of a sudden it would whoop and holler like the proverbial bat out of hell, the front wheel would lift, the rider’s jaw drop, and well, you get the idea! In statistical terms, this was a nonlinear relationship between twisting the throttle and the available power.

A somewhat less dramatic example of a nonlinear effect is the Yerkes-Dodson ‘law’ http://en.wikipedia.org/wiki/Yerkes%E2%80%93Dodson_law, in which optimum task performance is associated with medium levels of arousal (too much arousal = the ‘heebie-jeebies’, too little = ‘half asleep’).

Various simple & esoteric methods for finding global (follows a standard pattern such as a U shape, or upside down U) or local (different parts of the data might be better explained by different models, rather than ‘one size fits all’) relationships exist. A popular ‘local’ method is known as a ‘spline’ after the flexible metal ruler that draftspeople once fitted curves with. The ‘GT’ version, Multivariate Adaptive Regression Splines http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines. is available in R (itself a little reminiscent of a Mach III cycle at times!),  the big-iron ‘1960’s 390 cubic inch Ford Galaxie V8′ of the SAS statistical package and the original, sleek ‘Ferrari V12’ Salford Systems version.

Other nonlinear methods are available http://en.wikipedia.org/wiki/Loess_curve, but the thing to remember is that life doesn’t always fit within the lines, or follow some human’s idea of a ‘natural law’.

For example, freak or rogue waves, that can literally break supertankers in half, were observed for centuries by mariners but are only recently accepted by shore-bound scientists, similarly the black swans (actually native to Australia) of the stock market http://www.fooledbyrandomness.com/

When analysing data, fitting models, (or riding motorcycles), please be careful!