Hot Cross Buns: How Much Bang for the Buck?

Good Friday and Easter Monday are public holidays in Australia and the UK (Good Friday is also a holiday in 12 US states). For many down here, including those who don’t pay much mind to symbols, Good Friday is traditionally the day to eat Hot Cross Buns. For the last few years, Melbourne’s The Age newspaper has rated a dozen such buns for quality, as well as listing their price.

http://www.goodfood.com.au/good-food/search.html?ss=Good+Food&text=bunfight&type=


We would expect quality to increase with price to some extent, although the relationship would eventually flatten out (e.g. thrice as expensive doesn’t always mean thrice as good). Graphing programs such as GraphPad, KaleidaGraph and SigmaPlot, as well as R and most stats packages, can readily fit a plethora of polynomial and other nonlinear curves, but I used Stata to draw a preliminary scatterplot of tasters’ score (out of 10) against price per bun (A$), smoothed using Bill Cleveland’s locally weighted least squares (Lowess/Loess) algorithm. http://en.wikipedia.org/wiki/Lowess

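[Figure: easterbun14_lowess. Scatterplot of tasters’ score (out of 10) against price per bun (A$), with Stata lowess smooth]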


The relationship shown above is vaguely linear or, rather, ‘monotonic’ (score tends to rise as price rises, though not necessarily in a straight line), at least until I can have a better go with some nonlinear routines.
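For readers curious about what the lowess smoother is doing, here is a minimal pure-Python sketch of the core idea: for each price, fit a weighted straight line to the nearest neighbouring points, with tricube weights that fall to zero at the edge of the neighbourhood. This is not Stata’s implementation; it is a single pass without Cleveland’s robustness iterations, and the function name `lowess` and the default `frac=2/3` (share of points in each neighbourhood) are assumptions chosen for illustration.

```python
import math


def lowess(xs, ys, frac=2 / 3):
    """Locally weighted linear smoothing (one pass, no robustness iterations).

    Illustrative sketch only (hypothetical helper, not Stata's lowess).
    For each x0 in xs, the distance to the k-th nearest point
    (k = ceil(frac * n)) sets a bandwidth h; points are weighted with the
    tricube function (1 - (d/h)**3)**3 for distance d < h, and a weighted
    least squares straight line is fitted. The smoothed value is that
    line evaluated at x0.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # size of each local neighbourhood
    smoothed = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        h = sorted(dists)[k - 1]  # distance to the k-th nearest point
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        else:  # every neighbour shares x0's value
            w = [1.0 if d == 0 else 0.0 for d in dists]
        # Weighted means, then the weighted least squares slope
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0  # flat fit if only one x value
        smoothed.append(my + slope * (x0 - mx))
    return smoothed
```

Called as `lowess(prices, scores)` on the Age’s price and score columns, this would give a smoothed curve in the same spirit as the plot above, though Stata’s lowess command has its own bandwidth default and options, so the two curves need not match exactly.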

A simple linear regression model accounts for around 42% of the variation in taste score (R² ≈ 0.42) in this small and hardly random sample, returning the equation: taste score = 1.71 × unit price + 1.98. At best, this suggests that subjective taste, not necessarily representing anyone in particular, increases by about 1.7 points (out of 10) for every dollar increase in unit price.
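As a rough illustration of what such a fit computes, here is a minimal ordinary least squares sketch in Python that returns the slope, the intercept and R² (the share of variation accounted for). The post doesn’t say exactly which Stata command produced its figures; this assumes a plain OLS fit, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    Hypothetical helper for illustration. Returns (slope, intercept,
    r_squared), where r_squared is the share of the variation in y
    accounted for by the fitted line.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares give R^2 = 1 - SS_res / SS_tot
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot
```

Given the twelve (price, score) pairs from the Age’s list as `prices` and `scores`, `fit_line(prices, scores)` computes the same kind of estimates as the equation quoted above (slope, intercept, R²), assuming that equation came from ordinary least squares.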

But the fun really begins with the residuals: the difference between each bun’s actual taste score and the score predicted by the model above. Some buns had negative residuals, indicating (surprise, surprise!) that their taste was (much) lower than expected given their price. I won’t mention the negatives.

As for the positives, buns from two bakeries, Woodfrog Bakery in St Kilda (Melbourne, Australia) and Candied Bakery in Spotswood (ditto), cost $2.70 each and so were predicted to score 6.6 out of 10, yet Woodfrog hopped in with an actual score of 8.5 and Candied with 8, residuals of roughly +1.9 and +1.4 respectively.
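To make the residual arithmetic explicit, here is a small Python check that plugs the two bakeries’ reported price and scores into the fitted equation quoted above. The function names are illustrative only, and the coefficients are the rounded ones from the post, so the results carry a little rounding error.

```python
# Fitted equation as reported in the post: score = 1.71 * unit_price + 1.98
SLOPE, INTERCEPT = 1.71, 1.98


def predicted_score(unit_price):
    """Taste score (out of 10) predicted from unit price in A$."""
    return SLOPE * unit_price + INTERCEPT


def residual(actual_score, unit_price):
    """Actual minus predicted: positive means tastier than the price suggests."""
    return actual_score - predicted_score(unit_price)


for bakery, price, score in [("Woodfrog Bakery", 2.70, 8.5), ("Candied Bakery", 2.70, 8.0)]:
    print(f"{bakery}: predicted {predicted_score(price):.1f}, residual {residual(score, price):+.1f}")
# Woodfrog Bakery: predicted 6.6, residual +1.9
# Candied Bakery: predicted 6.6, residual +1.4
```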


The results can’t be generalised, prove nothing at all, and mean extremely little, except to suggest that regression residuals can perhaps be put to interesting uses. But please take care if trying this at home! Tread softly and carry a big (regression) book, e.g. Tabachnick and Fidell’s Using Multivariate Statistics

(or The Manga Guide to Regression, when published! http://www.nostarch.com/mg_regressionanalysis.htm)


Watch the Skies: Manga Regression!

Although regression is probably the technique statisticians use most, it (or at least multiple linear and multiple logistic regression) is often the concept most feared or misunderstood by students and newbie researchers. If someone you know falls into either category and would like a fun, straightforward introduction to regression that literally uses pictures (cartoons), let them know that The Manga Guide to Regression was published around May 2016 by the friendly folks at No Starch Press: http://www.nostarch.com/regression

It is also available from http://www.oreilly.com, http://www.amazon.com, http://www.bookdepository.com, http://www.dymocks.com.au, etc.

Authored by Shin Takahashi (The Manga Guide to Statistics, 2008), the new book uses manga http://en.wikipedia.org/wiki/Manga (think Osamu Tezuka’s Jungle Emperor, made into the 1965 Japanese anime TV series of the same name and the 1966 US overdub ‘Kimba the White Lion’ http://en.wikipedia.org/wiki/Jungle_Emperor, as well as ‘Atomu’ / ‘Astro Boy’).

Okay, Kimba himself doesn’t actually appear in the Manga Regression book, but there’s both linear and logistic regression, demonstrated using Microsoft Excel (cue just a slight gritting of teeth; go for Excel 2010/2013/2016 or higher). At least Excel might let readers get ‘down and dirty with data’.


Electric Stats: PSPP and SPSS

Most people use computer stats packages if they want to perform statistical or data analysis. One of the most popular packages, particularly in psychology and physiotherapy, is SPSS, now known as IBM SPSS. Although there is room for growth in some areas such as ‘robust regression’ (regression that is less thrown by outliers and by data that may not follow the usual assumptions), IBM SPSS has many jazzy features / options such as decision trees, neural nets and Monte Carlo simulation, as well as all the old faves like ANOVA, t-tests and chi-square.

I love SPSS and have been using it since 1981, back when SPSS analyses had to be submitted to run after 11 pm (23:00) so as not to hog the ‘mainframe’ computer resources. Alas, as with Minitab, SAS, Stata and others, SPSS can be expensive if you’re not a student or academic. An open source alternative that is free as in sarsaparilla and free as in speech is GNU PSPP, which has nothing whatsoever to do with IBM or the former SPSS Inc.

PSPP has a syntax or command line / program interface for old school users such as myself, *and* a snazzy GUI or Graphical User Interface. Currently, it doesn’t have all the features that 1981 SPSS had (e.g. ‘two-way ANOVA’), let alone the more recent features, although it does have logistic regression for binary outcomes such as depressed / not depressed. PSPP is easy to use (easier than open source R and perhaps even R Commander, although nowhere near as powerful).

PSPP can handle most basic analyses, and is great for starters and for those who use a computer at a worksite etc. where SPSS is not installed but need to run basic analyses or test syntax. The PSPP team is to be congratulated!

http://www.gnu.org/software/pspp/   free, open-source PSPP

http://www-01.ibm.com/software/analytics/spss/  IBM SPSS

(students and academics can obtain less expensive versions of IBM SPSS from http://onthehub.com)