When Boogie becomes Woogie, when Dog becomes Wolf

An exciting (and not just for statisticians!) area of application in statistics, analytics and data science is change, anomaly or outlier detection. The general notion of outliers (e.g. ‘unlikely’ values) was covered in a previous post, which looked at, amongst other things, very long pregnancies.

But tonight’s fr’instance comes from Ian Fleming’s wonderful Jamaican James Bond adventure novel, Dr No (also a jazzy 1962 movie), which tells of London Radio Security shutting down radio links with secret agents if a change in their message-transmitting style is detected, since such a change may indicate that the radio has fallen into enemy hands.

To use a somewhat less exotic example, imagine someone, probably not James Bond, going tenpin bowling and keeping track of their scores, a scenario taken from HJ Harrington et al.’s excellent Statistical Analysis Simplified: The Easy-to-Understand Guide to SPC and Data Analysis (McGraw-Hill, 1998).

In the 10th week, the score suddenly drops more than three standard deviations (a measure of the scatter, or variation, around the mean or average) below the mean.

Enemy agents? Forgotten bowling shoes? Too many milk shakes?

Once again, this is an anomaly or change, something often examined in industry (in Statistical Process Control (SPC) and related areas) to determine the point at which, in the words of Tom Robbins’ great novel Even Cowgirls Get the Blues, ‘the boogie stopped and the woogie began’.

Sudden changes in operations and processes do happen, so an everyday assembly line (‘dog’) can in milliseconds become unusual, and possibly even dangerous (‘wolf’), at which point, hopefully, an alarm goes off and corrective action is taken.
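The bowling alarm can be sketched in a few lines of Python. Harrington et al.'s actual scores are not given here, so the numbers below are invented, and the control limits are estimated from the first nine "ordinary" weeks using the plain sample standard deviation, as described above. (Classic Shewhart charts for individual values often estimate the spread from the average moving range instead; this is a simplified sketch, not the book's procedure.)

```python
from statistics import mean, stdev


def three_sigma_alarms(scores, baseline_weeks):
    """Return (week, score) pairs lying more than 3 SDs from the baseline mean.

    The mean and standard deviation come from the first `baseline_weeks`
    scores only, so an unusual week cannot widen its own control limits.
    """
    baseline = scores[:baseline_weeks]
    centre = mean(baseline)
    sd = stdev(baseline)  # sample standard deviation (n - 1 divisor)
    lower, upper = centre - 3 * sd, centre + 3 * sd
    # Weeks are numbered from 1, matching the way the story counts them.
    return [(week, s) for week, s in enumerate(scores, start=1)
            if s < lower or s > upper]


# Hypothetical weekly bowling scores: nine ordinary weeks, then a bad one.
scores = [152, 148, 155, 150, 149, 153, 151, 147, 154, 120]
print(three_sigma_alarms(scores, baseline_weeks=9))  # → [(10, 120)]
```

Here the baseline mean is 151 with a standard deviation of about 2.7, so anything below roughly 142.8 sets off the alarm, and week 10 duly does.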

The basics of SPC were developed in the 1920s by Walter Shewhart at Bell Labs (and taken to Japan after WW2, a story in itself). Anomaly detection, by contrast, is a fast-growing area. For further experimentation and reading, a more recent method, based on how close each point lies to its neighbours, is described in John Foreman’s marvellous Data Smart: Using Data Science to Transform Information into Insight (Wiley, 2014).
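The neighbour-closeness idea can be sketched in Python. This is a deliberately simple stand-in, scoring each point by its average distance to its k nearest neighbours; it is not necessarily the exact method in Foreman's book, and the points and the choice of k below are made up. With real data, variables measured on different scales would normally be standardised first.

```python
import math


def knn_outlier_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest neighbours.

    Points far from all their neighbours get high scores; points inside
    dense clusters get low scores.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be at least 1 and less than the number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every *other* point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical 2-D measurements: a tight cluster plus one stray point.
points = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2), (1.0, 0.8), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)
most_unusual = max(range(len(points)), key=scores.__getitem__)
print(points[most_unusual])  # → (5.0, 5.0)
```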

We might want to determine whether a credit card has been stolen, on the basis of changed spending patterns or places, or, to return to the spy theme of the opening example, detect an unauthorised intruder on a computer network (as in Clifford Stoll’s trailblazing The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage).

Finally, we might just want to figure out exactly when our bowling performance dropped off!
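Pinpointing when the bowling dropped off is a change-point problem. One simple approach, assumed here rather than taken from any of the books mentioned, is to try every possible split of the series into "before" and "after" and keep the split where each segment hugs its own mean most tightly (least squares). CUSUM charts are a common SPC alternative. The scores below are hypothetical.

```python
def best_single_change_point(xs):
    """Return the 1-based week at which a single shift in mean most likely began.

    Tries every split into 'before' and 'after' segments (each at least one
    week long) and keeps the split with the smallest total within-segment
    sum of squared deviations from the segment means.
    """
    if len(xs) < 2:
        raise ValueError("need at least two observations")

    def sse(segment):
        m = sum(segment) / len(segment)
        return sum((x - m) ** 2 for x in segment)

    best_split = min(range(1, len(xs)), key=lambda t: sse(xs[:t]) + sse(xs[t:]))
    return best_split + 1  # first week of the 'after' segment, counting from 1


# Hypothetical weekly scores: steady around 150, then a slump around 129.
scores = [150, 152, 149, 151, 148, 153, 150, 131, 128, 130, 127, 129]
print(best_single_change_point(scores))  # → 8
```

This assumes exactly one change; real series with several shifts need a method that can find more than one change point.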

Telstar, Cortina & the Median Quartile Test: where were you in ’62?

It was 1962, the year in which the iconic 1973 movie American Graffiti is set, and from whose tagline comes the subtitle of this post. The Beatles released Love Me Do, their first single. That year also heard and saw Telstar, the eerie but joyful Clavioline-driven Joe Meek instrumental by the Tornados, celebrating the orbiting, privately owned communications satellite it was named after, which relayed the first transatlantic television pictures. The British Ford Cortina, named after an Italian ski resort, saw out the humpty-dumpty rounded Prefects and ’50s Zephyrs, while in the US the first of 50 beautiful, mysterious and largely lost Chrysler Ghia Turbine cars was driven in Detroit.

Meanwhile, the world of statistics was not to be outdone. Rainald Bauer’s Median Quartile Test, an extension of Brown and Mood’s early-1950s Median Test, was published, in German, in 1962. The Median Test, still available in statistics packages such as IBM SPSS, SAS and Stata, simply compares groups on the counts below and above the overall (pooled) median, giving a two-by-two table in the case of two groups.
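To make the two-by-two table concrete, here is a minimal Python sketch of a two-group median test. It assumes the usual conventions: values tied with the pooled median are counted as "not above", and a Pearson chi-square without continuity correction is used. Statistics packages may handle ties or apply a Yates correction differently, so their p values can differ slightly. The data are hypothetical.

```python
import math
from statistics import median


def median_test(a, b):
    """Two-group median test.

    Builds the 2 x 2 table of counts above / not above the pooled median
    (ties with the median counted as 'not above') and returns
    (table, chi_square, p_value), without continuity correction.
    """
    grand_median = median(list(a) + list(b))
    table = []
    for group in (a, b):
        above = sum(1 for x in group if x > grand_median)
        table.append([above, len(group) - above])

    n = len(a) + len(b)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in col_totals:
        raise ValueError("every value is on one side of the median; test undefined")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return table, chi2, p_value


table, chi2, p = median_test([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12])
print(table, chi2)  # → [[0, 6], [6, 0]] 12.0
```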

The Median Quartile Test (MQT), as the name suggests, goes a step further and compares the groups on their counts in each of the four quartiles of the pooled data. But the MQT is largely unknown, being mainly discussed in books and papers published in, or translated from, German.

The MQT conveys similar information to John Tukey’s boxplot: it shows analysts, and their customers and colleagues, where the data tend to fall, and provides a test of statistical significance to boot. Does one group, for example, show a preponderance of scores in the lower and upper quartiles? In pharma, fr’instance, that might suggest that patients in one group tend to get either much better or much worse.
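A minimal Python sketch of the idea for two groups follows. It assumes a standard chi-square test of homogeneity on the 2 × 4 table of quartile counts (for k groups the same idea gives a k × 4 table), uses Python's default quartile definition, and puts values tied with a cut point into the lower quarter. Bauer's paper may define the cut points, ties or small-sample treatment differently, so treat this as an illustration rather than his exact procedure. In the hypothetical data, each group has exactly four of its eight values above the pooled median of 8.5, so a plain median test would see no difference at all, while the quartile counts expose group A's split into "much worse" and "much better".

```python
import bisect
import math
from statistics import quantiles


def median_quartile_test(a, b):
    """Sketch of a two-group median quartile test.

    Cuts the pooled data at its three quartiles, counts each group's values
    in each quarter, and applies a chi-square test of homogeneity to the
    resulting 2 x 4 table (3 degrees of freedom).
    """
    pooled = list(a) + list(b)
    cuts = quantiles(pooled, n=4)  # [Q1, median, Q3] of the pooled data
    table = []
    for group in (a, b):
        counts = [0, 0, 0, 0]
        for x in group:
            # bisect_left puts a value equal to a cut point in the lower quarter.
            counts[bisect.bisect_left(cuts, x)] += 1
        table.append(counts)

    n = len(pooled)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(4)]
    if 0 in col_totals:
        raise ValueError("an empty pooled quarter (too many ties) makes the test undefined")
    chi2 = 0.0
    for i in range(2):
        for j in range(4):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Exact upper tail of a chi-square distribution with 3 degrees of freedom.
    p_value = (math.erfc(math.sqrt(chi2 / 2))
               + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))
    return table, chi2, p_value


# Hypothetical scores: group A splits into 'much worse' and 'much better'.
group_a = [1, 2, 3, 4, 13, 14, 15, 16]
group_b = [5, 6, 7, 8, 9, 10, 11, 12]
table, chi2, p = median_quartile_test(group_a, group_b)
print(table, chi2)  # → [[4, 0, 0, 4], [0, 4, 4, 0]] 16.0
```

The corresponding p value is roughly 0.001, which is just the sort of outer-quartile pattern described above.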

A 1967 NASA English translation of the original 1962 Bauer paper is available in the Downloadables section of this site.

Recent Application in Journal of Cell Biology

809.full.pdf

Further / Future reading

Bauer RK (1962) Der “Median-Quartile Test”… Metrika, 5, 1-16.

Von Eye A et al. (1996) The median quartiles test revisited. Studia Psychologica, 38, 79-84.

Simple Stats: Food, Friends, Families and F values

Way back when I was a young data analyst, the techniques available for analysing certain types of data were limited. If the data involved counts, for example, one relied on transformations (such as the square root), and for repeated measurements over time one needed ‘fiddle factors’ such as the Greenhouse-Geisser (G-G) and Huynh-Feldt (H-F) corrections, or ‘scattergun’ mighty MANOVA approaches that lacked in statistical power what they made up for in firepower.

These days even dear old SPSS has some sophisticated regression models. But whereas once there was a ‘trees not forest’ approach, running a whole lot of basic tests in search of ‘significant’ p values rather than practical effect sizes and generality, now there are complex ‘forest’ tests, run without understanding the output, or even the question.

When talking about simplicity, analysts often recall the friar William of Occam and his ‘razor’ (‘it is vain to do with more what can be done with fewer’), or misquote Albert Einstein, who probably never actually said ‘everything should be made as simple as possible, but not simpler’.

I like the ancient Greek Epicurus of Athens, who was big on simple things like food, friends and families (although his name has come to be associated with a sort of false, hoggish hedonism, which defeats the purpose). I reckon we need to get a wooden table, some nice fresh food and jugs of (unfermented and fermented) grape, and, once the important things like art, sport and the latest music clips on Rage have been discussed, talk about research questions: how they are to be answered, and in what sensible but creative manner, so that we can get back to other things.

We’d begin with graphical techniques, with the aim of saying ‘aha’ or ‘Eureka’, not ‘gosh’ or ‘wow’ or ‘huh?’. Building up from fundamental methods to more complex ones only if needed, we’d test our models on fresh samples, looking at effect sizes as well as confidence intervals and p values. I reckon that’s the sort of data party that even old Epicurus might have attended! http://textpublishing.com.au/books-and-authors/book/travels-with-epicurus/
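As one small, hedged illustration of looking at effect sizes and confidence intervals rather than p values alone, the Python sketch below computes Cohen's d (the mean difference in pooled-standard-deviation units) with an approximate 95% interval, then repeats the calculation on a fresh sample to see whether the effect holds up. The interval uses a common large-sample approximation to the variance of d, and all the scores are invented.

```python
import math
from statistics import mean, variance


def cohens_d_with_ci(a, b, z=1.96):
    """Cohen's d (pooled-SD standardised mean difference) with an approximate CI.

    Uses the common large-sample approximation
    var(d) ~ (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)).
    """
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    d = (mean(a) - mean(b)) / math.sqrt(pooled_var)
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)


# Hypothetical scores: an original study, then a fresh sample to check the effect replicates.
original = cohens_d_with_ci([12, 15, 14, 16, 13, 15], [10, 11, 12, 10, 13, 11])
fresh = cohens_d_with_ci([13, 14, 16, 12, 15, 14], [11, 12, 10, 12, 13, 12])
print("original d:", round(original[0], 2), "fresh d:", round(fresh[0], 2))
```

With samples this small the intervals are wide, which is itself a useful, sobering thing to see at the table.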

Great book for analysing counts etc. using SPSS & Stata: http://www.dkstatisticalconsulting.com/practical-statistics/