Give p’s a Chance? Hoochie Coochie Hypothesis Tests

A common request for jobbing analysts is to ‘run these results through the computer and see if they’re significant’. Now, unfortunately, many folk, including, scarily, even lecturers in our craft, have a misconception as to what ‘significance’ actually means.

Shout in a desperate monotone that “it’s the probability of getting a result as large as, or larger than, the one we actually obtained, if the ‘null hypothesis’ of no difference or association were actually true”, and people look flummoxed, yes flummoxed, as if you were speaking the language of the ancient Huns, (another) language no-one has ever been able to figure out.

True, testing ‘something’ against the concept of ‘nothing’ is a bit kooky. If we really did have a situation where two groups ended up with identical averages, we’d think it was a trifle dodgy, to say the least.

And as for the notion of effect sizes! Picture, on an enchanted desert isle, two group means of 131.5 and 130, with a pooled standard deviation (sd) of 15. The difference of 1.5, divided by 15, gives a Cohen’s d of 0.10 (that’s the late, great Jacob Cohen: of Cohen’s kappa fame, populariser of power analysis, maven of multiple regression). By Jack’s arbitrary but conventional guidelines for mean differences, 0.20 is a small effect size, 0.50 medium, and 0.80 large.

Using an online calculator, e.g.

http://www.graphpad.com/quickcalcs/ttest1/

we find that, if there were 1000 in each group, the t test value would be 2.24 and our p value 0.026.

Voila, Eureka, Significance, as cook smiles and puts an extra dollop of custard on our pudding!

But if we ‘only’ had 100 in each group, our t value would be 0.71, our p value would be 0.48, and there’d be a sigh, a frown, a closing of doors and a grim-faced cook doling out the thrice-boiled cabbage….

But they’re the same means, the same sd, and the same effect size!
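For the curious, the whole tale, same means, same sd, same effect size, different p, can be sketched in a few lines of Python. This is my own illustration, not the GraphPad calculator’s code: it computes Cohen’s d and the two-sample t statistic from the summary statistics (equal group sizes, pooled sd), and uses a normal approximation to the t distribution for the two-sided p value, which is fine at these sample sizes.

```python
import math

def d_t_p(mean1, mean2, pooled_sd, n_per_group):
    """Cohen's d, two-sample t, and an approximate two-sided p value
    from summary statistics (equal n, pooled sd)."""
    d = (mean1 - mean2) / pooled_sd
    se = pooled_sd * math.sqrt(2.0 / n_per_group)  # se of the mean difference
    t = (mean1 - mean2) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))         # normal approximation to t
    return d, t, p

for n in (1000, 100):
    d, t, p = d_t_p(131.5, 130.0, 15.0, n)
    print(f"n={n}: d={d:.2f}, t={t:.2f}, p={p:.2f}")
```

The same 0.10 effect size yields t ≈ 2.24, p ≈ 0.025 with 1000 per group, but t ≈ 0.71, p ≈ 0.48 with 100 per group: only the sample size has changed.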

Coming Up:  Guest Post on a possible, probable, Salvation.

Further/Future reading

G Cumming (2014) How significant is P? Australasian Science, March 2014. p. 37.

http://www.australasianscience.com.au/article/issue-march-2014/how-significant-p.html

also check out Prof G’s website

http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci

with free Excel ESCI program and details of his illuminating 2012 book ‘The New Statistics’.

Now, back to honest resting from honest labour!

dogs and wolves postscript: Truth Stronger Than Fiction?

Travelling home on the tram after writing the previous post (in the amazing State Library of Victoria), I was thinking about dogs and wolves, from the French Entre Chien et Loup, the hour between dog and wolf: dusk or twilight, or metaphorically the time between the familiar and the much less familiar….

out of the corner of my eye I noticed a young couple playing with a puppy nestled in one of their coat pockets.

but on closer inspection, that was no puppy, dog, or even a little wolf.

It was a *ferret*. (A little weasel/polecat creature, not the sort of thing you’d want in a pocket, or on a crowded conveyance.)

now what’s the probability of That?

When Boogie becomes Woogie, when Dog becomes Wolf

An exciting (and not just for statisticians!) area of application in statistics/analytics/data science is change/anomaly/outlier detection. The general notion of outliers (e.g. ‘unlikely’ values) was covered in a previous post, which looked at, amongst other things, very long pregnancies.

But tonight’s fr’instance comes from Fleming’s wonderful James Bond Jamaican adventure novel, Dr No (also a jazzy 1962 movie), which talks of London Radio Security shutting down radio connections with secret agents if a change in their message-transmitting style is detected. This may have indicated that their radio had fallen into enemy hands.

To use a somewhat less exotic example, imagine someone, probably not James Bond, tenpin bowling and keeping track of their scores, this scenario coming from HJ Harrington et al’s excellent Statistical Analysis Simplified: the Easy-to-Understand Guide to SPC and Data Analysis (McGraw-Hill, 1998).

On the 10th week, the score suddenly drops more than three standard deviations (scatter or variation around the mean or average) below the mean.

Enemy agents? Forgotten bowling shoes? Too many milk shakes?
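Whatever the cause, in SPC terms that week is a point beyond the ‘three-sigma’ control limits. A minimal Python sketch of the idea (with made-up weekly scores, not Harrington et al’s actual data): set control limits from a baseline period, then flag anything falling outside mean ± 3 sd.

```python
# Made-up weekly tenpin scores: steady for nine weeks, then a sudden slump
scores = [150, 155, 148, 152, 149, 151, 153, 147, 150, 110]

baseline = scores[:9]                      # weeks 1-9 set the control limits
mean = sum(baseline) / len(baseline)
sd = (sum((x - mean) ** 2 for x in baseline) / (len(baseline) - 1)) ** 0.5

lower, upper = mean - 3 * sd, mean + 3 * sd
flagged = [week + 1 for week, x in enumerate(scores) if not (lower <= x <= upper)]
print(flagged)  # → [10] (week 10 is the slump)
```

Real control charts add further rules (runs above or below the mean, trends, and so on), but the three-sigma check is the heart of it.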

Once again, an anomaly or change, something often examined in industry (Statistical Process Control (SPC) and related areas) to determine the point at which, in the words of Tom Robbins’s great novel Even Cowgirls Get the Blues, ‘the boogie stopped and the woogie began’.

Sudden changes in operations & processes can happen, and so a usual everyday assembly line (‘dog’) can in milliseconds become the unusual, and possibly even dangerous (‘wolf’), at which point hopefully an alarm goes off and corrective action is taken.

The basics of SPC were developed many years ago (and taken to Japan after WW2, a story in itself). Anomaly detection is a fast-growing area. For further experimentation / reading, a recent method based upon calculating the closeness of points to their neighbours is described in John Foreman’s marvellous Data Smart: Using Data Science to Transform Information into Insight (Wiley, 2014).
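In that neighbour-distance spirit (this is my own toy 1-D sketch, not Foreman’s spreadsheet method verbatim), each point can be scored by its average distance to its k nearest neighbours; the point with the biggest score is the prime outlier suspect.

```python
def knn_outlier_scores(points, k=2):
    """Average distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy 1-D data: a tight cluster plus one stray value
data = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0]
scores = knn_outlier_scores(data)
suspect = scores.index(max(scores))
print(data[suspect])  # → 25.0
```

The same idea extends to many dimensions (spending amount, location, time of day…) by swapping the absolute difference for a proper distance function.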

We might want to determine if a credit card has been stolen on the basis of different spending patterns/places, or, to return to the opening example, detect an unauthorised intruder to a computer network (e.g. Clifford Stoll’s trailblazing The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage).

Finally, we might just want to figure out exactly when it was that our bowling performance dropped off!

Telstar, Cortina & the Median Quartile Test: where were you in ’62?

It was 1962, the setting of the iconic 1973 movie American Graffiti, from which comes the subtitle of this post. The Beatles had released Love Me Do, their first single. That year also heard and saw Telstar, the eerie but joyful Claviolined Joe Meek instrumental by the Tornados, celebrating the pioneering private transatlantic communications and television satellite it honoured. The British Ford Cortina, named after an Italian ski resort, saw out the humpty-dumpty rounded Prefects and 50s Zephyrs, while in the US, the first of 50 beautiful, mysterious and largely lost Chrysler Ghia Turbine cars was driven in Detroit.

Meanwhile, the world of statistics was not to be outdone. Rainald Bauer’s Median Quartile test, an extension of Brown and Mood’s early-50s Median Test, was published, in German, in 1962. The latter test, still available in statistics packages such as IBM SPSS, SAS and Stata, simply compares groups on counts below and above the overall median, providing, in the case of two groups, a two-by-two table.

The Median Quartile Test (MQT), as the name suggests, compares each group across the four quartiles. But the MQT is largely unknown, mainly discussed in books and papers published in, or translated from, German.

The MQT conveys similar information to John Tukey’s boxplot, shows both analysts and their customers and colleagues where the data tend to fall, and provides a test of statistical significance to boot. Does one group show a preponderance of scores in the lowest and highest quartiles, for example, suggesting, in the field of pharma fr’instance, that its members either get much better or much worse?
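A bare-bones Python sketch of the mechanics (my own illustration with made-up numbers and one of several quartile conventions, not Bauer’s procedure verbatim): cut the combined sample at its quartiles, count each group’s members per quartile, and compute a Pearson chi-square on the resulting 2 × 4 table, with df = (2 − 1)(4 − 1) = 3, so 7.81 is the 5% critical value.

```python
def quartile_table(group_a, group_b):
    """Count each group's members in each quartile of the combined sample."""
    combined = sorted(group_a + group_b)
    n = len(combined)
    cuts = [combined[n // 4], combined[n // 2], combined[3 * n // 4]]
    def bin_of(x):
        return sum(x > c for c in cuts)  # quartile index 0..3
    table = []
    for group in (group_a, group_b):
        counts = [0, 0, 0, 0]
        for x in group:
            counts[bin_of(x)] += 1
        table.append(counts)
    return table

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return sum((table[i][j] - row[i] * col[j] / total) ** 2
               / (row[i] * col[j] / total)
               for i in range(len(row)) for j in range(len(col)))

# One group piles up at the extremes, the other sits in the middle
much_better_or_worse = [1, 2, 3, 4, 17, 18, 19, 20]
middle_of_the_road = [8, 9, 10, 11, 12, 13, 14, 15]
table = quartile_table(much_better_or_worse, middle_of_the_road)
print(table)
print(round(chi_square(table), 1))  # → 9.8, beyond the 7.81 critical value
```

Here the ‘much better or much worse’ group lands mostly in the outer quartiles, and the boxplot-like summary comes with a significance test attached. (With expected counts this small, a real analysis would reach for an exact test.)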

A 1967 NASA English translation of the original 1962 Bauer paper is available in the Downloadables section of this site.

Recent Application in Journal of Cell Biology


Further / Future reading

Bauer RK (1962) Der “Median-Quartile Test”… Metrika, 5, 1-16.

Von Eye A  et al (1996) The median quartiles test revisited. Studia Psychologica, 38, 79-84.

Minitab 17: think Mini Cooper, not Minnie Mouse

As it has been 3 or 4 years since the previous version, the new release of the Minitab 17 statistical package is surely cause for rejoicing, merriment, and an extra biscuit with a strong cup of tea.

At one of the centres where I work, the data analysts sit at the same lunch table but are known by their packages: the Stata people, the SAS person, the R person, the SPSS person and so on. No Minitab person as yet, but maybe there should be. Not only for its easy-to-use graphics, mentioned in a previous post, but for its all-round interface, its programmability (Minitab syntax looks a little like that great Kemeny-Kurtz language from 1964 Dartmouth College, BASIC, but more powerful), a few new features (Poisson regression for relative risks & counted data, although alas no negative binomial regression for trickier counted data), and even better graphics.

There are bubble plots, outlier tests, and the Box-Cox transformation (another great collaboration from 1964), and Minitab was also one of the first packages to include Exploratory Data Analysis (e.g. box plots and smoothed regression), for when the data are about as well-behaved as the next-door neighbours strung out on espresso coffee mixed with red cordial.

Not as much cachet for when the R and SAS programmers come a-swaggering in, but still worth recommending for those who may not be getting as much as they should be out of SPSS, particularly for graphics, yet find the other packages a little too high to climb.

http://www.minitab.com/en-us/

Simple Stats: Food, Friends, Families and F values

Way back when I was a young data analyst, there were limitations to the techniques available for analysing certain types of data. If the data involved counts, for example, there were certain types of transformation, and for repeated measurements over time one needed ‘fiddle factors’ such as the Greenhouse-Geisser (G-G) and Huynh-Feldt (H-F) corrections, or ‘scattergun’ mighty MANOVA approaches, which lacked in statistical power what they made up in firepower.

These days, even dear old SPSS has some sophisticated regression models. But whereas once there was a ‘trees not forest’ approach, a whole lot of basic tests hunting for ‘significant’ p values rather than practical effect sizes and generality, now there are complex ‘forest’ models, often run without understanding the output, or even the question.

When talking about simplicity, analysts often recall the monk William of Occam and his “razor” (‘vain to do with more what can be done with fewer’), or misquote Albert Einstein, who probably never actually said ‘everything should be made as simple as possible, but not simpler’.

I like the ancient Greek, Epicurus of Athens, who was big on simple things like food and friends and families (although his name has come to be associated with a sort of false, hoggish hedonism, which defeats the purpose). I reckon we need to get a wooden table, some nice fresh food, jugs of (unfermented & fermented) grape, and, after the important things like art and sport and the latest clips on Rage have been discussed, then talk about research questions and how they are to be answered, in a sensible but creative manner, so as to get back to the other things.

We’d begin with graphical techniques, with the purpose of saying ‘aha’ or ‘Eureka’; not ‘gosh’ or ‘wow’ or ‘huh?’. Building up from fundamental methods to more complex methods if needed, we’d test our models on fresh samples, looking at effect sizes as well as confidence intervals and p values. I reckon that’s the sort of data party that even old Epicurus might have attended! http://textpublishing.com.au/books-and-authors/book/travels-with-epicurus/

http://www.dkstatisticalconsulting.com/practical-statistics/ (a great book for analysing counts etc. using SPSS & Stata)

Expected Unexpected: Power bands, performance curves, rogue waves and black swans

Many years ago, I had a ride on a Kawasaki 500 Mach III 2-stroke motorcycle, which, along with its even more horrendous 750cc version, was known as the ‘widow-maker’. It was incredibly fast in a straight line, but if it went around corners at all, the rider had long since fallen (or jumped) off!

It also had a very narrow ‘power band’ http://en.wikipedia.org/wiki/Power_band, in that it would have no real power until about 7,000 revs per minute, and then all of a sudden it would whoop and holler like the proverbial bat out of hell, the front wheel would lift, the rider’s jaw drop, and well, you get the idea! In statistical terms, this was a nonlinear relationship between twisting the throttle and the available power.

A somewhat less dramatic example of a nonlinear effect is the Yerkes-Dodson ‘law’ http://en.wikipedia.org/wiki/Yerkes%E2%80%93Dodson_law, in which optimum task performance is associated with medium levels of arousal (too much arousal = the ‘heebie-jeebies’, too little = ‘half asleep’).

Various simple & esoteric methods exist for finding global (the data follow a standard pattern such as a U shape, or an upside-down U) or local (different parts of the data might be better explained by different models, rather than ‘one size fits all’) relationships. A popular ‘local’ method is known as a ‘spline’, after the flexible metal ruler that draftspeople once fitted curves with. The ‘GT’ version, Multivariate Adaptive Regression Splines http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines, is available in R (itself a little reminiscent of a Mach III cycle at times!), in the big-iron ‘1960s 390 cubic inch Ford Galaxie V8’ SAS statistical package, and in the original, sleek ‘Ferrari V12’ Salford Systems version.

Other nonlinear methods are available http://en.wikipedia.org/wiki/Loess_curve, but the thing to remember is that life doesn’t always fit within the lines, or follow some human’s idea of a ‘natural law’.
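The simplest possible relative of these local methods is a running mean: smooth each point using only its near neighbours, and let the data bend where they want to bend. A toy Python sketch with made-up arousal/performance numbers in a Yerkes-Dodson-style inverted U (real loess adds distance weights and fits local regression lines):

```python
def running_mean(ys, half_window=1):
    """Smooth each value using its neighbours within the window."""
    smoothed = []
    for i in range(len(ys)):
        lo = max(0, i - half_window)
        hi = min(len(ys), i + half_window + 1)
        window = ys[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Noisy inverted U: performance rises then falls as arousal increases
perf = [2, 4, 3, 6, 8, 7, 9, 7, 6, 4, 3]
print(running_mean(perf))
```

No straight line is forced on the data: the smoothed curve simply rises to a peak in the middle and falls away, which a single global linear fit would miss entirely.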

For example, freak or rogue waves, which can literally break supertankers in half, were observed for centuries by mariners but have only recently been accepted by shore-bound scientists; similarly the black swans (actually native to Australia) of the stock market http://www.fooledbyrandomness.com/

When analysing data, fitting models, (or riding motorcycles), please be careful!