Who gives a toss: the statistics of coins

Spring is here in Melbourne, a time for fashionable horse racing, including the Melbourne Cup in November, once attended by Mark Twain. Australia is also the home of the “two-up” coin tossing game (descended from the British pitch and toss), played in outback pubs, hidden city lanes and now Australian casinos, and described in great old Australian novels such as Come In Spinner, and the eerie book and 1971 movie Wake in Fright (aka Outback).

In the 18th century, the Comte de Buffon obtained 2048 heads from 4040 tosses, while more recently, and not to be outdone, the statistician Karl Pearson obtained 12,012 heads out of 24,000 tosses (The Jungles of Randomness by Ivars Peterson, 1998). Of course a misunderstanding of the law of large numbers, or so-called law of averages, makes the uninitiated think that if there are, say, seven heads in a row, a cosmic force will decide “hang on, that coin is coming up heads more than 50% of the time, better make the next one a tail”.
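For the sceptical, a quick simulation (sketched here in Python, with made-up tosses rather than Buffon’s or Pearson’s) shows that the coin really does have no memory: even straight after seven heads in a row, the next toss still comes up heads about half the time.

```python
import random

random.seed(42)

# Simulate a long run of fair coin tosses, then look only at the tosses
# that immediately follow a run of seven heads. If the "law of averages"
# were real, tails would be overdue; it isn't.
tosses = [random.choice("HT") for _ in range(300_000)]

next_after_seven_heads = [
    tosses[i + 7]
    for i in range(len(tosses) - 7)
    if tosses[i:i + 7] == list("HHHHHHH")
]

prop_heads = next_after_seven_heads.count("H") / len(next_after_seven_heads)
print(f"Proportion of heads after seven heads in a row: {prop_heads:.3f}")
```

The proportion hovers around 0.5, runs of heads or no runs of heads.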

While it doesn’t look at two-up, “Digital Dice” by the always entertaining Paul Nahin (2008) examines a tricky coin-tossing problem posed in 1941 and not solved until 1966. Prof Paul shows how to solve it using a computer-based Monte Carlo method, itself named after that famous casino in Monaco, where James Bond correctly observed that “the cards have no memory”.
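Nahin’s 1941 problem itself is beyond a blog post, but the flavour of the Monte Carlo method is easy to show on a classic coin-tossing chestnut: how many tosses, on average, until you see two heads in a row? (Theory says six.) A rough Python sketch:

```python
import random

random.seed(1)

def tosses_until_two_heads() -> int:
    """Toss a fair coin until two heads appear in a row; return the toss count."""
    count = run = 0
    while run < 2:
        count += 1
        run = run + 1 if random.random() < 0.5 else 0
    return count

# Monte Carlo: repeat the experiment many times and average the results.
trials = 200_000
estimate = sum(tosses_until_two_heads() for _ in range(trials)) / trials
print(f"Estimated expected tosses until HH: {estimate:.2f}")  # theory: 6
```

No clever probability theory required; just a fast computer and a lot of pretend coins, which is rather the point of the method.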

And who says stats isn’t relevant?!

Applied Australian Change-Point Analysis: Before the Shark Gets Jumped?

OK, I saw the (in)famous Season 5, Episode 3 “Jump the Shark” episode of Happy Days (in which Fonzie water-skis over a shark pool) when I was 18, back in 1977, and hated it.

Definitely Uncool.
But one Saturday morning a month or two ago I saw it again and loved it. It’s wild! It’s glorious!

The term has come to mean the point at which a TV series goes downhill, when the wolf becomes a dog, to riff on a previous post.


Anyhow, Australia’s Professor Kerrie Mengersen and Dr Hassen Assareh have developed a snazzy new Bayesian Markov chain Monte Carlo procedure for working out the change-point in a process: specifically, the point where a key change happened, for example, in a hospital patient’s condition, helping to identify the ‘why’ as well as the ‘when’.
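Their Bayesian MCMC machinery is well beyond a blog post, but the basic idea of a change-point can be sketched in a few lines of Python: scan every candidate time point in a (made-up) monitored series and keep the split that best explains the data. This is a simplified maximum-likelihood stand-in, not the Mengersen–Assareh procedure:

```python
import random

random.seed(7)

# Toy series: a monitored measurement that shifts upward at t = 60
# (hypothetical numbers, not the authors' hospital data).
data = [random.gauss(10, 1) for _ in range(60)] + \
       [random.gauss(13, 1) for _ in range(40)]

def fit_cost(segment):
    """Sum of squared deviations of a segment around its own mean."""
    m = sum(segment) / len(segment)
    return sum((x - m) ** 2 for x in segment)

# Scan every candidate change-point and keep the one whose two-segment
# fit explains the data best (lowest combined cost).
best_tau = min(
    range(2, len(data) - 2),
    key=lambda tau: fit_cost(data[:tau]) + fit_cost(data[tau:]),
)
print(f"Estimated change-point: t = {best_tau}")
```

The scan lands at (or very near) the true shift point; the Bayesian version adds, among much else, a full posterior over where and how big the change was.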


It’s a great idea and yet another instance of how Statistics can help save the world, again!

When I grow up, I’m gonna be a Statistician!

How many of us said that, I wonder? Rather than children dressing up as sheriffs or doctors or possibly even scientists (?), how many dressed up like Statisticians? Did anyone even know much about Statisticians then? Mathematicians yes, they were sort of nerdy (although that word wasn’t around when I was a kid) but could do important things, like calculate odds of winning at Las Vegas or horse racing, and the chance of thermonuclear war.


But when I was young, inspired by Get Smart and The Man from UNCLE and James Bond, I mainly wanted to be a secret agent! I played with the idea of becoming a private detective, sorry, investigator, for a while, until I found out that in real life, as opposed to TV Land and BookWorld, they mainly seemed to be involved in divorces. So, when I was in my very early teens, I toyed with the idea of joining the FBI. As an Australian citizen, this would have been rather difficult: I would have had to become a US citizen, as well as either a lawyer or an accountant, first. So I put that idea in the ‘too hard basket’. (Imagine, a lawyer or an accountant!)

Well, I suppose it shows evidence of an inquiring mind. Further steps, trots, canters and gallops along the road to Statistics are a story for another time. But there were a couple of ‘residuals’ from that childhood long ago. Asking questions, even if no one else was. The desire to do the right thing, and wear the right colour hat (even if, in truth, the black-hatted baddie played by Jack Palance in the Shane movie, although not the book, was far cooler/groovier/jazzier than the light-coloured-cloth-wearing goodie, Alan Ladd).

And a 1963 book which I got for Christmas a year or two later, called The How and Why Wonder Book of Robots and Electronic Brains. I still have that book and I cited it in my PhD Thesis, although back then I was more interested in the robots, especially the black and red tin ones that could be wound up with a key!


But it was a 1979 Texas Instruments TI-55 (simple) programmable LED calculator I got for my 21st, that came with quite a thick manual, showing how one could do fun things like predicting future sales from advertising expenditure, that gave much more excitement, practicality and crunch to the Psych 101 Stats that I was undertaking.


And then, in the early summer of 1981, when I first used SPSS (submitted to be run at 2300 hours) on a DEC System 20-60, I was truly hooked.

True, James Bond had his Beretta and Walther PPK and Aston Martin and Bentley and Sea Island shirts and Shaken Not Stirred, but at least in the early days, he never used a programmable calculator, let alone a Computer!


Snappy Stepwise Regression

Stepwise regression, the technique that attempts to select a smaller subset of variables from a larger set by, at each step, choosing the ‘best’ or dropping the ‘worst’, was developed back in the late 1950s by applied statisticians in the petroleum and automotive industries. With an ancestry like that, it’s no wonder it is often regarded as the statistical version of the early-60s Chev Corvair: at best only ‘driveable’ by expert, careful users, or, in Ralph Nader’s immortal words and the title of his 1966 book, ‘Unsafe at Any Speed’.

Well, maybe. But used with cross-validation and good sense, it’s an old-tech standby to later-model ‘lasso’ and ‘elastic net’ techniques. And there’s an easy way to give the old stepwise routine a bit of a soft-shoe shuffle: see how well forward entry with just one, maybe two, or at most three variables does (preferably on a fresh set of data), compared with larger models. (SAS and SPSS allow the number of steps to be specified.)
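For the curious, here’s a rough Python sketch of that capped forward entry, on synthetic data. (Real stepwise routines in SAS and SPSS use F-to-enter rules and much more care; this is just the greedy skeleton.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: only x0 and x1 actually matter; x2..x7 are pure noise.
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

def rss(X_sub, y):
    """Residual sum of squares of an OLS fit (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def forward_select(X, y, max_steps=3):
    """Greedy forward entry, stopping after max_steps variables."""
    chosen = []
    for _ in range(max_steps):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        best = min(remaining, key=lambda j: rss(X[:, chosen + [j]], y))
        chosen.append(best)
    return chosen

print(forward_select(X, y, max_steps=2))  # should pick the two true predictors
```

With a two-step cap, the little model recovers the two real predictors and ignores the noise, which is rather the point.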

Or, if you’d like to do some slightly fancier steps in two-tone spats, try a best-subset regression (available in SAS, in SPSS through automatic linear modelling, and in Minitab and R, etc.) of all one-variable, two-variable and three-variable combinations.
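The best-subset shuffle is the same idea taken exhaustively; sketched in Python on synthetic data, it just enumerates every one-, two- and three-variable model and keeps the best of each size:

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: x0 and x2 carry the signal among six candidates.
n, p = 150, 6
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + X[:, 2] + rng.normal(size=n)

def rss(cols):
    """Residual sum of squares of an OLS fit on the given columns."""
    A = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

# Try every one-, two- and three-variable model; keep the best of each size.
for k in (1, 2, 3):
    best = min(combinations(range(p), k), key=rss)
    print(f"best {k}-variable model: columns {best}, RSS = {rss(best):.1f}")
```

Exhaustive search gets expensive as the variable count grows, which is why the packages usually cap the subset size, just as we do here.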

The inspiration for this comes partly from Gerd Gigerenzer’s ‘take the best’ heuristic: taking the single best cue or clue often beats more complex techniques, including multiple regression. ‘Take the best’ is described in Prof Gigerenzer’s great new general book Risk Savvy: How to Make Good Decisions (Penguin, 2014) http://www.penguin.co.uk/nf/Book/BookDisplay/0,,9781846144745,00.html as well as in his earlier academic books such as Simple Heuristics That Make Us Smart (Oxford University Press, 1999).

See if a good little model can do as well as a good (or bad) big ’un!


Further Future Reading

Draper NR, Smith H (1966) Applied regression analysis (and later editions). Wiley: New York.

John and Betty’s Journey into Statistics Packages*

In past days of our lives, those who wanted to learn a stats package would attend courses, and bail up/bake cakes for statisticians, but would mainly raise the drawbridge, lock the computer lab door and settle down with the VT100 terminal or Apple II or IBM PC and a copy of the brown (or updated blue) SPSS manual, or whatever.

Nowadays, folks tend to look things up on the web, something of a mixed blessing, and so maybe software consultants will now say LIUOTFW (‘Look It Up On The Flipping Web’) rather than the late, great RYFM (‘Read Your Flipping Manual’).

And yes, there are some great websites, and great online documentation supplied by the software vendors, but there are also some great books, available in electronic and print form. A list of three of the many wonderful texts available for each package (IBM SPSS, SAS, Stata, R and Minitab) can be downloaded from the Downloadables section on this site.

IBM SPSS (in particular), R (ever growing), and to a slightly lesser extent SAS, seem to have the best range of primers and introductory texts.
IMHO though, Stata could do with a new colourful, fun primer (not necessarily a Dummies Guide, although there’s Roberto Pedace’s Econometrics for Dummies (Wiley, New York, 2013) which features Stata), perhaps one by Andy Field, who has already done superb books on SPSS, R and SAS.

While up on the soapbox, I reckon Minitab could do with a new primer for Psychologists / Social Scientists, much like that early ripsnorter by Ray Watson, Pip Pattison and Sue Finch, Beginning Statistics for Psychology (Prentice Hall, Sydney, 1993).

Anyway, in memory of days gone by, brew a pot of coffee or tea, unplug email, turn off the phone and the mobile/cell, and settle in for an initial night’s journey, on a set or two of real and interesting data, with a good stats package book, or two!

*(The title of this post riffs off the improbably boring and stereotyped 1950s early readers still used in Victorian primary (grade) schools in the 1960s
http://nla.gov.au/nla.aus-vn4738114 (think Dick and Jane, or Alice and Jerry), as well as the far more entertaining and recent John and Betty’s Journey into Complex Numbers by Matt Bower http://www.slideshare.net/aus_autarch/john-and-betty )

Hovercrafts and Aunties: Learning Statistics as a Foreign Language

To many who are not members of our Craft, and even some that are, Statistics is something of a Foreign Language, difficult to grasp without a good understanding of its grammar, or at least a whole swag of useful rules.

Stats is also difficult to teach; note our students’ look of bored angst when we try to explain p values.

So could we teach Stats like a foreign language?

For starters, why don’t we teach statistical ‘tourists’/’travellers’/’consumers’ some useful ‘phrases’ they can actually use, like how to read Excel files into a stats package, how to do a box plot, check for odd values, do some basic recodes etc.

Such things rarely appear in texts. Instead, we tumble about teaching the statistical equivalent of ‘the pen of my aunt is on the table’ or ‘my hovercraft is full of eels’ (Monty Python), or ‘a wolverine is eating my leg’ (Tim Cahill).

For example, as well as assuming that the data are all clean and ready to go, why do stats books persist in showing how to read in a list of 10 or so numbers, rather than reading in an actual file?
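By way of example, here’s the sort of ‘useful phrase’ I mean, sketched in Python with a made-up data file (in practice this would be a real CSV or Excel file opened from disk): read it in, recode a missing-value code, and run a quick check for odd values.

```python
import csv
import io
import statistics

# Hypothetical file contents; in real life this would be open("scores.csv").
raw = io.StringIO(
    "id,age,score\n"
    "1,34,61\n"
    "2,29,75\n"
    "3,41,999\n"   # 999 is a common missing-value code
    "4,35,58\n"
)

rows = list(csv.DictReader(raw))

# Useful phrase 1: recode the missing-value code before analysing anything.
scores = [None if r["score"] == "999" else int(r["score"]) for r in rows]
valid = [s for s in scores if s is not None]

# Useful phrase 2: a quick check for odd values.
print(f"n = {len(valid)}, mean = {statistics.mean(valid):.1f}, "
      f"min = {min(valid)}, max = {max(valid)}")
```

Nothing glamorous, but these are the phrases a statistical traveller actually uses on day one, long before any hovercraft full of eels turns up.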

Just as human languages may or may not share universal concepts, the same may apply to stats packages. The objects of R, for example, are very succinct in conception, but very difficult to explain.

Such apparent lack of universality may be why English borrows words like ‘gourmand’ (to cite from my own book chapter), as English doesn’t otherwise have a word for a person who eats for pleasure. Similarly, courgette/zucchini sounds better than baby marrow (and have you ever seen how big they can actually grow?).

Yet it’s a two way street, with English providing words to other languages, such as ‘weekend’.

According to the old Sapir-Whorf hypothesis, language precedes or at least shapes thought (but see John McWhorter’s recent 2014 book The Language Hoax), so if there’s no word for something, it’s supposedly hard to think about it.

In Stats package terms, instructors would have to somehow explain that it is very easy to extract and store, say, correlation values in R, for further processing, putting smiley faces beside large ones etc. But in SPSS and SAS we would normally have to use OMS/ODS, and think in terms of capturing information that would otherwise be displayed on a line printer. This is a difficult concept to explain to anyone under 45 or so!
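A sketch of that R-style way of thinking, written here in Python with toy data: compute the correlations as ordinary values you can keep working with, then flag the large ones (smiley faces included).

```python
import random

random.seed(3)

# Toy data: three variables, two of them strongly related (made-up numbers).
x = [random.gauss(0, 1) for _ in range(200)]
y = [xi * 0.9 + random.gauss(0, 0.3) for xi in x]
z = [random.gauss(0, 1) for _ in range(200)]

def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

# The correlations are just values: store them, compare them, decorate them.
for a, b in [("x", "y"), ("x", "z"), ("y", "z")]:
    r = pearson({"x": x, "y": y, "z": z}[a], {"x": x, "y": y, "z": z}[b])
    flag = " :)" if abs(r) > 0.7 else ""
    print(f"r({a},{b}) = {r:+.2f}{flag}")
```

No line printer, no output management system; the numbers are simply there to be used, which is the concept the under-45s take for granted.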

Although there are many great books on learning stats packages, (something for a later post), and I myself can ‘speak’ SPSS almost like a native after 33 years, I only know a few words of other human languages, (and, funnily enough, only a few “words” of R).

If you’ll excuse me, my aunt and her pen are now going for a ride on a hovercraft.
(I hope there’s no eels! )


Sapir-Whorf Hypothesis


Counter to the Sapir-Whorf Hypothesis


Hovercraft, Gourmands and Stats Packages

McKenzie D (2013) Chapter 14: ‘Statistics and the Computer’ in


Who wrote what: Statistics and the Federalist

Stats is of course not just about numbers; it’s also often used to analyse words, even more so now with the explosion of social media in the past few years. But it goes back further: the late, great Phil Stone of Harvard University developed the General Inquirer, for the quantitative analysis of text, in the early 1960s. A few years later, in 1964, the year of the release of the Ford Mustang and Pontiac GTO pony/muscle cars, the late, great Fred Mosteller and the great (and still with us) David Wallace published their book on the (mainly) Bayesian analysis of who wrote the Federalist Papers, a year after an introductory paper had appeared in the Journal of the American Statistical Association.

In the late 18th century, three key figures in the foundation of the United States – Alexander Hamilton, John Jay and James Madison – wrote 85 newspaper articles to help ratify the American Constitution.

The papers were published anonymously, but scholars had figured out the authorship of all but twelve, not knowing for sure whether these had been written by Madison or Hamilton. The papers were written in a very formal, and very similar, style, and so Mosteller and Wallace turned to function words like “an” and “of” and “upon”, and particularly “while” and “whilst”, a researcher back in 1916 having noticed that Hamilton tended towards the former, Madison the latter. Computers back in the 60s were pretty slow, expensive and hard to come by; there weren’t any at Harvard, where Mosteller had recently established a Statistics Department, and so they had to use the one at MIT.
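The function-word idea is simple enough to sketch in a few lines of Python, here with two tiny made-up passages standing in for the real papers: count how often each marker word appears per thousand words.

```python
import re
from collections import Counter

# Tiny invented samples, standing in for the actual Federalist texts.
hamiltonish = "While the state endures, and while the union holds, we persist."
madisonish = "Whilst some object, and whilst others doubt, the plan proceeds."

def rate_per_1000(text: str, word: str) -> float:
    """Occurrences of a marker word per 1000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return 1000 * Counter(words)[word] / len(words)

for label, text in [("A", hamiltonish), ("B", madisonish)]:
    print(f"{label}: while = {rate_per_1000(text, 'while'):.0f}, "
          f"whilst = {rate_per_1000(text, 'whilst'):.0f} per 1000 words")
```

Mosteller and Wallace’s contribution, of course, was not the counting but the Bayesian machinery that turned such rates into odds of authorship.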

In Mosteller and Wallace’s own words, after the combined work of themselves and a huge band of helpers, they “tracked the problems of Bayesian analysis to their lair and solved the problem of the disputed Federalist papers” using works of known authorship to conclude that Madison wrote all 12.

In 1984, M & W published a newer edition of their groundbreaking, and highly readable, book with a slightly different title, while a few years later the late, great Colin Martindale (with a Harvard doctorate) and I re-analysed the original data, using Stone’s General Inquirer thematic dictionary as well as function words, and a type of kernel discriminant analysis / neural network, coming to the same conclusion.

Case closed? Not quite. It has recently been proposed that the disputed 12 papers were a collaboration; a summary of the evidence, and some other citations to classical and recent quantitative Federalist research, are available here.

Either way, when you’re getting a bit jaded with numbers, and the 0’s are starting to look like o’s, analyse text!

Further/Future reading

Mosteller F, Wallace DL (1964) Inference and disputed authorship, the Federalist. Addison-Wesley.

McGrayne, SB (2011) The theory that would not die: how Bayes’ rule cracked the Enigma code, hunted down Russian submarines & emerged triumphant from two centuries of controversy. Yale University Press.

Martindale C, McKenzie D (1995) On the utility of content analysis in author attribution: The Federalist. Computers and the Humanities, 29, 259-270.

Stone PJ, Bales RF, Namenwirth JZ, Ogilvie DM (1962). The General Inquirer: a computer system for content analysis and retrieval based on the sentence as a unit of information. Behavioral Science, 7, 484-498.