dogs and wolves postscript: Truth Stronger Than Fiction?

(travelling home on the tram after writing the previous post, in the amazing State Library of Victoria)

thinking about dogs and wolves, from the French, Entre Chien et Loup, the hour between dog and wolf.  Dusk or twilight, or metaphorically the time between the familiar and the much less familiar….

out of the corner of my eye I noticed a young couple playing with a puppy, in one of their coat pockets.

but on closer inspection, that was no puppy, dog, or even a little wolf.

It was a *ferret*.   (little weasel, polecat creature, not the sort of thing you’d want in a pocket, or on a crowded conveyance)

now what’s the probability of That?

When Boogie becomes Woogie, when Dog becomes Wolf

An exciting (and not just for statisticians!) area of application in statistics/analytics/data science is change/anomaly/outlier detection. The general notion of outliers (e.g. ‘unlikely’ values) was covered in a previous post that looked at, amongst other things, very long pregnancies.

But tonight’s fr’instance comes from Fleming’s wonderful James Bond Jamaican adventure novel Dr No (also a jazzy 1962 movie), which talks of London Radio Security shutting down radio connections with secret agents if a change in their message-transmitting style is detected. Such a change might indicate that their radio had fallen into enemy hands.

To use a somewhat less exotic example, imagine someone, probably not James Bond, tenpin bowling and keeping track of their scores, this scenario coming from HJ Harrington et al’s excellent Statistical Analysis Simplified: the Easy-to-Understand Guide to SPC and Data Analysis (McGraw-Hill, 1998).

On the 10th week, the score suddenly drops more than three standard deviations (scatter or variation around the mean or average) below the mean.

Enemy agents? Forgotten bowling shoes? Too many milk shakes?
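As a rough sketch of the three-standard-deviation check described above: the weekly scores below are invented for illustration (they are not taken from Harrington's book), and the function name `three_sigma_flags` is just a convenient label, not a standard SPC routine.

```python
from statistics import mean, stdev

def three_sigma_flags(baseline, new_scores, k=3.0):
    """Return (score, z) pairs for new scores lying more than k
    sample standard deviations away from the baseline mean."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline weeks
    flagged = []
    for score in new_scores:
        z = (score - centre) / spread
        if abs(z) > k:
            flagged.append((score, round(z, 2)))
    return flagged

# Hypothetical bowling scores for weeks 1-9, then week 10 (assumed values)
weeks_1_to_9 = [150, 155, 148, 152, 160, 149, 151, 157, 153]
week_10 = [120]

print(three_sigma_flags(weeks_1_to_9, week_10))
```

The mean and spread come only from the earlier weeks, so the suspect score does not get to drag the baseline towards itself.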

Once again, an anomaly or change, something often examined in industry (Statistical Process Control (SPC) and related areas) to determine the point at which, in the words of Tom Robbins’s great novel Even Cowgirls Get The Blues, ‘the boogie stopped and the woogie began’.

Sudden changes in operations & processes can happen, and so a usual everyday assembly line (‘dog’) can in milliseconds become the unusual, and possibly even dangerous (‘wolf’), at which point hopefully an alarm goes off and corrective action is taken.

The basics of SPC were developed many years ago (and taken to Japan after WW2, a story in itself). Anomaly detection is a fast-growing area. For further experimentation / reading, a recent method based upon calculating the closeness of points to their neighbours is described in John Foreman’s marvellous DataSmart: using Data Science to Transform Information into Insight (Wiley, 2014).
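To give a flavour of the ‘closeness to neighbours’ idea, here is a minimal sketch that scores each point by the distance to its k-th nearest neighbour. It illustrates the general notion only, not Foreman's own spreadsheet method; the points and the function name `k_distance` are made up for the example.

```python
from math import dist

def k_distance(points, k=2):
    """For each point, return the Euclidean distance to its k-th
    nearest neighbour. A large value suggests an isolated point."""
    scores = []
    for i, p in enumerate(points):
        # distances from p to every other point, nearest first
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(others[k - 1])
    return scores

# Hypothetical 2-D observations: four close together, one far away
points = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]
scores = k_distance(points, k=2)

# The point with the largest score is the most isolated
print(points[scores.index(max(scores))])
```

The brute-force comparison of every pair is fine for a handful of points, but would need something smarter for large data sets.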

We might want to determine if a credit card has been stolen on the basis of different spending patterns/places, or, to return to the opening example, detect an unauthorised intruder to a computer network (e.g. Clifford Stoll’s trailblazing The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage).

Finally, we might just want to figure out exactly when it was that our bowling performance dropped off!
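One simple way to put a date on the drop, sketched here under the assumption that there is a single shift in the average: try every possible split of the series and keep the one whose two halves each sit most tightly around their own mean. The scores are invented, and `best_split` is a hypothetical helper rather than a named SPC procedure.

```python
from statistics import mean

def best_split(series):
    """Return the index i (1 <= i < len(series)) such that splitting
    into series[:i] and series[i:] gives the smallest total squared
    deviation of each segment around its own mean."""
    def sse(segment):
        m = mean(segment)
        return sum((x - m) ** 2 for x in segment)
    return min(range(1, len(series)),
               key=lambda i: sse(series[:i]) + sse(series[i:]))

# Hypothetical weekly scores: steady for nine weeks, then a drop
scores = [150, 155, 148, 152, 160, 149, 151, 157, 153,
          120, 118, 125, 122]

i = best_split(scores)
print(f"scores changed from week {i + 1} onwards")
```

The function returns a 0-based index, so the first week of the ‘new’ regime is `i + 1`.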

Statistical Outliers: of Baldness and Long Gestations

At what point is a human gestation period ‘impossibly’ long? This was the question a British court had to consider in the 1949 appeal against the 1948 judgement in Hadlum vs Hadlum.

Ms Hadlum had a gestation period of 349 days, taking into account when Mr Hadlum went off to the war. The average human gestation is usually quoted as 40 weeks or 280 days, although more recent research reports an average of 268 days or 38 weeks, with natural variation of as much as 37 days: http://www.sciencedaily.com/releases/2013/08/130806203327.htm

A widely used statistical definition of an outlier was given by Douglas Hawkins in 1980: ‘an observation which deviates so much from other observations as to cause suspicions that it was generated by a different mechanism’ (Hawkins DM, 1980, Identification of Outliers, Chapman & Hall).

Hmm! The court upheld the 1948 finding that such a long gestation was possible, and so Ms Hadlum had not been ‘unfaithful’ to Mr Hadlum, grounds for divorce back in those dark days. In the 1951 case of Preston-Jones vs Preston-Jones, however, the court found a gestation period of 360 days to be the limit. The judge concluded that ‘If a line has to be drawn I think it should be drawn so as to allow an ample and generous margin’.

Statisticians have established guidelines for ‘outliers’, which are lines in the sand, if not in concrete.

But speaking of sand, at what point do grains of sand form a heap of sand?

How many hairs constitutes the threshold distinguishing between bald and not bald?

(philosophers call this the Sorites or ‘heap’ paradox).

The world ‘forgot’ how to make concrete from about 500-1300 AD, but was there a day when we could still make concrete, and a day in which we couldn’t? Something to think about on a Sunday afternoon!

2014 Excel implementation of some simple outlier detection techniques, by John Foreman http://au.wiley.com/WileyCDA/WileyTitle/productCd-111866146X.html
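For those without a spreadsheet to hand, one common rule-of-thumb ‘line in the sand’ is Tukey's fences: anything more than 1.5 interquartile ranges beyond the quartiles gets flagged. The sketch below uses made-up gestation lengths (only the 349 days comes from the Hadlum case), and quartile conventions differ between packages, so the exact fences should be taken as indicative.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: k interquartile ranges
    below the first quartile and above the third."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical gestation lengths in days, plus Ms Hadlum's 349
days = [262, 265, 268, 270, 271, 273, 275, 278, 280, 283, 285, 349]

low, high = tukey_fences(days)
print([d for d in days if d < low or d > high])
```

Note that the 349 is included when working out the quartiles; because quartiles are resistant to a single extreme value, it still ends up outside the upper fence.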

References on the above legal cases

1978 Statistics journal: http://www.jstor.org/discover/10.2307/2347159?uid=2&uid=4&sid=21103476515283

1953 Medical journal: http://link.springer.com/article/10.1007/BF02949756