Deviations & The Chrysalids

If people remember the British writer John Wyndham (1903-1969) at all, it will be because of The Day of the Triffids, a wonderful low-key science fiction novel about what happens when certain very large plants (Triffids) are developed, grown and harvested… Wyndham also wrote other great novels such as The Midwich Cuckoos (filmed as Village of the Damned) and The Kraken Wakes (about interstellar entities that take to The Deep, causing maelstroms and tidal waves / tsunamis, putting up the price of air travel because everyone’s scared to travel by sea, and then They or their Allies begin to invade coastal regions…).

But the Wyndham book most applicable to Statisticians is The Chrysalids (Michael Joseph 1955, Penguin 1958). It is set in a post-apocalyptic society scared of mutations, any mutations, in plants, in animals, and particularly in humans, known as Deviations.

‘Blessed is the Norm’, ‘Watch thou for the Mutant’. So when young David befriends Sophie, who turns out to be a Deviation because she has six toes (the Norm being “each foot five toes, and each toe shall end with a flat nail”)… well, you’ll just have to read it. A very thoughtful book indeed.

Keepin’ the Customers satisfied

As a young lad in Geelong, I sold newspapers, for the Experience. Around that time I bought a 45 rpm single, Bridge Over Troubled Water by Simon & Garfunkel. Bit orchestrated but an amazing song, although I didn’t much care for the flip side, ‘Keep the Customer Satisfied’. I didn’t think it all that good a song, and it didn’t seem to have much to do with customers, or not the sort of customers I was likely to come across back then.

I like that song more now, but still don’t think it has much to do with customers, apart from the snappy title. Similarly, the great Australian song Esmeralda by the Indelible Murtceps (their alter ego Spectrum, spelt backwards) talks about ‘always one more customer to go’, but once again, not the sort of customers a newspaper ‘distributor’ or Statistical consultant is (hopefully) likely to meet.

Bridge Over Troubled Water sounds better though, with its message of hope and reassurance (although some of the words are admittedly a bit dodgy). Reassurance is vital in Statistical Consulting, where clients are often scared of statistics. The other important thing, in any form of consulting, is that clients must feel they come out with something positive that they didn’t come in with (information, a new ‘clever consultant’ riff they can try in SPSS or whatever), as well as feeling a bit more relaxed.

Whether it be cafes or consulting, offering a glass of water or cup of tea or coffee, keeping up an interesting patter about the daily specials, or how it is that median or quantile regression may help to make length of stay data clearer, you’ve always got to

Keep the Customer Satisfied!

Power Analysis Surge

December-February (and generally March) is Summer in the Southern Hemisphere. It can get pretty hot even in the more temperate southern states (= northern states in the Northern Hemisphere). Today, the last day of 2015, has an expected top of 40 Celsius, which is 104 Fahrenheit and a nose-peeling 313.15 Kelvin!

Summer is also the start of the Australian Academic Year, and Grant Season, when everyone is looking for Statisticians (who are in swimming pools or pool halls) to run power analyses for them. With no statisticians to be found, it’s off to the library to borrow whatever random books they can find, or to use one of the free web packages, which often works out like a home haircut.

There’s great specific software such as PASS and nQuery Advisor + nTerim.

They’re not cheap (but neither are grants!), and aid professional statisticians as well. There’s R of course, and the excellent menu-driven power and sample size routines in SAS and Stata.

But first define the differences you’re expecting, based on the actual Literature as well as Clinical Judgement, and always see a Statistician!
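To show why the expected difference has to be pinned down first, here is a minimal Python sketch of the textbook normal-approximation sample size for comparing two group means. It is only an illustration, not what PASS, nQuery, SAS or Stata do internally: the function name n_per_group and its defaults (two-sided alpha of 0.05, power of 0.80) are assumptions made for this example, and effect_size is taken to be the expected difference divided by the common standard deviation.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation.

    effect_size is the expected difference divided by the common SD
    (a hypothetical input: it must come from the literature and
    clinical judgement, not from this function).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Smaller expected differences need far more people per group.
    for d in (0.2, 0.5, 0.8):
        print(f"standardised difference {d}: about {n_per_group(d)} per group")
```

Halving the expected difference roughly quadruples the sample size, which is why guessing it is risky. Dedicated packages typically use t-based calculations and tend to give slightly larger numbers than this approximation.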

Wisdom of the Cloud

Many summers ago when I started out in the Craft, I could log onto the trusty DEC-20 from literally anywhere in the world, and use SPSS or BMDP to analyse data. Nowadays, I have to have IBM SPSS or Stata installed on the right laptop or computer, and bring it with me, wherever I may roam, and wonder dreamily if I could just access my licensed stats packages from anywhere, like a library, a beach, a forest, a coffee shop.

One option would be to subscribe to a stats package in the Cloud! In terms of mainline stats packages, there is R (free, plus 8 euros ($A12.08) per month for the platform), NCSS 10 at 18 euros ($A27.19) per month plus platform, IBM SPSS 23 Base at 99 euros ($A149.54) ditto, and Standard (adds logistic regression, hierarchical linear modelling, survival analysis etc.) for 199 euros ($A300.59) per month plus platform.

Another option, particularly if you’re more into six sigma / quality control type analyses, is Engineroom, at $US275 ($A378.55) per year.

Obviously, compare the prices against actually buying the software, but to be able to log in from anywhere, on different computers, and analyse data, sigh, it’s almost like the summer of ’85!

Coin Chops: Can the Law of Averages be Replaced by the Law of Probability?

Alas, to the ‘average’ consumer of statistics, unlike us statisticians and data analysts, Probability is a sort of Comic, I mean Cosmic, Force, i.e. ‘The Laws of Probability’. David Hand OBE FBA has entertainingly looked at misunderstandings of this Comic Force and Coincidences in ‘The Improbability Principle: Why Coincidences, Miracles and Rare Events Happen All the Time’ (2014).

But sitting here in the State Library of Victoria, I’m reading Frank ‘Power Without Glory’ Hardy’s novel ‘The Four-Legged Lottery’ (1958). On page 179 of the Gold Star paperback edition there’s a bit of blarney about the ‘law of probability’ replacing the ‘law of averages’, where one of the two main characters, a professional gambler by the name of Jim Roberts, talks about the Anglo-Australian game of Two-Up. The game involves throws of pairs of coins, and is legal in Australian casinos and, traditionally, on the streets on ANZAC Day (25th April).

‘in an honestly conducted two-up school, an equal number of heads and tails will be thrown over a long period; both head and tail bettor must lose [as the ‘house’ must take a percentage]. [To try and overcome the Law of Averages, giggle!] a tail bettor can back the tail on every spin – only for two throws, doubling [the] stake on the second throw if the spinner [throws heads] the first time. In this way [he or she] defeats the law of average [by winning] every time a spinner throws [both tails or one head and one tail and only loses when spinner throws both heads]’. Time for a simulation!
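As a first stab at that simulation, here is a minimal Python sketch. It rests on several simplifying assumptions that are not in Hardy’s text: the coins are fair, a mixed pair (‘odds’) is simply thrown again, bets pay even money, and the house takes no percentage at all. The function names (toss_pair, play_round, simulate) and the seed are made up for this example.

```python
import random


def toss_pair(rng):
    """Throw two fair coins until they match; return 'H' or 'T'.

    Assumption: a mixed pair ('odds') is simply re-thrown.
    """
    while True:
        first_is_head = rng.random() < 0.5
        second_is_head = rng.random() < 0.5
        if first_is_head == second_is_head:
            return "H" if first_is_head else "T"


def play_round(rng, stake=1.0):
    """One round of the tail bettor's system; returns the net profit.

    Back tails; if heads comes up, double the stake and back tails once
    more, then stop whatever happens. Assumes even-money payouts and no
    house cut.
    """
    if toss_pair(rng) == "T":
        return stake                  # won the first bet
    if toss_pair(rng) == "T":
        return -stake + 2 * stake     # lost one stake, won the doubled bet
    return -stake - 2 * stake         # lost both bets


def simulate(n_rounds=100_000, seed=2015):
    """Return (proportion of winning rounds, average profit per round)."""
    rng = random.Random(seed)
    results = [play_round(rng) for _ in range(n_rounds)]
    win_rate = sum(r > 0 for r in results) / n_rounds
    mean_profit = sum(results) / n_rounds
    return win_rate, mean_profit


if __name__ == "__main__":
    win_rate, mean_profit = simulate()
    print(f"rounds won: {win_rate:.3f}, average profit per round: {mean_profit:.3f}")
```

Under these assumptions the bettor wins about three rounds in four, but each of those wins is one stake while the occasional double-heads round costs three, so the average profit hovers around zero. Add any house percentage and the average turns negative, which is the Law of Averages having the last giggle.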

Watch this space. Same Stat Time! Same Stat Channel!


Hobart and Randomicity

Mona, lower case, is a great ’50s song by Bo Diddley, covered a few years later by the Rolling Stones on their first album.

MONA, upper case, standing for Museum of Old and New Art, is an amazing underground (literally) art gallery in Hobart, the capital of Tasmania, the island state of Australia.

Hobart is the second oldest state capital in Australia (after Sydney), was liked by both Mark Twain and Agatha Christie, and is the birthplace of Hollywood actor Errol Flynn (1909-1959), as well as the final resting place of the last known thylacine or ‘Tasmanian Tiger’, a carnivorous marsupial, which died in captivity in 1936. Hobart is also the setting for Edward James George Pitman’s (1897-1993) development, in the mid 1930s, of randomization or permutation tests, which Sir Ronald Aylmer Fisher had also worked on. Permutation tests rely (these days) on computers, and don’t require reference to statistical arcana such as the Normal and Student’s t distributions, etc.

As shown by the late Julian Simon and more recently in that wonderful stats book that sounds like a law firm (Lock, Frazer Lock, Lock Morgan, Lock and Lock, 2012), permutation tests can also be easier for students to understand than the parametric alternatives.
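To give a feel for how little machinery is needed, here is a minimal Python sketch of a two-sample permutation test on the difference in means. It is a Monte Carlo version (random reshuffles rather than every possible rearrangement), and the function name permutation_test, its defaults and the little data sets are all invented for this illustration rather than taken from Pitman, Fisher or the books above.

```python
import random


def permutation_test(x, y, n_perm=10_000, seed=1936):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Shuffles the pooled observations n_perm times and counts how often
    the shuffled difference is at least as large as the observed one.
    """
    if not x or not y:
        raise ValueError("both groups need at least one observation")
    rng = random.Random(seed)
    n_x = len(x)
    pooled = list(x) + list(y)

    def abs_mean_diff(values):
        first, second = values[:n_x], values[n_x:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly re-deal the group labels
        if abs_mean_diff(pooled) >= observed:
            at_least_as_extreme += 1
    # Count the observed arrangement itself, so the p-value is never zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up lengths of stay (days) for two hypothetical wards.
    ward_a = [1, 2, 3, 4, 5]
    ward_b = [11, 12, 13, 14, 15]
    print(f"permutation p-value: {permutation_test(ward_a, ward_b):.4f}")
```

The logic is the whole argument: if the group labels don’t matter, shuffling them should produce differences as big as the real one fairly often. No Normal or Student’s t tables are consulted along the way.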

MONA itself is currently showing the movie ‘David Bowie Is’, a segment of which talks about the London singer’s use of the William Burroughs / Brion Gysin cut-up technique and later a computer program called Verbasizer, to randomly suggest combinations of particular words to aid in the creative song-writing process.

You may or may not be interested in randomicity, and the David Bowie movie may no longer be showing, but whether it’s out of a desire for adventure, curiosity, necessity or for purely random reasons, visit MONA and Hobart!!

Further reading:

Lock RH, Frazer Lock P, Lock Morgan K, Lock EF, Lock DF (2012). Statistics: unlocking the power of data. Wiley.

McKenzie D (2013). Chapter 14: Statistics and the Computer. In McKenzie S: Vital Statistics: an introduction for health science students. Elsevier.

Robinson ES (2011). Shift linguals: cut-up narratives from William S. Burroughs to the present. Rodopi.

Timms P (2012). Hobart. (revised edition). University of New South Wales Press.

Divisive Rules OK: Clustering #1

Back about 1965, while (whilst?) I was attending Primary (grade) School in a little northern Victorian town, lunchtimes would see a behaviour pattern in which, say, two boys would link arms and march around the boys’ playground chanting “join on the boys who want to play Chasey” or some other sport or game. Soon, unless something bizarre or boring was called out for, there’d be three, five, eight, ten etc. until there were enough people for that particular game.

Now, let’s imagine a slightly more surreal version: a big group of boys, or girls, or indeed a mixture thereof, wanders around the playground. But what sport are they going to play? It’s unlikely (especially for our purposes here) they’ll all want to play the same thing, and even if they did, there may be too many, or some people may well be better suited to some other sport or game. If we were brave enough and if lunchtimes extended into infinity, we could try every possible way of splitting our big group into two or more smaller groups as, in the general field of cluster analysis, Edwards and Cavalli-Sforza showed back in ’65.

Alternatively, we could find the single person most different from the rest of the main group M in terms of the game they wanted to play. That person (let’s call them Brian after Brian Everitt, who wrote a great book on cluster analysis in several editions, and Brian Setzer as in Stray Cats and the Brian Setzer Orchestra, this being the unofficial Brian Setzer Summer) splits off from the group and forms a splinter group S. For each of the remaining members, we check whether, on average, they’re more dissimilar to the other members of M than to the members of S (i.e. Brian et al). If so, then they too join S.

Known as divisive clustering (the earlier “join on” syndrome is sorta kinda like agglomerative clustering: start off with individuals and group ’em together), this particular method was published in ’64 by Macnaughton-Smith and colleagues. Described in Kaufman and Rousseeuw’s book as DIANA, with shades of a great steak sauce and an old song by Paul Anka, DIANA is available in R as part of the cluster package.
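The Brian manoeuvre can be sketched in a few lines of Python. This is only an illustration of a single split, not the R cluster package’s code: the function name splinter_split and the toy ‘game preference’ scores are invented here, and the sketch moves people across one at a time (whoever has the biggest positive gain goes next), which is how the method is usually described.

```python
def average(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def splinter_split(d):
    """One divisive split of a group, given a symmetric dissimilarity
    matrix d (list of lists). Returns (main, splinter) as index lists.
    """
    n = len(d)
    if n < 2:
        raise ValueError("need at least two people to split a group")
    main = list(range(n))

    # 'Brian': the person with the largest average dissimilarity to the rest.
    brian = max(main, key=lambda i: average(d[i][j] for j in main if j != i))
    main.remove(brian)
    splinter = [brian]

    while len(main) > 1:
        def gain(i):
            # Positive when i is, on average, further from the rest of M
            # than from the splinter group S.
            to_main = average(d[i][j] for j in main if j != i)
            to_splinter = average(d[i][j] for j in splinter)
            return to_main - to_splinter

        candidate = max(main, key=gain)
        if gain(candidate) <= 0:
            break  # nobody left who is closer to S than to M
        main.remove(candidate)
        splinter.append(candidate)
    return main, splinter


if __name__ == "__main__":
    # Made-up scores: low = keen on Chasey, high = keen on something else.
    scores = [1, 2, 3, 10, 11]
    d = [[abs(a - b) for b in scores] for a in scores]
    main, splinter = splinter_split(d)
    print("main group M:", main)
    print("splinter group S:", splinter)
```

As I understand it, DIANA proper then repeats this kind of split on the resulting groups until everyone is standing alone, building up a whole hierarchy rather than stopping at two teams.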

Now if you’ll excuse me, there’s a group looking for members to march down the road for a cold drink, on this hot Australian summer night! Once we get to the bar, the most dissimilar, perhaps a nondrinker, will split off, clusters will be formed, and through the night there may be re-splitting and re-joining of groups or cliques, as some go off to the pinball parlour, others to the pizza joint, while some return to the bar, all in the manner of another great clustering algorithm, Ball and Hall’s ISODATA.

Bottled Sources:

Ball GH, Hall DJ (1965). A novel method of data analysis and pattern classification. Technical Report, Stanford Research Institute, California.

Edwards AWF, Cavalli-Sforza LL (1965). A method for cluster analysis. Biometrics, 21, 362-375.

Everitt BS (1974 and more recent editions). Cluster analysis. Heinemann: London.

Kaufman L, Rousseeuw PJ (1990). Finding groups in data: an introduction to cluster analysis. Wiley: New York.

Macnaughton-Smith P, Williams WT, Dale MB, Mockett LG (1964). Dissimilarity analysis: A new technique of hierarchical sub-division. Nature, 202, 1034-1035.