March 2014 – Synergistic Statistical Consulting, Analysis, Arboronics

Statistics of Social Spiders: postscript

Some new information has just scuttled across my desk!

It seems that there is in fact a type of social spider, and a recent paper in the Proceedings of the Royal Society has looked at the role of social interaction amongst these creatures (Stegodyphus mimosarum).

http://www.the-scientist.com/?articles.view/articleNo/39577/title/Behavior-Brief/

This is all very well, but has someone counted these spiders up and ascertained whether the counts fit a Poisson distribution (i.e. are these spiders Poisson-ous (!!!)) or some other distribution (and what are their views on competing sundew plants)?

Laskowski KL, Pruitt JN (March, 2014). Evidence of social niche construction: persistent and repeated social interactions generate stronger personalities in a social spider. Proc Royal Society B.

Spiders, sowbugs and sundew statistics

Statisticians often like to think that non-statisticians don’t know what exactly it is that we do. The truth is of course that not only do they not know, they do not particularly care! With the possible exception of someone like Nate ‘2012 US election’ Silver, what statisticians are thought to do is about as exciting as driving around in a cardigan and slippers in a two-tone ’74 Morris Marina with no radio.

But what if statisticians went around ripping up floorboards and counting up spiders? Now you’re talking!!

Back in ’46 a scientist named LC Cole published some data on counts of spiders and sowbugs (or woodlice, roly-polies or slaters).

Cole, and various bright sparks ever since, had the idea of fitting the spider / sowbug counts to various types of probability distribution. Voila! It was found that spider counts could be quite happily fitted by the Poisson distribution, as can the number of typewriter errors made on a page, the number of people killed by horse kicks in the Prussian cavalry, etc etc.

But not sowbug counts, which are better fitted by a ‘contagious distribution’, such as the ‘generalized Poisson’ or ‘generalized negative binomial’, in which the event of something happening is itself dependent on other events. Sowbugs, it seems, are a social breed, and when they notice their numbers dwindling to the point where there’s only one or two left, they pick up sticks and try the house down the road, in search of other sowbugs, if not adventure.

Spiders, on the other hand, are more individualistic or anti-social and don’t care if they’re left by themselves. (In fact they probably appreciate the peace and quiet after those pesky sowbugs have marched off elsewhere, unless of course the spiders belong to the type known as woodlouse spiders or sowbug hunters, which is a very different kettle of fish, or spiders, altogether, as are ‘shy spiders’ and ‘social spiders’.)
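For the curious, the Poisson versus ‘contagious’ contrast can be checked roughly by comparing the variance of the counts with their mean: for Poisson counts the two are about equal, while clumped (contagious) counts have a variance well above the mean. Below is a minimal Python sketch of that check. The two frequency tables are invented purely for illustration (they are not Cole’s data), and the function names are my own.

```python
import math

# Hypothetical frequency tables: {creatures under a board: number of boards}.
# These numbers are made up for illustration; they are NOT Cole's data.
loners = {0: 61, 1: 30, 2: 8, 3: 1}
clumpers = {0: 40, 1: 20, 2: 12, 3: 10, 4: 8, 5: 5, 6: 3, 8: 2}


def mean_and_variance(freq):
    """Mean and sample variance of counts stored as a frequency table."""
    n = sum(freq.values())
    mean = sum(k * f for k, f in freq.items()) / n
    var = sum(f * (k - mean) ** 2 for k, f in freq.items()) / (n - 1)
    return mean, var


def dispersion_index(freq):
    """Variance-to-mean ratio: about 1 for Poisson, above 1 for clumping."""
    mean, var = mean_and_variance(freq)
    return var / mean


def poisson_expected(freq):
    """Expected number of boards per count if the data were Poisson."""
    n = sum(freq.values())
    mean, _ = mean_and_variance(freq)
    return {k: n * math.exp(-mean) * mean ** k / math.factorial(k)
            for k in sorted(freq)}


for name, table in (("loners", loners), ("clumpers", clumpers)):
    print(name, "dispersion index:", round(dispersion_index(table), 2))
    for k, expected in poisson_expected(table).items():
        print("  count", k, "observed", table[k],
              "Poisson expected", round(expected, 1))
```

A dispersion index near 1 is only a rough hint, not a formal goodness-of-fit test, but it shows why one table sits happily with the Poisson and the other does not.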

Finally, a paper published in the journal of the highly prestigious Royal Society in 2010 found that carnivorous wolf spiders (Lycosidae) and pink sundew plants (Drosera capillaris) competed with each other for available food, in statistically interesting ways. Indeed, the lead author described the study as ‘awfully fun’.

http://www.livescience.com/8566-plant-spider-compete-food.html
http://www.americanscientist.org/issues/pub/2010/6/in-the-news-30
http://ittakes30.wordpress.com/2010/10/25/feed-me-seymour/

So, next time someone asks (without really caring what the answer is) ‘just what is it that statisticians actually do?’ ….

[updated, 9 October 2016]

References

Cole LC (1946). A study of the cryptozoa of an Illinois woodland. Ecological Monographs, 16, 49-86.

Consul PC (1989). Generalized Poisson distributions. Marcel Dekker, New York.

Forbes C, Evans M et al. (2011). Statistical distributions. 4th ed. Wiley, Hoboken, New Jersey.

Janardan KG et al. (1979). Biological applications of the Lagrangian Poisson distribution. BioScience, 29, 599-602.

Jennings DE et al. (2010). Evidence for competition between carnivorous plants and spiders. Proc Royal Society B, 277, 3001-3008.

Raja TA, Mir AH (2011). On applications of some probability distributions. Journal of Research & Development, 11, 107-116.

Watch the Skies: Manga Regression!

Although it’s probably the technique most employed by statisticians, regression, or at least multiple linear or multiple logistic regression, is often the concept that is most feared or misunderstood by students and newbie researchers. If someone you know is in either category, and they would like a fun and straightforward introduction to regression that literally uses pictures (cartoons), let them know that The Manga Guide to Regression has now been published, around May 2016, by the friendly folks at No Starch Press http://www.nostarch.com/regression

and available on http://www.oreilly.com http://www.amazon.com http://www.bookdepository.com http://www.dymocks.com.au etc

Authored by Shin Takahashi (Manga Guide to Statistics, 2008), the new book uses Manga http://en.wikipedia.org/wiki/Manga (think Osamu Tezuka’s Jungle Emperor, made into the 1965 Japanese anime TV series of the same name and the 1966 US overdub ‘Kimba the White Lion’ http://en.wikipedia.org/wiki/Jungle_Emperor, as well as ‘Atomu’ / ‘Astro Boy’).

Okay, Kimba himself is not actually included in the Manga Regression book, but there’s both linear and logistic regression, demonstrated using Microsoft Excel (just a slight gritting of teeth; go for Excel 2010/2013/2016 or higher), and at least it might allow getting ‘down and dirty with data’.

Happy Numbers

Why there are so very few statisticians as heroes (or even dashing villains) in novels is a pop culture mystery even bigger than the true identity of reggae magicians Johnny and the Attractions, or the actual final resting place of Butch and Sundance.

I have heard of, but don’t have, the 2008 novel Dancing with Dr Kildare by Jane Yardley PhD (in real life a co-ordinator of medical trials for a small pharma), which features British medical statistician Nina, as well as the Finnish composer Sibelius and the Tango.

http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2009.00341.x/abstract

http://www.transworld-publishers.co.uk/catalog/book.htm?command=Search&db=twmain.txt&eqisbndata=0552773107

I’m now performing statistical consulting at two major hospitals, so I’m about to re-read that wonderful book by major scriptwriter / drama writer Jim Keeble, ‘The Happy Numbers of Julius Miles’, originally published in 2012 by independent outfit Alma

http://www.almabooks.com/the-happy-numbers-of-julius-miles-p-387-book.html

but there seems to be an April 2014 printing for the US.

It’s a great book about a big fellow, Julius Miles, a professional statistician with Barts Health NHS Trust, Royal London Hospital, Whitechapel, East London, England. Julius loves stats – nose-counting ones such as the fact that it takes him 2 minutes to polish his shoes (with 30 seconds airing between polish, application and buff), as well as meaty methods such as multilevel Poisson regression for length of hospital stay.

Julius is about 1.93 metres (6 foot 4 inches) and wears size 13 (UK) shoes, a solid fellow (although not reminiscent of the solid Ignatius Reilly in John Kennedy Toole’s classic posthumous 1980 novel ‘A Confederacy of Dunces’).

There’s something about the name Julius: Julius Sumner Miller, the US physicist and educator whose ‘Why is it so?’ ran on Australian TV for over 20 years from the 1960s, and the frothy US drink Orange Julius, named after Julius Freed, around since 1926 and taking off in ’29 (the official drink of the 1964 New York World’s Fair).

I can thoroughly recommend this colourful & warm book about Julius Miles, medical statistician.

Give p’s a Chance? Hoochie Coochie Hypothesis Tests

A common request for jobbing analysts is to ‘run these results through the computer and see if they’re significant’. Now, unfortunately, many folk, including, scarily, even lecturers in our craft, have a misconception as to what ‘significance’ actually means.

Shout in a desperate monotone “it’s the probability of getting a result as large as, or larger than, the one we actually obtained, if the ‘null hypothesis’ of no difference or association were actually true” and people look flummoxed, yes flummoxed, as if you were speaking to them in the language of the ancient Huns, (another) language no-one has been able to figure out.

True, testing ‘something’ against the concept of ‘nothing’ is a bit kooky. If we really did have a situation where two groups ended up with identical averages we’d think it was a trifle dodgy to say the least.

And as for the notion of effect sizes! Picture, on an enchanted desert isle, two group means of 131.5 and 130, with a pooled standard deviation (sd) of 15. A difference of 1.5 divided by 15, is a Cohen’s (the late great Jacob Cohen; Cohen’s kappa, populariser of power analysis, maven of multiple regression) effect size of 0.10, where given Jack’s arbitrary but conventional guidelines for mean differences, 0.20 is a small effect size, 0.50 medium, 0.80 large.

Using an online calculator e.g.

http://www.graphpad.com/quickcalcs/ttest1/

we find that, if there were 1000 in each group, the t test value would be 2.24 and our p value 0.026.

Voila, Eureka, Significance, as cook smiles and puts an extra dollop of custard on our pudding!

But if we ‘only’ had 100 in each group, our t value would be 0.71, our p value would be 0.48, and there’d be a sigh, a frown, a closing of doors and a grim faced cook doling out the thrice-boiled cabbage….

But they’re the same means, the same sd, and the same effect size!
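The arithmetic behind those two verdicts can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, assuming equal group sizes and a pooled-variance (Student’s) two-sample t test. The function names are my own, and the p value is obtained by numerically integrating the t density, which is not necessarily how the online calculator does it.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def pooled_t_test(mean1, mean2, sd, n):
    """Effect size, t and p for two groups of size n sharing a pooled sd."""
    d = (mean1 - mean2) / sd      # Cohen's d
    t = d * math.sqrt(n / 2)      # same as (mean1 - mean2) / (sd * sqrt(2 / n))
    df = 2 * n - 2
    return d, t, two_sided_p(t, df)


for n in (1000, 100):
    d, t, p = pooled_t_test(131.5, 130, 15, n)
    print("n per group:", n, "d:", round(d, 2),
          "t:", round(t, 2), "p:", round(p, 3))
```

The effect size d never moves; only the square-root-of-sample-size multiplier changes, and that alone carries the result across the 0.05 line.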

Coming Up: Guest Post on a possible, probable, Salvation.

Further/Future reading

G Cumming (2014) How significant is P? Australasian Science, March 2014. p. 37.

http://www.australasianscience.com.au/article/issue-march-2014/how-significant-p.html

also check out Prof G’s website

http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci

with free Excel ESCI program and details of his illuminating 2012 book ‘The New Statistics’.

Now, back to honest resting from honest labour!

dogs and wolves postscript: Truth Stronger Than Fiction?

travelling home on the tram after writing the previous post, in the amazing State Library of Victoria,

thinking about dogs and wolves, from the French, Entre Chien et Loup, the hour between dog and wolf. Dusk or twilight, or metaphorically the time between the familiar and the much less familiar….

out of the corner of my eye I noticed a young couple playing with a puppy, in one of their coat pockets.

but on closer inspection, that was no puppy, dog, or even a little wolf.

It was a *ferret*. (little weasel, polecat creature, not the sort of thing you’d want in a pocket, or on a crowded conveyance)

now what’s the probability of That?

When Boogie becomes Woogie, when Dog becomes Wolf

An exciting (and not just for statisticians!) area of application in statistics/analytics/data science relates to change/anomaly/outlier detection, the general notion of outliers (e.g. ‘unlikely’ values) having been covered in a previous post, looking at, amongst other things, very long pregnancies.

But tonight’s fr’instance comes from Fleming’s wonderful James Bond Jamaican adventure novel, Dr No (also a jazzy 1962 movie), which talks of London Radio Security shutting down radio connections with secret agents if a change in their message transmitting style is detected. This may have indicated that their radio had fallen into enemy hands.

To use a somewhat less exotic example, imagine someone, probably not James Bond, tenpin bowling and keeping track of their scores, this scenario coming from HJ Harrington et al’s excellent Statistical Analysis Simplified: the Easy-to-Understand Guide to SPC and Data Analysis (McGraw-Hill, 1998).

On the 10th week, the score suddenly drops more than three standard deviations (scatter or variation around the mean or average) below the mean.

Enemy agents? Forgotten bowling shoes? Too many milk shakes?

Once again, an anomaly or change, something often examined in industry (Statistical Process Control (SPC) and related areas) to determine the point at which, in the words of Tom Robbins’s great novel Even Cowgirls Get The Blues, ‘the boogie stopped and the woogie began’.

Sudden changes in operations & processes can happen, and so a usual everyday assembly line (‘dog’) can in milliseconds become the unusual, and possibly even dangerous (‘wolf’), at which point hopefully an alarm goes off and corrective action is taken.
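The bowling alarm can be sketched as a bare-bones three-sigma check in Python. The weekly scores below are hypothetical (they are not taken from Harrington’s book), the function names are my own, and the limits are computed from the first nine weeks only, which is a simplifying assumption: real SPC charts often estimate the spread differently, for example from moving ranges.

```python
import statistics

# Hypothetical bowling scores for weeks 1 to 9 (made up for illustration).
baseline = [152, 148, 155, 150, 147, 153, 149, 151, 154]


def control_limits(scores, k=3.0):
    """Lower and upper limits at k sample standard deviations from the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return mean - k * sd, mean + k * sd


def is_out_of_control(score, limits):
    """True when a new score falls outside the control limits."""
    lower, upper = limits
    return score < lower or score > upper


limits = control_limits(baseline)
week_10 = 130  # hypothetical bad night at the lanes
print("limits:", [round(x, 1) for x in limits])
print("week 10 alarm:", is_out_of_control(week_10, limits))
```

With these made-up scores the week 10 result sits well below the lower limit, so the alarm goes off; an ordinary score of 150 would not trigger it.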

The basics of SPC were developed many years ago (and taken to Japan after WW2, a story in itself). Anomaly detection is a fast-growing area. For further experimentation / reading, a recent method based upon calculating the closeness of points to their neighbours is described in John Foreman’s marvellous DataSmart: using Data Science to Transform Information into Insight (Wiley, 2014).
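One simple flavour of the neighbour-closeness idea, offered here as an illustration and not necessarily the exact method in Foreman’s book, is to score each point by the distance to its k-th nearest neighbour: points with no close neighbours get large scores. The purchase data and the function name below are hypothetical, and the sketch assumes Python 3.8 or later for math.dist.

```python
import math

# Hypothetical card purchases as (amount in dollars, km from home).
# The last one is deliberately odd; all values are made up.
purchases = [(20, 2), (25, 3), (22, 1), (30, 4), (27, 2), (400, 900)]


def kth_neighbour_distance(points, k=2):
    """Distance from each point to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


scores = kth_neighbour_distance(purchases, k=2)
most_unusual = scores.index(max(scores))
print("most unusual purchase:", purchases[most_unusual])
```

In practice the two measurements would usually be put on a common scale first (for example, standardised), otherwise whichever has the larger numbers dominates the distance.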

We might want to determine if a credit card has been stolen on the basis of different spending patterns/places, or, to return to the opening example, detect an unauthorised intruder to a computer network (e.g. Clifford Stoll’s trailblazing The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage).

Finally, we might just want to figure out exactly when it was that our bowling performance dropped off!