Give p’s a Chance? Hoochie Coochie Hypothesis Tests

A common request for jobbing analysts is to ‘run these results through the computer and see if they’re significant’. Now, unfortunately, many folk, including, scarily, even lecturers in our craft, have a misconception as to what ‘significance’ actually means.

Shout in a desperate monotone “it’s the probability of getting a result as large as, or larger than, the one we actually obtained, if the ‘null hypothesis’ of no difference or association were actually true” and people look flummoxed, yes flummoxed, as if you were speaking to them in the language of the ancient Huns, (another) language no-one has been able to figure out.

True, testing ‘something’ against the concept of ‘nothing’ is a bit kooky. If we really did have a situation where two groups ended up with identical averages we’d think it was a trifle dodgy to say the least.

And as for the notion of effect sizes! Picture, on an enchanted desert isle, two group means of 131.5 and 130, with a pooled standard deviation (sd) of 15. A difference of 1.5, divided by 15, is a Cohen’s d effect size of 0.10 (that’s the late great Jacob Cohen: Cohen’s kappa, populariser of power analysis, maven of multiple regression). Given Jack’s arbitrary but conventional guidelines for mean differences, 0.20 is a small effect size, 0.50 medium and 0.80 large, so ours is smaller than small.

Using an online calculator, e.g.

http://www.graphpad.com/quickcalcs/ttest1/

we find that if there were 1000 in each group, the t test value would be 2.24 and our two-tailed p value 0.026.

Voila, Eureka, Significance, as cook smiles and puts an extra dollop of custard on our pudding!

But if we ‘only’ had 100 in each group, our t value would be 0.71, our p value would be 0.48, and there’d be a sigh, a frown, a closing of doors and a grim-faced cook doling out the thrice-boiled cabbage…

But they’re the same means, the same sd, and the same effect size!
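For anyone who’d rather check the arithmetic than trust the cook, here is a minimal Python sketch of the same sums. It assumes equal group sizes and a pooled sd, so that t is simply the effect size times the square root of n/2, and it gets the two-tailed p value by numerically integrating the t density with Simpson’s rule. The function names are made up for illustration, and this is not the method the online calculator uses internally.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-tailed p value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is found with Simpson's rule (steps must be even).
    Fine for everyday t values; not intended for very extreme tails.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


def equal_n_t_test(mean1, mean2, pooled_sd, n_per_group):
    """Cohen's d, t and two-tailed p from summary statistics (equal n assumed)."""
    d = (mean1 - mean2) / pooled_sd
    t = d * math.sqrt(n_per_group / 2)   # same as diff / (sd * sqrt(2 / n))
    df = 2 * n_per_group - 2
    return d, t, two_sided_p(t, df)


if __name__ == "__main__":
    for n in (1000, 100):
        d, t, p = equal_n_t_test(131.5, 130, 15, n)
        print(f"n per group = {n}: d = {d:.2f}, t = {t:.2f}, p = {p:.3f}")
```

Run with the desert-isle numbers, d stays at 0.10 both times, while t comes out near 2.24 with 1000 per group and near 0.71 with 100 per group. Only the sample size has changed.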

Coming Up:  Guest Post on a possible, probable, Salvation.

Further/Future reading

G Cumming (2014) How significant is P? Australasian Science, March 2014. p. 37.

http://www.australasianscience.com.au/article/issue-march-2014/how-significant-p.html

Also check out Prof G’s website

http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci

with free Excel ESCI program and details of his illuminating 2012 book ‘The New Statistics’.

Now, back to honest resting from honest labour!

Resulting Consulting: Excel for Stats – 800 pound Gorilla or just Monkeying around?

When hearing of folks running statistical analysis with Excel, statisticians often have panicky images of ‘Home Haircutting, with Electric Shears, in the Wet’!

Mind you, Excel really is great for processing data, but analysing it in a more formal or even exploratory sense can be a trifle tricky.

On the upside, many work computers have Excel installed, it’s readily available for quite a low price even if one is not a student or an academic, and for the most part it is well designed and simple to use. It’s very easy to develop a spreadsheet that shows each individual calculation needed for a particular formula, such as the standard deviation. Such flexibility is wonderful for learning and teaching stats, because everyone can see the steps involved in actually getting an answer, more so than with the usual button-pressing, window-clicking or typing of ‘esoteric’ commands.
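To give a flavour of that ‘show every step’ approach outside a spreadsheet, here is a small Python sketch that lays out the sample standard deviation one column at a time, much as one might across a worksheet. The five scores are invented purely for illustration, and the n − 1 divisor is the usual sample version of the formula.

```python
import math
import statistics

# Hypothetical data, made up for this example only.
scores = [4, 8, 6, 5, 7]

n = len(scores)
mean = sum(scores) / n                           # column 1: the mean
deviations = [x - mean for x in scores]          # column 2: score minus mean
squared = [dev ** 2 for dev in deviations]       # column 3: squared deviations
variance = sum(squared) / (n - 1)                # sum of squares over n - 1
sd = math.sqrt(variance)                         # square root gives the sd

print("mean:", mean)
print("deviations:", deviations)
print("squared deviations:", squared)
print("variance:", variance)
print("sd:", round(sd, 4))

# Cross-check the hand-built answer against the library's sample sd.
print("matches statistics.stdev:", math.isclose(sd, statistics.stdev(scores)))
```

Each intermediate list plays the part of a spreadsheet column, so a learner can see exactly where the final number comes from.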

On the downside, pre-2010 versions of Excel had both practical accuracy issues (with functions and the add-in Analysis ToolPak) and validity issues (they employed non-standard methods for things like handling ties in ranked data). There are still no built-in nonparametric tests (e.g. Wilcoxon), and Excel is still a bit light on for confidence intervals, regression diagnostics, and for performing production, shop-floor type statistical analyses. More of an adjustable wrench than a set of spanners?

In sum, if used wisely, Excel is a useful adjunct to third party statistical add-ins or statistical packages, but please avoid pie charts, especially 3D ones, and watch out for those banana skins…

**Excel 2010 (& Gnumeric & OpenOffice) Accuracy / Validity**

http://www.tandfonline.com/doi/abs/10.1198/tas.2011.09076#.UvH4rp24a70

http://homepages.ulb.ac.be/~gmelard/rech/gmelard_csda23.pdf

**Some Excel Statistics Books**

Conrad Carlberg http://www.quepublishing.com/store/statistical-analysis-microsoft-excel-2013-9780789753113

Mark Gardener http://www.pelagicpublishing.com/statistics-for-ecologists-using-r-and-excel-data-collection-exploration-analysis-and-presentation.html

Neil Salkind http://www.sagepub.com/books/Book236672?siteId=sage-us&prodTypes=any&q=salkind&fs=1

**Some Statistical Add-Ins for Excel**

Analyse-It http://analyse-it.com

DataDesk/XL http://www.datadesk.com

RExcel (interfaces Excel to open source R) http://rcom.univie.ac.at/

XLStat http://www.xlstat.com/en/

**Some Open Source Spreadsheets**

Gnumeric https://projects.gnome.org/gnumeric/

OpenOffice http://www.openoffice.org.au/