Tuesday, 19 November 2013

The problem with p values: how significant are they, really? (via Geoff Cumming)

By Geoff Cumming, La Trobe University
For researchers there’s a lot that turns on the p value, the number used to determine whether a result is statistically significant. The current consensus is that if p is less than .05, a study has reached the holy grail of being statistically significant, and therefore likely to be published. Over .05 and it’s usually back to the drawing board.
But today, Texas A&M University professor Valen Johnson, writing in the prestigious journal Proceedings of the National Academy of Sciences, argues that p less than .05 is far too weak a standard.
Using .05 is, he contends, a key reason why false claims are published and many published results fail to replicate. He advocates requiring .005 or even .001 as the criterion for statistical significance.

What is a p value anyway?

The p value is at the heart of the most common approach to data analysis – null hypothesis significance testing (NHST). Think of NHST as a waltz with three steps:
  1. State a null hypothesis: that is, there is no effect.
  2. Calculate the p value, which is the probability of getting results at least as extreme as ours – if the null hypothesis is true.
  3. If p is sufficiently small, reject the null hypothesis and sound the trumpets: our effect is not zero, it’s statistically significant!
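The three steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the article: a z test with a known standard deviation of 1, a null hypothesis of zero mean, and made-up measurements. The function name `two_sided_p` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_p(sample, null_mean=0.0, sigma=1.0):
    """p value for a z test of H0: population mean == null_mean (sigma assumed known)."""
    n = len(sample)
    z = (mean(sample) - null_mean) / (sigma / sqrt(n))
    # Probability, under H0, of a z at least this far from zero in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical measurements, invented for illustration only.
sample = [0.9, -0.2, 1.4, 0.3, 0.8, 1.1, -0.5, 0.7, 1.2, 0.4]

p = two_sided_p(sample)          # steps 1 and 2: state H0, compute p
print(f"p = {p:.3f}")
# Step 3: compare p with the conventional .05 criterion.
print("statistically significant" if p < 0.05 else "not statistically significant")
```

With these invented numbers p lands just above .05, so the same data would be "a finding" or "back to the drawing board" depending on a hair's breadth.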
British statistician and geneticist Sir Ronald Fisher popularised the p value in 1925. He adopted .05 as a reference point for rejecting a null hypothesis. For him it was not a sharp cutoff: a thoughtful researcher should consider the context, and other results as well.
NHST has, however, become deeply entrenched in medicine and numerous other disciplines. The precise value .05 has become a bar to wriggle under to achieve publication in top journals. Generations of students have been inducted into the rituals of .05 meaning “significant”, and .01 “highly significant”.

Sounds good. What’s the problem?

The trouble is there are numerous deep flaws in NHST.
There’s evidence that students, researchers and even many teachers of statistics don’t understand NHST properly. More worryingly, there’s evidence it’s widely misused, even in top journals.
Most researchers don’t appreciate that p is highly unreliable. Repeat your experiment and you’ll get a p value that could be extremely different. Even more surprisingly, p is highly unreliable even for very large samples.
NHST may be a waltz, but the dance of p is highly frenetic. Run the same experiment again and again and p jumps around from one replication to the next, which is why we simply shouldn’t trust any single p value.
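The dance is easy to reproduce with a small simulation. The sketch below repeats the same two-group experiment 25 times and prints the p value from each run. The settings are assumptions chosen for illustration, not taken from the article: 32 participants per group, a true effect of half a standard deviation, a known standard deviation of 1, and a z test. The function name `replicate_p` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def replicate_p(rng, n=32, true_effect=0.5, sigma=1.0):
    """Run one two-group experiment and return its two-sided p value (z test)."""
    control = [rng.gauss(0.0, sigma) for _ in range(n)]
    treated = [rng.gauss(true_effect, sigma) for _ in range(n)]
    se = sigma * sqrt(2 / n)          # standard error of the difference in means
    z = (mean(treated) - mean(control)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(2013)             # fixed seed so the run is repeatable
p_values = [replicate_p(rng) for _ in range(25)]

for i, p in enumerate(p_values, start=1):
    print(f"replication {i:2d}: p = {p:.3f}")
print(f"smallest p = {min(p_values):.4f}, largest p = {max(p_values):.3f}")
```

Every replication samples from exactly the same populations, so any spread in the printed p values comes from sampling variability alone.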

Despite all those problems, NHST persists, perhaps because we yearn for certainty. Declaring a result “significant” suggests certainty, even though our results almost always contain considerable uncertainty.

Should we require stronger evidence?

Johnson makes a cogent argument that .05 provides only weak evidence against the null hypothesis: perhaps only odds of 3 or 4 to 1 against reasonable alternative hypotheses.
He suggests we should require more persuasive odds, say 50 to 1 or even 200 to 1.
To do this, we need to adopt .005 or .001 as our p value criterion for statistical significance.
He recognises there’s a price to pay for demanding stronger evidence. In typical cases, we’d need to roughly double our sample sizes to still have a reasonable chance of finding true effects. Using larger samples would indeed be highly desirable, but sometimes that’s simply not possible. And are research grants about to double?
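A rough calculation shows where the extra cost comes from. The sketch below uses the standard normal-approximation formula for a two-sided comparison of two group means; the 80% power target and the effect size of half a standard deviation are assumptions made for illustration, and `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power=0.80, effect_size=0.5):
    """Approximate n per group for a two-sided, two-group z test.

    effect_size is the true difference in standard-deviation units.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)        # critical value for the significance criterion
    z_power = z(power)                # quantile giving the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

baseline = n_per_group(0.05)
for alpha in (0.05, 0.005, 0.001):
    n = n_per_group(alpha)
    print(f"alpha = {alpha}: about {n} per group ({n / baseline:.1f}x the .05 requirement)")
```

Under these assumptions the required sample grows by somewhat less than double at .005 and a little more than double at .001. The ratio does not depend on the effect size chosen, but it does shift with the power target.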
Johnson is correct that .05 corresponds to weak evidence, and .005 or .001 to evidence that’s usefully stronger. Adopting his stricter criterion would, however, mean that the majority of all published research analysed using NHST would fail the new test, and suddenly be statistically non-significant!


More fundamentally, merely shifting the criterion does not overcome the unreliability of p, or most of the other deep flaws of NHST. The core problem is that NHST panders to our yearning for certainty by presenting the world as black or white — an effect is statistically significant or not; it exists or it doesn’t.
In fact our world is many shades of grey — I won’t pretend to know how many. We need something more nuanced than NHST, and fortunately there are good alternatives.

A better way: estimation and meta-analysis

Bayesian techniques are highly promising and increasingly used. The most readily available alternative, already in wide use, is estimation based on confidence intervals.
A confidence interval gives us the best estimate of the true effect, and also indicates the extent of uncertainty in our results. Confidence intervals are also what we need to use meta-analysis, which allows us to integrate results from a number of experiments that investigate the same issue.
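As a minimal sketch of how the two fit together, the code below reports a 95% confidence interval for each of three studies and then pools them with a fixed-effect, inverse-variance meta-analysis. That is one simple pooling method among several (random-effects models are a common alternative), chosen here for brevity. The effect estimates and standard errors are invented, and the helper names `ci95` and `fixed_effect_meta` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)     # about 1.96

def ci95(estimate, se):
    """95% confidence interval for an estimate with standard error se."""
    return (estimate - Z95 * se, estimate + Z95 * se)

def fixed_effect_meta(studies):
    """Inverse-variance (fixed-effect) pooling of (estimate, se) pairs."""
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect estimates and standard errors, invented for illustration.
studies = [(0.60, 0.30), (0.25, 0.25), (0.45, 0.20)]

for est, se in studies:
    lo, hi = ci95(est, se)
    print(f"study:  {est:.2f} [{lo:.2f}, {hi:.2f}]")

pooled, pooled_se = fixed_effect_meta(studies)
lo, hi = ci95(pooled, pooled_se)
print(f"pooled: {pooled:.2f} [{lo:.2f}, {hi:.2f}]")
```

The pooled interval is narrower than any single study's interval, which is the sense in which integrating evidence reduces uncertainty rather than merely counting significant results.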
We often need to make clear decisions (whether or not to license a new drug, for example), but NHST provides a poor basis for such decisions. It’s far better to let the integration of all available evidence guide decisions, and estimation and meta-analysis provide that.
Merely shifting the NHST goal posts simply won’t do.
Further reading: Give p a chance: significance testing is misunderstood
Geoff Cumming has received funding from the Australian Research Council.
This article was originally published at The Conversation. Read the original article.
