Friday, November 3, 2017

The scientific practices of experimental psychologists have improved dramatically

Psychology's Renaissance. Leif D. Nelson, Joseph P. Simmons, and Uri Simonsohn. Annual Review of Psychology, forthcoming. https://doi.org/10.1146/annurev-psych-122216-011836

Abstract: In 2010–2012, a few largely coincidental events led experimental psychologists to realize that their approach to collecting, analyzing, and reporting data made it too easy to publish false-positive findings. This sparked a period of methodological reflection that we review here and call Psychology’s Renaissance. We begin by describing how psychologists’ concerns with publication bias shifted from worrying about file-drawered studies to worrying about p-hacked analyses. We then review the methodological changes that psychologists have proposed and, in some cases, embraced. In describing how the renaissance has unfolded, we attempt to describe different points of view fairly but not neutrally, so as to identify the most promising paths forward. In so doing, we champion disclosure and preregistration, express skepticism about most statistical solutions to publication bias, take positions on the analysis and interpretation of replication failures, and contend that meta-analytical thinking increases the prevalence of false positives. Our general thesis is that the scientific practices of experimental psychologists have improved dramatically.

Keywords: p-hacking, publication bias, renaissance, methodology, false positives, preregistration

---
Psychologists have long been aware of two seemingly contradictory problems with the published literature. On the one hand, the overwhelming majority of published findings are statistically significant (Fanelli 2012, Greenwald 1975, Sterling 1959). On the other hand, the overwhelming majority of published studies are underpowered and, thus, theoretically unlikely to obtain results that are statistically significant (Chase & Chase 1976, Cohen 1962, Sedlmeier & Gigerenzer 1989). Given typical sample sizes, most studies should have failed, yet the published record suggested almost uniform success.
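
To see how stark the mismatch is, consider a quick simulation (ours, not the paper's; the effect size and sample size are illustrative assumptions). With a true effect of d = 0.4 and 20 participants per cell, only about a quarter of studies should come out statistically significant:

```python
# Illustrative power simulation -- our numbers, not the paper's.
# Assumptions: true effect d = 0.4, n = 20 per cell, two-tailed t-test at alpha = .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d, n, alpha, n_sims = 0.4, 20, 0.05, 10_000

significant = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)   # the effect is real in every simulated study
    result = stats.ttest_ind(treatment, control)
    if result.pvalue < alpha:
        significant += 1

print(f"Studies reaching p < .05 despite a real effect: {significant / n_sims:.0%}")
# Roughly one in four -- far from the near-uniform success seen in journals.
```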

There is an old, popular, and simple explanation for this paradox. Experiments that work are sent to a journal, whereas experiments that fail are sent to the file drawer (Rosenthal 1979). We believe that this “file-drawer explanation” is incorrect. Most failed studies are not missing. They are published in our journals, masquerading as successes.

The file-drawer explanation becomes transparently implausible once its assumptions are made explicit. It assumes that researchers conduct a study and perform one (predetermined) statistical analysis. If the analysis is significant, then they publish it. If it is not significant, then they give up and start over. This is not a realistic depiction of researcher behavior. Researchers would not so quickly give up on their chances for publication, nor would they abandon the beliefs that led them to run the study, just because the first analysis they ran was not statistically significant. They would instead explore the data further, examining, for example, whether outliers were interfering with the effect, whether the effect was significant within a subset of participants or trials, or whether it emerged when the dependent variable was coded differently. Pre-2011 researchers did occasionally file-drawer a study, but not when the study failed; rather, they did so when p-hacking failed. Thus, whereas our file drawers are sprinkled with failed studies that we did not publish, they are overflowing with failed analyses of the studies that we did publish.
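
The cost of that flexibility is easy to see in a simulation in the spirit of the false-positive-findings argument (again our sketch, with hypothetical analysis choices, not the authors' code). Even when there is no effect at all, merely trying a few of the alternative analyses listed above — dropping outliers, looking within subgroups — and reporting whichever one reaches p < .05 inflates the false-positive rate well beyond the nominal 5%:

```python
# Illustrative p-hacking simulation -- our sketch, not the authors' code.
# There is NO true effect; we "find" one by keeping whichever analysis reaches p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, n_sims = 20, 0.05, 10_000

def p_value(a, b):
    return stats.ttest_ind(a, b).pvalue

false_positives = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n)        # null is true: both groups are identical
    treatment = rng.normal(0.0, 1.0, n)
    gender_c = rng.integers(0, 2, n)         # arbitrary covariate used only for subsetting
    gender_t = rng.integers(0, 2, n)

    candidate_ps = [
        p_value(treatment, control),                                               # planned analysis
        p_value(treatment[np.abs(treatment) < 2], control[np.abs(control) < 2]),   # drop "outliers"
        p_value(treatment[gender_t == 0], control[gender_c == 0]),                 # one subgroup only
        p_value(treatment[gender_t == 1], control[gender_c == 1]),                 # the other subgroup
    ]
    if np.nanmin(candidate_ps) < alpha:      # report whichever analysis "worked"
        false_positives += 1

print(f"False-positive rate with flexible analysis: {false_positives / n_sims:.1%}")
# Well above the nominal 5%, even with just four correlated analyses to choose from.
```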
