Thursday, April 2, 2020

Lay People Are Unimpressed by the Effect Sizes Typically Reported in Psychology

McPhetres, Jonathon, and Gordon Pennycook. 2020. “Lay People Are Unimpressed by the Effect Sizes Typically Reported in Psychological Science.” PsyArXiv. April 2. doi:10.31234/osf.io/qu9hn

Abstract: It is recommended that researchers report effect sizes along with statistical results to aid in interpreting the magnitude of results. According to recent surveys of published research, psychologists typically find effect sizes ranging from r = .11 to r = .30. While these numbers may be informative for scientists, no research has examined how lay people perceive the range of effect sizes typically reported in psychological research. In two studies, we showed online participants (N = 1,204) graphs depicting a range of effect sizes in different formats. We demonstrate that lay people perceive psychological effects to be small, rather meaningless, and unconvincing. Even the largest effects we examined (corresponding to a Cohen’s d = .90), which are exceedingly uncommon in reality, were considered small-to-moderate in size by lay people. Science communicators and policymakers should consider this obstacle when attempting to communicate the effectiveness of research results.
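The abstract reports typical effects as correlations (r = .11 to .30) but its largest condition as Cohen's d = .90. A minimal sketch can relate the two metrics. It assumes the standard conversion for two equal-sized groups, d = 2r / sqrt(1 - r²), and its inverse, r = d / sqrt(d² + 4). The equal-group assumption is ours, and the paper may have used a different conversion.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a point-biserial correlation r to Cohen's d (assumes equal group sizes)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r * r)


def d_to_r(d: float) -> float:
    """Inverse conversion from Cohen's d back to r (assumes equal group sizes)."""
    return d / math.sqrt(d * d + 4.0)


if __name__ == "__main__":
    for r in (0.11, 0.30):
        print(f"r = {r:.2f}  ->  d = {r_to_d(r):.2f}")
    print(f"d = 0.90  ->  r = {d_to_r(0.90):.2f}")
```

Under this assumption, the typical range of r = .11 to .30 corresponds to roughly d = .22 to .63. The largest effect shown to participants, d = .90, corresponds to roughly r = .41, which is well above that range.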

Why Do so Few People Share Fake News? It Hurts Their Reputation

Altay, Sacha, Anne-Sophie Hacquin, and Hugo Mercier. 2019. “Why Do so Few People Share Fake News? It Hurts Their Reputation.” PsyArXiv. October 1. doi:10.31234/osf.io/82r6q

Abstract: Despite its potential attractiveness, fake news is shared by a very small minority of internet users. As past research suggests that a good reputation is more easily lost than gained, we hypothesized that the majority of people and media sources avoid sharing fake news stories in order to maintain a good reputation. In two pre-registered experiments (N = 3,264), we found that the increase in trust that a source (media outlet or individual) enjoys when sharing one real news story against a background of fake news is smaller than the drop in trust a source suffers when sharing one fake news story against a background of real news. This asymmetry holds even when the outlet only shares politically congruent news. We suggest that individuals and media outlets avoid sharing fake news because it would hurt their reputation, reducing the social or economic benefits associated with being seen as a good source of information.

How Many Jobs Can be Done at Home? About 34% of the labor force could work from home in the US

How Many Jobs Can be Done at Home? Jonathan Dingel, Brent Neiman. Becker Friedman Institute, University of Chicago, March 27, 2020. https://bfi.uchicago.edu/wp-content/uploads/BFI_White-Paper_Dingel_Neiman_3.2020.pdf

1 Introduction
Evaluating the economic impact of “social distancing” measures taken to arrest the spread of COVID-19 raises a number of fundamental questions about the modern economy: How many jobs can be performed at home? What share of total wages are paid to such jobs? How does the scope for working from home vary across cities or industries? To answer these questions, we classify the feasibility of working at home for all occupations and merge this classification with occupational employment counts for the United States. Our feasibility measure is based on responses to two Occupational Information Network (O*NET) surveys covering “work context” and “generalized work activities.” For example, if answers to those surveys reveal that an occupation requires daily “work outdoors” or that “operating vehicles, mechanized devices, or equipment” is very important to that occupation’s performance, we determine that the occupation cannot be performed from home. We merge this classification of O*NET occupations with information from the U.S. Bureau of Labor Statistics (BLS) on the prevalence of each occupation in the aggregate as well as in particular metropolitan statistical areas and 2-digit NAICS industries.
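The screening logic described above amounts to a rule in which any single disqualifying characteristic marks an occupation as not doable from home. Here is a minimal sketch of that rule. The field names, the assumed 1-to-5 importance scale with a "very important" cut-off of 4, and the sample occupations are all hypothetical illustrations. They are not the authors' actual variables or thresholds, and the real classification draws on many more survey items.

```python
from dataclasses import dataclass


@dataclass
class Occupation:
    # Hypothetical, simplified summary of one occupation's O*NET survey answers.
    title: str
    works_outdoors_daily: bool            # "work outdoors" reported as an everyday activity
    operating_vehicles_importance: float  # importance rating, assumed to be on a 1-5 scale


# Assumed cut-off for "very important" on the importance scale (hypothetical).
VERY_IMPORTANT = 4.0


def can_work_from_home(occ: Occupation) -> bool:
    """Return False as soon as any disqualifying characteristic is present."""
    if occ.works_outdoors_daily:
        return False
    if occ.operating_vehicles_importance >= VERY_IMPORTANT:
        return False
    return True


if __name__ == "__main__":
    # Toy examples only; these are not real O*NET ratings.
    sample = [
        Occupation("Hypothetical analyst", False, 1.5),
        Occupation("Hypothetical groundskeeper", True, 2.0),
        Occupation("Hypothetical truck driver", False, 4.8),
    ]
    for occ in sample:
        print(occ.title, can_work_from_home(occ))
```

The rule only rules occupations out, which fits the paper's description of identifying characteristics that clearly preclude working from home.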

2 Results

Our classification implies that 34 percent of U.S. jobs can plausibly be performed at home. We obtain our estimate by identifying job characteristics that clearly rule out the possibility of working entirely from home, neglecting many characteristics that would make working from home difficult.
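Once each occupation carries a yes/no flag, a headline figure such as "34 percent of U.S. jobs" is an employment-weighted share. The introduction's wage question is the same calculation weighted by total wages. The sketch below assumes a simple list of (flag, employment, mean wage) records with made-up numbers. It is not the authors' code or data.

```python
from typing import NamedTuple


class OccRecord(NamedTuple):
    teleworkable: bool  # result of the occupation-level classification
    employment: int     # number of workers in the occupation (hypothetical)
    mean_wage: float    # average annual wage (hypothetical)


def teleworkable_shares(records: list[OccRecord]) -> tuple[float, float]:
    """Return (share of jobs, share of total wages) in teleworkable occupations."""
    total_jobs = sum(r.employment for r in records)
    total_wages = sum(r.employment * r.mean_wage for r in records)
    home_jobs = sum(r.employment for r in records if r.teleworkable)
    home_wages = sum(r.employment * r.mean_wage for r in records if r.teleworkable)
    return home_jobs / total_jobs, home_wages / total_wages


if __name__ == "__main__":
    toy = [
        OccRecord(True, 300, 70_000.0),
        OccRecord(False, 500, 40_000.0),
        OccRecord(False, 200, 35_000.0),
    ]
    job_share, wage_share = teleworkable_shares(toy)
    print(f"jobs: {job_share:.0%}, wages: {wage_share:.0%}")
```

In this toy example, the teleworkable occupation pays more, so its wage share (about 44%) exceeds its job share (30%). Filtering the records to one metropolitan area or industry before calling the function gives the corresponding local shares.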

When individuals are exposed to their own image in a mirror, known to increase self-awareness, they may show increased accessibility of suicide-related words (the mirror effect); this preregistered replication fails to find it

Monéger, J., Chatard, A., & Selimbegović, L. (2020). The Mirror Effect: A Preregistered Replication. Collabra: Psychology, 6(1), 18. http://doi.org/10.1525/collabra.321

Abstract: When individuals are exposed to their own image in a mirror, known to increase self-awareness, they may show increased accessibility of suicide-related words (a phenomenon labeled “the mirror effect”; Selimbegović & Chatard, 2013). We attempted to replicate this effect in a pre-registered study (N = 150). As in the original study, self-awareness was manipulated using a mirror and recognition latencies for accurately detecting suicide-related words, negative words, and neutral words in a lexical decision task were assessed. We found no evidence of the mirror effect in pre-registered analyses. A multiverse analysis revealed a significant mirror effect only when excluding extreme observations. An equivalence TOST test did not yield evidence for or against the mirror effect. Overall, the results suggest that the original effect was a false positive or that the conditions for obtaining it (in terms of statistical power and/or outlier detection method) are not yet fully understood. Implications for the mirror effect and recommendations for pre-registered replications are discussed.

Keywords: Self-awareness, Suicide thought accessibility, Median Absolute Deviation

4. Discussion

In the present study, we attempted to replicate the mirror effect. We expected recognition latencies for suicide-related words to be shorter in the mirror exposure condition than in the control condition, when controlling for neutral-word latencies or negative-word latencies. These predictions were not supported when using the pre-registered outlier detection method in the confirmatory analyses. However, an equivalence test also failed to show that the mirror effect was statistically equivalent to a null effect (taking d = 0.2 as the smallest effect size of interest). Moreover, an exploratory multiverse analysis showed that effect sizes increased as the threshold for outlier exclusion decreased, with outliers detected by a robust method (i.e., the median absolute deviation; Leys et al., 2013). The mirror effect was significant only when the exclusion threshold was set at 2 or fewer median absolute deviations from the median, and only when negative-word RTs were used as a covariate. This partial replication raises several interesting questions about the status of the mirror effect, the influence of outliers in a sample, and, more generally, what allows one to conclude that a replication is successful.
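The equivalence test mentioned above follows the two one-sided tests (TOST) logic. The effect is declared equivalent to zero only if it is significantly above -0.2 and also significantly below +0.2. The sketch below is hedged. It uses a large-sample normal approximation to the standard error of d for two independent groups, and the input values are hypothetical. The authors' own test may be t-based and tailored to their covariate-adjusted design.

```python
import math
from statistics import NormalDist


def tost_d(d: float, n1: int, n2: int, bound: float = 0.2, alpha: float = 0.05):
    """Two one-sided tests of Cohen's d against equivalence bounds [-bound, +bound].

    Uses the common large-sample approximation to the standard error of d
    for two independent groups. Returns (TOST p-value, equivalent?).
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))
    z = NormalDist()
    # Test 1: H0 d <= -bound against H1 d > -bound.
    p_lower = 1 - z.cdf((d + bound) / se)
    # Test 2: H0 d >= +bound against H1 d < +bound.
    p_upper = z.cdf((d - bound) / se)
    # Equivalence requires rejecting both one-sided nulls.
    p_tost = max(p_lower, p_upper)
    return p_tost, p_tost < alpha


if __name__ == "__main__":
    # Hypothetical observed effect and group sizes, not the paper's data.
    p, equivalent = tost_d(d=0.05, n1=75, n2=75)
    print(f"TOST p = {p:.3f}, equivalent to zero: {equivalent}")
```

With about 75 participants per group, the standard error of d is roughly 0.16. Even an observed d close to zero therefore cannot be shown to lie inside ±0.2, which illustrates how an equivalence test with this bound can remain inconclusive at this sample size.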

4.1. Mixed results concerning the mirror effect

Several large-scale replication projects show that about half of published findings in psychology fail to replicate in direct, high-powered replications (Klein et al., 2018; Open Science Collaboration, 2015; Simons, Holcombe, & Spellman, 2014). These recent studies show that published effects are often difficult to replicate. Given the noise inherent in the behavioral sciences and the small effects often encountered in psychology, a replication attempt is not guaranteed to yield a statistically significant difference, even when the effect exists in the population. Indeed, one must also take into account, among other factors, the inevitable heterogeneity between a study and its replications (Kenny & Judd, 2019).
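A power calculation makes concrete the point that significance is not guaranteed even when an effect exists. This is a minimal sketch that assumes a two-sided test comparing two independent groups and uses the normal approximation to power. The group sizes and effect sizes are illustrative and are not taken from the replication.

```python
import math
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test when the true effect is d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = d * math.sqrt(n_per_group / 2)
    # Probability that the statistic lands beyond the critical value in either tail.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    for d in (0.2, 0.5):
        for n in (75, 400):
            print(f"d = {d}, n per group = {n}: power ~ {approx_power(d, n):.2f}")
```

Under these assumptions, a true d of 0.2 with 75 participants per group reaches significance only about a quarter of the time. Reaching roughly 80% power for that effect requires about 400 per group.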
The present replication findings suggest that the original finding might be a false positive. At the same time, equivalence testing does not warrant a conclusion that the effect is equivalent to 0. Also, multiverse analyses show that the effect was significant in some cases, when using a robust method and a severe criterion for detecting outliers. We believe that if the effect exists, the effect size is likely to be smaller than initially thought. In sum, the study did not provide evidence for a robust mirror effect, but neither did it provide evidence for a null effect (i.e., an effect too trivial to be studied, as defined by a Cohen’s d smaller than 0.2). Therefore, further studies using larger samples are needed to establish more reliable estimates of the effect size and a better understanding of the mechanisms involved in this effect, if it exists.

4.2. Detecting outliers in a sample

Outliers are atypical data points that differ markedly from the “bulk” of observations in a study and are therefore not representative of the population (Leys, Delacre, Mora, Lakens, & Ley, 2019). There are many ways to define an outlier in a specific data set, as many statistical criteria have been put forward in the literature. Studentized residuals and z-scores are among the most popular (Cousineau & Chartier, 2010). However, as Rousseeuw (1990) pointed out, these criteria can underperform because they rely on the sample standard deviation, which is itself highly sensitive to outliers (Wilcox, 2010): extreme values inflate the standard deviation and can thereby mask themselves. Robust estimators are therefore needed to detect outliers. Unlike the mean and standard deviation on which studentized and standardized residuals rely, the median is highly insensitive to outliers (Leys et al., 2013). The median absolute deviation (MAD), a robust estimator built on the median, is particularly relevant in this case, since the classic methods would have failed to detect influential data points (Leys et al., 2013; see also Wilcox, 2017).
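A small sketch can show the contrast between SD-based and median-based criteria. It flags values whose distance from the center exceeds a threshold, measured either in sample standard deviations from the mean or in scaled MADs from the median. The MAD is multiplied by the usual consistency constant 1.4826 so that it estimates the standard deviation for normal data. The response times and the thresholds of 3 and 2.5 are hypothetical choices for illustration.

```python
from statistics import mean, median, stdev


def z_outliers(xs, threshold=3.0):
    """Flag values more than `threshold` sample SDs from the mean."""
    m, s = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) / s > threshold]


def mad_outliers(xs, threshold=2.5):
    """Flag values more than `threshold` scaled MADs from the median."""
    med = median(xs)
    # 1.4826 makes the MAD a consistent estimator of the SD for normal data.
    mad = 1.4826 * median([abs(x - med) for x in xs])
    return [x for x in xs if abs(x - med) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical response times (ms); the two extreme values inflate the SD.
    rts = [512, 530, 545, 498, 520, 541, 507, 533, 525, 515, 1450, 1600]
    print("SD-based:", z_outliers(rts))
    print("MAD-based:", mad_outliers(rts))
```

Here the two extreme values inflate the sample SD to about 390 ms, so neither lies more than 3 SDs from the mean and the SD-based rule flags nothing. The MAD is computed around the median and is barely affected by them, so it flags both 1450 and 1600.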
How we manage outliers in a sample is a fundamental aspect of data analysis. However, to date, there is no consensus about which method is most appropriate and what threshold should be used for detecting and excluding outliers (Leys et al., 2013). To optimize the quality of the replication, we pre-registered the hypothesis, method, and statistical analysis. What we failed to anticipate, however, was that excluding outliers on the basis of studentized residuals would not be sufficient to discard all influential data points. Hence, pre-registering a single outlier detection technique might be insufficient. Along these lines, Leys et al. (2019) recently provided specific recommendations concerning pre-registration and outlier detection, one of which is to expand the a priori reasoning in the registration in order to manage unpredicted outliers. In our view, this amounts to registering multiple ways to handle outliers. For instance, one could register a decision tree specifying how outliers will be handled as a function of the distribution. Similarly, Nosek, Ebersole, DeHaven, and Mellor (2017) mention the possibility of defining a sequence of tests and choosing a parametric or non-parametric approach according to the outcome of tests of the normality assumption. In a similar vein, standard operating procedures (SOPs) are more general than decision trees and are shared within a given field of research to standardize data handling (e.g., Lin & Green, 2016). Developing such standard procedures for outlier detection and exclusion could provide a useful tool for pre-registration.
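A pre-registered decision tree of the kind described above could be written down as an explicit function, so that the branch taken is fixed before the data are seen. The sketch below is entirely hypothetical. The skewness cut-off, the choice between a z-score rule and a MAD rule, and the thresholds are placeholders that a researcher would justify in the registration. They are not recommendations from the authors or from Leys et al. (2019).

```python
from statistics import mean, median, stdev

# Hypothetical parameters that would be fixed in the registration.
SKEW_CUTOFF = 1.0
Z_THRESHOLD = 3.0
MAD_THRESHOLD = 2.5


def skewness(xs):
    """Moment-based sample skewness (g1)."""
    m, n = mean(xs), len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


def preregistered_exclusion(xs):
    """Apply the fixed decision tree; return (rule used, retained values)."""
    if abs(skewness(xs)) <= SKEW_CUTOFF:
        # Roughly symmetric data: use the classic z-score rule.
        m, s = mean(xs), stdev(xs)
        return "z-score", [x for x in xs if abs(x - m) / s <= Z_THRESHOLD]
    # Skewed data: switch to the robust median/MAD rule.
    med = median(xs)
    mad = 1.4826 * median([abs(x - med) for x in xs])
    if mad == 0:
        raise ValueError("MAD is zero; the registration must specify a fallback")
    return "MAD", [x for x in xs if abs(x - med) / mad <= MAD_THRESHOLD]


if __name__ == "__main__":
    rts = [512, 530, 545, 498, 520, 541, 507, 533, 525, 515, 1450, 1600]
    print(preregistered_exclusion(rts))
```

Writing the tree as code makes every branch and threshold inspectable in the registration. Any deviation from it at analysis time would then be visible as a departure from the plan.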
Developing common, consensual procedures can thus be a solution for dealing with unpredictable aspects of data, such as the presence of outliers. This would be a controlled, transparent, and probably optimal way of handling unpredictability, while removing researchers’ degrees of freedom in post-hoc decisions about the method used to detect outliers (see Wicherts et al., 2016). In statistics and methodology, as in many fields, no perfect plan exists, so it is difficult to offer a single solution that fits all studies. In our view, there is a need for a more general data-handling plan that could fit a large number of studies. Among the issues such a plan would need to address are how outlier detection and exclusion criteria are defined (intraindividually or interindividually), which specific (robust) criterion is used, and what the desired distribution is.