Thursday, April 2, 2020

When individuals are exposed to their own image in a mirror, a manipulation known to increase self-awareness, they may show increased accessibility of suicide-related words (the mirror effect); this pre-registered replication did not find the effect

Monéger, J., Chatard, A., & Selimbegović, L. (2020). The Mirror Effect: A Preregistered Replication. Collabra: Psychology, 6(1), 18. http://doi.org/10.1525/collabra.321

Abstract: When individuals are exposed to their own image in a mirror, known to increase self-awareness, they may show increased accessibility of suicide-related words (a phenomenon labeled “the mirror effect”; Selimbegović & Chatard, 2013). We attempted to replicate this effect in a pre-registered study (N = 150). As in the original study, self-awareness was manipulated using a mirror and recognition latencies for accurately detecting suicide-related words, negative words, and neutral words in a lexical decision task were assessed. We found no evidence of the mirror effect in pre-registered analyses. A multiverse analysis revealed a significant mirror effect only when excluding extreme observations. An equivalence TOST test did not yield evidence for or against the mirror effect. Overall, the results suggest that the original effect was a false positive or that the conditions for obtaining it (in terms of statistical power and/or outlier detection method) are not yet fully understood. Implications for the mirror effect and recommendations for pre-registered replications are discussed.

Keywords: Self-awareness, Suicide thought accessibility, Median Absolute Deviation

4. Discussion

In the present study, we attempted to replicate the mirror effect. We expected recognition latencies for suicide-related words to be shorter in the mirror exposure condition than in the control condition, when controlling for latencies to neutral words or to negative words. These predictions were not supported when using the pre-registered outlier detection method in the confirmatory analyses. However, a test assessing the equivalence of the observed effect to a null effect also failed to indicate that the mirror effect was equivalent to a null effect (considering d = 0.2 as the smallest effect size of interest). Moreover, an exploratory multiverse analysis showed effect sizes that increased as the outlier-exclusion threshold became stricter, using a robust outlier detection method (i.e., the median absolute deviation; Leys et al., 2013): the mirror effect was significant after excluding observations deviating by more than 2 (or fewer) median absolute deviations from the median, but only when using negative-word RTs as a covariate. This partial replication raises several interesting questions about the status of the mirror effect, the influence of outliers in a sample, and, more generally, about what allows for concluding that a replication is successful.
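To make the multiverse procedure concrete, here is a minimal sketch in Python of how the condition effect could be re-estimated under increasingly strict MAD-based exclusion thresholds. The column names (`condition`, `rt_suicide`, `rt_negative`) and the exact model are illustrative assumptions, not the authors' analysis scripts.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def within_mad(x, threshold):
    """True for observations within `threshold` MADs of the median."""
    x = np.asarray(x, float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # 1.4826 makes MAD consistent with the SD under normality
    return np.abs(x - med) <= threshold * mad

def multiverse(df, thresholds=(3.0, 2.5, 2.0, 1.5)):
    """Re-fit the ANCOVA-style model after MAD-based exclusions at several thresholds."""
    rows = []
    for thr in thresholds:
        keep = within_mad(df["rt_suicide"], thr) & within_mad(df["rt_negative"], thr)
        sub = df.loc[keep]
        # Suicide-word RTs regressed on condition, controlling for negative-word RTs
        fit = smf.ols("rt_suicide ~ condition + rt_negative", data=sub).fit()
        term = [t for t in fit.params.index if t.startswith("condition")][0]
        rows.append({"threshold": thr, "n": len(sub),
                     "b_condition": fit.params[term],
                     "p_condition": fit.pvalues[term]})
    return pd.DataFrame(rows)
```

Reading the resulting table across thresholds shows at a glance whether the effect only emerges under the strictest exclusion rules, which is the pattern reported here.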

4.1. Mixed results concerning the mirror effect

Several large-scale replication projects show that about half of published findings in psychology fail to replicate in direct, high-powered replications (Klein et al., 2018; Open Science Collaboration, 2015; Simons, Holcombe, & Spellman, 2014). These studies point out that it is often difficult to replicate published effects. Between the noise inherent to the behavioral sciences and the small effect sizes that we often encounter in psychology, observing statistically significant differences is not guaranteed in replication attempts, even when the effect exists in the population. Indeed, one must take into account the inevitable heterogeneity between a study and its replications (Kenny & Judd, 2019), among other factors.
The present replication findings suggest that the original finding might be a false positive. At the same time, equivalence testing does not warrant the conclusion that the effect is equivalent to zero. Also, the multiverse analyses show that the effect was significant in some specifications, namely when using a robust method and a strict criterion for detecting outliers. We believe that if the effect exists, it is likely to be smaller than initially thought. In sum, the study did not provide evidence for a robust mirror effect, but neither did it provide evidence for a null effect (i.e., an effect too trivial to be studied, defined here as a Cohen’s d smaller than 0.2). Therefore, further studies using larger samples are needed to establish more reliable estimates of the effect size and a better understanding of the mechanisms involved in this effect, if it exists.
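For readers less familiar with equivalence testing, the following Python sketch implements a generic two one-sided tests (TOST) procedure for two independent groups, converting the smallest effect size of interest (d = 0.2) into raw equivalence bounds via the pooled standard deviation. It illustrates the general logic only; it is not the exact test reported in the paper.

```python
import numpy as np
from scipy import stats

def tost_ind(x, y, d_bound=0.2, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two independent means.

    Equivalence bounds are +/- d_bound in Cohen's d units, converted to raw
    units with the pooled standard deviation.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    diff = x.mean() - y.mean()
    sp = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    se = sp * np.sqrt(1 / nx + 1 / ny)
    df = nx + ny - 2
    delta = d_bound * sp                  # raw equivalence bound
    t_lower = (diff + delta) / se         # tests H0: true difference <= -delta
    t_upper = (diff - delta) / se         # tests H0: true difference >= +delta
    p_lower = stats.t.sf(t_lower, df)
    p_upper = stats.t.cdf(t_upper, df)
    p_tost = max(p_lower, p_upper)
    return {"p_tost": p_tost, "equivalent": p_tost < alpha}

# e.g. tost_ind(rt_mirror, rt_control): a non-significant TOST means the data
# are also inconclusive about equivalence, as in the present study.
```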

4.2. Detecting outliers in a sample

Outliers are atypical data points that are abnormally different from the “bulk” of observations in a study, and therefore non-representative of the population (Leys, Delacre, Mora, Lakens, & Ley, 2019). There are many ways to define an outlier in a specific data set, as many statistical criteria have been put forward in the literature. Studentized residuals and z-scores are among the most popular ways to detect outliers (Cousineau & Chartier, 2010). However, as underlined by Rousseeuw (1990), these criteria can underperform because they are based on the sample standard deviation, which is itself a parameter highly sensitive to outliers (Wilcox, 2010). Robust estimators are hence needed to detect outliers. Unlike the quantities underlying studentized and standardized residuals, the median is highly insensitive to outliers (Leys et al., 2013). As one robust estimator, the median absolute deviation (MAD) is particularly relevant in this case, since the classic methods failed here to detect influential data points (Leys et al., 2013; see also Wilcox, 2017).
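To illustrate why the two criteria can disagree, the sketch below (Python; the reaction-time values are made up for demonstration) flags outliers with a classical z-score rule and with the MAD rule on the same vector: a few very slow responses inflate the standard deviation enough to mask themselves from the z-score criterion, while the MAD criterion still catches them.

```python
import numpy as np

def z_outliers(x, threshold=3.0):
    """Classical criterion: the mean and SD are themselves pulled by outliers."""
    x = np.asarray(x, float)
    z = (x - x.mean()) / x.std(ddof=1)
    return np.abs(z) > threshold

def mad_outliers(x, threshold=3.0):
    """Robust criterion based on the median absolute deviation (Leys et al., 2013)."""
    x = np.asarray(x, float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # scaled to match the SD under normality
    return np.abs(x - med) > threshold * mad

# Example: two very slow responses inflate the SD, so the z-score rule misses them
rts = np.array([512, 540, 498, 605, 560, 530, 525, 2400, 2600, 515])
print(z_outliers(rts).sum())    # 0 -- the inflated SD masks the extreme RTs
print(mad_outliers(rts).sum())  # 2 -- the MAD rule flags them
```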
How we manage the presence of outliers in a sample is a fundamental aspect of data analysis. However, to date, there is no consensus about which method is the most appropriate and what threshold should be used for detecting and excluding outliers (Leys et al., 2013). In an attempt to optimize the quality of the replication, the hypothesis, method, and statistical analysis were pre-registered. However, what we failed to predict was that excluding outliers on the basis of studentized residuals would not be sufficient to discard all influential data points. Hence, pre-registering a single outlier detection technique might be insufficient. On this point, Leys et al. (2019) recently provided specific recommendations concerning pre-registering and detecting outliers, one of which is to expand a priori reasoning in the registration in order to manage unpredicted outliers. In our view, this amounts to registering multiple ways of handling outliers. For instance, one could register a decision tree specifying how outliers will be handled as a function of the distribution. Nosek, Ebersole, DeHaven, and Mellor (2017), for example, mention the possibility of defining a sequence of tests and determining whether a parametric or non-parametric approach will be used according to the outcome of normality assumption tests. In a similar vein, standard operating procedures (SOPs) are more general than decision trees: they are procedures shared within a field of research in order to ground the standardization of data handling (e.g., Lin & Green, 2016). The development of such standard procedures applied to outlier detection and exclusion could provide a useful tool for pre-registration.
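In the spirit of the decision tree mentioned by Nosek et al. (2017), one could pre-register an explicit branching rule before seeing the data. The sketch below (Python) is one hypothetical example of such a rule, choosing between a parametric and a non-parametric comparison based on a normality check; it is not a rule endorsed by the authors.

```python
from scipy import stats

def preregistered_comparison(x, y, alpha_normality=0.05):
    """Illustrative pre-registered decision tree: if both groups pass a
    Shapiro-Wilk normality check, run Welch's t-test; otherwise fall back
    on the Mann-Whitney U test."""
    normal = (stats.shapiro(x).pvalue > alpha_normality and
              stats.shapiro(y).pvalue > alpha_normality)
    if normal:
        name, result = "welch_t", stats.ttest_ind(x, y, equal_var=False)
    else:
        name, result = "mann_whitney", stats.mannwhitneyu(x, y, alternative="two-sided")
    return name, result.statistic, result.pvalue
```

The value of such a rule lies less in the specific tests chosen than in the fact that the branching is committed to in advance, removing that decision from the analyst's discretion once the data are in.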
Developing common, consensual procedures can thus be a solution for dealing with the unpredictable aspects of data, such as the presence of outliers. This would be a controlled, transparent, and probably optimal way of handling unpredictability, while removing researchers’ degrees of freedom in post-hoc decisions about the method used to detect outliers (see Wicherts et al., 2016). In statistics and methodology, as in many fields, a perfect plan does not exist, so it is difficult to offer a perfect solution that fits all studies. In our view, there is a need to define a more general plan for handling data, one that could fit a large number of studies. Among the issues such a plan would need to address are, for instance, whether the outlier detection/exclusion criterion is defined intraindividually or interindividually, which specific (robust) criterion is used, and what the desired distribution is.
