Thursday, December 26, 2019

We find that meta-analytic effect sizes are significantly different from replication effect sizes; the differences are systematic & meta-analytic effect sizes are almost three times as large as replication effect sizes

Comparing meta-analyses and preregistered multiple-laboratory replication projects. Amanda Kvarven, Eirik Strømland & Magnus Johannesson. Nature Human Behaviour, December 23 2019.

Abstract: Many researchers rely on meta-analysis to summarize research evidence. However, there is a concern that publication bias and selective reporting may lead to biased meta-analytic effect sizes. We compare the results of meta-analyses to large-scale preregistered replications in psychology carried out at multiple laboratories. The multiple-laboratory replications provide precisely estimated effect sizes that do not suffer from publication bias or selective reporting. We searched the literature and identified 15 meta-analyses on the same topics as multiple-laboratory replications. We find that meta-analytic effect sizes are significantly different from replication effect sizes for 12 out of the 15 meta-replication pairs. These differences are systematic and, on average, meta-analytic effect sizes are almost three times as large as replication effect sizes. We also implement three methods of correcting meta-analysis for bias, but these methods do not substantively improve the meta-analytic results.

From the open version (17 studies then, 15 studies in the final version):


To summarize our findings, we find that there is a significant difference between the meta-analytic effect size and the replication effect size for 12 of the 17 studies (70.6%), and
suggestive evidence for a difference in two additional studies. These differences are systematic
– the meta-analytic effect size is larger than the replication effect for all these studies – and on
average for all the 17 studies the estimated effect sizes are about 3 times as large in the meta-analyses. Interestingly, the relative difference in estimated effect sizes is of at least the same
magnitude as that observed between replications and original studies in the RP:P and other
similar systematic replication projects5,6,10. Publication bias and selective reporting in original
studies have been suggested as possible reasons for the low reproducibility in RP:P and other
replication projects, and our results suggest that these biases are not eliminated by the use of meta-analysis.
To test further whether meta-analyses reduce the influence of publication bias or
selective reporting, we compare the average unweighted effect size of the original studies to the
meta-analyses. We were able to obtain effect sizes of the original studies converted to Cohen's
d for all original studies except one, where the standard deviation was unavailable.41 We were
additionally able to compute a valid standard error for 14 out of 17 original studies. The average
unweighted effect size of these 14 original studies is 0.561, which is about 42% higher than the
average unweighted effect size of 0.395 of the same 14 studies in the meta-analyses. These
point estimates are consistent with meta-analyses reducing the effect sizes estimated in original
studies somewhat, and in formal meta-analytic models the estimated difference between the
original effect and the summary effect in the meta-analysis varies between 0.089 and 0.166.
These estimated differences are not statistically significant but suggestive of a difference in all
three cases using our criterion for statistical significance (see Supplementary Table 3 for
details). Further work on larger samples is needed to test more conclusively whether meta-analytic
effect sizes differ from original effect sizes.
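The "about 42% higher" figure follows directly from the two averages quoted above (0.561 vs. 0.395); a minimal arithmetic check:

```python
# Quick check of the ratio reported in the text: average unweighted
# original-study effect size (0.561) vs. the meta-analytic average (0.395).
original_avg = 0.561
meta_avg = 0.395

relative_increase = (original_avg - meta_avg) / meta_avg
print(f"Original studies are {relative_increase:.0%} larger")  # → 42%
```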
In a previous related study in medicine, 12 large randomized, controlled trials published
in four leading medical journals were compared to 19 meta-analyses published previously on
the same topics.24 They compared several clinical outcomes between the studies and found a
significant difference between the meta-analyses and the large clinical trials for 12% of the
comparisons. They did not provide any results for the pooled overall difference between metaanalyses and large clinical trials, but from graphically inspecting the results there does not
appear to be a sizeable systematic difference. Those previous results for medicine are thus
different from our findings. This could reflect a genuine difference between psychology and
medicine, but it could also reflect that even large clinical trials in medicine are subject to
selective reporting or publication bias or that large clinical trials with null results are published
in less prestigious journals.
Although we believe the most plausible interpretation of our results is that meta-analyses overestimate effect sizes on average in our sample of studies, there are other possible
explanations. In testing a specific scientific hypothesis in an experiment there can be
heterogeneity in the true effect size due to several sources. The true effect size can vary between
different populations (sample heterogeneity) and the true effect size can vary between different
experimental designs to test the hypothesis (design heterogeneity). If the exact statistical test
used or the inclusion/exclusion criteria of observations included in the analysis differ, this will
yield a third source of heterogeneity in estimated effect sizes (test heterogeneity). In the
multiple lab replications included in our study the design and statistical tests used are held
constant across the labs, whereas the samples vary across labs. The effect sizes across labs will
therefore vary due to sample heterogeneity, but not due to design or test heterogeneity. In the
meta-analyses the effect sizes can vary across the included studies due to sample, design and
test heterogeneity. Sample, design or test heterogeneity could potentially explain our results.
For sample heterogeneity to explain our results, the replications need to have been
conducted in samples with on average lower true effect sizes than the samples included in the
studies in the meta-analyses. We find this explanation for our results implausible. The Many
Labs studies estimate the sample heterogeneity and only find small or moderate heterogeneity
in effect sizes7-9. In the recent Many Labs 2 study the average heterogeneity, measured as the
standard deviation in the true effect size across labs (Tau), was 0.048. This can be compared to
the measured difference in meta-analytic and replication effect sizes in our study of 0.232-0.28
for the three methods.
For design or test heterogeneity to explain our results it must be the case that replication
studies select experimental designs or tests producing lower true effect sizes than the average
design and test included to test the same hypotheses in meta-analyses. For this to explain our
results the design and test heterogeneity in meta-analyses would have to be substantial and the
“replicator selection” of weak designs needs to be strong. This potential explanation of our
results would imply a high correlation between design and test heterogeneity in the meta-analysis and the observed difference in the meta-analytic and replication effect sizes, as a larger
design and test heterogeneity increases the scope for “replicator selection”. To further shed
some light on this possibility we were able to obtain information about the standard deviation
in true effect sizes across studies (Tau) for ten of the meta-analyses in our sample; Tau was
reported directly for two of these meta-analyses and sufficient information was provided in the
other eight meta-analyses so that we could estimate Tau. The mean Tau was 0.30 in these ten
meta-analyses with a range from 0.00 to 0.735. This is likely to be an upper bound on the design
and test heterogeneity as the estimated Tau also includes sample heterogeneity. While this is
consistent with a sizeable average design and test heterogeneity in the meta-analyses, it also
needs to be coupled with strong “replicator selection” to explain our results. To test for this, we
estimated the correlation between the Tau of these ten meta-analyses and the difference in the
meta-analytic and replication effect sizes. The Spearman correlation was -0.1879 (p=0.6032)
and the Pearson correlation was -0.3920 (p=0.2626), showing no sign that the observed
differences in effect sizes are related to the scope for “replicator selection”. In fact, the
estimated correlation is in the direction opposite to that predicted by the “replicator
selection” mechanism. This tentative finding departs from a recent meta-research paper that
attributes reproducibility failures in Psychology to heterogeneity in the underlying effect
sizes.25 Further work with larger samples is needed to test more rigorously for “replicator
selection”. It should also be noted that the pooled replication rate across Many Labs 1-3 is
53%, which is in line with the replication rate observed in three large scale systematic
replication projects that should not be prone to “replicator selection” (the Reproducibility
Project: Psychology10, the Experimental Economics Replication Project5 and the Social
Sciences Replication Project6). This suggests no substantial “replicator selection” in the Many
Labs studies that form the majority of our sample.
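The "replicator selection" check above is just a rank and a product-moment correlation between each meta-analysis's Tau and its meta-analysis-minus-replication gap. A minimal sketch of that computation in pure Python, with illustrative placeholder (tau, gap) pairs rather than the paper's actual ten data points:

```python
# Sketch of the correlation check described in the text. The data below are
# placeholders, NOT the ten (Tau, effect-size-difference) pairs from the paper.

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(v):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical example values (not from the study):
tau = [0.00, 0.10, 0.25, 0.30, 0.45]
gap = [0.30, 0.25, 0.20, 0.28, 0.15]
print(spearman(tau, gap), pearson(tau, gap))
```

A negative correlation here, as the authors report, runs against the "replicator selection" prediction that larger heterogeneity should open more room for selecting weak designs.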
Another caveat about our results concerns the representativeness of our sample. The
inclusion of studies was limited by the number of pre-registered multiple labs replications
carried out so far, and for which of these studies we could find a matching meta-analysis. Our
sample of 17 studies should thus not be viewed as being representative of meta-analyses in
psychology or in other fields. In particular, the relative effect between the original studies and
replication studies for the sample of studies included in our analysis is somewhat larger than
the one observed in previous replication projects5,6,10 – indicating that our sample could be a
select sample of psychological studies where selective reporting is particularly prominent. In
the future the number of studies using our methodology can be extended as more pre-registered
multiple labs replications become available and as the number of meta-analyses continues to
increase. We also encourage others to test out our methodology for evaluating meta-analyses
on an independent sample of studies.
We conclude that meta-analyses produce substantially larger effect sizes than
replication studies in our sample. This difference is largest for replication studies that fail to
reject the null hypothesis, which is in line with recent arguments about a high false positive rate
of meta-analyses in the behavioral sciences20. Our findings suggest that meta-analysis is
ineffective in fully adjusting inflated effect sizes for publication bias and selective reporting. A
potentially effective policy for reducing publication bias and selective reporting is pre-registering analysis plans prior to data collection. There is currently a strong trend towards
increased pre-registration in psychology22. This has the potential to increase the credibility
both of original studies and of meta-analyses, making meta-analysis a more valuable tool for
aggregating research results. Future meta-analyses may thus produce effect sizes that are closer
to the effect sizes in replication studies.

Religion quantified as affiliation, but not religiosity, was related to negative migrant attitudes; Muslims have more negative attitudes toward migrants than Christians

Religion and Prejudice Toward Immigrants and Refugees: A Meta-Analytic Review. Christine Deslandes & Joel R. Anderson. The International Journal for the Psychology of Religion, Volume 29, 2019 - Issue 2, Feb 15 2019.

ABSTRACT: Religion is often a driving force in negative attitudes; however, in the specific case of migrant-based attitudes, research has produced conflicting findings. That is, religion can paradoxically facilitate either tolerance or intolerance toward this group. In light of these inconsistent findings, we conducted a meta-analytic review to estimate the effect size of this relationship with two major aims—first, to explore differences as a function of how religion was operationalised, and second, to explore differences in the target migrant-type (e.g., differences in religion-based attitudes toward immigrants and refugees/asylum seekers). Our search strategy was applied to PsycINFO, EBSCO Psychology and Behavioural Sciences Collection, Web of Science, PsycEXTRA, and ProQuest Central for peer-reviewed English language studies and made calls for unpublished data through relevant professional bodies. This search strategy yielded 37 records (including 43 studies; N = 472,688). Religion was quantified in two ways: either as categorical religious affiliations (k = 60) or as individual differences in self-reported religiosity (k = 30). The meta-analyses revealed that religion quantified as affiliation, but not religiosity, was related to negative migrant attitudes. Specifically, religiously affiliated samples report more negative attitudes than nonreligious affiliated samples, and this effect was often stronger when the target groups were refugees rather than immigrants. In addition, analyses revealed that Muslims have more negative attitudes toward migrants than Christians. Religiosity was unrelated to negative attitudes. These findings are discussed in light of rising antimigrant attitudes.

Check also Cowling, Misha M., Joel Anderson, and Rose Ferguson. 2019. “Prejudice-relevant Correlates of Attitudes Towards Refugees: A Meta-analysis.” OSF Preprints. January 16. doi:10.1093/jrs/fey062
Abstract: This paper meta-analyses the available data on attitudes towards refugees and asylum seekers, with the aim of estimating effect sizes for the relationships between these attitudes and prejudice-relevant correlates. Seventy studies (Ntotal = 13,720) were located using systematic database searches and calls for unpublished data. In the case of demographic factors, being male, religious, nationally identified, politically conservative, and less educated were associated with negative attitudes (Fisher’s zs = 0.11, 0.17, 0.18, 0.21, & -0.16, respectively). For ideological factors, increases in right-wing authoritarianism and social-dominance orientations correlated with negative attitudes, while the endorsement of macro (but not micro) justice principles was associated with positive attitudes (Fisher’s zs = 0.50, 0.50, -0.29, & 0.00 respectively). Perceptions of refugees as symbolic and realistic threats were the strongest correlates of negative attitudes (Fisher’s zs = 0.98, & 1.11, respectively). These findings have contributed to the growing body of knowledge that endeavors to understand the antecedents of refugee-specific prejudice, and are discussed in light of the global refugee crisis.
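The Fisher z values quoted in this abstract are variance-stabilized correlations: z = atanh(r), back-transformed with r = tanh(z). A quick sketch (the back-transformed example value is my own arithmetic, not a figure from the paper):

```python
import math

# Fisher's z transform and its inverse, as used for the pooled effects above.
def fisher_z(r):
    return math.atanh(r)

def inverse_fisher_z(z):
    return math.tanh(z)

# E.g. the realistic-threat pooled effect of z = 1.11 corresponds to r ≈ 0.80
print(round(inverse_fisher_z(1.11), 2))  # → 0.8
```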

In large part, the wish to change personality did not predict actual change in the desired direction; & desired increases in Extraversion, Agreeableness & Conscientiousness corresponded with decreases

From Desire to Development? A Multi-Sample, Idiographic Examination of Volitional Personality Change. Erica Baranski et al. Journal of Research in Personality, December 26 2019, 103910.

• In large part, individuals’ volitional personality change desires did not predict actual change in the desired direction.
• Desired increases in Extraversion, Agreeableness and Conscientiousness corresponded with decreases in corresponding traits.
• Participants perceived more change than actually occurred.
• Decreases in Emotional Stability predicted perceptions of personality change.

Abstract: Using an idiographic-nomothetic methodology, we assessed individuals’ ability to change their personality traits without therapeutic or experimental involvement. Participants from internet and college populations completed trait measures and reported current personality change desires. Self-reported traits as well as perceptions of trait change were collected after 1-year (Internet) and 6-months (College). In large part, volitional personality change desires did not predict actual change. When desires did predict change, (a) desired increases in Extraversion, Agreeableness and Conscientiousness corresponded with decreases in corresponding traits, (b) participants perceived more change than actually occurred, and (c) decreases in Emotional Stability predicted perceptions of personality change. Results illustrate the difficulty in purposefully changing one’s traits when left to one’s own devices.

Keywords: Volitional personality change; Idiographic-nomothetic; Personality development

From Baranski's 2018 PhD Thesis

Volitional personality change across 58 countries

First, on average across 58 countries, 61.38% of participants report that they are
currently trying to change an aspect of their personalities. The sheer number of people
around the world that are trying to accomplish personality change goals is in and of itself
notable. Indeed only eight countries had percentages lower than 50%. Nevertheless, there
was substantial variation across countries in the percentage of individuals who were
attempting this change. Specifically, country proportion of volitional personality change
attempts ranged from 84.75% (Indonesia) to 28.07% (Israel).
In an attempt to explain this variation, I first related country-level variables to
countries’ proportion of volitional personality change. In countries with high employment
rates, a higher proportion of individuals report trying to change their personalities. It may
be the case that workplace demands inspire individuals to attempt to improve their
personalities in ways that would be beneficial to workplace success. In support of this
possibility, previous research in lifespan development indicates success in the workforce
(e.g., being detailed oriented and dependable) is related to high levels of
conscientiousness (Barrick & Mount, 1991; Tett & Burnett, 2003). It may be the case,
therefore, that individuals beginning a new job or adding new responsibilities to an
existing position may be intentionally increasing levels of conscientiousness to meet their
new workplace demands. Also, low levels of country-level subjective health were related
to high proportions of volitional personality change. One possible explanation for this
relationship is that individuals residing in countries with low averages of self-reported
health might be inspired to work towards feeling better in all areas of their lives. In other
words, in an attempt to improve low wellbeing evidenced by their subjective health
ratings, individuals may seek to be more emotionally stable (to improve psychological
well-being) or conscientious (to improve self-care).
I next investigated what predicted volitional personality change on the individual
level. Across the majority of countries, individuals with high levels of negative
emotionality and its facets (i.e., anxiety, depression and emotionality) and low levels of
both subjective and interdependent happiness tended to report currently trying to change
an aspect of their personalities. There was also a trend for individuals high in openness
(driven by intellect) to also report volitional personality change, albeit less consistently
across countries. These results imply that individuals who have negative emotions yet are
highly intellectual tend to want to change an aspect of their personalities. In other words,
individuals who are thinking deeply about their own negative personality traits or general
wellbeing tend to report trying to change something about their personalities.
 The aforementioned findings cue us in to who is trying to change their
personalities around the world. The next question to examine, then, is what exactly it is
people want to change. Similar to individuals across US states, the majority of
participants from our international sample indicated that they were trying to be more
emotionally stable, conscientious, extraverted and agreeable. Again replicating analyses
from our US sample, facet-level analyses revealed that, in the proportion of responses that fell
into each category, some categories varied more than others. For instance, the degree of
variation for increased emotional stability was nearly a fourth of that for increased
extraversion. Indeed, the lowest proportion of individuals with a volitional personality change
attempt to increase emotional stability was 14.55% (Hong Kong), whereas the lowest
proportion for attempts to increase extraversion across countries was 3.37% (Croatia).
The latter finding may be explained by already high levels of extraversion for Croatian
participants – who had among the highest levels of this trait relative to the other countries
included in the analyses.
 Finally, I assessed the relationship between current personality traits and specific
volitional personality change attempts. For extraversion, agreeableness,
conscientiousness and negative emotionality, there were strong relationships between
current trait levels and corresponding volitional personality change traits. For instance,
individuals with low levels of extraversion tended to report that they were currently
trying to increase levels of extraversion (driven by attempts to increase levels of
sociability). Like analyses across US states, these patterns did not vary across countries.
The one exception, however, was negative emotionality which did vary in its relationship
to attempts to increase emotional stability across countries. Indeed, looking at these
relationships by country reveals that in some countries there is a negative relationship
between current levels of negative emotionality and the attempt to increase emotional
stability, and in others there is a strong positive relationship. For example, in Slovakia,
those who reported a current attempt to increase emotional stability tended to have low
levels of negative emotionality, whereas in New Zealand, individuals who report trying to
increase levels of emotional stability tend to be high in negative emotionality. It seems to
be the case that in some countries, negative emotionality prompts volitional personality
change in the same way it does with other traits (e.g., high negative emotionality
prompting attempts to be more emotionally stable), yet in others, low levels of negative
emotionality prompts individuals to be even more emotionally stable.

Sports: Tendency to attribute personal success to internal factors & personal failure to external ones, & a tendency to attribute team success to factors within the team & failure to factors outside the team

Systematic Review and Meta-Analysis of Self-Serving Attribution Biases in the Competitive Context of Organized Sport, Mark S. Allen et al. Personality and Social Psychology Bulletin, December 25, 2019.

Abstract: This meta-analysis explored the magnitude of self-serving attribution biases for real-world athletic outcomes. A comprehensive literature search identified 69 studies (160 effect sizes; 10,515 athletes) that were eligible for inclusion. Inverse-variance weighted random-effects meta-analysis showed that sport performers have a tendency to attribute personal success to internal factors and personal failure to external factors (k = 40, standardized mean difference [SMD] = 0.62), a tendency to attribute team success to factors within the team and team failure to factors outside the team (k = 23, SMD = 0.63), and a tendency to claim more personal responsibility for team success and less personal responsibility for team failure (k = 4, SMD = 0.28). There was some publication bias and heterogeneity in computed averages. Random effects meta-regression identified sample sex, performance level, and world-region as important moderators of pooled mean effects. These findings provide a foundation for theoretical development of self-serving tendencies in real-world settings.
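The abstract names inverse-variance weighted random-effects meta-analysis as the pooling method. A minimal sketch of that computation in the common DerSimonian-Laird form (the effect sizes and variances in the usage line are illustrative placeholders, not data from the review):

```python
# Hedged sketch of inverse-variance weighted random-effects pooling
# (DerSimonian-Laird). Inputs are per-study effect sizes and sampling variances.

def random_effects_pool(effects, variances):
    """Return the pooled effect and the between-study variance tau^2."""
    w = [1.0 / v for v in variances]               # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    # Cochran's Q and the DL estimate of between-study variance tau^2
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # re-weight with tau^2 added to each study's sampling variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return pooled, tau2

# Hypothetical example (not from the review):
pooled, tau2 = random_effects_pool([0.5, 0.7, 0.6], [0.04, 0.05, 0.03])
print(round(pooled, 3), round(tau2, 3))
```

When tau^2 is zero the weights collapse to the fixed-effect case; a large tau^2 pulls the weights toward equality, which is why heterogeneity (as in the "some publication bias and heterogeneity" noted above) matters for the pooled averages.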

Keywords: group processes, judgment, meta-regression, self-serving bias, sport psychology