Sunday, December 6, 2020

No negative Flynn effect in France: Why variations of intelligence should not be assessed using tests based on cultural knowledge

No negative Flynn effect in France: Why variations of intelligence should not be assessed using tests based on cultural knowledge. Corentin Gonthier, Jacques Grégoire, Maud Besançon. Intelligence, Volume 84, January–February 2021, 101512.


• We tested the claim that intelligence decreases in France (negative Flynn effect).

• We re-analyzed princeps data (Dutton & Lynn, 2015) and collected a new sample.

• Performance only decreases on tests involving declarative knowledge, not reasoning.

• This is attributable to measurement bias for older items, due to cultural changes.

• There is fluctuation of knowledge, but no overall negative Flynn effect in France.

Abstract: In 2015, Dutton and Lynn published an account of a decrease of intelligence in France (negative Flynn effect) which had considerable societal impact. This decline was argued to be biological. However, there is good reason to be skeptical of these conclusions. The claim of intelligence decline was based on the finding of lower scores on the WAIS-III (normed in 1999) for a recent sample, but careful examination of the data suggests that this decline was in fact limited to subtests with a strong influence of culture-dependent declarative knowledge. In Study 1, we re-analyzed the data used by Dutton and Lynn (2015) and showed that only subtests of the WAIS primarily assessing cultural knowledge (Gc) demonstrated a significant decline. Study 2 replicated this finding and confirmed that performance was constant on other subtests. An analysis of differential item functioning in the five subtests with a decline showed that about one fourth of all items were significantly more difficult for subjects in a recent sample than in the original normative sample, for an equal level of ability. Decline on a subtest correlated 0.95 with its cultural load. These results confirm that there is currently no evidence for a decrease of intelligence in France, with prior findings being attributable to a drift of item difficulty for older versions of the WAIS, due to cultural changes. This highlights the role of culture in Wechsler's intelligence tests and indicates that when interpreting (negative) Flynn effects, the past should really be treated as a different country.

Keywords: Flynn effect, Negative Flynn effect, Fluid intelligence, Crystallized intelligence, Differential item functioning (DIF)

5. General discussion

The results of both Study 1 and Study 2 unambiguously indicated that there was no negative Flynn effect in France, in the sense of a general decrease of intelligence or a decrease in the ability to perform logical reasoning: there were no reliable differences between WAIS-III and WAIS-IV for any of the subtests reflecting visuo-spatial reasoning (Gf and Gv), or working memory and processing speed (Gsm and Gs), and which were based on abstract materials. We did find lower total performance on the WAIS-III for a recent sample, but contrary to the classic Flynn effect, this difference between cohorts was exclusively driven by the five subtests involving Gc - acquired declarative knowledge tied to a specific cultural setting.

When considered under the angle of item content, it appeared that this decrease on subtests involving declarative knowledge largely reflected, not an actual decrease of ability, but measurement bias due to differences of item difficulty for samples collected at different dates. All in all, in the five subtests demonstrating a decline, about one fourth of items were comparatively more difficult for the 2019 sample than for the 1999 sample for an equal level of ability. These differences could be traced down to a few specific skills. All but one of the Information items that were biased against a recent sample related to the names of famous people, and biased Comprehension items were all related to civic education; interestingly, the test publisher decided to practically eliminate both topics from the WAIS-IV. All but one of the biased Arithmetic items required computing mental division or proportions. For Vocabulary, the negative net effect of bias was partly compensated by the fact that some words were easier in the recent sample, more consistent with a change in language frequency patterns than with an absolute decrease in vocabulary skills. In all cases, these increases in item difficulty for a recent sample could be attributed to environmental changes in school programs, topics covered by the media, and other societal evolutions.

The fact that the performance decrease on a subtest correlated at 0.95 with its cultural load confirms this conclusion and runs counter to the interpretation that the observed decline is caused by biological factors (Woodley of Menie & Dunkel, 2015). This does not completely rule out biological factors, as cultural loads are not pure indicators of cultural influences: a possible alternative interpretation, as suggested by Edward Dutton and Woodley of Menie, is that a genetic decrease in fluid reasoning could negatively affect the culture of a country, in turn reverberating on Gc subtests (see Dutton et al., 2017; this is a variant of investment theory and of explanations assuming genotype-environment covariance; e.g. Kan et al., 2013). However, this idea would be almost impossible to falsify, and it would be difficult to reconcile with the facts that the correlation with heritability was non-significant and that there was no decline at all for the Gf and Gv subtests, which tend to have high heritability (e.g. Kan et al., 2013; Rijsdijk, Vernon, & Boomsma, 2002; van Leeuwen, van den Berg, & Boomsma, 2008), and which would be expected to decrease before effects on Gc could be observable. There is also a lack of plausible biological mechanisms that could create such a large decline in the dataset in such a short timeframe. All this converges to clearly suggest a role of cultural changes as the most parsimonious interpretation of the data.

In short, the conclusion that can be drawn from a comparison of WAIS-III and WAIS-IV is that over the last two decades, there has been no decline of reasoning abilities in the French population, but there has been an average decrease in a limited range of cultural knowledge (essentially related to using infrequent vocabulary words, knowing the names of famous people, discussing civic education and performing mental division), which biases performance on older items. In other words, the data do indicate a lower average performance on the WAIS-III in the more recent sample, in line with Dutton and Lynn's (2015) results, but a more fine-grained analysis contradicts their interpretation of a general decrease of intelligence in France. In the terms of a hierarchical model of intelligence (Wicherts, 2007), there appears to be no decrease in latent ability at the first level of g; there is a decrease at the second level of broad abilities, but only for Gc; and this decrease seems essentially due to cultural changes creating measurement bias at the fourth level composed of performance for specific items.

This pattern is entirely distinct from the Flynn effect, which represents an increase in general intelligence, and especially in Gf performance, accompanied by much smaller changes on Gc (Pietschnig & Voracek, 2015). Hence it is our conviction that this pattern reflects substantially different mechanisms and cannot reasonably be labeled a “negative Flynn effect”, without extending the definition of the Flynn effect to the point where any difference between cohorts could be called a “Flynn effect” and where it would no longer be useful as a heuristic concept. This point is compounded by the fact that the difference reflected item-related measurement bias, rather than an actual change of ability. To quote Flynn (2009a): “Are IQ gains ‘cultural bias’? We must distinguish between cultural trends that render neutral content more familiar and cultural trends that really raise the level of cognitive skills. If the spread of the scientific ethos has made people capable of using logic to attack a wider range of problems, that is a real gain in cognitive skills. If no one has taken the trouble to update the words on a vocabulary test to eliminate those that have gone out of everyday usage, then an apparent score loss is ersatz.” The current pattern is clearly ersatz: “ersatz effect” may be a better name than “negative Flynn effect”.

There are two possible interpretations of the ersatz difference observed here. On one hand, this decline could be restricted to areas covered by the WAIS-III, and could be compensated by increases in other areas: in other words, the 2019 sample may possess different knowledge, but not less knowledge than the 1999 sample. On the other hand, this might represent a real decline and a cause for concern: results of the large-scale PISA surveys (performed on about 7,000 pupils) routinely point to significant inequalities in the academic skills of French pupils, and their average level of mathematics performance has declined since the early 2000s (e.g. OECD, 2019). It is impossible to adjudicate between these two possibilities (which would require having the 1999 sample perform the WAIS-IV), but even if there were an actual decrease in average knowledge, this conclusion would be significantly less bleak than the picture of a biologically-driven intelligence decrease painted by Dutton and Lynn (2015), and would highlight possible shortfalls of the French educational system (see also Blair, Gamson, Thorne, & Baker, 2005) rather than the downward trajectory of a population becoming less and less intelligent.

This conclusion is in line with a tradition of studies attributing fluctuations of intelligence scores to methodological biases, especially as they relate to [cultural] item content (e.g. Beaujean & Osterlind, 2008; Beaujean & Sheng, 2010; Kaufman, 2010; Nugent, 2006; Pietschnig et al., 2013; Rodgers, 1998; Weiss et al., 2016). As an example, Flieller (1988) reached the same conclusion in a French dataset over three decades ago; Brand et al. (1989) also found a similar result of decreasing scores due to changes in item difficulty, which they illustrated with an understandable decline of the proportion of correct answers for the item “What is a belfry?” between 1961 and 1984. This conclusion is also in line with studies arguing for the role of cultural environment and culture-based knowledge in Flynn-like fluctuations of intelligence over time (e.g. Bratsberg & Rogeberg, 2018). Note that drifts of item difficulty are only one aspect of such cultural changes; changes of test-taking pattern behavior, such as increased guessing, are another example (e.g. Must & Must, 2013; Pietschnig & Voracek, 2013).

Beyond the specific case of average intelligence in France, the current results constitute a reminder that intelligence scores are not pure reflections of intelligence and have multiple determinants, some of which can be affected by cultural factors that do not reflect intelligence itself. Put otherwise, this is an illustration of the principle that performance can differ between groups of subjects without representing a true difference of ability (Beaujean & Osterlind, 2008; Beaujean & Sheng, 2010). This is a well-known bias of cross-country comparisons, where test performance can be markedly lower in a culture for which the test was not designed (e.g. Cockcroft, Alloway, Copello, & Milligan, 2015; Greenfield, 1997; Van de Vijver, 2016). In other words, this principle generalizes to all comparisons between samples, not just intelligence fluctuations over time: investigators should be skeptical of the origin of between-group differences whenever cultural content is involved. This also applies to clinical psychologists using intelligence tests to compare patients from specific cultural groups to a (culturally different) normative sample.

Seven major recommendations for cross-sample comparisons can be derived from the current results:

1) comparisons based on validity samples collected by the publishers of Wechsler scales have to be avoided due to uncertainties about sample composition (as already stressed by Zhu & Tulsky, 1999; the distribution of ages in Study 1 as represented in Fig. 1 constitutes a stark reminder of this fact);

2) comparisons involving multiple subtests should carefully consider which subtests exactly demonstrate differences, and especially which dimension of intelligence they measure (Gf or Gc?);

3) comparisons between different samples should never be performed using different tests with substantial differences of item content, if there is a possibility that the items will be differentially affected by cultural variables extraneous to ability itself (Kaufman, 2010; Weiss et al., 2016);

4) even when the same version of a test involving cultural content is used, differences between samples collected at different dates in the same country should be treated as if the past sample were from a different country, due to the possibility of differential item functioning emerging over time;

5) as a consequence, comparisons between samples should primarily rely on tests that involve as little contribution of culture-based declarative knowledge as possible, such as Raven's matrices (e.g. Flynn, 2009b);

6) when only tests requiring culture-based declarative knowledge are available, differences should necessarily be interpreted taking into account possible measurement bias. The issue of measurement bias can be considered under the prism of IRT as a way to separate item parameters from ability estimates and test for DIF, and/or using multigroup confirmatory factor analyses as a way to more accurately specify at which level of a hierarchical model of intelligence samples actually differ (Wicherts et al., 2004);

7) lastly, and as exemplified by the pattern of correlations between performance decline, heritability and g-loadings, and cultural load, no conclusions about the biological origin of between-group differences in test scores can be drawn without also testing the role of cultural factors.
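To make the idea of differential item functioning in recommendations 4 and 6 concrete, here is a minimal sketch of a Mantel-Haenszel DIF check. This is a simpler, different technique from the IRT approach the recommendations mention, and it is not presented as the authors' analysis. The counts are hypothetical, invented purely for illustration; the function name and the three ability strata are assumptions as well.

```python
from math import log


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across matched-ability strata.

    Each stratum is a tuple:
        (ref_correct, ref_wrong, focal_correct, focal_wrong)
    A value above 1 means the item is easier for the reference group
    than for the focal group at the same ability level.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    return num / den


# Hypothetical counts (NOT data from the study).
# Reference group = older normative sample, focal group = recent sample.
# Strata = low / medium / high score on the rest of the subtest.
biased_item = [(30, 20, 18, 32), (40, 10, 28, 22), (45, 5, 38, 12)]
fair_item = [(25, 25, 25, 25), (35, 15, 35, 15), (45, 5, 45, 5)]

or_biased = mantel_haenszel_odds_ratio(biased_item)
or_fair = mantel_haenszel_odds_ratio(fair_item)

# ETS delta scale: -2.35 * ln(odds ratio).
# Negative values indicate an item that disadvantages the focal group.
delta_biased = -2.35 * log(or_biased)
delta_fair = -2.35 * log(or_fair)

print(f"biased item: OR = {or_biased:.2f}, delta = {delta_biased:.2f}")
print(f"fair item:   OR = {or_fair:.2f}, delta = {delta_fair:.2f}")
```

The point of matching on ability first is that a raw difference in pass rates between cohorts cannot distinguish a change in ability from a change in item difficulty; only the comparison within strata of equal ability isolates the latter.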

Only 54% of newspapers that published erroneous research findings also published the retraction; the retraction stories were balanced, but shorter than those on the article’s publication and often lacking in context & detail

Dissemination of Erroneous Research Findings and Subsequent Retraction in High-Circulation Newspapers: A Case Study of Alleged MDMA-Induced Dopaminergic Neurotoxicity in Primates. Brian S. Barnett & Richard Doblin. Journal of Psychoactive Drugs, Nov 26 2020.


Abstract: Ensuring the public is informed of retractions has proven difficult for the scientific community. While it is possible that newspapers focus differential attention on publication of scientific articles and their subsequent retractions, this topic has received minimal attention from researchers. To learn more, we analyzed newspaper coverage of the high-profile 2002 article Severe dopaminergic neurotoxicity in primates after a common recreational dose regimen of MDMA (“ecstasy”) and its retraction in a case study. We searched the 50 largest American newspapers with available online archives for stories about the article’s publication and retraction. Of the 50 newspapers, 26 (52%) covered the article’s publication and 20 (40%) its retraction. Six of the 50 newspapers (12%) published stories on the article’s retraction without covering its initial publication. Of the 26 newspapers covering the article’s publication, only 14 (54%) covered its retraction. Stories about the retraction were balanced, but shorter than those on the article’s publication and often lacking in context and detail. While the decrease in coverage of the article’s retraction was moderate among the entire sample, the much lower retraction coverage in newspapers that had already covered the article’s publication is concerning and emphasizes the need for increased media coverage of retractions.
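The counts in the abstract fit together as follows. This is only a small arithmetic check using the figures quoted above; the variable names are illustrative.

```python
# Counts reported in the abstract.
TOTAL_NEWSPAPERS = 50
covered_publication = 26
covered_retraction = 20
retraction_only = 6  # covered the retraction but not the original article

# Newspapers that covered both the publication and the retraction.
covered_both = covered_retraction - retraction_only

# Newspapers that reported the finding but never reported its retraction.
publication_without_retraction = covered_publication - covered_both


def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print("covered publication:", pct(covered_publication, TOTAL_NEWSPAPERS), "%")
print("covered retraction:", pct(covered_retraction, TOTAL_NEWSPAPERS), "%")
print("retraction only:", pct(retraction_only, TOTAL_NEWSPAPERS), "%")
print("followed up with retraction:", pct(covered_both, covered_publication), "%")
print("never corrected:", publication_without_retraction, "newspapers")
```

The 54% figure is therefore conditional on having covered the original article (14 of 26), which is why it differs from the 40% figure computed over all 50 newspapers.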

KEYWORDS: MDMA, ecstasy, retraction, media, newspaper

Lottery winners that keep working vs. retiring: Across samples and nations, participants morally praise needless work

A creative destruction approach to replication: Implicit work and sex morality across cultures. Warren Tierney et al. Journal of Experimental Social Psychology, Volume 93, March 2021, 104060.


• This “creative destruction” replication initiative added new measures and populations to four original study designs.

• The theory of Implicit Puritanism was competed against seven alternative accounts of work morality.

• A number of original findings replicated across multiple cultures, whereas two were identified as likely false positives.

• The best-fitting model suggests work is intuitively moralized across cultures.

Abstract: How can we maximize what is learned from a replication study? In the creative destruction approach to replication, the original hypothesis is compared not only to the null hypothesis, but also to predictions derived from multiple alternative theoretical accounts of the phenomenon. To this end, new populations and measures are included in the design in addition to the original ones, to help determine which theory best accounts for the results across multiple key outcomes and contexts. The present pre-registered empirical project compared the Implicit Puritanism account of intuitive work and sex morality to theories positing regional, religious, and social class differences; explicit rather than implicit cultural differences in values; self-expression vs. survival values as a key cultural fault line; the general moralization of work; and false positive effects. Contradicting Implicit Puritanism's core theoretical claim of a distinct American work morality, a number of targeted findings replicated across multiple comparison cultures, whereas several failed to replicate in all samples and were identified as likely false positives. No support emerged for theories predicting regional variability and specific individual-differences moderators (religious affiliation, religiosity, and education level). Overall, the results provide evidence that work is intuitively moralized across cultures.

Keywords: Replication, Theory testing, Falsification, Implicit social cognition, Priming, Work values, Culture

9. General discussion

This large-scale creative destruction replication initiative, which involved over eight thousand participants from half a dozen nations, systematically competed theories of culture and work morality against one another. In addition to directly replicating a set of original experimental effects central to the theory of Implicit Puritanism (Poehlman, 2007; Uhlmann et al., 2009; Uhlmann et al., 2011), we included new measures and populations facilitating novel conceptual tests of the predictions of the Explicit American Exceptionalism, general moralization of work, self-expression values, social class, religious differences, and regional folkways accounts of work values.

The observed pattern of experimental and cross-national differences and similarities severely undermines the original theory of Implicit Puritanism. In every instance, the targeted effect either failed to replicate entirely, or unexpectedly replicated in multiple cultures when it had been predicted to emerge only among Americans. Two original effects, specifically the moderating effect of target age on judgments of needless work and the influence of implicit salvation primes on work behavior, failed to replicate in all populations examined and are identified as likely false positives (Poehlman, 2007; Uhlmann et al., 2011). In contrast, the main effect of moral praise for a lottery winner who continues to work, and false memories consistent with an implicit link between work and sex morality (Poehlman, 2007; Uhlmann et al., 2009), were robust across cultures (India, the United States, Australia, and the United Kingdom). Finally, the effects of an intuitive mindset on moral judgments of needless work replicated across the USA, Australia, and UK samples, but not the India sample. The emergence of a number of key effects across a number of different nations sharply contradicts Implicit Puritanism's core theoretical claim of a unique American work morality.

Rather than leaving a theoretical void in the form of reduced confidence in the original findings and the underlying ideas, these results point in new theoretical directions. Specifically, they provide initial evidence that work behavior elicits strong moral intuitions across cultures, and that the gap between intuitive and deliberative feelings about work could be larger in wealthier societies. Personal religion (e.g., Protestant faith), degree of religiosity, socioeconomic status, and region of the United States (e.g., historically Puritan-Protestant New England) did not moderate any of the observed experimental effects, failing to support the associated accounts of work values. More investigations involving larger samples of countries, especially societies in which survival rather than self-expression values are widely endorsed (Inglehart, 1997; Inglehart & Welzel, 2005), and with varied historic backgrounds and diverse workways (Sanchez-Burks & Lee, 2007) are needed before drawing strong conclusions (Simons, Shoda, & Lindsay, 2017). At the same time, we believe the present investigation highlights the feasibility and generative nature of the creative destruction approach to replication, in identifying the most promising theories to guide further empirical research.

9.1. A Bayesian multiverse analysis

A pre-registered Bayesian multiverse analysis examined the consequences of different inclusion criteria, variable operationalizations, and statistical approaches for the replication results (see Haaf, Hoogeveen, Berkhout, Gronau, & Wagenmakers, 2020; Haaf & Rouder, 2017; Rouder, Haaf, Davis-Stober, & Hilgard, 2019). Overall, the results of the Bayesian multiverse are highly consistent with the frequentist analyses reported earlier (see Supplement 9 for a more detailed report). Strong evidence emerged that the tacit inference effect and overall valorization of needless work (regardless of target age or participant mindset) are true positives and, further, present across samples. Although less strongly, the data also support an overall intuitive mindset effect across all samples combined. Finally, strong evidence emerged against the target age and needless work effect, and the salvation prime effect. The latter remained unsupported even in those conditions pre-specified as most favorable for priming effects, specifically controlled laboratory studies and excluding participants suspicious of being influenced or who had failed to complete all the scrambled sentences. The Implicit Puritanism model performed worse than the winning model for all six original effects. The General Moralization of Work and False Positives accounts were the best fitting models overall, depending on the effect in question. The Protestant work ethic was found to positively predict the main effects of needless work (i.e., preference for worker over retiree regardless of target age or participant mindset), but such judgments did not vary across cultures as predicted by the Explicit American Exceptionalism account or any of the other competing theories (see Furnham et al., 1993, and Leong, Huang, & Mak, 2014, for evidence that “Protestant” work ethic beliefs are broadly applicable). Empirical estimates converged across the different universes of potential analyses (see Fig. S9–1 in Supplement 9). Effects that were not replicated in the primary analyses were not supported under any specification in the Bayesian multiverse, and replicable effects found evidentiary support across many different specifications.

9.2. False inferences in cross-cultural experiments

The present replication results highlight potential broader challenges for producing robust and reliable cross-cultural experimental research (Milfont & Klein, 2018). We define an x-cultural experiment as a study containing a manipulation (e.g., random assignment to condition A or condition B) and sampling at least two distinct cultural populations (e.g., university students in China and the United States). More broadly than the typical concerns about false positive findings (Open Science Collaboration, 2015; Simmons et al., 2011), such cross-cultural investigations are open to false inferences about patterns of experimental results across different human populations. In addition to the expected condition differences failing to emerge (e.g., salvation prime effect, target age and needless work effect), cross-cultural findings may prove over-robust, in other words emerging in societies where they were theoretically expected not to (e.g., the tacit inferences effect and intuitive work morality effect replicating outside the United States). False inferences could also involve concluding a phenomenon is culturally bounded when it is in fact universal, and mis-estimating the direction or relative magnitude of an effect between two cultures, among other empirical patterns.

At least two major features of an x-cultural experiment increase the chances of drawing such false conclusions, relative to a simple two-condition experiment in a single population. First, x-cultural studies often rely on an interaction between membership in a cultural group and an experimental manipulation as the key statistical test of the hypothesized cultural difference. Between-subjects interaction tests are typically underpowered unless very large samples are recruited (Simonsohn, 2014; Smith, Levine, Lachlan, & Fediuk, 2002). The Open Science Collaboration's Reproducibility Project: Psychology replicated 23 of 49 targeted studies (47%) whose key test was a main or simple effect, and only 8 of 37 studies (22%) when the key test was an interaction. Second, x-cultural experiments typically rely on small convenience samples and attempt to generalize to broader cultures. For example, 100 participants per location might be recruited from universities in New Haven, USA, and Xiamen, China. Since societies are quite heterogeneous (Kitayama et al., 2006; Muthukrishna et al., 2020; Nisbett & Cohen, 1996; Talhelm et al., 2014), this approach may or may not capture central tendencies in the United States and China.
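As a quick check of the replication rates quoted above, the following sketch recomputes both percentages and the gap between them from the reported counts. The variable names are illustrative, and nothing beyond the four counts in the text is assumed.

```python
# Counts quoted in the text for the Reproducibility Project: Psychology.
main_effect_replicated, main_effect_total = 23, 49
interaction_replicated, interaction_total = 8, 37

main_rate = main_effect_replicated / main_effect_total
interaction_rate = interaction_replicated / interaction_total

# Gap between the two replication rates, in percentage points.
gap_points = 100 * (main_rate - interaction_rate)

print(f"main or simple effects: {main_rate:.0%}")
print(f"interactions: {interaction_rate:.0%}")
print(f"gap: {gap_points:.0f} percentage points")
```

On these counts, studies whose key test was an interaction replicated at roughly half the rate of studies testing a main or simple effect.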

In the present replication initiative a number of the experimental condition differences emerged (i.e., tacit inferences effect, intuitive work morality effect, needless work main effect), yet none of the original condition x national culture interactions (Poehlman et al., 2007; Uhlmann et al., 2009; Uhlmann et al., 2011) were obtained again. The Many Labs 2 crowd initiative likewise failed to replicate previously reported interactions between experimental manipulations and cultural populations, even some considered well-established findings (Klein et al., 2018). To guard against such problems, future cross-cultural behavioral research should seek to collect larger and more varied samples. Researchers might form a network of laboratories and crowdsource data collections at multiple sites in each nation (Cuccolo, Irgens, Zlokovich, Grahe, & Edlund, in press; Moshontz et al., 2018), or partner with a survey firm to systematically sample respondents from different regions of the same country, ideally achieving representative sampling.

Different cultural theories predict distinct patterns of empirical results, and some may be more subject to false inferences than others. In a presence-absence pattern, an experimental effect is hypothesized to emerge in one culture, but not in the other. Most of the original Implicit Puritanism studies predicted and found such a pattern, for example an implicit link between work and sex morality among Americans, but not members of other cultures. In a reduced pattern, the effect is in the same direction for both cultures, but diminished in some cultures relative to others (e.g., varying degrees of loss aversion among members of different nations; Arkes, Hirshleifer, Jiang, & Lim, 2010). Finally, in a reversal pattern, the effects of an experimental manipulation are expected to fully reverse between a focal culture and comparison culture. For example, Gelfand et al. (2002) predicted and found that whereas American participants were significantly more disposed to accept positive than negative feedback, Japanese participants exhibited the opposite pattern, accepting more personal responsibility for negative than for positive feedback. We suggest that future theorizing on culture focus on developing such reversal predictions, which rely on better powered crossover interactions, and are less likely to be confounded by measurement challenges than presence-absence patterns or reduced patterns.

9.3. The broader utility of the creative destruction approach

The present culture and work morality project is the first of several recent initiatives applying the creative destruction approach to replication to previously published findings from our research group (see Tierney et al., in press, for a review). Adding to the recent deluge of failed replications of experimental behavioral findings (e.g., Klein et al., 2014; Klein et al., 2018; Open Science Collaboration, 2015), none of these replication studies succeeded in reproducing the original patterns of results. However, unlike prior replication initiatives, we were able to obtain positive evidence for alternative theoretical accounts (Supplement 13).

We believe this highlights the general utility of the creative destruction approach to replication, which seeks to combine theory pruning methods from the management literature (Leavitt et al., 2010), with best practices from the open science movement in psychology such as pre-registration (Van't Veer & Giner-Sorolla, 2016; Wagenmakers et al., 2012) to achieve critical tests (Mayo, 2018) of competing intellectual ideas. Unlike traditional replication approaches, in which the original finding is tested against the expectation of null effects, the creative destruction approach seeks to identify the strongest theory currently operating in a given intellectual space.

Of course, not all research topics and original findings are well suited for large-scale competitive theory testing. As discussed at greater length by Tierney et al. (in press), the creative destruction approach is best suited to mature research areas with substantial published evidence, common methodological approaches, and well-developed theories that make precise, bounded predictions distinct from those of other theories. In contrast, traditional replications simply repeating the original method are better suited to confirming or disconfirming potential new breakthrough findings. Scientists should carefully allocate scarce replication resources for maximum impact, leveraging the methods best suited to the situation. It is our hope the present line of research contributes to a Replication 2.0 movement, in which rather than solely probing the reliability of past findings, scientists also focus on replacing them with new and improved accounts of human behavior.

Participants evaluated the same costs (public shaming, deaths & illnesses, & police abuse of power) as more acceptable when they resulted from efforts to minimize C19's health impacts, than when they resulted from prioritizing economic costs

Moralization of Covid-19 health response: Asymmetry in tolerance for human costs. Maja Graso, Fan Xuan Chen, Tania Reynolds. Journal of Experimental Social Psychology, December 4 2020, 104084.

Abstract: We hypothesized that because Covid-19 (C19) remains an urgent and visible threat, efforts to combat its negative health consequences have become moralized. This moralization of health-based efforts may generate asymmetries in judgement, whereby harmful by-products of those efforts (i.e., instrumental harm) are perceived as more acceptable than harm resulting from non-C19 efforts, such as prioritizing the economy or non-C19 issues. We tested our predictions in two experimental studies. In Study 1, American participants evaluated the same costs (public shaming, deaths and illnesses, and police abuse of power) as more acceptable when they resulted from efforts to minimize C19's health impacts, than when they resulted from non-health C19 efforts (e.g., prioritizing economic costs) or efforts unrelated to C19 (e.g., reducing traffic deaths). In Study 2, New Zealand participants less favorably evaluated the quality of a research proposal empirically questioning continuing a C19 elimination strategy in NZ than one questioning abandoning an elimination strategy, although both proposals contained the same amount of methodology information. This finding suggests questioning elimination approaches is morally condemned, a similar response to that found when sacred values are questioned. In both studies, condition effects were mediated by lowered moral outrage in response to costs resulting from pursuing health-minded C19 efforts. Follow-up analyses revealed that both heightened personal concern over contracting C19 and liberal ideology were associated with greater asymmetries in human cost evaluation. Altogether, results suggest that reducing or eliminating C19 has become moralized, generating asymmetries in evaluations of human suffering.

Keywords: Covid-19, Moralization, Human cost, Moral outrage, Instrumental harm, Ideology


Covid-19 (C19) has been a terrifying global health threat since its detection. In comparison to the familiar seasonal influenza, C19 is more contagious, insidious, deadly, and potentially overwhelming of health care systems (Resnick & Animashaun, 2020). Governments around the world have responded by implementing various restrictions, which had been relatively unprecedented in Western civilizations. Despite these restrictions' capacity to save lives (Alwan et al., 2020), prolonged regulation of human contact and economic activity is not without devastating health, welfare, and economic costs (Glover et al., 2020). Minimizing fatalities and health system burden, while simultaneously protecting people's social wellbeing and livelihoods appears unattainable. In the absence of effective and widely available vaccines or therapeutics, no country is well positioned to provide both sustained health care and economic support for all. Because resources are finite, difficult trade-offs surrounding lives and livelihoods are inevitable. How do people evaluate such trade-offs? The current investigation sought to examine these psychological calculi.

We test the possibility that within the current C19 pandemic, not all human costs are perceived as equally tolerable. Because C19 is a salient threat, we contend that eliminating it has become moralized, perhaps even to the point of a sacred value (Tetlock, 2003; Tetlock, Kristel, Elson, Green, & Lerner, 2000). As a result, we hypothesized that people would exhibit asymmetries in their evaluations of human costs, such that the harmful by-products of C19 reduction or elimination efforts are viewed as more tolerable than those resulting from non-C19 efforts. Moreover, in line with extant work on sacred values, we anticipated that merely questioning the elimination strategy would elicit moral outrage, disapproval, and a desire to reaffirm one's moral commitments.

General Discussion

We investigated whether the moralization of health-based C19 efforts (i.e., to reduce C19 deaths and illnesses, or eliminate the virus) would generate asymmetries in the evaluation of human costs. We hypothesized that because the health impacts of C19 remain an urgent, visible, and quantifiable threat, efforts to reduce that harm would become moralized as moral mandates (Rozin, 1999; Skitka & Houston, 2001). As such, the harmful by-products inherent in combating C19's health effects would be accepted as more tolerable than identical harm resulting from efforts unrelated to C19's health effects. Predictions were overwhelmingly supported. In Study 1 participants exhibited asymmetries in their tolerance for health, social, and human rights costs; identical costs (e.g., number of deaths, online harassment, or police abuse of power) arising from health-related C19 strategies were more readily accepted than those arising from either non-health-based strategies (e.g., economic), or from other unrelated efforts. Moreover, these effects were mediated by moral outrage, supporting that elimination efforts have become moralized.

Study 2 furnished additional evidence for the moralization of C19 health-targeted efforts. Indeed, participants in NZ evaluated a research proposal as less accurate, less methodologically sound, and less valuable to society when it posited the hypothesis that the suffering resulting from continuing an elimination approach in NZ outweighed that from abandoning the approach (compared to one forwarding the reverse hypothesis). Yet, both proposals contained the same amount of empirically validated information. Moreover, Study 2 participants evaluated the researchers as less competent and were less trustful they would honor participants' donation wishes when the researchers merely posited the empirical possibility the elimination approach led to increased suffering. These patterns are congruent with extant work on sacred values (Tetlock, 2003), whereby merely opening cherished beliefs up to scrutiny evokes moral outrage and motivates individuals to further demonstrate their moral commitments. In a similar vein, Study 2 participants who read the research proposal questioning the elimination strategy espoused heightened moral commitments to an elimination approach. Altogether, these patterns support that efforts to control or eliminate C19 have become moralized, leading individuals to overlook potential collateral costs from such efforts.

Our results also provide insight into the individual-level factors that may exacerbate the asymmetries we observed: 1) personal fear of contracting the virus, and 2) political ideology. Across our two studies, both those who more strongly feared contracting the virus and those who more strongly identified as liberal exhibited widened asymmetries, as well as greater moral outrage. Indeed, these greater asymmetries were mediated by heightened moral outrage. Of note, we observed these patterns in both the USA and NZ, suggesting they were not a relic of a particular political climate or a country that had yet to effectively contain the virus. Rather these patterns may reflect deeper ideological differences, such as liberals' greater emphasis on avoiding harm (Graham, Haidt, & Nosek, 2009) or conservatives' greater valuation of personal liberties (Boaz, 1997). Irrespective of their origin, the divergent conceptualizations of morality observed here may undermine empathy for those proffering alternative responses to C19, thereby exacerbating political polarization in the US and beyond (Ditto & Koleva, 2011).

Although our findings lend support for the contention that elimination efforts have become moralized and that perceived threat contributes to this moralization, many other factors undoubtedly contribute to the asymmetries observed here. For example, it is possible that the salience of C19 drives moralization more strongly than perceptions of its harm (see Philipp-Muller, Wallace, & Wegener, 2020; Skitka, Wisneski, & Brandt, 2018). Alternatively, the moral language and media depictions surrounding C19 may amplify moralization, such as by activating disgust. We leave these intriguing possibilities as open questions for future research. Our investigation was also limited by its examination of only a few types of human costs. However, there are numerous tragic costs that can result from both aiming to reduce the spread of C19 and failing to do so. Future studies might assess how individuals weigh these additional costs (Alwan et al., 2020; Glover et al., 2020).

It is worth clarifying that our investigation cannot speak to the moral standing of C19-efforts, nor does it aim to. Behaviors, including C19-directed strategies, are often moralized out of necessity (Rozin et al., 1997; Rozin & Singh, 1999). Indeed, C19 continues to spread rapidly in many places around the world, with devastating consequences. It is perhaps unsurprising then that efforts to combat the pandemic have been moralized and elevated to the status of a sacred value. Nonetheless, C19 is an evolving threat. If, for example, an effective vaccine is developed, the human costs resulting from C19 elimination strategies, such as ‘deaths of despair’, may exceed C19's direct health effects, and consequently, the trends observed here might reverse entirely. However, our findings among New Zealanders suggest the reluctance to consider the instrumental harm of C19 health-based efforts may persist after C19 elimination.

Indeed, our findings suggest potential human costs beyond C19's direct health effects may be relatively under-acknowledged, deprioritized, or granted less moral weight. Within our studies, we held suffering constant, revealing that even the loss of human lives is differentially weighted, depending on the cause. Our findings also reveal that empirical endeavors that might allow scientists to better understand costs resulting from C19 restrictions may be discouraged, unfunded, or dismissed. There are significant disagreements between the world's leading scientists on how C19 should be handled, given its severity and costs (see Alwan, 2020; Horton, 2020). Yet, the current findings identify and underscore a prominent obstacle in evaluating those costs dispassionately or through empirical scrutiny: moral outrage. Without tempered discussions or comprehensive data, assessing the true calculus of human suffering will pose challenges for scientists, policy makers, and the general public alike. The current trade-offs facing decision-makers and individual citizens are difficult, unprecedented, and costly. Providing a nuanced understanding of how individuals evaluate these human costs can help guide an informed pathway towards weathering these ongoing difficulties and ultimately, minimizing human suffering.

Deficit Attention Disorder: Partisans systematically adjust the importance of government overspending based upon which party occupies the presidency; this serves both to protect one’s own party & rebuke the opposing party

Kane, John V., and Ian G. Anson. 2020. “Deficit Attention Disorder: Partisan-Motivated Reasoning About Government Overspending.” APSA Preprints, Dec 2020. doi: 10.33774/apsa-2020-nqpr9. Has not been peer-reviewed.

Abstract: Government overspending remains a prominent concern in American politics. Yet, despite the burgeoning literature on partisan-motivated reasoning (PMR), we know little about the extent to which such concern arises from partisan considerations. We advance extant literature by uncovering a novel means by which citizens reason about deficits in a partisan-motivated fashion—i.e., by shifting the importance of the issue. Leveraging pre-registered experimental and observational studies, we find that partisans systematically adjust the importance of government overspending based upon which party occupies the presidency. Further, this proclivity to engage in PMR does not require explicit cues from elites, is symmetrical across parties, and appears to function both to protect one’s own party and rebuke the opposing party. Lastly, in a large-scale text analysis of transcripts from televised partisan media, we again find strong evidence of PMR on the issue of government overspending, though primarily in conservative media.

Association between perceptions of mental illness and reduced mate value, as well as an association between self-reported mental illness and a strong tendency to select mates with mental illness

Boysen, G. A. (2020). Mental illness and mate value: Evidence for reduced mate value among romantic partners perceived as having mental illness. Evolutionary Behavioral Sciences, Dec 2020.

Abstract: Evolutionary psychology predicts that people with mental illness should have reduced value as mates. Nonetheless, people with mental illness successfully find mates and pass on their genes. The current research explored people’s evaluations of real-world dating and romantic partners who they perceive as having mental illness to better understand the associated mate value. Study 1 (N = 193) examined participants’ ratings of their romantic partners’ desirable traits. Romantic partners perceived as having a mental illness received lower ratings of desirability than did romantic partners without mental illness. Study 2 (N = 271) demonstrated that romantic partners perceived as having mental illness also received lower ratings on desirable traits when compared with participants’ last partner without mental illness. Study 3 (N = 270) replicated the result of Study 2 by using a rating of holistic mate quality rather than traits. In Study 4 (N = 305), participants rated the holistic value of their current partner, as well as their commitment to that partner. Current partners perceived as having mental illness received lower ratings of value when controlling for commitment. In addition to these findings, across all 4 studies, there was a large and consistent effect of assortative mating such that participants with a self-reported history of mental illness also reported having romantic partners with mental illness. Overall, the results suggest an association between perceptions of mental illness and reduced mate value, as well as an association between self-reported mental illness and the tendency to select mates with mental illness.