Friday, March 10, 2023

The COVID-19 pandemic is accompanied by substantially and significantly lower intelligence test scores

Breit M, Scherrer V, Blickle J, Preckel F (2023) Students’ intelligence test results after six and sixteen months of irregular schooling due to the COVID-19 pandemic. PLoS ONE 18(3): e0281779, Mar 8 2023.

Abstract: The COVID-19 pandemic has affected schooling worldwide. In many places, schools closed for weeks or months, only part of the student body could be educated at any one time, or students were taught online. Previous research discloses the relevance of schooling for the development of cognitive abilities. We therefore compared the intelligence test performance of 424 German secondary school students in Grades 7 to 9 (42% female) tested after the first six months of the COVID-19 pandemic (i.e., 2020 sample) to the results of two highly comparable student samples tested in 2002 (n = 1506) and 2012 (n = 197). The results revealed substantially and significantly lower intelligence test scores in the 2020 sample than in both the 2002 and 2012 samples. We retested the 2020 sample after another full school year of COVID-19-affected schooling in 2021. We found mean-level changes of typical magnitude, with no signs of catching up to previous cohorts or further declines in cognitive performance. Perceived stress during the pandemic did not affect changes in intelligence test results between the two measurements.


Intelligence test results were lower in the pandemic 2020 sample than in the prepandemic 2002 and 2012 samples. The differences in test scores were large, with a difference in general intelligence of 7.62 IQ points between 2020 and 2002 (Analysis 1a). This difference did not appear to be the continuation of a longer-term downward trend: on the contrary, we observed higher test scores in 2012 than in 2002, but lower scores in 2020. The difference between 2012 and 2020 was also substantial, at 6.54 points of general intelligence (Analysis 1b). The cross-sectional cohort comparisons therefore seem to corroborate previous findings that regular schooling has a substantial impact on intelligence development and that its absence is detrimental to intelligence test performance [9]. The difference in test scores was remarkably large. It may be that the student population was hit particularly hard by the pandemic, having to deal with both the disruption of regular schooling and other side effects of the pandemic, such as stress, anxiety, and social isolation [68]. Moreover, students are usually very accustomed to testing situations, which may be less the case after months of remote schooling.

Creativity scores were notably lower than other scores in 2002. It therefore seems that the nonsignificant difference in creativity between 2002 and 2020 was not due to creativity being unaffected by the pandemic, but rather to creativity scores being low in 2002. This is supported by the significantly higher creativity scores in 2012. Lower creativity in 2002 than in later years may be due to unfamiliarity with the testing format, changes in curricula, or changes in out-of-school activities.

Importantly, the overall results are inconsistent with one possible alternative explanation of decreasing intelligence test scores, namely, a reverse Flynn effect. Flynn observed a systematic increase in intelligence scores across generations in the 20th century [69]. In some countries, a reverse Flynn effect with decreasing intelligence scores across generations has been observed in recent years [17, 70, 71]. This seems to be an especially plausible alternative explanation for the observed differences in test scores in our Analysis 1a. However, there are arguments against this alternative explanation. A reversal of the Flynn effect has not yet been observed in Germany. Instead, even in recent years, a regular positive Flynn effect has been reported [45, 72]. Moreover, a reverse Flynn effect is also inconsistent with our observation of increasing test scores from 2002 to 2012. We observed an increase in General Intelligence equivalent to .47 IQ points per year, which is slightly larger than the typically observed Flynn effect [73] or the Flynn effect observed in Germany [45]. The observed decrease in test scores from 2012 to 2020, at .82 IQ points per year for General Intelligence, is also much larger than the reverse Flynn effect observed elsewhere (.32 IQ points per year) [74], making it unlikely that this effect alone could account for the observed decline.
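The per-year rates quoted above follow directly from dividing the cohort differences by the elapsed years. A minimal back-of-envelope check (the 6.54-point gap and the cohort years are from the paper; the .32-points-per-year reverse-Flynn benchmark is the figure cited there as [74]):

```python
# Back-of-envelope check of the per-year decline quoted in the text.
# The 6.54-point general-intelligence gap between the 2012 and 2020
# cohorts is taken from the paper's Analysis 1b.
gap_iq_points = 6.54
years_elapsed = 2020 - 2012

decline_per_year = gap_iq_points / years_elapsed
print(f"Decline: {decline_per_year:.2f} IQ points per year")  # ≈ .82, as reported

# Compare with the reverse Flynn effect of .32 points/year reported elsewhere [74]:
print(f"Ratio vs. reverse Flynn effect: {decline_per_year / 0.32:.1f}x")
```

The observed decline is roughly two and a half times the reverse-Flynn benchmark, which is the arithmetic behind the authors' claim that a reverse Flynn effect alone cannot account for it.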

The longitudinal results (Fig 9) showed an increase in test scores between the test (2020) and retest (2021). The magnitude of the increase is in line with the retest effects for intelligence testing that have been quantified meta-analytically (d = .33) [46]. In some cases the retest effects were larger than expected based on the meta-analysis (e.g., Processing Speed, Figural Ability). However, these cases were largely in line with a previous investigation of retest effects in a subsample of the BIS-HB standardization sample [75], with no clear pattern of consistently larger or smaller retest effects in the present sample. These results indicate neither a remarkable decrease nor a “catching up” to previous cohorts.
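To relate the meta-analytic retest effect to the cohort differences discussed above, a standardized effect size can be expressed on the IQ metric by multiplying by the IQ standard deviation. A hedged sketch (the SD of 15 is the conventional IQ scale, not a figure stated in this paper):

```python
# Convert a Cohen's d effect size to IQ points, assuming the conventional
# IQ scale with a standard deviation of 15 (an illustrative assumption,
# not a value taken from the paper itself).
IQ_SD = 15

def d_to_iq_points(d: float) -> float:
    """Standardized mean difference expressed in IQ points."""
    return d * IQ_SD

# The meta-analytic retest effect cited in the text [46]:
print(f"d = .33 is about {d_to_iq_points(0.33):.1f} IQ points")  # ≈ 5 points
```

On this scale, a typical retest gain of about 5 points falls well short of closing the 6.5–7.6-point cohort gaps, consistent with the "no catching up" interpretation.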

Interestingly, we found no impact of perceived stress on the change in intelligence test scores. A possible explanation for the observed results is that stress levels were especially high in the first months of the pandemic, when there was the greatest uncertainty about the nature of the disease and lockdowns and school closures were novel experiences. Some evidence for a spike in stress levels at the beginning of the pandemic comes from tracking stress-related migraine attacks [76] and from a longitudinal survey of college students that was conducted in April and June 2020, finding the highest stress levels in April [77]. Moreover, teachers and students were both completely unprepared for school closures and online teaching at the beginning of the pandemic. The retest was conducted after a month-long period of regular schooling, followed by a now more predictable and better prepared switch to remote schooling that did not catch teachers and students off guard entirely. These factors may explain why intelligence performance did not drop further and why stress levels did not have an effect on the change in performance in the second test.

Strengths and limitations

The present study has several strengths. To our knowledge, this is the first investigation of the development of intelligence test performance during the pandemic. Moreover, we used a relatively large, heterogeneous sample and a comprehensive, multidimensional intelligence test. We were able to compare the results of our sample with two highly similar prepandemic samples using propensity score matching. Last, we retested a large portion of the sample to longitudinally investigate the development of intelligence during the pandemic.

However, the present study also has several limitations that restrict the interpretation of the results. First, due to the pandemic affecting all students, we were not able to use a control group but had to rely on samples collected in previous years. Cohort effects cannot be completely excluded, although we tried to minimize their influence through propensity score matching and the use of two different prepandemic comparison groups. We could not control for potential differences in socioeconomic status (SES) between the samples because no equivalent measure was used in all three cohorts. It would have been beneficial to control for SES because of its influence on cognitive development and on the bidirectional relationship of intelligence and academic achievement [9]. SES differences between samples therefore may account for some of the observed test score differences. However, large differences in SES between the samples are unlikely because the 2012 and 2020 samples were drawn from the same four schools. Regarding the impact of SES on the longitudinal change during the pandemic in the 2020 sample, we did not have a comprehensive SES measure available. However, we had information on the highest level of education of parents. When adding this variable as a predictor in the LCA analyses, the results did not change, and parents’ education was not a significant predictor of change.

Second, both measurement points of the study fell within the pandemic. A prepandemic measurement is not available for our 2020 sample. This limits the interpretation of the change in test scores over the course of the pandemic, even though we compared the observed retest effects with those found in a meta-analysis and in a previous retest study of the BIS-HB.

Third, the 2020 measurement occurred only a few weeks after the summer break. It has often been shown that the summer break causes a decrease in math achievement test scores [78] as well as intelligence test scores [79]. However, this “summer slide” effect on intelligence seems to be very modest in size [80] and is therefore unlikely to be fully responsible for the large observed cohort differences in the present investigation.

Fourth, perceived stress was only measured by a short, retrospective scale. The resulting scores may not very accurately represent the actual stress levels of the students over the school year. Moreover, perceived stress was not measured at the first measurement point, so changes in stress levels during the pandemic could not be examined. This limits the interpretation of the absence of stress effects on changes in intelligence.

Fifth, the matched groups in Analysis 1b were somewhat unbalanced with regard to grade level (Table 1). The students in the 2020 sample tended to be in higher grades while being the same age. However, this pattern is unlikely to explain the differences in intelligence. The students in the 2020 sample tended to have experienced more schooling at the same age than the other samples, which would be expected to be beneficial for intelligence development [10, 11].

Sixth, there was some attrition between the first and second measurement of the 2020 sample. This was due to students changing schools or school classes, being sick or otherwise absent on the second day of testing, or failing to provide parental consent for the second testing. It is plausible that especially students with negative motivational or intellectual development changed schools or avoided the second testing. This means that the improvement between the first and second time of measurement may be somewhat overestimated in the present analyses.

Seventh and last, only a modest percentage of the samples were matched in the PSM procedure because we followed a conservative recommendation for the caliper size [55] that yielded a very balanced matching solution. The limited common support somewhat diminishes the generalizability of the findings to the full samples.


The pandemic and the associated countermeasures affected the academic development of an entire generation of students around the world, as evidenced by decreases in academic achievement [3]. Simulations predict a total learning loss between .3 and 1.1 school years, a loss valued at approximately $10 trillion [81]. Although we cannot make any causal claims with the present study, our results suggest that these problems might extend to students’ intelligence development. They further suggest that possible detrimental effects took place especially during the first months of the pandemic. Moreover, our longitudinal results do not point to any recovery effects.

As schooling has a positive impact on students’ cognitive development, educational institutions worldwide have a chance to compensate for such negative effects in the long term. As interventions aimed at improving academic achievement also affect intelligence [9], the decline in intelligence could be reversed if targeted efforts are made to compensate for the deficit in academic achievement that has occurred. Furthermore, schools could pay attention to offering intellectually challenging lessons or supplementary programs in the afternoons or during vacations, as intellectually more stimulating environments have a positive effect on intelligence development [82].

A second implication concerns current intelligence testing practice. If there is a general, substantial decrease in intelligence test performance, testing with prepandemic norms will lead to an underestimation of the percentile rank (and thus IQ) of the person being tested. This can have significant consequences. For example, some giftedness programs use IQ cutoffs to determine eligibility. Fewer students tested during (or after) the pandemic may meet such a criterion. If the lower test performance persists even after the pandemic, it may even be necessary to update intelligence test norms to account for this effect.
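The norm-shift argument can be made concrete on the standard IQ scale (mean 100, SD 15): if the population's true performance has dropped by roughly 7 points but scores are still referenced to prepandemic norms, an average student's percentile rank is understated. A minimal sketch using only the standard library (the 7-point shift mirrors the cohort difference reported above; the distributional assumptions are illustrative, not from the paper):

```python
from statistics import NormalDist

# Prepandemic norms: IQ scores scaled to mean 100, SD 15 (the conventional scale).
prepandemic_norms = NormalDist(mu=100, sigma=15)

# Suppose the population's true mean performance dropped by ~7 points
# (roughly the cohort difference reported above). A student who is exactly
# average *now* scores about 93 when graded against the old norms.
average_student_score = 100 - 7

percentile = prepandemic_norms.cdf(average_student_score) * 100
print(f"Percentile under old norms: {percentile:.0f}%")  # ~32% instead of 50%
```

A gifted-program cutoff set under the old norms would therefore exclude some students who, relative to their own cohort, meet the intended criterion.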

As discussed in the previous section, the present study has several limitations. The results can therefore only be regarded as a first indication that the pandemic is affecting intelligence test performance. There is a need for further research on this topic to corroborate the findings. It is obviously no longer possible to start a longitudinal project with prepandemic measurement points. However, the present article presented a way to investigate the effect of the pandemic when prepandemic comparison samples are available. Ideally, the prepandemic samples would have been assessed shortly before the pandemic onset to minimize differences between cohorts due to the (reverse) Flynn effect, changes in school curricula, or school policy changes. If a sample had been assessed very recently before the pandemic, it may also be possible to retest the participants to investigate pandemic effects. Although we cannot make any causal claims with the present study, our results suggest that COVID-19-related problems might extend to students’ cognitive abilities. As intelligence plays a central role in many areas of life, it would be important to further investigate differences between prepandemic and current student samples, both to account for these differences in test norms and to compensate for possible disadvantages by offering specific interventions.

Contrary to the cliché, widespread among intellectuals, of ordinary people as easily deceived simpletons, humans have an evolutionarily rooted distrust of what others say. Of all things, this "epistemic vigilance" may be the foundation for delusions

Delusions as Epistemic Hypervigilance. Ryan McKay, Hugo Mercier. Current Directions in Psychological Science, March 8, 2023.

Abstract: Delusions are distressing and disabling symptoms of various clinical disorders. Delusions are associated with an aberrant and apparently contradictory treatment of evidence, characterized by both excessive credulity (adopting unusual beliefs on minimal evidence) and excessive rigidity (holding steadfast to these beliefs in the face of strong counterevidence). Here we attempt to make sense of this contradiction by considering the literature on epistemic vigilance. Although there is little evolutionary advantage to scrutinizing the evidence our senses provide, it pays to be vigilant toward ostensive evidence—information communicated by others. This asymmetry is generally adaptive, but in deluded individuals the scales tip too far in the direction of the sensory and perceptual, producing an apparently paradoxical combination of credulity (with respect to one’s own perception) and skepticism (with respect to the testimony of others).

Epistemic Vigilance

A set of putative cognitive mechanisms serves a function of epistemic vigilance: to evaluate communicated information so as to accept reliable information and reject unreliable information (Sperber et al., 2010). The existence of these mechanisms has been postulated on the basis of the theory of the evolution of communication (e.g., Maynard Smith & Harper, 2003; Scott-Phillips, 2008). For communication between any organisms to be stable, it must benefit both those who send the signals (who would otherwise refrain from sending them) and those who receive them (who would otherwise evolve to ignore them). However, senders often have incentives to send signals that benefit themselves but not the receivers. As a result, for communication to remain stable, there must exist some mechanism that keeps signals, on average, reliable. In some species, the signals are produced in such a way that it is simply impossible to send unreliable signals—for instance, if the signal can be produced only by large or fit individuals (see, e.g., Maynard Smith & Harper, 2003). In humans, however, essentially no communication has this property.1 It has been suggested instead that humans keep communication mostly reliable thanks to cognitive mechanisms that evaluate communicated information, rejecting unreliable signals and lowering our trust in their senders—mechanisms of epistemic vigilance.
To evaluate communicated information, mechanisms of epistemic vigilance process cues related to the content of the information (Is it plausible? Is it supported by good arguments?) and to its source (Are they honest? Are they competent?). A wealth of evidence shows that humans possess such well-functioning mechanisms (for review, see, e.g., Mercier, 2020), that they are early developing (being already present in infants or toddlers; see, e.g., Harris & Lane, 2014), and that they are plausibly universal among typically developing individuals. Crucially for the point at hand, these epistemic vigilance mechanisms are specific to communicated information. Our own perceptual mechanisms evolved to best serve our interests, and there are thus no grounds for subjecting their deliverances to the scrutiny that must be deployed for other individuals.
There is now a large amount of evidence that people systematically discount information communicated by others. This tendency has often been referred to as egocentric discounting (Yaniv & Kleinberger, 2000), and it has been observed in a wide variety of experimental settings (for a review, see Morin et al., 2021). For instance, in advice-taking experiments, participants are asked a factual question (e.g., What is the length of the Nile?), provided with someone else’s opinion, and given the opportunity to take this opinion into account in forming a final estimate. Overall, participants put approximately twice as much weight on their initial opinion as on the other participant’s opinion, even when they have no reason to believe the other participant less competent than themselves (Yaniv & Kleinberger, 2000).
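The roughly 2:1 weighting found in these advice-taking experiments amounts to a weighted average of one's own estimate and the advisor's. A minimal sketch (the 2/3 self-weight reflects the pattern described above; the example numbers are invented for illustration):

```python
def revise_estimate(own: float, advice: float, self_weight: float = 2 / 3) -> float:
    """Weighted average of one's own estimate and an advisor's.

    self_weight = 2/3 mirrors the ~2:1 egocentric discounting pattern
    reported in advice-taking experiments; the default is illustrative.
    """
    return self_weight * own + (1 - self_weight) * advice

# Hypothetical advice-taking trial (numbers invented for illustration):
own_guess = 5000      # participant's initial estimate
advisor_guess = 6800  # advisor's estimate

final = revise_estimate(own_guess, advisor_guess)
# "Weight of advice": how far the final estimate moved toward the advisor.
weight_of_advice = (final - own_guess) / (advisor_guess - own_guess)
print(f"Final estimate: {final:.0f}, weight of advice: {weight_of_advice:.2f}")
```

A weight of advice near 1/3, rather than the 1/2 that equal weighting would produce, is the signature of egocentric discounting in these studies.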
The discounting of others’ opinions can be overcome if we have positive reasons to trust them or if they present good arguments—in particular, if our prior opinions are weak (see, e.g., Mercier & Sperber, 2017). However, in the absence of such positive reasons, discounting is a pervasive phenomenon. There is no such systematic equivalent when it comes to perception. Although in some cases we can or should learn to doubt what we perceive (e.g., when attending to the reminder that “objects in mirror are closer than they appear” while driving), this is typically an effortful process with uncertain outcomes. In visual perception, for example, models in which the observer behaves like an optimal Bayesian learner have proven very successful at explaining participants’ behavior (e.g., Geisler, 2011). Even if there are deviations from this optimal behavior (e.g., Stengård & van den Berg, 2019), they do not take the form of a systematic tendency to favor our priors over novel information.
There is thus converging evidence (a) that humans process communicated information differently than information they acquire entirely by their own means and (b) that the former is systematically discounted by default (i.e., in the absence of reasons to behave otherwise, such as reasons to believe the source particularly trustworthy or competent). This, however, leaves open significant questions of great relevance for the present argument. In particular, to what stimuli does epistemic vigilance apply? Presumably, epistemic vigilance evolved chiefly to process the main form of human communication: ostensive communication, which includes verbal communication but also many nonverbal signals (from pointing to frowning). Related mechanisms apply to other types of communication, such as emotional communication (Dezecache et al., 2013).
What of behaviors that have no ostensive function (e.g., eating an apple) or even aspects of our environment that might have been modified by others (e.g., a book found on the coffee table)? Although such stimuli should not trigger epistemic vigilance by default, they may under some circumstances. One might interpret a friend eating an apple as an indication that the friend has followed health advice to eat more fruit, or one could interpret one’s spouse’s placement of a book on a table as an invitation to read it—whether it was so intended or not. The behavior might then be discounted: We might suspect our friend of eating the apple only for our benefit while privately gorging on junk food.
Other cognitive mechanisms, more akin to strategic reasoning, but bound to overlap with epistemic vigilance, must process noncommunicative yet manipulative information (on the definition of communication vs. manipulation or coercion, see Scott-Phillips, 2008). A detective should be aware that some clues might have been placed by the criminal to mislead her. In some circumstances, therefore, epistemic vigilance and related mechanisms might apply even to our material environments, instead of applying only to straightforward cases of testimony. Still, epistemic vigilance should always apply to testimony, whereas it should apply to perception only under specific circumstances, such that the distinction between these two domains (testimony vs. perception) remains a useful heuristic.
How might these considerations inform our understanding of delusions? Whereas in healthy individuals the scales are adaptively tipped in favor of trusting the perceptual over the ostensive, this imbalance may be maladaptively exacerbated in delusions (Fig. 1). This could be for at least two complementary reasons: Sensory or perceptual evidence may be overweighted, and testimonial evidence may be underweighted. We review each of these possibilities in turn.