Thursday, October 7, 2021

Some evidence suggests that people behave more cooperatively and generously when observed or in the presence of images of eyes (termed the ‘watching eyes’ effect); replication failed

Rotella A, Sparks AM, Mishra S, Barclay P (2021) No effect of ‘watching eyes’: An attempted replication and extension investigating individual differences. PLoS ONE 16(10): e0255531. Oct 6 2021.

Abstract: Some evidence suggests that people behave more cooperatively and generously when observed or in the presence of images of eyes (termed the ‘watching eyes’ effect). Eye images are thought to trigger feelings of observation, which in turn motivate people to behave more cooperatively to earn a good reputation. However, several recent studies have failed to find evidence of the eyes effect. One possibility is that inconsistent evidence in support of the eyes effect is a product of individual differences in sensitivity or susceptibility to the cue. In fact, some evidence suggests that people who are generally more prosocial are less susceptible to situation-specific reputation-based cues of observation. In this paper, we sought to (1) replicate the eyes effect, (2) replicate the past finding that people who are dispositionally less prosocial are more responsive to observation than people who are dispositionally more prosocial, and (3) determine if this effect extends to the watching eyes effect. Results from a pre-registered study showed that people did not give more money in a dictator game when decisions were made public or in the presence of eye images, even though participants felt more observed when decisions were public. That is, we failed to replicate the eyes effect and observation effect. An initial, but underpowered, interaction model suggests that egoists give less than prosocials in private, but not public, conditions. This suggests a direction for future research investigating if and how individual differences in prosociality influence observation effects.

Check also Stylized and photographic eye images do not increase charitable donations in a field experiment. Paul Lennon, Rachel Grant, and V. Tamara Montrose. Letters on Evolutionary Behavioral Science, Vol 8, No 2 (2017).


This study examined if people were more prosocial in public, under “watching eyes”, or in a no-eyes control condition. We failed to replicate the previously reported eyes and observation effects. Our results suggest that prosocial disposition (as measured by social value orientation) relates to responses to reputational incentives: SVO prosocials gave similar amounts in both public and private conditions, whereas SVO egoists gave less than prosocials in private conditions. Only SVO was a consistent predictor of dictator game donations, with prosocials giving more than egoists. Below we discuss each of these results and study limitations.

Failed replications: Observation and ‘watching eyes’ effects

Our manipulation check found that participants felt more observed in the public condition compared to both the eyes and no-eyes conditions, suggesting that our public manipulation worked. Despite this, participants did not give more in the dictator game in the public condition compared to the eyes and control conditions. That is, we did not find an observation effect. This result was surprising, given that many prior studies suggest that people are more generous when they are being watched [6–9, 11–16, 39].

Based on the effect size for watching eyes in a prior study using similar methodology (i.e., short exposure to eyespots; Cohen’s f of .21 [23]), our sample of 355 participants would have given us 95% power to detect the eyes effect and observation effect. Despite this, we did not replicate the canonical “watching eyes” effect. Thus, our first prediction was not supported.
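The reported power figure can be reproduced from the noncentral F distribution, under the assumption (for illustration only; the paper's exact model may differ) that the design is analyzed as a one-way ANOVA with three equal-sized conditions:

```python
# Rough reproduction of the a priori power calculation: one-way ANOVA with
# k = 3 conditions, Cohen's f = 0.21 (from Sparks & Barclay, 2013), N = 355.
# The equal-cell one-way ANOVA framing is an assumption for illustration.
from scipy.stats import f as f_dist, ncf

N, k, cohens_f, alpha = 355, 3, 0.21, 0.05
df_between, df_within = k - 1, N - k
ncp = cohens_f ** 2 * N                       # noncentrality parameter
f_crit = f_dist.ppf(1 - alpha, df_between, df_within)
power = 1 - ncf.cdf(f_crit, df_between, df_within, ncp)
print(round(power, 3))                        # close to the reported 0.95
```

With these inputs the computed power comes out near 95%, consistent with the authors' statement.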

Our result is consistent with several recent failed replications [33–39]. Notably, a recent meta-analysis argues that eye images are effective at reducing antisocial behaviour, with the speculation that images of eyes may be more effective at reducing bad behaviours than at increasing good ones [32]. Watching eyes may not be particularly effective at increasing prosocial behaviours.

Reputation and social value orientation

In our pre-registered analysis, egoists did not give less than prosocials across all three conditions. However, we conducted an exploratory analysis in which we combined the no-eyes control condition and the eyes condition into a single private condition to replicate the analyses in a prior study [39]. Although the overall analysis did not reach statistical significance, egoists gave less than prosocials in private conditions, but not in public conditions. This finding is consistent with the prior study [39], where proselfs (egoists and competitors combined) contributed less in private conditions, whereas prosocials did not. This result suggests that egoists give less than prosocials in a dictator game when anonymous. When comparing dictator game allocations among egoists in public and private conditions, we did not find any differences. Given that egoists give less than prosocials in anonymous conditions, this suggests that the strategic motives of egoists differ from those of prosocials. Notably, this analysis was underpowered, and we cannot draw definitive conclusions about whether SVO relates to responses to observation.

Although we had a larger sample in this study compared to Simpson and Willer (2008; [39]), they used a decision with consequences as their primary dependent measure (i.e., participants were informed that a third party could see their decision and use it to inform a subsequent decision). Their manipulation was likely stronger than a decision without consequences, as employed in the present study. It is worth noting, however, that our study was underpowered to find this effect; we could not match the SVOs of approximately 44% of participants due to an error in survey administration. Nevertheless, this is the third study suggesting that SVO may relate to responses to reputation-relevant stimuli and emotions; future studies should continue to investigate the role of individual differences in reputation-based responses.

Notably, our results are suggestive of gender effects in response to reputation-based cues. Researchers have previously proposed gender differences in prosociality [57,58] (though see the meta-analysis in [59]), and recent research finds that people expect women to be more prosocial than men [58]. These findings suggest that there may be gender differences in the reputational costs/benefits of acting prosocially in public contexts, which should be further investigated.


The most notable limitation of this study is our sample size. Although our sample was sufficient to replicate the observation and eyes effects, given prior samples (we had 95% power), we could not match participants’ SVOs to their in-lab data for a large proportion of the sample, limiting our ability to draw conclusions about how SVO influences participants’ responses to reputation-based cues. These results should therefore be interpreted with caution. Despite this limitation, our sample size is much larger than those included in the original study (189 participants, compared to 89 and 70 in two studies [39]). This larger sample can provide a more accurate effect size to estimate power and sample sizes for future studies.

Another possible limitation of our study is that participants gave close to the ceiling in the dictator game (i.e., $5) in all conditions (overall M = 4.06, SD = 2.00; all medians = 5), which may have limited our ability to find an observation effect. In fact, 62.4% of participants gave at ceiling in the public condition, and 53.6% in the private condition. However, prior research on eyes effects with a dictator game also found high allocations in the control condition (i.e., $4 out of $10) and found that images of watching eyes increased dictator game allocations beyond $4 [23]. Given that our study used a methodology similar to that of Sparks and Barclay (2013) [23], we can conclude that we failed to replicate the eyes effect in this study. Participants did not report feeling more observed in the presence of eyes and did not give more money in a dictator game when images of eyes were present compared to the control condition. We also failed to replicate an observation effect, despite people feeling more observed in the public condition compared to the control condition, which suggests that people may not always increase cooperation when there are reputational incentives. Notably, many studies investigating observation and eyes effects do not include manipulation checks to confirm that participants feel observed. Future research could investigate when and why we would expect observation effects to occur and should include manipulation checks to confirm the experimental manipulation.

Additionally, people in our anonymous control condition (i.e., the no-eyes control) reported feeling somewhat observed, likely because they were in a lab environment, where there are some cues of observation, such as the presence of other participants and the experimenter [59,60]. Although participants in the public condition reported feeling more observed than those in the control and eyes conditions, their scores were close to the midpoint of the scale, which suggests that participants in the public condition did not feel particularly observed. Notably, perceptions of observability were not correlated with dictator game allocations (see supplementary material).

A recent meta-analysis found that decisions with consequences, where participants expected their behaviours to influence how others would respond to them within the experimental protocol, produced larger observation effects on economic game allocations than decisions without consequences (rs of 0.25 and 0.12, respectively; [14]). The dictator game decision in this experiment was a decision without consequences, which may have limited the strength of our manipulation. However, studies using similar methodologies in small group sessions (as in this study) have reported eyes effects [20,23]. We also note that the ‘revelation moment’ differed between the eyes condition and the public condition. In the eyes condition, reputational cues (eyes) were revealed right before the dictator game decision, whereas in the public condition participants were told in advance that others would see their decisions, but the decisions were only made known to others after all decisions were made. Although both of these conditions are comparable to our control condition, these methodological differences may alter participants’ response patterns and should be considered when designing future studies.

Moreover, there are methodological similarities between SVO measures and the dictator game, where both measures ask participants to divide resources. In the present experiment, a key difference is that the dictator game is incentivized and continuous, while the SVO task is a series of hypothetical forced-choice scenarios. A conceptual replication with another measure of prosocial (or antisocial) behavior is needed to determine the generalizability of how SVO relates to prosocial behaviors.

Given the limitations outlined above, future research should investigate individual differences in observation and ‘watching eyes’ effects using dependent measures with greater reputational benefits or costs (see [32]). Moreover, future studies could use the SVO slider measure [47], as opposed to the triple-dominance measure employed in the present study. The SVO slider measure is continuous rather than categorical, allowing more precise classification of participants’ level of SVO [47]. However, SVO is a narrow personality construct, which may limit the ability to detect individual differences in reputation-based effects. Future studies could also examine whether broader personality constructs, such as HEXACO Honesty-Humility or Agreeableness [58], are associated with differential responses to reputation-based cues.


This study adds to the literature in several ways. Using established methodology, our aggregate data provide a well-powered attempted replication of the eyes effect (which excludes individual difference data based on SVO). Additionally, our results suggest that individual differences may influence how people respond to reputation-based cues. These findings are in the same direction as Simpson and Willer’s (2008; [39]) finding that people who are less prosocial (i.e., SVO egoists) are more likely to calibrate their decisions according to reputation-based cues, whereas SVO prosocials are consistently prosocial. Although our study was underpowered to detect individual differences, our sample size is much larger than that of the original study [39]. These results can inform future research methodologies; future studies should use observation manipulations with consequences, broader personality variables, and a dependent measure with higher reputational benefits or costs to participants to investigate reputation-based effects.

Compared to vegans, meat consumers experienced both lower depression & anxiety; the more rigorous the study, the more positive and consistent the relation between meat consumption and better mental health

Meat and mental health: A meta-analysis of meat consumption, depression, and anxiety. Urska Dobersek et al. Critical Reviews in Food Science and Nutrition, Oct 6 2021.

Abstract: In this meta-analysis, we examined the quantitative relation between meat consumption or avoidance, depression, and anxiety. In June 2020, we searched five online databases for primary studies examining differences in depression and anxiety between meat abstainers and meat consumers that offered a clear (dichotomous) distinction between these groups. Twenty studies met the selection criteria representing 171,802 participants with 157,778 meat consumers and 13,259 meat abstainers. We calculated the magnitude of the effect between meat consumers and meat abstainers with bias correction (Hedges’s g effect size) where higher and positive scores reflect better outcomes for meat consumers. Meat consumption was associated with lower depression (Hedges’s g = 0.216, 95% CI [0.14 to 0.30], p < .001) and lower anxiety (g = 0.17, 95% CI [0.03 to 0.31], p = .02) compared to meat abstention. Compared to vegans, meat consumers experienced both lower depression (g = 0.26, 95% CI [0.01 to 0.51], p = .041) and anxiety (g = 0.15, 95% CI [-0.40 to 0.69], p = .598). Sex did not modify these relations. Study quality explained 58% and 76% of between-studies heterogeneity in depression and anxiety, respectively. The analysis also showed that the more rigorous the study, the more positive and consistent the relation between meat consumption and better mental health. The current body of evidence precludes causal and temporal inferences.

Keywords: anxiety, depression, meat, mental health, vegan, vegetarianism, sex


This meta-analysis extends the findings of our prior systematic review (Dobersek et al. 2020) by presenting a quantitative evaluation of the relation between meat consumption/abstention and mental health. It included 171,802 participants aged 11 to 105 years, from varied geographic regions, including Europe, Asia, North America, and Oceania. The findings show a significant association between meat consumption/abstention and depression and anxiety. Specifically, individuals who consumed meat had lower average depression and anxiety levels than meat abstainers. We also showed that vegans experienced greater levels of depression than meat consumers. Sex did not modify these relations. Study quality explained a significant proportion of between-studies heterogeneity and a cumulative meta-analysis confirmed these findings. Specifically, the higher the study quality, the more positive the benefit of meat consumption.
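For reference, Hedges's g is the standardized mean difference (Cohen's d) with a small-sample bias correction, coded in this meta-analysis so that positive values favor meat consumers. A minimal sketch of the calculation; the summary statistics in the example are hypothetical, not taken from any included study:

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference with Hedges' small-sample bias correction.
    Positive values indicate a higher mean in group 1."""
    # Pooled standard deviation across the two groups
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled                  # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)           # bias-correction factor J
    return j * d

# Hypothetical example: two groups of 50 whose means differ by half a pooled SD
print(round(hedges_g(10.0, 2.0, 50, 9.0, 2.0, 50), 3))  # 0.496
```

The correction factor J shrinks d slightly toward zero, which matters most for the small samples common in this literature.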

Our results may explain the equivocal nature of prior research. In contrast to our clear findings, both past (Dobersek et al. 2020) and present, other systematic reviews and meta-analytic results were inconsistent or contradictory. These equivocal results suggested that vegetarians, and in some cases vegans, had lower levels of depression or anxiety (Askari et al. 2020; Iguacel et al. 2020; Lai et al. 2014; Li et al. 2017; Liu et al. 2016; Nucci et al. 2020; Zhang et al. 2017). As detailed in our systematic review (Dobersek et al. 2020), numerous factors explain these inconsistent conclusions. Briefly, most prior studies employed invalid or unreliable assessment protocols to measure exposures and outcomes (i.e., diet and mental health, respectively). For example, it is well established that dietary recalls and FFQs produce physiologically implausible and non-falsifiable (pseudo-scientific) data (Archer, Pavela, and Lavie 2015; Archer, Hand, and Blair 2013; Archer, Lavie, and Hill 2018a; Archer, Marlow, and Lavie 2018b, 2018c). Thus, the disparity between self-reported and actual dietary intake may render definitive conclusions impossible when analyzing meat consumption as a continuous rather than dichotomous variable (Archer, Pavela, and Lavie 2015; Archer, Hand, and Blair 2013; Archer, Lavie, and Hill 2018a; Archer, Marlow, and Lavie 2018b, 2018c).

With respect to mental health, the most rigorous research relied on physician-diagnosed disorders using the Diagnostic and Statistical Manual of Mental Disorders (DSM) (APA 2013; Michalak, Zhang, and Jacobi 2012) rather than self-reported (subjective) assessments with untested validity. The use of tools with questionable validity can lead to ambiguous findings and limited cross-study analyses.

Another major design error was the use of biased and selective sampling strategies. Several of the included studies recruited samples from vegan and vegetarian websites, social-networking groups, communities, and restaurants. We surmise this may have substantially biased data collection and may skew self-reported variables and findings if participants with a high degree of emotional or ideological commitment to their dietary behaviors intentionally or unconsciously misreport. An antecedent of this error may be a form of confirmation bias in which the flawed sampling confirms the investigators’ ideology or expectations rather than providing dispassionate data and results.

Finally, statistical and communication errors were ubiquitous. These included the failure to correct for multiple comparisons and the inappropriate use of causal language, both of which can lead to invalid results, interpretations, and conclusions. In summary, given that these errors are widespread in the literature, valid conclusions from previous reviews that failed to examine study quality are not possible.

In the present meta-analysis, these errors taken together are related to significant between-studies variation in effect sizes. Study quality explained 58% and 76% of between-studies heterogeneity in the differences in depression and anxiety, respectively. Furthermore, our analyses (see Figures 2, 4, 6, 7, 10, and 11) demonstrated that higher quality studies showed a more positive and consistent relation between meat consumption and mental health. Higher quality studies had much larger sample sizes.
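The "percentage of heterogeneity explained" figures can be read as the standard meta-regression pseudo-R²: the proportional reduction in the between-studies variance component τ² when the moderator (study quality) is added to the model. A minimal sketch; the τ² values below are hypothetical, chosen only to reproduce the reported 58%:

```python
def pseudo_r2(tau2_total, tau2_residual):
    """Proportion of between-studies variance (tau^2) explained by a
    moderator in a random-effects meta-regression."""
    return (tau2_total - tau2_residual) / tau2_total

# Hypothetical variance components: adding the moderator shrinks tau^2
# from 0.050 to 0.021, i.e., it explains 58% of the heterogeneity.
print(round(pseudo_r2(0.050, 0.021), 2))  # 0.58
```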

Finally, limited reporting of participant characteristics prevented an examination of several covariates (e.g., BMI, age of diet adoption/length of diet, clinical history, socioeconomic status, culture) that could potentially contribute to between-studies heterogeneity.

Strengths and limitations


This meta-analysis had several strengths. First, our a priori decision to select only studies that provided a clear dichotomy between meat consumers and meat abstainers allowed for a clear and rigorous assessment. While myriad studies have examined vegetarianism along a continuum, these were excluded because the lack of a clear distinction between groups rendered inferences equivocal. This distinction is necessary because self-reported (memory-based) dietary assessments (FFQs) should not be used for quantitative analyses because of their invalidity. Any study that attempts to use FFQs as continuous variables is invalid due to nonquantifiable measurement error (Archer, Lavie, and Hill 2018a; Archer, Marlow, and Lavie 2018b; Archer, Pavela, and Lavie 2015).

Second, we limited our psychological outcomes to the most prevalent and debilitative disorders: depression and anxiety. This allowed a focused yet rigorous analysis and ameliorated the effects of poorly operationalized psychological phenomena such as disordered eating, dietary restraint, orthorexia, and neuroticism. This exclusion helps to avoid potential misclassification and concomitant pathologizing of those who simply wish to avoid specific foods or food groups (e.g., vegans). Finally, with over 170,000 participants from several geographic regions, our meta-analysis allowed for more generalizable and definitive conclusions.


Our meta-analysis also had limitations. First, we excluded studies published in languages other than English (e.g., Japanese, Hindi), which potentially biased our results in favor of ‘Western’ norms that include meat consumption. Thus, we may have omitted studies from geographic regions that follow predominantly vegetarian or plant-based dietary patterns.

Second, while our search was clearly defined and comprehensive, our inclusion criteria excluded many publications that provided data on this topic (e.g., see (Anderson et al. 2019; Barthels, Meyer, and Pietrowsky 2018; Burkert et al. 2014; Cooper, Wise, and Mann 1985; Jacka et al. 2012; Larsson et al. 2002; Li et al. 2019; Northstone, Joinson, and Emmett 2018)). Specifically, these papers were excluded because they examined constructs other than depression or anxiety (e.g., orthorexia, restrained eating behavior) or assessed meat consumption as a continuous rather than dichotomous variable. As previously stated, self-reported dietary status and FFQs lead to nonquantifiable measurement error. Nevertheless, we think that our rigorous and highly focused meta-analysis has the potential to provide stronger evidence for the medical, research, and lay communities.

Third, despite the high confidence we place in our finding that meat abstention is linked to a greater prevalence of psychological disorders, study designs precluded inferences of temporality and causality. Specifically, only two of the included studies (Lavallee et al. 2019; Velten et al. 2018) provided information on temporality. Therefore, we were unable to conclusively examine this effect. Given that there are many reasons why people abstain from meat (e.g., ethical, environmental, animal rights-related reasons), this empirical question has not been adequately addressed. However, our previous systematic review (Dobersek et al. 2020) showed conflicting evidence on the temporal relations between meat abstention and depression and anxiety. Also, conclusions on causality require evidence from rigorous RCTs. Since only one low-quality RCT met our inclusion criteria (Beezhold and Johnston 2012), no conclusions regarding causality are supported.

Finally, the results of our meta-analysis are only as valid as the data collected in the included primary studies. Given that most studies used FFQs and self-reported questionnaires, participants may have been misclassified. Merely reporting that one avoids meat is not the equivalent of actual meat abstention (Archer, Pavela, and Lavie 2015; Archer, Hand, and Blair 2013; Archer, Lavie, and Hill 2018a; Archer, Marlow, and Lavie 2018b, 2018c). In fact, self-defined vegetarians and meat abstainers may consume meat (Haddad and Tanzman 2003).

Recommendations for future directions

Future investigators should avoid the most common flaws exhibited in the included studies. First, investigators must acknowledge and address the effects of both researcher and participant biases (e.g., confirmation bias, cognitive dissonance, observer-expectancy effects/reactivity) when employing highly selective or biased samples. Individuals who are highly invested in their dietary behaviors may be predisposed to intentional and non-intentional misreporting.

Second, the use of physician-diagnosed disorders based on criteria from the DSM-5 (APA 2013; Michalak, Zhang, and Jacobi 2012) is preferable to self-reported symptoms, and assists in producing more definitive results. Additionally, the severe limitations and pseudo-scientific nature of self-reported dietary data and FFQs (Archer, Pavela, and Lavie 2015; Archer, Hand, and Blair 2013; Archer, Lavie, and Hill 2018a; Archer, Marlow, and Lavie 2018b, 2018c) could be overcome in part with point-of-purchase (barcode) data (Ng and Popkin 2012). However, while these data may be less biased, they are not necessarily an accurate proxy for actual dietary consumption.

Third, more rigorous study designs (e.g., RCTs) are preferable to merely observational investigations. However, it would be extremely difficult to conduct a randomized study of diets with a long enough duration to impact fundamental affective outcomes such as anxiety and depression. Furthermore, detailed participant information regarding behavioral and health-related histories and current lifestyles is essential to valid interpretation and conclusions. Finally, studies should provide complete statistical information that allows for the calculation of effect sizes. More complete reporting would enable meta-analysts to extract both effect measures and study characteristics, thus allowing for exploration of potentially important but unanswered questions (e.g., how is time of diet adoption related to mental health?).

Heterozygosity of the major histocompatibility complex predicts later self-reported pubertal maturation in men, suggesting a genetic trade-off between immunocompetence and sexual maturation in human males

Heterozygosity of the major histocompatibility complex predicts later self-reported pubertal maturation in men. Steven Arnocky, Carolyn Hodges-Simeon, Adam C. Davis, Riley Desmarais, Anna Greenshields, Robert Liwski, Ellen E. Quillen, Rodrigo Cardenas, S. Marc Breedlove & David Puts. Scientific Reports volume 11, Article number: 19862. Oct 6 2021.

Abstract: Individual variation in the age of pubertal onset is linked to physical and mental health, yet the factors underlying this variation are poorly understood. Life history theory predicts that individuals at higher risk of mortality due to extrinsic causes such as infectious disease should sexually mature and reproduce earlier, whereas those at lower risk can delay puberty and continue to invest resources in somatic growth. We examined relationships between a genetic predictor of infectious disease resistance, heterozygosity of the major histocompatibility complex (MHC), referred to as the human leukocyte antigen (HLA) gene in humans, and self-reported pubertal timing. In a combined sample of men from Canada (n = 137) and the United States (n = 43), MHC heterozygosity predicted later self-reported pubertal development. These findings suggest a genetic trade-off between immunocompetence and sexual maturation in human males.


Our results support the prediction that greater MHC heterozygosity, a genetic contributor to pathogen resistance33,34, predicts later pubertal timing. In a combined data set derived from two independent samples, MHC heterozygosity predicted relative, but not absolute, recalled puberty. Because males lack a salient, singular pubertal event like menarche, retrospective reports of relative pubertal timing may be more accurate: men may be better able to recall whether they matured earlier or later than their peers than to recall the precise ages of pubertal events48,49.
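To make the predictor concrete: MHC/HLA heterozygosity is typically scored as the proportion (or count) of typed loci at which an individual carries two distinct alleles. A minimal sketch; the loci and genotype below are a hypothetical illustration, not the paper's actual typing panel:

```python
def mhc_heterozygosity(genotype):
    """Proportion of HLA loci at which an individual carries two distinct
    alleles. `genotype` maps locus name -> (allele_1, allele_2)."""
    heterozygous = sum(1 for a1, a2 in genotype.values() if a1 != a2)
    return heterozygous / len(genotype)

# Hypothetical individual typed at three classical class I loci
person = {
    "HLA-A": ("A*02:01", "A*24:02"),   # heterozygous
    "HLA-B": ("B*07:02", "B*07:02"),   # homozygous
    "HLA-C": ("C*07:01", "C*04:01"),   # heterozygous
}
print(round(mhc_heterozygosity(person), 3))  # 0.667
```

Under the paper's hypothesis, higher values of this score (greater heterozygosity) should predict later self-reported pubertal timing.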

Correlations between immunocompetence and pubertal timing could reflect the linked heritability of both traits, common developmental underpinnings50, or pleiotropic effects of MHC genes, which could influence both immunocompetence and sexual maturation. Indeed, some research has shown that MHC class II expression occurs alongside maturation of the adrenal cortex51. Interestingly, dehydroepiandrosterone (DHEA), which is produced by the adrenal cortex and affects aspects of reproductive development, has been implicated in immune function in humans and other species52,53.

LHT offers a framework to explain why immunocompetence and pubertal timing may be related at a functional level: individuals with reduced extrinsic mortality risk due to lower vulnerability to pathogens may be able to continue growth and delay sexual maturation and reproduction. If so, then selection should favor mechanisms, potentially including pleiotropy and genetic linkage, that couple immunocompetence and the timing of sexual maturation. This possibility aligns with some research on intraspecific differences in life history. For example, Tasmanian devil populations affected by an infectious facial tumor disease had a 16-fold higher chance of reaching sexual maturity at an earlier age than usual26. In a study of 22 small-scale human societies, populations with higher extrinsic mortality risk displayed earlier puberty and reproduction, as well as shorter adult height and life expectancy5.

Future work must reconcile research showing opposing patterns, such as the association between perinatal HIV infection and slower pubertal maturation54. Perhaps the distinction lies in genetic versus acquired factors affecting immunocompetence, or in environmental factors (e.g., food energy availability or safety/survival rates), which might also influence luteinizing hormone (LH) release in diverse human populations55. For example, malnutrition has been linked to delayed pubertal maturation in humans44. Accordingly, future research should consider energy availability in the environment as a potentially important moderator of the link between MHC and pubertal development. For instance, the influence of infectious burden on energy availability may be lower in populations with energy abundance and substantial health infrastructure, such as Western industrialized nations.

Our findings also help explain why MHC-heterozygous men have been found to be taller in adulthood56. Height is driven by long bone growth via chondrogenesis at the growth plate57, and epiphyseal fusion at puberty terminates growth. Later pubertal maturation allows more long bone growth before epiphyseal fusion, resulting in taller adult height18; therefore, heterozygous individuals may be taller because they begin puberty later. Future research could test whether pubertal timing mediates the relationship between MHC heterozygosity and adult height. It may also be useful to examine the potential moderating role of early environmental stressors in the MHC–pubertal timing link. From this perspective, developmental plasticity gives rise to an array of phenotypes that emerge in response to specific local social and ecological conditions58. These variants are putatively adaptive insofar as they contribute to greater fitness in the environments in which they manifest. Accordingly, an interaction between HLA homozygosity and early life stressors may be a stronger predictor of pubertal timing than either variable alone.

Within the context of LHT, some researchers have predicted that greater investment in immunocompetence should correspond with later sexual maturation. Although previous research linking early pubertal maturation to a diverse range of health problems supports this notion, this is the first research to demonstrate a correlation between MHC heterozygosity and later recalled pubertal development. Such a link has important implications for understanding the development of puberty-linked physical and mental health outcomes. These results suggest that variation in genetic influences on pubertal timing may reflect a trade-off between somatic growth and maintenance, on the one hand, and reproduction, on the other, at least in energy-rich environments. However, within the broader context of well-established positive links between environmental condition and earlier (rather than later) pubertal timing, these findings imply that understanding variability in reproductive effort will likely require examining more complex interactions between genetics and local ecological conditions.