Monday, September 12, 2022

Rolf Degen summarizing... Neuroscience's cherished idea that the dorsolateral prefrontal cortex is crucially involved in the exertion of self-control flunks the replication test

Can we have a second helping? A preregistered direct replication study on the neurobiological mechanisms underlying self-control. Christin Scholz, Hang-Yee Chan, Russell A. Poldrack, Denise T. D. de Ridder, Ale Smidts, Laura Nynke van der Laan. Human Brain Mapping, September 9 2022. https://doi.org/10.1002/hbm.26065

Abstract: Self-control is of vital importance for human wellbeing. Hare et al. (2009) were among the first to provide empirical evidence on the neural correlates of self-control. This seminal study profoundly impacted theory and empirical work across multiple fields. To solidify the empirical evidence supporting self-control theory, we conducted a preregistered replication of this work. Further, we tested the robustness of the findings across analytic strategies. Participants underwent functional magnetic resonance imaging while rating 50 food items on healthiness and tastiness and making choices about food consumption. We closely replicated the original analysis pipeline and supplemented it with additional exploratory analyses to follow up on unexpected findings and to test the sensitivity of results to key analytical choices. Our replication data provide support for the notion that decisions are associated with a value signal in ventromedial prefrontal cortex (vmPFC), which integrates relevant choice attributes to inform a final decision. We found that vmPFC activity was correlated with goal values regardless of the amount of self-control and it correlated with both taste and health in self-controllers but only taste in non-self-controllers. We did not find strong support for the hypothesized role of left dorsolateral prefrontal cortex (dlPFC) in self-control. The absence of statistically significant group differences in dlPFC activity during successful self-control in our sample contrasts with the notion that dlPFC involvement is required in order to effectively integrate longer-term goals into subjective value judgments. Exploratory analyses highlight the sensitivity of results (in terms of effect size) to the analytical strategy, for instance, concerning the approach to region-of-interest analysis.

4 DISCUSSION

Hare et al. (2009) were among the first to provide empirical evidence on the neural correlates of self-control. Since then, this seminal study has had profound impact on theory and empirical work across multiple fields, but it has never been directly replicated. We performed a preregistered, direct replication of this experiment with two goals: (1) to further strengthen the evidence base for self-control theory and research, and (2) to test the robustness of the original results across analytical choices. The results of the four key hypothesis tests are summarized in Table 2.

TABLE 2. Hypothesis test overview
Hypothesis (quoted from Hare et al., 2009, p. 646) | Replication finding
1. [Activity] in vmPFC should be correlated with participants' goal values regardless of whether or not they exercise self-control | Supported
2. [A]ctivity in the vmPFC should reflect the health ratings in the SC group but not in the NSC group. | Supported, with reservations
3. [T]he dlPFC should be more active during successful than failed self-control trials. | Not supported
4. dlPFC and vmPFC should exhibit functional connectivity during self-control trials. | Mixed evidence
Abbreviations: dlPFC, dorsolateral prefrontal cortex; NSC, non-self-controllers; SC, self-controllers; vmPFC, ventromedial prefrontal cortex.

Our data provide further support for the now widely accepted notion that decisions are associated with a value signal in vmPFC, which integrates relevant choice attributes to inform a final decision (Hypotheses 1 and 2; Table 2). Specifically, like Hare et al. (2009), we found positive correlations between participants' goal values (choices for food items) and activity within vmPFC, regardless of whether participants exercised self-control. We were also able to replicate findings reported in the original study in support of the idea that vmPFC prioritizes choice attributes that are consistent with each individual's subjective values. Specifically, as in the original study, activity in vmPFC was associated with the perceived healthiness of food items in participants who were relatively more successful at exercising self-control in the experimental task, but not in participants who were relatively less successful. However, we did not find evidence of a significant difference between the two groups. Overall, these results are in line with a broader neuroeconomics literature describing the role of vmPFC in valuation across diverse types of stimuli (e.g., money and consumer goods; for a review, see Bartra et al., 2013). The present study is the first to provide a direct replication of this effect in the context of food-related decision-making. This replication therefore increases confidence in choice models that describe self-control as a value-based choice (Berkman et al., 2017).

In addition to replicating the originally reported analyses, we added several analysis branches to further test the robustness of these results. First, in a follow-up to the whole-brain search for brain regions associated with goal value (Figure 4), Hare et al. (2009) highlight that individual scale points of goal value (−2 to 2) are neatly distinguished in a step-wise pattern in their vmPFC ROI, suggesting that the ROI can be used to precisely distinguish and predict choices. However, the original analysis approach was optimized to demonstrate this effect and requires individual-level choice data to identify individual peak voxels within a larger vmPFC ROI. Moreover, this analysis supports only the limited conclusion that, on average, most study participants show this step-wise encoding of goal value in at least one voxel within a larger vmPFC area. We added an alternative analysis approach by averaging the signal extracted from all voxels within the vmPFC ROI in which activity was associated with goal value in our replication sample. We show that the step-wise encoding of choice behavior is largely preserved in this more general analysis, but that the effect size is substantially smaller. Similarly, when examining relationships between health and taste ratings and the average signal within vmPFC, we do not find significant encoding of health ratings in the SC group, despite the relatively large size of this replication sample. In other words, future studies that reuse these vmPFC ROIs as indicators of goal value, without the luxury of an individual-level localizer task to identify each person's peak voxels, likely require a much larger sample to be appropriately powered than the original publication implies.
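Why the two ROI strategies can yield such different effect sizes can be illustrated with a toy simulation. This is a hypothetical sketch, not the authors' pipeline and not their data: it assumes an "ROI" of 200 voxels in which only 20 carry a goal-value effect (on the −2 to 2 scale) and all carry Gaussian noise. It fits an ordinary least-squares slope per voxel and compares the single best voxel (a stand-in for peak-voxel selection) with one slope fitted to the ROI-averaged signal. Because OLS is linear in the outcome, the averaged-signal slope equals the mean of the per-voxel slopes, so it can never exceed the peak-voxel slope. When the peak is also chosen on the same noisy data, its estimate is further inflated by selection.

```python
import random
import statistics

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate_roi(n_voxels=200, n_active=20, true_slope=0.5, noise_sd=1.0, reps=10, seed=1):
    """Hypothetical ROI: only the first `n_active` voxels encode goal value; all voxels carry noise."""
    rng = random.Random(seed)
    # Goal-value ratings on the -2..2 scale, each level repeated `reps` times (assumed design).
    goal_value = [v for v in range(-2, 3) for _ in range(reps)]
    voxels = []
    for i in range(n_voxels):
        slope = true_slope if i < n_active else 0.0
        voxels.append([slope * gv + rng.gauss(0.0, noise_sd) for gv in goal_value])
    return goal_value, voxels

goal_value, voxels = simulate_roi()
voxel_slopes = [ols_slope(goal_value, signal) for signal in voxels]

# Strategy A: keep the voxel with the largest estimated effect (peak-voxel selection).
peak_slope = max(voxel_slopes)

# Strategy B: average the signal over all ROI voxels, then fit a single slope.
roi_mean_signal = [statistics.fmean(trial) for trial in zip(*voxels)]
avg_slope = ols_slope(goal_value, roi_mean_signal)

print(f"peak-voxel slope:  {peak_slope:.3f}")
print(f"ROI-average slope: {avg_slope:.3f}")
```

Changing `n_active` or `noise_sd` shows how strongly the gap depends on how many voxels in the ROI actually encode the signal, which is one plausible reason ROI-averaged effects call for larger samples.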

Furthermore, in addition to vmPFC and in contrast to the original study, we identified positive associations between goal value and activity in clusters within the striatum at the relatively lenient statistical threshold used in the original study (p < .001, uncorrected). This discovery is likely a function of the increased power of the larger replication sample and is largely in line with the neuroeconomics literature on subjective valuation, which regularly identifies clusters in both vmPFC and striatum (Bartra et al., 2013). Following up on this finding, we found some evidence of differentiation between individual levels of goal value even within our caudate ROI when applying the optimized analysis procedure reported in the original study. This adds to prior work suggesting that vmPFC is not the exclusive locus of goal value representation in the human brain.

We did not find strong evidence in support of the second set of hypotheses proposed by Hare et al. (2009) (Hypotheses 3 and 4; Table 2), which highlight the role of left dlPFC in self-control. First, we examined average activity levels in left dlPFC. Even though there were clear (and replicated) behavioral differences between participants who were relatively more and those who were relatively less successful at exercising self-control in the scanner task, a whole-brain analysis did not reveal the hypothesized, statistically significant group differences in dlPFC activity during successful self-control trials. Instead, we observed relative deactivation across multiple brain regions in NSC relative to SC, including, but not limited to, areas involved in processing subjective value such as vmPFC. One alternative hypothesis supported by our data is thus that SC do not rely on more intensive executive processing, indicated by higher dlPFC activity, to downregulate subjective value in self-control situations; rather, they simply perceive less subjective value in “tempting” food items to begin with. Another explanation is that this null finding reflects limited power in our data, given that only 15 participants (compared to 19 in the original sample) qualified as SC. In other words, positive activations in dlPFC during self-control may simply be more subtle than the deactivation observed in value-related areas. Although we cannot conclusively disentangle these competing explanations, note that all coefficients we found within dlPFC in this sample were negative (although nonsignificant).

Next, we followed the procedures reported by Hare et al. (2009) to examine the role of dlPFC in self-control in terms of its functional connectivity with vmPFC. Since we were unable to identify a functionally defined dlPFC cluster in which average activity was involved in self-control in the replication sample, we relied on a meta-analytically defined map from www.neurosynth.org (Yarkoni et al., 2011) associated with the term “self-control” and intersected it with an anatomical left dlPFC mask. Our analyses that fully replicated the original work, focusing exclusively on participants who were relatively more successful SC during the scanner task, did not replicate the original finding of a negative indirect relationship between dlPFC and vmPFC activity through IFG/BA46 during self-control. We followed up on this null result by rerunning the psychophysiological interaction (PPI) analysis on the full sample of participants who exercised any self-control in the scanner task (N = 59) to address concerns about statistical power. We chose this path given the absence of strong theoretical arguments that the mechanisms driving successful self-control differ qualitatively (rather than just in intensity) between people who succeed more often and those who succeed less often. Indeed, in this larger sample, we do find some evidence of replication. Stronger still, we found evidence of direct, negative correlations between activity within our meta-analytic left dlPFC seed and an area within vmPFC, which was hypothesized, but not found, by Hare et al. (2009). It is important to note, however, that we simultaneously found evidence for unexpected positive associations between activity in the left dlPFC ROI and another, more dorsal MPFC cluster. Notably, the whole-brain table for this analysis in the original publication revealed a similar positive association with an MPFC cluster in almost the exact same location (see Figure 11).
While there was (minimal) overlap between the unexpected MPFC cluster that showed positive functional connectivity with left dlPFC and the vmPFC ROI associated with goal value in our sample, we found no such overlap for the vmPFC cluster that showed the hypothesized negative association with dlPFC. In other words, the first PPI provides, at best, mixed evidence regarding the nature of the relationship between dlPFC and vmPFC activity during self-control. Hare et al. (2009) followed up on the lack of a negative direct association between dlPFC and vmPFC in their first PPI by using a BA46 cluster that was negatively associated with dlPFC as the seed region for a second PPI. Following this analysis approach, we were able to replicate the original findings: based on the full replication sample (N = 59), we identified a cluster in vmPFC that was positively associated with the BA46 seed identified in the first PPI, and thus indirectly negatively associated with the meta-analytic dlPFC ROI. In sum, our replication data provide mixed evidence with regard to Hypothesis 4 concerning a negative relationship between dlPFC and vmPFC activity during self-control.

These mixed results highlight the need for additional work to fully understand the role of the dlPFC in food-related decision-making, and in theories of self-control more generally. Overall, our findings are most in line with a conceptualization of self-control as a simple form of value-based decision-making, in which different choice attributes (here, health and taste considerations) are encoded and integrated in vmPFC according to the decision-maker's subjective values (Berkman et al., 2017). This contrasts with the model arising from the findings of Hare et al. (2009), in which longer-term goals (here, health considerations) require dlPFC involvement to be effectively integrated into subjective value judgments.

A frequently voiced explanation for failed replications is that the (cultural) context differed between the original and replication studies (Zwaan et al., 2018). In our case, the original study was performed in the United States before 2009 and the replication in the Netherlands approximately 10 years later. We are not aware of any strong theoretical or empirical claims that the brains or fundamental psychological processes surrounding self-control differ between US and Dutch study participants, or that the basic neural processes of valuation and self-control have changed over the past decade. However, what could differ between US and Dutch individuals, and what could have changed over the past decade, is the role of food and dieting in society: more specifically, the extent to which food choices generate a self-control conflict and how people cope with it. In theory, this may influence the way people respond to the task and stimuli. Naturally, for a self-control dilemma to occur, one must have the goal to diet or eat healthily. It could be argued that stronger goal commitment may strengthen attempts to overrule impulses and therefore amplify control-related responses. Observational studies have shown that the prevalence of dieting is higher in Europe than in the United States (Santos et al., 2017), and a large proportion of the Dutch population reports dieting or actively restraining their food intake (de Ridder et al., 2014). This argues against weaker goal commitment in the Dutch sample as an explanation for the null finding. It should be noted, however, that self-reports of dieting and dietary restraint have been shown to be unrelated or only weakly related to actual intake (de Ridder et al., 2014; Stice et al., 2004), which casts doubt on these measures as reliable proxies of goal strength. We can neither rule out nor confirm that goal commitment was stronger among the successful SC in the original study than in the current replication study.

Another important conclusion from this project is that analytical flexibility can influence fMRI results. Specifically, for H1 and H2 we presented two sets of results produced using two different analysis strategies. While the overall pattern of results remained similar, increasing confidence in the direction of effects, effect sizes differed substantially. This has important implications for follow-up research that may rely on existing work for power calculations. Previous work has shown that not only analytical flexibility but also different preprocessing approaches (e.g., different software packages and varying parameters) may affect task-based fMRI results (Bowring et al., 2022; Mikl et al., 2008; Triana et al., 2020). In this replication study, we employed a state-of-the-art, standardized, and optimized preprocessing pipeline provided by fMRIPrep, which was not available to the authors of the original study (Esteban, Markiewicz, et al., 2018). Where possible, we chose parameters similar to those used in the original study (e.g., the same smoothing kernel). Although running the data through different preprocessing pipelines was outside the scope of the current study, we acknowledge that doing so could further inform the field about the (in)variability of individual results to specific choices made by researchers. The unpreprocessed data for this project are available on OpenNeuro for anyone interested in such an investigation.

4.1 Impact on theory

Our findings are relevant for future theorizing on self-control. Specifically, this replication data set supports the conceptualization of self-control either as a very simple form of value-based decision-making (Berkman et al., 2017) or as automatic, “effortless” self-control (Gillebaart & de Ridder, 2015), rather than as a dual-system process involving conscious, effortful control.

In psychology, self-control has traditionally been explained with dual-system theories (e.g., Hofmann et al., 2008; Metcalfe & Mischel, 1999). These theories are characterized by the notion of two (competing) systems for processing information, namely a “hot”/automatic/impulsive system and a “cold”/rational/reflective system. According to these dual-system models, self-control is successful when the impulses arising from the “hot” system are overcome and, consequently, behavior is in line with long-term goals. In this traditional approach, the dilemma first must be identified and, subsequently, effortful and conscious inhibition is required to overcome it (Fujita, 2011). A neurobiological parallel to these dual-system models has been proposed in which self-control involves a balance between brain regions representing the reward, salience and emotional value of a stimulus and prefrontal regions associated with (effortful) inhibition and cognitive control (Heatherton & Wagner, 2011). In this traditional perspective, effortful and conscious impulse inhibition is a necessary or defining feature of (successful) self-control.

A major criticism of this traditional perspective is that successful self-control does not always require effortful inhibition or conscious control. It has been proposed that there are many different routes to self-control, only some of which involve effortful inhibition (Fujita, 2011). Research has indicated that people can automate goal-striving behaviors in response to contextual cues (Bargh et al., 2001; Chartrand & Bargh, 1996). For instance, providing cues related to a long-term goal (e.g., dieting cues) promotes goal-congruent choices through goal priming (Fishbach et al., 2003; Papies, 2016; Van der Laan et al., 2017), which is thought to occur without conscious deliberation or effort. Further, systematically repeating healthy behaviors can create healthy habits. It has been shown that successful SC do not necessarily exert more effort; rather, they perform healthy behaviors automatically because of healthy habits (Galla & Duckworth, 2015; Gillebaart, 2018).

This has led to alternative conceptualizations of self-control that omit, or at least attenuate, the role of effortful inhibition. As mentioned, successful self-control has recently been conceptualized as at least partly an automatic process, in which responses to environmental cues are routinized (or automatically triggered) in the direction of one's long-term goals (Fujita, 2011; Gillebaart, 2018). A second theory, which has recently gained more traction, considers self-control as a simple value-based choice (Berkman et al., 2017). Value-based decision-making involves choosing an option from a set based on its relative subjective value. This process involves calculating a value for each option by evaluating its various attributes, such as gains (e.g., improved health) and costs (e.g., less food enjoyment), assigning weights to these attributes, and enacting the most valued option. Notably, this is a dynamic process: the weight of each attribute is sensitive to attentional shifts (e.g., being explicitly guided toward certain attributes like health), contextual effects, and the framing of the choice set. Within this conceptualization, there is nothing special about long-term goals: attributes related to short- and long-term goals are treated similarly in the computation, though their relative weights may differ depending on the aforementioned factors. This discussion in psychology intersects with an ongoing debate in decision neuroscience about temporal discounting, in which Kable and Glimcher (2007) suggested a single common valuation system in vmPFC, whereas McClure et al. (2004) suggested that separate neural systems encode value for immediate versus longer-term attributes.
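The value-based view can be made concrete with a minimal sketch. The assumptions here are illustrative and not taken from the studies discussed: each food's value is a linear weighted sum of hypothetical taste and health ratings (−2 to 2), a food is accepted when its value exceeds that of a default option valued at 0, and the weights are made-up numbers. A "self-controller" is modeled only as someone with a larger health weight; nothing in the computation treats the long-term attribute as special.

```python
from dataclasses import dataclass

@dataclass
class Food:
    name: str
    taste: float   # hypothetical rating on a -2..2 scale
    health: float  # hypothetical rating on a -2..2 scale

def subjective_value(food: Food, w_taste: float, w_health: float) -> float:
    """Weighted sum of attributes; short- and long-term attributes enter the same way."""
    return w_taste * food.taste + w_health * food.health

def choose(food: Food, w_taste: float, w_health: float, default_value: float = 0.0) -> bool:
    """Accept the food if its value exceeds that of the (hypothetical) default option."""
    return subjective_value(food, w_taste, w_health) > default_value

cake = Food("chocolate cake", taste=2.0, health=-2.0)
apple = Food("apple", taste=1.0, health=2.0)

# Illustrative weights only: the second profile simply weights health more heavily.
profiles = {"taste-driven": (1.0, 0.2), "health-weighted": (0.6, 0.8)}

for label, (w_t, w_h) in profiles.items():
    for food in (cake, apple):
        print(label, food.name, choose(food, w_t, w_h))
```

Under these assumed weights, the taste-driven profile accepts the cake while the health-weighted profile declines it, with no separate control step. Attentional or framing effects described in the text would correspond to shifting `w_taste` and `w_health`.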

The study by Hare et al. (2009) conceptualizes self-control as a value-based decision (H1, H2), but in line with traditional dual-system models it still posits dual motives in which the future component is “special”: integrating longer-term considerations into the value system, that is, changing the weight of long-term attributes, requires involvement from control-related areas (i.e., the dlPFC; H3, H4). Their hypothesis about the dlPFC was based on its role in cognitive control and emotion regulation. The authors speculated that vmPFC originally evolved to predict the short-term value of stimuli and that humans developed the ability to incorporate long-term considerations into values by giving structures such as the dlPFC the ability to modulate this value.

Our mixed findings regarding dlPFC involvement highlight the need for more research on the role of dlPFC in assigning weight to these longer-term consequences. The replication results instead point to a conceptualization of self-control either as automatic and “effortless” or as a (simple) form of value-based decision-making. At a minimum, our results support the idea that it is not the dlPFC that is responsible for increasing the weight of the longer-term attributes in the choice. In support of the latter, when comparing successful to unsuccessful self-control trials across all participants, we observed a deactivation of vmPFC, which suggests that successful self-control in this sample may be driven by a weaker subjective value for a given food item rather than by more intensive control driven by dlPFC. The finding that, in successful SC, vmPFC reflects health ratings even though dlPFC is not active suggests that dlPFC activation is not needed to incorporate health into the vmPFC value signal. Thus, in line with the proposition of self-control as a simple form of value-based decision-making (Berkman et al., 2017), decisions may simply be the result of multiple single value calculations.

Ten years after a decisive court ruling, we are not able to identify economically or statistically significant effects of corporate political spending on state tax policy, including tax rates, discretionary tax breaks, and tax revenues

Corporate Political Spending and State Tax Policy: Evidence from Citizens United. Cailin R. Slattery, Alisa Tazhitdinova & Sarah Robinson. NBER Working Paper 30352. August 2022. DOI 10.3386/w30352

Abstract: To what extent is U.S. state tax policy affected by corporate political contributions? The 2010 Supreme Court Citizens United v. Federal Election Commission ruling provides an exogenous shock to corporate campaign spending, allowing corporations to spend on elections in 23 states which previously had spending bans. Ten years after the ruling and for a wide range of outcomes, we are not able to identify economically or statistically significant effects of corporate independent expenditures on state tax policy, including tax rates, discretionary tax breaks, and tax revenues.


Is Adolescent Bullying an Evolutionary Adaptation That Confers Fitness Benefits?

Is Adolescent Bullying an Evolutionary Adaptation? A 10-Year Review. Anthony A. Volk, Andrew V. Dane & Elizabeth Al-Jbouri. Educational Psychology Review, Sep 6 2022. https://link.springer.com/article/10.1007/s10648-022-09703-3

Abstract: Bullying is a serious behavior that negatively impacts the lives of tens of millions of adolescents across the world every year. The ubiquity of bullying, and its stubborn resistance toward intervention effects, led us to propose in 2012 that adolescent bullying might be an evolutionary adaptation. In the intervening years, a substantial amount of research has arisen to address this question. Therefore, the goal of this review is to consider whether evidence continues to support an evolutionary perspective that bullying is an adaptation that remains adaptive for some individuals in favorable contexts. In addition, we consider new ideas related to this hypothesis, explore how an evolutionary theory of bullying intersects with other influential perspectives, including ecological and social learning theories, and discuss applied implications for interventions. Our review of the evidence published since our 2012 paper provides very consistent and strong support for the hypothesis that adolescent bullying is, at least in part, an evolutionary adaptation that is currently adaptive regarding at least five evolutionarily relevant functions (the Five “Rs”): Reputation, Resources, deteRrence, Recreation, and Reproduction. We note that bullying is a facultative adaptation that is conditionally adaptive, subject to cost–benefit analyses. Finally, we discuss how an evolutionary theory of bullying frequently complements alternative theories of adolescent bullying rather than conflicting or competing with them. An interdisciplinary approach to bullying that includes evolutionary theory is thus likely to afford stronger options for both research and prevention efforts.


“Consumed by Creed”: Obsessive-compulsive Symptoms Underpin Ideological Obsession and Support for Political Violence

Adam-Troian, Jais, and Jocelyn Belanger. 2022. “‘Consumed by Creed’: Obsessive-compulsive Symptoms Underpin Ideological Obsession and Support for Political Violence.” PsyArXiv. September 4. doi:10.31234/osf.io/tcrd9

Abstract: Radicalization is a process by which individuals are introduced to an ideological belief system that encourages political, religious, or social change through the use of violence. Here, we formulate an obsessive-compulsive disorder (OCD) model of radicalization that links Obsessive Passion (one of the best predictors of radical intentions) to a larger body of clinical research. The model’s central tenet is that OCD tendencies shape radical intentions via their influence on Obsessive Passion. Across four ideological samples in the United States (Environmental activists, Republicans, Democrats, and Muslims, N = 1,114), we found direct effects between OCD symptoms and radical intentions, as well as indirect effects of OCD on radical intentions via Obsessive Passion. Even after controlling for potential clinical confounds (e.g., adverse childhood experiences, anxiety, depression, substance abuse), these effects remained robust, implying that OCD plays a significant role in the formation of violent ideological intentions and opening up new avenues for the treatment and prevention of violent extremism. We discuss the implications of conceptualizing radicalization as an OCD-like disorder with compulsive violent tendencies and ideology-related concerns.


Sunday, September 11, 2022

We find a positive effect of political preferences heterogamy on union dissolution; in addition, diverging opinions on the Brexit referendum are associated with higher chances of partnership break-up

Arpino, Bruno, and Alessandro Di Nallo. 2022. “Sleeping with the Enemy. Partners’ Political Attitudes and Risk of Separation.” SocArXiv. September 9. doi:10.31235/osf.io/w8etr

Abstract: Does politics conflict with love? We aim at answering this question by examining the effect on union dissolution of partners’ (mis)match on political preferences, defined as self-reported closeness, intention to vote, or vote for a specific party. Previous studies argued that partners’ heterogamy may increase risk of union dissolution because of differences among partners in lifestyles, attitudes, and beliefs, and/or because of disapproval from family and community members. We posit that similar arguments can apply to political heterogamy and test the effect of this new heterogamy dimension using UK data from the British Household Panel Study (BHPS) and the UK Household Longitudinal Study (UKHLS). The data offer a unique opportunity to disentangle the role of heterogamy by political preferences from the effects of heterogamies in other domains (e.g., ethnicity and religiosity) and from that of other partners’ characteristics, while also covering a long period of time (from 1991 to 2021). The data also allow us to implement a more specific analysis about the referendum on UK’s permanence in the European Union (known as the Brexit referendum). We find a positive effect of political preferences heterogamy on union dissolution. In addition, diverging opinions on the Brexit referendum are associated with higher chances of partnership break-up.


The Effect of Taboo Language and Gesture on the Experience of Pain; against common opinion, it seems these effects are likely not due to changes in state aggression

F@#k Pain! The Effect of Taboo Language and Gesture on the Experience of Pain. Autumn B. Hostetter, Dominic Knight Rascon-Powell. Psychological Reports, September 8, 2022. https://doi.org/10.1177/00332941221125776

Abstract: Swearing has been shown to reduce the experience of pain in a cold pressor task, and the effect has been suggested to be due to state aggression. In the present experiment, we examined whether producing a taboo gesture (i.e., the American gesture of raising the middle finger) reduces the experience of pain similar to the effect that has been shown for producing a taboo word. 111 participants completed two cold pressor trials in a 2 (Language vs. Gesture) × 2 (Taboo vs. Neutral) mixed design. We found that producing a taboo act in either language or gesture increased pain tolerance on the cold pressor task and reduced the experience of perceived pain compared to producing a neutral act. We found no changes in state aggression or heart rate. These results suggest that the pain-reducing effect of swearing is shared by taboo gesture and that these effects are likely not due to changes in state aggression.

Keywords: pain, profanity, swearing, gesture, hypoalgesic


Saturday, September 10, 2022

Behavioral scientists are consistently no better than, and often worse than, simple heuristics and models; why have markets & experience not eliminated their biases entirely?

Simple models predict behavior at least as well as behavioral scientists. Dillon Bowen. arXiv, August 3, 2022. https://arxiv.org/abs/2208.01167

Abstract: How accurately can behavioral scientists predict behavior? To answer this question, we analyzed data from five studies in which 640 professional behavioral scientists predicted the results of one or more behavioral science experiments. We compared the behavioral scientists’ predictions to random chance, linear models, and simple heuristics like “behavioral interventions have no effect” and “all published psychology research is false.” We find that behavioral scientists are consistently no better than, and often worse than, these simple heuristics and models. Behavioral scientists’ predictions are not only noisy but also biased. They systematically overestimate how well behavioral science “works”: overestimating the effectiveness of behavioral interventions, the impact of psychological phenomena like time discounting, and the replicability of published psychology research.

Keywords: Forecasting, Behavioral science

3 Discussion
Critical public policy decisions depend on predictions from behavioral scientists. In this paper, we asked how accurate those predictions are. To answer this question, we compared the predictions of 640 behavioral scientists to those of simple mathematical models on five prediction tasks. Our sample included a variety of behavioral scientists: economists, psychologists, and business professionals from academia, industry, and government. The prediction tasks also covered various domains, including text-message interventions to increase vaccination rates, behavioral nudges to increase exercise, randomized controlled trials, incentives to encourage effort, and attempts to reproduce published psychology studies. The models to which we compared the behavioral scientists were deliberately simple, such as random chance, linear interpolation, and heuristics like “behavioral interventions have no effect” and “all published psychology research is false.” We consistently found that behavioral scientists are no better than, and often worse than, these simple heuristics and models. In the exercise, flu, and RCT studies, null models significantly outperformed behavioral scientists. These null models assume that behavioral treatments have no effect; behavioral interventions will not increase weekly gym visits, text messages will not increase vaccination rates, and nudges will not change behavior. As we can see in Table 1, compared to behavioral scientists, null models are nearly indistinguishable from the oracle. In the effort study, linear interpolations performed at least as well as professional economists. These interpolations assumed that all psychological phenomena are inert; people do not exhibit risk aversion, time discounting, or biases like framing effects. In the reproducibility study, professional psychologists’ Brier scores were virtually identical to those of a null model, which assumed that all published psychology research is false.
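The Brier score mentioned above is the mean squared difference between a probability forecast and a 0/1 outcome (here, 1 = the study replicated), so lower is better. A useful consequence: the “all published psychology research is false” null model always forecasts 0, so its Brier score is simply the share of studies that replicated. The minimal Python sketch below uses invented outcomes and forecasts, purely for illustration and not the paper's data, to show how an optimistic forecaster can score worse than that null model.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


# Hypothetical replication outcomes for six studies: 1 = replicated, 0 = did not.
outcomes = [1, 0, 0, 1, 0, 0]
# Hypothetical optimistic forecaster who expects most studies to replicate.
optimist = [0.8, 0.8, 0.7, 0.9, 0.8, 0.7]
# Null model: "all published psychology research is false" (probability 0 for every study).
null_model = [0.0] * len(outcomes)

print(f"optimist:   {brier_score(optimist, outcomes):.3f}")
print(f"null model: {brier_score(null_model, outcomes):.3f}")
```

With two of six studies replicating, the null model scores 2/6, while the optimist's confident misses on the four failed studies push its score above that.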
Professional psychologists were significantly worse than both linear regression and random chance. Notably, the linear regression model used data from the reproducibility study, which were not accessible to psychologists during their participation. While this is not a fair comparison, we believe it is a useful comparison, as the linear regression model can serve as a benchmark for future attempts to predict reproducibility. Why is it so hard for behavioral scientists to outperform simple models? One possible answer is that human predictions are noisy while model predictions are not [Kahneman et al., 2021]. Indeed, there is likely a selection bias in the prediction tasks we analyzed. Recall that most of the prediction tasks asked behavioral scientists to predict the results of ongoing or recently completed studies. Behavioral scientists presumably spend time researching questions that have not been studied exhaustively and do not have obvious answers. In this case, the prediction tasks were likely exceptionally challenging, and behavioral scientists’ expertise would be of little use. However, behavioral scientists’ predictions are not only noisy but also biased. Previous research noted that behavioral scientists overestimate the effectiveness of nudges [DellaVigna and Linos, 2022, Milkman et al., 2021]. Our research extends these findings, suggesting that behavioral scientists believe behavioral science generally “works” better than it does. Behavioral scientists overestimated the effectiveness of behavioral interventions in the exercise, flu, and RCT studies. In the exercise study, behavioral scientists significantly overestimated the effectiveness of all 53 treatments, even after correcting for multiple testing. Economists overestimated the impact of psychological phenomena in the effort study, especially for motivational crowding out, time discounting, and social preferences. 
Finally, psychologists significantly overestimated the replicability of published psychology research in the reproducibility study. In general, behavioral scientists overestimate not only the effect of nudges, but also the impact of psychological phenomena and the replicability of published behavioral science research. Behavioral scientists’ bias can have serious consequences. A recent study found that policymakers were less supportive of an effective climate change policy (carbon taxes) when a nudge solution was also available [Hagmann et al., 2019]. However, accurately disclosing the nudge’s impact shifted support back towards carbon taxes and away from the nudge solution. In general, when behavioral scientists exaggerate the effectiveness of their work, they may drain support and resources from potentially more impactful solutions. Our results raise many additional questions. For example, is it only behavioral scientists who are biased, or do people, in general, overestimate how well behavioral science works? The general public likely has little exposure to RCTs, social science experiments, and academic psychology publications, so there is no reason to expect that they are biased in either direction. Then again, the little exposure they have had likely gives an inflated impression of behavioral science’s effectiveness. For example, a TED talk with 64 million views as of May 2022 touted the benefits of power posing, whereby one can reap the benefits of improved self-confidence and become more likely to succeed in life by adopting a powerful pose for one minute [Carney et al., 2010, Cuddy, 2012]. However, the power posing literature was based on p-hacked results [Simmons and Simonsohn, 2017], and researchers have since found that power posing yields no tangible benefits [Jonas et al., 2017]. Additionally, people may generally overestimate effects due to the “What you see is all there is” (WYSIATI) bias [Kahneman, 2011].
For example, the exercise study asked behavioral scientists to consider, among other treatments, how much more people would exercise if researchers told them they were “gritty.” After the initial “gritty diagnosis,” dozens of other factors determined how often participants in that condition went to the gym during the following four-week intervention period. Work schedule, personal circumstances, diet, mood changes, weather, and many other factors also played key roles. These other factors may not have even crossed the behavioral scientists’ minds. The WYSIATI bias may have caused them to focus on the treatment and ignore the noise of life that tempers the treatment’s signal. Of course, this bias is likely to cause everyone, not only behavioral scientists, to overestimate the effectiveness of behavioral interventions and the impact of psychological phenomena. If people generally overestimate how well behavioral science works, are they more or less biased than behavioral scientists? Experimental economics might suggest that behavioral scientists are less biased because people with experience tend to be less biased in their domain of expertise. For example, experienced sports card traders are less susceptible to the endowment effect [List, 2004], professional traders exhibit less ambiguity aversion than novices [List and Haigh, 2010], experienced bidders are immune to the winner’s curse [Harrison and List, 2008], and CEOs who regularly make high-stakes decisions are less susceptible to possibility and certainty effects [List and Mason, 2011]. Given that most people have zero experience with behavioral science, they should be more biased than behavioral scientists. Then again, there are at least three reasons to believe that behavioral scientists should be more biased than the general population: selection bias, selective exposure, and motivated reasoning. First, behavioral science might select people who believe in its effectiveness. 
On the supply side, students who apply to study psychology for five years on a measly PhD stipend are unlikely to believe that most psychology publications fail to replicate. On the demand side, marketing departments and nudge units may be disinclined to hire applicants who believe their work is ineffective. Indeed, part of the experimental economics argument is that markets filter out people who make poor decisions [List and Millimet, 2008]. The opposite may be true of behavioral science: the profession might filter out people with an accurate assessment of how well behavioral science works. Second, behavioral scientists are selectively exposed to research that finds large and statistically significant effects. Behavioral science journals and conferences are more likely to accept papers with significant results. Therefore, most of the literature behavioral scientists read promotes the idea that behavioral interventions are effective and psychological phenomena substantially influence behavior. However, published behavioral science research often fails to replicate. Lack of reproducibility plagues not only behavioral science [Open Science Collaboration, 2012, 2015, Camerer et al., 2016, Mac Giolla et al., 2022] but also medicine [Freedman et al., 2015, Prinz et al., 2011], neuroscience [Button et al., 2013], and genetics [Hewitt, 2012, Lawrence et al., 2013]. Scientific results fail to reproduce for many reasons, including publication bias, p-hacking, and fraud [Simmons et al., 2011, Nelson et al., 2018]. Indeed, most evidence that behavioral scientists overestimate how well behavioral science works involves asking them to predict the results of nudge studies. However, there is little to no evidence that nudges work after correcting for publication bias [Maier et al., 2022]. Even when a study successfully replicates, the effect size in the replication study is often much smaller than that reported in the original publication [Camerer et al., 2016, Open Science Collaboration, 2015].
For example, the RCT study paper estimates that the academic literature overstates nudges’ effectiveness by a factor of six [DellaVigna and Linos, 2022]. Finally, behavioral scientists might be susceptible to motivated reasoning [Kunda, 1990, Epley and Gilovich, 2016]. As behavioral scientists, we want to believe that our work is meaningful, effective, and true. Motivated reasoning may also drive selective exposure [Bénabou and Tirole, 2002]. We want to believe our work is effective, so we disproportionately read about behavioral science experiments that worked. Our analysis finds mixed evidence on the relationship between experience and bias in behavioral science. The RCT study informally examined the relationship between experience and bias for behavioral scientists predicting nudge effects and concluded that more experienced scientists were less biased. While we also estimate that more experienced scientists are less biased, we do not find statistically significant pairwise differences between the novice, moderately experienced, and most experienced scientists. Even if the experimental economics argument is correct that behavioral scientists are less biased than the general population, why are behavioral scientists biased at all? The experimental economics literature identifies two mechanisms to explain why more experienced people are less biased [List, 2003, List and Millimet, 2008]. First, markets filter out people who make poor decisions. Second, experience teaches people to think and act more rationally. We have already argued that the first mechanism might not apply to behavioral science. And, while our results are consistent with the hypothesis that behavioral scientists learn from experience, they still suggest that even the most experienced behavioral scientists overestimate the effectiveness of nudges. The remaining bias for the most experienced scientists is larger than the gap between the most experienced scientists and novices.
Why has experience not eliminated this bias entirely? Perhaps the effect of experience competes with the forces of “What you see is all there is,” selection bias, selective exposure, and motivated reasoning such that experience mitigates but does not eliminate bias in behavioral science. Finally, how can behavioral scientists better forecast behavior? One promising avenue is to use techniques that help forecasters predict political events [Chang et al., 2016, Mellers et al., 2014]. For example, the best political forecasters begin with base rates and then adjust their predictions based on information specific to the event they are forecasting [Tetlock and Gardner, 2016]. Behavioral scientists’ predictions would likely improve by starting with the default assumptions that behavioral interventions have no effect, psychological phenomena do not influence behavior, and published psychology research has a one-in-three chance of replicating [Open Science Collaboration, 2012]. Even though these assumptions are wrong, they are much less wrong than what behavioral scientists currently believe.
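Why starting from a base rate pays off can be seen with a little arithmetic. If a forecaster gives every study the same probability p and a fraction r of studies actually replicates, the expected Brier score is E(p) = r(1 − p)² + (1 − r)p², which simplifies to r(1 − r) + (p − r)². It is therefore smallest at p = r, and every step away from the base rate costs (p − r). The sketch below takes r = 1/3, the one-in-three figure quoted above. Treating “random chance” as p = 0.5 is our assumption, and the optimistic p = 0.7 is a hypothetical value chosen for illustration.

```python
def expected_brier(p: float, base_rate: float) -> float:
    """Expected Brier score of always forecasting p when a fraction base_rate of events occur."""
    return base_rate * (1 - p) ** 2 + (1 - base_rate) * p ** 2


base_rate = 1 / 3  # one-in-three replication rate, as stated in the text

constant_forecasts = {
    "null model (p = 0)": 0.0,
    "base rate (p = 1/3)": base_rate,
    "coin flip (p = 0.5)": 0.5,
    "optimist (p = 0.7, hypothetical)": 0.7,
}
for label, p in constant_forecasts.items():
    print(f"{label:34s} {expected_brier(p, base_rate):.3f}")
```

Under these assumptions the ordering is base rate (2/9), coin flip (1/4), “all research is false” (1/3), and then the optimist. The optimist is the only forecaster here that is worse than assuming nothing replicates.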

Both laypersons & police officers were worse at detecting deception when judging handcuffed suspects compared to non-handcuffed suspects, while handcuffing did not affect their judgement bias; police officers were also overconfident in their judgements

Looking guilty: Handcuffing suspects influences judgements of deception. Mircea Zloteanu, Nadine L. Salman, Eva G. Krumhuber, Daniel C. Richardson. Journal of Investigative Psychology and Offender Profiling, September 7 2022. https://doi.org/10.1002/jip.1597

Abstract: Veracity judgements are important in legal and investigative contexts. However, people are poor judges of deception, often relying on incorrect behavioural cues when these may reflect the situation more than the sender's internal state. We investigated one such situational factor relevant to forensic contexts: handcuffing suspects. Judges, namely police officers (n = 23) and laypersons (n = 83), assessed recordings of suspects providing truthful and deceptive responses in an interrogation setting in which half of the suspects were handcuffed. Handcuffing was predicted to undermine efforts to judge veracity by constraining suspects' gesticulation and by priming stereotypes of criminality. It was found that both laypersons and police officers were worse at detecting deception when judging handcuffed suspects compared to non-handcuffed suspects, while handcuffing did not affect their judgement bias; police officers were also overconfident in their judgements. The findings suggest that handcuffing can negatively impact veracity judgements, highlighting the need for research on situational factors to better inform forensic practice.

7 DISCUSSION

The present research explored whether a situational factor related to interrogation procedures (i.e., the use of handcuffs on suspects) can negatively impact veracity judgements. Confirming our hypothesis, the handcuffing manipulation affected both laypersons' and police officers' ability to detect deception (i.e., H2 was supported; moderate effect size). Statements made by handcuffed suspects were harder to classify for both police officers and laypersons. Converting the handcuffing effect size (ξ = 0.37) to more intuitive estimates (as recommended by Fritz et al., 2012), we obtain a Number Needed to Treat (NNT) of 5.01. That is, for roughly every five suspects interviewed in handcuffs, we would expect one additional misclassification of veracity. Alternatively, based on the Common Language (CL) effect size, the probability that a suspect selected at random from the handcuffed condition is misclassified in terms of statement veracity compared to a suspect from the non-handcuffed condition is 64.3%. This decrease in accuracy was attributable to the study's manipulation affecting veracity discriminability rather than a shift in judgement response tendencies (H1 was not supported), as all judges remained truth-biased overall (H3 was not supported; NNT = 10.54, CL = 56.7%). For both judge groups, truths were easier to detect than lies (NNT = 12.02, CL = 55.9%; replicating the veracity effect; Levine et al., 1999).
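The NNT and CL pairs reported in this discussion hang together under a common normal-theory conversion. The idea is to treat each effect as a standardized mean difference d, with CL = Φ(d/√2) and NNT = 1/(Φ(d) − 0.5), where the latter assumes a 50% baseline rate. This is our reconstruction, not necessarily the authors' exact procedure: they start from ξ following Fritz et al. (2012), and the ξ-to-d step is not reproduced here. Still, the sketch below recovers every reported NNT from its CL to within about 0.1. For example, CL = 64.3% gives NNT ≈ 5.05, against the reported 5.01.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def d_from_cl(cl: float) -> float:
    """Standardized mean difference implied by a common-language effect size, CL = Phi(d / sqrt(2))."""
    return STD_NORMAL.inv_cdf(cl) * 2 ** 0.5


def cl_from_d(d: float) -> float:
    """Probability that a random case from one condition scores higher than one from the other."""
    return STD_NORMAL.cdf(d / 2 ** 0.5)


def nnt_from_d(d: float) -> float:
    """Number needed to treat, assuming a 50% baseline event rate: NNT = 1 / (Phi(d) - 0.5)."""
    return 1 / (STD_NORMAL.cdf(d) - 0.5)


# CL values from the discussion, each paired with the NNT the authors report.
reported = {0.643: 5.01, 0.567: 10.54, 0.559: 12.02, 0.702: 3.66, 0.622: 5.88, 0.636: 5.32}
for cl, nnt_reported in reported.items():
    d = d_from_cl(cl)
    print(f"CL = {cl:.3f}  d = {d:.2f}  NNT = {nnt_from_d(d):.2f} (reported {nnt_reported})")
```

The small discrepancies are consistent with the CL values having been rounded to one decimal place before being reported.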

Unsurprisingly, police officers did not perform better at judging veracity than laypersons (see Aamodt & Custer, 2006), and judging handcuffed suspects made this process even harder. However, the manipulation did not affect officers' response bias (H5 was not supported). This contrasts with research arguing for a veracity detection reversal in professionals (i.e., police officers showing higher lie detection but lower truth detection compared to laypersons; Meissner & Kassin, 2002). The similarity in response patterns with laypersons indicates that police officers were not overall more suspicious of suspects. This could, however, reflect the relatively junior sample of officers recruited (see Table 1) or, potentially, the fact that the “suspects” were naïve students, which may have mitigated lie bias towards them; note, though, that the instructions never mentioned the suspects' status.

More worryingly, and as predicted, police officers displayed higher confidence while being no more accurate than laypersons (i.e., H4 was supported; moderate-to-large effect size; NNT = 3.66, CL = 70.2%), even showing a trend towards lower accuracy (e.g., below-chance lie detection; NNT = 5.88, CL = 62.2%). This parallels findings of professionals tending to be overconfident in their veracity judgements (Aamodt & Custer, 2006; DePaulo & Pfeifer, 1986; Masip et al., 2016). While the police officers' level of experience may not have been sufficient to bias their judgements in the direction of a lie, it was able to increase their confidence in catching liars (e.g., Masip et al., 2016).

Overall, judges performed worse at discriminating veracity when viewing handcuffed suspects, supporting our assertions that situational factors can negatively impact the discriminability between deceptive and honest suspects (for a more detailed breakdown of the honesty scale data, see SI). Such effects may have serious ramifications for the forensic domain (Verschuere et al., 2016), especially when considering the already poor deception detection rates in the absence of the handcuffing manipulation. Interestingly, both laypersons and police officers were less confident in their judgements when they watched the handcuffed (vs. non-handcuffed) videos (NNT = 5.32, CL = 63.6%). Judges may have found deception detection more difficult when suspects were handcuffed, tempering their confidence.

These results illustrate that situational elements can impact the perception and judgement of both laypersons and police officers. Reducing the impact of such artificial factors could improve forensic practices and deception detection procedures, whilst reducing the risk of potential miscarriages of justice. Such effects are especially pertinent in situations of judgement under uncertainty where external and contextual information often influence the perception of ambiguous or ambivalent information (Masip et al., 2009; Mobbs et al., 2006). In line with research on investigative interviewing, it would seem recommendable that the space and circumstances under which an interrogation takes place are comfortable and do not restrict the individual (Goodman-Delahunty et al., 2014; Kelly et al., 2013).

7.1 Future directions

The current work sought to highlight the effects of situational factors on veracity judgements, particularly in forensic contexts. Future research could elaborate on the different ways in which handcuffing affects senders and judges by separating their influence on suspect perceptions (e.g., handcuffs as a visual cue of criminality; Stiff et al., 1992) from the effect on suspects' ability to gesticulate (within-sender features). For this, handcuffed and non-handcuffed suspects' movements could be restricted by asking them, for example, to place their hands flat on a table throughout the interrogation. This would equate the nonverbal differences whilst having the presence/absence of handcuffs as the only factor that differs between conditions. Alternatively, the videos could be edited to show the same suspect with or without handcuffs, revealing whether any impressions brought about by being handcuffed are due to the presence of external visual cues.

Consideration should also be given to the content of the stimuli themselves. An analysis of the videos may reveal verbal, paraverbal, and/or nonverbal cues that may aid in understanding the current findings. Such an investigation could uncover whether behavioural differences between the liars and truth-tellers are indeed reduced by handcuffing and whether differences in impression management are brought about by the manipulation (e.g., handcuffed suspects may “compensate” for their restricted gesticulation by modifying their speech and, by extension, their verbal cues may differ; see Verschuere et al., 2021).

Additionally, given the within-sender variability typically seen in deception research (Levine, 2010; Zloteanu, Bull, et al., 2021), the current stimulus set may be expanded to show a larger number of senders, which would provide more precise effect size estimates and reduced uncertainty (Levine et al., 2022). Future research should also employ a more in-depth statistical approach (i.e., multi-level modelling) that accounts for both sender and decoder variability. This may be especially relevant in understanding whether handcuffing interacts with senders' demeanour and judges' expectations. It is possible that the manipulation does not affect all individuals to the same degree or in the same manner (see DAG in SI for the potential influence of within/between subject and stimuli variance on the judgement process).
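The case for more senders can be made concrete with the variance of a grand mean in a fully crossed design. Assume each judge rates every sender, scores follow y = μ + sender effect + judge effect + residual with independent random effects, and a replication would draw new senders and new judges. Under those assumptions the variance of the mean score is σ_s²/n_s + σ_j²/n_j + σ_e²/(n_s·n_j). No number of judges can shrink the σ_s²/n_s term; only more senders can. The sketch below uses hypothetical variance components, not estimates from this study. The same logic carries over to a handcuffed vs. non-handcuffed difference when different senders appear in each condition.

```python
import math


def se_of_grand_mean(n_senders: int, n_judges: int,
                     sender_sd: float, judge_sd: float, residual_sd: float) -> float:
    """Standard error of the grand-mean score in a fully crossed judges x senders design.

    Assumes y[i][j] = mu + sender_i + judge_j + error_ij with independent random
    effects, and that a replication samples new senders and new judges.
    """
    variance = (sender_sd ** 2 / n_senders
                + judge_sd ** 2 / n_judges
                + residual_sd ** 2 / (n_senders * n_judges))
    return math.sqrt(variance)


# Hypothetical variance components on an accuracy (proportion-correct) scale.
SENDER_SD, JUDGE_SD, RESIDUAL_SD = 0.10, 0.05, 0.40

for n_senders in (8, 16, 32):
    for n_judges in (100, 400):
        se = se_of_grand_mean(n_senders, n_judges, SENDER_SD, JUDGE_SD, RESIDUAL_SD)
        print(f"senders={n_senders:2d}  judges={n_judges:3d}  SE={se:.4f}")
```

With these made-up values, quadrupling the judges from 100 to 400 barely moves the standard error. Quadrupling the senders from 8 to 32 roughly halves it, which is the pattern a multi-level model with crossed sender and judge effects is designed to capture.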

Subsequent work may also explore the effect of handcuffing on the relationship quality between suspect and interrogator (also, see SI). Due to the interactive nature of the interrogation task, handcuffs may have affected the rapport between the interrogator and suspect, which in turn could shape the behaviour of suspects (Kassin et al., 2003; Paton et al., 2018). The present manipulation demonstrates that deception detection does not happen in isolation. Future studies investigating veracity judgements should expand the range of factors being considered, both within the lab and in the real world.

7.2 Limitations

The issue of generalisability in the deception field is rarely addressed; nonetheless, a few elements of the current research must be considered. First, the lies told by suspects concerned personal information that they misrepresented. It can be argued that differences in performance and judgement may emerge if other types of lies (e.g., lies about transgressions) are employed (Levine, Kim, & Blair, 2010; cf. Hartwig & Bond, 2014; Hauch et al., 2014). Second, although some have argued that using students instead of real suspects may impact the detection rate (see O’Sullivan et al., 2009), both empirical investigations and meta-analyses report that deception detection is unaffected by whether the sender is a student or not (Hartwig & Bond, 2014; Zhang et al., 2013), nor do police officers show better accuracy rates even in naturalistic high-stakes settings (Hartwig, 2004; Meissner & Kassin, 2002). However, using different types of senders may influence perceptions and judgements.

Presently, it is difficult to separate the effect of handcuffing on judges' perception (i.e., pure external features) from that on sender performance (i.e., within-sender features), as our manipulation may have been affecting either or both. For example, handcuffing could attenuate behavioural differences between liars and truth-tellers, resulting in poorer overall veracity discrimination. However, considering the dynamics between the interrogator and the suspects, being handcuffed could also have alerted senders to the added scrutiny and behavioural restrictions, prompting them to compensate through increased impression management to produce a more convincing performance (Buller & Burgoon, 1996; Burgoon et al., 1996). The interplay between the interviewee and the interviewer is an important unknown, as some response variability may be due to the interrogator himself, given that rapport strongly influences interviewing outcomes (Abbe & Brandon, 2013).

The interrogation style used should also be weighed. While we did not find any effect of probing, this element could not be explored in depth due to a lack of variability in the interrogator's use of the three probes (see SI). The literature is equivocal on whether probing affects veracity judgements (Buller et al., 1991). Nonetheless, it may impact rapport building and disclosure (Paton et al., 2018). Different probes may change the dynamics between interrogator and suspect, as well as the impressions formed by subsequent judges (e.g., biasing impressions based on the valence of the probe used during the questioning). Future research could consider manipulating (e.g., standardising) the probing element to investigate how it interacts with the handcuffing element (e.g., Granhag & Strömwall, 2001); specific probes may bolster (e.g., negative) or attenuate (e.g., positive) the effects of handcuffing.

Finally, a more pronounced limitation is the relatively small and unbalanced sample. Underpowered studies are less likely to find true effects (i.e., Type II error), have a higher chance of found effects being statistical artefacts (i.e., Type I error), inflate estimates of true effects (i.e., Type M error), and have lower replicability (Fraley & Vazire, 2014; Gelman & Carlin, 2014). For instance, the CIs around the handcuffing effect indicate that the data is compatible with a wide range of effect sizes, from large and of potential interest (ξ = 0.58) to small and potentially unimportant (ξ = 0.10). Thus, we advise readers to interpret the results with care. Still, considering the forensic-relevant sample alongside the implications of our findings (especially for miscarriages of justice), on balance, we consider that the value of the research outweighs its drawbacks (Eckermann et al., 2010; Sterling et al., 1995).
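The Type M point can be illustrated with the design analysis Gelman and Carlin (2014) describe. Suppose an estimate is normally distributed around a true effect with a known standard error. Then the estimates that happen to reach significance overstate the true effect, and the overstatement grows as power falls. The sketch below is our own Python rendering of that idea, not code from the study. The true effect (0.2) and the two standard errors are hypothetical values and are not the ξ estimates reported here.

```python
import random
from statistics import NormalDist, fmean


def retrodesign(true_effect: float, se: float, alpha: float = 0.05,
                n_sims: int = 100_000, seed: int = 2014):
    """Return (power, Type S rate, exaggeration ratio) for an estimate ~ Normal(true_effect, se).

    Follows the design-analysis logic of Gelman and Carlin (2014); assumes true_effect > 0.
    """
    if true_effect <= 0 or se <= 0:
        raise ValueError("true_effect and se must be positive")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    lam = true_effect / se                       # true effect in standard-error units
    p_hi = 1 - std_normal.cdf(z_crit - lam)      # significant with the correct sign
    p_lo = std_normal.cdf(-z_crit - lam)         # significant with the wrong sign
    power = p_hi + p_lo
    type_s = p_lo / power

    # Type M: average magnitude of the estimates that reach significance,
    # relative to the true effect (estimated by simulation).
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, se) for _ in range(n_sims))
    significant = [abs(est) for est in estimates if abs(est) > z_crit * se]
    exaggeration = fmean(significant) / true_effect
    return power, type_s, exaggeration


# Hypothetical scenario: the same true effect estimated precisely and imprecisely.
for se in (0.05, 0.15):
    power, type_s, exaggeration = retrodesign(0.2, se)
    print(f"SE={se:.2f}  power={power:.2f}  Type S={type_s:.4f}  exaggeration={exaggeration:.2f}")
```

In the imprecise scenario, power is only about 27%, and significant estimates average roughly twice the true effect. In the precise scenario, the exaggeration is negligible. This is why the authors advise interpreting the reported effect sizes with care.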

To increase usability, we report all necessary measurements of uncertainty and variability (Calin-Jageman & Cumming, 2019), permitting future hypothesis generation and integration into meta-analyses (Cumming, 2014; Fritz et al., 2012). For example, replications can consider the effect sizes reported and their confidence intervals to estimate future results (e.g., prediction intervals; Cumming, 2008), and calculate the statistical power needed to reproduce the effect (e.g., considering ξ33%; see Simonsohn, 2015).