Sunday, February 5, 2023

Rolf Degen summarizing... The average effect sizes in a “null field” such as homeopathy are a good indicator of the extent to which the tunnel vision of the researchers involved alone can conjure up positive results

Homeopathy can offer empirical insights on treatment effects in a null field. Matthew K. Sigurdson, Kristin L. Sainani & John P.A. Ioannidis. Journal of Clinical Epidemiology, February 01, 2023. https://doi.org/10.1016/j.jclinepi.2023.01.010

Abstract

Objectives: A “null field” is a scientific field where there is nothing to discover and where observed associations are thus expected to simply reflect the magnitude of bias. We aimed to characterize a null field using a known example, homeopathy (a pseudoscientific medical approach based on using highly diluted substances), as a prototype.

Study design: We identified 50 randomized placebo-controlled trials of homeopathy interventions from highly-cited meta-analyses. The primary outcome variable was the observed effect size in the studies. Variables related to study quality or impact were also extracted.

Results: The mean effect size for homeopathy was 0.36 standard deviations (Hedges’ g; 95% CI: 0.21, 0.51) better than placebo, which corresponds to an odds ratio of 1.94 (95% CI: 1.69, 2.23) in favor of homeopathy. 80% of studies had positive effect sizes (favoring homeopathy). Effect size was significantly correlated with citation counts from journals in the Directory of Open Access Journals and CiteWatch. We identified common statistical errors in 25 studies.

Conclusion: A null field like homeopathy can exhibit large effect sizes, high rates of favorable results, and high citation impact in the published scientific literature. Null fields may represent a useful negative control for the scientific process.
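
For readers who want to sanity-check the numbers in the Results, a standardized mean difference can be translated into an approximate odds ratio with the standard logit method (ln OR = g × π/√3). The short Python sketch below applies it to the figures quoted above; it is illustrative only, and whether the authors used exactly this conversion is an assumption.

```python
import math

def smd_to_odds_ratio(g: float) -> float:
    """Convert a standardized mean difference (Hedges' g) into an
    approximate odds ratio via the logit method: ln(OR) = g * pi / sqrt(3)."""
    return math.exp(g * math.pi / math.sqrt(3))

# Point estimate and 95% CI bounds quoted in the abstract
for g in (0.36, 0.21, 0.51):
    print(f"g = {g:.2f} -> OR ~ {smd_to_odds_ratio(g):.2f}")

# g = 0.36 gives OR ~ 1.92, close to the reported 1.94. Converting the g CI
# this way gives roughly (1.46, 2.52), wider than the reported (1.69, 2.23),
# so the paper's interval was presumably derived by another route.
```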


While overall income inequality rose over the past 5 decades, the rise in overall consumption inequality was small; the declining quality of income data likely contributes to these differences for the bottom of the distribution

Consumption and Income Inequality in the United States Since the 1960s. Bruce D. Meyer and James X. Sullivan. Journal of Political Economy, Feb 2023. https://doi.org/10.1086/721702

Abstract: Recent research concludes that the rise in consumption inequality mirrors, or even exceeds, the rise in income inequality. We revisit this finding, constructing improved measures of consumption, focusing on its well-measured components that are reported at a high and stable rate relative to national accounts. While overall income inequality rose over the past 5 decades, the rise in overall consumption inequality was small. The declining quality of income data likely contributes to these differences for the bottom of the distribution. Asset price changes likely account for some of the differences in recent years for the top of the distribution.


Messages generated by AI are persuasive across a number of policy issues, including weapon bans, a carbon tax, and a paid parental-leave program; participants rated the author of AI messages as being more factual and logical, but less angry & unique

Bai, Hui, Jan G. Voelkel, Johannes C. Eichstaedt, and Robb Willer. 2023. “Artificial Intelligence Can Persuade Humans on Political Issues.” OSF Preprints. February 5. doi:10.31219/osf.io/stakv

Abstract: The emergence of transformer models that leverage deep learning and web-scale corpora has made it possible for artificial intelligence (AI) to tackle many higher-order cognitive tasks, with critical implications for industry, government, and labor markets in the US and globally. Here, we investigate whether the currently most powerful, openly-available AI model – GPT-3 – is capable of influencing the beliefs of humans, a social behavior recently seen as a unique purview of other humans. Across three preregistered experiments featuring diverse samples of Americans (total N=4,836), we find consistent evidence that messages generated by AI are persuasive across a number of policy issues, including an assault weapon ban, a carbon tax, and a paid parental-leave program. Further, AI-generated messages were as persuasive as messages crafted by lay humans. Compared to the human authors, participants rated the author of AI messages as being more factual and logical, but less angry, unique, and less likely to use story-telling. Our results show the current generation of large language models can persuade humans, even on polarized policy issues. This work raises important implications for regulating AI applications in political contexts, to counter its potential use in misinformation campaigns and other deceptive political activities.


Continuing education workshops do not produce sustained skill development—quite the opposite; any modest improvement in performance erodes over time without further coaching

The implications of the Dodo bird verdict for training in psychotherapy: prioritizing process observation. Henny A. Westra. Psychotherapy Research, Dec 16 2022. https://doi.org/10.1080/10503307.2022.2141588

Abstract: Wampold et al.’s 1997 meta-analysis found that the true difference between bona fide psychotherapies is zero, supporting the Dodo bird conjecture that “All have won and must have prizes”. Two and a half decades later, the field continues to be slow to absorb this and similar uncomfortable discoveries. For example, entirely commensurate with Wampold’s conclusion is the meta-analytic finding that adherence to a given model of psychotherapy is unrelated to therapy outcomes (Webb et al., 2010). Despite the clear implication that theoretical models should not be the main lens through which psychotherapy is viewed if we are aiming to improve outcomes, therapists continue to identify themselves primarily by their theoretical orientation. And a major corollary of Wampold’s conclusions is that despite the evidence for non-superiority of a given model, our focus in training continues to be model-driven. This article seeks to elaborate the training implications of Wampold et al.’s conclusion, with a rationale and appeal to incorporate process-centered training.

Consider these similarly uncomfortable findings regarding the state of training. We assume, rather than verify, the efficacy of our training programs. Yet there is no evidence that continuing education workshops, for example, produce sustained skill development—quite the opposite. Large effects on self-report are found, but any modest improvement in performance erodes over time without further coaching (Madson et al., 2019). Perhaps most concerning, psychotherapists do not appear to improve with experience; in fact, the evidence suggests that skills may decline slightly over time (Goldberg et al., 2016). Not surprisingly, then, while the number of model-based treatments has proliferated, the rate of client improvement has not followed suit (Miller et al., 2013). Could stagnant training methods be related to stagnant patient outcomes?

We need innovations in training that better align our training foci and methods with factors empirically supported as influencing client outcomes. Process researchers have long observed that trained process coders (typically for research purposes) make better therapists due to their enhanced attunement (e.g., Binder & Strupp, 1997). While such training is not yet available in training programs, it arguably should be, based on emerging developments in the science of expertise (Ericsson & Pool, 2016) and the urgent need to bring outcome information forward in real time so that it can be used to make responsive adjustments to the process of therapy. In fact, such information could be considered “routine outcome monitoring in real time” (Westra & Di Bartolomeo, 2022).

To elaborate, Tracey et al. (2014) provocatively argued that acquiring expertise in psychotherapy may not even be possible. This is because the ability to predict outcomes is crucial to shaping effective performance. Yet there is a lack of feedback available to therapists regarding the outcomes of their interventions and such information, if it comes at all, comes too late to make a difference in the moment. Therapists are essentially like blind archers attempting to shoot at a target. The development of Routine Outcome Monitoring (ROM) measures capable of forecasting likely outcomes is a major advance in correcting this blindness and improving predictive capacity. However, in order to be effective for skill development, feedback needs to occur more immediately so that the relationship between the therapist action and the client response (or nonresponse) can be quickly ascertained and adjustments made in real time. Interestingly, while ROM has been helpful in improving failing cases, it has not been effective in enhancing clinical skills more generally (Miller et al., 2013).

Learning to preferentially attend to, extract, and continuously integrate empirically supported process data may prove to be the elusive immediate feedback that has been lacking in psychotherapy training but that is crucial to developing expertise. Observable process data that have been validated through process science as differentiating good from poor patient outcomes could be considered “little outcomes,” which in turn are related to session outcomes and ultimately treatment outcome (Greenberg, 1986). Moreover, thin-slicing research supports that it is possible to make judgements about important outcomes from even tiny slices of expressive behavior (Ambady & Rosenthal, 1992). If one considers real-time process information as micro-outcomes, properly trained clinicians, just like expert-trained process coders, may no longer have to be blind. For example, a therapist trained to identify and monitor resistance and signals of alliance ruptures can continuously track these important phenomena and responsively adjust to safeguard the alliance. Or a therapist who is sensitive to markers of low and high levels of experiencing (Pascual-Leone & Yeryomenko, 2017) and client ambivalence (Westra & Norouzian, 2018) can not only optimize the timing of their interventions but also continuously watch the client for feedback on the success of their ongoing efforts.

Being steeped in process research gives one a unique perspective on the promise of process observation to advance clinical training. Our lab recently took our first foray into studying practicing community therapists. As we coded the session videotapes, we became aware that we possessed a unique skill set that was absent in the therapists’ test interviews. Therapists seemed to be guided solely by some model of how to bring about change but failed to simultaneously appreciate the ebb and flow of the relational context of the work. They seemed absorbed in their own moves (their model) but not aware that they were in a dance and must continually track and coordinate the process with their partner. It seemed that we had incidentally trained ourselves to detect and use these process signals. Our training was different and unique; it was more akin to deliberate practice focused on discrimination training for detecting empirically supported processes.

In short, information capable of diagnosing the health of the process and, critically, of forecasting eventual outcomes is arguably hiding in plain sight if one can acquire the requisite observational capacity to harvest it. And transforming an unpredictable environment into a predictable one makes expertise possible to acquire (Kahneman, 2011). Importantly, extracting such vital information relies on observational skill, rather than patient report, end-of-session measures, or longer-term outcome; thus, such real-time data extraction is immediately accessible and can complement existing outcome monitoring (Westra & Di Bartolomeo, 2022). Moreover, process markers are often opaque, requiring systematic observational training for successful detection. Without proper discrimination and perceptual acuity training, this gilded information remains obscured. Thus, heeding Wampold et al.’s call to refocus our efforts must include innovations in training; innovations that harness outcome information. We need more process research to further uncover the immediately observable factors capable of differentiating poor and good outcomes, but existing process science gives us a good start. And since process-centered training is transtheoretical, it can exist alongside models of therapy—learning to see while doing (Binder & Strupp, 1997). Training in psychotherapy has primarily prioritized intervention (models) and now it may be time to emphasize observation.

Psychotherapeutic experience seems to be unrelated to patients’ change in psychopathology

Germer, S., Weyrich, V., Bräscher, A.-K., Mütze, K., & Witthöft, M. (2022). Does practice really make perfect? A longitudinal analysis of the relationship between therapist experience and therapy outcome: A replication of Goldberg, Rousmaniere, et al. (2016). Journal of Counseling Psychology, 69(5), 745–754. Jan 2023. https://doi.org/10.1037/cou0000608

Abstract: Experience is often regarded as a prerequisite of high performance. In the field of psychotherapy, research has yielded inconsistent results regarding the association between experience and therapy outcome. However, this research was mostly conducted cross-sectionally. A longitudinal study from the U.S. recently indicated that psychotherapists’ experience was not associated with therapy outcomes. The present study aimed at replicating the Goldberg, Rousmaniere, et al. (2016) study in the German healthcare system. Using routine evaluation data of a large German university psychotherapy outpatient clinic, the effect of N = 241 therapists’ experience on the outcomes of their patients (N = 3,432) was assessed longitudinally using linear and logistic multilevel modeling. Experience was operationalized using the number of days since the first patient of a therapist as well as using the number of patients treated beforehand. Outcome criteria were defined as change in general psychopathology as well as response, remission, and early termination. Several covariates (number of sessions per case, licensure, and main diagnosis) were also examined. Across all operationalizations of experience (time since first patient and number of cases treated) and therapy outcome (change in psychopathology, response, remission, and early termination), results largely suggest no association between therapists’ experience and therapy outcome. Preliminary evidence suggests that therapists need fewer sessions to achieve the same outcomes when they gain more experience. Therapeutic experience seems to be unrelated to patients’ change in psychopathology. This lack of findings is of importance for improving postgraduate training and the quality of psychotherapy in general.
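
The replication described above rests on linear multilevel models with patients nested within therapists. As a rough illustration of that analytic setup (not the authors' actual data, variables, or effect sizes, which are invented here), a random-intercept model in Python could look like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: patients nested within therapists. Column names and
# parameter values are placeholders, not taken from the study.
rng = np.random.default_rng(0)
n_therapists, patients_per_therapist = 50, 30
n_patients = n_therapists * patients_per_therapist
df = pd.DataFrame({
    "therapist_id": np.repeat(np.arange(n_therapists), patients_per_therapist),
    "experience_days": rng.integers(0, 3000, n_patients),
    "n_sessions": rng.integers(10, 60, n_patients),
})
# Simulate symptom change with no true experience effect, only therapist-level noise.
therapist_noise = rng.normal(0, 0.3, n_therapists)[df["therapist_id"]]
df["symptom_change"] = rng.normal(0.8, 1.0, n_patients) + therapist_noise

# Random-intercept model: is therapist experience associated with outcome?
model = smf.mixedlm("symptom_change ~ experience_days + n_sessions",
                    data=df, groups=df["therapist_id"])
print(model.fit().summary())
```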


Using the opinionated language model affected the opinions expressed in participants' writing and shifted their opinions in the subsequent attitude survey

Co-Writing with Opinionated Language Models Affects Users' Views. Maurice Jakesch, Advait Bhat, Daniel Buschek, Lior Zalmanson, Mor Naaman. arXiv Feb 1 2023. https://arxiv.org/abs/2302.00560


Abstract: If large language models like GPT-3 preferably produce a particular point of view, they may influence people's opinions on an unknown scale. This study investigates whether a language-model-powered writing assistant that generates some opinions more often than others impacts what users write - and what they think. In an online experiment, we asked participants (N=1,506) to write a post discussing whether social media is good for society. Treatment group participants used a language-model-powered writing assistant configured to argue that social media is good or bad for society. Participants then completed a social media attitude survey, and independent judges (N=500) evaluated the opinions expressed in their writing. Using the opinionated language model affected the opinions expressed in participants' writing and shifted their opinions in the subsequent attitude survey. We discuss the wider implications of our results and argue that the opinions built into AI language technologies need to be monitored and engineered more carefully.


Saturday, February 4, 2023

Above a threshold level of wage, an increase in intelligence is no longer associated with higher earnings

The plateauing of cognitive ability among top earners. Marc Keuschnigg, Arnout van de Rijt, Thijs Bol. European Sociological Review, jcac076, January 28 2023. https://doi.org/10.1093/esr/jcac076

Abstract: Are the best-paying jobs with the highest prestige done by individuals of great intelligence? Past studies find job success to increase with cognitive ability, but do not examine how, conversely, ability varies with job success. Stratification theories suggest that social background and cumulative advantage dominate cognitive ability as determinants of high occupational success. This leads us to hypothesize that among the relatively successful, average ability is concave in income and prestige. We draw on Swedish register data containing measures of cognitive ability and labour-market success for 59,000 men who took a compulsory military conscription test. Strikingly, we find that the relationship between ability and wage is strong overall, yet above €60,000 per year ability plateaus at a modest level of +1 standard deviation. The top 1 per cent even score slightly worse on cognitive ability than those in the income strata right below them. We observe a similar but less pronounced plateauing of ability at high occupational prestige.

Discussion

The empirical results lend support to our argument that cognitive ability plateaus at high levels of occupational success. Precisely in the part of the wage distribution where cognitive ability can make the biggest difference, its right tail, cognitive ability ceases to play any role. Cognitive ability plateaus around €60,000 at under a standard deviation above the mean. In terms of occupational prestige, it plateaus at a similar level above a job prestige of 70: The differences in prestige between accountants, doctors, lawyers, professors, judges, and members of parliament are unrelated to their cognitive abilities.
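
The plateau pattern described here can be illustrated with a simple piecewise-linear ("rise, then flatten") fit. The sketch below runs on synthetic data, not the Swedish register data; the breakpoint, slope, and variable names are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def rise_then_plateau(x, breakpoint, slope, level):
    """Ability rises linearly with wage up to `breakpoint`, then stays flat."""
    return np.where(x < breakpoint, level - slope * (breakpoint - x), level)

# Synthetic data loosely mimicking the described shape: wage in kEUR,
# ability as a z-score. True parameter values are invented for illustration.
rng = np.random.default_rng(1)
wage = rng.uniform(20, 120, 5000)
ability = rise_then_plateau(wage, 60, 0.03, 1.0) + rng.normal(0, 0.8, wage.size)

(bp, slope, level), _ = curve_fit(rise_then_plateau, wage, ability,
                                  p0=[50, 0.02, 0.5])
print(f"estimated breakpoint ~ {bp:.0f} kEUR, plateau level ~ {level:.2f} SD")
```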

A limitation of our study is that we do not account for effort or non-cognitive capacities—motivation, social skills, creativity, mental stability, and physical ability (Borghans et al., 2016). Cognitive ability is more relevant for some occupations than for others, and academia, for which it is arguably most relevant, is neither the best-paid nor the most prestigious professional field. Our results thus raise the question to what degree top wages are indicative of other, unobserved dimensions of ability. However, omission of effort and non-cognitive ability from the analysis is only problematic for our conclusions about the relationship between ability and success if there are theoretical arguments to be made that their effects dominate luck in the production of top income and prestige, either because their distributions have many extreme values or if there are strongly increasing returns.

Our analysis, further, is limited to a single country. Sweden may be seen as a conservative testing ground. In countries where higher education is less inclusive, one would expect an overall weaker relationship between labour-market success and ability (Breen and Jonsson, 2007). Namely, less income redistribution and steep tuition barriers to elite colleges may impede the flow of gifted individuals from lower classes into top jobs. On the other hand, higher net wages and greater social status at the top may attract more talent, and greater differentiation in college prestige elsewhere may allow firms to select on cognitive skills among those with a college degree by using elite affiliations as a proxy. Future research on different countries may seek to evaluate to what extent our findings generalize.

Third, we limit our analyses to native-born men. This is an unavoidable restriction of the data (women and immigrants were not enrolled in the military), and it is important to learn whether our findings generalize to the full working population. We invite further research that includes women and citizens from different ethnic backgrounds, and we call for careful adjustments in measuring occupational success for different cohorts in light of marked increases in female labour-force participation over time as well as in the share of the immigrant workforce and the varying disadvantages they face along different career paths in many countries. Such research could also explore potential variation in meritocracy regimes across social groups, connecting debates on gender equality and integration to quantitative studies of the relationship between success and ability.

Finally, our analysis was descriptive in nature and did not assess the proposed theoretical mechanism. An additional mechanism that may drive the plateauing of the success–ability relation at high wages is that brighter individuals select into more poorly remunerated occupational groups, even if within these groups the brightest are rewarded the highest wages. If these worse-paying jobs are of higher prestige, this could explain the weaker patterns we observed for the relationship between wage and occupational prestige. While we could not effectively explore the operation of this possible mechanism, future studies may be able to disentangle competing mechanisms through longitudinal analysis of educational and labour market trajectories.

Recent years have seen much academic and public discussion of rising inequality (e.g. Mankiw, 2013; Piketty, 2014; Alvaredo et al., 2017). In debates about interventions against large wage discrepancies, a common defence of top earners is the superior merit inferred from their job-market success using human capital arguments (Murray, 2003; Mankiw, 2013). However, along an important dimension of merit—cognitive ability—we find no evidence that those with top jobs that pay extraordinary wages are more deserving than those who earn only half those wages. The main takeaway of our analysis is thus the identification, both theoretically and empirically, of two regimes of stratification in the labour market. The bulk of citizens earn normal salaries that are clearly responsive to individual cognitive capabilities. Above a threshold level of wage, cognitive-ability levels are above average but play no role in differentiating wages. With relative incomes of top earners steadily growing in Western countries (Alvaredo et al., 2017), an increasing share of aggregate earnings may be allocated under the latter regime.

Listening to one’s most disliked music evokes a stress response that makes the whole body revolt

Merrill, Julia, Taren-Ida Ackermann, and Anna Czepiel. 2023. “The Negative Power of Music: Effects of Disliked Music on Psychophysiology.” PsyArXiv. February 2. doi:10.31234/osf.io/6escn

Abstract: While previous research has shown the positive effects of music listening in response to one’s favorite music, the negative effects of one’s most disliked music have not gained much attention. In contrast to studies on musical chills, in the current study, participants listened to three self-selected disliked musical pieces which evoked highly unpleasant feelings. As a contrast, three musical pieces were individually selected for each participant based on neutral liking ratings they provided on other participants’ music. During music listening, real-time ratings of subjective (dis)pleasure and simultaneous recordings of peripheral measures were obtained. Results show that compared to neutral music, listening to disliked music evokes physiological reactions reflecting higher arousal (heart rate, skin conductance response, body temperature), disgust (levator labii muscle), anger (corrugator supercilii muscle), distress and grimacing (zygomaticus major muscle). The differences between conditions were most prominent during “very unpleasant” real-time ratings, showing peak responses for the disliked music. Hence, disliked music leads to a strong response of physiological arousal and facial expression, reflecting the listener’s attitude toward the music and the physiologically strenuous effect of listening to one’s disliked music.


Rolf Degen summarizing... Unlike a machine, in which dedicated components are entrusted with fixed functions, the brain operates more like a complex dynamic system in which changing coalitions of neurons can perform varying tasks depending on the context

Improving the study of brain-behavior relationships by revisiting basic assumptions. Christiana Westlin et al. Trends in Cognitive Sciences, February 2 2023. https://doi.org/10.1016/j.tics.2022.12.015

Highlights

The study of brain-behavior relationships has been guided by several foundational assumptions that are called into question by empirical evidence from human brain imaging and neuroscience research on non-human animals.

Neural ensembles distributed across the whole brain may give rise to mental events rather than localized neural populations. A variety of neural ensembles may contribute to one mental event rather than one-to-one mappings. Mental events may emerge as a complex ensemble of interdependent signals from the brain, body, and world rather than from neural ensembles that are context-independent.

A more robust science of brain-behavior relationships awaits if research efforts are grounded in alternative assumptions that are supported by empirical evidence and which provide new opportunities for discovery.


Abstract: Neuroimaging research has been at the forefront of concerns regarding the failure of experimental findings to replicate. In the study of brain-behavior relationships, past failures to find replicable and robust effects have been attributed to methodological shortcomings. Methodological rigor is important, but there are other overlooked possibilities: most published studies share three foundational assumptions, often implicitly, that may be faulty. In this paper, we consider the empirical evidence from human brain imaging and the study of non-human animals that calls each foundational assumption into question. We then consider the opportunities for a robust science of brain-behavior relationships that await if scientists ground their research efforts in revised assumptions supported by current empirical evidence.


Keywords: brain-behavior relationships; whole-brain modeling; degeneracy; complexity; variation


Concluding remarks

Scientific communities tacitly agree on assumptions about what exists (called ontological commitments), what questions to ask, and what methods to use. All assumptions are firmly rooted in a philosophy of science that need not be acknowledged or discussed but is practiced nonetheless. In this article, we questioned the ontological commitments of a philosophy of science that undergirds much of modern neuroscience research and psychological science in particular. We demonstrated that three common commitments should be reconsidered, along with a corresponding course correction in methods (see Outstanding questions). Our suggestions require more than merely improved methodological rigor for traditional experimental design (Box 1). Such improvements are important, but may aid robustness and replicability only when the ontological assumptions behind those methods are valid. Accordingly, a productive way forward may be to fundamentally rethink what a mind is and how a brain works. We have suggested that mental events arise from a complex ensemble of signals across the entire brain, as well as from the sensory surfaces of the body that inform on the states of the inner body and outside world, such that more than one signal ensemble maps to a single instance of a single psychological category (maybe even in the same context [51,56]). To this end, scientists might find inspiration by mining insights from adjacent fields, such as evolution, anatomy, development, and ecology (e.g., [123,124]), as well as cybernetics and systems theory (e.g., [125,126]). At stake is nothing less than a viable science of how a brain creates a mind through its constant interactions with its body, its physical environment, and with the other brains-in-bodies that occupy its social world.

Outstanding questions

Well-powered brain-wide analyses imply that meaningful signals exist in brain regions that are considered nonsignificant in studies with low within-subject power, but is all of the observed brain activity necessarily supporting a particular behavior? By thresholding out weak yet consistent effects, are we removing part of the complex ensemble of causation? What kinds of technical innovations or novel experimental methods would allow us to make progress in answering this question?

How might we incorporate theoretical frameworks, such as a predictive processing framework, to better understand the involvement of the whole brain in producing a mental event? Such an approach hypothesizes the involvement of the whole brain as a general computing system, without implying equipotentiality (i.e., that all areas of the brain are equally able to perform the same function).

Why are some reported effects (e.g., the Stroop effect) seemingly robust and replicable if psychological phenomena are necessarily degenerate? These effects should be explored to determine if they remain replicable outside of constrained laboratory contexts and to understand what makes them robust.

Given that measuring every signal in a complex system is unrealistic given the time and cost constraints of a standard neuroimaging experiment, how can we balance the measurement of meaningful signals in the brain, body, and world with the practical realities of experimental constraints?

Is the study of brain-behavior relationships actually in a replication crisis? And if so, is it merely a crisis of method? Traditional assumptions suggest that scientists should replicate sample summary statistics and tightly control variation in an effort to estimate a population summary statistic, but perhaps this goal should be reconsidered.

Friday, February 3, 2023

On the internet there exists the 90-9-1 principle (also called the 1% rule), which dictates that the vast majority of user-generated content in any specific community comes from the top 1% of active users, with most people only listening in

Vuorio, Valtteri, and Zachary Horne. 2023. “A Lurking Bias: Representativeness of Users Across Social Media and Its Implications for Sampling Bias in Cognitive Science.” PsyArXiv. February 2. doi:10.31234/osf.io/n5d9j

Abstract: On the internet there exists the 90-9-1 principle (also called the 1% rule), which dictates that the vast majority of user-generated content in any specific community comes from the top 1% of active users, with most people only listening in. When combined with other demographic biases among social media users, this casts doubt on how well these users represent the wider world, which might be problematic considering how user-generated content is used in psychological research and in the wider media. We conduct three computational studies using pre-existing datasets from Reddit and Twitter; we examine the accuracy of the 1% rule and what effect this might have on how user-generated content is perceived by performing and comparing sentiment analyses between user groups. Our findings support the accuracy of the 1% rule, and we report a bias in sentiments between low- and high-frequency users. Limitations of our analyses will be discussed.
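
Checking how concentrated contribution is in a given community is straightforward once one has a dump of posts with author identifiers. The sketch below is a minimal illustration, not the authors' analysis pipeline; the column names and file are hypothetical.

```python
import pandas as pd

def top_share(posts: pd.DataFrame, author_col: str = "author",
              top_frac: float = 0.01) -> float:
    """Fraction of all posts contributed by the most active `top_frac` of users."""
    counts = posts[author_col].value_counts()  # posts per user, most active first
    n_top = max(1, int(round(len(counts) * top_frac)))
    return counts.iloc[:n_top].sum() / counts.sum()

# Usage with a hypothetical dump of Reddit comments (file name is a placeholder):
# comments = pd.read_json("subreddit_comments.jsonl", lines=True)
# print(f"Top 1% of users wrote {top_share(comments):.1%} of all comments")
```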


Contrary to this ideal, we found a negative association between media coverage of a paper and the paper’s likelihood of replication success = deciding a paper’s merit based on its media coverage is unwise

A discipline-wide investigation of the replicability of Psychology papers over the past two decades. Wu Youyou, Yang Yang, and Brian Uzzi. Proceedings of the National Academy of Sciences, January 30, 2023, 120 (6) e2208863120. https://doi.org/10.1073/pnas.2208863120


Significance: The number of manually replicated studies falls well below the abundance of important studies that the scientific community would like to see replicated. We created a text-based machine learning model to estimate the replication likelihood for more than 14,000 published articles in six subfields of Psychology since 2000. Additionally, we investigated how replicability varies with respect to different research methods, authors’ productivity, citation impact, and institutional prestige, and a paper’s citation growth and social media coverage. Our findings help establish large-scale empirical patterns on which to prioritize manual replications and advance replication research.


Abstract: Conjecture about the weak replicability in social sciences has made scholars eager to quantify the scale and scope of replication failure for a discipline. Yet small-scale manual replication methods alone are ill-suited to deal with this big data problem. Here, we conduct a discipline-wide replication census in science. Our sample (N = 14,126 papers) covers nearly all papers published in the six top-tier Psychology journals over the past 20 y. Using a validated machine learning model that estimates a paper’s likelihood of replication, we found evidence that both supports and refutes speculations drawn from a relatively small sample of manual replications. First, we find that a single overall replication rate of Psychology poorly captures the varying degree of replicability among subfields. Second, we find that replication rates are strongly correlated with research methods in all subfields. Experiments replicate at a significantly lower rate than do non-experimental studies. Third, we find that authors’ cumulative publication number and citation impact are positively related to the likelihood of replication, while other proxies of research quality and rigor, such as an author’s university prestige and a paper’s citations, are unrelated to replicability. Finally, contrary to the ideal that media attention should cover replicable research, we find that media attention is positively related to the likelihood of replication failure. Our assessments of the scale and scope of replicability are important next steps toward broadly resolving issues of replicability.
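
The authors' model is text-based; as a rough sketch of what such an approach can look like in general (not their actual features, training corpus, or architecture), a bag-of-words classifier trained on manually replicated papers might be set up as follows. All texts and labels below are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: full texts of papers that were manually replicated,
# with a 1/0 label for replication success. Not the authors' corpus or model.
train_texts = ["...full text of a replicated paper...",
               "...full text of a non-replicated paper..."]
train_labels = [1, 0]

model = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2), max_features=20000),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Estimated replication likelihood for a paper without a manual replication:
new_paper = ["...full text of an unreplicated paper..."]
print(model.predict_proba(new_paper)[0, 1])
```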

Discussion

This research uses a machine learning model that quantifies the text in a scientific manuscript to predict its replication likelihood. The model enables us to conduct the first replication census of nearly all of the papers published in Psychology’s top six subfield journals over a 20-y period. The analysis focused on estimating replicability for an entire discipline, with an interest in how replication rates vary by subfield, experimental and non-experimental methods, and other characteristics of research papers. To remain grounded in human expertise, we verified the results with available manual replication data whenever possible. Together, the results further provide insights that can advance replication theories and practices.
A central advantage of our approach is its scale and scope. Prior speculations about the extent of replication failure are based on relatively small, selective samples of manual replications (21). Analyzing more than 14,000 papers in multiple subfields, we showed that replication success rates differ widely by subfields. Hence, not one replication failure rate estimated from a single replication project is likely to characterize all branches of a diverse discipline like Psychology. Furthermore, our results showed that subfield rates of replication success are associated with research methods. We found that experimental work replicates at significantly lower rates than non-experimental methods for all subfields, and subfields with less experimental work replicate relatively better. This finding is worrisome, given that Psychology’s strong scientific reputation is built, in part, on its proficiency with experiments.
Analyzing replicability alongside other metrics of a paper, we found that while replicability is positively correlated with researchers’ experience and competence, other proxies of research quality, such as an author’s university prestige and the paper’s citations, showed no association with replicability in Psychology. The findings highlight the need for both academics and the public to be cautious when evaluating research and scholars using pre- and post-publication metrics as proxies for research quality.
We also correlated media attention with a paper’s replicability. The media plays a significant role in creating the public’s image of science and democratizing knowledge, but it is often incentivized to report on counterintuitive and eye-catching results. Ideally, the media would have a positive relationship (or a null relationship) with replication success rates in Psychology. Contrary to this ideal, however, we found a negative association between media coverage of a paper and the paper’s likelihood of replication success. Therefore, deciding a paper’s merit based on its media coverage is unwise. It would be valuable for the media to remind the audience that new and novel scientific results are only food for thought before future replication confirms their robustness.
We envision two possible applications of our approach. First, the machine learning model could be used to estimate replicability for studies that are difficult or impossible to manually replicate, such as longitudinal investigations and special or difficult-to-access populations. Second, predicted replication scores could begin to help prioritize manual replications of certain studies over others in the face of limited resources. Every year, individual scholars and organizations like Psychological Science Accelerator (67) and Collaborative Replication and Education Project (68) encounter the problem of choosing from an abundance of Psychology studies which ones to replicate. Isager and colleagues (69) proposed that to maximize gain in replication, the community should prioritize replicating studies that are valuable and uncertain in their outcomes. The value of studies could be readily approximated by citation impact or media attention, but the uncertainty part is yet to be adequately measured for a large literature base. We suggest that our machine learning model could provide a quantitative measure of replication uncertainty.
We note that our findings were limited in several ways. First, all papers we made predictions about came from top-tier journal publications. Future research could examine papers from lower-rank journals and how their replicability associates with pre- and post-publication metrics (70). Second, the estimates of replicability are only approximate. At the subfield level, five out of six subfields in our analysis were represented by only one top journal. A single journal does not capture the scope of the entire subfield. Future research could expand the coverage to multiple journals for one subfield or cross-check the subfield pattern derived using other methods (e.g., prediction markets). Third, the training sample used to develop the model used nearly all the manual replication data available, yet still lacked direct manual replication for certain psychology subfields. While we conducted a series of transfer learning analyses to ensure the model’s applicability beyond the scope of the training sample, implementation of the model in the subfields of Clinical Psychology and Developmental Psychology, where actual manual replication studies are scarce, should be done judiciously. For example, when estimating a paper’s replicability, we advise users to review a paper’s other indicators of replicability, like original study statistics, aggregated expert forecast, or prediction market. Nevertheless, our model can continue to be improved as more manual replication results become available.
Future research could go in several directions: 1) our replication scores could be combined with other methods like prediction markets (16) or non-text-based machine learning models (27, 28) to further refine estimates for Psychology studies; 2) the design of the study could be repeated to conduct replication censuses in other disciplines; and 3) the replication scores could be further correlated with other metrics of interest.
The replicability of science, which is particularly constrained in social science by variability, is ultimately a collective enterprise improved by an ensemble of methods. In his book The Logic of Scientific Discovery, Popper argued that “we do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them” (1). However, as true as Popper’s insight about repetition and repeatability is, it must be recognized that tests come with a cost of exploration. Machine learning methods paired with human acumen present an effective approach for developing a better understanding of replicability. The combination balances the costs of testing with the rewards of exploration in scientific discovery.

Thursday, February 2, 2023

Do unbiased people act more rationally?—The more unbiased people assessed their own risk of COVID-19 compared to that of others, the less willing they were to be vaccinated

Do unbiased people act more rationally?—The case of comparative realism and vaccine intention. Kamil Izydorczak, Dariusz Dolinski, Oliver Genschow, Wojciech Kulesza, Pawel Muniak, Bruno Gabriel Salvador Casara and Caterina Suitner. Royal Society Open Science, February 1 2023. https://doi.org/10.1098/rsos.220775

Abstract: Within different populations and at various stages of the pandemic, it has been demonstrated that individuals believe they are less likely to become infected than their average peer. This is known as comparative optimism and it has been one of the reproducible effects in social psychology. However, in previous and even the most recent studies, researchers often neglected to consider unbiased individuals and inspect the differences between biased and unbiased individuals. In a mini meta-analysis of six studies (Study 1), we discovered that unbiased individuals have lower vaccine intention than biased ones. In two pre-registered, follow-up studies, we aimed at testing the reproducibility of this phenomenon and its explanations. In Study 2 we replicated the main effect and found no evidence for differences in psychological control between biased and unbiased groups. In Study 3 we also replicated the effect and found that realists hold more centric views on the trade-offs between threats from getting vaccinated and getting ill. We discuss the interpretation and implication of our results in the context of the academic and lay-persons' views on rationality. We also put forward empirical and theoretical arguments for considering unbiased individuals as a separate phenomenon in the domain of self–others comparisons.

5. General discussion

Comparative optimism is a robust phenomenon. The bias proved to be present inter-contextually [46], and since the first theoretical works in the 1980s, it is still considered a replicable and practically significant effect. Furthermore, the bias has been successfully discovered by multiple research teams in many settings during the COVID-19 pandemic [49–51]. But do social psychologists have a firm understanding of why this bias occurs and its consequences?

As with many other collective irrationalities, we can too often be taken in by the ‘rational = desirable’ narrative. In such a narrative we implicitly or explicitly assume that the most desirable state would be ‘unbiased’, and, if the examined population fails to adhere to this pattern, we conclude that the cognitive processes we examine are somewhat ‘flawed’. In the presented studies, we concluded that those who are ‘unbiased’ more often abstain from taking one of the most (if not the most) effective, evidence-based and affordable actions that could protect them from a deadly threat. A seemingly ‘rational’ mental approach to the issue of COVID-19 contraction is related to a more irrational response to that threat—namely not getting vaccinated.

In the mini meta-analysis and two pre-registered studies, we discovered that those who express either comparative pessimism or optimism have a higher intention to get vaccinated for COVID-19 than those who are unbiased. The relationship of comparative pessimism to pro-health behaviour seems more intuitive, and the positive relationship of comparative optimism comes as a surprise, but our discovery is not isolated in that regard [52].

In Study 2, we found no evidence of a relationship between psychological control and comparative optimism with vaccine intention.

In Study 3 we found a common denominator of people who are realists and who have a lower vaccine intention. It turned out that both phenomena are related to lower COVID-19 ThreatDifference (ThreatDisease − ThreatVaccine). Furthermore, in line with the extended protection motivation theory (PMT [47,48]), the trade-off between risks of the disease and risks of the vaccine proved to predict being unbiased, and this relationship is partly mediated by vaccine intention.

Our studies present evidence that counters the ‘rational = desirable’ narrative, but that could lead into another trap: assuming that it is irrationalities and biases that help us cope more effectively. We think that such a narrative can be an equally false over-simplification and our studies offer more compelling explanations.

Collective irrationalities, such as comparative optimism, may neither enhance nor hamper our coping abilities. They may instead be a by-product of ongoing coping processes, possibly leading to greater protection (in the case of our studies, vaccination against COVID-19). From the perspective of our studies, it is clear that we might wrongfully ascribe a causal role to these biases.

While one might think that comparative optimism may cause reckless behaviour, such as refusal to vaccinate, Study 3 suggests another plausible alternative mechanism: ThreatDifference might be the reason for stronger or weaker vaccine intention (along with many other factors; see [43,53]) and comparative optimism might be a result of knowing one's own efforts, such as vaccination. In fact, a recent experimental study [52] provides evidence that being more aware of one's own self-protective effort enhances comparative optimism.

It is also noteworthy that comparative biases may arise in part from a lack of information about the comparative target, and that providing people with information about the comparative target diminishes the bias [54]. Accordingly, the comparative optimists in our study may have lacked information about the preventive behaviour of others.

The case of the relationship between comparative optimism and constructive pro-health behaviour is complex. On the one hand, we have evidence for both the benefits and drawbacks of CO [55]. On the other hand, CO may be the result rather than the cause of pro-health behaviour. Clearly there are many contextual factors involved and we should discard the overly simplistic view of an inherently beneficial or inherently harmful nature of comparative optimism (which also might be the case for many other collective irrationalities).

Our paper presents a pre-registered and high-powered line of research, which addresses differences between comparative optimists and the ‘unbiased’—a category of individuals that has most often been either left undiscussed or barely mentioned in previous studies regarding CO. Examining the bias from the perspective of the unbiased and using a mixed method approach that combined theory-driven hypotheses with a bottom-up strategy, thus giving a voice to participants, offered the opportunity to enrich theoretical knowledge on comparative bias and led to the surprising discovery that being unbiased can be related to a less pro-health attitude.

5.1. Limitations and future directions

The main limitation of our study is the lack of behavioural measures. This was a result of an early stage of our research project, which took place before COVID-19 vaccines were available. For that reason, we gathered data only about vaccine intention. In follow-up studies the vaccines were available but we decided to examine the intention of the yet unvaccinated to ensure the direct comparability of follow-up studies with the studies from a mini meta-analysis. This limitation leads to another one—at the time of Study 2 and especially Study 3, the number of unvaccinated was shrinking and we can expect that they might differ from the general population in many ways (for example, from study to study, we observed the diminishing share of ‘realists’). This constitutes a limit for the generalization of our conclusions.

The future direction of research regarding the differences between unbiased and comparative optimists should concentrate on actual behaviours rather than intentions or declarations. Moreover, future studies should enhance the scope of generalization by investigating more representative samples.

Another limitation is the possibility of an alternative explanation of our results. We interpret the results of Study 3 in the light of the extended PMT theory, assuming that the relationship between predicted outcomes of falling ill and getting vaccinated leads to engagement or disengagement with vaccination, which in turn results in them feeling superior (comparatively optimistic) or similar (comparatively realistic) to others.

But an alternative is probable. Following Gigerenzer's theory of ‘fast and frugal heuristics' [56], people can often make more ecologically valid decisions when they follow heuristics, without engaging in deep, analytical processes.

Perhaps people who chose the ecologically rational option to take the vaccine did so because they followed their intuition/shortcuts when making the decision. By doing so, they estimated the trade-offs between the disease and vaccine in line with the mainstream message (media, experts and authorities). If these individuals followed intuition in this respect, they may also be more prone to the default bias, namely optimistic bias. On the other hand, people who engage in processing the information more reflectively might end up being more sceptical towards vaccination and also less prone to the optimistic bias.

These alternative explanations could be empirically tested—if pro-vaccine attitudes could be ascribed to using more ‘fast and frugal heuristics’, people more sceptical of the vaccines should be able to recall more information about vaccines (regardless of their epistemic status) and provide more elaborate explanations for their stance.

As a general direction for future research on comparative biases, we advocate for considering a categorical approach to measuring biases—individuals who do not exhibit a bias should be treated as a separate category, especially when empirical results would indicate a substantial inflation of scores signalling a lack of bias (a similar inflation has been identified in the case of dehumanization—see [57], p. 12). Alternatively, if one decides to treat comparative bias as a continuous scale, a nonlinear relationship should be investigated. If comparative biases can have two directions, it is reasonable to expect that different directions might have different correlations.

The stated goal of the app is to produce a list of courses that would be easy for engineering majors to excel in effortlessly, where the majority of the class is young women that would not necessarily find the class easy, putting engineering majors in a position to help a pool of potential "mates"

Need help with students who've turned my class into a dating service. Jan 2023. https://academia.stackexchange.com/questions/192977/need-help-with-students-whove-turned-my-class-into-a-dating-service


I'm a professor at a local university. I'm passionate about teaching, and am proud to teach 100-level science and mathematics courses to young and aspiring students.

Some senior engineering students created a sort of dating service/app, "How I Met My Future Wife" (not the actual name, but close enough). It advertises itself as a way for smart young guys to meet "potential marriage material", by helping them socialize with "young, cultured, educated women". It works by aggregating diversity data my university publishes. This data is intended to help make a case for having more women and minorities in STEM courses so that post-university, we have more diverse representation in the worlds of science, business, and engineering. These senior engineering students used it to create a database of courses that are statistically likely to have a large proportion of young women from certain cultural backgrounds.

The stated goal of the app is to produce a list of courses that would be easy for engineering majors to excel in effortlessly, where the majority of the class is young women that would not necessarily find the class easy. It basically puts engineering majors in a position to ingratiate themselves with a large pool of potential "mates", and even guides users through getting reduced tuition or even taking the course for free (i.e. "auditing" a course; take it for free, but it doesn't affect your GPA, so as to prevent students from gaming the system and boosting their GPAs with easy courses).

A number of 100-level science courses are having record levels of senior-level STEM students auditing these courses, and a number of female students have approached me, noting they are disgusted and uncomfortable with the amount of "leching" taking place (edit: there are no unwanted advances, but it's painfully obvious to some students what's taking place). It's also demoralizing several of them, since we routinely have cases where a young man is leading open labs as if they're a teacher themselves (in order to "wow" their female classmates, offer "private free tutoring sessions", etc). Some of the young students in my class take up these offers, and this further demoralizes other female students seeing this happen (i.e. only attractive women being offered tutoring sessions). This is further compounded by the condescension involved (i.e. one self-admitted user of the app told me "this material that others struggle with is so easy for me, and I'm doing it for laughs and phone numbers.").

How can I stop this?

People auditing the course don't have to take the exams, or attend regularly. They can showboat in a course that's easy for them at zero risk or cost to themselves. I have no means to kick people from the course, despite this obvious behavior, and the people abusing the course can basically come and go as they please.

The university administration refuses to even acknowledge the problem exists (mostly, to my knowledge, because they don't want to admit fault or harm being caused by publishing such granular diversity reports), a few fellow profs either find it comical, or are happy that open labs are so full of volunteer tutors (perk to them, I guess). It seems that all parties are ignoring the young students I teach. I don't know if there are any legal routes, and there's no way I could do a public name-and-shame without jeopardizing my career. I'm at a total loss here.

Update

I scheduled a morning meeting with a senior colleague who has helped me with hard problems in the past (sort of the "go to guy" when things get rough). My husband and I had a long serious talk with him, and it's been made clear the university won't help me with this, as it would mean a "black left eye" for them, and I'd be tossed to the wolves on the left and right. If I want to pursue this further, I have to be prepared to forfeit my career, credibility (i.e. be black-balled in industry), and face lawsuits and SLAPP attacks from the university. With our combined salaries, my husband and I are barely making ends meet. My only real recourse is to counsel my students, while hoping that the app eventually gets more unwanted attention. In short, the problem will have to "solve itself", while numerous female students endure even more adversity in STEM by a program intended to help them.


Wednesday, February 1, 2023

Exploring the impact of money on men’s self-reported markers of masculinity: Men thought that their erect penis size was at least 21.1% above the population mean, but those rewarded with money were more realistic

Smaller prize, bigger size? Exploring the impact of money on men’s self-reported markers of masculinity. Jacob Dalgaard Christensen, Tobias Otterbring and Carl-Johan Lagerkvist. Front. Psychol., February 1, 2023, Volume 14. https://doi.org/10.3389/fpsyg.2023.1105423

Abstract: Bodily markers, often self-reported, are frequently used in research to predict a variety of outcomes. The present study examined whether men, at the aggregate level, would overestimate certain bodily markers linked to masculinity, and if so, to what extent. Furthermore, the study explored whether the amount of monetary rewards distributed to male participants would influence the obtained data quality. Men from two participant pools were asked to self-report a series of bodily measures. All self-report measures except weight were consistently found to be above the population mean (height and penis size) or the scale midpoint (athleticism). Additionally, the participant pool that received the lower (vs. higher) monetary reward showed a particularly powerful deviation from the population mean in penis size and were significantly more likely to report their erect and flaccid penis size to be larger than the claimed but not verified world record of 34 cm. These findings indicate that studies relying on men’s self-reported measures of certain body parts should be interpreted with great caution, but that higher monetary rewards seem to improve data quality slightly for such measures.

4. Discussion

The present study shows that men seem to self-report their physical attributes in a self-view-bolstering way, although not for weight, consistent with earlier findings (Neermark et al., 2019). Specifically, at the aggregate level, men reported being marginally more athletic than the scale midpoint, claimed to be significantly taller than the Danish mean for individuals of similar ages, and stated that their erect penis size was several centimeters longer than the available Danish population mean. The finding that participants do not seem to have over-reported their weight but likely exaggerated their height slightly also implies that they sought to present themselves as more physically fit. Together, these results indicate that bodily variables important to men’s self-view and identity should not be collected through self-report, especially when they concern private bodily measures linked to masculinity (i.e., penis size). Indeed, men deviated substantially more in their reporting of private (vs. publicly visible) body measures, as the overall sample mean in erect penis size was at least 21.1% above the Danish population mean, while only 1% above the Danish mean in height among men of similar ages and roughly equal to the population mean in weight.
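As a rough illustration of the percent-deviation calculation behind these figures, the sketch below computes how far a self-reported sample mean sits above a population mean. The numeric inputs are hypothetical, chosen only to land in the ballpark of the deviation the authors report; they are not the paper's data.

```python
# Minimal sketch of the percent-deviation calculation described above.
# The numeric inputs are hypothetical, not taken from the paper.

def pct_above_population(sample_mean: float, population_mean: float) -> float:
    """Percent by which a self-reported sample mean exceeds the population mean."""
    return 100.0 * (sample_mean - population_mean) / population_mean

# Illustrative values only: a population mean of 15.0 cm and a self-reported
# mean of 18.2 cm would correspond to a deviation of roughly 21%.
print(round(pct_above_population(18.2, 15.0), 1))  # -> 21.3
```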

Interestingly, giving participants a higher (vs. lower) monetary reward reduced the average self-reported estimate of both erect and flaccid penis size, but had no impact on the more publicly visible measures. To underscore the point that participants in the low monetary reward group provided less accurate self-report estimates, we further found participants in this group to be significantly more likely to report that their erect and flaccid penis size was larger than the claimed world record of 34 cm (Kimmel et al., 2014; Kim, 2016; Zane, 2021). However, the means of erect penis size were still significantly above the available Danish population mean for both the low and high payment groups. As such, even with the higher monetary reward, our results regarding private self-report data do not appear to be trustworthy.
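The group comparison described here is, in spirit, a chi-square test on the proportion of respondents in each payment group who report a size above the claimed 34 cm record. The sketch below shows that kind of test with invented counts; it is not the authors' actual analysis or their data.

```python
# Hedged sketch of a chi-square test of the kind described above.
# The counts are invented for illustration and are not the paper's data.
from scipy.stats import chi2_contingency

# Rows: low vs. high monetary reward.
# Columns: reported erect size above the claimed 34 cm record vs. not.
table = [
    [12, 188],  # low reward group (hypothetical counts)
    [3, 197],   # high reward group (hypothetical counts)
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```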

While our results indicate that men may have exaggerated their penis size and, to a lesser extent, their height and athleticism in a self-view-bolstering way, it is important to note that extreme values based on self-report can be the result not only of deliberate exaggerations but also of measurement error. We find a measurement error account unlikely to be the main driver of our results for several reasons. First, regarding penis size, the deviation of more than 20% (upward) from the stated Danish population mean is too extreme to realistically have occurred simply due to measurement error, and a measurement error account should arguably stipulate both under- and over-reporting, which is not congruent with the current results. Second, self-reported penis size has previously been found to correlate positively with social desirability scores (King et al., 2019), suggesting that some men deliberately exaggerate their penis size. Still, our study would have been strengthened by asking participants to also measure other body parts with the ruler that are not commonly connected to masculinity (e.g., their forearms). Such instructions would have allowed us to more explicitly test whether, as we believe, men strategically exaggerate only those bodily cues that are linked to masculinity or, alternatively, whether they over-report all bodily measures, irrespective of their “macho” meaning. It is possible that men, on average, are more inclined to lie about their penis size than their height, weight, or athleticism, considering that the penis is typically concealed and hence easier to lie about without getting caught in everyday interactions, whereas people cannot easily hide their height, weight, and body shape.

In conclusion, our results suggest that private data related to bodily cues of masculinity can only be reliably collected in the lab, where conditions can be fully controlled. Given our findings, scientific studies with self-report data concerning penis size should be interpreted with great caution. However, one remedy to reduce exaggerated response patterns seems to be higher monetary rewards given to participants. Indeed, one study found monetary incentives to be the top priority for online panel participants, and further revealed that data quality can be positively related to monetary compensation (Litman et al., 2015), supporting our argument that increased payments may be important for accessing high-quality data on the private (penis) measures investigated herein. It is possible that participants who received the larger monetary payment, on average, were less inclined to exaggerate the size of their penis because they felt a stronger need to reply (more) honestly. In contrast, those who received the smaller monetary payment may have been more motivated to exaggerate their penis size due to anger for the low payment coupled with the activation of self-threat when receiving questions about male markers of masculinity. Indeed, self-threat has been shown to magnify the self-serving bias (Campbell and Sedikides, 1999) and participants receiving the low monetary reward might have been more prone to engage in (extreme) protest responses—as our Chi-square analyses indicate—due to psychological reactance following the low payment (MacKenzie and Podsakoff, 2012).

Future research could examine, for instance, whether oath scripts or the implementation of interactive survey techniques, with direct feedback to participants when their responses exceed certain probability thresholds, may reduce exaggerated response patterns in studies with self-report measures (Kemper et al., 2020). Before such studies are conducted, the most telling take-away message based on the current results—regarding the aggregate “believability” in men’s self-reported penis size—is perhaps best captured by a quote from the New York Times bestselling author Darynda Jones: “Never trust a man with a penis.”