Monday, March 8, 2021

Estimating the Prevalence of Transparency and Reproducibility-Related Research Practices in Psychology (2014–2017)

Estimating the Prevalence of Transparency and Reproducibility-Related Research Practices in Psychology (2014–2017). Tom E. Hardwicke et al. Perspectives on Psychological Science, March 8, 2021. https://doi.org/10.1177/1745691620979806

Abstract: Psychologists are navigating an unprecedented period of introspection about the credibility and utility of their discipline. Reform initiatives emphasize the benefits of transparency and reproducibility-related research practices; however, adoption across the psychology literature is unknown. Estimating the prevalence of such practices will help to gauge the collective impact of reform initiatives, track progress over time, and calibrate future efforts. To this end, we manually examined a random sample of 250 psychology articles published between 2014 and 2017. Over half of the articles were publicly available (154/237, 65%, 95% confidence interval [CI] = [59%, 71%]); however, sharing of research materials (26/183; 14%, 95% CI = [10%, 19%]), study protocols (0/188; 0%, 95% CI = [0%, 1%]), raw data (4/188; 2%, 95% CI = [1%, 4%]), and analysis scripts (1/188; 1%, 95% CI = [0%, 1%]) was rare. Preregistration was also uncommon (5/188; 3%, 95% CI = [1%, 5%]). Many articles included a funding disclosure statement (142/228; 62%, 95% CI = [56%, 69%]), but conflict-of-interest statements were less common (88/228; 39%, 95% CI = [32%, 45%]). Replication studies were rare (10/188; 5%, 95% CI = [3%, 8%]), and few studies were included in systematic reviews (21/183; 11%, 95% CI = [8%, 16%]) or meta-analyses (12/183; 7%, 95% CI = [4%, 10%]). Overall, the results suggest that transparency and reproducibility-related research practices were far from routine. These findings establish baseline prevalence estimates against which future progress toward increasing the credibility and utility of psychology research can be compared.

Keywords: transparency, reproducibility, meta-research, psychology, open science
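The abstract reports each prevalence as a count, a percentage, and a 95% confidence interval. To illustrate how such an interval can be computed, here is a minimal Python sketch of the Wilson score interval for a binomial proportion. This is a swapped-in technique, not necessarily the method the authors used: it reproduces the reported interval for 154/237 ([59%, 71%]), but for 26/183 it gives [10%, 20%] and for 0/188 it gives [0%, 2%], slightly wider than the reported [10%, 19%] and [0%, 1%]. That mismatch suggests the paper used a different interval method. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96).

    Illustrative only; not necessarily the interval method used in the paper.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # The Wilson interval is centred on a value pulled slightly toward 0.5.
    centre = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # (label, numerator, denominator) taken from the abstract above.
    counts = [
        ("publicly available", 154, 237),
        ("materials shared", 26, 183),
        ("protocols shared", 0, 188),
    ]
    for label, k, n in counts:
        lo, hi = wilson_interval(k, n)
        print(f"{label}: {k}/{n} = {k / n:.0%}, Wilson 95% CI [{lo:.0%}, {hi:.0%}]")
```

Because the denominators differ from item to item (not every article was eligible for every measure), each interval has to be computed from its own numerator and denominator rather than from the overall sample of 250.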

Our evaluation of transparency and reproducibility-related research practices in a random sample of 250 psychology articles published between 2014 and 2017 shows that, although many articles were publicly available, crucial components of research—protocols, materials, raw data, and analysis scripts—were rarely made publicly available alongside them. Preregistration remained a nascent proposition with minimal adoption. The disclosure of funding sources and conflicts of interest was modest. Replication or evidence synthesis via meta-analysis or systematic review was infrequent (although, admittedly, only a relatively short time had elapsed since the articles had been published). Although there is evidence that some individual methodological reform initiatives have been effective in specific situations (e.g., Hardwicke et al., 2018; Nuijten et al., 2017; for review, see Hardwicke, Serghiou, et al., 2020), the findings of the current study imply that their collective, broader impact on the psychology literature during the examined period was still fairly limited in scope.

For most (65%) of the articles we examined, we could access a publicly available (open-access) version. This is higher than recent open-access estimates obtained for biomedicine (25%; Wallach et al., 2018) and the social sciences (40%; Hardwicke, Wallach, et al., 2020), and higher than the estimate from a large-scale automated analysis suggesting that 45% of the scientific literature published in 2015 was publicly available (Piwowar et al., 2018). Limiting access to academic publications reduces opportunities for researchers, policymakers, practitioners, and the general public to evaluate and make use of scientific evidence. One step psychologists can take to improve the public availability of their articles is to upload them to the free preprint server PsyArXiv (https://psyarxiv.com/). Uploading a preprint does not preclude publication at most journals (Bourne et al., 2017), although specific policies regarding open access can be checked on the Sherpa/Romeo database (http://sherpa.ac.uk/romeo/index.php).

The reported availability of research materials was modest in the articles we examined (14%), which is comparable to recent estimates in the social sciences (11%; Hardwicke, Wallach, et al., 2020) and lower than in biomedicine (33%; Wallach et al., 2018). Several reportedly available sets of materials were in fact not available because of broken links, an example of the “link-rot” phenomenon that has been observed by others trying to access research resources (Evangelou et al., 2005; Rowhani-Farid & Barnett, 2018). We also did not find any study protocols (an additional document detailing the study methods); however, it is unclear to what extent this results from a difference in norms between, for example, biomedicine (in which prespecified protocols are increasingly promoted; Ioannidis, Greenland, et al., 2014) and psychology (in which there may not be an expectation to provide methodological details in a separate protocol document). We did not examine whether sufficient methodological information was provided in the Method sections of articles, as this would have required domain-specific expertise in the many topics addressed by the articles in our sample. The availability of original research materials (e.g., survey instruments, stimuli, software, videos) and protocols enables the comprehensive evaluation of research (during traditional peer review and beyond; Vazire, 2017) and high-fidelity independent replication attempts (Open Science Collaboration, 2015; Simons, 2014), both of which are important for the verification and systematic accumulation of scientific knowledge (Ioannidis, 2012). Furthermore, reusing materials and protocols reduces waste and enhances efficiency (Chalmers & Glasziou, 2009; Ioannidis, Greenland, et al., 2014). Psychologists can share their materials and protocols online in various third-party repositories that use stable permalinks, such as the Open Science Framework (OSF; see Klein et al., 2018).
One observational study found that when the journal Psychological Science offered authors an open-materials badge there was a subsequent increase in the sharing of materials (Kidwell et al., 2016).

Data-availability statements in the articles we examined were extremely uncommon. This is consistent with accumulating evidence that suggests that the data underlying scientific claims are rarely immediately available (Alsheikh-Ali et al., 2011; Iqbal et al., 2016), although some modest improvement has been observed in recent years in biomedicine (Wallach et al., 2018). Although we did not request data from authors directly, such requests to psychology researchers typically have a modest yield (Vanpaemel et al., 2015; Wicherts et al., 2006). Most data appear to be effectively lost, including for some of the most influential studies in psychology and psychiatry (Hardwicke & Ioannidis, 2018b). Vanpaemel et al. (2015), for example, could not obtain 62% of the 394 data sets they requested from authors of papers published in four American Psychological Association journals in 2012. The sharing of raw data, which is the evidence on which scientists base their claims, enables verification through the independent assessment of analytic or computational reproducibility (Hardwicke, Bohn, et al., 2020; Hardwicke et al., 2018; LeBel et al., 2018) and analytic robustness (Steegen et al., 2016). Data sharing also enhances evidence synthesis, such as through individual participant-level meta-analysis (Tierney et al., 2015), and can facilitate discovery, such as through the merging of data sets and reanalysis with novel techniques (Voytek, 2016). Psychologists can improve data availability by uploading raw data to third-party repositories such as the OSF (Klein et al., 2018). Data sharing must be managed with caution if there are ethical concerns, but such concerns do not always preclude all forms of sharing or necessarily negate ethical motivations for sharing (Meyer, 2017). Furthermore, when data cannot be made available, it is always possible to explicitly declare this in research articles and explain the rationale for not sharing (Morey et al., 2016).
Journal policies that use badges to encourage data sharing (Kidwell et al., 2016) or mandate data sharing (Hardwicke et al., 2018; Nuijten et al., 2017) have been associated with marked increases in data availability in the journals that adopted them.

Of the articles we examined, only one shared an analysis script, a dearth consistent with assessments in biomedicine (Wallach et al., 2018), the social sciences (Hardwicke, Wallach, et al., 2020), and biostatistics (Rowhani-Farid & Barnett, 2018). Analysis scripts (a step-by-step description of the analysis in the form of computer code or instructions for recreating the analysis in point-and-click software) provide the most veridical documentation of how the raw data were filtered, summarized, and analyzed. Verbal descriptions of analysis procedures are often ambiguous, contain errors, or do not adequately capture sufficient detail to enable analytic reproducibility (Hardwicke, Bohn, et al., 2020; Hardwicke et al., 2018; Stodden et al., 2018). Psychologists can share their analysis scripts on a third-party repository, such as the OSF (Klein et al., 2018), and educational resources are available to help researchers improve the quality of their analysis code (Wilson et al., 2017). Sharing the computational environment in which analysis code successfully runs may also help to promote its longevity and trouble-free transfer to other researchers’ computers (Clyburne-Sherin et al., 2018).
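To make concrete why a script documents an analysis less ambiguously than prose, here is a minimal, hypothetical Python sketch of the filter, summarize, analyze sequence described above. The toy data, the record layout, the attention-check exclusion rule, and the function names `filter_rows`, `summarize`, and `analyze` are all invented for illustration and do not come from any study discussed here.

```python
from statistics import mean

# Hypothetical toy data (illustrative only):
# (participant id, condition, score, passed attention check)
RAW = [
    (1, "control", 4.0, True),
    (2, "control", 5.0, True),
    (3, "control", 9.0, False),  # fails the attention check, so it is excluded
    (4, "treatment", 6.0, True),
    (5, "treatment", 7.0, True),
    (6, "treatment", 5.0, True),
]


def filter_rows(rows):
    """Step 1: apply the exclusion rule explicitly (drop failed attention checks)."""
    return [row for row in rows if row[3]]


def summarize(rows):
    """Step 2: compute the mean score for each condition."""
    groups = {}
    for _, condition, score, _ in rows:
        groups.setdefault(condition, []).append(score)
    return {condition: mean(scores) for condition, scores in groups.items()}


def analyze(summary):
    """Step 3: the reported quantity, here a simple difference in means."""
    return summary["treatment"] - summary["control"]


if __name__ == "__main__":
    kept = filter_rows(RAW)
    summary = summarize(kept)
    print(f"kept {len(kept)} of {len(RAW)} rows; means = {summary}; "
          f"difference = {analyze(summary)}")
```

A verbal description such as "inattentive participants were excluded" leaves the exact rule open to interpretation; in the script, the rule is one explicit line that another researcher can inspect and rerun to obtain the same numbers.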

Preregistration, which involves making a time-stamped, read-only record of a study’s rationale, hypotheses, methods, and analysis plan on an independent online repository, was rare in the articles we examined. Preregistration fulfills a number of potential functions (Nosek et al., 2019), including clarifying the distinction between exploratory and confirmatory aspects of research (Kimmelman et al., 2014; Wagenmakers et al., 2012) and enabling the detection and mitigation of questionable research practices such as selective-outcome reporting (Franco et al., 2016; John et al., 2012; Simmons et al., 2011). Preregistration is relatively new to psychology (Nosek et al., 2018, 2019), but similar concepts of registration have a longer history in the context of clinical trials in biomedicine (Dickersin & Rennie, 2012), in which they have become the expected norm (Zarin et al., 2017). However, clinical trials represent only a minority of biomedical research, and estimates suggest that preregistration is rare in biomedicine overall (Iqbal et al., 2016; Wallach et al., 2018). Preregistration is also rare in the social sciences (Hardwicke, Wallach, et al., 2020). There is no doubt that the number of preregistrations (and the related Registered Reports article format) is increasing in psychology (Hardwicke & Ioannidis, 2018a; Nosek et al., 2018); however, our findings suggest that efforts to promote preregistration may not yet have had widespread impact on routine practice. It is important to note that because there is a time lag between registration and study publication, our measures may underestimate adoption. Although norms and standards for preregistration in psychology are still evolving (Nosek et al., 2019), several dedicated registries, such as the OSF, will host preregistrations, and detailed guidance is available (Klein et al., 2018).

Our findings suggest that psychology articles were more likely to include funding statements (62%) and conflict-of-interest statements (39%) than social-science articles in general (31% and 15%, respectively; Hardwicke, Wallach, et al., 2020) but less likely than biomedical articles (69% and 65%, respectively; Wallach et al., 2018). It is possible that these disclosure statements are more common than most other practices we examined because they are often mandated by journals (Nutu et al., 2019). Disclosing funding sources and potential conflicts of interest in research articles helps readers to make informed judgments about the risk of bias (Bekelman et al., 2003; Cristea & Ioannidis, 2018). In the absence of established norms or journal mandates, authors may often assume that such statements are not relevant to them (Chivers, 2019). However, because the absence of a statement is ambiguous, researchers should ideally always include one, even if it is to explicitly declare that there were no funding sources and no potential conflicts of interest.

Of the articles we examined, 5% claimed to be a replication study, slightly higher than a previous estimate of 1% in psychology (Makel et al., 2012) and a similar estimate of 1% in the social sciences (Hardwicke, Wallach, et al., 2020) but comparable to a 5% estimate in biomedicine (Wallach et al., 2018). Only 1% of the articles we examined were cited by another article that claimed to be a replication attempt; 11% of the articles were included in a systematic review, and 7% were included in a meta-analysis. Replication and evidence synthesis through systematic reviews and meta-analyses help to verify and build on the existing evidence base. However, it is unclear what an ideal frequency of these activities would be because they depend on many factors, such as how often studies are sufficiently similar to be amenable to synthesis methods. Although the current findings suggest that routine replication and evidence synthesis are relatively rare in psychology, many high-profile replication attempts have been conducted in recent years (Open Science Collaboration, 2015; Pashler & Wagenmakers, 2012). In addition, because the articles we examined were published relatively recently, there may be some time lag before relevant replication and evidence-synthesis studies emerge. For example, in biomedicine at least, the number of meta-analyses is growing geometrically, and in many fields multiple meta-analyses are often conducted once several studies appear on the same research question (Ioannidis, 2016).

The current study has several caveats and limitations. First, our findings are based on a random sample of 250 articles, and the obtained estimates may not necessarily generalize to specific contexts, such as other disciplines, subfields of psychology, or articles published in particular journals. However, this target sample size was selected to balance informativeness with tractability, and the observed estimates have reasonable precision. Second, although the focus of this study was transparency and reproducibility-related practices, this does not imply that the adoption of these practices is sufficient to promote the goals they are intended to achieve. For example, poorly documented data may not enable analytic reproducibility (Hardwicke, Bohn, et al., 2020; Hardwicke et al., 2018), and inadequately specified preregistrations may not sufficiently constrain researcher degrees of freedom (Claesen et al., 2019; Bakker et al., 2020). Third, we relied only on published information. Direct requests to authors may have yielded additional information; however, as noted earlier, such requests to research psychologists are often unsuccessful (Hardwicke & Ioannidis, 2018a; Vanpaemel et al., 2015; Wicherts et al., 2006). Fourth, a lack of transparency may have been justified in some cases if there were overriding practical, legal, or ethical concerns (Meyer, 2017). However, no constraints of this kind were declared in any of the articles we examined. Last, the study can gauge the prevalence of the assessed practices only during a particular time period. The effect of reform initiatives introduced after the examined time period, such as the founding of the Society for Improving Psychological Science (http://improvingpsych.org), will not be represented in our findings.
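As a rough check on the "reasonable precision" claim, a worst-case calculation under the normal approximation gives the 95% half-width for the smallest and largest denominators reported in the abstract (183 and 237). This assumes p = 0.5, which maximises the binomial variance; it is an illustration only, not the authors' stated sample-size rationale or their interval method.

```latex
\text{half-width} = z_{0.975}\sqrt{\frac{p(1-p)}{n}} \le 1.96\sqrt{\frac{0.25}{n}},
\qquad
1.96\sqrt{\frac{0.25}{183}} \approx 0.072,
\qquad
1.96\sqrt{\frac{0.25}{237}} \approx 0.064
```

So even in the least favourable case the estimates fall within roughly ±7 percentage points. For rare practices, where p is near 0, p(1 - p) is much smaller and the intervals are correspondingly narrower, as the abstract's intervals for data, protocol, and script sharing illustrate.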

The current findings imply the minimal adoption of transparency and reproducibility-related practices in psychology during the examined time period. Although researchers appear to recognize the problems of low credibility and reproducibility (Baker, 2016) and endorse the values of transparency and reproducibility in principle (Anderson et al., 2010), they are often wary of change (Fuchs et al., 2012; Houtkoop et al., 2018) and routinely neglect these principles in practice (Hardwicke, Wallach, et al., 2020; Iqbal et al., 2016; Wallach et al., 2018). There is unlikely to be a single remedy to this situation. A multifaceted approach will likely be required, with iterative evaluation and careful scrutiny of reform initiatives (Hardwicke, Serghiou, et al., 2020). At the educational level, guidance and resources are available to aid researchers (Crüwell et al., 2019; Klein et al., 2018). At the institutional level, there is evidence that funder and journal policies can be effective at fomenting change (Hardwicke et al., 2018; Nuijten et al., 2017), and these initiatives should be translated and disseminated where relevant. Heterogeneous journal policies (Nutu et al., 2019) may currently be disrupting efforts to establish norms and promote better standards in routine practice. The Transparency and Openness Promotion initiative promises to encourage the adoption and standardization of journal policies related to transparency and reproducibility (Nosek et al., 2015), but it remains to be seen how effective this initiative will be in practice. Aligning academic rewards and incentives (e.g., funding awards, publication acceptance, promotion, and tenure) with better research practices may also be instrumental in encouraging wider adoption of these practices (Moher et al., 2018).

The current study is one of several to examine the prevalence of transparency and reproducibility-related research practices across scientific disciplines (Hardwicke, Wallach, et al., 2020; Iqbal et al., 2016; Wallach et al., 2018). Here, we have sketched out some of the topography of psychology’s territory. Additional studies will be required to fill in areas of the map that have yet to be explored and increase the resolution in specific areas (e.g., subfields of psychology). Future studies can also add a temporal dimension by comparing new data with the baseline established here, allowing us to explore the evolution of this landscape over time.

From 2013... Resource Security Impacts Men’s Female Breast Size Preferences: Men from low & medium socioeconomic contexts rated larger breasts as more attractive than did men from higher socioeconomic levels

From 2013... Swami V, Tovée MJ (2013) Resource Security Impacts Men’s Female Breast Size Preferences. PLoS ONE 8(3): e57623; Mar 6 2013. https://doi.org/10.1371/journal.pone.0057623

Abstract: It has been suggested that human female breast size may act as a signal of fat reserves, which in turn indicates access to resources. Based on this perspective, two studies were conducted to test the hypothesis that men experiencing relative resource insecurity should perceive larger breast size as more physically attractive than men experiencing resource security. In Study 1, 266 men from three sites in Malaysia varying in relative socioeconomic status (high to low) rated a series of animated figures varying in breast size for physical attractiveness. Results showed that men from the low socioeconomic context rated larger breasts as more attractive than did men from the medium socioeconomic context, who in turn perceived larger breasts as more attractive than men from a high socioeconomic context. Study 2 compared the breast size judgements of 66 hungry versus 58 satiated men within the same environmental context in Britain. Results showed that hungry men rated larger breasts as significantly more attractive than satiated men. Taken together, these studies provide evidence that resource security impacts upon men’s attractiveness ratings based on women’s breast size.

General Discussion

It has been suggested that one function of female breast size is to act as an indicator of adipose tissue reserves in non-lactating women [15][34][35]. This hypothesis is based on the fact that adipose tissue plays a central role in the storage of calories, which in turn leads to the suggestion that breast size may reliably predict food availability or access to resources. In situations marked by relative resource insecurity, then, men should idealise larger female breast size, as large size would indicate that a woman has access to resources. In two studies, we found evidence for this hypothesis: men who were experiencing relative resource insecurity (operationalised either as environmental socioeconomic context or proprioceptive hunger) rated women with larger breast sizes as more physically attractive than did men experiencing resource security.

Based on the present set of findings, it might be argued that temporary affective states produce individual variation in breast size judgements. Men experiencing immediate resource insecurity may perceive women with larger breasts as more attractive because large breast size indicates access to resources [57][59] or, more broadly, traits associated with maturity that may be more valued during periods of insecurity [60][65]. In short, the subjective experience of resource deprivation in the form of hunger appears to drive men to place greater value on female cues that indicate access to resources. Moreover, it is apparent that these temporary affective states mirror patterns of cross-environmental differences, with men from contexts of low socioeconomic status rating larger breast sizes as more attractive than men from contexts of high socioeconomic status. It is possible the cumulative temporal effect of resource insecurity among the former group is what drives their idealisation of a larger breast size [57][59].

Of course, this is not to suggest that adipose tissue reserves are the only thing indicated by larger breast size. If this were the case, then larger breast size should be no more important than fat stored in any other part of a woman’s body [17]. Rather, breast size may also act as a cue of nulliparity, age, sexual maturity, or fertility [14][17] and, furthermore, there may be other, more important cues of fat storage than the breasts, such as overall body size [57][59]. This may help to explain the small-to-moderate effect sizes uncovered in both studies reported here: all things being equal, breast size may indicate fat reserves, but in reality breast size is likely correlated with body mass [72], which may act as a more reliable indicator of such reserves. Determining the relative importance of breast size and body size, respectively, as cues of fat reserves will require further research.

Nor do our findings deny a role for sociocultural factors in shaping breast size judgements. It has been argued, for example, that breasts are one of the most important sites of objectification of the female body in socioeconomically developed settings [4][72][73] and media targeted at some men appear to fetishise large breasts [74][75]. As an aside, this should not be used to suggest that the importance of breasts varies across cultures and that our methodology artificially inflates the importance of breast size: earlier ethnographic research indicates that breasts are eroticised in many different cultures [76]. In addition, judgements of breast size appear to be shaped by individual psychological differences [28][30], as well as motivational states [77], which may help account for some of the discrepant findings in earlier studies. In future work, it will be important to take into account the different theoretical perspectives highlighted here in order to arrive at a fuller picture of the forces shaping breast size preferences across cultures.

There are a number of limitations of the present work, which should be recognised. First, it is possible that there were differences in mean breast size across our research sites (particularly in Study 1), which could have influenced our respondents’ breast size preferences. For example, some scholars have suggested that attractiveness judgements are calibrated to local conditions [78]; this being the case, it is possible that local variations in mean breast size may have impacted upon men’s breast size judgements independent of socioeconomic status. Obtaining population-based anthropometric data and tailoring stimuli according to local variation may help to expand on our findings. Second, it is possible that figures with larger breast size were perceived as heavier overall. If so, it is possible that our findings were driven by body size preferences in general, rather than breast size per se. Although variation in breast size in our stimuli is unlikely to have resulted in major differences in perceptions of body weight or size, this is an issue that warrants further investigation.

Third, our focus on breast size comes at the expense of other breast-related variables that may have impacted upon participants’ ratings, such as symmetry, shape, and areola size [7]. Although these traits were held constant in our study, future work may wish to concurrently consider the effects of manipulations to different breast-related variables, as well as other morphological traits, such as body size and waist-to-hip ratio. In a similar vein, because the faces of our stimuli were identical for each figure, participants may have focused more on the figures’ bodies as a result [7]. One way in which this limitation could be overcome would be to utilise a between-groups design in which participants are asked to rate only one figure, rather than being presented with all figures simultaneously.

These limitations notwithstanding, the present set of results provides evidence that breast size may play a role in men’s assessments of female access to resources. All things being equal, men from relatively low socioeconomic contexts and men experiencing temporary hunger rate women with larger breast size as more attractive than do men from high socioeconomic contexts or men experiencing satiety. These results add to the findings of recent empirical work demonstrating the malleability of physical attractiveness ratings [65] and highlight the importance of considering the context in which attractiveness judgements are made. What remains is for scholars to begin the task of theorising how the many different factors that are known to impact upon physical attractiveness preferences (e.g., social, economic, evolutionary, individual differences) might fit together [79].

From 2019... While outsiders appear reluctant to challenge leadership within a field when the star is alive, the loss of a luminary provides an opportunity for fields to evolve in new directions that advance knowledge

From 2019... Does Science Advance One Funeral at a Time? Pierre Azoulay, Christian Fons-Rosen, Joshua S. Graff Zivin. American Economic Review, 109 (8): 2889-2920. DOI: 10.1257/aer.20161574

Abstract: We examine how the premature death of eminent life scientists alters the vitality of their fields. While the flow of articles by collaborators into affected fields decreases after the death of a star scientist, the flow of articles by non-collaborators increases markedly. This surge in contributions from outsiders draws upon a different scientific corpus and is disproportionately likely to be highly cited. While outsiders appear reluctant to challenge leadership within a field when the star is alive, the loss of a luminary provides an opportunity for fields to evolve in new directions that advance the frontier of knowledge.

A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it. —Max Planck

IV. Conclusion

In this paper, we leverage the applied economist’s toolkit, together with a novel approach to delineate the boundaries of scientific fields, to explore the effect that the passing of an eminent life scientist exerts on the dynamics of growth, or decline, for the fields in which she was active while alive. We find that publications and grants by scientists who never collaborated with the star surge within the subfield, absent the star. Interestingly, this surge is not driven by a reshuffling of leadership within the field, but rather by new entrants who are drawn from outside of it. Our rich data on individual researchers and the nature of their scholarship allows us to provide a deeper understanding of this dynamic.

In particular, this increase in contributions by outsiders appears to tackle the mainstream questions within the field but by leveraging newer ideas that arise in other domains. This intellectual arbitrage is quite successful: the new articles represent substantial contributions, at least as measured by long-run citation impact. Together, these results paint a picture of scientific fields as scholarly guilds to which elite scientists can regulate access, providing them with outsized opportunities to shape the direction of scientific advance in that space. We also provide evidence regarding the mechanisms that may enable the regulation of entry. While stars are alive, entry appears to be effectively deterred where the shadow they cast over the fields in which they were active looms particularly large. After their passing, we find evidence for influence from beyond the grave, exercised through a tightly-knit “invisible college” of collaborators (de Solla Price and Beaver 1966, Crane 1969). The loss of an elite scientist central to the field appears to signal to those on the outside that the cost/benefit calculations on the avant garde ideas they might bring to the table have changed, thus encouraging them to engage. But this appears to occur only when the topology of the field offers a less hostile landscape for the support and acceptance of “foreign” ideas, for instance when the star’s network of close collaborators is insufficiently robust to stave off threats from intellectual outsiders. In the end, our results lend credence to Planck’s infamous quip that provides the title for this manuscript. Yet its implications for social welfare are ambiguous.
While we can document that eminent scientists restrict the entry of new ideas and scholars into a field, gatekeeping activities could have beneficial properties when the field is at its inception; it might allow cumulative progress through shared assumptions and methodologies, and the ability to control the intellectual evolution of a scientific domain might, in itself, be a prize that spurs much ex ante risk taking. Because our empirical exercise cannot shed light on these countervailing tendencies, we must refrain from drawing concrete policy conclusions from our results.

All of the evidence we have presented pertains to the academic life sciences. It is unclear how the lessons from that setting might apply to other fields inside the academy. In particular, when frontier research requires access to expensive and highly-specialized capital equipment, as is sometimes the case in the physical sciences, the rules governing access to that capital are likely to favor succession by insiders. At the other end of the spectrum, more atomistic fields where scientists generally work alone or in very small groups may evolve in a more frictionless manner. Whether our findings apply to industrial research and development is also an open question. In that setting, the choice of problem-solving approaches is guided by market signals (however imperfectly, cf. Acemoglu 2012), and thus likely to differ from those selected under the more nuanced system of pecuniary and non-pecuniary incentives that characterizes academic research (Feynman 1999; Aghion, Dewatripont, and Stein 2008). Assessing the degree to which our results extend to other settings, and the reasons they might differ, represents a fruitful area for future research.