Friday, October 16, 2020

The Link Between Adaptive Memory and Cultural Attraction: New Insights for Evolutionary Ethnobiology

The Link Between Adaptive Memory and Cultural Attraction: New Insights for Evolutionary Ethnobiology. Risoneide Henriques da Silva, Washington Soares Ferreira Júnior, Joelson Moreno Brito Moura & Ulysses Paulino Albuquerque. Evolutionary Biology, Oct 11 2020.

Abstract: In this paper, we present the points of convergence between adaptive memory and cultural attraction, and how these two approaches can help evolutionary ethnobiologists understand human cognition and behavior in relation to nature. In addition, we present empirical evidence of how the union of genetic, cultural and ecological factors can shape the human mind and behavior, aspects that are often dissociated by ethnobiologists. Thus, the present manuscript brings a holistic perspective on the subject, allowing theoretical contributions and opportunities for dialogue between the fields of adaptive memory, cultural attraction and evolutionary ethnobiology.

Voice Pitch Seems A Valid Indicator of One’s Unfaithfulness in Committed Relationships

Voice Pitch – A Valid Indicator of One’s Unfaithfulness in Committed Relationships? Christoph Schild, Julia Stern, Lars Penke & Ingo Zettler. Adaptive Human Behavior and Physiology, Oct 16 2020.


Objectives: When judging a male speaker’s likelihood to act sexually unfaithful in a committed relationship, listeners rely on the speaker’s voice pitch such that lower voice pitch is perceived as indicating a higher likelihood of being unfaithful. In line with this finding, a recent study (Schild et al. Behavioral Ecology, 2020) provided first evidence that voice pitch might indeed be a valid cue to sexual infidelity in men. In this study, male speakers with lower voice pitch, as indicated by lower mean fundamental frequency (mean F0), were actually more likely to report having been sexually unfaithful in the past. Although these results fit the literature on vocal perceptions in contexts of sexual selection, the study was, as stated by the authors, underpowered. Further, the study solely focused on male speakers, which leaves it open whether these findings are also transferable to female speakers.

Methods: We reanalyzed three datasets (Asendorpf et al. European Journal of Personality, 25, 16–30, 2011; Penke and Asendorpf Journal of Personality and Social Psychology, 95, 1113–1135, 2008; Stern et al. 2020) that include voice recordings and infidelity data from a total of 865 individuals (63.36% female) in order to test the replicability of, and further extend, past research.

Results: A significant negative link between mean F0 and self-reported infidelity was found in only one out of two datasets for men and only one out of three datasets for women. Two meta-analyses (accounting for the sample sizes and including data of Schild et al. 2020), however, suggest that lower mean F0 might be a valid indicator of higher probability of self-reported infidelity in both men and women.

Conclusions: In line with prior research, higher masculinity, as indicated by lower mean F0, seems to be linked to self-reported infidelity in both men and women. However, given methodological shortcomings, future studies should set out to further delve into these findings.
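The abstract states that the meta-analyses account for sample sizes but does not describe how. As a rough illustration of what sample-size weighting can look like, here is a minimal sketch of fixed-effect pooling of correlations via Fisher's z transform. The choice of correlation coefficients as the effect size, the function name `pool_correlations`, and the example numbers are all assumptions made for illustration; they are not the authors' method or data.

```python
import math


def pool_correlations(studies):
    """Fixed-effect pooling of correlations via Fisher's z transform.

    `studies` is a list of (r, n) pairs. Each study is weighted by n - 3,
    the inverse of the sampling variance of Fisher's z.
    """
    weights = [n - 3 for _, n in studies]
    z_values = [math.atanh(r) for r, _ in studies]
    total_weight = sum(weights)

    # Weighted mean of the z-transformed correlations.
    pooled_z = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    se = 1.0 / math.sqrt(total_weight)

    # 95% confidence interval on the z scale, then back-transformed to r.
    ci_low = math.tanh(pooled_z - 1.96 * se)
    ci_high = math.tanh(pooled_z + 1.96 * se)
    return math.tanh(pooled_z), (ci_low, ci_high)


# Hypothetical (r, n) pairs, NOT values from the reanalyzed datasets.
example_studies = [(-0.20, 90), (-0.05, 120), (-0.12, 110)]
pooled_r, (low, high) = pool_correlations(example_studies)
print(f"pooled r = {pooled_r:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

A pooled estimate of this kind can differ from zero even when most individual datasets do not reach significance on their own, which is the pattern the Results describe.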


In this Registered Report, we reanalyzed three datasets to test a potential relation between F0 and self-reported infidelity in n = 319 male and n = 551 female speakers. While a significant negative link between mean F0 and self-reported infidelity was found in only one out of two datasets for men and only one out of three datasets for women, two meta-analyses (accounting for the sample sizes and including the original Schild et al. 2020 data for men) suggest that lower mean F0 might be a valid indicator of higher probability of self-reported infidelity in both men and women. The one dataset that yielded significant associations for both men and women and had vocal attractiveness ratings suggests that this effect was not mediated by vocal attractiveness in men, but partially mediated by vocal attractiveness in women, such that lower mean F0 predicted lower vocal attractiveness, which in turn predicted a higher likelihood of self-reported infidelity. Further, where it was possible to test, relationship length was associated with higher self-reported infidelity such that participants were more likely to report extra-pair copulations in longer relationships. This is in line with the finding that sociosexual desire tends to become more unrestricted and sexual interests broaden to people outside of committed relationships after about 4 years of relationship duration, sometimes called the “4 year itch” (Fisher 1987; Penke and Asendorpf 2008). However, the effect of mean F0 on infidelity is independent of relationship length. Participants’ age seemed to be unrelated to their self-reported infidelity.

Why is F0 Associated With Unfaithfulness in Committed Relationships?

Whereas previous studies report that male speakers with lower pitched voices are perceived as more likely to act sexually unfaithful in a committed relationship than speakers with higher pitched voices (O’Connor et al. 2011; O’Connor and Barclay 2017), only one previous study investigated whether mean F0 is actually linked to a higher likelihood of self-reported infidelity (Schild et al. 2020). In an exploratory finding, Schild and colleagues (Schild et al. 2020) report that men with lower F0 were, indeed, more likely to cheat in committed relationships. Further, the relation between F0 and sexual infidelity in women has not been tested so far. The current study presents evidence that F0 is actually linked to sexual unfaithfulness in men and women. Although the evidence is rather mixed in all of the separately analyzed datasets, the conducted meta-analyses suggest that men and women with lower F0 more often report cheating in committed relationships. However, in line with the mixed findings, we recommend that future research investigate the robustness of our findings.

That mean F0 might be a valid cue to one’s sexual infidelity could also explain why listeners were found to make accurate judgements about the sexual infidelity of speakers in two prior studies (Hughes and Harrison 2017; Schild et al. 2020). Picking up on a valid cue to potential infidelity might be especially relevant to avoid high fitness costs such as the loss of protection and provisioning (Geary et al. 2004) as well as parental and relationship investment (O’Connor et al. 2011). However, while no other vocal parameters in this study were found to be valid indicators of self-reported infidelity, future research should set out to investigate whether other aspects of vocal communication, such as clarity of speech (Kempe et al. 2013), are valid cues to one’s infidelity.

Our findings are in line with previous findings indicating that men with lower mean F0 also report higher mating success (e.g., Puts 2005) and a higher number of sexual partners (e.g., Hughes et al. 2004), which is indicative of a less restricted sociosexual orientation. In turn, an unrestricted sociosexual orientation is linked to less commitment to romantic relationships and higher likelihoods of infidelity (Mattingly et al. 2011; Penke and Asendorpf 2008). But why is F0 associated with a higher likelihood of infidelity? Romantic infidelity can be the result of situational (e.g., opportunities) and dispositional factors (Blow and Hartnett 2005; Hilbig et al. 2015). With regard to opportunities for infidelity, lower mean F0 in men is associated with both perceptions of attractiveness and dominance (e.g., Puts et al. 2016), so it increases success in both being chosen by the opposite sex and intrasexual competition. The association can thus not distinguish between these two routes to infidelity opportunities, though two studies suggest that success in male-male competition, rather than female mate choice, is a more important predictor of male number of sexual partners and that male F0 is under stronger intrasexual than intersexual selection (Hill et al. 2013; Kordsmeyer et al. 2018). In contrast, lower female mean F0 is perceived as more dominant but less attractive (e.g., Borkowska and Pawlowski 2011; Jones et al. 2010). Interestingly, lower, not higher, mean F0 predicted infidelity in women. This could mean that being perceived as dominant is important for female infidelity opportunities, just as it is for men. Alternatively, it could be interpreted as less vocally attractive women being more likely to be romantically unfaithful, which is corroborated by the partial mediation of the F0-infidelity association by lower rated vocal attractiveness in Dataset 2.
Vocal attractiveness contributes to women’s likelihood of being chosen by potential mates over and beyond physical attractiveness (Asendorpf et al. 2011). Thus, it might be that less vocally attractive women end up with less opportunity to engage in a committed relationship with a preferred partner on a competitive mating market with mutual mate choice, as is typical for modern humans (Penke et al. 2008). If this is the case, these women might use infidelity as a mate switching strategy (Buss et al. 2017). As another alternative, a lower F0 and the disposition for infidelity might share a common cause in both men and women. A candidate would be androgenic masculinization throughout development. Both mean F0 (Puts et al. 2012a, b) and unrestricted sociosexual desire (Penke and Asendorpf 2008; Schmitt 2005), as well as the closely related desire for sexual variety (Schmitt and International Sexuality Description Project 2003), are strongly sexually dimorphic in humans. Importantly, higher masculinity is also linked to less restricted sociosexual orientation (Ostovich and Sabini 2004) and more sexual partners across the lifespan (Burri et al. 2015) in women, potentially explaining our findings. Lastly, given that women lower their mean F0 when talking to more attractive men (Hughes et al. 2010), when speaking to men they prefer (Pisanski et al. 2018) and when trying to sound sexy or attractive (Hughes et al. 2014), it might be that lower mean F0 indicates general interest and attracts more opportunities for infidelity. Importantly, all these potential explanations are not mutually exclusive, and might thus be addressed explicitly by future research.


Our investigation has four potential limitations in particular. First, due to the item wording, our infidelity measure was only a proxy of self-reported infidelity in Datasets 1 and 2: While one can assume that a majority of extra-pair copulations are, indeed, best described by acts of infidelity, other extra-pair copulations might actually be accepted by the partner (e.g., in polyamorous couples or open relationships, which were not assessed). Thus, our outcome measure might contain noise. However, note that only around 5% of relationships in western countries (such as those in which our data were collected) are consensually non-monogamous (Rubin et al. 2014). Second, as in Schild et al. (2020), we were only able to analyze whether individuals have ever cheated on any of their partners. We are not able to investigate or draw any conclusions about (a) how many of their partners they have cheated on (just one, all of them, or anything in between), (b) what the reasons for cheating were, and (c) whether cheating that does not involve sexual intercourse (e.g., kissing) is also related to F0. Third, for assessing infidelity, we relied on self-report measures. However, as infidelity in committed relationships is rather socially undesirable (Mogilski et al. 2014), there is a chance that not all participants gave honest responses to these questions, although all surveys were administered completely anonymously. Fourth, although the overall sample size of this investigation was relatively large, the asymmetric distribution of cheaters and non-cheaters decreased the statistical power of this investigation. In detail, 39%, 37%, and 17% of the study participants reported infidelity in Dataset 1, Dataset 2, and Dataset 3, respectively. We strongly encourage future studies to replicate our study and resolve potential problems that limit the interpretability of the current study’s findings.

Talking to Cows: Reactions to Different Auditory Stimuli During Gentle Human-Animal Interactions

Talking to Cows: Reactions to Different Auditory Stimuli During Gentle Human-Animal Interactions. Annika Lange et al. Front. Psychol., October 15 2020.

Abstract: The quality of the animal-human relationship and, consequently, the welfare of animals can be improved by gentle interactions such as stroking and talking. The perception of different stimuli during these interactions likely plays a key role in their emotional experience, but studies are scarce. During experiments, the standardization of verbal stimuli could be increased by using a recording. However, the use of a playback might influence the perception differently than “live” talking, which is closer to on-farm practice. Thus, we compared heifers' (n = 28) reactions to stroking while an experimenter was talking soothingly (“live”) or while a recording of the experimenter talking soothingly was played (“playback”). Each animal was tested three times per condition and each trial comprised three phases: pre-stimulus, stimulus (stroking and talking) and post-stimulus. In both conditions, similar phrases with positive content were spoken calmly, using long low-pitched vowels. All tests were video recorded and analyzed for behaviors associated with different affective states. Effects on the heifers' cardiac parameters were assessed using analysis of heart rate variability. Independently of the auditory stimuli, longer durations of neck stretching occurred during stroking, supporting our hypothesis of a positive perception of stroking. Observation of ear positions revealed longer durations of the “back up” position and less ear flicking and changes of ear positions during stroking. The predicted decrease in HR during stroking was not confirmed; instead we found a slightly increased mean HR during stroking with a subsequent decrease in HR, which was stronger after stroking with live talking. 
In combination with differences in HRV parameters, our findings suggest that live talking might have been more pleasurable to the animals and had a stronger relaxing effect than “playback.” The results regarding the effects of the degree of standardization of the stimulus on the variability of the data were inconclusive. We thus conclude that the use of recorded auditory stimuli to promote positive affective states during human-animal interactions in experimental settings is possible, but not necessarily preferable.


We compared the reactions of heifers to stroking while applying two different auditory stimuli: the stroker talking directly to the animals in a gentle voice or a recording of the stroker's talking. We found behavioral and physiological indications of a positive perception of the interactions for both auditory stimuli. While the behavioral reactions to gentle interactions did not differ statistically, some of the cardiac parameters indicated differences between the auditory stimuli, also shortly after the presentation of the stimulus had ended.

Perception of Each Treatment

Both treatments led to changes in behavior during the STIM phase that indicate a positive perception: During stroking, the heifers showed significantly longer durations of neck stretching, a behavior shown during intraspecific social grooming (Sambraus, 1969; Reinhardt et al., 1986; Schmied et al., 2005), which is often actively solicited, and stroking by humans (Waiblinger et al., 2004; Schmied et al., 2008; Lürzel et al., 2015a). It is interpreted as a sign of enjoyment, and it can thus be assumed that the situation is perceived as positive.

In a previous, similar experiment (Lange et al., 2020), we observed decreases of ear flicking and changes of ear position during stroking with no auditory stimuli. The present study confirms this pattern. The animals showed less ear flicking during STIM than PRE, a behavior mostly associated with negative affective states, such as pain after dehorning (Heinrich et al., 2010; Neave et al., 2013) or reactions to insect attacks (Mooring et al., 2007).

During STIM, the animals also changed the positions of their ears less often than in PRE. Frequencies of changes of ear positions were lower in sheep feeding (Reefmann et al., 2009a) or voluntarily being groomed by a human (Reefmann et al., 2009b) than during separation from the herd. In contrast, dairy cows showed an increased frequency of changes of ear positions during stroking compared to before or after (Proctor and Carder, 2014), which might however have been caused by small differences in experimental design, such as the stroker approaching at the beginning of the stroking phase. In contrast, the decrease in changes of ear positions and ear flicking during stroking in the current as well as in our previous study (Lange et al., 2020) indicates an association of a reduction of these behaviors with a positive, low-arousal state also in cattle.

However, for some of the behaviors we expected to indicate affective states, the treatment did not lead to significant differences: previously observed effects of stroking (Lange et al., 2020) on the duration of the animal resting its head and the time spent in contact with the experimenter were not confirmed in this study. These findings might be connected with the auditory stimulus, which might keep the animal comparatively more attentive to a certain degree and thus limit the intensity of the relaxation.

In an attempt to reflect the continuous nature of ear positions, we recorded nine different positions along the vertical and the horizontal axis: back up, back center, back down, center up, center, center down, forward up, forward center and forward down, plus ear hanging. During stroking, durations of the back up position increased significantly, while durations of forward up and ear low decreased, mostly in line with our previous experiment (Lange et al., 2020). The tendency toward decreased durations of forward up might indicate lowered vigilance (Boissy and Dumont, 2002), which is associated with less fear (Welp et al., 2004), and could corroborate the hypothesis that stroking induces positive low-arousal states.

We predicted to find longer durations of ear low during stroking, because low ear positions, including ear hanging, were associated with low-arousal, positive affective states in dairy cows in previous studies (Schmied et al., 2008; Proctor and Carder, 2014). However, we observed predominantly back up positions and surprisingly rare occurrences of ear low. One possible reason might have been the strokers' position kneeling next to the lying animal and resulting in the auditory signal being located above and behind the heifers' ears in both conditions. Since the ear position pattern was very similar to the one found in our previous study without vocal stimulation (Lange et al., 2020), however, the effect of the auditory stimulus seems not to have had a strong influence on ear positions, possibly because cattle have a relatively low sound-localization acuity compared with other mammals (Heffner and Heffner, 1992); the stroker's position relative to the animal's head may nevertheless be relevant.

Furthermore, the effects that we saw in STIM were not observed in POST, contrary to our hypothesis of longer-lasting effects of the treatment on behavior. However, some of the observed behaviors (such as neck stretching and the different ear positions) are more immediate reactions to positive stimuli and do not allow longer-lasting changes in affective states to be observed.

Comparison of the Treatments

As there were no significant differences in the behavioral reactions to the two different auditory stimuli, stroking and talking in a gentle voice per se seem to have a stronger effect on the behavior than the source of the auditory stimulus. As this experiment did not include a treatment where the animals were stroked without any auditory stimulation, we cannot infer any information on whether gentle talking in general enhances or diminishes the positive effects of stroking, but the results are very similar to our previous study, where the animals were stroked without acoustic stimulation. Stroking can elicit quite strong effects on physiology and behavior in different species (rats: Holst et al., 2005; cows: Schmied et al., 2010; cats: Gourkow et al., 2014; lambs: Coulon et al., 2015; horses: Lansade et al., 2018), which might exceed possible consequences of small differences in auditory stimuli. Regarding the absence of significant differences in behavior, it seems plausible that the heifers did not discern the two auditory stimuli, at least not to an extent where it would have affected their behavior. Furthermore, the mismatch of experimenter and playback voice did not have a significant effect on any of the behaviors. Indeed, there is a substantial amount of literature in different species indicating that they do not necessarily distinguish playback from live auditory stimuli: playback is used successfully in studies investigating bird behavior (Douglas and Mennill, 2010), dogs react to dog-directed human speech played back from a loudspeaker (Ben-Aderet et al., 2017; Benjamin and Slocombe, 2018), and dairy cows increase their production when exposed to a playback of calf vocalizations (Pollock and Hurnik, 1978; McCowan et al., 2002; no effect if calves are reared with their mothers: Zipp et al., 2013). Other characteristics of speech might thus have a stronger impact on the animals' behavior than the characteristics induced by the type of source.

On the other hand, the analysis of cardiac parameters points toward a different perception of the two auditory stimuli. In both conditions, HR increased from PRE to STIM and decreased from STIM to POST, but this decrease was significantly more pronounced in the “live” condition, indicating a stronger relaxation effect of live talking after the presentation of the stimulus. The slight increase of HR during STIM in both conditions seems to contradict our expectation that our treatment would induce a low-arousal state. However, it is in line with previous findings reporting an increased HR of lying animals that were licked by conspecifics (Laister et al., 2011) or receiving a stroking treatment (Lange et al., 2020) and might be caused by physical reactions to stroking (e.g., neck stretching) more than by a meaningful change in arousal or affective state (Lange et al., 2020).

Independently of the changes in HR, there were some significant effects of the conditions on HRV parameters: HF increased in POST in the “live” condition, but decreased in POST in the “playback” condition. It is widely accepted that HF increases with increasing activity of the parasympathetic branch of the autonomic nervous system (Task Force of ESP and NASPE, 1996; von Borell et al., 2007). The increased values suggest a higher parasympathetic activity after stroking in the “live,” but not the “playback” condition. An increased HF may be associated with positive emotions (McCraty et al., 1995; von Borell et al., 2007) and was found in horses regularly receiving a relaxing massage (Kowalik et al., 2017). This increase in HF was not accompanied by an increase in RMSSD, although both represent vagal activity and are often correlated (Task Force of ESP and NASPE, 1996; Hagen et al., 2005; von Borell et al., 2007; Shaffer et al., 2014). However, changes in RMSSD were not consistently observed in other studies investigating different affective states in animals (Reefmann et al., 2012; Travain et al., 2016). RMSSD might therefore be a suboptimal indicator of animal affective states (Gygax et al., 2013; Tamioso et al., 2018). A different pattern emerged for SDNN: values increased from PRE to STIM in the “live” condition, and decreased again in POST, whereas in the “playback” condition, SDNN reached its highest values in POST. SDNN reflects influences of both parasympathetic and sympathetic activity (von Borell et al., 2007; Shaffer et al., 2014). Together with the decrease of RMSSD/SDNN during live talking, these findings might indicate that the “live” condition led to higher sympathetic activity during stroking and talking, possibly indicating positive arousal in response to being stroked (Tamioso et al., 2018).
The increase of RMSSD/SDNN in “live” in POST is in line with increased values observed in sheep being brushed by a familiar human (Tamioso et al., 2018), and, in combination with the observed increase of HF in POST in “live,” indicates a shift toward vagal dominance after live talking. These patterns were not observed in the “playback” condition; contrarily, SDNN increased in POST, while RMSSD/SDNN and HF decreased slightly, possibly indicating a relative shift towards sympathetic regulation after stroking with “playback” stimulation.
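For readers unfamiliar with the time-domain HRV parameters discussed above, the following minimal sketch shows how mean HR, SDNN, RMSSD and the RMSSD/SDNN ratio are conventionally computed from a series of inter-beat intervals. The function name `hrv_time_domain`, the input values, and the use of the sample standard deviation for SDNN are assumptions for illustration, not the study's analysis pipeline. HF is a frequency-domain measure that requires spectral analysis and is not covered here.

```python
import math
import statistics


def hrv_time_domain(ibi_ms):
    """Compute basic time-domain HRV parameters.

    `ibi_ms` is a sequence of inter-beat (NN) intervals in milliseconds.
    """
    if len(ibi_ms) < 3:
        raise ValueError("need at least three inter-beat intervals")

    # SDNN: standard deviation of all NN intervals (sample SD assumed here).
    sdnn = statistics.stdev(ibi_ms)

    # RMSSD: root mean square of successive differences between intervals.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    # Mean heart rate in beats per minute from the mean interval.
    mean_hr = 60000.0 / statistics.fmean(ibi_ms)

    ratio = rmssd / sdnn if sdnn > 0 else float("nan")
    return {"mean_hr": mean_hr, "sdnn": sdnn, "rmssd": rmssd,
            "rmssd_sdnn": ratio}


# Hypothetical intervals in milliseconds, not data from the heifers.
example = hrv_time_domain([800, 810, 790, 800])
print(example)
```

Because RMSSD is driven by beat-to-beat changes while SDNN captures overall variability, the ratio of the two is used in the text as a rough index of the balance between vagal and mixed autonomic influences.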

In combination, the HRV results suggest that live talking may have been more pleasurable to the animals than “playback” and led to increased parasympathetic activity in the POST phase. They thus support the interpretation of a more pronounced relaxation effect indicated by the stronger decrease of HR in POST in “live” than in “playback.” The difference between the two auditory stimuli might be caused by losses of lower and higher frequencies of recorded sound, which have been found to cause a decline in dogs' responses to commands, especially in the absence of certain non-verbal cues (Fukuzawa et al., 2005). As we could not measure the actual sound pressure reaching the animals' ears directly, we also cannot exclude the possibility that there might have been other systematic differences between the acoustic signals produced by the two sources, such as consistent differences in volume, which might have contributed to eliciting higher or lower arousal. Another difference between the situations might have been produced by a subconscious change of the stroker's body language or attention toward the animal during live talking. However, stroker behavior was standardized as far as possible – in both conditions, the stroker was calmly sitting next to the heifer's shoulder, focused on stroking the animal. Great care was taken to match the “playback” condition not only in body posture and calm breathing, but also in mental focus and intention of interacting gently with the animal, trying to minimize possible differences in non-verbal communication.

We hypothesized that the higher degree of standardization in the “playback” stimulus would lead to decreased variability in the data. However, the variability of the responses as indicated by the precision parameters revealed a conflicting pattern, indicating that the relationship between the degree of standardization of the treatment and the variability in the observed behavior is more complex than expected or has different effects on different parameters. The higher degree of standardization in “playback” stimuli did not lead to a generally reduced variability and therefore should not be the main criterion for preference of playback stimuli for gentle human-animal interactions in experimental settings.

Quotation errors in high-impact general science journals: Found a total error rate of 25%, which tracks well with error rates found in similar studies in other academic fields

Quotation errors in general science journals. Neal Smith and Aaron Cumberledge. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, October 14 2020.

Abstract: Due to the incremental nature of scientific discovery, scientific writing requires extensive referencing to the writings of others. The accuracy of this referencing is vital, yet errors do occur. These errors are called ‘quotation errors’. This paper presents the first assessment of quotation errors in high-impact general science journals. A total of 250 random citations were examined. The propositions being cited were compared with the referenced materials to verify whether the propositions could be substantiated by those materials. The study found a total error rate of 25%. This result tracks well with error rates found in similar studies in other academic fields. Additionally, several suggestions are offered that may help to decrease these errors and make similar studies more feasible in the future.
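To give a sense of how precisely a sample of 250 citations can pin down an error rate, here is a small sketch that computes a 95% Wilson score interval for a proportion. The inputs 0.25 and 250 come from the abstract; the interval itself is not reported there and is shown only as an illustration, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed in n trials."""
    if not 0.0 <= p_hat <= 1.0 or n <= 0:
        raise ValueError("p_hat must be in [0, 1] and n must be positive")
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4 * n * n)
    ) / denom
    return center - half_width, center + half_width


# Error rate and sample size as stated in the abstract; the resulting
# interval is an illustration, not a figure from the paper.
low, high = wilson_interval(0.25, 250)
print(f"95% interval for the error rate: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly ten percentage points, which is a reminder that the headline rate is an estimate from a modest sample.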

4. Discussion

This study is the first review of quotation errors in high-impact general science journals. Errors were found to exist in considerable numbers. This demonstrates a weakness in the current use of references in scientific writing. There may be several reasons for these errors. Stochastic modelling suggests that 70–90% of references are copied second-hand from other articles' reference lists [26]. In addition, it has been argued through analysis of misprints that only about 20% of authors citing a paper have actually read the original [27]. As suggested by other quotation error researchers, authors could avoid errors through greater diligence [1,45,9]. There is also a lack of agreement regarding the correct reasons to include citations in scientific papers [28]. This could contribute to the citing of inappropriate references. Finally, quotation errors may occur through deliberate malpractice with the goal of increasing the citation metrics for the cited references [4].

Regardless of the cause, the most pragmatic approach to improving this problem is to improve the review and verification of references [1,20]. In the current state of academic literature, this is a very time-consuming task. In this study, it took two reviewers months of work to examine only 250 citations. The 500 articles from which we randomly selected our sample had a total of 26 344 references (many of which were cited multiple times). This suggests that it is unfeasible for editors or reviewers to thoroughly check all citations for substantiation. Therefore, we present two suggestions that would make systematic checking of references far more feasible.

First, most importantly, journals should change their citation styles to require page numbers. None of the high-impact journals reviewed require or even allow the inclusion of page numbers with in-text citations. In the verification process, a huge amount of time is spent searching through references to find the information being cited. Some books and reports are hundreds or thousands of pages long. Furthermore, even relatively short journal articles of 8–10 pages can be very dense and take a long time to thoroughly examine. Requiring page numbers (or paragraph numbers, etc.) places a slightly higher burden on the authors in exchange for significantly lightening the workload of potential reviewers. Lengthy references are often used to cite one specific piece of information, and it is not reasonable to expect reviewers to search through them to find that information. Page numbers should be required. One possible exception to this rule could be when referring to a study as a whole. However, even in those cases, propositions can nearly always be substantiated by referring to the page number of the introduction or abstract of a paper. This makes quotation errors easier to check for, increasing the likelihood of detection both before and after publication.

Requiring page numbers with in-text citations would constitute a significant change for academic publishers. The five journals in our study all use numbered endnotes, with a single endnote used for each reference regardless of how many times it is cited. To require page numbers in the text, these journals would have to either require page numbers to be included in each in-text citation (along with an endnote reference number), require separate endnotes containing page numbers for each citation of the reference, or abandon endnote citation altogether for some style of parenthetical citation. However, the continued prevalence of quotation errors is a significant problem that more than justifies the one-time cost of journals adopting new in-text citation policies.

We are not necessarily suggesting that reviewers or editorial staff should systematically review all quotations. However, systematic review of quotations would have benefits. There is a reason that the academic review process exists: to verify and improve the quality of scientific literature. Minimizing quotation errors is certainly one way to do that, and reference verification by journal staff has been significantly correlated with fewer quotation errors [10]. However, even in the absence of such a system of editorial review, including page numbers would give readers, and reviewers in studies such as ours, a better chance of detecting quotation errors when they happen. Furthermore, requiring authors to locate and cite a specific page would oblige them to take more care with their use of citations.

Our second suggestion refers specifically to the Impossible to Substantiate category. We are not aware of any previous studies that include an Impossible to Substantiate category, so further explanation and justification for its inclusion is in order. Essentially, this category refers to statements being cited that either lack a clear proposition or contain a proposition that cannot be substantiated through an outside reference. For example, an article might merely mention a novel material and cite a reference discussing that material. There is no specific proposition being made. The reference is simply giving additional background information. Therefore, substantiation is impossible. In other cases, statements cannot possibly be substantiated with a reference. For example, it was not uncommon in the articles surveyed for the methods section to be replaced (in whole or in part) with a citation. Here, there is a claim: ‘The methods from this reference were used’. However, it is not possible to substantiate this claim, because the article does not include the details of the methods used for comparison.

Some may consider this approach to be overly fastidious. However, there is no good reason to allow this type of inexact and non-verifiable referencing to pervade scientific literature. The most likely reason for this type of citing is to shorten articles to save printing space. This is a weak justification in the digital age. If background information is so unimportant that it does not merit a few words in the text (‘discussed in reference 15’ or ‘see reference 15 for the history of material X’ for example), then instead of using a propositionless citation, the information should be edited out of the paper proper and included as a supplement. The citing of methods sections and other unsubstantiatable claims could be dealt with in the same manner.

Of the previous quotation error studies reviewed, 71% did not mention string citations at all, and 14% specifically excluded string citations from their research [2–5,7–23]. Only one study specifically noted a difference in error rate between single and string citations. Surprisingly, that study came to a directly opposite conclusion regarding string citations, finding major errors more common in string citations [9]. The reason behind this discrepancy is unclear, although it may be related to the study's enormous sample size (more than six times larger than the other studies reviewed) or its very limited topic focus (peer-reviewed orthopedic literature related to the scaphoid). It is also not methodologically clear whether the study required each reference in a string citation to substantiate the entire proposition being made. Our study did not require this. It required only that all the references in the string, taken as a whole, substantiate the entire proposition and that the reference being checked contribute to that substantiation. References mentioned in string citations tend to make overlapping points and are often redundant [29]. Therefore, using our methodology, it seems reasonable to expect string citations to be more likely to be Fully Substantiated, not less. Still, the connection between string citations and substantiation needs further investigation.

Previous research has found quotation errors in the physical, life and social sciences [1–23]. This study extends that research to a cross section of high-impact general science journals, finding a similar rate of errors. However, further research is needed to more fully understand the problem. This paper reviewed a total of only 250 citations, which is less than 1% of the citations included in the five target journals over the course of a year. Although this sample is in keeping with the sample size of similar studies [2–5,7–8,10–23], a larger sample could produce more meaningful results. The main barrier to using a larger sample is the time cost involved. If citation and referencing standards for journal articles were improved, reviewers should be able to check references more quickly. Furthermore, in this study the reviewers were not experts in the scientific disciplines to which the references belonged. Even though only two references (0.8%) were deemed too difficult to understand, some classifications required extensive research on the part of the reviewers. Expert reviewers should be able to work at a significantly faster pace, allowing for larger sample sizes. Further review of references can better show the extent of quotation errors in scientific literature. A better understanding of these errors can help decrease them, leading to better, more rigorously supported science.
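
To make the sample-size point concrete, the short Python sketch below shows how the uncertainty around an estimated proportion narrows as more citations are checked. It is an illustration only, not the authors' analysis: the Wilson score interval is a standard method swapped in here, and the 25% error rate and the larger sample sizes are hypothetical inputs chosen for the example. The only figures taken from the text are the 250 citations reviewed and the two references deemed too difficult to understand.

```python
import math


def wilson_interval(count, n, z=1.96):
    """Wilson score interval for a proportion count/n (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need n > 0 and 0 <= count <= n")
    p = count / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def interval_width(rate, n):
    """Width of the Wilson interval when about rate*n of n citations are errors."""
    low, high = wilson_interval(round(rate * n), n)
    return high - low


if __name__ == "__main__":
    # From the text: 2 of the 250 citations were too difficult to classify.
    print(f"share too difficult: {2 / 250:.1%}")

    # HYPOTHETICAL error rate and sample sizes, for illustration only.
    assumed_rate = 0.25
    for n in (250, 1000, 4000):
        print(f"n={n}: interval width {interval_width(assumed_rate, n):.3f}")
```

Under these assumptions, quadrupling the sample roughly halves the interval width, which is why faster reference checking matters for the precision of studies like this one.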