Friday, May 21, 2021

Prisoners’ perceptions: They engaged in self-enhancing comparisons, differentiating themselves from other prisoners and their past selves who committed the crime, and overestimated criminality in the general population

From 2020... Explaining the better-than-average effect among prisoners. Sarah G. Taylor, Hedwig Eisenbarth, Constantine Sedikides, Mark D. Alicke. Journal of Applied Social Psychology, October 28 2020.

Abstract: We addressed explanations for why prisoners manifest the Better-Than-Average Effect (perceptions of superiority to the average peer), focusing on three biases: self-enhancing (social as well as temporal) comparisons, denial, and self-serving attributions. We tested the Better-Than-Average Effect in regards to prisoners’ perceptions of their worst trait, and assessed the relationship between the three biases and positive self-evaluations. Prisoners engaged in self-enhancing comparisons, differentiating themselves from other prisoners and their past selves who committed the crime, but also expected self-improvement in the future. Prisoners also demonstrated denial for intentions to commit the crime, planning of it, recidivism, and over-estimation of crime prevalence in the general population. Although prisoners made self-serving attributions by distancing their own character from their criminal behavior and reporting they had experienced more hardship relative to others, they did not attribute the cause of their crime to such hardship. More extensive self-enhancing temporal comparisons and denial predicted more positive self-evaluations of prisoners’ worst trait relative to the average community member. The strength of some of these biases varied with levels of narcissism and psychopathy.

Men were more likely than women and older participants were more likely than younger participants to indicate that they would cheat on their partners if they were in a long-term intimate relationship

Plurality in mating: Exploring the occurrence and contingencies of mating strategies. Menelaos Apostolou. Personality and Individual Differences, Volume 175, June 2021, 110689.


• Nominates nine different mating strategies

• Finds that 68% of the participants preferred a mixed mating strategy

• Finds that 82.8% of the participants preferred to have initially or eventually one long-term partner

• Finds significant sex, age and sexual orientation differences in the adoption of mating strategies

Abstract: People adopt a variety of strategies in order to achieve specific mating goals. The current research nominated nine different mating strategies, and attempted to estimate their occurrence. Evidence from an online sample of 6273 Greek-speaking participants indicated that a mixed strategy had the highest occurrence, followed by a long-term and a short-term mating strategy. Men were more likely than women to prefer a short-term and a mixed mating strategy, and younger participants were more likely to prefer a mixed than a long-term mating strategy. In addition, heterosexual women with same-sex attraction were more likely than exclusively heterosexual women to prefer a short-term and a mixed strategy over a long-term mating strategy. Furthermore, we found that men were more likely than women, and older participants were more likely than younger participants, to indicate that they would cheat on their partners if they were in a long-term intimate relationship. Finally, heterosexual participants with same-sex attraction, as well as bisexual and homosexual men and women, were more likely than exclusively heterosexual participants to indicate that they would cheat on their partners when in a long-term intimate relationship.

Keywords: Mating strategies, Mating, Sex differences, Cheating, Infidelity

From 2018... Applying automatic text-based detection of deceptive language: How we lie to the police

Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police. Lara Quijano-Sanchez et al. Knowledge-Based Systems, Volume 149, 1 June 2018, Pages 155-168.


• VeriPol is an effective text-based lie detection model for police reports.

• Our model includes feature selection by L1 penalization and heuristic rules.

• Computational experiments on a real dataset show a validation accuracy of 91%.

• A pilot study shows a lower bound on the empirical precision of 83%, approx.

• The model analysis provides linguistic insights of how people lie to the police.

Abstract: Filing a false police report is a crime that has dire consequences for both the individual and the system. In fact, it may be charged as a misdemeanor or a felony. For society, a false report results in the loss of police resources and the contamination of police databases used to carry out investigations and assess the risk of crime in a territory. In this research, we present VeriPol, a model for the detection of false robbery reports based solely on their text. This tool, developed in collaboration with the Spanish National Police, combines Natural Language Processing and Machine Learning methods in a decision support system that provides police officers the probability that a given report is false. VeriPol has been tested on more than 1000 reports from 2015 provided by the Spanish National Police. Empirical results show that it is extremely effective in discriminating between false and true reports with a success rate of more than 91%, improving by more than 15% the accuracy of expert police officers on the same dataset. The underlying classification model can be analysed to extract patterns and insights showing how people lie to the police (as well as how to get away with false reporting). In general, the more details provided in the report, the more likely it is to be honest. Finally, a pilot study carried out in June 2017 has demonstrated the usefulness of VeriPol in the field.

Keywords: Lie detection, Information extraction, Predictive policing, Model knowledge extraction, Natural language processing, Decision support systems

The Ethnographic Atlas covers more than 1200 pre-industrial societies but has been seen skeptically; paper documents positive associations between the Atlas & self-reported data from 790,000 individuals across 43 countries

Tabulated nonsense? Testing the validity of the Ethnographic Atlas. Duman Bahrami-Rad, Anke Becker, Joseph Henrich. Economics Letters, Volume 204, July 2021, 109880.


• We validate the Ethnographic Atlas, a popular anthropological database.

• We benchmark the ethnographic data with self-reports from survey respondents.

• Ethnographic data and contemporary self-reports are positively correlated.

• Our results provide evidence for the validity of ethnographic accounts.

Abstract: The Ethnographic Atlas (Murdock, 1967), an anthropological database, is widely used across the social sciences. The Atlas is a quantified and discretely categorized collection of information gleaned from ethnographies covering more than 1200 pre-industrial societies. While being popular in many fields, it has been subject to skepticism within cultural anthropology. We assess the Atlas’s validity by comparing it with representative data from descendants of the portrayed societies. We document positive associations between the historical measures collected by ethnographers and self-reported data from 790,000 individuals across 43 countries.

Keywords: Ethnographic Atlas, Validation, Culture

3.2 Results

Twelve domains are equivalently represented in the DHS (Demographic and Health Surveys) and the Atlas: (1) patrilocality, (2) matrilocality, (3) polygyny, (4) reliance on animal husbandry, (5) reliance on agriculture, (6) length of post-partum abstinence, (7) breastfeeding duration, (8) insistence on virginity, (9) a preference for sons, (10) prevalence of domestic violence, (11) age difference between husband and wife, and (12) geographical location [footnote 2].

Throughout, we find positive associations between the ethnographic information from the Atlas and the self-reported individual-level data from the DHS (Table 1). Columns (1) to (5) list the results for variables that capture different aspects of kinship organization and subsistence modes. Almost all associations are positive, statistically significant, and sizeable. For example, a one standard deviation increase in the prevalence of historical patrilocality is associated with a 0.8 percentage points increase in the likelihood that an individual lives patrilocally today. This amounts to about twelve percent of the unconditional probability of living patrilocally in this sample (0.7). We can only speculate about the lack of association for reliance on agriculture, which could be due to differences in pre-industrial and contemporary agriculture, or the fact that the DHS variable captures only one specific aspect of contemporary reliance on agriculture.

Columns (6) to (11) list results for variables that capture social norms, customs, or preferences. Again, the associations between the historical and contemporary measures are positive throughout, in most cases statistically significant, and often meaningful in terms of size. For example, a one standard deviation increase in the historical length of post-partum abstinence is associated with a twelve percentage points increase in how long respondents today abstain after childbirth. For the preference of female virginity before marriage the association between the two measures is very small. This can plausibly be attributed to the lack of variation in the contemporary sample: about 93% of respondents express this attitude. Again, we can only speculate about the lack of association between the historical age of an infant at the onset of weaning in an ethnic group and the average breastfeeding duration of its descendants. It could be that male ethnographers could not make informed guesses about this dimension, or that breastfeeding practices have undergone substantial change during the past century.

Finally, we show that geographical location of the centroid of an ethnic group as reported by ethnographers is related to where people actually live today. For each individual in the DHS for whom we have information on geographical location, we calculate the distance in kilometers to the centroid of the homeland of her ancestral society. Figure 1 in the supplementary material shows the distribution. The median distance is 168 kilometers and a non-negligible fraction of about 12 percent live as close as 50 kilometers to the centroid of their ancestral homeland.
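The excerpt does not say how the distance to the centroid was computed. As a minimal sketch, assuming a great-circle (haversine) distance and a mean Earth radius of 6371 km, the two summary figures (median distance and share living within 50 kilometers) could be obtained as follows. The function names and all coordinates are hypothetical and are not taken from the paper.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within(distances_km, threshold_km):
    """Fraction of distances at or below the threshold."""
    return sum(d <= threshold_km for d in distances_km) / len(distances_km)


# Hypothetical respondent locations (lat, lon) and one ancestral centroid.
respondents = [(9.1, 7.4), (9.6, 8.0), (11.0, 7.5)]
centroid = (9.0, 7.5)

distances = [haversine_km(lat, lon, *centroid) for lat, lon in respondents]
print("median distance (km):", statistics.median(distances))
print("share within 50 km:", share_within(distances, 50.0))
```

With real data, each respondent would be matched to the centroid of her own ancestral society rather than to a single shared point.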

Footnote 2: Table 2 in the supplementary material describes how these dimensions are measured in the Atlas and the DHS.

From 2013... Cultural traits studies from large, cross-cultural datasets: We are underestimating the probability of finding spurious correlations between cultural traits

Roberts S, Winters J (2013) Linguistic Diversity and Traffic Accidents: Lessons from Statistical Studies of Cultural Traits. PLoS ONE 8(8): e70902.

Abstract: The recent proliferation of digital databases of cultural and linguistic data, together with new statistical techniques becoming available, has led to a rise in so-called nomothetic studies [1]–[8]. These seek relationships between demographic variables and cultural traits from large, cross-cultural datasets. The insights from these studies are important for understanding how cultural traits evolve. While these studies are fascinating and are good at generating testable hypotheses, they may underestimate the probability of finding spurious correlations between cultural traits. Here we show that this kind of approach can find links between such unlikely cultural traits as traffic accidents, levels of extra-marital sex, political collectivism and linguistic diversity. This suggests that spurious correlations, due to historical descent, geographic diffusion or increased noise-to-signal ratios in large datasets, are much more likely than some studies admit. We suggest some criteria for the evaluation of nomothetic studies and some practical solutions to the problems. Since some of these studies are receiving media attention without a widespread understanding of the complexities of the issue, there is a risk that poorly controlled studies could affect policy. We hope to contribute towards a general skepticism for correlational studies by demonstrating the ease of finding apparently rigorous correlations between cultural traits. Despite this, we see well-controlled nomothetic studies as useful tools for the development of theories.


Building better corpora

One of the most challenging issues to resolve is minimising the distance between those doing the data analysis and those researchers involved at other levels (e.g. field linguists). Part of the appeal of the nomothetic approach is the ease and cost-effectiveness in performing the analysis [14]. However, if the fundamental problems outlined in this paper are to be overcome, then there are a few solutions we can apply to this distance problem which involve improving the data quality. First, we want to increase the resolution of each individual variable. So, to take the previous example of consonant inventory size, the aim should be to report all accounts and not select one on the basis of prior theoretical assumptions. Having more data per variable will increase the statistical power for nomothetic studies. Second, minimising distance can be achieved by using multiple and, ideally, independent datasets that work together to build up mutually supporting evidence for or against a particular hypothesis. Different datasets can take the shape of those derived from different large-scale studies (e.g. Phoible [75] and WALS for phoneme inventory counts [55]), idiographic accounts of individual case studies and experimental data.

Thirdly, databases such as the WALS indicate linguistic norms for populations, but may not capture the variation within and between individuals. One solution is for the primary data to be raw text or recordings of real interactions between individuals [76] and for population-level features, such as grammatical rules, to be derived directly from these. While collecting adequate amounts of data of this kind is more difficult, and while it is not free of biases, it offers a richer source of information.

Furthermore, databases should be collected and coded with specific questions in mind, otherwise there is a risk that correlations could emerge due to biases in the original motivation for the database. For example, the database that was used to demonstrate a link between future tense and economic behaviour was designed to identify similarities between European languages, which also happen to be culturally related and relatively wealthy [36].

Model comparison

The correct null models to use when assessing cultural traits can be difficult to estimate, or unintuitive. As we shall demonstrate below, standard baselines of chance may not be conservative enough to eliminate spurious correlations. Rather than use random chance as a baseline, studies should compare competing hypotheses (as in [7]). Model comparison techniques allow researchers to test one model against another to see which better explains a particular distribution of data [77][78]. So, whereas standard regression techniques are able to tell you the amount of deviance explained by a particular model, they do not provide information about whether you should have a preference for one model over another given a particular set of data. Model comparison techniques are therefore useful summaries of the available information and are better viewed as inductive-style approaches that should be complementary to the hypothetico-deductive and falsificationist approaches more typically associated with the scientific process [72]. Model comparison can also be used to test linear versus non-linear assumptions.
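To make the idea of comparing a linear against a non-linear assumption concrete, here is a minimal sketch that fits two competing one-predictor models by ordinary least squares and compares them with the Akaike information criterion (AIC). AIC is swapped in here as one common model comparison tool; the excerpt does not say which technique the authors have in mind. The data are synthetic and all function names are hypothetical.

```python
import math


def ols_rss(xs, ys):
    """Residual sum of squares of a least-squares line fitted to y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def gaussian_aic(rss, n, k):
    """AIC for a Gaussian model, up to an additive constant.

    k counts the estimated parameters (here intercept, slope, variance).
    """
    return n * math.log(rss / n) + 2 * k


# Synthetic data generated from a logarithmic relationship plus a small,
# deterministic wiggle (an assumption made purely for illustration).
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * math.log(x) + 0.05 * (-1) ** i for i, x in enumerate(xs)]

rss_linear = ols_rss(xs, ys)                         # model A: y ~ x
rss_log = ols_rss([math.log(x) for x in xs], ys)     # model B: y ~ log(x)

aic_linear = gaussian_aic(rss_linear, len(xs), k=3)
aic_log = gaussian_aic(rss_log, len(xs), k=3)

# The model with the lower AIC is preferred for these data.
print("AIC linear:", aic_linear)
print("AIC log:   ", aic_log)
```

Both models have the same number of parameters here, so the comparison reduces to fit alone; with models of different size, the `2 * k` term penalises the larger one.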

Phylogenetic comparative methods

A simple, although conservative, test that controls for the relatedness of languages is to run the analysis within each language family (as in [1]). For example, the correlation between acacia trees and tonal languages is only significant for one language family, which is evidence against a causal relationship. However, more sophisticated methods are available. Studies of cultural traits have borrowed tools from biology to control for the non-independence of cultures [11]. Comparative methods include estimating the strength of a phylogenetic signal [49][79] and estimating the correlation between variables while controlling for the relatedness of observations [80][82]. For example, in the analyses above we found that speakers who take siestas have grammars with less verbal morphology. While experiments show that daytime naps affect procedural memory [83], which has been linked to morphological processing [84], the predictions run in the opposite direction to the results. However, when the same analysis accounts for the relatedness of languages using a phylogenetic tree [80], this correlation disappears entirely (r = 0.017, t = 0.13, p = 0.89, see methods). This highlights the very different implications that can come out of nomothetic studies when considering the independence of the observations.
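The simple within-family check described at the start of this section can be sketched in a few lines. This is only the conservative per-family test, not a phylogenetic comparative method, and the family labels and values are invented for illustration: two families differ in their average values, but neither shows any internal relationship between the two traits.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation; assumes both variables vary within the sample."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_family(rows, min_size=3):
    """Correlate x and y separately inside each family.

    rows: iterable of (family, x, y); families smaller than min_size are skipped.
    """
    groups = defaultdict(list)
    for family, x, y in rows:
        groups[family].append((x, y))
    result = {}
    for family, pairs in groups.items():
        if len(pairs) >= min_size:
            xs, ys = zip(*pairs)
            result[family] = pearson_r(xs, ys)
    return result


# Hypothetical languages: (family, trait x, trait y).
rows = [
    ("A", 1, 1), ("A", 2, 2), ("A", 3, 2), ("A", 4, 1),
    ("B", 11, 11), ("B", 12, 12), ("B", 13, 12), ("B", 14, 11),
]

pooled_r = pearson_r([x for _, x, _ in rows], [y for _, _, y in rows])
within = correlation_by_family(rows)

print("pooled r:", pooled_r)
print("within-family r:", within)
```

In this toy data the pooled correlation is strong only because the two families sit at different levels on both traits, which is the kind of descent-driven pattern the within-family test is meant to expose.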

While phylogenetic methods are relatively new and phylogenetic reconstruction (see below) is computationally expensive, software for phylogenetic comparative methods is freely available (e.g. packages for R, [85][88]) and does not require intense computing power. The more limiting factor for studies of linguistic features is a lack of standard, high-resolution phylogenetic trees.

Other phylogenetic techniques have been used to reconstruct likely trees of descent from cultural data (e.g. [89][91]). These may also be useful as further steps for determining whether links between cultural traits discovered by nomothetic studies are robust. For example, apparent universals in the distribution of linguistic structural features may actually be underpinned by lineage-specific trends [92].

Causal graphs

Our analyses above suggest that cultural features are linked in complex ways, making it difficult to know what to control for in a specific study and potentially casting doubt on the value of nomothetic approaches. However, we see nomothetic studies as a useful tool for exploring complex adaptive systems. One change to the approach which could offer better resistance to the problems above would be to move away from trying to explain the variance in a single variable of interest towards analysing networks of interacting variables.

One method that could aid this type of analysis is the construction of causal graphs from large datasets [15]. While mediation analyses are often used to assess the causal relationship between a small number of variables [4], recent techniques are designed to handle high-dimensional data. We applied this technique to many of the variables in the study above. Figure 4 shows the most likely directed, acyclic graph that reflects the best fit to the relationships between the variables. We emphasise that this graph should be interpreted as a useful visualisation and as a hypothesis-generating exercise rather than representing proof of causation between variables.

Figure 4. The most likely directed, acyclic graph of causal relationships between different variables in this study.

Boxes represent variables and arrows represent suggested causal links going from a cause to an effect. See the methods section for details.

Some interesting relationships emerge. First, some elements make intuitive sense, like the contemporary pathogen prevalence relying on the historical pathogen prevalence and the Gini coefficient (the balance between rich and poor within a country). Also, environmental variables like the number of frost days, mean growing season and mean temperature are linked.

More importantly, while the initial analysis above finds a direct correlation between linguistic diversity and road fatalities, even controlling for many factors, the causal graph analysis suggests that linguistic diversity and road fatalities are not causally linked. Instead, linguistic diversity is affected by demographic variables such as population size and density while road fatalities are affected by economic indicators such as GDP and the Gini coefficient. Similarly, the analysis suggests that tonal languages and the presence of acacia trees are not causally linked.

While the causal graph mainly provides evidence against some of the correlations above, it may also suggest interesting areas of further investigation. Interestingly, the causal graph suggests that collectivism is not directly linked with the genetic factors implicated by [4], but the relationship is mediated by (current) migration patterns. While speculative, it would be interesting to test the hypothesis that the distribution of genetic factors that are correlated with collectivism emerged by a process of selective migration (although see [93]). For example, the genotype that correlates with more collectivist countries is associated with a greater risk of depression under stress [29], so perhaps this gene came under selection in harsher climates. Indeed, we find some support for this idea, since adding environmental variables improves the fit of the model predicting the distribution of genotypes (compared to [4], see methods section). In this way, causal graph analyses may be a useful additional tool that can be used to explore relationships between complex adaptive variables such as cultural traits. Since the range of hypotheses suggested by inductive approaches can be very large, methods such as causal graphs can point to fruitful hypotheses to develop with more conventional approaches such as experiments.

Sexual self-schema reflects an individual’s cognitive representations of oneself as a sexual person; the more important women ranked religion, the more their SSS was consistently negative

Formal and informal sources of sexual information predict women’s sexual self-schema. Anneliis Sartin-Tarm, Kirstin Clephane, Tierney Lorenz. The Canadian Journal of Human Sexuality, Vol. 30, No. 1, April 29, 2021.

Abstract: Sexual self-schema (SSS) reflect an individual’s cognitive representations of oneself as a sexual person, and predict critical sexual health and wellbeing outcomes in women. Like other cognitive structures, SSS are thought to form through exposure to different kinds of information. The current exploratory study investigated associations between young women’s experiences with different sources of sexual information and their SSS valence and complexity. Respondents (n = 401) completed a validated SSS measure and ranked their perceived importance of different sources of sexual information as they were growing up. We found that the more important women perceived their friends as sources of sexual information, the more consistently their SSS was negative or aschematic (i.e., neither positive nor negative). In contrast, the more important they ranked partners, the more their SSS was positive or coschematic (i.e., both positive and negative). Finally, the more important women ranked religion, the more their SSS was consistently negative. Overall, preliminary associations suggest that friends, partners, and religion influence young women’s SSS valence and complexity. Further research may examine directionality and mechanistic causality of these associations, as well as how multiple varied sources of information interact to produce diverse SSS configurations.

KEYWORDS: Adolescent sexual behaviour, sex education, sexual information, sexual self-schema, women’s sexuality