Thursday, May 12, 2022

Serbian Roma: Girls were the preferred sex, owing to expected fitness benefits; Roma fathers tend to bias their investment towards taller, more endowed children, because of greater fitness pay-off

Paternal investment, stepfather presence and early child development and growth among Serbian Roma. Jelena Čvorović. Evolutionary Human Sciences, Vol 4, April 18 2022. https://www.cambridge.org/core/journals/evolutionary-human-sciences/article/paternal-investment-stepfather-presence-and-early-child-development-and-growth-among-serbian-roma/F3B5185E6333B85FEF0ACCFD5685FCA4

Abstract: Research on paternal investment and child growth and development is limited outside of high-income countries. Using nationally representative data from low-resource Serbian Roma communities, this study examined father investment (direct care), its predictors and the associations between paternal investment, stepfather presence and child physical growth and early development. The sample included 1222 children aged 35–59 months, out of which 235 were living with biological fathers. Child outcomes included height-for-age Z-scores, stunting and early child developmental score. Roma paternal investment was relatively low. There was a positive association of father investment and children's height, and no association with developmental score. The presence of father vs. stepfather did not exert any influence on children. Instead, maternal and child characteristics explained both the overall development and height for Roma children. Thus, older children, born to literate, lower parity mothers of higher status and greater investment had better developmental and growth outcomes; girls were the preferred sex, owing to expected fitness benefits. Reverse causality emerged as the most likely pathway through which the cross-sectional association of father direct care with child growth may manifest, such that Roma fathers tend to bias their investment towards taller, more endowed children, because of greater fitness pay-off.

Discussion

This paper used nationally representative data from Serbian Roma communities to assess paternal investment (as defined in this paper) and its predictors, and the associations between paternal investment, stepfather presence and early child development and physical growth.

There were several main findings. In this Roma sample, parental investment in a number of direct care behaviours was relatively low, while maternal and paternal investments were positively correlated. Overall, almost 10% of children did not receive any investment from their parents, as measured in the present study. In line with other studies from low- and middle-income countries, almost one-third of Roma fathers did not interact with their children in the surveyed care behaviours, twice the proportion of unengaged Roma mothers: 29.7 vs. 14.2% (Jeong et al., 2016; Sun et al., 2016). There was a cross-sectional association of father's investment with child's height and stunting, while maternal investment and child's height appeared to be predictors of father's investment. There were no associations between father's investment and developmental score, or between stepfather vs. father presence and Roma children's outcomes. Instead, maternal and child characteristics explained both the overall development and height for Roma children.

Parenting practices may be influenced by a range of child and family characteristics, and political and economic development (Walker et al., 2011). In addition, local culture and traditions influence parenting behaviour: in many countries, childcare is culturally ‘mother centric’, with low participation from fathers (Hosegood & Madhavan, 2010). Thus, for instance, an average Serbian father spends only 11 minutes per day with his children, while only one in 20 fathers is fully involved in parenting (Republički zavod za Statistiku, 2020). The observed low paternal investment of Roma fathers may reflect the prevailing dominant patriarchal cultural pattern with a significant sex asymmetry in parenting (Čvorović & Coe, 2019). Additionally, bias in maternal reporting could account for this finding – studies have found that generally fathers tend to report significantly higher levels of involvement than mothers, contingent on numerous factors including ethnicity, the quality of the couple's relationship and the child characteristics (Charles et al., 2018).

Previous research found inconsistent relationships between father's direct care and children's height (Maselko et al., 2019; Jeong et al., 2016), but in this study, after adjusting for potential confounding factors, Roma fathers’ direct care was positively associated with their children's height. Fathers may contribute to their offspring's well-being in a number of ways: additive paternal care (‘cooperative care’ where both parents work together to care for children at the same time) can include complementing the mother's direct care or providing resources that allow for better nutrition (Gurven et al., 2009). Additive care may also include playing with a child or teaching (Starkweather et al., 2021). Paternal additive direct care may have a similar impact to that of other allomothers: a child may receive a better overall gain and additive investments that could thus lead to better fitness outcomes (Emmott & Page, 2019). Given the positive correlation of Roma maternal and paternal investment, the overall investment may have positively affected children, resulting in better outcomes. However, the most obvious way in which paternal direct care can influence growth is via nutrition, i.e. feeding. MICS is cross-sectional in design, and measures of direct parental involvement did not include feeding practices, thus the results may be confounded by an unmeasured variable that correlates with paternal care.
In this setting, another possible pathway that may account for the positive association between father investment and child height is reverse causality, such that fathers provided more care to taller/healthier children (Maselko et al., 2019). A Roma child's height, in addition to maternal investment, predicted father's investment: taller children had fathers who provided more direct care. In young children, height serves as a proxy for the cumulative effect of nutritional and health loads from conception (Frongillo et al., 2017). Thus, early childhood growth is an important measure of offspring quality, as it may influence future health and reproduction (Kramer et al., 2016).

Additionally, a Roma child's height and chances of stunting were influenced by the child's sex: Roma boys were more likely to be shorter and stunted than Roma girls, suggesting that they were more susceptible to nutritional inequalities than their girl counterparts of the same age. This pattern is consistent with previous findings where biased paternal investment was associated with the children's health/height but also sex (Alvergne et al., 2009; Čvorović, 2020a; Hagen et al., 2001). Among the Roma, parents may selectively invest in and support taller children (girls) who had the greatest potential to survive into adulthood and reproduce successfully, thus making the parents into grandparents, or in other words, enhancing the parents’ reproductive success (Čvorović, 2020a). A sex preference favouring girls is a common finding in Hungarian Roma groups as well (Bereczkei & Dunbar, 2002). Roma girls more often than boys engage in helping-at-the-nest, have a greater chance of marrying up the socio-economic scale and produce more surviving children than sons. In addition, having a high-quality (taller/healthier) daughter is regarded as a considerable advantage and a source of income among Serbian Roma who practice bride price (Čvorović, 2014).

In contrast to other studies, where socioeconomic status and parental education were positively associated with early child development (Urke et al., 2018; Jeong et al., 2017; Paxson & Schady, 2010), in this setting there was an apparent lack of relationship of the child developmental score, and of the four domains of development (literacy–numeracy, physical, socioemotional and learning), with paternal investment and other variables.

In contrast to other studies, when a stepfather was introduced to the context, the presence of either father or stepfather had no influence on Roma children's outcomes (Case & Paxson, 2001). Instead, child's age and maternal characteristics explained both the height and overall development for Roma children. Thus, older children were taller and had higher developmental scores. Older children have higher reproductive value, and in poor populations, the later born children are often disadvantaged relative to earlier borns in nutritional status and growth, having higher morbidity and mortality (Lawson & Mace, 2009). In addition, maternal parity was negatively associated with child's height. Roma mothers with higher parity had children who were shorter than those of mothers with lower parity. Under poor conditions, numerous siblings may put children at higher risk of malnutrition, because of the discrepancy between family size and available resources. Maternal parity may also serve as a rough measure for investment: body size may be a proxy for a trade-off between offspring number and quality, or between the number and size of offspring, especially under resource-scarce conditions and in high-fertility settings such as with the Roma (Walker et al., 2008). Maternal parity was higher in stepfather households, but interestingly, unlike in other studies (Amato & Rivera, 1999; Lawson & Mace, 2009), maternal investment was higher for children living with stepfathers. One possible explanation for why children living with stepfathers experience higher maternal investment may be that a stepfather is providing some extra resources, thus the mother may be experiencing a higher quality of life in this new relationship, both of which enable the mother to better provide for her children, including direct care.
MICS does not include information on stepfathers’ resources, while maternal age, basic literacy skills and access to improved sanitation (as proxies for socioeconomic status) did not differ between mothers living with biological fathers vs. stepfathers. A more likely explanation is that Roma mothers in stepfather households may be compensating for the absence of a biological father by focusing more investment and attention on the children from former unions (Emmott & Mace, 2014). Higher investment and more attention could also serve as protection for any number of possible negative effects in the new home. This context could perhaps explain the lack of association of maternal and paternal characteristics and child development and maternal characteristics and child height: the poor conditions may have affected the ability of Roma parents to invest, and to make an investment substantial enough to be detected or to differentiate between children. Thus, mothers ‘get activated’ only in the presence of stepfathers (high risk), to protect their children and compensate for even the limited paternal investment. Roma children may be sensitive to this particular setting as well: child's age was associated with growth and development only in the stepfather's presence, thus younger and older siblings get to compete more in a stepfather's household, as there is an actual maternal investment to compete for.

Furthermore, maternal age was positively associated with both height and development: generally, older mothers tend to invest more in offspring, as they are less likely to have additional children and the investment focuses on the children they already have (Uggla & Mace, 2016). An additional explanation could be that this relationship reflects maternal status within a family. Many Roma women face inflexible gender roles and expectations, and for many, having children in marriage is the only socially endorsed route for an improvement in status (Čvorović & Coe, 2019). A higher maternal status within a Roma family may include more power in decision-making concerning child's wellbeing such as diet and activities.

Additionally, there was a positive association of child's age, maternal investment, and literacy with children's overall developmental score. As a child ages, it is more likely to develop and learn skills and be ahead in development. The importance of maternal care behaviours and education for children's early development has been well described: parental support for learning (such as stimulating interactions and reading books) was found to be an important means through which parental education is associated with children's development (Sun et al., 2016; Jeong et al., 2017; Walker et al., 2011). In turn, maternal education can facilitate maternal investment and practices, as increasing levels of education lead to different thinking and decision-making patterns (Cutler & Lleras-Muney, 2010). This may be especially important to the Roma, given the high illiteracy rate among females: even low levels of education increase children's well-being and survival prospects (Sandiford et al., 1997; Čvorović, 2020a).

To the best of my knowledge, this is the first study to provide an account of paternal direct care as a proxy for investment, stepfather presence and child development and growth among the low-resource Roma. The study findings contribute new evidence on the drivers, or lack thereof, of development and growth among children in marginalised ethnic populations, adding to the literature about paternal investment and child outcomes.

The majority of Roma children grow up in poverty, born to mothers with low education, and in homes with limited learning opportunities. In this context, parental investment was relatively low. Fathers have limited involvement in direct care of their young children and this involvement was not associated with child development. The presence of a father vs. a stepfather did not exert any influence on Roma children, insomuch as it did not have direct influence on the children's outcomes of growth and development. Roma paternal investment was low to begin with and father absence is likely to be less important in settings where fathers usually provide less support for their children (Lawson et al., 2017). In the presence of a stepfather, maternal and child's traits explained overall child development and growth. Maternal investment was higher for children living with stepfathers, thus mothers may be protecting their children from previous unions and compensating for paternal absence. Competition among Roma children – among younger and older siblings – was evident only when maternal investment was significant, in the presence of a stepfather. Thus, older children, born to literate, lower parity mothers of higher status and greater investment had better developmental and growth outcomes; girls were the preferred sex, as they were likely to be taller and less stunted than Roma boys, possibly owing to expected fitness benefits. Reverse causality emerged as the most likely pathway through which the cross-sectional, positive association of father direct care with child growth may manifest, such that Roma fathers tend to bias their investment towards taller, more endowed children, because of greater fitness pay-off.

The study had several limitations. The data were cross-sectional, limiting causal inferences between the variables under study. The developmental score and paternal investment were mother-reported and thus subject to biases: both measure how mothers perceived their child's development and their husbands’ involvement, and not actual child development and paternal behaviour. Furthermore, the reliability of the Early Child Development scale was fair but similar to other recent studies, reflecting its limited adaptation to local culture and context (Urke et al., 2018; McCoy et al., 2017). The questions regarding literacy/numeracy have been shown to be too advanced for 3- and 4-year-old children (McCoy et al., 2016), this being particularly pertinent as regards Roma and other disadvantaged children where parental literacy is limited.

Additionally, to date, no specific growth references have been developed for the Roma, even though their Indian origin suggests that ethnicity may affect the anthropometric measures. Although population-specific growth references may serve as a more biologically relevant measure of within-population assessment of children's growth (Kramer et al., 2016; Martin et al., 2019), the effects found in this study may be considerable, ranging from approximately 0.33 SD in child's height to a more than 1 SD difference in paternal care (Winking & Koster, 2015).

As child's growth and development are sensitive to available resources, and may be affected by aspects outside of direct family influence (Lawson et al., 2017; Winking & Koster, 2015), social assistance (cash transfers) may also have an effect on the Roma family, including the growth and development of Roma children. For instance, in affluent settings, fathers tend to engage more in direct child care if their wives are employed and/or contribute a greater share of the couple's earnings (Raley et al., 2012). Roma mothers’ receipt of welfare could motivate Roma fathers to engage in direct child care: the majority of Roma women do not work (formal income leads to withdrawal of social benefits), but still support the family with cash transfers. Nevertheless, a recent study found that among Serbian Roma, receiving social assistance was associated with disintegration and a diminished role of the family (Čvorović & Vojinović, 2020), but whether welfare influences father–child relationships and child outcomes remains unexplored. Also, other potential confounders, such as parental height and health status, were not collected. To fully understand the effects of paternal investment on child outcomes, information should be collected directly from fathers and father-like figures and/or through observation, and include parental anthropometrics, as well as data on the presence of alloparents, which may have an effect on child outcomes, including growth (Sear & Mace, 2008).

Low-intensity pleasure (familiar calm activities such as playing peek-a-boo) was the most influential variable in distinguishing boys from girls; girls came out higher on fear, falling reactivity, and low intensity pleasure, & boys higher on approach

Citation: Gartstein MA, Seamon DE, Mattera JA, Bosquet Enlow M, Wright RJ, Perez-Edgar K, et al. (2022) Using machine learning to understand age and gender classification based on infant temperament. PLoS ONE 17(4): e0266026. https://doi.org/10.1371/journal.pone.0266026

Abstract: Age and gender differences are prominent in the temperament literature, with the former particularly salient in infancy and the latter noted as early as the first year of life. This study represents a meta-analysis utilizing Infant Behavior Questionnaire-Revised (IBQ-R) data collected across multiple laboratories (N = 4438) to overcome limitations of smaller samples in elucidating links among temperament, age, and gender in early childhood. Algorithmic modeling techniques were leveraged to discern the extent to which the 14 IBQ-R subscale scores accurately classified participating children as boys (n = 2,298) and girls (n = 2,093), and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779). Additionally, simultaneous classification into age and gender categories was performed, providing an opportunity to consider the extent to which gender differences in temperament are informed by infant age. Results indicated that overall age group classification was more accurate than child gender models, suggesting that age-related changes are more salient than gender differences in early childhood with respect to temperament attributes. However, gender-based classification was superior in the oldest age group, suggesting temperament differences between boys and girls are accentuated with development. Fear emerged as the subscale contributing to accurate classifications most notably overall. This study leads infancy research and meta-analytic investigations more broadly in a new direction as a methodological demonstration, and also provides most optimal comparative data for the IBQ-R based on the largest and most representative dataset to date.

Discussion

We set out to leverage existing IBQ-R datasets from multiple laboratories (N = 4,438) to address an important gap in research by investigating age and gender classifications in early childhood, and overcoming limitations of the published studies such as small sample sizes that cannot be considered representative or provide widely generalizable results. Relying on algorithmic modeling techniques, 14 IBQ-R subscale scores served as features used to classify participating children as boys (n = 2,298) and girls (n = 2,093), and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779). Importantly, this approach allowed us to simultaneously classify infants into age and gender categories, providing an opportunity for the first time to consider the extent to which gender differences are informed by infant age. This study also makes an important contribution to the literature as a novel methodological demonstration. That is, the present application of machine learning algorithms provides a new direction for infancy and temperament research, as well as meta-analytic investigations more broadly.

Results based on accuracy indicators (the inverse of misclassification rates), Cohen’s kappa coefficients, and AUC (incorporating sensitivity and specificity parameters) demonstrated that temperament features provided superior classification of age groups relative to gender, which is consistent with the existing literature insofar as age effects have generally been more robust (e.g., not dependent on methodology; [5,26,52]). As noted, gender differences in infancy have been largely limited to activity level and fear/behavioral inhibition, with higher activity level and approach reported for boys [29,30] and greater fear/behavioral inhibition for girls [14,25,31,35,36]. These gender differences are somewhat controversial due to a lack of consensus regarding their origin (i.e., biologically based or largely a function of socialization; [53]) and questions regarding the role of parental expectations. That is, parents could rate boys and girls differently not due to actual variability in behavior but as a function of their own culturally influenced ideas about what is typical behavior in boys vs. girls. This explanation cannot be ruled out completely, although existing research suggests that gender differences are not entirely dependent on methodology (i.e., have been identified via behavioral observations along with parent report; [33,52]).
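As a rough illustration of two of the indicators named above, the sketch below computes accuracy (one minus the misclassification rate) and Cohen's kappa (agreement corrected for chance) from a pair of label lists. The labels and function names are hypothetical and are not taken from the study; AUC is left out because it needs predicted scores rather than hard labels.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy); p_e is the agreement
    expected if predictions were independent of the true labels, given
    the marginal label frequencies. Undefined when p_e equals 1.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for eight infants (illustration only).
y_true = ["girl", "girl", "boy", "boy", "girl", "boy", "boy", "girl"]
y_pred = ["girl", "boy", "boy", "boy", "girl", "girl", "boy", "girl"]

print(accuracy(y_true, y_pred))      # 6 of 8 correct
print(cohens_kappa(y_true, y_pred))  # chance agreement here is 0.5
```

With balanced classes, as here, a classifier that guesses at random would reach about 50% accuracy, which is why kappa is the more informative figure for a two-class problem such as gender.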

Importantly, results of gender classification by age group suggest this is most effective for the oldest age group, in line with the literature that indicates gender differences in temperament attributes become more pronounced with age [54]. Although a number of factors could be contributing to this pattern of results (accentuated gender differences in temperament with increasing age and, conversely, more accurate classification of gender with temperament features for the oldest participants), socialization is often described as critical among these. The primary mechanism invoked in such explanations involves the infants’ interactional history, and is consistent with literature that indicates mothers respond differently to their sons and daughters [55–59], presenting with different affordances as social interaction partners (e.g., [60]). Over time, such differences could result in divergent trajectories with respect to temperament due to differences in socialization goals/approaches for boys vs. girls. Specifically, parents may prioritize relationship orientation for daughters, but competence and autonomy for sons [61–63]. These and other socialization-related pathways may be responsible for the stronger temperament-based classification of boys and girls later in infancy observed herein.

At the same time, gender is viewed as a marker for a host of sex-linked distinctions in physiological processes. For example, prenatal exposure to high levels of androgen is predictive of later behavior problems, primarily of the externalizing type (e.g., ADHD; [64]), and used to explain early vulnerability observed in boys with respect to this set of problems [65]. Postpartum biological effects are also possible, for example via testosterone increases for boys in infancy, referred to as “mini-puberty,” peaking by the second month and returning to baseline at about 6 months [66]. Sex-linked differentiation in brain structures and functions occurs with maturation, resulting in greater discrepancies with age. For example, Goldstein et al. [67] reported that the amygdala tends to be larger in males and the hippocampus larger in females (see Hines [68] for a related review).

Follow-up analyses outlining feature importance for classification models were performed for the Ensembled Decision Trees (Random Forest) to further interpretation of the observed results. Random Forest methods provide an effective mechanism for feature selection and importance using tree-based mechanisms to rank node classification via the mean decrease in Gini impurity, i.e., the probability that a random sample in a particular tree node would be mislabeled using the distribution of the node sample, averaged across all trees [69]. Figures provided in Supplemental Materials (S1–S3 Figs) demonstrate that while Fear was the most important feature in distinguishing boys and girls for the youngest and mid-range age group, for oldest infants, low intensity pleasure was most influential. In fact, for youngest infants (S3 Fig), all three distress-related scales (Fear, Distress to Limitations, Sadness) were of primary importance in classifying infants accurately by gender via the Random Forest algorithm. Positive emotionality and regulatory dimensions of temperament (e.g., Falling Reactivity, Approach) begin to take on greater importance for mid-range and oldest infants. Notably, certain temperament features detracted from model accuracy in classifying infants by gender (i.e., associated with lowest negative importance values), particularly Cuddliness, Vocal Reactivity, and Smiling and Laughter in the youngest age group and Smiling and Laughter, Perceptual Sensitivity, and Activity in the oldest age group. These results identify the temperament attributes that did not differentiate boys and girls effectively, and it is of interest that the list of these poorly differentiating features varied by age.
When the most important features were considered for age classification and gender classification models only, Fear again emerged as the critical dimension, which is in line with the extensive literature documenting the developmental progression as well as gender differences for this domain of temperament [2,13,14,26,54].
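The Gini impurity criterion described above can be sketched in a few lines. This is a minimal illustration with hypothetical labels, not the study's code: it computes the impurity of one node and the decrease achieved by one candidate split. A Random Forest importance score would average such decreases over every split that uses a given feature, across all trees.

```python
from collections import Counter


def gini_impurity(labels):
    """Probability that a randomly drawn case in this node would be
    mislabeled if labels were assigned at random according to the
    node's own class distribution: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two
    child nodes produced by a split."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node of eight infants, split on a made-up threshold
# of one temperament subscale (illustration only).
parent = ["boy"] * 4 + ["girl"] * 4
left = ["boy", "boy", "boy", "girl"]      # cases below the threshold
right = ["boy", "girl", "girl", "girl"]   # cases at or above it

print(gini_impurity(parent))                   # maximally mixed two-class node
print(impurity_decrease(parent, left, right))  # gain from this split
```

A feature whose splits rarely reduce impurity ends up with an importance near zero, which is the sense in which some subscales "did not differentiate boys and girls effectively".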

This work is not without limitations, chief among these our reliance on a single method (i.e., parent report) in the assessment of infant temperament. Future studies should aggregate datasets providing different sources of information, including behavioral observations and physiological measures, such as cortisol reactivity, heart rate variability/respiratory sinus arrhythmia, and/or frontal alpha asymmetry ascertained via electroencephalogram (EEG) recordings. In addition, the outcomes examined in this study were limited to child gender and age. Future studies with older children should conduct classification analyses with additional dependent variables, particularly symptom and disorder classifications (e.g., clinical/subclinical/asymptomatic ADHD). It should be noted that we did not consider classification based on race/ethnicity because of a far more limited literature suggesting these differences can be discerned on the basis of temperament, and future research should examine related models, as relevant studies accumulate. Finally, the present modeling approach could be extended and potentially improved by applying ensembling modeling approaches (i.e., using multiple algorithms simultaneously), as opposed to relying on singular modeling frameworks.

This study underscores the importance of meta-analytic investigations and cross-laboratory collaborations, providing elusive answers to questions, such as those related to intersections of gender and age in temperament development, that have not been previously addressed. Because of the large cross-laboratory sample included herein, this study provides the most optimal comparative data for the IBQ-R (Table 2), which has emerged as a widely used infant temperament assessment tool. Importantly, the present investigation serves as a methodological illustration for application of machine learning techniques in infancy and temperament research, as well as developmental science more broadly. Given the propensity for differing algorithmic methods to have strengths and weaknesses that may bias predictive outcomes and classification accuracy, we selected 11 established algorithmic modeling and classification techniques to quantify the most robust outcomes, simultaneously demonstrating the viability of machine learning approaches in this area of scientific inquiry. Results of this study make an important contribution to developmental temperament research, demonstrating effective age group classification on the basis of fine-grained temperament features, and indicating more effective gender classification for the older age group, with multiple implications for future mechanistic research examining potential socialization and biological contributors.

Brain size and IQ are positively correlated; however, multiple meta-analyses have led to considerable differences in summary effect estimations, thus failing to provide a plausible effect estimate

Of differing methods, disputed estimates and discordant interpretations: the meta-analytical multiverse of brain volume and IQ associations. Jakob Pietschnig, Daniel Gerdesmann, Michael Zeiler and Martin Voracek. Royal Society Open Science, May 11 2022. https://doi.org/10.1098/rsos.211621

Abstract: Brain size and IQ are positively correlated. However, multiple meta-analyses have led to considerable differences in summary effect estimations, thus failing to provide a plausible effect estimate. Here we aim at resolving this issue by providing the largest meta-analysis and systematic review so far of the brain volume and IQ association (86 studies; 454 effect sizes from k = 194 independent samples; N = 26 000+) in three cognitive ability domains (full-scale, verbal, performance IQ). By means of competing meta-analytical approaches as well as combinatorial and specification curve analyses, we show that most reasonable estimates for the brain size and IQ link yield r-values in the mid-0.20s, with the most extreme specifications yielding rs of 0.10 and 0.37. Summary effects appeared to be somewhat inflated due to selective reporting, and cross-temporally decreasing effect sizes indicated a confounding decline effect, with three quarters of the summary effect estimations according to any reasonable specification not exceeding r = 0.26, thus contrasting effect sizes were observed in some prior related, but individual, meta-analytical specifications. Brain size and IQ associations yielded r = 0.24, with the strongest effects observed for more g-loaded tests and in healthy samples that generalize across participant sex and age bands.

4. Discussion

In this quantitative research synthesis, we show that positive associations of in vivo brain volume with IQ are highly reproducible. This link is consistently observable regardless of which empirical studies are included in a formal meta-analysis and how they are analysed. Results of our analyses convergently indicate that the effect strength must be assumed to be small-to-moderate in size, with the best available estimates for healthy participants in full-scale IQ ranging from r = 0.24 (uncorrected; approximately 6% explained variance) to 0.29 (corrected; approximately 8% explained variance). Effects for full-scale IQ appear to be stronger and more systematically related to moderators compared to verbal and performance IQ. However, these three intelligence domains are highly intercorrelated and their correlations with IQ test results are to be seen as manifestations of a largely similar true effect across domains. We, therefore, focus on full-scale IQ findings of healthy samples in our discussion, unless indicated otherwise.
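The "explained variance" figures quoted above follow from squaring the correlation. A minimal sketch of that arithmetic is given below; the function name `explained_variance_pct` is ours and only serves the illustration.

```python
def explained_variance_pct(r: float) -> float:
    """Share of variance explained by a correlation r, in percent (r squared times 100)."""
    return 100.0 * r * r


# The two full-scale IQ estimates for healthy samples quoted in the text.
estimates = {"uncorrected": 0.24, "corrected": 0.29}

for label, r in estimates.items():
    print(f"{label}: r = {r:.2f} -> {explained_variance_pct(r):.1f}% explained variance")
```

The two values come out at roughly 5.8% and 8.4%, which round to the 6% and 8% reported in the text.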

4.1. Comparisons with previous meta-analyses

The strengths of the observed summary effects in the present meta-analysis correspond closely to those identified by Pietschnig et al. [24], although the number of participants in this updated analysis is more than three times larger. The observed association for full-scale IQ in healthy samples (i.e. corresponding to selection criteria of the meta-analyses from [25] and [23]) resulted in an estimate of r = 0.24 (95% CI [0.22; 0.27]), thus indicating considerably lower associations than those reported by Gignac & Bates [25] and McDaniel [23]. Key characteristics of the available meta-analyses are summarized in table 5.

Table 5.

Characteristics of available meta-analyses on the in vivo brain volume and intelligence link. Note. k = number of independent samples in analysis; summary effect = best estimate according to authors of meta-analysis; when both Hedges & Olkin- as well as Hunter & Schmidt-typed analyses were performed, both estimates are provided, respectively.

It could be argued that these inconsistencies are to a certain extent due to the differing methodological focus of the used analyses because both meta-analyses of Gignac & Bates [25] and McDaniel [23] reported values that were corrected for direct range restriction. However, when we respecified our analyses to apply identical methods, full-scale IQ associations for healthy samples once more led to a lower estimate, yielding r = 0.29. This indicates that the reported estimates of prior Hunter & Schmidt-based syntheses were inflated (i.e. even before accounting for dissemination bias).

This idea is supported by our analyses of individual data subsets that used the very same specifications as these prior studies. For instance, Gignac & Bates [25] showed that IQ assessments with higher g-ness (i.e. reflecting abilities that are more closely related to psychometric g, thus providing a better representation of cognitive abilities) yielded larger associations than less g-loaded assessments. They concluded that the most salient estimate of the brain volume and IQ association averages r = 0.40 (i.e. corresponding to about 16% of explained variance), based on a specific subset of effect sizes that should provide the most credible results (i.e. using healthy samples, tests with excellent g-ness and attenuation-corrected effect sizes only).

None of the reasonable specifications that were included in our specification curve analysis yielded a summary effect that was larger than r = 0.37. Importantly, this most extreme upper value of all possible specifications was based on the very same inclusion criteria as the specification that is supposed to represent the best operationalization of this association according to Gignac & Bates [25] (healthy samples, excellent g-ness, range departure corrected, Hunter & Schmidt estimator), excepting sample age (this uppermost value was based on children/adolescents only; the same specification with all ages yielded r = 0.34, corresponding to 11% of explained variance). This is important for a number of reasons.

First, it shows that the specification that was chosen by Gignac & Bates [25] leads to estimates in the extreme upper tail of the distribution of reasonable summary effects. Besides yielding uncharacteristically large values, these estimates have large confidence intervals (i.e. representing higher effect volatility), because they are based on comparatively small sample numbers. Results from our combinatorial meta-analyses showed that at least 75% (i.e. the bottom three quartiles) of results yielded values below r = 0.26.

Second, these findings suggest that the estimate reported in Gignac & Bates [25] must be considered to have been inflated, even if one were to assume that this extreme specification yields the most salient estimate for the brain volume and IQ association (i.e. the summary effect in [25] exceeds the upper threshold of any estimate of the present summary effect distribution). Third, the lower summary effects in the present analyses compared to the earlier estimate of Gignac & Bates [25], when identical specifications were used, indicate that the studies that were added in the present update of the literature reported lower correlations, thus conforming to a decline effect [21,22].
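The combinatorial idea referred to above (computing a summary effect for many subsets of studies and inspecting the resulting distribution) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the `(r, n)` pairs are hypothetical, and pooling uses a simple fixed-effect Fisher z average with weights n - 3, whereas the paper compares several estimators.

```python
import math
from itertools import combinations
from statistics import quantiles


def pooled_r(studies):
    """Fixed-effect pooled correlation: average Fisher z values with weights n - 3."""
    num = sum((n - 3) * math.atanh(r) for r, n in studies)
    den = sum(n - 3 for _, n in studies)
    return math.tanh(num / den)


# Hypothetical (r, n) pairs, invented for illustration only.
studies = [(0.45, 20), (0.40, 30), (0.35, 40), (0.30, 60),
           (0.26, 100), (0.24, 200), (0.23, 400)]

# One summary effect for every subset containing at least two studies.
summaries = [
    pooled_r(subset)
    for k in range(2, len(studies) + 1)
    for subset in combinations(studies, k)
]

# Quartile cut points of the summary-effect distribution.
q1, q2, q3 = quantiles(summaries, n=4)
print(f"{len(summaries)} subsets; quartiles: {q1:.3f}, {q2:.3f}, {q3:.3f}")
```

Reading off the third quartile of such a distribution is the kind of statement made in the text, where at least 75% of the combinatorial results fell below r = 0.26.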

Consistent with this interpretation, publication years of primary studies predicted brain volume and IQ associations negatively, indicating decreasing effect sizes over time. Cross-temporally declining effect sizes have been demonstrated to be prevalent in psychological science in general and intelligence research in particular, especially when initial study sample sizes are small [22]. This means that early and small n (=imprecise) primary study reports represent more often than not overestimates of the brain size and IQ association, thus having led to inflated meta-analytic summary effects. The presently observed effect declines and comparatively large effect estimates of early small-n studies (e.g. [5]) are consistent with the decline effect and its assumed drivers.

4.2. Moderators

It is unsurprising that effects were typically stronger in healthy than in patient samples because the included patients suffered from different conditions that are likely to impair cognitive functioning (e.g. autism, brain traumas, schizophrenia) which is bound to introduce statistical noise into the data. Therefore, effects of moderators were substantially weaker and less unequivocal for patients than for healthy samples.

Consistent with Gignac & Bates [25], there were stronger associations with highly g-loaded tests compared to fairly g-loaded ones in healthy participants (uncorrected rs = 0.31 versus 0.19; Q(2) = 23.69; p < 0.001), but not in patient samples. These results were supported by the findings from our regression analyses where larger g-ness positively predicted effect sizes of healthy participants.

Within any examined subgroup, correlations that had been reported within publications were numerically larger than those that had been obtained through personal communications or from the grey literature. This suggests that correlations were selectively reported in the published literature although only differences in full-scale IQ associations of healthy samples reached nominal significance. This observation is consistent with effect inflation because larger associations are more likely than smaller ones to be numerically reported in the literature (numerically stronger effects are more likely to become significant—depending on sample sizes and accuracy—and therefore more likely to be published), thus potentially leading to inadequate assumptions of the readers about the effect strength. This finding is supported by results from our regression analyses that showed weaker effects of unpublished than published effect sizes. This suggests that the reported effects in the brain size and intelligence literature are more often inflated than not, thus conforming to results from Pietschnig et al. [24].

In a similar vein, publication years were negatively related to effect sizes, thus indicating a confounding decline effect [21] and conforming to cross-temporally decreasing effect sizes as reported in an earlier meta-analysis [24].

The only further moderator with consistent directions in terms of the observed association appeared to be measurement type which consistently yielded larger estimates for intracranial than for total brain volume, although these differences did not reach nominal significance (except for verbal IQ in patient samples). There were no consistent patterns in regard to age or sex in subgroup or regression analyses, thus conforming to a previous account that indicated that brain volume and IQ associations generalize over participant age bands and sex ([24]; but see [23], for conflicting findings).

4.3. Dissemination bias

Three of our formal methods for detecting dissemination bias yielded significant bias indications for both full-scale and performance IQ (Sterne & Egger's regression, Trim-and-Fill analysis, Copas & Shi's method), while only one method (Trim-and-Fill analysis) indicated bias in verbal IQ. The evidence for bias was stronger for full-scale than performance IQ. It should be noted that both Sterne and Egger's regression, as well as the Trim-and-Fill analysis, are funnel plot asymmetry-based methods and consequently particularly sensitive for the detection of small-sample effects. This means that the detected bias seems to be rooted in the correspondingly large error variance of underpowered (i.e. small sample size) studies and is consistent with previously raised concerns about suboptimal power in neuroscientific research [201]. Viewed from this perspective, declining effect sizes over time appear to be somewhat reconciliatory, because this may well mean that average study power has increased in this field (or at least in studies addressing this research question).
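To make the notion of funnel plot asymmetry concrete, the sketch below fits a classic Egger-type regression: each study's standardized effect (Fisher z divided by its standard error) is regressed on its precision (one over the standard error), and an intercept far from zero points to small-study effects. The `(r, n)` pairs are hypothetical, the standard error uses the usual Fisher z approximation 1/sqrt(n - 3), and no significance test of the intercept is included, so this should be read as an illustration rather than as the procedure used in the paper.

```python
import math


def egger_regression(studies):
    """Return (intercept, slope) of an Egger-type regression.

    studies: iterable of (r, n) pairs, r a correlation and n the sample size.
    Regresses z / se on 1 / se by ordinary least squares, where z = atanh(r)
    and se = 1 / sqrt(n - 3).
    """
    xs, ys = [], []
    for r, n in studies:
        se = 1.0 / math.sqrt(n - 3)
        xs.append(1.0 / se)            # precision
        ys.append(math.atanh(r) / se)  # standardized effect
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data in which small studies report larger correlations.
biased = [(0.45, 20), (0.40, 30), (0.35, 40), (0.30, 60),
          (0.26, 100), (0.24, 200), (0.23, 400)]
intercept, slope = egger_regression(biased)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

With these invented numbers the intercept is clearly positive, reflecting the larger effects in the smaller samples. If every study reported the same correlation, the intercept would be zero and the slope would equal the common Fisher z value.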

The low observed replicability indices for all three domains further corroborate the evidence for effect inflation. Similarly, results of our effect estimations by means of p-value-based methods support the evidence for confounding dissemination bias, as previously observed in regard to this research question [24]. This interpretation is consistent with larger effects from published sources than from those that were obtained from the grey literature or personal communications, although these differences only reached nominal significance in meta-regressions, but not subgroup analyses.

The present findings contrast the conclusions of Gignac & Bates [25] who did not identify bias evidence in their analysis. This discrepancy may be due to two different causes.

On the one hand, Gignac & Bates [25] included unpublished results in the publication bias detection analyses (i.e. results that [24] had obtained from the grey literature or through personal communications with authors), which (i) prevent potential bias from being detected and (ii) are conceptually unsuitable to be used in p-curve and p-uniform analyses [50,51]. On the other hand, different methods of dissemination bias detection are not equally sensitive for different forms of bias, thus necessitating a triangulation of methods for bias estimation according to current recommendations [42]. Relying on comparatively few and conceptually similar detection methods (i.e. publication bias tests of two p-value-based methods; p-curve and p-uniform; Henmi-Copas approach) may have contributed to the non-detection of bias evidence in this past meta-analysis [25], particularly because these methods are not suitable to detect small-sample effects.

Although the present findings indicate a presence of confounding publication bias, this should not be interpreted as evidence against a brain volume and IQ link. As pointed out above, these associations appear to generalize across numerous potential moderators and replicate well in terms of the identified direction. However, confounding dissemination bias suggests that the obtained summary effects in many primary studies (and even some meta-analyses) represent inflated estimates of the true association. That said, it needs to be acknowledged that the future development of more reliable methods for assessing IQ on the one hand or in vivo brain volume on the other may lead to larger correlation estimates in primary studies. Nonetheless, the strength of the brain volume and IQ association must be considered to be small-to-medium-sized at best.

4.4. Significance of the observed effect

On the one hand, the strength of the observed summary effect suggests that effects of mere neuron numbers, glial cells, or brain reserve are unlikely candidates for the explanation of between-individuals intelligence differences. On the other hand, the effect is clearly non-trivial and has turned out to be remarkably reproducible in terms of its positive direction across a large number of primary studies. Consequently, brain volume should not be seen as a supervenient (i.e. one-to-one) but rather an isomorphic (i.e. many-to-one) proxy of human intelligence. This may mean that brain volume in its own right is too coarse of a measure to reliably predict intelligence differences. It seems likely that examining the role of functional aspects (e.g. white matter integrity) and more fine-grained structural elements (e.g. cortical thickness; see [2]) may help in further clarifying the neurobiological bases of human intelligence.