Is Infidelity Predictable? Using Explainable Machine Learning to Identify the Most Important Predictors of Infidelity. Laura M. Vowels, Matthew J. Vowels & Kristen P. Mark. The Journal of Sex Research, Aug 25 2021. https://doi.org/10.1080/00224499.2021.1967846
Abstract: Infidelity can be a disruptive event in a romantic relationship with a devastating impact on both partners’ well-being. Thus, there are benefits to identifying factors that can explain or predict infidelity, but prior research has not utilized methods that would provide the relative importance of each predictor. We used a machine learning algorithm, random forest (a type of interpretable highly non-linear decision tree), to predict in-person and online infidelity across two studies (one individual and one dyadic, N = 1,295). We also used a game theoretic explanation technique, Shapley values, which allowed us to estimate the effect size of each predictor variable on infidelity. The present study showed that infidelity was somewhat predictable overall and interpersonal factors such as relationship satisfaction, love, desire, and relationship length were the most predictive of online and in-person infidelity. The results suggest that addressing relationship difficulties early in the relationship may help prevent infidelity.
Discussion
Infidelity is relatively common, with up to half of those in relationships having engaged in infidelity (Mark & Haus, 2019; Mark et al., 2011; Thompson & O’Sullivan, 2016). It can have devastating consequences for relationships, causing distress (Thompson & O’Sullivan, 2016) and often divorce (Amato & Previti, 2004). Infidelity is likely to affect not only the couple members but also their children, extended family, and friends. It is important to identify potential risk factors for infidelity to target interventions that could prevent infidelity from occurring in the first place. The purpose of the present study was to identify potential factors associated with infidelity and to quantify and compare different factors to better understand which variables are the most strongly associated with infidelity.
A large body of literature has attempted to identify which factors contribute to infidelity. However, the studies have relied exclusively on linear models, which are often completely uninterpretable due to problems such as incorrect specification of the underlying causal structure, multicollinearity, unattainable parametric assumptions, and inability to examine complex associations (Breiman, 2001a; Lundberg et al., 2020; Yarkoni & Westfall, 2017). The present study is the first of its kind to examine predictors of infidelity using interpretable predictive models: random forests (Breiman, 2001b) with Shapley values (Lundberg et al., 2017, 2019). Based on our findings, the short answer to the question posed in the title, “is infidelity predictable?,” is “somewhat.” The effect sizes, which take into account the true and false positives and negatives of both classes, ranged from small (.18) to large (.49) across analyses and samples. This suggests that even though we were able to predict infidelity generally well above chance level, there are also other factors that we had not accounted for.
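As an illustration of an effect size that uses all four cells of the confusion matrix, the sketch below computes the Matthews correlation coefficient. The text above does not name the statistic that was reported, so the choice of this coefficient is an assumption made for illustration, and the counts are hypothetical rather than taken from the study.

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """Correlation between predicted and observed binary labels.

    Uses true/false positives and negatives of both classes, so it is
    not inflated when one class (e.g., no infidelity) dominates.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts (not from the study).
tp, fp, fn, tn = 60, 40, 30, 170
mcc = matthews_corrcoef(tp, fp, fn, tn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"accuracy={accuracy:.2f}  mcc={mcc:.2f}")
```

With imbalanced classes, accuracy can look high even when the minority class is predicted poorly, which is why a coefficient of this kind can be more informative than raw accuracy.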
The Comparison of Predictors of Infidelity
While we examined the predictive accuracy of our models, our main aim was to compare a range of different factors in their ability to predict infidelity. A recent systematic review found that while demographics and individual characteristics are inconsistently associated with infidelity, relationship variables tend to be more consistent across studies (Haseli et al., 2019). We also found that relationship characteristics (relationship satisfaction, relationship length, dyadic desire, sexual satisfaction, romantic love, and some sexual activities within the relationship) were consistently in the top-10 most important predictors across different samples. These findings suggest that addressing relationship issues may reduce the likelihood of one partner going outside the relationship to seek fulfillment. However, it is also important to note that while individuals who were more satisfied in their relationship were generally less likely to engage in infidelity, a subsample of highly satisfied individuals had engaged in infidelity in the past. This may reflect either the idea that infidelity also occurs in happy relationships (Perel, 2017) or the possibility that couples had worked through the infidelity and were satisfied in their relationship by the time they responded to the survey (Olson et al., 2002).
Furthermore, because online infidelity has become more commonplace given the technological advances in recent years (Albright, 2008), we also examined predictors of online infidelity. Interestingly, one of the strongest predictors of a decreased likelihood of having engaged in infidelity online was never having had anal sex in the present relationship. This may reflect more restrictive attitudes toward sexuality overall. Indeed, attitudes toward sexuality were measured in Study 1 and ranked among the top-10 predictors of online infidelity. However, the relationship was more complex, with the most liberal sexual attitudes predicting an increase in likelihood of having engaged in infidelity whereas more moderate and conservative attitudes predicted a decrease. These results are in line with other studies that have found that more permissive sexual attitudes are associated with an increased likelihood of having engaged in infidelity (Fincham & May, 2017; Haseli et al., 2019; Martins et al., 2016). Longer relationship length and higher sexual desire also increased the likelihood of having engaged in online infidelity. However, sexual and relationship satisfaction were only among the top predictors in one of the two samples.
The results of the present study corroborate many of the existing studies, and akin to a recent systematic review (Haseli et al., 2019), show that the most robust predictors of infidelity lie within the relationship: individuals who are more satisfied and in love in their relationship are less likely to have engaged in infidelity. There are also a number of factors that have previously been associated with infidelity that were not among the most important predictors in the present study: education (Atkins et al., 2001; Martins et al., 2016; Treas & Giesen, 2000), relationship status (Amato & Previti, 2004; Fincham & May, 2017), and attachment (Fincham & May, 2017; Haseli et al., 2019; McDaniel et al., 2017). We only examined attachment in Study 1 and higher attachment avoidance did predict an increased likelihood of having engaged in infidelity in the total sample but was not among the top-10 predictors for men or women. Attachment anxiety was not predictive of past infidelity. Furthermore, many previous studies suggest that men are more likely to engage in sexual infidelity than women (Labrecque & Whisman, 2017; Petersen & Hyde, 2010). In the present study, being a man was only an important predictor of past online infidelity in one sample, supporting studies that have found that the gender gap in infidelity is decreasing (Allen et al., 2006; Fincham & May, 2017; Mark et al., 2011; Treas & Giesen, 2000).
There were also some inconsistencies in the findings across the two samples. In Study 1, hormonal contraceptives decreased the likelihood of men having engaged in online infidelity whereas in Study 2 the use of hormonal contraceptives increased the likelihood of both men and women having engaged in online infidelity. The use of hormonal contraceptives does not prevent sexually transmitted infections and therefore increases the likelihood of passing any potential infections from the infidelity partner to the primary partner. This may deter people from engaging in infidelity face-to-face and lead them to seek alternative partners online instead. It is not clear why hormonal contraceptives increased the likelihood of engaging in infidelity in one sample and decreased it in another, and the role of contraceptives in infidelity warrants further investigation. Furthermore, because each individual predictor explained very little variance in the outcome, interpreting each individual variable becomes more difficult. When the signal is stronger (i.e., a variable predicts a larger amount of variance) the prediction also becomes more accurate.
Implications for Theory and Future Research
The present study examined predictors of infidelity from the ecological theory perspective (Bronfenbrenner, 1994). Specifically, we tested the ECSD model from a recent systematic review that suggested that both partners’ individual as well as couple’s factors predict infidelity. We found little evidence to suggest that partner variables predicted actor’s engagement in infidelity. In fact, in some analyses the predictive accuracy of the models decreased as a result of including partner variables in the models, suggesting that adding partner factors in the models may add noise that makes it more difficult for the model to make accurate predictions. Additionally, the present study suggested that relationship-related variables contributed the most to the prediction. However, it is important to caveat these findings in that we were essentially predicting infidelity in the past from the present variables. Therefore, it is possible that couples in which infidelity had occurred had worked through the infidelity and were now happier in their relationship than before.
In addition to relational variables, variables that tapped into people’s attitudes were also predictive of both in-person and online infidelity. Overall, having less permissive attitudes toward sexuality suggested a decreased likelihood of having engaged in infidelity. Individuals with highly liberal attitudes were the most likely to have engaged in infidelity in the past. Certain sexual behaviors such as the use of sex toys, anal sex, and masturbation with a partner may also have acted as a proxy for attitudes in the present study. Indeed, previous studies have suggested that sexual attitudes and behaviors go hand in hand (Lefkowitz et al., 2014). The results of the present study suggested that individuals who had not engaged in traditionally more permissive sexual behaviors such as using sex toys or having anal sex were also less likely to have engaged in infidelity. Most other individual variables were not consistently among the top-10 predictors of infidelity, which may explain why the results from previous studies (Haseli et al., 2019; Mark & Haus, 2019) have been inconsistent, especially when examining socio-demographic variables.
Finally, the purpose of the present study was to examine a range of variables in their ability to predict infidelity. Overall, each variable alone predicted little variance in infidelity. Therefore, the results do not suggest that any single variable, or a small set of variables, is highly predictive of infidelity. Instead, a large number of variables together resulted in the algorithm’s overall ability to predict infidelity with a moderate to large effect size. Relationship variables together explained the largest amount of variance in the predictions. Relationship variables, however, are more likely to vary over time compared to certain individual characteristics (such as socio-demographic variables or attachment style). The prediction accuracy may have increased if the infidelity and relationship quality had been measured closer in time. Therefore, future research is needed to examine recent infidelity to more fully understand how relationship characteristics relate to infidelity. Additionally, because each variable contributed little to the overall prediction accuracy, using machine learning models with a large number of variables, instead of focusing on single variables, may be more fruitful for predicting who has engaged or will engage in infidelity. This does not help target specific factors but may be used to identify individuals or relationships who may be at a higher risk.
Strengths and Limitations
The present study adds to our understanding of the most important predictors for infidelity across two samples. We used a powerful interpretable machine learning technique that allowed us to produce reliable estimates of the effect sizes of each variable both for the mean effect as well as the spread of the individual effects (Lundberg et al., 2017, 2019). Using this method, we were also able to compare a large number of predictors simultaneously and estimate any non-linear associations and complex interactions. We also examined both in-person and online infidelity.
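To make the idea of a per-variable effect with a mean and a spread concrete, the sketch below computes exact Shapley values by enumerating every coalition of features for a toy scoring function. It is a brute-force illustration under stated assumptions, not the authors' implementation: the function `predict`, its coefficients, the three feature names, and the data rows are all hypothetical, and "absent" features are simply set to a background (column-mean) value.

```python
from itertools import combinations
from math import factorial
from statistics import mean, pstdev

def shapley_values(predict, x, background):
    """Exact Shapley values for one row x (feasible only for few features)."""
    n = len(x)

    def value(coalition):
        # Features outside the coalition are replaced by the background value.
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical score: satisfaction lowers it; desire and length interact.
def predict(z):
    satisfaction, desire, length = z
    return 0.4 - 0.3 * satisfaction + 0.2 * desire * length

names = ["satisfaction", "desire", "length"]  # illustrative labels only
rows = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.7], [0.5, 0.5, 0.5], [0.1, 0.9, 0.9]]
background = [mean(col) for col in zip(*rows)]

all_phi = [shapley_values(predict, r, background) for r in rows]
mean_abs = [mean([abs(p[j]) for p in all_phi]) for j in range(len(names))]
spread = [pstdev([p[j] for p in all_phi]) for j in range(len(names))]
for name, m, s in zip(names, mean_abs, spread):
    print(f"{name:12s} mean|phi|={m:.3f}  sd={s:.3f}")
```

For each row, the values sum to the difference between that row's score and the score at the background point, which is what allows them to be read as additive contributions of each variable. Exhaustive enumeration grows exponentially with the number of features, so it is practical only for very small examples like this one.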
However, the study also had several limitations that should be considered. First, we used a single-item measure of in-person and online infidelity. We were thus unable to account for specific infidelity behaviors and did not examine emotional infidelity. Future research is needed to examine a wider range of infidelity behaviors to better understand whether the same predictors generalize across multiple forms of infidelity or whether these are predicted by different variables. The results from the present study suggest that these may be somewhat different given that the most important predictors of in-person and online infidelity also varied. Second, while we examined infidelity across two large samples with one sample including data from both members of the couple, the studies were all cross-sectional and it is not clear how recently the infidelity occurred. Therefore, some of the factors may have changed from when the infidelity occurred to when the participants completed the survey. This is a difficulty across most other studies on infidelity, but future research should examine infidelity over time or conduct surveys on individuals who have just engaged in infidelity. Third, over 30% of the participants in Study 1 reported past infidelity. However, the number of participants who had engaged in infidelity in the dyadic sample was much lower. This made it more difficult for the algorithm to accurately predict infidelity, which is reflected in lower precision and recall for the infidelity class compared to the no-infidelity class. We used balanced random forests to mitigate this issue, but we still had less data available on people with past infidelity.
Furthermore, each variable contributed very little to the overall classification accuracy. Therefore, interpretation of the results may be less accurate than when individual variables have a clearer signal. Additionally, while random forests are a powerful tool that will take advantage of any correlations and interactions in the data, no matter how non-linear, they cannot be used to estimate causality. However, in the absence of a means to reliably estimate causality when examining factors relating to infidelity (after all we cannot create experiments in which we make people engage in infidelity), we believe that using a predictive model is perhaps the best option. Finally, we chose to use a random forest algorithm because of a moderate sample size. Random forests have been shown to perform well with their default settings without the need for hyperparameter tuning (Probst et al., 2019). Tuning hyperparameters requires a separate training set which would make the sample size in the test data smaller. However, there may be other algorithms that would perform better or similarly with hyperparameter tuning. Therefore, future research in larger samples could compare the performance of different algorithms.
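The class-imbalance issue described above can be sketched in a few lines. The example assumes one common formulation of a balanced random forest, in which each tree is trained on a bootstrap of the minority class plus an equally sized draw (with replacement) from the majority class; the text does not specify which variant was used. The helper names and the 15/85 label split are hypothetical.

```python
import random

def balanced_bootstrap(labels, rng):
    """Row indices for one tree: equal draws from each class, with replacement."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    k = len(minority)
    return ([rng.choice(minority) for _ in range(k)]
            + [rng.choice(majority) for _ in range(k)])

def precision_recall(y_true, y_pred, positive):
    """Precision and recall when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

rng = random.Random(0)
labels = [1] * 15 + [0] * 85          # hypothetical: 15% report past infidelity
sample = balanced_bootstrap(labels, rng)
n_pos = sum(labels[i] for i in sample)
print(len(sample), n_pos)             # each tree sees the two classes equally often
```

Balancing the per-tree samples keeps the trees from defaulting to the majority class, but it does not add information about the minority class, so precision and recall for that class can still be lower. Reporting `precision_recall` separately with `positive=1` and `positive=0` makes that gap visible.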