Friday, February 12, 2021

Laboratory earthquake forecasting: A machine learning competition

Laboratory earthquake forecasting: A machine learning competition. Paul A. Johnson et al. Proceedings of the National Academy of Sciences, February 2, 2021, 118 (5), e2011362118.

Abstract: Earthquake prediction, the long-sought holy grail of earthquake science, continues to confound Earth scientists. Could we make advances by crowdsourcing, drawing from the vast knowledge and creativity of the machine learning (ML) community? We used Google’s ML competition platform, Kaggle, to engage the worldwide ML community with a competition to develop and improve data analysis approaches on a forecasting problem that uses laboratory earthquake data. The competitors were tasked with predicting the time remaining before the next earthquake of successive laboratory quake events, based on only a small portion of the laboratory seismic data. The more than 4,500 participating teams created and shared more than 400 computer programs in openly accessible notebooks. Complementing the now well-known features of seismic data that map to fault criticality in the laboratory, the winning teams employed unexpected strategies based on rescaling failure times as a fraction of the seismic cycle and comparing input distribution of training and testing data. In addition to yielding scientific insights into fault processes in the laboratory and their relation with the evolution of the statistical properties of the associated seismic data, the competition serves as a pedagogical tool for teaching ML in geophysics. The approach may provide a model for other competitions in geosciences or other domains of study to help engage the ML community on problems of significance.

Keywords: machine learning competition, laboratory earthquakes, earthquake prediction, physics of faulting

What Did We Learn from the Kaggle Competition?

Previous work on seismic data from Earth (3) suggests that the underlying physics may scale from a laboratory fault to large fault systems in Earth. If this is indeed the case, improvements in our ability to predict earthquakes in the laboratory could lead to significant progress in time-dependent earthquake hazard characterization. The ultimate goal of the earthquake prediction challenge was to identify promising ML approaches for seismic data analysis that may enable improved estimates of fault failure in the Earth. In the following, we will discuss shortcomings of the competition but also key innovations that improved laboratory quake predictions and may be transposed to Earth studies.

The approaches employed by the winning teams included several innovations considerably different from our initial work on laboratory quake prediction (1). Team Zoo added synthetic noise to the input seismic data before feature computing and model training, thus making their models more robust to noise and more likely to generalize.
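The noise-augmentation step can be sketched in a few lines. This is a minimal illustration, not Team Zoo's actual code; the Gaussian form of the noise and the amplitude parameter `noise_frac` are assumptions.

```python
import numpy as np

def augment_with_noise(signal, noise_frac=0.05, rng=None):
    """Add zero-mean Gaussian noise scaled to a fraction of the
    signal's standard deviation before feature computation."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.normal(0.0, noise_frac * signal.std(), size=signal.shape)
    return signal + noise

# A synthetic stand-in for a laboratory seismic trace
trace = np.sin(np.linspace(0.0, 20.0, 1000))
noisy = augment_with_noise(trace)
```

Training on several noisy copies of each data window exposes the model to input variation it will meet in held-out data, which is the generalization effect described above.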

Team Zoo, JunKoda, and GloryorDeath only considered features that exhibited similar distributions between the training and testing data, thereby ensuring that nonstationary features could not be used in the learning phase and, again, improving model generalization. We note that employing the distribution of the testing-set input is a form of data snooping that effectively turned the test set into a validation set. However, the idea of employing only features whose distributions do not evolve over time is insightful and could be used for scientific purposes, for example by comparing feature distributions between portions of the training data.
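One way to implement such a distribution-stability filter is a two-sample Kolmogorov–Smirnov test per feature column, keeping only columns with no detectable train/test shift. This is a sketch of the general idea, not the teams' actual criteria (which the text does not specify); the threshold `alpha` is an assumption.

```python
import numpy as np
from scipy.stats import ks_2samp

def stable_feature_columns(train, test, alpha=0.05):
    """Return indices of feature columns whose train and test
    distributions are indistinguishable under a two-sample KS test."""
    keep = []
    for j in range(train.shape[1]):
        _, p = ks_2samp(train[:, j], test[:, j])
        if p > alpha:  # no detectable distribution shift
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
# Column 0 is stationary; column 1 drifts between train and test
train = np.column_stack([rng.normal(0, 1, 500), np.linspace(0, 5, 500)])
test = np.column_stack([rng.normal(0, 1, 500), np.linspace(5, 10, 500)])
kept = stable_feature_columns(train, test)  # the drifting column is excluded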

Perhaps most interestingly from a physical standpoint, the fifth team, Team Reza, changed the target to be predicted and endeavored to predict the seismic cycle fraction remaining instead of time remaining before failure. Because they did not employ the approach of comparing input distribution between training and testing sets as done by the first, second, and fourth teams, the performance impact from the prediction of normalized time to failure (seismic cycle fraction) was significant.
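Constructing Team Reza's alternative target amounts to dividing time-to-failure by the duration of its own seismic cycle, so that every cycle, long or short, maps onto the same [1, 0] scale. A minimal sketch, assuming the training labels form a time-to-failure series that resets upward at each laboratory quake:

```python
import numpy as np

def cycle_fraction(ttf):
    """Rescale a time-to-failure series into the fraction of the
    seismic cycle remaining. A new cycle starts wherever ttf jumps
    upward; within each cycle, divide by that cycle's duration."""
    ttf = np.asarray(ttf, dtype=float)
    starts = np.r_[0, np.where(np.diff(ttf) > 0)[0] + 1]
    frac = np.empty_like(ttf)
    for i, s in enumerate(starts):
        e = starts[i + 1] if i + 1 < len(starts) else len(ttf)
        frac[s:e] = ttf[s:e] / ttf[s]  # duration = ttf at cycle start
    return frac

# Two cycles of different durations map onto the same [1, 0] scale
ttf = np.array([4., 3., 2., 1., 0., 8., 6., 4., 2., 0.])
frac = cycle_fraction(ttf)
```

The normalization removes cycle-to-cycle variability in duration from the target, which is presumably why it helped even without the train/test distribution comparison used by the other teams.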

As in any statistical problem, more data are in general better and can improve model performance. Thus, had the competitors been given more training data, scores may in principle have improved. At the same time, there is an element of nonstationarity in the experiment because the fault gouge layer thins as the experiment progresses; therefore, even an extremely large dataset would not lead to a perfect prediction. In addition, Kaggle splits the test set into public and private portions precisely so as not to reward overfitting: no matter how large the dataset, a model that iterates on it enough times will not translate well into “the real world,” and the competition structure was designed to prevent that.

It is worth noting that the choice of ML metric should be carefully considered. In Earth, it will be important to predict the next quake accurately as it approaches, but MAE treats every time step equally with respect to absolute error, which makes this challenging.
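One way around this limitation is to weight the absolute error by proximity to failure, so that mispredictions near the quake cost more. The metric below is a hypothetical alternative (not the competition's metric); the inverse-time weighting form and the `eps` smoothing constant are illustrative choices.

```python
import numpy as np

def weighted_mae(y_true, y_pred, eps=0.5):
    """MAE with weights that grow as failure approaches (small y_true),
    so late-cycle errors count more than early-cycle ones."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = 1.0 / (y_true + eps)  # larger weight near failure (y_true -> 0)
    return np.sum(w * np.abs(y_true - y_pred)) / np.sum(w)

y_true = np.array([4.0, 2.0, 0.5])
late_err = weighted_mae(y_true, np.array([4.0, 2.0, 1.5]))   # 1 s error near failure
early_err = weighted_mae(y_true, np.array([5.0, 2.0, 0.5]))  # 1 s error early in cycle
```

Under plain MAE the two prediction vectors above score identically; the weighted version penalizes the late-cycle error more heavily.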

Individuals participate on the Kaggle platform for many reasons; the most common are the opportunity to work on interesting and challenging projects in many different domains, to learn and practice ML and data science skills, to interact with others seeking the same, and, of course, cash prizes. The astounding intellectual diversity the Kaggle platform attracted for this competition, with team representations from cartoon publishers, insurance agents, and hotel managers, is especially notable. In fact, none of the competition winners came from geophysics. Teams exhibited collective interaction, as evidenced by the step changes in the MAE through time (Fig. 6), likely precipitated by communication through the discussion board and shared code.

The competition contributed to an accelerating increase in ML applications in the geosciences, has become an introductory problem for the geoscience community to learn different ML approaches, and is used for ML classes in geoscience departments. Students and researchers have used the top five approaches to compare the nuances of competing ML methods, as well as to try to adapt and improve the approaches for other applications.

Cats show no avoidance of people who behave negatively to their owner, unlike dogs

Chijiiwa, H., Takagi, S., Arahori, M., Anderson, J. R., Fujita, K., & Kuroshima, H. (2021). Cats (Felis catus) show no avoidance of people who behave negatively to their owner. Animal Behavior and Cognition, 8(1), 23-35.

Rolf Degen's take: Unlike dogs, cats show no avoidance of people who behave negatively toward their owner

Abstract: Humans evaluate others based on interactions between third parties, even when those interactions are of no direct relevance to the observer. Such social evaluation is not limited to humans. We previously showed that dogs avoided a person who behaved negatively to their owner (Chijiiwa et al., 2015). Here, we explored whether domestic cats, another common companion animal, similarly evaluate humans based on third-party interactions. We used the same procedure that we used with dogs: cats watched as their owner first tried unsuccessfully to open a transparent container to take out an object, and then requested help from a person sitting nearby. In the Helper condition, this second person (helper) helped the owner to open the container, whereas in the Non-Helper condition the actor refused to help, turning away instead. A third, passive (neutral) person sat on the other side of the owner in both conditions. After the interaction, the actor and the neutral person each offered a piece of food to the cat, and we recorded which person the cat took food from. Cats completed four trials and showed neither a preference for the helper nor avoidance of the non-helper. We consider that cats might not possess the same social evaluation abilities as dogs, at least in this situation, because unlike the latter, they have not been selected to cooperate with humans. However, further work on cats’ social evaluation capacities needs to consider ecological validity, notably with regard to the species’ sociality.

Keywords: Cats, Social evaluation, Third-party interaction, Social cognition, Cat-human relationship, Domesticated animals

The stability of psychological adjustment among donor-conceived offspring in the U.S. National Longitudinal Lesbian Family Study from childhood to adulthood: Good adjustment in the long term

The stability of psychological adjustment among donor-conceived offspring in the U.S. National Longitudinal Lesbian Family Study from childhood to adulthood: differences by donor type. Nicola Carone et al. Fertility and Sterility, February 2, 2021.

Rolf Degen's take: Having been conceived by an anonymous sperm donor did not interfere with identity development in the children of lesbian parents


Objective: To study differences by sperm donor type in the psychological adjustment of the U.S. National Longitudinal Lesbian Family Study (NLLFS) offspring across three time periods from childhood to adulthood.

Design: U.S.-based prospective cohort study.

Setting: Paper-and-pencil questionnaires and protected online surveys.

Patients: A cohort of 74 offspring conceived by lesbian parents using an anonymous (n = 26), a known (n = 26), or an open-identity (n = 22) sperm donor. Data were reported when offspring were ages 10 (wave 4), 17 (wave 5), and 25 (wave 6).

Main Outcome Measure: Achenbach Child Behavior Checklist administered to lesbian parents when offspring were ages 10 and 17 and the Achenbach Adult Self-Report administered to offspring at age 25.

Results: In both relative and absolute stability, no differences were found in internalizing, externalizing, and total problem behaviors by donor type over 15 years. However, both externalizing and total problem behaviors significantly declined from age 10 to 17 and then increased from age 17 to 25. Irrespective of donor type, among the 74 offspring, the large majority scored continuously within the normal range on internalizing (n = 62, 83.8%), externalizing (n = 62, 83.8%), and total problem behaviors (n = 60, 81.1%).

Conclusions: The results reassure prospective lesbian parents and provide policy makers and reproductive medicine practitioners with empirical evidence that psychological adjustment in offspring raised by lesbian parents is unrelated to donor type in the long term.

Keywords: Sperm donation, anonymity, open-identity, psychological adjustment, lesbian parents

From 2019... Simple hair-like feathers served as insulating pelage, but the first feathers with complex branching structures and a planar form evolved for the purpose of sexual display

From 2019... Feather evolution exemplifies sexually selected bridges across the adaptive landscape. W. Scott Persons and Philip J. Currie. Evolution, July 19, 2019.

h/t David Schmitt @PsychoSchmitt

Abstract: Over the last two decades, paleontologists have pieced together the early evolutionary history of feathers. Simple hair-like feathers served as insulating pelage, but the first feathers with complex branching structures and a planar form evolved for the purpose of sexual display. The evolution of these complex display feathers was essential to the later evolution of flight. Feathers illustrate how sexual selection can generate complex novel phenotypes, which are then available for natural selection to modify and direct toward novel functions. In the longstanding metaphor of the adaptive landscape, sexual selection is a means by which lineages resting on one adaptive peak may gradually bridge a gap to another peak, without the landscape itself being first altered by environmental changes.

Individuals with depression express more distorted thinking on social media

Individuals with depression express more distorted thinking on social media. Krishna C. Bathina, Marijn ten Thij, Lorenzo Lorenzo-Luaces, Lauren A. Rutter & Johan Bollen. Nature Human Behaviour, February 11 2021.

Abstract: Depression is a leading cause of disability worldwide, but is often underdiagnosed and undertreated. Cognitive behavioural therapy holds that individuals with depression exhibit distorted modes of thinking, that is, cognitive distortions, that can negatively affect their emotions and motivation. Here, we show that the language of individuals with a self-reported diagnosis of depression on social media is characterized by higher levels of distorted thinking compared with a random sample. This effect is specific to the distorted nature of the expression and cannot be explained by the presence of specific topics, sentiment or first-person pronouns. This study identifies online language patterns that are indicative of depression-related distorted thinking. We caution that any future applications of this research should carefully consider ethical and data privacy issues.


In a sample of online individuals, we used a theory-driven approach to measure the prevalence of linguistic markers that may indicate cognitive vulnerability to depression, according to CBT theory. We defined a set of CDS that we grouped along 12 widely accepted types of distorted thinking and compared their prevalence between two cohorts of Twitter users—the first included individuals who reported that they received a clinical diagnosis of depression and the second was a similar random sample.

As hypothesized, the individuals in the D cohort use significantly more CDS in their online language compared with individuals in the R cohort, particularly schemata associated with ‘personalizing’ and ‘emotional reasoning’. We observed significantly increased levels of CDS across nearly all cognitive distortion types, sometimes more than twice as much, but did not find a statistically significant increase in prevalence among the D cohort for two specific types, namely ‘fortune-telling’ and ‘catastrophizing’. This may be due to the difficulty of capturing these specific cognitive distortions in the form of a set of 1–5-grams: their expression in language can involve an interactive process of conversation and interpretation. Notably, our findings are not explained by the use of FPPs or more negatively loaded language. These results shed light on the degree to which the depression-related language of cognitive distortions is manifested in the colloquial language of social media platforms. This is of social relevance given that these platforms are specifically designed to propagate information through the social ties that connect individuals on a global scale.
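The per-cohort prevalence comparison underlying these results can be sketched as follows. The study used a curated, expert-approved lexicon of CDS 1–5-grams grouped into the 12 distortion types; the tiny lexicon and example tweets below are hypothetical stand-ins, and matching on raw substrings is a simplification of n-gram tokenization.

```python
# Hypothetical mini-lexicon; the study used a much larger curated set of 1-5-grams
CDS_NGRAMS = {"always", "never", "i am a", "literally the worst"}

def cds_prevalence(tweets, lexicon=CDS_NGRAMS):
    """Fraction of tweets containing at least one CDS n-gram
    (case-insensitive substring match, for illustration)."""
    hits = sum(1 for t in tweets if any(p in t.lower() for p in lexicon))
    return hits / len(tweets)

d_cohort = ["I always mess this up", "that was literally the worst"]
r_cohort = ["nice weather today", "watching the game tonight"]
d_prev, r_prev = cds_prevalence(d_cohort), cds_prevalence(r_cohort)
```

The paper's prevalence ratios (PR) compare such per-cohort rates; in real data the random cohort's rate is nonzero, since, as discussed below, CDS phrases also occur in ordinary English.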

An advantage of studying theory-driven differences between the language of individuals with and without depression, in contrast to a purely data-driven or machine learning approach, is that we can explicitly use the principles underpinning CBT to understand the cognitive and lexical components that may shape depression. Cognitive behavioural therapists have developed a set of strategies to challenge the distorted thinking patterns that are characteristic of depression. Preliminary findings suggest that specific language can be related to specific therapeutic practices and seems to be related to outcomes48. However, these practices have been largely shaped by a clinical understanding and not necessarily informed by objective measures of how patterns of language reflect cognitive distortions, which could be harnessed to facilitate the path of recovery.

Our results suggest a path for mitigation and intervention, including applications that engage individuals with mood disorders, such as major depressive disorder, through social media platforms and that challenge particular expressions and types of depression-related language. Future characterization of the relationship between depression-related language and mood may help in the development of automated interventions (such as ‘chatbots’) or suggest promising targets for psychotherapy. Another approach that has shown promise in leveraging social media for the treatment of mental health problems involves crowdsourcing the responses to cognitively distorted content49. These types of applications have the potential to be more-scalable mental health interventions compared with existing approaches such as face-to-face psychotherapy50. The extent to which user CDS prevalence can be used as a passive index of vulnerability to depression that may be expected to change with treatment could also be explored. Insofar as online language can be considered to be an index of cognitive vulnerability to depression, a better understanding of online language may help to tailor treatments, especially internet-based treatments, to the more-specific needs of individuals. For example, interventions that target depression-related thinking and language may be well-suited for individuals with depression who express relatively higher levels of these distortions, whereas interventions that target other mechanisms (such as physical activity, circadian rhythm) may be better suited for individuals who do not show relatively higher levels of CDS. More research towards understanding differences in language patterns in depression and related disorders, such as anxiety disorders, is recommended. However, when implementing these types of approaches, ethical considerations and privacy issues have to be adequately addressed38,39.

Several limitations of our theory-driven approach should be considered. First, we relied on individuals reporting their personal clinical depression diagnoses on social media. Although we verified that the statement indeed pertains to a clinical diagnosis, we have no verification of the diagnosis itself nor of its accuracy. This may introduce individuals into the D cohort who were not diagnosed with depression, or not diagnosed accurately. Vice versa, we have no verification that individuals in our random sample do not suffer from depression. However, the potential inaccuracy of this inclusion criterion will probably reduce the difference in depression rates between the two cohorts and, therefore, reduce the observed effect sizes (PR values between cohorts) due to the larger heterogeneity of our sample. As a consequence, our results are probably not an artefact of the accuracy of our inclusion criterion. Second, our approach is limited to discovering only individuals who are willing to disclose their diagnosis on social media. As this might skew our D cohort to a subgroup of individuals suffering from depression, we recommend caution when generalizing our findings to all individuals who have depression. Third, our lexicon of CDS was composed and approved by a panel of ten experts who may have been only partially successful in capturing all of the n-grams used to express distorted ways of thinking. On a related note, the use of CDS n-grams implies that we measure distorted thinking by proxy, namely through language, and our observations may therefore be affected by linguistic and cultural factors. Common idiosyncratic or idiomatic expressions may syntactically represent a distorted form of thinking, but no longer do so in practice. For example, an expression such as ‘literally the worst’ may be commonly used to express dismay, without the speaker necessarily experiencing a distorted mode of thinking.
Thus, the presence of a CDS does not point to a cognitive distortion per se. Fourth, both cohorts were sampled from Twitter, one of the leading social media platforms, the use of which may be associated with higher levels of psychopathology and reduced well-being51,52,53. We may therefore be observing increased or biased rates of distorted thinking in both cohorts as a result of platform effects. However, we report relative prevalence numbers with respect to a carefully constructed random sample also taken from Twitter, which probably compensates for this effect, as well as for the possibility that individuals with depression are more active than their random counterparts. Furthermore, recent analysis indicates that representative samples with respect to psychological phenomena can be obtained from social media content54. This is an important discussion in computational social science that will continue to be investigated. Data-driven approaches that analyse natural language in real time will continue to complement theory-driven work such as ours.

As we analysed individuals on the basis of inferred health-related information, we want to stress some additional considerations regarding ethical research practices and data privacy30,38,39. We limited our investigation strictly to comparing, in the aggregate, the publicly shared language of two deidentified cohorts of individuals (individuals who report that they have been diagnosed with depression and a random sample). We carefully deidentified all obtained data to protect user privacy and performed our analysis under the constraints of two IRB protocols (IU IRB Protocols 2010371843 and 1707249405). Whereas the outcomes of our analysis could contribute to a better understanding of depression as a mental health disorder, they could also inform approaches that detect traces of mental health issues in the online language of individuals, and as such contribute to future detection, diagnostics and intervention efforts. This may raise important ethical and user privacy concerns as well as risk of harm, including but not limited to the right to privacy, data ownership and transparency. For example, even though social media data are technically public, individuals do not necessarily realize nor consent to particular retrospective analyses when they share information on their public accounts55, nor can they consent to how these data may be leveraged in future approaches that may involve individualized interactions and interventions. Considering existing evidence that individuals are more willing to share biomedical data than social media data56, in future research we hope to reach a larger sample of individuals who understand public data availability and to increase transparency through a carefully managed consent process. We acknowledge that these considerations are part of an active and ongoing discussion in our community that we encourage and hope our research may contribute to.

We emphasize that not all use of CDS n-grams reflects depressive thinking, as these phrases are part of normal English usage, and it would therefore be wrong to try to diagnose depression merely on the basis of use of one or more such phrases. Such an approach would, as well as being inaccurate, potentially lead to harm in terms of stigmatizing individuals.