Monday, September 2, 2019

A way to detect false positives in experimentation: the Advanced Meta-Experimental Protocol

False-Positive Effect in the Radin Double-Slit Experiment on Observer Consciousness as Determined With the Advanced Meta-Experimental Protocol. Jan Walleczek and Nikolaus von Stillfried. Front. Psychol., August 22 2019. https://doi.org/10.3389/fpsyg.2019.01891

Abstract: Prior work by Radin et al. (2012, 2016) reported the astonishing claim that an anomalous effect on double-slit (DS) light-interference intensity had been measured as a function of quantum-based observer consciousness. Given the radical implications, could there exist an alternative explanation, other than an anomalous consciousness effect, such as artifacts including systematic methodological error (SME)? To address this question, a conceptual replication study involving 10,000 test trials was commissioned to be performed blindly by the same investigator who had reported the original results. The commissioned study performed confirmatory and strictly predictive tests with the advanced meta-experimental protocol (AMP), including with systematic negative controls and the concept of the sham-experiment, i.e., counterfactual meta-experimentation. Whereas the replication study was unable to confirm the original results, the AMP was able to identify an unacceptably low true-negative detection rate with the sham-experiment in the absence of test subjects. The false-positive detection rate reached 50%, whereby the false-positive effect, which would be indistinguishable from the predicted true-positive effect, was significant at p = 0.021 (σ = −2.02; N = 1,250 test trials). The false-positive effect size was about 0.01%, which is within an order of magnitude of the claimed consciousness effect (0.001%; Radin et al., 2016). The false-positive effect, which indicates the presence of significant SME in the Radin DS-experiment, suggests that skepticism should replace optimism concerning the radical claim that an anomalous quantum consciousness effect has been observed in a controlled laboratory setting.
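
As a quick plausibility check on the statistics quoted in the abstract, the following minimal Python sketch converts the reported value of −2.02 into a p-value. It assumes, which the abstract does not state explicitly, that σ denotes a standard-normal z-score; the function name `normal_tail_p` is ours and purely illustrative.

```python
import math


def normal_tail_p(z: float, two_tailed: bool = False) -> float:
    """Tail probability of a z-score under the standard normal distribution.

    Assumption: the reported sigma value is a standard-normal test statistic.
    """
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return 2.0 * one_tail if two_tailed else one_tail


z_reported = -2.02  # value quoted in the abstract
print(f"one-tailed p = {normal_tail_p(z_reported):.4f}")
print(f"two-tailed p = {normal_tail_p(z_reported, two_tailed=True):.4f}")
```

Under these assumptions the one-tailed value comes out near 0.022, close to the reported p = 0.021 (the small difference is consistent with rounding of σ), whereas the two-tailed value is twice as large. This suggests, but does not prove, that the reported test was directional.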

Introduction

Breakthroughs in science often depend on breakthroughs in scientific methodology. A scientific breakthrough might depend, for example, on a superior skill to detect the effect of an external test stimulus upon a laboratory system. The development of a measurement technique capable of detecting potentially ultra-weak effects – defined here as effects in the range of 0.1–0.001% and below – often represents a daunting technological challenge. In particular, in the exploration of unconventional scientific possibilities, such as in the search for anomalous mind-matter interactions related to unproven phenomena such as “micro-psychokinesis” (e.g., Maier et al., 2018), there could be a risk of compromising the reliability of a standard test method if one seeks to push the detection limits of the method past the limits as adopted in standard applications. Therefore, when choosing to do so, careful testing and verification of (1) the stability of the method as well as of (2) the specificity of the employed detection technology for the tested intervention should routinely accompany the pursuit of an ultra-weak-effects research program.

In recent years, the widely discussed Radin double-slit (DS) experiment has claimed scientific evidence for anomalous mind-matter interactions under controlled laboratory conditions (e.g., Radin et al., 2012). Specifically, the claim was reported that test subjects may interact “psycho-physically” with laser-light waves interfering in a DS-apparatus (for details, see Section “Insertion of the AMP Into the Radin DS-Experiment”). Briefly, in the Radin DS-experiment, test subjects follow precisely timed, computer-assisted instructions which serve “to direct their attention toward the double-slit apparatus or to withdraw their attention and relax” (Radin et al., 2012). This experiment suggests a remarkable technological skill which enables – apparently – the detection of minuscule, observer-dependent reductions in light-interference intensity. The effect size in percent due to attentional observer consciousness affecting light intensity – as detected with a photo-imaging device – was reported to be about 0.001% (Radin et al., 2016).

Despite the extremely small effect size, the researchers have reported that the original effect (Radin et al., 2012) appears to be reproducible even across different studies – at least as part of conceptual replication attempts (Radin et al., 2013, 2015, 2016). Nevertheless, given (1) the radical implications of the claim that an anomalous consciousness effect has been detected in a controlled laboratory setting, and (2) the fact that the anomalous effect is ultra-weak, at least by the above definition (≈0.1–0.001%), it seems reasonable to explore the following question: Could there exist an alternative explanation, other than observer consciousness, for the reported effect, such as a statistical artifact or systematic measurement bias? In other words, is there any chance that the astonishing claim based on the Radin DS-experiment has come about as a result of type-1 error, i.e., due to the misidentification of a false-positive for a true-positive effect?

A cautionary tale regarding ultra-weak-effects detection is the so-called “faster-than-light neutrino anomaly” (The OPERA collaboration et al., 2011). The neutrino anomaly was found to be reproducible over several years, but it was shown eventually to be caused by systematic measurement bias. The claimed effect size of the anomalous neutrino effect was on the order of 0.001% (about 2.5 parts in 100,000) and the effect had achieved a high degree of statistical significance, i.e., of about six sigma. “Despite the large significance,” the researchers had warned in 2011, “of the measurement reported here and the stability of the analysis, the potentially great impact of the result motivates the continuation of our studies in order to investigate possible still unknown systematic effects that could explain the observed anomaly.” After careful, additional testing of the employed research design, a small hidden bias in the experimental set-up was finally identified, and the anomalous neutrino effect was revealed to be a false-positive effect. The identification of an alternative explanation, other than faster-than-light neutrinos, namely, a type-1 detection error, prompted the immediate retraction of the prior positive reports on the anomalous neutrino effect (The OPERA collaboration et al., 2013).

Radin and co-workers, by contrast, have presumed unlikely the possibility of a false-positive effect as an explanation of their results, and they have concluded that a genuine, i.e., true-positive, observer-consciousness effect was detected with high statistical significance (Radin et al., 2012, 2013, 2015, 2016). Naturally, if the psycho-physical influence of the intentional consciousness of a test subject on a quantum-physical process could be proven scientifically, no matter how weak this effect might be, then the implications for our view of reality, in general, and for our understanding of the foundations of quantum mechanics, in particular, would be revolutionary.

Quantum mechanics is well known to invite the possibility of many different foundational interpretations. A type of wave-function-collapse interpretation was offered as a possible explanation for the reported anomalous effect in the Radin DS-experiment (see Radin et al., 2012), whereby the particular interpretation assigns a special role to human consciousness, hence the term also of “quantum consciousness,” as part of the quantum-measurement process (e.g., von Neumann, 1932). More than 40 years ago, Hall et al. (1977) tested in the laboratory the proposal that “the reduction of the wave packet is a physical event which occurs only when there is an interaction between the physical measuring apparatus and the psyche of some observers”; however, these experiments found no evidence for any influence of the consciousness of a test subject on the targeted quantum-based process (Hall et al., 1977).

To this day, there exists no accepted scientific proof for the intentional, controlling activity of observer consciousness over quantum states or electromagnetic waves. Therefore, again, scientific claims to the contrary, as have been promoted by Radin and collaborators (Radin et al., 2012, 2013, 2015, 2016), should be viewed with reasonable caution. For example, in the case of the Radin DS-experiment, the claimed effect is derived indirectly by calculating the combined differences between experimental and control conditions from many thousands of individual signal recordings as collected over weeks and months. In that case, the employed methodology could easily be prone to measurement bias, e.g., as a function of hidden sensitivities of the method to as-yet unknown factors or interactions, i.e., to ultra-weak influences other than those possibly manifested by observer consciousness. In particular, lacking experimental confirmation of the specificity of the detection method for the applied test intervention, i.e., for intentional observer consciousness, an investigator could easily reach false-positive conclusions.

Therefore, given the high stakes, it seems prudent to perform stringent tests for evaluating the stability over time as well as the degree of specificity of the measurement technology for detecting the intentional consciousness of a test subject in the Radin DS-experiment. For example, the specificity of the employed detection technology can be assessed quantitatively by determining the true-negative detection rate with the so-called sham-experiment (see Section “Sham-Experiment: Counterfactual Meta-Experimentation”). Naturally, if alternative explanations, i.e., systematic methodological error (SME) including statistical errors and experimental bias, could be eliminated (for details, see also Section “In Search of an Explanation for False-Positive Observer Effect Detection”), then the Radin DS-experiment might indeed represent a major advance toward scientific evidence for the psycho-physical influence of quantum-based observer consciousness upon a laboratory device.
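
To make the idea of a true-negative detection rate concrete, here is a minimal Python sketch. It assumes, as a simplification that may differ from the actual AMP procedure, that each sham session yields one p-value and that a session counts as a false positive when that p-value falls below a pre-specified threshold `alpha`. The function name and the example p-values are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def sham_detection_rates(p_values: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Return (false_positive_rate, true_negative_rate) for sham sessions.

    Every sham session is a true negative by design (no test subject is
    present), so any session flagged as significant is a false positive.
    """
    if not p_values:
        raise ValueError("need at least one sham session")
    false_positives = sum(1 for p in p_values if p < alpha)
    fp_rate = false_positives / len(p_values)
    return fp_rate, 1.0 - fp_rate


# Hypothetical p-values, invented purely for illustration (not study data).
example_p_values = [0.62, 0.03, 0.48, 0.91, 0.27, 0.74, 0.11, 0.55]
fp_rate, tn_rate = sham_detection_rates(example_p_values)
print(f"false-positive rate = {fp_rate:.3f}, true-negative rate = {tn_rate:.3f}")
```

For a well-balanced method, the false-positive rate in sham sessions should stay close to `alpha`; a rate well above it would point to systematic methodological error rather than to the tested intervention.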

For an explanation of what is meant by SME in the context of a concrete physical device, such as a DS-interference apparatus, the example of a biased or unbalanced roulette wheel is revealing. That is, the methodological challenges that are encountered in research involving ultra-weak-effects detection, including in the Radin DS-experiment, are similar to those faced by operators of roulette tables in a casino. The spinning wheel must be near perfectly balanced on the table in order to assure that mostly unbiased, i.e., near random, outcomes are obtained with each spin that is associated with placing a bet. That is, none of the eight octants of the wheel should indicate any higher probability than the others for being hit by the ball. However, there will invariably be a practical, operational limit in that regard for any concrete physical system such as the roulette wheel; as a result, there will always be a dominant octant, even if this can be revealed to the careful observer only after a large number of spins. In principle, a player could discover an imbalance in the system, e.g., an imbalance due to a one- to two-degree tilt of the wheel toward one side, and then could exploit the imbalance to place bets on the preferred octant of the wheel. As a consequence, the probability of winning will grow ever so slightly above chance, and winning would be guaranteed in the long term. In fact, cases are known in which players have earned money by exploiting this loophole, i.e., the discovery of systematic and uncontrolled imbalances, and hence systematic bias, of casino roulette wheels (e.g., https://www.roulettephysics.com). In the context of scientific measurement design, this loophole will be referred to as the SME-loophole.

The present article describes the use of an advanced research protocol which is capable of controlling for possible detrimental effects of the SME-loophole in the Radin DS-experiment. The closing of this loophole is of particular concern in ultra-weak-effects studies for which there is no good intuition about either the size or the probability of a systematic imbalance or measurement bias as part of some experimental design. It is essential in such studies to verify empirically that the amount of SME is well below the level that might impede the reliable detection of the targeted effect. For quantifying the actual amount of SME, which might be intrinsic to the Radin DS-experiment, the advanced meta-experimental protocol (AMP; Walleczek, in preparation) was implemented in this conceptual replication attempt which was commissioned by one of the funders of the original Radin DS-experiment (Radin et al., 2012; see Section “Materials and Methods” for details).

For explanation, in the roulette-wheel paradigm, the SME could be quantified by recording hundreds, or more, of individual games on a given roulette wheel. Data could be collected until there is an amount sufficient to calculate a statistically significant difference between any one of the octants and the other seven octants. The more balanced and unbiased is the spinning wheel, the smaller will be the SME. The same is relevant for scientific measurement paradigms also: the more balanced and unbiased is a particular research design, the smaller will be the SME, as confirmed by a low false-positive detection rate; consequently, the higher will be the effective specificity of the employed detection method. Similar to the above strategy for detecting an imbalance in the roulette-wheel paradigm, the here employed AMP-based strategy can detect measurement imbalances or biases in the experimental system under investigation.
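
The roulette-wheel strategy described above can be sketched in a few lines of Python. The sketch substitutes a Pearson chi-square goodness-of-fit test across all eight octants for the one-octant-versus-the-rest comparison mentioned in the text, and it simulates the wheel rather than using recorded games. The 2% bias, the function names, and the spin counts are assumptions chosen for illustration only.

```python
import random
from typing import List, Sequence

# Chi-square critical value for 7 degrees of freedom at alpha = 0.05.
CHI2_CRIT_DF7_05 = 14.067


def spin_wheel(n_spins: int, octant_weights: Sequence[float], seed: int = 0) -> List[int]:
    """Simulate n_spins on a wheel with 8 octants; return hits per octant."""
    rng = random.Random(seed)
    hits = rng.choices(range(8), weights=octant_weights, k=n_spins)
    return [hits.count(i) for i in range(8)]


def chi_square_uniform(counts: Sequence[int]) -> float:
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical wheel: octant 0 is favored by a 2% relative excess.
biased_weights = [1.02] + [1.0] * 7

for n_spins in (1_000, 100_000, 1_000_000):
    stat = chi_square_uniform(spin_wheel(n_spins, biased_weights))
    verdict = "bias detected" if stat > CHI2_CRIT_DF7_05 else "no bias detected"
    print(n_spins, round(stat, 2), verdict)
```

The point of the loop is that a small imbalance tends to stay hidden at low spin counts and to surface only once enough data have accumulated, which mirrors the concern about hidden bias in measurements that pool thousands of recordings.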

In summary, upon insertion of the AMP into the Radin DS-experiment, it was possible to determine the amount of SME – as revealed by the determination of the true-negative rate of detection – constraining the effective specificity of the employed measurement technology. The present analysis will conclude that the specificity of the method for detecting the potential effect of observer consciousness in the Radin DS-experiment is likely to be below that required for the reliable, i.e., artifact-free, detection of a putative effect on the order of 0.001% (Radin et al., 2016). It is questionable, therefore, at least until further stringent, pre-specified, AMP-based tests have been conducted, whether the previously claimed, anomalous effect could be a reliable indicator of a genuine, i.e., true-positive, observer-consciousness effect in the Radin DS-experiment. Next will be described the experimental methodology and the confirmatory AMP-based protocol which was implemented in this commissioned replication study of the Radin DS-experiment.
