

As a minimum, evidence should be reported in a manner that allows external verification of its veracity.

This allows the reader to judge how appropriate the research conclusions are. One of the cornerstones of science that aims to support this external verification is the peer review process. This review of research work by experts is designed to filter out poor-quality and unreliable research findings. Peer review has its limitations (Jefferson et al. 2002; Ware 2008) and may not have succeeded, in many scientific fields, in ensuring the quality of published research, because a large number of published research findings may be false (Ioannidis 2005).

Publication bias means that the vast majority of published findings are positive and support the research hypothesis, and therefore do not provide a representative sample of all scientific studies carried out (Sterling et al.). This problem is particularly prevalent for human factors research in lighting, because psychological and behavioral science has the highest proportion of studies reporting positive results of any scientific discipline (Fanelli 2010). Publication bias may help explain the reproducibility crisis currently affecting many sciences, particularly psychological and behavioral science. The Open Science Collaboration (2015) recently attempted replications of 100 studies published in three major psychology journals in 2008. Ninety-seven percent of the original studies reported significant findings, compared with only 36% of the replication studies. Mean effect sizes in the replications were also half the magnitude of those found in the original studies.

At the heart of publication bias and the reproducibility crisis is the occurrence of type I errors (false-positive findings) and type II errors (false-negative findings). We use statistical methods in science in an attempt to avoid making claims that in reality may be a type I or type II error. Null hypothesis statistical testing (Hubbard and Ryan 2000) produces a P-value that represents the probability of obtaining the result (or something more extreme) assuming that there is no real effect or difference between the groups or measures being tested (the “null” hypothesis). The P-value does not explicitly refer to the probability of the null hypothesis being true, but it does provide a “measure of the strength of evidence against H₀ (the null hypothesis)” (Dorey 2010). Abelson (1995) referred to “discrediting the null hypothesis” (p. 10) based on the P-value from a statistical test. A smaller P-value provides stronger evidence against the null hypothesis. By convention, in the field of lighting research and most other scientific disciplines, we use a threshold of P < 0.05 to indicate a significant or “real” effect, based on proposals by Fisher (1925).
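The effect-size inflation that the Open Science Collaboration observed can be illustrated with a short simulation. The sketch below is a minimal illustration, not a reanalysis of any cited study: the true effect size, group size, and study count are arbitrary assumed values, and numpy and scipy are assumed dependencies. It runs many simulated two-group studies of a modest true effect and compares effect-size estimates from all studies against those from the subset crossing the conventional P < 0.05 threshold.

```python
# Illustrative simulation: when only "significant" studies are reported,
# published effect sizes overestimate the true effect.
# Assumed, arbitrary parameters: true standardized effect d = 0.3,
# n = 20 per group, 10 000 simulated studies, alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n, alpha, n_studies = 0.3, 20, 0.05, 10_000

all_d, published_d = [], []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n)     # null group, SD = 1
    treated = rng.normal(true_d, 1.0, n)  # shifted by the true effect
    _, p = stats.ttest_ind(treated, control)
    pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
    d_hat = (treated.mean() - control.mean()) / pooled_sd  # Cohen's d estimate
    all_d.append(d_hat)
    if p < alpha:  # only significant studies "appear in print"
        published_d.append(d_hat)

print(f"true effect:                {true_d:.2f}")
print(f"mean d across all studies:  {np.mean(all_d):.2f}")        # close to 0.30
print(f"mean d, significant only:   {np.mean(published_d):.2f}")  # noticeably larger
print(f"proportion significant:     {len(published_d) / n_studies:.0%}")
```

Because an underpowered study must observe an unusually large effect to reach significance, the “published” subset systematically overestimates the true effect, consistent with replications recovering only a fraction of the original effect sizes.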


However, Fisher himself recognized that this threshold was arbitrary, and debate about its use is ongoing.
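One reason the debate persists is that any fixed threshold trades type I errors against type II errors. The sketch below again uses assumed, arbitrary parameters (two groups of n = 20, a true effect of d = 0.5 in the “effect present” condition) to estimate both error rates at three candidate thresholds.

```python
# Estimating type I and type II error rates at different alpha thresholds.
# Assumed, arbitrary design: n = 20 per group, 5 000 simulated studies
# per condition, true effect d = 0.5 when an effect is present.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, trials = 20, 5_000

def simulated_p_values(true_d):
    """P-values from repeated two-sample t-tests at a given true effect."""
    ps = []
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_d, 1.0, n)
        ps.append(stats.ttest_ind(b, a).pvalue)
    return np.array(ps)

p_null = simulated_p_values(0.0)    # H0 true: a "significant" result is a type I error
p_effect = simulated_p_values(0.5)  # H0 false: a "non-significant" result is a type II error

for alpha in (0.10, 0.05, 0.01):
    type_i = (p_null < alpha).mean()
    type_ii = (p_effect >= alpha).mean()
    print(f"alpha = {alpha:.2f}: type I ~ {type_i:.2f}, type II ~ {type_ii:.2f}")
```

Lowering the threshold reduces false positives but, at a fixed sample size, raises the rate of missed real effects; no single value of alpha resolves that trade-off.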
