- Research is difficult
- Inadequate education
- Organisational effects
- Financial interest
- Political conflicts

Individual shortcomings related to research difficulties and inadequate education clearly play important roles in producing false research findings, but these mistakes are often easy to describe and explain in the review process, not least statistical mistakes and misconceptions. The side effects of how research activities are organised and performed are a more difficult issue to address in the review process, but they have been discussed for several years. For example, it is not hard to understand that a “publish or perish” culture affects authors’ propensity to underestimate the uncertainty of their findings and to exaggerate their importance. The publication of sensational findings can also be a way for the editorial office to improve a scientific journal’s impact factor (and profit), and for the authors’ employers to increase the perceived status of their organisation. The peer-review system, based on a strange mixture of collaboration and competition, is an important counterweight.

The consequences of financial conflicts of interest have also been discussed for many years, and several measures to reduce the problems have already been developed. For example, the publication requirement of having registered a trial in a public trial register before the first patient’s randomisation counteracts the selective reporting of trials with outcomes that are financially beneficial for the sponsor. However, other trends are more worrying. While the ambition of scientific researchers has traditionally been to protect scientific integrity and to remain politically neutral, the boundary between political activism and scientific research is becoming increasingly difficult to identify. It is not always easy to tell whether an author is searching for the truth or believes that he or she already knows the truth and just wants to find politically useful arguments. The problem is not new; medical science is undergoing a development from authority-based to evidence-based research. What is new, however, is the frequent use of advanced statistical methods in authority-based research masquerading as evidence-based research.

All parts of a research project, from the development of a study hypothesis and a study design to the collection and statistical analysis of data, the interpretation of the results, and the reporting of the findings, impose practical and theoretical limitations on the outcome of the research. It is the statistical reviewer’s responsibility to verify that these limitations are clearly presented to the reader. In this role, the statistician evaluates evidence, not the fulfilment of underlying assumptions or clinical significance. However, it is part of the job to make sure that the authors distinguish between assumptions, opinions, and evidence. The main principle is that the author is responsible for providing the reader with the information necessary to interpret the validity, the uncertainty, and the clinical significance of the findings.

**Literature**

Shor S. The responsibilities of a statistical reviewer. Chest 1972;61:486-487.

If the uncertainty of a test result is unknown, the result is unreliable. It is therefore important to know the statistical precision of the analysis. A statistical test based on a sample size of 3 is unlikely to provide sufficient statistical power. Figure 1 shows the relationship between statistical power and effect size (i.e. the mean difference relative to the standard deviation) when using Student’s t-test, depending on the distribution of the analysed variables.

**Figure 1.** The relation between statistical power and effect size with n=3.

One problem is that the statistical power depends on the distribution of the variable, and the statistical power to detect a non-Normal distribution (with the Shapiro-Wilk test) is minute with n=3. Consequently, a huge effect size is necessary to test with sufficient statistical power, which means that the sensitivity to detect even a moderate difference in mean values is low, probably much lower than many lab investigators realise.

In addition, inadequate solutions to the multiplicity problem discussed here lead to equally low specificity in the tests of mean values. Research results produced by such a methodology should not be taken too seriously.
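To illustrate how little power a sample size of 3 provides, here is a minimal stdlib-only Monte Carlo sketch; the setting (a two-sided one-sample t-test at alpha = 0.05 on Normal data) is an assumption for illustration, not the original study’s method:

```python
import random
import statistics

T_CRIT = 4.3027  # two-sided 5% critical value of Student's t with df = 2

def simulated_power(effect_size, n=3, reps=20000, seed=1):
    """Monte Carlo power of a two-sided one-sample t-test at alpha = 0.05.

    effect_size is the mean difference relative to the standard deviation.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        # One-sample t statistic: mean / (sd / sqrt(n))
        t = statistics.mean(x) / (statistics.stdev(x) / n ** 0.5)
        if abs(t) > T_CRIT:
            rejections += 1
    return rejections / reps

for d in (0.5, 1.0, 2.0, 4.0):
    print(f"effect size {d}: power = {simulated_power(d):.2f}")
```

Under these assumptions, even an effect size of two standard deviations leaves the power well below the conventional 80% target.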

Multiplicity issues exist when multiple null hypotheses are tested. Testing more than one null hypothesis increases the risk of at least one false-positive test beyond the nominal significance level, see Figure 1. The phenomenon is important to take into account in confirmatory studies, and one way to do this is to use a Bonferroni correction, i.e. to divide the significance level by m, the number of tested null hypotheses.
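The growth of that risk is easy to sketch: for m independent tests of true null hypotheses, each at level alpha, the probability of at least one false positive is 1 − (1 − alpha)^m.

```python
# Family-wise error rate for m independent tests, each at alpha = 0.05,
# together with the corresponding Bonferroni-adjusted level alpha/m.
alpha = 0.05
for m in (1, 2, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:2d}: FWER = {fwer:.3f}, Bonferroni level = {alpha / m:.4f}")
```

Already with 10 tests the family-wise error rate is about 0.40, eight times the nominal 5% level.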

However, to avoid subjectivity, the adjustment should be pre-specified, and as it reduces the statistical power of the comparisons, it should also be accounted for in the sample size calculation, which increases patient numbers and costs.

Multiplicity problems can often be avoided in the study design by a careful definition of endpoints, or solved by using closed test procedures or more efficient adjustment methods such as Holm’s or Hochberg’s. In addition, while multiplicity issues are a problem in confirmatory studies, they are not relevant in non-confirmatory studies such as exploratory or hypothesis-generating investigations.
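As a sketch of why Holm’s method is more efficient than a plain Bonferroni correction, here is a stdlib-only comparison on a hypothetical set of p-values (the values are invented for illustration):

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (single-step correction)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: compare the k-th smallest p-value
    with alpha / (m - k + 1) and stop at the first non-rejection."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: no later hypothesis can be rejected
    return reject

pvals = [0.001, 0.012, 0.020, 0.400]  # hypothetical p-values, m = 4
print(bonferroni(pvals))  # [True, True, False, False]
print(holm(pvals))        # [True, True, True, False]
```

Holm’s procedure rejects every hypothesis that Bonferroni rejects, and here one more, while still controlling the family-wise error rate.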

Furthermore, the statistical analysis of observational studies needs to include validity considerations, as selection and confounding bias cannot be prevented in the study design, which implies that detailed pre-specification is not practically possible. Moreover, the strategy, common in laboratory studies, of Bonferroni-correcting for the number of exposure groups while ignoring that multiple endpoints are tested does not provide a reasonable solution to the multiplicity problem.

RR = OR / (1 - R + OR * R)

where R = baseline risk, RR = relative risk, and OR = odds ratio. The clinical significance of a treatment effect cannot always be evaluated if the studied effect is presented as an odds ratio. The problem can be avoided by using a statistical method that provides direct estimates of the relative risk.
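The conversion above is straightforward to apply; a minimal sketch (the function name is illustrative):

```python
def odds_ratio_to_relative_risk(odds_ratio, baseline_risk):
    """RR = OR / (1 - R + OR * R), where R is the baseline (unexposed) risk."""
    return odds_ratio / (1 - baseline_risk + odds_ratio * baseline_risk)

# With a rare outcome the odds ratio approximates the relative risk;
# with a common outcome it exaggerates the effect.
print(odds_ratio_to_relative_risk(3.0, 0.01))  # 2.94...
print(odds_ratio_to_relative_risk(3.0, 0.50))  # 1.5
```

An odds ratio of 3 thus corresponds to a relative risk of only 1.5 when the baseline risk is 50%, which illustrates why clinical significance is hard to judge from the odds ratio alone.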
