What we know with what level of certainty

In evidence-based practice, the strength of research evidence and knowledge is important. In ranking evidence, the top level is occupied by evidence that answers the specific knowledge question and provides the most certainty to guide practice. Reviewing the relevance and strength of evidence is therefore a critical step in developing practice standards and guidelines, and in decision-making. Assessment of bias, which influences the level of certainty or confidence in reported outcomes, is currently a key criterion in study ranking.

While this might favour quantitative evidence, the National Health and Medical Research Council (NHMRC) notes that ‘strong observational studies can at times provide more reliable evidence than flawed randomised trials’. [1] In fact, use of evidence from qualitative studies is increasingly supported as part of guideline development. [2]

Bias

Bias refers to factors that can systematically affect the observations and conclusions of the study and cause them to be different from the truth. [1]

Risk of bias

Risk of bias is the likelihood that features of the study design or conduct of the study will give misleading results. [1]

Bias influences both ‘internal validity’ and ‘external validity’ of a research study: [3]

    • If a study's precision and accuracy are not distorted by bias, it has internal validity.
    • If a study's results can be generalised to populations or clinical contexts other than those studied, it has external validity.

There are many potential sources of bias, ranging from characteristics of the study design, realities of funding, and urgency of the knowledge need through to reporting and publication bias. The systems used to rank studies based on methodology have been widely debated. [4] A recent review identified 45 different evidence hierarchies. This variation has been influenced by professional jurisdiction, practical concerns (including the feasibility and cost of Randomised Controlled Trials, RCTs), methodological quality, and the fact that not all important questions can be answered with RCTs. [4] In practical terms, it has meant that the ranking of RCTs, qualitative studies, and expert opinion varies between hierarchies.

Before accepting evidence-based standards, you need to know the basis of any hierarchy used to develop them. Here we look at some of the approaches taken and how they can help you to select the best available evidence.

Finding the best level of evidence in health research

Video from University of Sydney

Level of evidence pyramid

The evidence pyramid has evolved over many years but was largely developed for studies addressing treatment or therapeutic effects. [5] Different versions of the evidence pyramid are available, but in most cases meta-analyses and systematic reviews of randomised controlled trials occupy the highest positions. This is because of the measures taken in these studies to minimise bias or confounding of outcomes, and to assess generalisability to other populations. [4] Different research study designs have varying capacity to minimise bias, and it is on this basis that they are ranked.

Primary interventional and observational studies occupy the intermediate level and expert opinion the lowest. For interventional research questions, RCTs are the gold standard primary study. This is because the RCT design takes steps to minimise differences (a potential source of bias) between the groups being compared. This is achieved by defining selection criteria for research participants so that they are very similar for the characteristics likely to influence outcomes. To further minimise bias, participants are then randomly allocated to the study control or intervention group. This avoids researcher bias when deciding which group to assign each person to. The premise is that if you cannot control for factors that could influence an outcome, then you cannot say with any certainty that what you do is the cause of what is observed. However, a criticism of RCTs is that the participants are highly selected and may not reflect the population of interest. This affects external validity. [6]
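The random allocation step described above can be illustrated with a minimal sketch. It assumes simple, unstratified randomisation into two groups of near-equal size; the function name `allocate`, the participant labels, and the seed are hypothetical and chosen only for illustration. Trials in practice may use other schemes, so this is not a description of any particular study's method.

```python
import random


def allocate(participants, seed=None):
    """Randomly split participants into control and intervention groups."""
    # A dedicated generator keeps the allocation reproducible when a seed is given.
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The researcher never chooses a group for a person; list position after shuffling does.
    return {"control": shuffled[:half], "intervention": shuffled[half:]}


# Hypothetical participant identifiers.
participants = [f"P{i:02d}" for i in range(1, 11)]
groups = allocate(participants, seed=42)
```

Because group membership depends only on the shuffle, the person running the study has no opportunity to steer particular participants towards either group.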

 

Evidence level matrix or table

Some questions cannot be answered with an RCT, or doing so would be unethical or impractical. This influences the study designs employed to generate evidence. Less than five per cent of palliative care research in Australia is based on an RCT design. [7]

Evidence matrices separate ranking of evidence relating to questions of diagnostic tests, prognostic markers, treatment, and prevention in a way that is useful for clinicians. They also acknowledge that RCTs are not always the most appropriate design, and they are less complex than grading systems.

The Oxford Centre for Evidence-Based Medicine (OCEBM) table of evidence levels is one of the best known examples. As with the evidence pyramid, systematic reviews represent the highest level of evidence. However, depending on the question type, the study designs included may not be RCTs.

OCEBM order of searching for evidence depending on question type

The most appropriate evidence source for each question type is listed, beginning with the best. Work your way down the list until suitable evidence is found. [5]

  • Screening questions

  • Diagnosis questions

  • Prognosis questions

  • Questions about treatment benefits

  • Questions about treatment harms

** A well-conducted systematic review is generally better than an individual study
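The "work your way down the list" approach can be sketched as a simple ordered search. The ranking below is a hypothetical placeholder, not the OCEBM list for any question type; the function name `first_suitable` and the `is_available` check are also assumptions made for illustration. Consult the OCEBM table for the actual order for each question type.

```python
def first_suitable(sources_in_rank_order, is_available):
    """Return (level, source) for the highest-ranked available source, else None."""
    for level, source in enumerate(sources_in_rank_order, start=1):
        if is_available(source):
            return level, source
    return None


# Hypothetical ranking for illustration only, best source first.
example_ranking = [
    "systematic review",
    "individual primary study",
    "expert opinion",
]

# Hypothetical outcome of a literature search: no systematic review was found.
found = {"individual primary study", "expert opinion"}
result = first_suitable(example_ranking, lambda source: source in found)
```

Here `result` is `(2, "individual primary study")`: the search stops at the first level where suitable evidence exists, even though lower-ranked evidence is also available.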

 

Evidence grade

Guideline developers grade evidence based on more than the study design or methodology. The certainty (strength and risk of bias), quality, generalisability, and applicability of research evidence are all important. Some of these considerations are reflected in evidence pyramids and matrices or tables, but not all. For example:

  • Many approaches to grading of evidence will increase the rank of evidence based on study quality. For example, in GRADE an observational study with the power to demonstrate a dose-response effect will move up in ranking. [8]
  • RCTs are not always an appropriate design. If you want to know the opinions or personal experiences of people, then a qualitative study design would be more appropriate. The value of this evidence is more likely to be reflected in grading of evidence than in ranking based on methodology.

There are a number of checklists available to assist with grading of evidence. The Australian National Health and Medical Research Council (NHMRC) recommends the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) approach. GRADE has gained wide international acceptance. It provides a framework to standardise how clinical practice recommendations are made based on the certainty of the evidence. GRADE defines four levels of evidence based on certainty. [8]

GRADE evidence level | Definition
High | We are very confident that the true effect lies close to that of the estimate of the effect.
Moderate | We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low | Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low | We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of the effect.
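The four levels form an ordered scale, and a body of evidence can be moved up or down that scale. A minimal sketch of this idea follows. The names `Certainty` and `adjust` are hypothetical, and the starting level used for observational evidence is an assumption based on the usual GRADE convention rather than something stated in this text. It is not an implementation of the full GRADE process.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """The four GRADE certainty levels, ordered from lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def adjust(start, steps_down=0, steps_up=0):
    """Move a starting certainty down or up, clamped to the four levels."""
    value = int(start) - steps_down + steps_up
    value = max(int(Certainty.VERY_LOW), min(int(Certainty.HIGH), value))
    return Certainty(value)


# Assumption: observational evidence starts at LOW and is rated up one step
# for a dose-response effect.
observational = adjust(Certainty.LOW, steps_up=1)
```

In this sketch `observational` ends at `Certainty.MODERATE`. Clamping ensures a rating can never fall below very low or rise above high, however many adjustments are applied.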

Last updated 01 June 2026