The Evidence Hierarchy

What we know with what level of certainty

In evidence-based practice, the strength of research evidence and knowledge is important. When evidence is ranked, the top level is occupied by evidence that directly answers the specific question being asked and provides the most certainty to guide practice. Reviewing the relevance and strength of evidence is therefore a critical step in developing practice standards and guidelines, and in decision-making. Assessment of bias, which affects the level of certainty or confidence in reported outcomes, is currently a key criterion in ranking studies.

While this might favour quantitative evidence, the National Health and Medical Research Council (NHMRC) notes that ‘strong observational studies can at times provide more reliable evidence than flawed randomised trials’. [1] In fact, use of evidence from qualitative studies is increasingly supported as part of guideline development. [2]

Bias

Bias refers to factors that can systematically affect the observations and conclusions of the study and cause them to be different from the truth. [1]

Risk of bias

Risk of bias is the likelihood that features of the study design or conduct of the study will give misleading results. [1]

Bias influences both ‘internal validity’ and ‘external validity’ of a research study: [3]

- A study has internal validity if its precision and accuracy are not distorted by bias.
- A study has external validity if its results can be generalised to populations or clinical contexts other than those studied.

There are many potential sources of bias. They range from characteristics of the study design, the realities of funding and the urgency of the need for knowledge through to reporting and publication bias. The systems used to rank studies by methodology have been widely debated. [4] A recent review identified 45 different evidence hierarchies. This variation has been influenced by professional jurisdiction, practical concerns (including the feasibility and cost of randomised controlled trials, RCTs), methodological quality, and the fact that not all important questions can be answered with RCTs. [4] In practical terms, this means that the ranking of RCTs, qualitative studies, and expert opinion varies between hierarchies.

Before accepting evidence-based standards, you need to know the basis of any hierarchy used to develop them. Here we look at some of the approaches taken and how they can help you select the best available evidence.

Finding the best level of evidence in health research

Video from University of Sydney

Level of evidence pyramid

The evidence pyramid has evolved over many years but was largely developed for studies addressing treatment or therapeutic effects. [5] Different versions of the evidence pyramid are available, but in most cases meta-analyses and systematic reviews of randomised controlled trials occupy the highest positions. This is because of the measures these studies take to minimise bias or confounding of outcomes, and to assess generalisability to other populations. [4] Study designs differ in their capacity to minimise bias, and it is on this basis that they are ranked.

Primary interventional and observational studies occupy the intermediate levels and expert opinion the lowest. For interventional research questions, RCTs are the gold-standard primary study, because the RCT design takes steps to minimise differences (a potential source of bias) between the groups being compared. First, selection criteria are defined so that participants are very similar in the characteristics likely to influence outcomes. To further minimise bias, participants are then randomly allocated to the control or intervention group. This prevents the researcher's judgement from deciding which group each person is assigned to, and tends to balance other influencing factors, including unknown ones, across the groups. The premise is that if you cannot control for factors that could influence an outcome, then you cannot say with any certainty that what you do is the cause of what is observed. However, a criticism of RCTs is that participants are highly selected and may not reflect the population of interest. This affects external validity. [6]
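To make the idea of random allocation concrete, here is a minimal sketch in Python. It assumes a hypothetical list of already-enrolled participant IDs and uses a shuffle-and-split scheme, which gives two groups of equal size. It is an illustration only: real trials use concealed allocation, often with blocked or stratified randomisation, and the function name `randomise` is hypothetical.

```python
import random


def randomise(participants, seed=None):
    """Allocate participants to control and intervention groups at random.

    Hypothetical shuffle-and-split sketch: the enrolled list is shuffled and
    cut in half, so no researcher judgement decides who goes where.
    """
    rng = random.Random(seed)      # seeded generator makes the allocation reproducible
    shuffled = list(participants)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "intervention": shuffled[half:]}


# Example with six hypothetical participant IDs.
allocation = randomise(["P01", "P02", "P03", "P04", "P05", "P06"], seed=42)
print(allocation)
```

Because allocation depends only on the random number generator, any characteristic that could influence the outcome has the same chance of ending up in either group.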

Evidence level matrix or table

Some questions cannot be answered with an RCT, or doing so would be unethical or impractical. This influences the study designs used to generate evidence. Less than five per cent of palliative care research in Australia is based on an RCT design. [7]

Evidence matrices rank evidence separately for questions about diagnostic tests, prognostic markers, treatment, and prevention, in a way that is useful for clinicians. They also acknowledge that RCTs are not always the most appropriate design, and they are less complex than grading systems.

The Oxford Centre for Evidence-Based Medicine (OCEBM) table of evidence levels is one of the best-known examples. As with the evidence pyramid, systematic reviews represent the highest level of evidence. However, depending on the question type, the studies they include may not be RCTs.

OCEBM order of searching for evidence depending on question type

The most appropriate evidence source for each question type is listed below, beginning with the best. Work your way down the list until suitable evidence is found. [5]

Screening questions
1. Begin with systematic reviews of RCTs
2. RCTs
3. Non-randomised controlled cohort/follow-up study**
4. Case-series, case-control, or historically controlled studies**
5. Mechanism-based reasoning

Diagnosis questions
1. Begin with systematic reviews of cross-sectional studies with consistently applied reference standard and blinding
2. Individual cross-sectional studies with consistently applied reference standard and blinding
3. Non-consecutive studies, or studies without consistently applied reference standards**
4. Case-control studies, or studies with a poor or non-independent reference standard**
5. Mechanism-based reasoning

Prognosis questions
1. Begin with systematic reviews of inception cohort studies
2. Inception cohort studies
3. Cohort study or control arm of a randomised trial**
4. Case-series or case-control studies, or poor-quality prognostic cohort study**

Questions about treatment benefits
1. Begin with systematic reviews of randomised trials or n-of-1 trials
2. Randomised trial or observational study with dramatic effect
3. Non-randomised controlled cohort/follow-up study**
4. Case-series, case-control studies, or historically controlled studies**
5. Mechanism-based reasoning

Questions about treatment harms
1. Begin with systematic reviews of randomised trials, systematic reviews of nested case-control studies, an n-of-1 trial with the patient you are raising the question about, or an observational study with dramatic effect
2. Individual randomised trial or (exceptionally) observational study with dramatic effect
3. Non-randomised controlled cohort/follow-up study (post-marketing surveillance), provided there are sufficient numbers to rule out a common harm (for long-term harms, the duration of follow-up must be sufficient)**
4. Case-series, case-control, or historically controlled studies**
5. Mechanism-based reasoning

** A well-conducted systematic review is generally better than an individual study.
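The "work your way down the list" rule is a simple ordered search. The Python sketch below encodes two of the five question types from the OCEBM lists and returns the first (best) level for which evidence is available. The dictionary keys, design labels and the function name `best_available` are hypothetical shorthand for the list entries, not an official OCEBM data format, and real appraisal also involves judging the quality of each study found.

```python
# Hypothetical encoding of the OCEBM (2011) search order for two question types.
# Each list runs from the best source (level 1) downwards.
OCEBM_ORDER = {
    "prognosis": [
        "systematic review of inception cohort studies",
        "inception cohort study",
        "cohort study or control arm of randomised trial",
        "case-series or case-control study, or poor-quality prognostic cohort study",
    ],
    "treatment benefits": [
        "systematic review of randomised trials or n-of-1 trials",
        "randomised trial or observational study with dramatic effect",
        "non-randomised controlled cohort/follow-up study",
        "case-series, case-control or historically controlled study",
        "mechanism-based reasoning",
    ],
}


def best_available(question_type, available_designs):
    """Return (level, design) for the highest-ranked design that is available.

    Walks down the ordered list for the question type and stops at the first
    match; returns None if no listed design is available.
    """
    for level, design in enumerate(OCEBM_ORDER[question_type], start=1):
        if design in available_designs:
            return level, design
    return None


found = best_available(
    "prognosis",
    {"inception cohort study", "cohort study or control arm of randomised trial"},
)
print(found)
```

Here no systematic review of inception cohort studies is available, so the search stops at level 2, the individual inception cohort study.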

Evidence grade

Guideline developers grade evidence on more than study design or methodology. The certainty (strength and risk of bias), quality, generalisability, and applicability of research evidence are all important. Some of these considerations are reflected in evidence pyramids and matrices or tables, but not all. For example:

- Many approaches to grading evidence raise the rank of evidence on the basis of study quality. For example, in GRADE an observational study that demonstrates a dose-response gradient can be rated up. [8]
- RCTs are not always an appropriate design. If you want to know the opinions or personal experiences of people, a qualitative study design would be more appropriate. The value of this evidence is more likely to be reflected in grading of evidence than in ranking based on methodology.

There are a number of checklists available to assist with grading evidence. The Australian National Health and Medical Research Council (NHMRC) recommends the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) approach, which has gained wide international acceptance. GRADE provides a framework to standardise how clinical practice recommendations are made based on the certainty of the evidence, and it defines four levels of certainty. [8]

To learn more, visit the GRADE website.

GRADE certainty level	Definition
High	We are very confident that the true effect lies close to that of the estimate of the effect.
Moderate	We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low	Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very Low	We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.
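As a rough illustration of how the four levels relate to rating evidence up or down, the Python sketch below follows the general GRADE pattern described in the GRADE handbook. A body of evidence from randomised trials starts at High and one from observational studies starts at Low. It is then rated down (for example for risk of bias, inconsistency, indirectness, imprecision or publication bias) or up (for example for a large effect or a dose-response gradient), staying within the four levels. This is a simplified, hypothetical model: in GRADE these are structured judgements, not arithmetic, and `grade_certainty` is not part of any GRADE tool.

```python
# The four GRADE certainty levels, lowest to highest.
LEVELS = ["Very Low", "Low", "Moderate", "High"]


def grade_certainty(randomised, downgrades=0, upgrades=0):
    """Simplified, hypothetical model of GRADE-style certainty rating.

    randomised: True if the evidence comes from randomised trials (start at
        High), False if it comes from observational studies (start at Low).
    downgrades: total levels rated down, e.g. for risk of bias or imprecision.
    upgrades: total levels rated up, e.g. for a dose-response gradient.
    """
    start = LEVELS.index("High") if randomised else LEVELS.index("Low")
    score = start - downgrades + upgrades
    # Certainty cannot fall below Very Low or rise above High.
    score = max(0, min(score, len(LEVELS) - 1))
    return LEVELS[score]


# An observational study rated up one level for a dose-response gradient.
print(grade_certainty(randomised=False, upgrades=1))
# Randomised trials rated down one level for serious risk of bias.
print(grade_certainty(randomised=True, downgrades=1))
```

Both calls land on Moderate, showing how a well-conducted observational study and a flawed set of trials can end up with the same certainty.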

References
1. National Health and Medical Research Council (NHMRC). Guidelines for guidelines: Assessing risk of bias [Internet]. Canberra: NHMRC; 2019. [updated 2019 Aug 29; cited 2022 May 10].
2. Ansari S, Rashidian A. Guidelines for guidelines: are they up to the task? A comparative assessment of clinical practice guideline development handbooks. PLoS One. 2012;7(11):e49864. doi: 10.1371/journal.pone.0049864
3. Van den Block L, Vandevoorde J. Evidence-based practice in palliative care. In: MacLeod R, Van den Block L, editors. Textbook of Palliative Care. Cham, CH: Springer; 2019.
4. Vere J, Gibson B. Variation amongst hierarchies of evidence. J Eval Clin Pract. 2021 Jun;27(3):624-630. doi: 10.1111/jep.13404. Epub 2020 May 4.
5. Howick J, Chalmers I, Glasziou P, Greenhalgh T, Heneghan C, Liberati A, et al. Explanation of the 2011 Oxford Centre for Evidence-Based Medicine (OCEBM) levels of evidence (Background document). Oxford: Oxford Centre for Evidence-Based Medicine; 2011.
6. Mielke D, Rohde V. Randomized controlled trials-a critical re-appraisal. Neurosurg Rev. 2021 Aug;44(4):2085-2089. doi: 10.1007/s10143-020-01401-4. Epub 2020 Oct 6.
7. Khalil H, Downie A, Ristevski E. Mapping palliative and end of care research in Australia (2000-2018). Palliat Support Care. 2020 Dec;18(6):713-721. doi: 10.1017/S1478951519001111.
8. Schünemann H, Brozek J, Guyatt G, Oxman A, Editors. GRADE handbook for grading quality of evidence and strength of recommendations [Internet]. Hamilton, ONT: McMaster University; 2013. Section 5.1 Factors determining the quality of evidence. [updated 2013 Oct; cited 2022 May 9].
