When Can We Be Confident about Estimates of Treatment Effects?



  • Summary:

    Dr. Gordon Guyatt from the Department of Clinical Epidemiology and Biostatistics, McMaster University, moderated the topic "When Can We Be Confident about Estimates of Treatment Effects?" with Drs. Paul Glasziou from the Centre for Research in Evidence-Based Practice, Bond University, Victor Montori from the Knowledge and Evaluation Research Unit, Mayo Clinic, Rochester, MN, and Holger Schünemann from the Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada.

    The discussion focused primarily on:

    1. The concept of quality of evidence;
    2. traditional approaches to assessing quality of evidence;
    3. limitations of the hierarchy of evidence approach; and
    4. the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach.

    Med Roundtable Gen Med Ed. 2014;1(3):178–184.

  • Compounds:
    No compounds discussed.
    • Women’s Health Initiative


DR. GUYATT: I am Gordon Guyatt, a Distinguished Professor of Clinical Epidemiology and Medicine at McMaster University. In the last decade or so, I have been largely involved in evidence-based medicine and clinical practice guidelines.

DR. GLASZIOU: I am Paul Glasziou, a general practitioner for the last 20 years. I have also been working in the area of evidence-based practice, most recently as the Director of the Centre for Evidence-Based Medicine in Oxford, and I am currently in Australia where I’m a Professor of Evidence-Based Practice at Bond University.

DR. MONTORI: My name is Victor Montori. I am a Professor of Medicine and Endocrinologist at the Mayo Clinic in Rochester, Minnesota, where I lead a group that is working on helping patients make better decisions on the basis of the best available research evidence.

DR. SCHÜNEMANN: I am Holger Schünemann, chairperson of the Department of Clinical Epidemiology and Biostatistics at McMaster University in Hamilton, Canada. I am also a practicing internist.

DR. GUYATT: Today, we are going to talk about the concept of quality of evidence.

We need to think about the quality of evidence when we make our treatment decisions and work with our patients to make optimal decisions. We need to have estimates of the benefits or desirable consequences of our treatments, for instance, the prevention of stroke with anticoagulants, and the downsides or undesirable consequences such as the increased risk of bleeding and the burden associated with anticoagulant therapy. We may be very confident in those estimates, which makes decision making easier, or we may not be at all confident in those estimates because of inadequate research evidence, and that uncertainty makes it more difficult to select optimal treatment options.

I will illustrate this with a historical example of hormone replacement therapy. Ten or 15 years ago, hormone replacement therapy was widely advocated by the clinical and expert communities, and physicians were strongly encouraged to prescribe hormone replacement therapy for postmenopausal women to reduce cardiovascular risk. We subsequently found out that, certainly for women who are not immediately perimenopausal, hormone replacement therapy does not reduce cardiovascular risk and may even increase it.

The problem with the initial guidelines was that the authors and the practitioners talking to their patients were not sufficiently cognizant of the uncertainty of that evidence. It’s one thing to say to patients, “We believe that hormone replacement therapy lowers your cardiovascular risk; therefore, we think you should take it.” It’s quite another thing to say, “Well, it’s possible that it lowers your cardiovascular risk, but we are not at all certain. It’s also possible that it does nasty things like increase the risk of breast cancer; now considering that, do you want to take it?”

The quality of evidence refers to our confidence in estimates of effect. Our decisions will be very different if we are confident in the estimates of treatment effects than if we are very uncertain of them. That notion has been around for a while now. Dr. Glasziou, could you tell us something about traditional approaches to assessing quality of evidence?

DR. GLASZIOU: To do that, we should probably go back over 30 years, because in the late 1970s, development of an explicit hierarchy of evidence was first considered by the Canadian Task Force on Preventive Health Care, who were developing guidelines on the periodic health examination. They had the task of sifting through a large body of evidence in order to develop recommendations. They came up with 2 very interesting principles. The first principle was based on the basic design of the study, that is, whether it is a randomized trial, a cohort study, a case-control study, a case study, or an expert opinion. They classified the large body of evidence they possessed based on these principles.

The second principle, which is also based on the study design, refers to the likelihood of evidence being biased. The first hierarchy of evidence quality was created, where evidence of the highest quality would have to come from at least one randomized trial, and at the bottom of that hierarchy of evidence were opinions of respected experts without any empirical evidence. That seems really simple in retrospect, but, actually, it was an incredible breakthrough to address the way we dealt with the large amount of available research evidence. It made it feasible to sift through evidence in a meaningful way and apply the principles of using the best-quality and least-biased evidence.
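The ordering Dr. Glasziou describes can be sketched in code. This is an illustrative sketch only, not part of any formal evidence-grading tool: the level names and their ranking are a simplified assumption based on the hierarchy described above, from expert opinion at the bottom to randomized trials at the top.

```python
from enum import IntEnum

# Simplified sketch of a traditional hierarchy of evidence, ordered
# from least to most protected against bias. The names and numeric
# ranks here are illustrative assumptions, not an official scheme.
class EvidenceLevel(IntEnum):
    EXPERT_OPINION = 1    # opinions of respected experts, no empirical data
    CASE_STUDY = 2        # description of individual cases
    CASE_CONTROL = 3      # observational, retrospective comparison
    COHORT = 4            # observational, prospective follow-up
    RANDOMIZED_TRIAL = 5  # at least one properly randomized trial

def best_available(studies):
    """Return the highest level of evidence present in a body of studies."""
    return max(studies)

body_of_evidence = [EvidenceLevel.EXPERT_OPINION,
                    EvidenceLevel.COHORT,
                    EvidenceLevel.RANDOMIZED_TRIAL]
print(best_available(body_of_evidence).name)  # RANDOMIZED_TRIAL
```

Because `IntEnum` members compare numerically, `max` over a body of studies returns the design most protected against bias, mirroring the principle of relying on the best-quality, least-biased evidence available.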

If we used this approach in your example of hormone replacement therapy, we would have looked at the large body of evidence on this therapy, even prior to the Women’s Health Initiative study.1 There were a number of small randomized trials. Klim McPherson, an epidemiologist from Oxford, put these together and suggested that there may be an increased risk of cardiovascular events and deaths in the groups taking hormone replacement therapy. Unfortunately, even at that time in the late 1990s, there was little recognition of this principle of the hierarchy of evidence.

After the Canadian Task Force on Preventive Health Care, people started to adopt the general idea, but acknowledged that there were other elements, beyond the basic study design, that needed to be taken into account, such as blinding and sample size. A key development was the recognition that you needed different hierarchies for different types of questions. For example, if we are interested in the risk that a patient is at or the prognosis after the first event, such as a stroke, then the best study design that we can use to generate the required information would be a cohort study or a so-called inception cohort study. With this design, it is possible to evaluate many people at the beginning of their condition and follow them up over time.

For predicting risk or prognosis, the cohort study is better than a randomized trial because it involves a more representative group, generating much more valid research findings. That was one principle that was addressed through a number of different hierarchies of evidence, but even then, people were trying to fit in elements, such as those I mentioned earlier, that didn’t quite fit. Eventually, the traditional hierarchies of evidence started to fall apart due to attempts to accommodate too many elements as well as a lack of standardization. Now, we have to move on to a new phase of trying to unify the principles.

DR. GUYATT: I think you made an excellent point about the need for different hierarchies for different types of questions. I think for this conversation, we should stick to treatment or management issues: Is it best for our patient to use management strategy A or management strategy B?

While we are focusing on that, you have laid emphasis on the traditional approach that was grounded in study design. Randomized trials were at the top of the hierarchy. Everything else, including observational studies, was lower, and that was the core element on which everyone focused. As time went by, we learned about the limitations of the simple hierarchy approach.

Dr. Montori, can you talk about the limitations, and how that has been superseded by a new conceptualization?

DR. MONTORI: Dr. Glasziou just described the proverbial shoulders on which new conceptualizations are now based. This initial notion of a hierarchy of evidence and these manifestations, as Dr. Glasziou has described, were a major breakthrough and one of the key notions that induced a paradigm shift to evidence-based medicine and the way people learned and practiced medicine.

Traditional hierarchies use study design features to order studies according to the risk of introducing error into the results. According to these hierarchies, randomized trials generate estimates in which we will have greater confidence than the results produced by observational studies. Since these hierarchies were formulated, and thanks to the accumulating evidence, we have recognized that not all randomized trials implement similar degrees of protection against error; therefore, careful consideration of features within each trial is needed to determine how confident we should be in their estimates. Similarly, not all studies measure outcomes with adequate precision, that is, not all are able to determine the effect of treatment and rule out effects that might suggest that a different course of action could be superior. Imprecision in estimates indicates that we are uncertain about where the truth lies, thus reducing our confidence in the estimates of effect.
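The imprecision Dr. Montori describes can be illustrated numerically. The sketch below is a hypothetical example, not part of the roundtable discussion: it computes an approximate 95% confidence interval for a risk ratio using the standard log-scale normal approximation, and flags the estimate as imprecise when the interval is compatible with both meaningful benefit and meaningful harm. The event counts and the 0.9/1.1 decision thresholds are assumptions chosen purely for illustration.

```python
import math

def risk_ratio_ci(events_tx, n_tx, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with an approximate 95% CI (log-scale normal approximation)."""
    rr = (events_tx / n_tx) / (events_ctrl / n_ctrl)
    # Standard error of log(RR) for two independent proportions.
    se_log = math.sqrt(1/events_tx - 1/n_tx + 1/events_ctrl - 1/n_ctrl)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Hypothetical trial: 30/1000 events on treatment vs 40/1000 on control.
rr, lo, hi = risk_ratio_ci(30, 1000, 40, 1000)

# The estimate is imprecise if the CI is compatible with both a
# meaningful benefit (RR below 0.9) and a meaningful harm (RR above 1.1);
# these thresholds are illustrative assumptions.
imprecise = lo < 0.9 and hi > 1.1
```

Here the point estimate suggests benefit (RR 0.75), but the confidence interval spans from substantial benefit to possible harm, so the estimate cannot rule out effects that would favor a different course of action, which is exactly the situation that should reduce our confidence.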

Another consideration that has emerged is that a study may have provided good evidence or estimates for only one of many outcomes. Yet, other outcomes may be important for making a decision when working with patients or for making broad recommendations.

The other point recognized was that even if 2 studies were similar in terms of their design, they may yield inconsistent results. We needed to examine each of those studies to understand why they led to inconsistent results. However, there was little understanding that inconsistency in itself should reduce the confidence that we have in the overall estimates of effect. There has also been increased recognition that the question the investigators may have been trying to answer could differ substantially from the question decision makers are trying to answer. If there are important differences between trial conditions and outcomes and what decision makers are interested in, then the treatment effect estimates may not apply, or may apply only with limited confidence.

Recognition of error and bias within each of the studies, recognition of consistency across studies, and the extent to which the studies may apply to the situation at hand became important. We are required to assess not only where a particular study lies in the hierarchy of evidence, but also the overall impact of the body of evidence on our confidence in the estimates. Thus, new hierarchies of evidence focus mostly on the body of evidence and on the confidence in the estimates of effect we derive from its totality. New conceptualizations of the hierarchy of evidence have become necessary to account for an explosion of important evidence and the need to make decisions based on that evidence at both the policy and clinical levels.