A user's guide
Alejandro R Jadad
4 Assessing the quality of RCTs: why, what, how, and by whom?
Unfortunately, there is no such thing as a perfect trial. In real life, readers only have imperfect trials to read, and they face many barriers to determining their quality with confidence.
One of the main barriers that hinders the formal assessment of trial quality is that quality is a complex concept or construct. As with any other construct, such as anxiety, happiness, or love, quality can be acknowledged without difficulty, but it is not easy to define or measure.
Another major barrier is that there is an increasing number of tools available to assess trial quality, but little empirical evidence to guide the selection of tools and the incorporation of assessments into reviews and decisions. There is also little empirical evidence about who should do the assessments (number and background of assessors), about how the assessments should be done (that is, masked vs open conditions), or about the impact of the assessments on health care decisions.
A third major barrier hindering the assessment of trial quality is that, in most cases, the only way to assess quality is by relying on information contained in the written report. The problem is that a trial with a biased design that is well reported could be judged as having high quality, whereas a well designed but poorly reported trial could be judged as having low quality.
In this chapter, I discuss each of these barriers, present the results of recent empirical methodological studies that could help you overcome them, discuss recent efforts to improve the quality of reports of RCTs, and identify areas where further methodological research is required. I hope that the information you find in this chapter will help you not only if you are a clinician trying to decide whether to include an intervention in your armamentarium, but also if you are reading a systematic review and want to evaluate the effect that the assessment of trial quality may have had on the results of the review (see Chapter 7).
Regardless of why you want to assess the quality of a trial or group of trials, the first question that you should try to answer is: What is quality?
You could define quality by focusing on a single aspect of trials or on a combination of several aspects.1,2 You should, however, take into account that the relevance of the research question, the degree of generalisability of the results, the adequacy of data analysis and presentation, and the ethical implications depend on the context in which they are assessed (that is, they are very much in the eye of the beholder). Of all the aspects of a trial that have been used to define and assess quality, internal validity is the least context-dependent and perhaps the only one that has been the subject of the few empirical methodological studies available. As a result, I would recommend that you always include elements related to internal validity in any assessment of trial quality, complementing them with other aspects of the trial that may be relevant to your specific circumstances. Consider, for example, a trial in which a new antidepressant has been studied in affluent men with suicidal ideation and shown to reduce suicide rates. The generalisability of the results of this trial would be very important if you were a clinician trying to decide whether to offer it to an indigent woman, but would be irrelevant if you were a peer-reviewer trying to decide whether to recommend the report for publication. The internal validity of the trial would, however, be important in both cases. Internal validity is an important and necessary component of the assessment of trial quality, but it is not sufficient to provide a comprehensive evaluation of a trial.
Once you establish what quality means to you, the next step is to select a tool to generate the assessments. At this point you can develop your own tool or you can use an existing one.
What is involved in developing a new tool to assess trial quality?
One option is to create a tool informally, based on your own criteria and without formal evaluation. Alternatively, you could develop the new tool following established methodological procedures similar to those used in the formal development of any other type of health measurement tool. The advantages of this approach are that it is systematic, it can be replicated by others (if properly described), and it can yield tools with known reliability and construct validity, allowing readers to discriminate among trials of varied quality. A description of these procedures is beyond the scope of this book, but it can be found elsewhere.3
The main disadvantage of developing a new tool is that it is a time-consuming process. To avoid developing a new tool with the established methodology, or creating one without proper evaluation, you can instead select an existing tool.
How many tools exist to evaluate trial quality?
A component represents an item that describes a single aspect of quality. Assessing trial quality with a component can be done by scoring the component as present or absent, or by judging the adequacy of the information available on it. For example, concealment of allocation could be scored as present or absent, or graded as adequate, unclear, or inadequate.4 There are empirical methodological studies suggesting a relationship between at least five specific components and the likelihood of bias in trials. Most of these studies are described in Chapter 3. Briefly, they suggest that trials with inadequate randomisation or double-blinding, inadequate or unclear concealment of allocation, or inappropriately used crossover designs are more likely to produce larger treatment effects than their counterparts.4-7 There is also evidence suggesting that reports of trials sponsored by pharmaceutical companies are more likely to favour the experimental intervention over controls than trials not so sponsored.8 Even though individual components are quick and easy to score, using a single component to assess quality is not recommended because it provides minimal information about the overall quality of a trial.2
The narrow view provided by individual components can be overcome by using several components grouped in checklists or scales. The main difference between a checklist and a scale is that, in a checklist, the components are evaluated separately and do not have numerical scores attached to them, whereas, in a scale, each item is scored numerically and an overall quality score is generated.2 A systematic search of the literature identified nine checklists and 25 scales for assessing trial quality.9 Ongoing efforts to update this literature search suggest that there are now at least twice as many scales for assessing trial quality and that their number is likely to keep increasing.10 Among the available checklists and scales, only one has been developed using established methodological procedures.1
What are the characteristics of the validated tool to assess trial quality?
The scale includes three items that are directly related to bias reduction and are presented as questions to elicit yes or no answers. The scale produces scores from 0 to 5. Point awards for the first two items (randomisation and double-blinding) depend not only on whether the trial is described as randomised or double-blind, but also on the appropriateness of the methods used to randomise and blind the trial. For example, if the trial is described as randomised or double-blind, but there is no description of the methods used to generate the randomisation sequence or the double-blind conditions, 1 point is awarded in each case (that is, 1 point is awarded for randomisation and 1 point for double-blinding if the trial is described as both randomised and double-blind, or only 1 point is awarded if the trial is described as randomised but not as double-blind). If the methods of generating the randomisation sequence or creating blinded conditions are described and are appropriate, 1 additional point is given for each item (see Chapter 3). Conversely, if the methods used to generate the randomisation sequence or create blinded conditions are described, but inappropriate, the relevant item is given 0 points. The third item of the scale, withdrawals and drop outs, is awarded 0 points for a negative answer and 1 point for a positive answer. For a positive answer, the number of withdrawals and drop outs in each group and the reasons must be stated in the report. If there were no withdrawals, this should also be stated (Fig 4.1). If a trial provides the number and reasons for withdrawals and drop outs in each group, you, as a reader, could reanalyse the data. At the time of the development of the scale, it was debated whether this item should be scored according to the proportion of withdrawals and drop outs in the trials, but this was considered inappropriate because we do not know precisely when a trial has too many drop outs. 
Once you have scored all the items of the scale, a trial can be judged as having poor quality if it is awarded 2 points or fewer. It has been shown that studies awarded 2 points or fewer are likely to produce treatment effects that are 35% larger than those produced by trials awarded 3 points or more.19
You can use the overall score produced by the scale or use the individual items. Using the individual items is the most appropriate option if you do not feel comfortable with lumping different aspects of a trial into a single score, or when the reports describe trials in areas where double-blinding is not feasible or appropriate (for example, surgical trials). Even trials that cannot be double-blind can, however, still be awarded more than 2 points if they were conducted and reported properly, and thus leave the category of poor trials. Such trials could be awarded 3 points if they included a description of appropriate methods to generate the randomisation sequence (2 points) and a detailed account of withdrawals and drop outs (1 point).
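The scoring rules described above can be summarised as a short algorithm. The following Python sketch implements them; the function names and the labels for the adequacy of the methods ("appropriate", "inappropriate", "not described") are mine, for illustration, and are not part of the scale itself.

```python
# A sketch of the 0-5 scoring rules described above. The function names
# and the method-status labels are illustrative, not part of the scale.

def item_points(described, method):
    """Score randomisation or double-blinding (0, 1, or 2 points)."""
    if not described:
        return 0
    if method == "appropriate":
        return 2      # described, and the method used is appropriate
    if method == "not described":
        return 1      # described as randomised/double-blind, method not given
    return 0          # method described but inappropriate: item scores 0

def jadad_score(randomised, randomisation_method,
                double_blind, blinding_method,
                withdrawals_accounted_for):
    """Total the three items: randomisation, double-blinding, withdrawals."""
    score = item_points(randomised, randomisation_method)
    score += item_points(double_blind, blinding_method)
    if withdrawals_accounted_for:   # numbers and reasons given per group
        score += 1
    return score

# A trial that cannot be double-blind (for example, a surgical trial) can
# still leave the "poor" category (2 points or fewer):
score = jadad_score(True, "appropriate", False, "not described", True)
print(score, "poor" if score <= 2 else "not poor")   # 3 not poor
```

Note how the sketch mirrors the text: a trial described as randomised but with an inappropriate method of sequence generation scores 0 on that item, not 1.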
This scale by no means represents the only or most appropriate way to assess trial quality, but it is the only validated tool available and appears to produce robust and valid results in an increasing number of empirical methodological studies. This does not mean that the scale should be used in isolation. Instead, you should complement it with separate assessments of any components for which there is empirical evidence of a direct relationship with bias. In addition, you could also add separate assessments of any other component or group of components related to other aspects of trial quality (for example, external validity, quality of data analysis, or presentation, and so on) that you think are important in each case. My current practice, for instance, includes the use of the validated scale together with a separate assessment of concealment of allocation, sources of funding, language of publication, country of publication, and, when applicable, the appropriateness of crossover design.
For more than 10 years it has been suggested that the quality of trial reports should be assessed under masked conditions, that is, with the identity of the authors, their institutions, the sponsorship, the year and journal of publication, and the study results concealed from the assessors.5 There are, however, only two published empirical studies addressing this issue.1,20 One of these studies showed that assessments under masked conditions were more likely to yield lower and more consistent scores than assessments under open conditions.1 The other study also showed that lower scores were obtained under masked conditions.20 These results imply that bias could be introduced by assessments under open conditions. There is evidence, however, suggesting that the differences between the scores obtained under open and masked conditions may not be important.20 Masking the reports would reduce the likelihood of bias marginally, but it would also increase the resources required to conduct the assessments. Given the methodological and financial implications of these findings, and the small amount of empirical evidence available, I would not recommend masking trial reports as a mandatory step in the assessment of their quality.
Another issue that you should take into account is the number and background of the people required to assess the quality of a given trial. Again, if you are a clinician trying to keep up to date, this is not an issue, because you will be the only person assessing the quality of a particular trial at a given point in time. If, however, you are reading a systematic review done by others, you may want to see whether the authors provide information on who assessed trial quality. Typically, systematic reviewers ask two individuals (called raters, observers, or assessors) to assess the trials independently. They are given copies of the trial reports, the assessment instrument(s) with instructions, and a form on which to complete the assessments. The purpose of using multiple individuals to assess trial quality is to minimise the number of mistakes (usually caused by oversight while reading the trial report) and the risk of bias during the assessments. Once they have completed the assessments, the raters are invited to meet to discuss their findings and agree on a single quality assessment for each trial. Reaching agreement on trial quality is usually easy, but on occasion it may require a third person to act as arbiter. The degree of agreement between the raters can be quantified using methods previously described (that is, percentage agreement, kappa, correlation coefficient, intraclass correlation coefficient). These methods and their advantages and limitations have been described elsewhere.3 If the raters reach consensus, however, the value of measuring interrater agreement is limited.
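Two of the agreement statistics mentioned above, percentage agreement and kappa, are simple enough to compute by hand or in a few lines of code. The following sketch shows Cohen's kappa for two raters; the trial ratings are invented for illustration.

```python
# A minimal sketch of two agreement statistics for two raters.
# The ratings below are invented for illustration only.
from collections import Counter

def percentage_agreement(ratings_a, ratings_b):
    """Proportion of trials on which the two raters agree exactly."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement (Cohen's kappa) between two raters."""
    n = len(ratings_a)
    observed = percentage_agreement(ratings_a, ratings_b)
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Expected agreement if each rater assigned labels independently
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

rater_1 = ["good", "good", "good", "poor", "poor", "poor"]
rater_2 = ["good", "good", "poor", "poor", "poor", "poor"]
print(round(percentage_agreement(rater_1, rater_2), 2))  # 0.83
print(round(cohen_kappa(rater_1, rater_2), 2))           # 0.67
```

The example illustrates why kappa is usually preferred to raw agreement: the two raters agree on five of six trials (83%), but once chance agreement is discounted the kappa is a more modest 0.67.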
Once you have assessed the quality of one or more trials, you should use the assessments to guide your decisions. How you use them, however, will depend on your role, the purpose of the quality assessments, and the number of trials on the same topic that you are evaluating.
There are different approaches to incorporate quality assessments into your decisions. If you are a clinician, a peer-reviewer, or a journal editor, and you are dealing with only one trial, you could set thresholds below which the trial would have limited value to guide your clinical decisions or to be published. For instance, if you are a clinician evaluating a trial in which the main outcomes are subjective, you may decide to use the trial to guide your decisions only if it is double-blind and the authors provide a detailed account of co-interventions. In addition, as a journal editor you may decide not to publish trials in which allocation was not concealed and that do not provide a detailed account of the flow of participants.
The situation is more complicated if you are reading or conducting a review of multiple trials on the same topic. In this case, there are several approaches that could be used to incorporate quality assessments,2,7,21 but little research evidence has evaluated the impact of any of these methods on the results of the reviews.2 These approaches include the following.
Tabulation of the quality assessments
Use of quality assessments as thresholds to include or exclude trials from a review
Use of quality assessments to conduct sensitivity analyses
Use of quality assessments as the input sequence for cumulative meta-analysis
Use of quality assessments to weight trials included in meta-analyses
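Of the approaches listed, the last can be illustrated briefly. The sketch below assumes a fixed-effect inverse-variance model in which each trial's weight is multiplied by its quality score; the function name, the weighting rule, and all the numbers are invented for illustration, and, as noted above, there is little evidence that any one of these methods is preferable.

```python
# One hypothetical way to weight trials by quality in a meta-analysis:
# a fixed-effect inverse-variance model with each weight multiplied by
# the trial's quality score. Illustrative only; other rules exist and
# little empirical evidence supports any particular choice.

def quality_weighted_pooled_effect(effects, variances, quality_scores):
    """Pool trial effect sizes using quality-modified inverse-variance weights."""
    weights = [q / v for q, v in zip(quality_scores, variances)]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Three hypothetical trials: effect size, variance, and 0-5 quality score.
effects = [0.40, 0.25, 0.10]
variances = [0.04, 0.02, 0.01]
scores = [1, 3, 5]
print(round(quality_weighted_pooled_effect(effects, variances, scores), 3))  # 0.144
```

In this invented example the low-quality trial with the largest effect contributes little to the pooled estimate, which is pulled towards the high-quality trials; with equal quality scores the same function reduces to an ordinary inverse-variance pooled estimate.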
Recent efforts to improve the quality of reporting of RCTs
By this point, I hope you are convinced that a major barrier hindering the assessment of trial quality is that, in most cases, we must rely on the information contained in the written report. The problem is that a trial with a biased design that is well reported could be judged as having high quality, whereas a well designed but poorly reported trial could be judged as having low quality. If you contact the authors of the report directly, they may be able to provide the missing information that you require to complete the quality assessments, but they may not have such information available or they may give you false information (for example, it would be easy for them to tell you that a trial they published 20 years ago included concealment of allocation). Ideally, all these problems could be avoided if the authors of trial reports provided enough information for readers to judge whether the results of the trials are reliable.
In 1996, a group of clinical epidemiologists, biostatisticians, and journal editors published a statement called CONSORT (Consolidated Standards of Reporting Trials), the result of an extensive collaborative process aimed at improving the standard of written reports of RCTs.22 The CONSORT statement was designed to guide the reporting of two-group parallel-design RCTs; some modifications will be required to report crossover trials and trials with more than two groups.23
The CONSORT statement includes a checklist of 21 items and a flow diagram for use by the authors to provide journal editors and peer-reviewers with the page of the report in which each of the 21 items is addressed. The flow chart provides a detailed description of the progress of participants through the randomised trial, from the number of potentially eligible individuals for inclusion in the trial to the number of trial participants in each group who completed the trial.23 Each of the items in the checklist and the elements of the flow chart are described in detail in Chapter 5.
Will the quality of RCTs improve?
Soon after its publication, the CONSORT statement was endorsed by major journals such as the British Medical Journal, The Lancet, the Journal of the American Medical Association, and the Canadian Medical Association Journal. These journals incorporated the CONSORT statement as part of the requirements for authors from 1 January 1997. Within six months of the publication of the statement, another 30 journals endorsed it.
Although the CONSORT statement was not evaluated before its publication, it is expected to lead to an improvement in the quality of reporting of RCTs, at least in the journals that have endorsed it. It is also expected that the actual quality of trials will improve as a result of authors being aware of the requirements for submission of trial reports. The overall effect of CONSORT and other initiatives to improve the quality of RCTs is hard to predict, given that there are more than 30 000 biomedical journals and that their number is likely to continue increasing exponentially.25 Whether there is a substantial improvement in the overall quality of future trials will depend on the extent to which researchers and editors agree that there is a need to improve quality and are willing to make the necessary efforts.
Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds JM, Gavaghan DJ, McQuay HJ. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Controlled Clin Trials 1996;17:1-12.
Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effect in controlled clinical trials. JAMA 1995;273:408-12.
Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Controlled Clin Trials 1995;16:62-73.
Jadad AR, Cook DJ, Jones AL, Klassen TP, Tugwell P, Moher M, Moher D. The quality of randomised controlled trials included in meta-analyses and systematic reviews: how often and how is it assessed? Abstract presented at the 4th Cochrane Colloquium, Adelaide, Australia, October 1996. In review at Br Med J.
Moher D, Fortin P, Jadad AR, Juni P, Klassen T, LeLorier J, Liberati A, Linde K, Penna A. Completeness of reporting of trials published in languages other than English: implications for conduct and reporting of systematic reviews. Lancet 1996;347:363-6.
Moher D, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP. Does the poor quality of reports of randomized trials exaggerate estimates of intervention effectiveness reported in meta-analysis? Lancet (in press).
Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, Pitkin R, Rennie D, Schulz KF, Simel D, Stroup D. Improving the quality of reporting of randomized controlled trials—The CONSORT Statement. JAMA 1996;276:7-9.
© BMJ Books 1998. BMJ Books is an imprint of the BMJ Publishing Group. First published in 1998 by BMJ Books, BMA House, Tavistock Square, London WC1H 9JR. A catalogue record for this book is available from the British Library. ISBN 0-7279-1208-9