diff --git a/sample.jsonl.gz b/sample.jsonl.gz deleted file mode 100644 index 61b51d3..0000000 --- a/sample.jsonl.gz +++ /dev/null @@ -1,40 +0,0 @@ -{ - "pm_id": "https://pubmed.ncbi.nlm.nih.gov/19622512", - "pdf_text": "Guidelines and Guidance\nThe PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration\nAlessandro Liberati1,2*, Douglas G. Altman3, Jennifer Tetzlaff4, Cynthia Mulrow5, Peter C. Gøtzsche6, John P. A. Ioannidis7, Mike Clarke8,9, P. J. Devereaux10, Jos Kleijnen11,12, David Moher4,13\n1 Università di Modena e Reggio Emilia, Modena, Italy, 2 Centro Cochrane Italiano, Istituto Ricerche Farmacologiche Mario Negri, Milan, Italy, 3 Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom, 4 Ottawa Methods Centre, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada, 5 Annals of Internal Medicine, Philadelphia, Pennsylvania, United States of America, 6 The Nordic Cochrane Centre, Copenhagen, Denmark, 7 Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, 8 UK Cochrane Centre, Oxford, United Kingdom, 9 School of Nursing and Midwifery, Trinity College, Dublin, Ireland, 10 Departments of Medicine, Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada, 11 Kleijnen Systematic Reviews Ltd, York, United Kingdom, 12 School for Public Health and Primary Care (CAPHRI), University of Maastricht, Maastricht, The Netherlands, 13 Department of Epidemiology and Community Medicine, Faculty of Medicine, Ottawa, Ontario, Canada\nAbstract: Systematic reviews and meta-analyses are essential to summarize evidence relating to efficacy and safety of health care interventions accurately and reliably. The clarity and transparency of these reports, however, is not optimal. Poor reporting of systematic reviews diminishes their value to clinicians, policy makers, and other users. 
Since the development of the QUOROM (QUality Of Reporting Of Meta-analyses) Statement, a reporting guideline published in 1999, there have been several conceptual, methodological, and practical advances regarding the conduct and reporting of systematic reviews and meta-analyses. Also, reviews of published systematic reviews have found that key information about these studies is often poorly reported. Realizing these issues, an international group that included experienced authors and methodologists developed PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) as an evolution of the original QUOROM guideline for systematic reviews and meta-analyses of evaluations of health care interventions. The PRISMA Statement consists of a 27-item checklist and a four-phase flow diagram. The checklist includes items deemed essential for transparent reporting of a systematic review. In this Explanation and Elaboration document, we explain the meaning and rationale for each checklist item. For each item, we include an example of good reporting and, where possible, references to relevant empirical studies and methodological literature. The PRISMA Statement, this document, and the associated Web site (http://www.prisma-statement.org/) should be helpful resources to improve reporting of systematic reviews and meta-analyses.\nIntroduction\nSystematic reviews and meta-analyses are essential tools for summarizing evidence accurately and reliably. 
They help clinicians keep up-to-date; provide evidence for policy makers to judge risks, benefits, and harms of health care behaviors and interventions; gather together and summarize related research for patients and their carers; provide a starting point for clinical practice guideline developers; provide summaries of previous research for funders wishing to support new research [1]; and help editors judge the merits of publishing reports of new studies [2]. Recent data suggest that at least 2,500 new systematic reviews reported in English are indexed in MEDLINE annually [3].\nUnfortunately, there is considerable evidence that key information is often poorly reported in systematic reviews, thus diminishing their potential usefulness [3,4,5,6]. As is true for all research, systematic reviews should be reported fully and transparently to allow readers to assess the strengths and weaknesses of the investigation [7]. That rationale led to the development of the QUOROM (QUality Of Reporting Of Meta-analyses) Statement; those detailed reporting recommendations were published in 1999 [8]. In this paper we describe the updating\nCitation: Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, et al. (2009) The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration. PLoS Med 6(7): e1000100. doi:10.1371/journal.pmed.1000100\nPublished July 21, 2009\nCopyright: © 2009 Liberati et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.\nFunding: PRISMA was funded by the Canadian Institutes of Health Research; Università di Modena e Reggio Emilia, Italy; Cancer Research UK; Clinical Evidence BMJ Knowledge; The Cochrane Collaboration; and GlaxoSmithKline, Canada. 
AL is funded, in part, through grants of the Italian Ministry of University (COFIN - PRIN 2002 prot. 2002061749 and COFIN - PRIN 2006 prot. 2006062298). DGA is funded by Cancer Research UK. DM is funded by a University of Ottawa Research Chair. None of the sponsors had any involvement in the planning, execution, or write-up of the PRISMA documents. Additionally, no funder played a role in drafting the manuscript.\nCompeting Interests: MC's employment is as Director of the UK Cochrane Centre. He is employed by the Oxford Radcliffe Hospitals Trust on behalf of the Department of Health and the National Institute for Health Research in England. This is a fixed term contract, the renewal of which is dependent upon the value placed upon his work, that of the UK Cochrane Centre, and of The Cochrane Collaboration more widely by the Department of Health. His work involves the conduct of systematic reviews and the support of the conduct and use of systematic reviews. Therefore, work such as this manuscript, relating to systematic reviews, might have an impact on his employment.\nAbbreviations: PICOS, participants, interventions, comparators, outcomes, and study design; PRISMA, Preferred Reporting Items for Systematic reviews and Meta-Analyses; QUOROM, QUality Of Reporting Of Meta-analyses.\n* E-mail: alesslib@mailbase.it\nProvenance: Not commissioned; externally peer reviewed. In order to encourage dissemination of the PRISMA explanatory paper, this article is freely accessible on the PLoS Medicine, Annals of Internal Medicine, and BMJ Web sites. The authors jointly hold the copyright of this article. For details on further use see the PRISMA Web site (http://www.prisma-statement.org/).\nPLoS Medicine | www.plosmedicine.org 1 July 2009 | Volume 6 | Issue 7 | e1000100\nof that guidance. 
Our aim is to ensure clear presentation of what was planned, done, and found in a systematic review.\nTerminology used to describe systematic reviews and meta-analyses has evolved over time and varies across different groups of researchers and authors (see Box 1). In this document we adopt the definitions used by the Cochrane Collaboration [9]. A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods that are selected to minimize bias, thus providing reliable findings from which conclusions can be drawn and decisions made. Meta-analysis is the use of statistical methods to summarize and combine the results of independent studies. Many systematic reviews contain meta-analyses, but not all.\nThe QUOROM Statement and Its Evolution into PRISMA\nThe QUOROM Statement, developed in 1996 and published in 1999 [8], was conceived as reporting guidance for authors reporting a meta-analysis of randomized trials. Since then, much has happened. First, knowledge about the conduct and reporting of systematic reviews has expanded considerably. For example, The Cochrane Library's Methodology Register (which includes reports of studies relevant to the methods for systematic reviews) now contains more than 11,000 entries (March 2009). Second, there have been many conceptual advances, such as "outcome-level" assessments of the risk of bias [10,11], that apply to systematic reviews. Third, authors have increasingly used systematic reviews to summarize evidence other than that provided by randomized trials.\nHowever, despite advances, the quality of the conduct and reporting of systematic reviews remains well short of ideal [3,4,5,6]. All of these issues prompted the need for an update and expansion of the QUOROM Statement. 
Of note, recognizing that the updated statement now addresses the above conceptual and methodological issues and may also have broader applicability than the original QUOROM Statement, we changed the name of the reporting guidance to PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses).\nDevelopment of PRISMA\nThe PRISMA Statement was developed by a group of 29 review authors, methodologists, clinicians, medical editors, and consumers [12]. They attended a three-day meeting in 2005 and participated in extensive post-meeting electronic correspondence. A consensus process that was informed by evidence, whenever possible, was used to develop a 27-item checklist (Table 1; see also Text S1 for a downloadable template checklist for researchers to re-use) and a four-phase flow diagram (Figure 1; see Figure S1 for a downloadable template document for researchers to re-use). Items deemed essential for transparent reporting of a systematic review were included in the checklist. The flow diagram originally proposed by QUOROM was also modified to show numbers of identified records, excluded articles, and included studies. After 11 revisions the group approved the checklist, flow diagram, and this explanatory paper.\nThe PRISMA Statement itself provides further details regarding its background and development [12]. This accompanying Explanation and Elaboration document explains the meaning and rationale for each checklist item. A few PRISMA Group participants volunteered to help draft specific items for this document, and four of these (DGA, AL, DM, and JT) met on several occasions to further refine the document, which was circulated and ultimately approved by the larger PRISMA Group.\nBox 1. Terminology\nThe terminology used to describe systematic reviews and meta-analyses has evolved over time and varies between fields. Different terms have been used by different groups, such as educators and psychologists. 
The conduct of a systematic review comprises several explicit and reproducible steps, such as identifying all likely relevant records, selecting eligible studies, assessing the risk of bias, extracting data, qualitative synthesis of the included studies, and possibly meta-analyses.\nInitially this entire process was termed a meta-analysis and was so defined in the QUOROM Statement [8]. More recently, especially in health care research, there has been a trend towards preferring the term systematic review. If quantitative synthesis is performed, this last stage alone is referred to as a meta-analysis. The Cochrane Collaboration uses this terminology [9], under which a meta-analysis, if performed, is a component of a systematic review.\nRegardless of the question addressed and the complexities involved, it is always possible to complete a systematic review of existing data, but not always possible, or desirable, to quantitatively synthesize results, due to clinical, methodological, or statistical differences across the included studies. Conversely, with prospective accumulation of studies and datasets where the plan is eventually to combine them, the term "(prospective) meta-analysis" may make more sense than "systematic review."\nFor retrospective efforts, one possibility is to use the term systematic review for the whole process up to the point when one decides whether to perform a quantitative synthesis. If a quantitative synthesis is performed, some researchers refer to this as a meta-analysis. 
This definition is similar to that found in the current edition of the Dictionary of Epidemiology [183].\nWhile we recognize that the use of these terms is inconsistent and there is residual disagreement among the members of the panel working on PRISMA, we have adopted the definitions used by the Cochrane Collaboration [9].\nSystematic review: A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing reliable findings from which conclusions can be drawn and decisions made [184,185]. The key characteristics of a systematic review are: (a) a clearly stated set of objectives with an explicit, reproducible methodology; (b) a systematic search that attempts to identify all studies that would meet the eligibility criteria; (c) an assessment of the validity of the findings of the included studies, for example through the assessment of risk of bias; and (d) systematic presentation, and synthesis, of the characteristics and findings of the included studies.\nMeta-analysis: Meta-analysis is the use of statistical techniques to integrate and summarize the results of included studies. Many systematic reviews contain meta-analyses, but not all. By combining information from all relevant studies, meta-analyses can provide more precise estimates of the effects of health care than those derived from the individual studies included within a review.\nTable 1. 
Checklist of items to include when reporting a systematic review (with or without meta-analysis).\nSection/Topic | # | Checklist Item | Reported on Page #\nTITLE\nTitle | 1 | Identify the report as a systematic review, meta-analysis, or both.\nABSTRACT\nStructured summary | 2 | Provide a structured summary including, as applicable: background; objectives; data sources; study eligibility criteria, participants, and interventions; study appraisal and synthesis methods; results; limitations; conclusions and implications of key findings; systematic review registration number.\nINTRODUCTION\nRationale | 3 | Describe the rationale for the review in the context of what is already known.\nObjectives | 4 | Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes, and study design (PICOS).\nMETHODS\nProtocol and registration | 5 | Indicate if a review protocol exists, if and where it can be accessed (e.g., Web address), and, if available, provide registration information including registration number.\nEligibility criteria | 6 | Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale.\nInformation sources | 7 | Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched.\nSearch | 8 | Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated.\nStudy selection | 9 | State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis).\nData collection process | 10 | Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators.\nData items | 11 | List 
and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made.\nRisk of bias in individual studies | 12 | Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis.\nSummary measures | 13 | State the principal summary measures (e.g., risk ratio, difference in means).\nSynthesis of results | 14 | Describe the methods of handling data and combining results of studies, if done, including measures of consistency (e.g., I²) for each meta-analysis.\nRisk of bias across studies | 15 | Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies).\nAdditional analyses | 16 | Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified.\nRESULTS\nStudy selection | 17 | Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram.\nStudy characteristics | 18 | For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations.\nRisk of bias within studies | 19 | Present data on risk of bias of each study and, if available, any outcome-level assessment (see Item 12).\nResults of individual studies | 20 | For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group and (b) effect estimates and confidence intervals, ideally with a forest plot.\nSynthesis of results | 21 | Present results of each meta-analysis done, including confidence intervals and measures of consistency.\nRisk of bias across studies | 22 | Present results of any assessment of risk of bias across studies (see Item 15).\nAdditional analysis | 
23 | Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see Item 16]).\nDISCUSSION\nSummary of evidence | 24 | Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., health care providers, users, and policy makers).\nLimitations | 25 | Discuss limitations at study and outcome level (e.g., risk of bias), and at review level (e.g., incomplete retrieval of identified research, reporting bias).\nConclusions | 26 | Provide a general interpretation of the results in the context of other evidence, and implications for future research.\nFUNDING\nFunding | 27 | Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review.\ndoi:10.1371/journal.pmed.1000100.t001\nScope of PRISMA\nPRISMA focuses on ways in which authors can ensure the transparent and complete reporting of systematic reviews and meta-analyses. It does not address directly or in a detailed manner the conduct of systematic reviews, for which other guides are available [13,14,15,16].\nWe developed the PRISMA Statement and this explanatory document to help authors report a wide array of systematic reviews to assess the benefits and harms of a health care intervention. We consider most of the checklist items relevant when reporting systematic reviews of non-randomized studies assessing the benefits and harms of interventions. However, we recognize that authors who address questions relating to etiology, diagnosis, or prognosis, for example, and who review epidemiological or diagnostic accuracy studies may need to modify or incorporate additional items for their systematic reviews.\nHow To Use This Paper\nWe modeled this Explanation and Elaboration document after those prepared for other reporting guidelines [17,18,19]. 
To maximize the benefit of this document, we encourage people to read it in conjunction with the PRISMA Statement [12].\nWe present each checklist item and follow it with a published exemplar of good reporting for that item. (We edited some examples by removing citations or Web addresses, or by spelling out abbreviations.) We then explain the pertinent issue, the rationale for including the item, and relevant evidence from the literature, whenever possible. No systematic search was carried out to identify exemplars and evidence. We also include seven Boxes that provide a more comprehensive explanation of certain thematic aspects of the methodology and conduct of systematic reviews.\nAlthough we focus on a minimal list of items to consider when reporting a systematic review, we indicate places where additional information is desirable to improve transparency of the review process. We present the items numerically from 1 to 27; however, authors need not address items in this particular order in their reports. Rather, what is important is that the information for each item is given somewhere within the report.\nThe PRISMA Checklist\nTITLE and ABSTRACT\nItem 1: TITLE. Identify the report as a systematic review, meta-analysis, or both.\nExamples. "Recurrence rates of video-assisted thoracoscopic versus open surgery in the prevention of recurrent pneumothoraces: a systematic review of randomised and non-randomised trials" [20]\nFigure 1. Flow of information through the different phases of a systematic review. doi:10.1371/journal.pmed.1000100.g001\n"Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: systematic review and meta-analysis" [21]\nExplanation. Authors should identify their report as a systematic review or meta-analysis. 
Terms such as "review" or "overview" do not describe for readers whether the review was systematic or whether a meta-analysis was performed. A recent survey found that 50% of 300 authors did not mention the terms "systematic review" or "meta-analysis" in the title or abstract of their systematic review [3]. Although sensitive search strategies have been developed to identify systematic reviews [22], inclusion of the terms systematic review or meta-analysis in the title may improve indexing and identification.\nWe advise authors to use informative titles that make key information easily accessible to readers. Ideally, a title reflecting the PICOS approach (participants, interventions, comparators, outcomes, and study design) (see Item 11 and Box 2) may help readers as it provides key information about the scope of the review. Specifying the design(s) of the studies included, as shown in the examples, may also help some readers and those searching databases.\nSome journals recommend "indicative titles" that indicate the topic matter of the review, while others require declarative titles that give the review's main conclusion. Busy practitioners may prefer to see the conclusion of the review in the title, but declarative titles can oversimplify or exaggerate findings. Thus, many journals and methodologists prefer indicative titles as used in the examples above.\nItem 2: STRUCTURED SUMMARY. Provide a structured summary including, as applicable: background; objectives; data sources; study eligibility criteria, participants, and interventions; study appraisal and synthesis methods; results; limitations; conclusions and implications of key findings; funding for the systematic review; and systematic review registration number.\nExample. 
"Context: The role and dose of oral vitamin D supplementation in nonvertebral fracture prevention have not been well established.\nObjective: To estimate the effectiveness of vitamin D supplementation in preventing hip and nonvertebral fractures in older persons.\nData Sources: A systematic review of English and non-English articles using MEDLINE and the Cochrane Controlled Trials Register (1960–2005), and EMBASE (1991–2005). Additional studies were identified by contacting clinical experts and searching bibliographies and abstracts presented at the American Society for Bone and Mineral Research (1995–2004). Search terms included randomized controlled trial (RCT), controlled clinical trial, random allocation, double-blind method, cholecalciferol, ergocalciferol, 25-hydroxyvitamin D, fractures, humans, elderly, falls, and bone density.\nStudy Selection: Only double-blind RCTs of oral vitamin D supplementation (cholecalciferol, ergocalciferol) with or without calcium supplementation vs calcium supplementation or placebo in older persons (>60 years) that examined hip or nonvertebral fractures were included.\nData Extraction: Independent extraction of articles by 2 authors using predefined data fields, including study quality indicators.\nData Synthesis: All pooled analyses were based on random-effects models. Five RCTs for hip fracture (n = 9294) and 7\nBox 2. Helping To Develop the Research Question(s): The PICOS Approach\nFormulating relevant and precise questions that can be answered in a systematic review can be complex and time consuming. A structured approach for framing questions that uses five components may help facilitate the process. This approach is commonly known by the acronym "PICOS" where each letter refers to a component: the patient population or the disease being addressed (P), the interventions or exposure (I), the comparator group (C), the outcome or endpoint (O), and the study design chosen (S) [186]. 
Issues relating to PICOS impact several PRISMA items (i.e., Items 6, 8, 9, 10, 11, and 18).\nProviding information about the population requires a precise definition of a group of participants (often patients), such as men over the age of 65 years, their defining characteristics of interest (often disease), and possibly the setting of care considered, such as an acute care hospital.\nThe interventions (exposures) under consideration in the systematic review need to be transparently reported. For example, if the reviewers answer a question regarding the association between a woman's prenatal exposure to folic acid and subsequent offspring's neural tube defects, reporting the dose, frequency, and duration of folic acid used in different studies is likely to be important for readers to interpret the review's results and conclusions. Other interventions (exposures) might include diagnostic, preventative, or therapeutic treatments, arrangements of specific processes of care, lifestyle changes, psychosocial or educational interventions, or risk factors.\nClearly reporting the comparator (control) group intervention(s), such as usual care, drug, or placebo, is essential for readers to fully understand the selection criteria of primary studies included in systematic reviews, and might be a source of heterogeneity investigators have to deal with. 
Comparators are often very poorly described. Clearly reporting what the intervention is compared with is very important and may sometimes have implications for the inclusion of studies in a review: many reviews compare with "standard care," which is otherwise undefined; this should be properly addressed by authors.\nThe outcomes of the intervention being assessed, such as mortality, morbidity, symptoms, or quality of life improvements, should be clearly specified as they are required to interpret the validity and generalizability of the systematic review's results.\nFinally, the type of study design(s) included in the review should be reported. Some reviews only include reports of randomized trials whereas others have broader design criteria and include randomized trials and certain types of observational studies. Still other reviews, such as those specifically answering questions related to harms, may include a wide variety of designs ranging from cohort studies to case reports. Whatever study designs are included in the review, these should be reported.\nIndependently from how difficult it is to identify the components of the research question, the important point is that a structured approach is preferable, and this extends beyond systematic reviews of effectiveness. Ideally the PICOS criteria should be formulated a priori, in the systematic review's protocol, although some revisions might be required due to the iterative nature of the review process. Authors are encouraged to report their PICOS criteria and whether any modifications were made during the review process. A useful example in this realm is the Appendix of the "Systematic Reviews of Water Fluoridation" undertaken by the Centre for Reviews and Dissemination [187].\nRCTs for nonvertebral fracture risk (n = 9820) met our inclusion criteria. All trials used cholecalciferol. 
Heterogeneity among studies for both hip and nonvertebral fracture prevention was observed, which disappeared after pooling RCTs with low-dose (400 IU/d) and higher-dose vitamin D (700–800 IU/d), separately. A vitamin D dose of 700 to 800 IU/d reduced the relative risk (RR) of hip fracture by 26% (3 RCTs with 5572 persons; pooled RR, 0.74; 95% confidence interval [CI], 0.61–0.88) and any nonvertebral fracture by 23% (5 RCTs with 6098 persons; pooled RR, 0.77; 95% CI, 0.68–0.87) vs calcium or placebo. No significant benefit was observed for RCTs with 400 IU/d vitamin D (2 RCTs with 3722 persons; pooled RR for hip fracture, 1.15; 95% CI, 0.88–1.50; and pooled RR for any nonvertebral fracture, 1.03; 95% CI, 0.86–1.24).\nConclusions: Oral vitamin D supplementation between 700 to 800 IU/d appears to reduce the risk of hip and any nonvertebral fractures in ambulatory or institutionalized elderly persons. An oral vitamin D dose of 400 IU/d is not sufficient for fracture prevention." [23]\nExplanation. Abstracts provide key information that enables readers to understand the scope, processes, and findings of a review and to decide whether to read the full report. The abstract may be all that is readily available to a reader, for example, in a bibliographic database. The abstract should present a balanced and realistic assessment of the review's findings that mirrors, albeit briefly, the main text of the report.\nWe agree with others that the quality of reporting in abstracts presented at conferences and in journal publications needs improvement [24,25]. While we do not uniformly favor one specific format over another, we generally recommend structured abstracts. Structured abstracts provide readers with a series of headings pertaining to the purpose, conduct, findings, and conclusions of the systematic review being reported [26,27]. 
They give readers more complete information and facilitate finding information more easily than unstructured abstracts [28,29,30,31,32].\nA highly structured abstract of a systematic review could include the following headings: Context (or Background); Objective (or Purpose); Data Sources; Study Selection (or Eligibility Criteria); Study Appraisal and Synthesis Methods (or Data Extraction and Data Synthesis); Results; Limitations; and Conclusions (or Implications). Alternatively, a simpler structure could cover but collapse some of the above headings (e.g., label Study Selection and Study Appraisal as Review Methods) or omit some headings such as Background and Limitations.\nIn the highly structured abstract mentioned above, authors use the Background heading to set the context for readers and explain the importance of the review question. Under the Objectives heading, they ideally use elements of PICOS (see Box 2) to state the primary objective of the review. Under a Data Sources heading, they summarize sources that were searched, any language or publication type restrictions, and the start and end dates of searches. Study Selection statements then ideally describe who selected studies using what inclusion criteria. Data Extraction Methods statements describe appraisal methods during data abstraction and the methods used to integrate or summarize the data. The Data Synthesis section is where the main results of the review are reported. If the review includes meta-analyses, authors should provide numerical results with confidence intervals for the most important outcomes. Ideally, they should specify the amount of evidence in these analyses (numbers of studies and numbers of participants). 
Under a Limitations heading, authors might describe the most important weaknesses of included studies as well as limitations of the review process. Then authors should provide clear and balanced Conclusions that are closely linked to the objective and findings of the review. Additionally, it would be helpful if authors included some information about funding for the review. Finally, although protocol registration for systematic reviews is still not common practice, if authors have registered their review or received a registration number, we recommend providing the registration information at the end of the abstract.

Taking all the above considerations into account, the intrinsic tension between the goal of completeness of the abstract and keeping within the space limit often set by journal editors is recognized as a major challenge.

INTRODUCTION

Item 3: RATIONALE. Describe the rationale for the review in the context of what is already known.

Example. ''Reversing the trend of increasing weight for height in children has proven difficult. It is widely accepted that increasing energy expenditure and reducing energy intake form the theoretical basis for management. Therefore, interventions aiming to increase physical activity and improve diet are the foundation of efforts to prevent and treat childhood obesity. Such lifestyle interventions have been supported by recent systematic reviews, as well as by the Canadian Paediatric Society, the Royal College of Paediatrics and Child Health, and the American Academy of Pediatrics. However, these interventions are fraught with poor adherence. Thus, school-based interventions are theoretically appealing because adherence with interventions can be improved. Consequently, many local governments have enacted or are considering policies that mandate increased physical activity in schools, although the effect of such interventions on body composition has not been assessed.'' [33]

Explanation.
Readers need to understand the rationale behind the study and what the systematic review may add to what is already known. Authors should tell readers whether their report is a new systematic review or an update of an existing one. If the review is an update, authors should state reasons for the update, including what has been added to the evidence base since the previous version of the review.

An ideal background or introduction that sets context for readers might include the following. First, authors might define the importance of the review question from different perspectives (e.g., public health, individual patient, or health policy). Second, authors might briefly mention the current state of knowledge and its limitations. As in the above example, information about the effects of several different interventions may be available that helps readers understand why potential relative benefits or harms of particular interventions need review. Third, authors might whet readers' appetites by clearly stating what the review aims to add. They also could discuss the extent to which the limitations of the existing evidence base may be overcome by the review.

Item 4: OBJECTIVES. Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes, and study design (PICOS).

PLoS Medicine | www.plosmedicine.org | July 2009 | Volume 6 | Issue 7 | e1000100

Example. ''To examine whether topical or intraluminal antibiotics reduce catheter-related bloodstream infection, we reviewed randomized, controlled trials that assessed the efficacy of these antibiotics for primary prophylaxis against catheter-related bloodstream infection and mortality compared with no antibiotic therapy in adults undergoing hemodialysis.'' [34]

Explanation. The questions being addressed, and the rationale for them, are one of the most critical parts of a systematic review.
They should be stated precisely and explicitly so that readers can understand quickly the review's scope and the potential applicability of the review to their interests [35]. Framing questions so that they include the following five ''PICOS'' components may improve the explicitness of review questions: (1) the patient population or disease being addressed (P), (2) the interventions or exposure of interest (I), (3) the comparators (C), (4) the main outcome or endpoint of interest (O), and (5) the study designs chosen (S). For more detail regarding PICOS, see Box 2.

Good review questions may be narrowly focused or broad, depending on the overall objectives of the review. Sometimes broad questions might increase the applicability of the results and facilitate detection of bias, exploratory analyses, and sensitivity analyses [35,36]. Whether narrowly focused or broad, precisely stated review objectives are critical as they help define other components of the review process such as the eligibility criteria (Item 6) and the search for relevant literature (Items 7 and 8).

METHODS

Item 5: PROTOCOL AND REGISTRATION. Indicate if a review protocol exists, if and where it can be accessed (e.g., Web address) and, if available, provide registration information including the registration number.

Example. ''Methods of the analysis and inclusion criteria were specified in advance and documented in a protocol.'' [37]

Explanation. A protocol is important because it pre-specifies the objectives and methods of the systematic review. For instance, a protocol specifies outcomes of primary interest, how reviewers will extract information about those outcomes, and methods that reviewers might use to quantitatively summarize the outcome data (see Item 13). Having a protocol can help restrict the likelihood of biased post hoc decisions in review methods, such as selective outcome reporting.
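A protocol typically pins down the PICOS elements of the review question before searching begins. As a minimal sketch, the pre-specified question could be recorded as a structured object; the class and field names below are illustrative (not part of PRISMA), and the values echo the Item 4 hemodialysis example [34]:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PICOS:
    """One illustrative way to record a pre-specified review question."""
    population: str
    interventions: tuple
    comparators: tuple
    outcomes: tuple
    study_designs: tuple

# Values taken from the Item 4 example [34]:
question = PICOS(
    population="adults undergoing hemodialysis",
    interventions=("topical antibiotics", "intraluminal antibiotics"),
    comparators=("no antibiotic therapy",),
    outcomes=("catheter-related bloodstream infection", "mortality"),
    study_designs=("randomized controlled trial",),
)
```

Freezing the object mirrors the point of a protocol: the question is fixed in advance, and any later change must be an explicit, documented amendment.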
Several sources provide guidance about elements to include in the protocol for a systematic review [16,38,39]. For meta-analyses of individual patient-level data, we advise authors to describe whether a protocol was explicitly designed and whether, when, and how participating collaborators endorsed it [40,41].

Authors may modify protocols during the research, and readers should not automatically consider such modifications inappropriate. For example, legitimate modifications may extend the period of searches to include older or newer studies, broaden eligibility criteria that proved too narrow, or add analyses if the primary analyses suggest that additional ones are warranted. Authors should, however, describe the modifications and explain their rationale.

Although worthwhile protocol amendments are common, one must consider the effects that protocol modifications may have on the results of a systematic review, especially if the primary outcome is changed. Bias from selective outcome reporting in randomized trials has been well documented [42,43]. An examination of 47 Cochrane reviews revealed indirect evidence for possible selective reporting bias for systematic reviews. Almost all (n = 43) contained a major change, such as the addition or deletion of outcomes, between the protocol and the full publication [44]. Whether (or to what extent) the changes reflected bias, however, was not clear. For example, it has been rather common not to describe outcomes that were not presented in any of the included studies.

Registration of a systematic review, typically with a protocol and registration number, is not yet common, but some opportunities exist [45,46]. Registration may possibly reduce the risk of multiple reviews addressing the same question [45,46,47,48], reduce publication bias, and provide greater transparency when updating systematic reviews.
Of note, a survey of systematic reviews indexed in MEDLINE in November 2004 found that reports of protocol use had increased to about 46% [3] from 8% noted in previous surveys [49]. The improvement was due mostly to Cochrane reviews, which, by requirement, have a published protocol [3].

Item 6: ELIGIBILITY CRITERIA. Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale.

Examples. Types of studies: ''Randomised clinical trials studying the administration of hepatitis B vaccine to CRF [chronic renal failure] patients, with or without dialysis. No language, publication date, or publication status restrictions were imposed…''

Types of participants: ''Participants of any age with CRF or receiving dialysis (haemodialysis or peritoneal dialysis) were considered. CRF was defined as serum creatinine greater than 200 µmol/L for a period of more than six months or individuals receiving dialysis (haemodialysis or peritoneal dialysis)…Renal transplant patients were excluded from this review as these individuals are immunosuppressed and are receiving immunosuppressant agents to prevent rejection of their transplanted organs, and they have essentially normal renal function…''

Types of intervention: ''Trials comparing the beneficial and harmful effects of hepatitis B vaccines with adjuvant or cytokine co-interventions [and] trials comparing the beneficial and harmful effects of immunoglobulin prophylaxis. This review was limited to studies looking at active immunization.
Hepatitis B vaccines (plasma or recombinant (yeast) derived) of all types, dose, and regimens versus placebo, control vaccine, or no vaccine…''

Types of outcome measures: ''Primary outcome measures: Seroconversion, ie, proportion of patients with adequate anti-HBs response (≥10 IU/L or Sample Ratio Units). Hepatitis B infections (as measured by hepatitis B core antigen (HBcAg) positivity or persistent HBsAg positivity), both acute and chronic. Acute (primary) HBV [hepatitis B virus] infections were defined as seroconversion to HBsAg positivity or development of IgM anti-HBc. Chronic HBV infections were defined as the persistence of HBsAg for more than six months or HBsAg positivity and liver biopsy compatible with a diagnosis of chronic hepatitis B. Secondary outcome measures: Adverse events of hepatitis B vaccinations…[and]…mortality.'' [50]

Explanation. Knowledge of the eligibility criteria is essential in appraising the validity, applicability, and comprehensiveness of a review. Thus, authors should unambiguously specify eligibility criteria used in the review. Carefully defined eligibility criteria inform various steps of the review methodology. They influence the development of the search strategy and serve to ensure that studies are selected in a systematic and unbiased manner.

A study may be described in multiple reports, and one report may describe multiple studies. Therefore, we separate eligibility criteria into the following two components: study characteristics and report characteristics. Both need to be reported. Study eligibility criteria are likely to include the populations, interventions, comparators, outcomes, and study designs of interest (PICOS; see Box 2), as well as other study-specific elements, such as specifying a minimum length of follow-up.
Authors should state whether studies will be excluded because they do not include (or report) specific outcomes to help readers ascertain whether the systematic review may be biased as a consequence of selective reporting [42,43].

Report eligibility criteria are likely to include language of publication, publication status (e.g., inclusion of unpublished material and abstracts), and year of publication. Inclusion or not of non-English language literature [51,52,53,54,55], unpublished data, or older data can influence the effect estimates in meta-analyses [56,57,58,59]. Caution may need to be exercised in including all identified studies due to potential differences in the risk of bias such as, for example, selective reporting in abstracts [60,61,62].

Item 7: INFORMATION SOURCES. Describe all information sources in the search (e.g., databases with dates of coverage, contact with study authors to identify additional studies) and date last searched.

Example. ''Studies were identified by searching electronic databases, scanning reference lists of articles and consultation with experts in the field and drug companies…No limits were applied for language and foreign papers were translated. This search was applied to Medline (1966–Present), CancerLit (1975–Present), and adapted for Embase (1980–Present), Science Citation Index Expanded (1981–Present) and Pre-Medline electronic databases. Cochrane and DARE (Database of Abstracts of Reviews of Effectiveness) databases were reviewed…The last search was run on 19 June 2001. In addition, we handsearched contents pages of Journal of Clinical Oncology 2001, European Journal of Cancer 2001 and Bone 2001, together with abstracts printed in these journals 1999–2001. A limited update literature search was performed from 19 June 2001 to 31 December 2003.'' [63]

Explanation.
The National Library of Medicine's MEDLINE database is one of the most comprehensive sources of health care information in the world. Like any database, however, its coverage is not complete and varies according to the field. Retrieval from any single database, even by an experienced searcher, may be imperfect, which is why detailed reporting is important within the systematic review.

At a minimum, for each database searched, authors should report the database, platform, or provider (e.g., Ovid, Dialog, PubMed) and the start and end dates for the search of each database. This information lets readers assess the currency of the review, which is important because the publication time-lag outdates the results of some reviews [64]. This information should also make updating more efficient [65]. Authors should also report who developed and conducted the search [66].

In addition to searching databases, authors should report the use of supplementary approaches to identify studies, such as hand searching of journals, checking reference lists, searching trials registries or regulatory agency Web sites [67], contacting manufacturers, or contacting authors. Authors should also report if they attempted to acquire any missing information (e.g., on study methods or results) from investigators or sponsors; it is useful to describe briefly who was contacted and what unpublished information was obtained.

Item 8: SEARCH. Present the full electronic search strategy for at least one major database, including any limits used, such that it could be repeated.

Examples. In text: ''We used the following search terms to search all trials registers and databases: immunoglobulin*; IVIG; sepsis; septic shock; septicaemia; and septicemia…'' [68]

In appendix: ''Search strategy: MEDLINE (OVID)

01. immunoglobulins/
02. immunoglobulin$.tw.
03. ivig.tw.
04. 1 or 2 or 3
05. sepsis/
06. sepsis.tw.
07. septic shock/
08. septic shock.tw.
09. septicemia/
10. septicaemia.tw.
11. septicemia.tw.
12. 5 or 6 or 7 or 8 or 9 or 10 or 11
13. 4 and 12
14. randomized controlled trials/
15. randomized-controlled-trial.pt.
16. controlled-clinical-trial.pt.
17. random allocation/
18. double-blind method/
19. single-blind method/
20. 14 or 15 or 16 or 17 or 18 or 19
21. exp clinical trials/
22. clinical-trial.pt.
23. (clin$ adj trial$).ti,ab.
24. ((singl$ or doubl$ or trebl$ or tripl$) adj (blind$)).ti,ab.
25. placebos/
26. placebo$.ti,ab.
27. random$.ti,ab.
28. 21 or 22 or 23 or 24 or 25 or 26 or 27
29. research design/
30. comparative study/
31. exp evaluation studies/
32. follow-up studies/
33. prospective studies/
34. (control$ or prospective$ or volunteer$).ti,ab.
35. 30 or 31 or 32 or 33 or 34
36. 20 or 28 or 29 or 35
37. 13 and 36'' [68]

Explanation. The search strategy is an essential part of the report of any systematic review. Searches may be complicated and iterative, particularly when reviewers search unfamiliar databases or their review is addressing a broad or new topic. Perusing the search strategy allows interested readers to assess the comprehensiveness and completeness of the search, and to replicate it. Thus, we advise authors to report their full electronic search strategy for at least one major database. As an alternative to presenting search strategies for all databases, authors could indicate how the search took into account other databases searched, as index terms vary across databases. If different searches are used for different parts of a wider question (e.g., questions relating to benefits and questions relating to harms), we recommend authors provide at least one example of a strategy for each part of the objective [69].
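The numbered OVID strategy above follows a simple logic: OR-combine the terms for each concept (lines 04, 12, 20, 28, 35), then AND-combine the concepts (lines 13 and 37). A minimal sketch of that composition; the helper names are illustrative and the term groups are abbreviated from the strategy above:

```python
def any_of(*terms):
    # OR-combine one concept group, as in strategy lines 04 and 12 above
    return "(" + " OR ".join(terms) + ")"

def all_of(*groups):
    # AND-combine the concept groups, as in strategy line 13 above
    return " AND ".join(groups)

# Abbreviated term groups from the MEDLINE (OVID) strategy above
intervention = any_of("immunoglobulins/", "immunoglobulin$.tw.", "ivig.tw.")
condition = any_of("sepsis/", "sepsis.tw.", "septic shock/", "septicemia.tw.")
query = all_of(intervention, condition)
```

Keeping each concept in its own OR group is what makes the strategy transferable: a reader can swap the database-specific index terms inside a group (index terms vary across databases) while the Boolean skeleton stays the same.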
We also encourage authors to state whether search strategies were peer reviewed as part of the systematic review process [70].

We realize that journal restrictions vary and that having the search strategy in the text of the report is not always feasible. We strongly encourage all journals, however, to find ways, such as a ''Web extra,'' appendix, or electronic link to an archive, to make search strategies accessible to readers. We also advise all authors to archive their searches so that (1) others may access and review them (e.g., replicate them or understand why their review of a similar topic did not identify the same reports), and (2) future updates of their review are facilitated.

Several sources provide guidance on developing search strategies [71,72,73]. Most searches have constraints, for example relating to limited time or financial resources, inaccessible or inadequately indexed reports and databases, unavailability of experts with particular language or database searching skills, or review questions for which pertinent evidence is not easy to find. Authors should be straightforward in describing their search constraints. Apart from the keywords used to identify or exclude records, they should report any additional limitations relevant to the search, such as language and date restrictions (see also eligibility criteria, Item 6) [51].

Item 9: STUDY SELECTION. State the process for selecting studies (i.e., for screening, for determining eligibility, for inclusion in the systematic review, and, if applicable, for inclusion in the meta-analysis).

Example. ''Eligibility assessment…[was] performed independently in an unblinded standardized manner by 2 reviewers…Disagreements between reviewers were resolved by consensus.'' [74]

Explanation. There is no standard process for selecting studies to include in a systematic review.
Authors usually start with a large number of identified records from their search and sequentially exclude records according to eligibility criteria. We advise authors to report how they screened the retrieved records (typically a title and abstract), how often it was necessary to review the full text publication, and if any types of record (e.g., letters to the editor) were excluded. We also advise using the PRISMA flow diagram to summarize study selection processes (see Item 17; Box 3).

Efforts to enhance objectivity and avoid mistakes in study selection are important. Thus authors should report whether each stage was carried out by one or several people, who these people were, and, whenever multiple independent investigators performed the selection, what the process was for resolving disagreements. The use of at least two investigators may reduce the possibility of rejecting relevant reports [75]. The benefit may be greatest for topics where selection or rejection of an article requires difficult judgments [76]. For these topics, authors should ideally tell readers the level of inter-rater agreement, how commonly arbitration about selection was required, and what efforts were made to resolve disagreements (e.g., by contact with the authors of the original studies).

Item 10: DATA COLLECTION PROCESS. Describe the method of data extraction from reports (e.g., piloted forms, independently by two reviewers) and any processes for obtaining and confirming data from investigators.

Example. ''We developed a data extraction sheet (based on the Cochrane Consumers and Communication Review Group's data extraction template), pilot-tested it on ten randomly-selected included studies, and refined it accordingly.
One review author extracted the following data from included studies and the second author checked the extracted data…Disagreements were resolved by discussion between the two review authors; if no agreement could be reached, it was planned a third author would decide. We contacted five authors for further information. All responded and one provided numerical data that had only been presented graphically in the published paper.'' [77]

Explanation. Reviewers extract information from each included study so that they can critique, present, and summarize evidence in a systematic review. They might also contact authors of included studies for information that has not been, or is unclearly, reported. In meta-analysis of individual patient data, this phase involves collection and scrutiny of detailed raw databases. The authors should describe these methods, including any steps taken to reduce bias and mistakes during data collection and data extraction [78] (Box 3).

Some systematic reviewers use a data extraction form that could be reported as an appendix or ''Web extra'' to their report. These forms could show the reader what information reviewers sought (see Item 11) and how they extracted it. Authors could tell readers if the form was piloted. Regardless, we advise authors to tell readers who extracted what data, whether any extractions were completed in duplicate, and, if so, whether duplicate abstraction was done independently and how disagreements were resolved.

Published reports of the included studies may not provide all the information required for the review. Reviewers should describe any actions they took to seek additional information from the original researchers (see Item 7). The description might include how they attempted to contact researchers, what they asked for, and their success in obtaining the necessary information.
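When extraction or screening is done in duplicate, as in the example above, the level of inter-rater agreement (which Item 9 recommends reporting) is commonly summarized with Cohen's kappa, a chance-corrected agreement statistic. A minimal sketch; the screening labels are illustrative, not data from the cited review:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Agreement expected by chance, from each rater's marginal label frequencies
    expected = sum(counts1[label] * counts2[label]
                   for label in counts1.keys() | counts2.keys()) / n ** 2
    return (observed - expected) / (1 - expected)

# Two reviewers' screening decisions on six records (illustrative):
k = cohens_kappa(
    ["include", "include", "exclude", "exclude", "include", "exclude"],
    ["include", "exclude", "exclude", "exclude", "include", "exclude"],
)  # ≈ 0.67
```

Kappa of 1 indicates perfect agreement and 0 agreement no better than chance, which is why it is more informative than raw percent agreement when one decision (e.g., ''exclude'') dominates.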
Authors should also tell readers when individual patient data were sought from the original researchers [41] (see Item 11) and indicate the studies for which such data were used in the analyses. The reviewers ideally should also state whether they confirmed the accuracy of the information included in their review with the original researchers, for example, by sending them a copy of the draft review [79].

Some studies are published more than once. Duplicate publications may be difficult to ascertain, and their inclusion may introduce bias [80,81]. We advise authors to describe any steps they used to avoid double counting and piece together data from multiple reports of the same study (e.g., juxtaposing author names, treatment comparisons, sample sizes, or outcomes). We also advise authors to indicate whether all reports on a study were considered, as inconsistencies may reveal important limitations. For example, a review of multiple publications of drug trials showed that reported study characteristics may differ from report to report, including the description of the design, number of patients analyzed, chosen significance level, and outcomes [82]. Authors ideally should present any algorithm that they used to select data from overlapping reports and any efforts they used to solve logical inconsistencies across reports.

Box 3. Identification of Study Reports and Data Extraction

Comprehensive searches usually result in a large number of identified records, a much smaller number of studies included in the systematic review, and even fewer of these studies included in any meta-analyses. Reports of systematic reviews often provide little detail as to the methods used by the review team in this process. Readers are often left with what can be described as the ''X-files'' phenomenon, as it is unclear what occurs between the initial set of identified records and those finally included in the review. Sometimes, review authors simply report the number of included studies; more often they report the initial number of identified records and the number of included studies. Rarely, although this is optimal for readers, do review authors report the number of identified records, the smaller number of potentially relevant studies, and the even smaller number of included studies, by outcome. Review authors also need to differentiate between the number of reports and studies. Often there will not be a 1:1 ratio of reports to studies and this information needs to be described in the systematic review report.

Ideally, the identification of study reports should be reported as text in combination with use of the PRISMA flow diagram. While we recommend use of the flow diagram, a small number of reviews might be particularly simple and can be sufficiently described with a few brief sentences of text. More generally, review authors will need to report the process used for each step: screening the identified records; examining the full text of potentially relevant studies (and reporting the number that could not be obtained); and applying eligibility criteria to select the included studies.

Such descriptions should also detail how potentially eligible records were promoted to the next stage of the review (e.g., full text screening) and to the final stage of this process, the included studies. Often review teams have three response options for excluding records or promoting them to the next stage of the winnowing process: ''yes,'' ''no,'' and ''maybe.''

Similarly, some detail should be reported on who participated and how such processes were completed. For example, a single person may screen the identified records while a second person independently examines a small sample of them. The entire winnowing process is one of ''good book keeping'' whereby interested readers should be able to work backwards from the included studies to come up with the same numbers of identified records.

There is often a paucity of information describing the data extraction processes in reports of systematic reviews. Authors may simply report that ''relevant'' data were extracted from each included study with little information about the processes used for data extraction. It may be useful for readers to know whether a systematic review's authors developed, a priori or not, a data extraction form, whether multiple forms were used, the number of questions, whether the form was pilot tested, and who completed the extraction. For example, it is important for readers to know whether one or more people extracted data, and if so, whether this was completed independently, whether ''consensus'' data were used in the analyses, and if the review team completed an informal training exercise or a more formal reliability exercise.

Box 4. Study Quality and Risk of Bias

In this paper, and elsewhere [11], we sought to use a new term for many readers, namely, risk of bias, for evaluating each included study in a systematic review. Previous papers [89,188] tended to use the term ''quality''. When carrying out a systematic review we believe it is important to distinguish between quality and risk of bias and to focus on evaluating and reporting the latter. Quality is often the best the authors have been able to do. For example, authors may report the results of surgical trials in which blinding of the outcome assessors was not part of the trial's conduct. Even though this may have been the best methodology the researchers were able to do, there are still theoretical grounds for believing that the study was susceptible to (risk of) bias.

Assessing the risk of bias should be part of the conduct and reporting of any systematic review. In all situations, we encourage systematic reviewers to think ahead carefully about what risks of bias (methodological and clinical) may have a bearing on the results of their systematic reviews.

For systematic reviewers, understanding the risk of bias on the results of studies is often difficult, because the report is only a surrogate of the actual conduct of the study. There is some suggestion [189,190] that the report may not be a reasonable facsimile of the study, although this view is not shared by all [88,191]. There are three main ways to assess risk of bias: individual components, checklists, and scales. There are a great many scales available [192], although we caution their use based on theoretical grounds [193] and emerging empirical evidence [194]. Checklists are less frequently used and potentially run into the same problems as scales. We advocate using a component approach and one that is based on domains for which there is good empirical evidence and perhaps strong clinical grounds. The new Cochrane risk of bias tool [11] is one such component approach.

The Cochrane risk of bias tool consists of five items for which there is empirical evidence for their biasing influence on the estimates of an intervention's effectiveness in randomized trials (sequence generation, allocation concealment, blinding, incomplete outcome data, and selective outcome reporting) and a catch-all item called ''other sources of bias'' [11]. There is also some consensus that these items can be applied for evaluation of studies across very diverse clinical areas [93]. Other risk of bias items may be topic or even study specific, i.e., they may stem from some peculiarity of the research topic or some special feature of the design of a specific study. These peculiarities need to be investigated on a case-by-case basis, based on clinical and methodological acumen, and there can be no general recipe. In all situations, systematic reviewers need to think ahead carefully about what aspects of study quality may have a bearing on the results.

Item 11: DATA ITEMS. List and define all variables for which data were sought (e.g., PICOS, funding sources), and any assumptions and simplifications made.

Examples. ''Information was extracted from each included trial on: (1) characteristics of trial participants (including age, stage and severity of disease, and method of diagnosis), and the trial's inclusion and exclusion criteria; (2) type of intervention (including type, dose, duration and frequency of the NSAID [non-steroidal anti-inflammatory drug]; versus placebo or versus the type, dose, duration and frequency of another NSAID; or versus another pain management drug; or versus no treatment); (3) type of outcome measure (including the level of pain reduction, improvement in quality of life score (using a validated scale), effect on daily activities, absence from work or school, length of follow up, unintended effects of treatment, number of women requiring more invasive treatment).'' [83]

Explanation. It is important for readers to know what information review authors sought, even if some of this information was not available [84]. If the review is limited to reporting only those variables that were obtained, rather than those that were deemed important but could not be obtained, bias might be introduced and the reader might be misled. It is therefore helpful if authors can refer readers to the protocol (see Item 5), and archive their extraction forms (see Item 10), including definitions of variables.
The published systematic review should include adescription of the processes used with, if relevant, specification of\nhow readers can get access to additional materials.\nWe encourage authors to report whether some variables were\nadded after the review started. Such variables might include those\nfound in the studies that the reviewers identified (e.g., important\noutcome measures that the reviewers initially overlooked). Authorsshould describe the reasons for adding any variables to thosealready pre-specified in the protocol so that readers can\nunderstand the review process.\nWe advise authors to report any assumptions they made about\nmissing or unclear information and to explain those processes. For\nexample, in studies of women aged 50 or older it is reasonable to\nassume that none were pregnant, even if this is not reported.Likewise, review authors might make assumptions about the routeof administration of drugs assessed. However, special care should\nbe taken in making assumptions about qualitative information. For\nexample, the upper age limit for ‘‘children’’ can vary from 15 yearsto 21 years, ‘‘intense’’ physiotherapy might mean very different\nthings to different researchers at different times and for different\npatients, and the volume of blood associated with ‘‘heavy’’ bloodloss might vary widely depending on the setting.\nItem 12: RISK OF BIAS IN INDIVIDUAL STUDIES.\nDescribe methods used for assessing risk of bias in individualstudies (including specification of whether this was done at thestudy or outcome level, or both), and how this information is to beused in any data synthesis.\nExample. ‘‘To ascertain the validity of eligible randomized\ntrials, pairs of reviewers working independently and with\nadequate reliability determined the adequacy of randomi-\nzation and concealment of allocation, blinding of patients,\nhealth care providers, data collectors, and outcome\nassessors; and extent of loss to follow-up (i.e. 
proportion of patients in whom the investigators were not able to ascertain outcomes)." [85]
"To explore variability in study results (heterogeneity) we specified the following hypotheses before conducting the analysis. We hypothesised that effect size may differ according to the methodological quality of the studies." [86]
Explanation. The likelihood that the treatment effect reported in a systematic review approximates the truth depends on the validity of the included studies, as certain methodological characteristics may be associated with effect sizes [87,88]. For example, trials without reported adequate allocation concealment exaggerate treatment effects on average compared to those with adequate concealment [88]. Therefore, it is important for authors to describe any methods that they used to gauge the risk of bias in the included studies and how that information was used [89]. Additionally, authors should provide a rationale if no assessment of risk of bias was undertaken. The most popular term to describe the issues relevant to this item is "quality," but for the reasons elaborated in Box 4 we prefer to name this item "assessment of risk of bias."
Many methods exist to assess the overall risk of bias in included studies, including scales, checklists, and individual components [90,91]. As discussed in Box 4, scales that numerically summarize multiple components into a single number are misleading and unhelpful [92,93]. Rather, authors should specify the methodological components that they assessed.
Common markers of validity for randomized trials include the following: appropriate generation of the random allocation sequence [94]; concealment of the allocation sequence [93]; blinding of participants, health care providers, data collectors, and outcome adjudicators [95,96,97,98]; proportion of patients lost to follow-up [99,100]; stopping of trials early for benefit [101]; and whether the analysis followed the intention-to-treat principle [100,102]. The ultimate decision regarding which methodological features to evaluate requires consideration of the strength of the empiric data, theoretical rationale, and the unique circumstances of the included studies.
Authors should report how they assessed risk of bias; whether it was done in a blind manner; and whether assessments were completed by more than one person, and if so, whether they were completed independently [103,104]. Similarly, we encourage authors to report any calibration exercises among review team members that were done. Finally, authors need to report how their assessments of risk of bias are used subsequently in the data synthesis (see Item 16). Despite the often difficult task of assessing the risk of bias in included studies, authors are sometimes silent on what they did with the resultant assessments [89]. If authors exclude studies from the review or any subsequent analyses on the basis of the risk of bias, they should tell readers which studies they excluded and explain the reasons for those exclusions (see Item 6). Authors should also describe any planned sensitivity or subgroup analyses related to bias assessments (see Item 16).
Item 13: SUMMARY MEASURES. State the principal summary measures (e.g., risk ratio, difference in means).
Examples. "Relative risk of mortality reduction was the primary measure of treatment effect." [105]
"The meta-analyses were performed by computing relative risks (RRs) using random-effects model.
Quantitative analyses were performed on an intention-to-treat basis and were confined to data derived from the period of follow-up. RR and 95% confidence intervals for each side effect (and all side effects) were calculated." [106]
"The primary outcome measure was the mean difference in log10 HIV-1 viral load comparing zinc supplementation to placebo…" [107]
Explanation. When planning a systematic review, it is generally desirable that authors pre-specify the outcomes of primary interest (see Item 5) as well as the intended summary effect measure for each outcome. The chosen summary effect measure may differ from that used in some of the included studies. If possible the choice of effect measures should be explained, though it is not always easy to judge in advance which measure is the most appropriate.
For binary outcomes, the most common summary measures are the risk ratio, odds ratio, and risk difference [108]. Relative effects are more consistent across studies than absolute effects [109,110], although absolute differences are important when interpreting findings (see Item 24).
For continuous outcomes, the natural effect measure is the difference in means [108]. Its use is appropriate when outcome measurements in all studies are made on the same scale. The standardized difference in means is used when the studies do not yield directly comparable data. Usually this occurs when all studies assess the same outcome but measure it in a variety of ways (e.g., different scales to measure depression).
For time-to-event outcomes, the hazard ratio is the most common summary measure. Reviewers need the log hazard ratio and its standard error for a study to be included in a meta-analysis [111]. This information may not be given for all studies, but methods are available for estimating the desired quantities from other reported information [111].
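The binary summary measures described above can be computed directly from 2×2 counts. The following is a minimal sketch, not the method of any cited review; the function name and example counts are invented, and real meta-analysis software additionally handles zero cells and exact methods that this sketch omits.

```python
import math

def binary_effect_measures(events_t, n_t, events_c, n_c):
    """Risk ratio, odds ratio, and risk difference from 2x2 counts.

    events_t/n_t: events and sample size in the treatment group;
    events_c/n_c: the same for the control group.
    """
    risk_t = events_t / n_t
    risk_c = events_c / n_c
    rr = risk_t / risk_c                      # risk ratio
    rd = risk_t - risk_c                      # risk difference
    odds_t = events_t / (n_t - events_t)
    odds_c = events_c / (n_c - events_c)
    or_ = odds_t / odds_c                     # odds ratio
    # Large-sample 95% CI for the risk ratio, computed on the log scale
    se_log_rr = math.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
    ci = (math.exp(math.log(rr) - 1.96 * se_log_rr),
          math.exp(math.log(rr) + 1.96 * se_log_rr))
    return rr, or_, rd, ci

# Hypothetical trial: 10/100 events with treatment vs. 20/100 with control.
rr, or_, rd, ci = binary_effect_measures(10, 100, 20, 100)
```

Note that the odds ratio (4/9 here) is further from 1 than the risk ratio (0.5), illustrating why the two measures should not be conflated when events are common.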
Risk ratio and odds ratio (in relation to events occurring by a fixed time) are not equivalent to the hazard ratio, and median survival times are not a reliable basis for meta-analysis [112]. If authors have used these measures they should describe their methods in the report.
Item 14: PLANNED METHODS OF ANALYSIS. Describe the methods of handling data and combining results of studies, if done, including measures of consistency (e.g., I²) for each meta-analysis.
Examples. "We tested for heterogeneity with the Breslow-Day test, and used the method proposed by Higgins et al. to measure inconsistency (the percentage of total variation across studies due to heterogeneity) of effects across lipid-lowering interventions. The advantages of this measure of inconsistency (termed I²) are that it does not inherently depend on the number of studies and is accompanied by an uncertainty interval." [113]
"In very few instances, estimates of baseline mean or mean QOL [quality of life] responses were obtained without corresponding estimates of variance (standard deviation [SD] or standard error). In these instances, an SD was imputed from the mean of the known SDs. In a number of cases, the response data available were the mean and variance in a pre-study condition and after therapy. The within-patient variance in these cases could not be calculated directly and was approximated by assuming independence." [114]
Explanation. The data extracted from the studies in the review may need some transformation (processing) before they are suitable for analysis or for presentation in an evidence table. Although such data handling may facilitate meta-analyses, it is sometimes needed even when meta-analyses are not done.
For example, in trials with more than two intervention groups it may be necessary to combine results for two or more groups (e.g., receiving similar but non-identical interventions), or it may be desirable to include only a subset of the data to match the review's inclusion criteria. When several different scales (e.g., for depression) are used across studies, the sign of some scores may need to be reversed to ensure that all scales are aligned (e.g., so low values represent good health on all scales). Standard deviations may have to be reconstructed from other statistics such as p-values and t statistics [115,116], or occasionally they may be imputed from the standard deviations observed in other studies [117]. Time-to-event data also usually need careful conversions to a consistent format [111]. Authors should report details of any such data processing.
Statistical combination of data from two or more separate studies in a meta-analysis may be neither necessary nor desirable (see Box 5 and Item 21). Regardless of the decision to combine individual study results, authors should report how they planned to evaluate between-study variability (heterogeneity or inconsistency) (Box 6). The consistency of results across trials may influence the decision of whether to combine trial results in a meta-analysis.
When meta-analysis is done, authors should specify the effect measure (e.g., relative risk or mean difference) (see Item 13), the statistical method (e.g., inverse variance), and whether a fixed- or random-effects approach, or some other method (e.g., Bayesian), was used (see Box 6). If possible, authors should explain the reasons for those choices.
Item 15: RISK OF BIAS ACROSS STUDIES. Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies).
Examples. "For each trial we plotted the effect by the inverse of its standard error.
The symmetry of such 'funnel plots' was assessed both visually, and formally with Egger's test, to see if the effect decreased with increasing sample size." [118]
"We assessed the possibility of publication bias by evaluating a funnel plot of the trial mean differences for asymmetry, which can result from the non-publication of small trials with negative results…Because graphical evaluation can be subjective, we also conducted an adjusted rank correlation test and a regression asymmetry test as formal statistical tests for publication bias…We acknowledge that other factors, such as differences in trial quality or true study heterogeneity, could produce asymmetry in funnel plots." [119]
Explanation. Reviewers should explore the possibility that the available data are biased. They may examine results from the available studies for clues that suggest there may be missing studies (publication bias) or missing data from the included studies (selective reporting bias) (see Box 7). Authors should report in detail any methods used to investigate possible bias across studies.
It is difficult to assess whether within-study selective reporting is present in a systematic review. If a protocol of an individual study is available, the outcomes in the protocol and the published report can be compared. Even in the absence of a protocol, outcomes listed in the methods section of the published report can be compared with those for which results are presented [120]. In only half of 196 trial reports describing comparisons of two drugs in arthritis were all the effect variables in the methods and results sections the same [82].
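The regression asymmetry (Egger) test quoted in the examples above can be sketched as an ordinary least-squares fit of each study's standardized effect against its precision; an intercept far from zero signals funnel-plot asymmetry. The function name and data below are invented for illustration, and a real analysis would also report a standard error and p-value for the intercept.

```python
def egger_intercept(effects, ses):
    """Sketch of Egger's regression asymmetry test: regress each study's
    standardized effect (effect/SE) on its precision (1/SE) and return
    the ordinary-least-squares intercept and slope."""
    y = [e / s for e, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return intercept, slope

# Invented example: identical effects at different SEs lie on a line
# through the origin, so the intercept is ~0 (no asymmetry signal).
b0, b1 = egger_intercept([0.4, 0.4, 0.4], [0.1, 0.2, 0.4])
```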
In other cases, knowledge of the clinical area may suggest that it is likely that the outcome was measured even if it was not reported. For example, in a particular disease, if one of two linked outcomes is reported but the other is not, then one should question whether the latter has been selectively omitted [121,122].
Only 36% (76 of 212) of therapeutic systematic reviews published in November 2004 reported that study publication bias was considered, and only a quarter of those intended to carry out a formal assessment for that bias [3]. Of 60 meta-analyses in 24 articles published in 2005 in which formal assessments were reported, most were based on fewer than ten studies; most displayed statistically significant heterogeneity; and many reviewers misinterpreted the results of the tests employed [123]. A review of trials of antidepressants found that meta-analysis of only the published trials gave effect estimates 32% larger on average than when all trials sent to the drug agency were analyzed [67].
Item 16: ADDITIONAL ANALYSES. Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified.
Example. "Sensitivity analyses were pre-specified. The treatment effects were examined according to quality components (concealed treatment allocation, blinding of patients and caregivers, blinded outcome assessment), time to initiation of statins, and the type of statin. One post-hoc sensitivity analysis was conducted including unpublished data from a trial using cerivastatin." [124]
Explanation. Authors may perform additional analyses to help understand whether the results of their review are robust, all of which should be reported.
Such analyses include sensitivity analysis, subgroup analysis, and meta-regression [125]. Sensitivity analyses are used to explore the degree to which the main findings of a systematic review are affected by changes in its methods or in the data used from individual studies (e.g., study inclusion criteria, results of risk of bias assessment). Subgroup analyses address whether the summary effects vary in relation to specific (usually clinical) characteristics of the included studies or their participants. Meta-regression extends the idea of subgroup analysis to the examination of the quantitative influence of study characteristics on the effect size [126]. Meta-regression also allows authors to examine the contribution of different variables to the heterogeneity in study findings. Readers of systematic reviews should be aware that meta-regression has many limitations, including a danger of over-interpretation of findings [127,128].
Even with limited data, many additional analyses can be undertaken. The choice of which analysis to undertake will depend on the aims of the review. None of these analyses, however, are exempt from producing potentially misleading results. It is important to inform readers whether these analyses were performed, their rationale, and which were pre-specified.
RESULTS
Item 17: STUDY SELECTION. Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram.
Examples. In text:
"A total of 10 studies involving 13 trials were identified for inclusion in the review. The search of Medline, PsycInfo and Cinahl databases provided a total of 584 citations. After adjusting for duplicates 509 remained.
Of these, 479 studies were discarded because after reviewing the abstracts it appeared that these papers clearly did not meet the criteria. Three additional studies…were discarded because the full text of the study was not available or the paper could not be feasibly translated into English. The full text of the remaining 27 citations was examined in more detail. It appeared that 22 studies did not meet the inclusion criteria as described. Five studies…met the inclusion criteria and were included in the systematic review. An additional five studies…that met the criteria for inclusion were identified by checking the references of located, relevant papers and searching for studies that have cited these papers. No unpublished relevant studies were obtained." [129]
See flow diagram Figure 2.
Explanation. Authors should report, ideally with a flow diagram, the total number of records identified from electronic bibliographic sources (including specialized database or registry searches), hand searches of various sources, reference lists, citation indices, and experts. It is useful if authors delineate for readers the number of selected articles that were identified from the different sources so that they can see, for example, whether most articles were identified through electronic bibliographic sources or from references or experts. Literature identified primarily from references or experts may be prone to citation or publication bias [131,132].
The flow diagram and text should describe clearly the process of report selection throughout the review. Authors should report: unique records identified in searches; records excluded after preliminary screening (e.g., screening of titles and abstracts); reports retrieved for detailed evaluation; potentially eligible reports that were not retrievable; retrieved reports that did not meet inclusion criteria and the primary reasons for exclusion; and the studies included in the review.
Box 5. Whether or Not To Combine Data
Deciding whether or not to combine data involves statistical, clinical, and methodological considerations. The statistical decisions are perhaps the most technical and evidence-based. These are more thoroughly discussed in Box 6. The clinical and methodological decisions are generally based on discussions within the review team and may be more subjective.
Clinical considerations will be influenced by the question the review is attempting to address. Broad questions might provide more "license" to combine more disparate studies, such as whether "Ritalin is effective in increasing focused attention in people diagnosed with attention deficit hyperactivity disorder (ADHD)." Here authors might elect to combine reports of studies involving children and adults. If the clinical question is more focused, such as whether "Ritalin is effective in increasing classroom attention in previously undiagnosed ADHD children who have no comorbid conditions," it is likely that different decisions regarding synthesis of studies are taken by authors. In any case authors should describe their clinical decisions in the systematic review report.
Deciding whether or not to combine data also has a methodological component. Reviewers may decide not to combine studies of low risk of bias with those of high risk of bias (see Items 12 and 19). For example, for subjective outcomes, systematic review authors may not wish to combine assessments that were completed under blind conditions with those that were not.
For any particular question there may not be a "right" or "wrong" choice concerning synthesis, as such decisions are likely complex. However, as the choice may be subjective, authors should be transparent as to their key decisions and describe them for readers.
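When reviewers do decide to combine data, the generic inverse-variance pooling and the Q and I² consistency measures discussed under Item 14 can be sketched as follows. This is a minimal illustration with invented numbers, not the method of any cited review; real software adds random-effects estimation, confidence intervals, and many corrections this sketch omits.

```python
import math

def fixed_effect_pool(effects, ses):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I².

    effects/ses: per-study effect estimates (e.g., log risk ratios)
    and their standard errors.
    """
    w = [1 / s**2 for s in ses]               # inverse-variance weights
    pooled = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    se_pooled = math.sqrt(1 / sum(w))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(wi * (ei - pooled)**2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    # I² = percentage of variation beyond chance; floored at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se_pooled, q, i2

# Invented example: three studies with identical effects show no
# heterogeneity beyond chance, so Q = 0 and I² = 0.
pooled, se, q, i2 = fixed_effect_pool([0.5, 0.5, 0.5], [0.1, 0.2, 0.2])
```

Because the larger study carries more weight, the pooled standard error (about 0.082 here) is smaller than that of any single study, which is the point of pooling.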
Indeed, the most appropriate layout may vary for different reviews.
Authors should also note the presence of duplicate or supplementary reports so that readers understand the number of individual studies compared to the number of reports that were included in the review. Authors should be consistent in their use of terms, such as whether they are reporting on counts of citations, records, publications, or studies. We believe that reporting the number of studies is the most important.
A flow diagram can be very useful; it should depict all the studies included based upon fulfilling the eligibility criteria, whether or not data have been combined for statistical analysis. A recent review of 87 systematic reviews found that about half included a QUOROM flow diagram [133]. The authors of this research recommended some important ways that reviewers can improve the use of a flow diagram when describing the flow of information throughout the review process, including a separate flow diagram for each important outcome reported [133].
Item 18: STUDY CHARACTERISTICS. For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citation.
Examples. In text:
"Characteristics of included studies
Methods
All four studies finally selected for the review were randomised controlled trials published in English. The duration of the intervention was 24 months for the RIO-North America and 12 months for the RIO-Diabetes, RIO-Lipids and RIO-Europe study. Although the last two described a period of 24 months during which they were conducted, only the first 12-months results are provided. All trials had a run-in, as a single blind period before the randomisation.
Participants
The included studies involved 6625 participants.
The main inclusion criteria entailed adults (18 years or older), with a body mass index greater than 27 kg/m² and less than 5 kg variation in body weight within the three months before study entry.
Intervention
All trials were multicentric. The RIO-North America was conducted in the USA and Canada, RIO-Europe in Europe and the USA, RIO-Diabetes in the USA and 10 other different countries not specified, and RIO-Lipids in eight unspecified different countries.
The intervention received was placebo, 5 mg of rimonabant or 20 mg of rimonabant once daily in addition to a mild hypocaloric diet (600 kcal/day deficit).
Outcomes
Primary
In all studies the primary outcome assessed was weight change from baseline after one year of treatment and the RIO-North America study also evaluated the prevention of weight regain between the first and second year. All studies evaluated adverse effects, including those of any kind and serious events.
Box 6. Meta-Analysis and Assessment of Consistency (Heterogeneity)
Meta-Analysis: Statistical Combination of the Results of Multiple Studies
If it is felt that studies should have their results combined statistically, other issues must be considered because there are many ways to conduct a meta-analysis. Different effect measures can be used for both binary and continuous outcomes (see Item 13). Also, there are two commonly used statistical models for combining data in a meta-analysis [195]. The fixed-effect model assumes that there is a common treatment effect for all included studies [196]; it is assumed that the observed differences in results across studies reflect random variation [196]. The random-effects model assumes that there is no common treatment effect for all included studies but rather that the variation of the effects across studies follows a particular distribution [197]. In a random-effects model it is believed that the included studies represent a random sample from a larger population of studies addressing the question of interest [198].
There is no consensus about whether to use fixed- or random-effects models, and both are in wide use. The following differences have influenced some researchers regarding their choice between them. The random-effects model gives more weight to the results of smaller trials than does the fixed-effect analysis, which may be undesirable as small trials may be inferior and most prone to publication bias. The fixed-effect model considers only within-study variability whereas the random-effects model considers both within- and between-study variability. This is why a fixed-effect analysis tends to give narrower confidence intervals (i.e., provide greater precision) than a random-effects analysis [110,196,199]. In the absence of any between-study heterogeneity, the fixed- and random-effects estimates will coincide.
In addition, there are different methods for performing both types of meta-analysis [200]. Common fixed-effect approaches are Mantel-Haenszel and inverse variance, whereas random-effects analyses usually use the DerSimonian and Laird approach, although other methods exist, including Bayesian meta-analysis [201].
In the presence of demonstrable between-study heterogeneity (see below), some consider that the use of a fixed-effect analysis is counterintuitive because their main assumption is violated. Others argue that it is inappropriate to conduct any meta-analysis when there is unexplained variability across trial results. If the reviewers decide not to combine the data quantitatively, a danger is that eventually they may end up using quasi-quantitative rules of poor validity (e.g., vote counting of how many studies have nominally significant results) for interpreting the evidence. Statistical methods to combine data exist for almost any complex situation that may arise in a systematic review, but one has to be aware of their assumptions and limitations to avoid misapplying or misinterpreting these methods.
Assessment of Consistency (Heterogeneity)
We expect some variation (inconsistency) in the results of different studies due to chance alone. Variability in excess of that due to chance reflects true differences in the results of the trials, and is called "heterogeneity." The conventional statistical approach to evaluating heterogeneity is a chi-squared test (Cochran's Q), but it has low power when there are few studies and excessive power when there are many studies [202]. By contrast, the I² statistic quantifies the amount of variation in results across studies beyond that expected by chance and so is preferable to Q [202,203]. I² represents the percentage of the total variation in estimated effects across studies that is due to heterogeneity rather than to chance; some authors consider an I² value less than 25% as low [202]. However, I² also suffers from large uncertainty in the common situation where only a few studies are available [204], and reporting the uncertainty in I² (e.g., as the 95% confidence interval) may be helpful [145]. When there are few studies, inferences about heterogeneity should be cautious.
When considerable heterogeneity is observed, it is advisable to consider possible reasons [205]. In particular, the heterogeneity may be due to differences between subgroups of studies (see Item 16). Also, data extraction errors are a common cause of substantial heterogeneity in results with continuous outcomes [139].
Quality of life was measured in only one study, but the results were not described (RIO-Europe).
Secondary and additional outcomes
These included prevalence of metabolic syndrome after one year and change in cardiometabolic risk factors such as blood pressure, lipid profile, etc.
No study included mortality and costs as outcome.
The timing of outcome measures was variable and could include monthly investigations, evaluations every three months or a single final evaluation after one year." [134]
In table: See Table 2.
Explanation. For readers to gauge the validity and applicability of a systematic review's results, they need to know something about the included studies. Such information includes PICOS (Box 2) and specific information relevant to the review question. For example, if the review is examining the long-term effects of antidepressants for moderate depressive disorder, authors should report the follow-up periods of the included studies. For each included study, authors should provide a citation for the source of their information regardless of whether or not the study is published. This information makes it easier for interested readers to retrieve the relevant publications or documents.
Reporting study-level data also allows the comparison of the main characteristics of the studies included in the review. Authors should present enough detail to allow readers to make their own judgments about the relevance of included studies. Such information also makes it possible for readers to conduct their own subgroup analyses and interpret subgroups, based on study characteristics.
Authors should avoid, whenever possible, assuming information when it is missing from a study report (e.g., sample size, method of randomization). Reviewers may contact the original investigators to try to obtain missing information or confirm the data extracted for the systematic review. If this information is not obtained, this should be noted in the report.
Box 7. Bias Caused by Selective Publication of Studies or Results within Studies
Systematic reviews aim to incorporate information from all relevant studies. The absence of information from some studies may pose a serious threat to the validity of a review. Data may be incomplete because some studies were not published, or because of incomplete or inadequate reporting within a published article. These problems are often summarized as "publication bias" although in fact the bias arises from non-publication of full studies and selective publication of results in relation to their findings. Non-publication of research findings dependent on the actual results is an important risk of bias to a systematic review and meta-analysis.
Missing Studies. Several empirical investigations have shown that the findings from clinical trials are more likely to be published if the results are statistically significant (p<0.05) than if they are not [125,206,207]. For example, of 500 oncology trials with more than 200 participants for which preliminary results were presented at a conference of the American Society of Clinical Oncology, 81% with p<0.05 were published in full within five years compared to only 68% of those with p>0.05 [208].
Also, among published studies, those with statistically significant results are published sooner than those with non-significant findings [209]. When some studies are missing for these reasons, the available results will be biased towards exaggerating the effect of an intervention.
Missing Outcomes. In many systematic reviews only some of the eligible studies (often a minority) can be included in a meta-analysis for a specific outcome. For some studies, the outcome may not be measured or may be measured but not reported. The former will not lead to bias, but the latter could.
Evidence is accumulating that selective reporting bias is widespread and of considerable importance [42,43]. In addition, data for a given outcome may be analyzed in multiple ways and the choice of presentation influenced by the results obtained. In a study of 102 randomized trials, comparison of published reports with trial protocols showed that a median of 38% of efficacy outcomes and 50% of safety outcomes per trial were not available for meta-analysis. Statistically significant outcomes had a higher odds of being fully reported in publications when compared with non-significant outcomes for both efficacy (pooled odds ratio 2.4; 95% confidence interval 1.4 to 4.0) and safety (4.7, 1.8 to 12) data. Several other studies have had similar findings [210,211].
Detection of Missing Information. Missing studies may increasingly be identified from trials registries. Evidence of missing outcomes may come from comparison with the study protocol, if available, or by careful examination of published articles [11]. Study publication bias and selective outcome reporting are difficult to exclude or verify from the available results, especially when few studies are available. If the available data are affected by either (or both) of the above biases, smaller studies would tend to show larger estimates of the effects of the intervention. Thus one possibility is to investigate the relation between effect size and sample size (or, more specifically, precision of the effect estimate). Graphical methods, especially the funnel plot [212], and analytic methods (e.g., Egger's test) are often used [213,214,215], although their interpretation can be problematic [216,217]. Strictly speaking, such analyses investigate "small study bias"; there may be many reasons why smaller studies have systematically different effect sizes than larger studies, of which reporting bias is just one [218]. Several alternative tests for bias have also been proposed, beyond the ones testing small study bias [215,219,220], but none can be considered a gold standard. Although evidence that smaller studies had larger estimated effects than large ones may suggest the possibility that the available evidence is biased, misinterpretation of such data is common [123].
Figure 2. Example Figure: Example flow diagram of study selection. DDW, Digestive Disease Week; UEGW, United European Gastroenterology Week. Reproduced with permission from [130].
If information is imputed, the reader should be told how this was done and for which items. Presenting study-level data makes it possible to clearly identify unpublished information obtained from the original researchers and make it available for the public record.
Typically, study-level characteristics are presented as a table as in the example in Table 2. Such presentation ensures that all pertinent items are addressed and that missing or unclear information is clearly indicated. Although paper-based journals do not generally allow for the quantity of information available in electronic journals or Cochrane reviews, this should not be accepted as an excuse for omission of important aspects of the methods or results of included studies, since these can, if necessary, be shown on a Web site.
Following the presentation and description of each included study, as discussed above, reviewers usually provide a narrative summary of the studies. Such a summary provides readers with an overview of the included studies. It may for example address the languages of the published papers, years of publication, and geographic origins of the included studies.
The PICOS framework is often helpful in reporting the narrative summary indicating, for example, the clinical characteristics and disease severity of the participants and the main features of the intervention and of the comparison group.
For non-pharmacological interventions, it may be helpful to specify for each study the key elements of the intervention received by each group. Full details of the interventions in included studies were reported in only three of 25 systematic reviews relevant to general practice [84].

Item 19: RISK OF BIAS WITHIN STUDIES. Present data on risk of bias of each study and, if available, any outcome-level assessment (see Item 12).

Example. See Table 3.

Explanation. We recommend that reviewers assess the risk of bias in the included studies using a standard approach with defined criteria (see Item 12). They should report the results of any such assessments [89].

Reporting only summary data (e.g., "two of eight trials adequately concealed allocation") is inadequate because it fails to inform readers which studies had the particular methodological shortcoming. A more informative approach is to explicitly report the methodological features evaluated for each study. The Cochrane Collaboration's new tool for assessing the risk of bias also requests that authors substantiate these assessments with any relevant text from the original studies [11]. It is often easiest to provide these data in a tabular format, as in the example. However, a narrative summary describing the tabular data can also be helpful for readers.

Item 20: RESULTS OF INDIVIDUAL STUDIES. For all outcomes considered (benefits and harms), present, for each study: (a) simple summary data for each intervention group and (b) effect estimates and confidence intervals, ideally with a forest plot.

Examples. See Table 4 and Figure 3.

Explanation. Publication of summary data from individual studies allows the analyses to be reproduced and other analyses and graphical displays to be investigated. Others may wish to assess the impact of excluding particular studies or consider subgroup analyses not reported by the review authors.
Displaying the results of each treatment group in included studies also enables inspection of individual study features. For example, if only odds ratios are provided, readers cannot assess the variation in event rates across the studies, making the odds ratio impossible to interpret [138]. Additionally, because data extraction errors in meta-analyses are common and can be large [139], the presentation of the results from individual studies makes it easier to identify errors. For continuous outcomes, readers may wish to

Table 2. Example Table: Summary of included studies evaluating the efficacy of antiemetic agents in acute gastroenteritis.

Source | Setting | No. of Patients | Age Range | Inclusion Criteria | Antiemetic Agent | Route | Follow-Up
Freedman et al., 2006 | ED | 214 | 6 months–10 years | GE with mild to moderate dehydration and vomiting in the preceding 4 hours | Ondansetron | PO | 1–2 weeks
Reeves et al., 2002 | ED | 107 | 1 month–22 years | GE and vomiting requiring IV rehydration | Ondansetron | IV | 5–7 days
Roslund et al., 2007 | ED | 106 | 1–10 years | GE with failed oral rehydration attempt in ED | Ondansetron | PO | 1 week
Stork et al., 2006 | ED | 137 | 6 months–12 years | GE, recurrent emesis, mild to moderate dehydration, and failed oral hydration | Ondansetron and dexamethasone | IV | 1 and 2 days
ED, emergency department; GE, gastroenteritis; IV, intravenous; PO, by mouth. Adapted from [135]. doi:10.1371/journal.pmed.1000100.t002

Table 3.
Example Table: Quality measures of the randomized controlled trials that failed to fulfill any one of six markers of validity.

Trials | Concealment of Randomisation | RCT Stopped Early | Patients Blinded | Health Care Providers Blinded | Data Collectors Blinded | Outcome Assessors Blinded
Liu | No | No | Yes | Yes | Yes | Yes
Stone | Yes | No | No | Yes | Yes | Yes
Polderman | Yes | Yes | No | No | No | Yes
Zaugg | Yes | No | No | No | Yes | Yes
Urban | Yes | Yes | No | No, except anesthesiologists | Yes | Yes
RCT, randomized controlled trial. Adapted from [96]. doi:10.1371/journal.pmed.1000100.t003

examine the consistency of standard deviations across studies, for example, to be reassured that standard deviation and standard error have not been confused [138].

For each study, the summary data for each intervention group are generally given for binary outcomes as frequencies with and without the event (or as proportions such as 12/45). It is not sufficient to report event rates per intervention group as percentages. The required summary data for continuous outcomes are the mean, standard deviation, and sample size for each group. In reviews that examine time-to-event data, the authors should report the log hazard ratio and its standard error (or confidence interval) for each included study. Sometimes, essential data are missing from the reports of the included studies and cannot be calculated from other data but may need to be imputed by the reviewers. For example, the standard deviation may be imputed using the typical standard deviations in the other trials [116,117] (see Item 14). Whenever relevant, authors should indicate which results were not reported directly and had to be estimated from other information (see Item 13). In addition, the inclusion of unpublished data should be noted.

For all included studies it is important to present the estimated effect with a confidence interval.
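To illustrate why such group-level frequencies suffice (an illustration only, not part of the PRISMA guidance): from the events and totals in two groups one can recover a relative risk and its 95% confidence interval using the standard normal approximation on the log scale. A minimal Python sketch, with counts taken from the first row of Table 4 purely for illustration:

```python
import math

def risk_ratio_ci(events1, n1, events2, n2, z=1.96):
    """Relative risk and 95% CI from binary summary data (events/total
    per group), via the usual normal approximation on the log scale."""
    rr = (events1 / n1) / (events2 / n2)
    # Standard error of log(RR)
    se = math.sqrt(1 / events1 - 1 / n1 + 1 / events2 - 1 / n2)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# 12/49 events in one group versus 20/55 in the other (Table 4, first row)
rr, lo, hi = risk_ratio_ci(12, 49, 20, 55)
```

If only percentages had been reported, the standard error (and hence the confidence interval) could not be recovered, which is why percentages alone are insufficient.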
This information may be incorporated in a table showing study characteristics or may be shown in a forest plot [140]. The key elements of the forest plot are the effect estimates and confidence intervals for each study shown graphically, but it is preferable also to include, for each study, the numerical group-specific summary data, the effect size and confidence interval, and the percentage weight (see second example [Figure 3]). For discussion of the results of meta-analysis, see Item 21.

In principle, all the above information should be provided for every outcome considered in the review, including both benefits and harms. When there are too many outcomes for full information to be included, results for the most important outcomes should be included in the main report, with other information provided as a Web appendix. The choice of the information to present should be justified in light of what was originally stated in the protocol. Authors should explicitly mention if the planned main outcomes cannot be presented due to lack of information. There is some evidence that information on harms is only rarely reported in systematic reviews, even when it is available in the original studies [141]. Selective omission of harms results biases a systematic review and decreases its ability to contribute to informed decision making.

Item 21: SYNTHESES OF RESULTS. Present the main results of the review. If meta-analyses are done, include for each, confidence intervals and measures of consistency.

Table 4.
Example Table: Heterotopic ossification in trials comparing radiotherapy to non-steroidal anti-inflammatory drugs after major hip procedures and fractures.

Author (Year) | Radiotherapy | NSAID
Kienapfel (1999) | 12/49 (24.5%) | 20/55 (36.4%)
Sell (1998) | 2/77 (2.6%) | 18/77 (23.4%)
Kolbl (1997) | 39/188 (20.7%) | 18/113 (15.9%)
Kolbl (1998) | 22/46 (47.8%) | 6/54 (11.1%)
Moore (1998) | 9/33 (27.3%) | 18/39 (46.2%)
Bremen-Kuhne (1997) | 9/19 (47.4%) | 11/31 (35.5%)
Knelles (1997) | 5/101 (5.0%) | 46/183 (25.4%)
NSAID, non-steroidal anti-inflammatory drug. Adapted from [136]. doi:10.1371/journal.pmed.1000100.t004

Figure 3. Example Figure: Overall failure (defined as failure of assigned regimen or relapse) with tetracycline-rifampicin versus tetracycline-streptomycin. CI, confidence interval. Reproduced with permission from [137]. doi:10.1371/journal.pmed.1000100.g003

Examples. "Mortality data were available for all six trials, randomizing 311 patients and reporting data for 305 patients. There were no deaths reported in the three respiratory syncytial virus/severe bronchiolitis trials; thus our estimate is based on three trials randomizing 232 patients, 64 of whom died. In the pooled analysis, surfactant was associated with significantly lower mortality (relative risk = 0.7, 95% confidence interval = 0.4–0.97, P = 0.04). There was no evidence of heterogeneity (I² = 0%)". [142]

"Because the study designs, participants, interventions, and reported outcome measures varied markedly, we focused on describing the studies, their results, their applicability, and their limitations and on qualitative synthesis rather than meta-analysis." [143]

"We detected significant heterogeneity within this comparison (I² = 46.6%; χ² = 13.11, df = 7; P = 0.07). Retrospective exploration of the heterogeneity identified one trial that seemed to differ from the others. It included only small ulcers (wound area less than 5 cm²).
Exclusion of this trial removed the statistical heterogeneity and did not affect the finding of no evidence of a difference in healing rate between hydrocolloids and simple low adherent dressings (relative risk = 0.98, [95% confidence interval] 0.85 to 1.12; I² = 0%)." [144]

Explanation. Results of systematic reviews should be presented in an orderly manner. Initial narrative descriptions of the evidence covered in the review (see Item 18) may tell readers important things about the study populations and the design and conduct of studies. These descriptions can facilitate the examination of patterns across studies. They may also provide important information about applicability of evidence, suggest the likely effects of any major biases, and allow consideration, in a systematic manner, of multiple explanations for possible differences of findings across studies.

If authors have conducted one or more meta-analyses, they should present the results as an estimated effect across studies with a confidence interval. It is often simplest to show each meta-analysis summary with the actual results of included studies in a forest plot (see Item 20) [140]. It should always be clear which of the included studies contributed to each meta-analysis. Authors should also provide, for each meta-analysis, a measure of the consistency of the results from the included studies, such as I² (heterogeneity; see Box 6); a confidence interval may also be given for this measure [145]. If no meta-analysis was performed, the qualitative inferences should be presented as systematically as possible, with an explanation of why meta-analysis was not done, as in the second example above [143]. Readers may find a forest plot, without a summary estimate, helpful in such cases.

Authors should in general report syntheses for all the outcome measures they set out to investigate (i.e., those described in the protocol; see Item 4) to allow readers to draw their own conclusions about the implications of the results.
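The quantities discussed above, namely a pooled estimate with its confidence interval, per-study percentage weights for a forest plot, and the I² consistency measure, follow from standard inverse-variance formulas. The sketch below is illustrative only, assumes a fixed-effect model, and uses hypothetical inputs:

```python
import numpy as np

def fixed_effect_meta(effects, ses):
    """Inverse-variance fixed-effect meta-analysis.

    effects : per-study effect estimates (e.g., log relative risks)
    ses     : their standard errors
    Returns (pooled effect, 95% CI, I² in percent, percentage weights).
    """
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    w = 1.0 / ses ** 2                        # inverse-variance weights
    pooled = float(np.sum(w * effects) / np.sum(w))
    se_pooled = float(np.sqrt(1.0 / np.sum(w)))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    q = float(np.sum(w * (effects - pooled) ** 2))  # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    weights_pct = 100.0 * w / np.sum(w)       # forest-plot weight column
    return pooled, ci, i2, weights_pct

# Hypothetical log relative risks from three studies with a common effect:
pooled, ci, i2, wts = fixed_effect_meta([0.5, 0.5, 0.5], [0.1, 0.2, 0.3])
```

With identical per-study effects, Cochran's Q is zero and I² is 0%, matching the "no evidence of heterogeneity" wording in the first example above; random-effects models add a between-study variance term on top of these formulas.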
Readers should be made aware of any deviations from the planned analysis. Authors should tell readers if the planned meta-analysis was not thought appropriate or possible for some of the outcomes and the reasons for that decision.

It may not always be sensible to give meta-analysis results and forest plots for each outcome. If the review addresses a broad question, there may be a very large number of outcomes. Also, some outcomes may have been reported in only one or two studies, in which case forest plots are of little value and may be seriously biased.

Of 300 systematic reviews indexed in MEDLINE in 2004, a little more than half (54%) included meta-analyses, of which the majority (91%) reported assessing for inconsistency in results.

Item 22: RISK OF BIAS ACROSS STUDIES. Present results of any assessment of risk of bias across studies (see Item 15).

Examples. "Strong evidence of heterogeneity (I² = 79%, P<0.001) was observed. To explore this heterogeneity, a funnel plot was drawn. The funnel plot in Figure 4 shows evidence of considerable asymmetry." [146]

"Specifically, four sertraline trials involving 486 participants and one citalopram trial involving 274 participants were reported as having failed to achieve a statistically significant drug effect, without reporting mean HRSD [Hamilton Rating Scale for Depression] scores. We were unable to find data from these trials on pharmaceutical company Web sites or through our search of the published literature. These omissions represent 38% of patients in sertraline trials and 23% of patients in citalopram trials. Analyses with and without inclusion of these trials found no differences in the patterns of results; similarly, the revealed patterns do not interact with drug type. The purpose of using the data obtained from the FDA was to avoid publication bias, by including unpublished as well as published trials.
Inclusion of only those sertraline and citalopram trials for which means were reported to the FDA would constitute a form of reporting bias similar to publication bias and would lead to overestimation of drug–placebo differences for these drug types. Therefore, we present analyses only on data for medications for which complete clinical trials' change was reported." [147]

Explanation. Authors should present the results of any assessments of risk of bias across studies. If a funnel plot is reported, authors should specify the effect estimate and measure of precision used, presented typically on the x-axis and y-axis, respectively. Authors should describe if and how they have tested the statistical significance of any possible asymmetry (see Item 15). Results of any investigations of selective reporting of outcomes within studies (as discussed in Item 15) should also be reported. Also, we advise authors to tell readers if any pre-specified analyses for assessing risk of bias across studies were not completed and the reasons (e.g., too few included studies).

Item 23: ADDITIONAL ANALYSES. Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see Item 16]).

Examples. "…benefits of chondroitin were smaller in trials with adequate concealment of allocation compared with trials with unclear concealment (P for interaction = 0.050), in trials with an intention-to-treat analysis compared with those that had excluded patients from the analysis (P for interaction = 0.017), and in large compared with small trials (P for interaction = 0.022)." [148]

"Subgroup analyses according to antibody status, antiviral medications, organ transplanted, treatment duration, use of antilymphocyte therapy, time to outcome assessment, study quality and other aspects of study design did not demonstrate any differences in treatment effects. Multivariate meta-regression showed no significant difference in CMV [cytomegalovirus] disease after allowing for potential confounding or effect-modification by prophylactic drug used, organ transplanted or recipient serostatus in CMV positive recipients and CMV negative recipients of CMV positive donors." [149]

Explanation. Authors should report any subgroup or sensitivity analyses and whether or not they were pre-specified (see Items 5 and 16). For analyses comparing subgroups of studies (e.g., separating studies of low- and high-dose aspirin), the authors should report any tests for interactions, as well as estimates and confidence intervals from meta-analyses within each subgroup. Similarly, meta-regression results (see Item 16) should not be limited to p-values, but should include effect sizes and confidence intervals [150], as the first example reported above does in a table. The amount of data included in each additional analysis should be specified if different from that considered in the main analyses. This information is especially relevant for sensitivity analyses that exclude some studies; for example, those with high risk of bias.

Importantly, all additional analyses conducted should be reported, not just those that were statistically significant. This information will help avoid selective outcome reporting bias within the review, as has been demonstrated in reports of randomized controlled trials [42,44,121,151,152]. Results from exploratory subgroup or sensitivity analyses should be interpreted cautiously, bearing in mind the potential for multiple analyses to mislead.

DISCUSSION

Item 24: SUMMARY OF EVIDENCE. Summarize the main findings, including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., health care providers, users, and policy makers).

Example.
"Overall, the evidence is not sufficiently robust to determine the comparative effectiveness of angioplasty (with or without stenting) and medical treatment alone. Only 2 randomized trials with long-term outcomes and a third randomized trial that allowed substantial crossover of treatment after 3 months directly compared angioplasty and medical treatment…the randomized trials did not evaluate enough patients or did not follow patients for a sufficient duration to allow definitive conclusions to be made about clinical outcomes, such as mortality and cardiovascular or kidney failure events.

Some acceptable evidence from comparison of medical treatment and angioplasty suggested no difference in long-term kidney function but possibly better blood pressure control after angioplasty, an effect that may be limited to patients with bilateral atherosclerotic renal artery stenosis. The evidence regarding other outcomes is weak. Because the reviewed studies did not explicitly address patients with rapid clinical deterioration who may need acute intervention, our conclusions do not apply to this important subset of patients." [143]

Explanation. Authors should give a brief and balanced summary of the nature and findings of the review. Sometimes, outcomes for which little or no data were found should be noted due to potential relevance for policy decisions and future research. Applicability of the review's findings, to different patients, settings, or target audiences, for example, should be mentioned. Although there is no standard way to assess applicability simultaneously to different audiences, some systems do exist [153]. Sometimes, authors formally rate or assess the overall body of evidence addressed in the review and can present the strength of their summary recommendations tied to their assessments of the quality of evidence (e.g., the GRADE system) [10].

Figure 4.
Example Figure: Example of a funnel plot showing evidence of considerable asymmetry. SE, standard error. Adapted from [146], with permission. doi:10.1371/journal.pmed.1000100.g004

Authors need to keep in mind that statistical significance of the effects does not always suggest clinical or policy relevance. Likewise, a non-significant result does not demonstrate that a treatment is ineffective. Authors should ideally clarify trade-offs and how the values attached to the main outcomes would lead different people to make different decisions. In addition, adroit authors consider factors that are important in translating the evidence to different settings and that may modify the estimates of effects reported in the review [153]. Patients and health care providers may be primarily interested in which intervention is most likely to provide a benefit with acceptable harms, while policy makers and administrators may value data on organizational impact and resource utilization.

Item 25: LIMITATIONS. Discuss limitations at study and outcome level (e.g., risk of bias), and at review level (e.g., incomplete retrieval of identified research, reporting bias).

Examples. Outcome level:

"The meta-analysis reported here combines data across studies in order to estimate treatment effects with more precision than is possible in a single study. The main limitation of this meta-analysis, as with any overview, is that the patient population, the antibiotic regimen and the outcome definitions are not the same across studies." [154]

Study and review level:

"Our study has several limitations. The quality of the studies varied.
Randomization was adequate in all trials; however, 7 of the articles did not explicitly state that analysis of data adhered to the intention-to-treat principle, which could lead to overestimation of treatment effect in these trials, and we could not assess the quality of 4 of the 5 trials reported as abstracts. Analyses did not identify an association between components of quality and re-bleeding risk, and the effect size in favour of combination therapy remained statistically significant when we excluded trials that were reported as abstracts.

Publication bias might account for some of the effect we observed. Smaller trials are, in general, analyzed with less methodological rigor than larger studies, and an asymmetrical funnel plot suggests that selective reporting may have led to an overestimation of effect sizes in small trials." [155]

Explanation. A discussion of limitations should address the validity (i.e., risk of bias) and reporting (informativeness) of the included studies, limitations of the review process, and generalizability (applicability) of the review. Readers may find it helpful if authors discuss whether studies were threatened by serious risks of bias, whether the estimates of the effect of the intervention are too imprecise, or if there were missing data for many participants or important outcomes.

Limitations of the review process might include limitations of the search (e.g., restricting to English-language publications), and any difficulties in the study selection, appraisal, and meta-analysis processes. For example, poor or incomplete reporting of study designs, patient populations, and interventions may hamper interpretation and synthesis of the included studies [84].
Applicability of the review may be affected if there are limited data for certain populations or subgroups where the intervention might perform differently, or few studies assessing the most important outcomes of interest; or if there is a substantial amount of data relating to an outdated intervention or comparator, or heavy reliance on imputation of missing values for summary estimates (Item 14).

Item 26: CONCLUSIONS. Provide a general interpretation of the results in the context of other evidence, and implications for future research.

Example. Implications for practice:

"Between 1995 and 1997 five different meta-analyses of the effect of antibiotic prophylaxis on infection and mortality were published. All confirmed a significant reduction in infections, though the magnitude of the effect varied from one review to another. The estimated impact on overall mortality was less evident and has generated considerable controversy on the cost effectiveness of the treatment. Only one among the five available reviews, however, suggested that a weak association between respiratory tract infections and mortality exists and lack of sufficient statistical power may have accounted for the limited effect on mortality."

Implications for research:

"A logical next step for future trials would thus be the comparison of this protocol against a regimen of a systemic antibiotic agent only to see whether the topical component can be dropped. We have already identified six such trials but the total number of patients so far enrolled (n = 1056) is too small for us to be confident that the two treatments are really equally effective. If the hypothesis is therefore considered worth testing, more and larger randomised controlled trials are warranted. Trials of this kind, however, would not resolve the relevant issue of treatment induced resistance. To produce a satisfactory answer to this, studies with a different design would be necessary.
Though a detailed discussion goes beyond the scope of this paper, studies in which the intensive care unit rather than the individual patient is the unit of randomisation and in which the occurrence of antibiotic resistance is monitored over a long period of time should be undertaken." [156]

Explanation. Systematic reviewers sometimes draw conclusions that are too optimistic [157] or do not consider the harms as carefully as the benefits, although some evidence suggests these problems are decreasing [158]. If conclusions cannot be drawn because there are too few reliable studies, or too much uncertainty, this should be stated. Such a finding can be as important as finding consistent effects from several large studies.

Authors should try to relate the results of the review to other evidence, as this helps readers to better interpret the results. For example, there may be other systematic reviews about the same general topic that have used different methods or have addressed related but slightly different questions [159,160]. Similarly, there may be additional information relevant to decision makers, such as the cost-effectiveness of the intervention (e.g., health technology assessment). Authors may discuss the results of their review in the context of existing evidence regarding other interventions.

We advise authors also to make explicit recommendations for future research. In a sample of 2,535 Cochrane reviews, 82% included recommendations for research with specific interventions, 30% suggested the appropriate type of participants, and 52% suggested outcome measures for future research [161]. There is no corresponding assessment about systematic reviews published in medical journals, but we believe that such recommendations are much less common in those reviews.

Clinical research should not be planned without a thorough knowledge of similar, existing research [162].
There is evidence that this still does not occur as it should and that authors of primary studies do not consider a systematic review when they design their studies [163]. We believe systematic reviews have great potential for guiding future clinical research.

FUNDING

Item 27: FUNDING. Describe sources of funding or other support (e.g., supply of data) for the systematic review; role of funders for the systematic review.

Examples. "The evidence synthesis upon which this article was based was funded by the Centers for Disease Control and Prevention for the Agency for Healthcare Research and Quality and the U.S. Prevention Services Task Force." [164]

"Role of funding source: the funders played no role in study design, collection, analysis, interpretation of data, writing of the report, or in the decision to submit the paper for publication. They accept no responsibility for the contents." [165]

Explanation. Authors of systematic reviews, like those of any other research study, should disclose any funding they received to carry out the review, or state if the review was not funded. Lexchin and colleagues [166] observed that outcomes of reports of randomized trials and meta-analyses of clinical trials funded by the pharmaceutical industry are more likely to favor the sponsor's product compared to studies with other sources of funding. Similar results have been reported elsewhere [167,168]. Analogous data suggest that similar biases may affect the conclusions of systematic reviews [169].

Given the potential role of systematic reviews in decision making, we believe authors should be transparent about the funding and the role of funders, if any. Sometimes the funders will provide services, such as those of a librarian to complete the searches for relevant literature or access to commercial databases not available to the reviewers.
Any level of funding or services provided to the systematic review team should be reported. Authors should also report whether the funder had any role in the conduct or report of the review. Beyond funding issues, authors should report any real or perceived conflicts of interest related to their role or the role of the funder in the reporting of the systematic review [170].

In a survey of 300 systematic reviews published in November 2004, funding sources were not reported in 41% of the reviews [3]. Only a minority of reviews (2%) reported being funded by for-profit sources, but the true proportion may be higher [171].

Additional Considerations for Systematic Reviews of Non-Randomized Intervention Studies or for Other Types of Systematic Reviews

The PRISMA Statement and this document have focused on systematic reviews of reports of randomized trials. Other study designs, including non-randomized studies, quasi-experimental studies, and interrupted time series, are included in some systematic reviews that evaluate the effects of health care interventions [172,173]. The methods of these reviews may differ to varying degrees from the typical intervention review, for example regarding the literature search, data abstraction, assessment of risk of bias, and analysis methods. As such, their reporting demands might also differ from what we have described here. A useful principle is for systematic review authors to ensure that their methods are reported with adequate clarity and transparency to enable readers to critically judge the available evidence and replicate or update the research.

In some systematic reviews, the authors will seek the raw data from the original researchers to calculate the summary statistics. These systematic reviews are called individual patient (or participant) data reviews [40,41].
Individual patient data meta-analyses may also be conducted with prospective accumulation of data rather than retrospective accumulation of existing data. Here too, extra information about the methods will need to be reported.

Other types of systematic reviews exist. Realist reviews aim to determine how complex programs work in specific contexts and settings [174]. Meta-narrative reviews aim to explain complex bodies of evidence through mapping and comparing different over-arching storylines [175]. Network meta-analyses, also known as multiple treatments meta-analyses, can be used to analyze data from comparisons of many different treatments [176,177]. They use both direct and indirect comparisons, and can be used to compare interventions that have not been directly compared.

We believe that the issues we have highlighted in this paper are relevant to ensure transparency and understanding of the processes adopted and the limitations of the information presented in systematic reviews of different types. We hope that PRISMA can be the basis for more detailed guidance on systematic reviews of other types of research, including diagnostic accuracy and epidemiological studies.

Discussion

We developed the PRISMA Statement using an approach for developing reporting guidelines that has evolved over several years [178]. The overall aim of PRISMA is to help ensure the clarity and transparency of reporting of systematic reviews, and recent data indicate that this reporting guidance is much needed [3]. PRISMA is not intended to be a quality assessment tool and it should not be used as such.

This PRISMA Explanation and Elaboration document was developed to facilitate the understanding, uptake, and dissemination of the PRISMA Statement and hopefully provide a pedagogical framework for those interested in conducting and reporting systematic reviews. It follows a format similar to that used in other explanatory documents [17,18,19].
Following the recommendations in the PRISMA checklist may increase the word count of a systematic review report. We believe, however, that the benefit of readers being able to critically appraise a clear, complete, and transparent systematic review report outweighs the possible slight increase in the length of the report.

While the aims of PRISMA are to reduce the risk of flawed reporting of systematic reviews and improve the clarity and transparency in how reviews are conducted, we have little data to state more definitively whether this "intervention" will achieve its intended goal. A previous effort to evaluate QUOROM was not successfully completed [178]. Publication of the QUOROM Statement was delayed for two years while a research team attempted to evaluate its effectiveness by conducting a randomized controlled trial with the participation of eight major medical journals. Unfortunately that trial was not completed due to accrual problems (David Moher, personal communication). Other evaluation methods might be easier to conduct. At least one survey of 139 published systematic reviews in the critical care literature [179] suggests that their quality improved after the publication of QUOROM.

If the PRISMA Statement is endorsed by and adhered to in journals, as other reporting guidelines have been [17,18,19,180], there should be evidence of improved reporting of systematic reviews. For example, there have been several evaluations of whether the use of CONSORT improves reports of randomized controlled trials. A systematic review of these studies [181] indicates that use of CONSORT is associated with improved reporting of certain items, such as allocation concealment.
We aim to evaluate the benefits (i.e., improved reporting) and possible adverse effects (e.g., increased word length) of PRISMA, and we encourage others to consider doing likewise.

Even though we did not carry out a systematic literature search to produce our checklist, and this is indeed a limitation of our effort, PRISMA was nevertheless developed using an evidence-based approach whenever possible. Checklist items were included if there was evidence that not reporting the item was associated with increased risk of bias, or where it was clear that the information was necessary to appraise the reliability of a review. To keep PRISMA up to date and as evidence-based as possible requires regular vigilance of the literature, which is growing rapidly. Currently the Cochrane Methodology Register has more than 11,000 records pertaining to the conduct and reporting of systematic reviews and other evaluations of health and social care. For some checklist items, such as reporting the abstract (Item 2), we have used evidence from elsewhere in the belief that the issue applies equally well to reporting of systematic reviews. Yet for other items, evidence does not exist; for example, whether a training exercise improves the accuracy and reliability of data extraction. We hope PRISMA will act as a catalyst to help generate further evidence that can be considered when further revising the checklist in the future.

More than ten years have passed between the development of the QUOROM Statement and its update, the PRISMA Statement. We aim to update PRISMA more frequently. We hope that the implementation of PRISMA will be better than it has been for QUOROM. There are at least two reasons to be optimistic. First, systematic reviews are increasingly used by health care providers to inform "best practice" patient care.
Policy analysts and managers are using systematic reviews to inform health care decision making and to better target future research. Second, we anticipate benefits from the development of the EQUATOR Network, described below.

Developing any reporting guideline requires considerable effort, experience, and expertise. While reporting guidelines have been successful for some individual efforts [17,18,19], there are likely others who want to develop reporting guidelines but who possess little time, experience, or knowledge of how to do so appropriately. The EQUATOR Network (Enhancing the QUAlity and Transparency Of health Research) aims to help such individuals and groups by serving as a global resource for anybody interested in developing reporting guidelines, regardless of the focus [7,180,182]. The overall goal of EQUATOR is to improve the quality of reporting of all health science research through the development and translation of reporting guidelines. Beyond this aim, the network plans to develop a large Web presence by developing and maintaining a resource center of reporting tools and other information for reporting research (http://www.equator-network.org/).

We encourage health care journals and editorial groups, such as the World Association of Medical Editors and the International Committee of Medical Journal Editors, to endorse PRISMA in much the same way as they have endorsed other reporting guidelines, such as CONSORT.
We also encourage editors of health care journals to support PRISMA by updating their "Instructions to Authors" and including the PRISMA Web address, and by raising awareness through specific editorial actions.

Supporting Information

Figure S1 Flow of information through the different phases of a systematic review (downloadable template document for researchers to re-use).
Found at: doi:10.1371/journal.pmed.1000100.s001 (0.08 MB DOC)

Text S1 Checklist of items to include when reporting a systematic review or meta-analysis (downloadable template document for researchers to re-use).
Found at: doi:10.1371/journal.pmed.1000100.s002 (0.04 MB DOC)

Acknowledgments

The following people contributed to this paper: Doug Altman, DSc, Centre for Statistics in Medicine (Oxford, UK); Gerd Antes, PhD, University Hospital Freiburg (Freiburg, Germany); David Atkins, MD, MPH, Health Services Research and Development Service, Veterans Health Administration (Washington, D.C., US); Virginia Barbour, MRCP, DPhil, PLoS Medicine (Cambridge, UK); Nick Barrowman, PhD, Children's Hospital of Eastern Ontario (Ottawa, Canada); Jesse A. Berlin, ScD, Johnson & Johnson Pharmaceutical Research and Development (Titusville, New Jersey, US); Jocalyn Clark, PhD, PLoS Medicine (at the time of writing, BMJ, London, UK); Mike Clarke, PhD, UK Cochrane Centre (Oxford, UK) and School of Nursing and Midwifery, Trinity College (Dublin, Ireland); Deborah Cook, MD, Departments of Medicine, Clinical Epidemiology and Biostatistics, McMaster University (Hamilton, Canada); Roberto D'Amico, PhD, Università di Modena e Reggio Emilia (Modena, Italy) and Centro Cochrane Italiano, Istituto Ricerche Farmacologiche Mario Negri (Milan, Italy); Jonathan J. Deeks, PhD, University of Birmingham (Birmingham, UK); P. J.
Devereaux, MD, PhD, Departments of Medicine, Clinical Epidemiology and Biostatistics, McMaster University (Hamilton, Canada); Kay Dickersin, PhD, Johns Hopkins Bloomberg School of Public Health (Baltimore, Maryland, US); Matthias Egger, MD, Department of Social and Preventive Medicine, University of Bern (Bern, Switzerland); Edzard Ernst, MD, PhD, FRCP, FRCP(Edin), Peninsula Medical School (Exeter, UK); Peter C. Gøtzsche, MD, MSc, The Nordic Cochrane Centre (Copenhagen, Denmark); Jeremy Grimshaw, MBChB, PhD, FRCFP, Ottawa Hospital Research Institute (Ottawa, Canada); Gordon Guyatt, MD, Departments of Medicine, Clinical Epidemiology and Biostatistics, McMaster University (Hamilton, Canada); Julian Higgins, PhD, MRC Biostatistics Unit (Cambridge, UK); John P. A. Ioannidis, MD, University of Ioannina Campus (Ioannina, Greece); Jos Kleijnen, MD, PhD, Kleijnen Systematic Reviews Ltd (York, UK) and School for Public Health and Primary Care (CAPHRI), University of Maastricht (Maastricht, Netherlands); Tom Lang, MA, Tom Lang Communications and Training (Davis, California, US); Alessandro Liberati, MD, Università di Modena e Reggio Emilia (Modena, Italy) and Centro Cochrane Italiano, Istituto Ricerche Farmacologiche Mario Negri (Milan, Italy); Nicola Magrini, MD, NHS Centre for the Evaluation of the Effectiveness of Health Care – CeVEAS (Modena, Italy); David McNamee, PhD, The Lancet (London, UK); Lorenzo Moja, MD, MSc, Centro Cochrane Italiano, Istituto Ricerche Farmacologiche Mario Negri (Milan, Italy); David Moher, PhD, Ottawa Methods Centre, Ottawa Hospital Research Institute (Ottawa, Canada); Cynthia Mulrow, MD, MSc, Annals of Internal Medicine (Philadelphia, Pennsylvania, US); Maryann Napoli, Center for Medical Consumers (New York, New York, US); Andy Oxman, MD, Norwegian Health Services Research Centre (Oslo, Norway); Ba' Pham, MMath, Toronto Health Economics and Technology Assessment Collaborative (Toronto, Canada) (at the time of the first meeting of the group, GlaxoSmithKline Canada, Mississauga, Canada); Drummond Rennie, MD, FRCP, FACP, University of California San Francisco (San Francisco, California, US); Margaret Sampson, MLIS, Children's Hospital of Eastern Ontario (Ottawa, Canada); Kenneth F. Schulz, PhD, MBA, Family Health International (Durham, North Carolina, US); Paul G. Shekelle, MD, PhD, Southern California Evidence Based Practice Center (Santa Monica, California, US); Jennifer Tetzlaff, BSc, Ottawa Methods Centre, Ottawa Hospital Research Institute (Ottawa, Canada); David Tovey, FRCGP, The Cochrane Library, Cochrane Collaboration (Oxford, UK) (at the time of the first meeting of the group, BMJ, London, UK); Peter Tugwell, MD, MSc, FRCPC, Institute of Population Health, University of Ottawa (Ottawa, Canada).

Dr. Lorenzo Moja helped with the preparation and the several updates of the manuscript and assisted with the preparation of the reference list. Alessandro Liberati is the guarantor of the manuscript.

Author Contributions

ICMJE criteria for authorship read and met: AL DGA JT CM PCG JPAI MC PJD JK DM. Wrote the first draft of the paper: AL DGA JT JPAI DM. Contributed to the writing of the paper: AL DGA JT CM PCG JPAI MC PJD JK DM. Concept and design of the Explanation and Elaboration statement: AL DGA JT DM. Agree with the recommendations: AL DGA JT CM PCG JPAI MC PJD JK DM.

References

1. Canadian Institutes of Health Research (2006) Randomized controlled trials registration/application checklist (12/2006). Available: http://www.cihr-irsc.gc.ca/e/documents/rct_reg_e.pdf. Accessed 26 May 2009.
2. Young C, Horton R (2005) Putting clinical trials into context. Lancet 366: 107–108.
3. Moher D, Tetzlaff J, Tricco AC, Sampson M, Altman DG (2007) Epidemiology and reporting characteristics of systematic reviews. PLoS Med 4: e78. doi:10.1371/journal.pmed.0040078.
4. Dixon E, Hameed M, Sutherland F, Cook DJ, Doig C (2005) Evaluating meta-analyses in the general surgical literature: A critical appraisal. Ann Surg 241: 450–459.
5. Hemels ME, Vicente C, Sadri H, Masson MJ, Einarson TR (2004) Quality assessment of meta-analyses of RCTs of pharmacotherapy in major depressive disorder. Curr Med Res Opin 20: 477–484.
6. Jin W, Yu R, Li W, Youping L, Ya L, et al. (2008) The reporting quality of meta-analyses improves: A random sampling study. J Clin Epidemiol 61: 770–775.
7. Moher D, Simera I, Schulz KF, Hoey J, Altman DG (2008) Helping editors, peer reviewers and authors improve the clarity, completeness and transparency of reporting health research. BMC Med 6: 13.
8. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, et al. (1999) Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 354: 1896–1900.
9. Green S, Higgins JPT, Alderson P, Clarke M, Mulrow CD, et al. (2008) Chapter 1: What is a systematic review? In: Higgins JPT, Green S, eds. Cochrane handbook for systematic reviews of interventions version 5.0.0 [updated February 2008]. The Cochrane Collaboration. Available: http://www.cochrane-handbook.org/. Accessed 26 May 2009.
10. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, et al. (2008) GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336: 924–926.
11. Higgins JPT, Altman DG (2008) Chapter 8: Assessing risk of bias in included studies. In: Higgins JPT, Green S, eds. Cochrane handbook for systematic reviews of interventions version 5.0.0 [updated February 2008]. The Cochrane Collaboration. Available: http://www.cochrane-handbook.org/. Accessed 26 May 2009.
12. Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2008) Preferred reporting items for systematic reviews and meta-analyses: The PRISMA Statement. PLoS Med 6: e1000097.
doi:10.1371/journal.pmed.1000097.
13. Atkins D, Fink K, Slutsky J (2005) Better information for better health care: The Evidence-based Practice Center program and the Agency for Healthcare Research and Quality. Ann Intern Med 142: 1035–1041.
14. Helfand M, Balshem H (2009) Principles for developing guidance: AHRQ and the effective health-care program. J Clin Epidemiol, In press.
15. Higgins JPT, Green S (2008) Cochrane handbook for systematic reviews of interventions version 5.0.0 [updated February 2008]. The Cochrane Collaboration. Available: http://www.cochrane-handbook.org/. Accessed 26 May 2009.
16. Centre for Reviews and Dissemination (2009) Systematic reviews: CRD's guidance for undertaking reviews in health care. York: University of York. Available: http://www.york.ac.uk/inst/crd/systematic_reviews_book.htm. Accessed 26 May 2009.
17. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, et al. (2001) The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Ann Intern Med 134: 663–694.
18. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, et al. (2003) The STARD statement for reporting studies of diagnostic accuracy: Explanation and elaboration. Clin Chem 49: 7–18.
19. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, et al. (2007) Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and elaboration. PLoS Med 4: e297. doi:10.1371/journal.pmed.0040297.
20. Barker A, Maratos EC, Edmonds L, Lim E (2007) Recurrence rates of video-assisted thoracoscopic versus open surgery in the prevention of recurrent pneumothoraces: A systematic review of randomised and non-randomised trials. Lancet 370: 329–335.
21. Bjelakovic G, Nikolova D, Gluud LL, Simonetti RG, Gluud C (2007) Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: Systematic review and meta-analysis. JAMA 297: 842–857.
22. Montori VM, Wilczynski NL, Morgan D, Haynes RB (2005) Optimal search strategies for retrieving systematic reviews from Medline: Analytical survey. BMJ 330: 68.
23. Bischoff-Ferrari HA, Willett WC, Wong JB, Giovannucci E, Dietrich T, et al. (2005) Fracture prevention with vitamin D supplementation: A meta-analysis of randomized controlled trials. JAMA 293: 2257–2264.
24. Hopewell S, Clarke M, Moher D, Wager E, Middleton P, et al. (2008) CONSORT for reporting randomised trials in journal and conference abstracts. Lancet 371: 281–283.
25. Hopewell S, Clarke M, Moher D, Wager E, Middleton P, et al. (2008) CONSORT for reporting randomized controlled trials in journal and conference abstracts: Explanation and elaboration. PLoS Med 5: e20. doi:10.1371/journal.pmed.0050020.
26. Haynes RB, Mulrow CD, Huth EJ, Altman DG, Gardner MJ (1990) More informative abstracts revisited. Ann Intern Med 113: 69–76.
27. Mulrow CD, Thacker SB, Pugh JA (1988) A proposal for more informative abstracts of review articles. Ann Intern Med 108: 613–615.
28. Froom P, Froom J (1993) Deficiencies in structured medical abstracts. J Clin Epidemiol 46: 591–594.
29. Hartley J (2000) Clarifying the abstracts of systematic literature reviews. Bull Med Libr Assoc 88: 332–337.
30. Hartley J, Sydes M, Blurton A (1996) Obtaining information accurately and quickly: Are structured abstracts more efficient? J Infor Sci 22: 349–356.
31. Pocock SJ, Hughes MD, Lee RJ (1987) Statistical problems in the reporting of clinical trials. A survey of three medical journals. N Engl J Med 317: 426–432.
32. Taddio A, Pain T, Fassos FF, Boon H, Ilersich AL, et al. (1994) Quality of nonstructured and structured abstracts of original research articles in the British Medical Journal, the Canadian Medical Association Journal and the Journal of the American Medical Association. CMAJ 150: 1611–1615.
33. Harris KC, Kuramoto LK, Schulzer M, Retallack JE (2009) Effect of school-based physical activity interventions on body mass index in children: A meta-analysis. CMAJ 180: 719–726.
34. James MT, Conley J, Tonelli M, Manns BJ, MacRae J, et al. (2008) Meta-analysis: Antibiotics for prophylaxis against hemodialysis catheter-related infections. Ann Intern Med 148: 596–605.
35. Counsell C (1997) Formulating questions and locating primary studies for inclusion in systematic reviews. Ann Intern Med 127: 380–387.
36. Gøtzsche PC (2000) Why we need a broad perspective on meta-analysis. It may be crucially important for patients. BMJ 321: 585–586.
37. Grossman P, Niemann L, Schmidt S, Walach H (2004) Mindfulness-based stress reduction and health benefits. A meta-analysis. J Psychosom Res 57: 35–43.
38. Brunton G, Green S, Higgins JPT, Kjeldstrøm M, Jackson N, et al. (2008) Chapter 2: Preparing a Cochrane review. In: Higgins JPT, Green S, eds. Cochrane handbook for systematic reviews of interventions version 5.0.0 [updated February 2008]. The Cochrane Collaboration. Available: http://www.cochrane-handbook.org/. Accessed 26 May 2009.
39. Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F (1998) Systematic reviews of trials and other studies. Health Technol Assess 2: 1–276.
40. Ioannidis JP, Rosenberg PS, Goedert JJ, O'Brien TR (2002) Commentary: Meta-analysis of individual participants' data in genetic epidemiology. Am J Epidemiol 156: 204–210.
41. Stewart LA, Clarke MJ (1995) Practical methodology of meta-analyses (overviews) using updated individual patient data. Cochrane Working Group. Stat Med 14: 2057–2079.
42. Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG (2004) Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA 291: 2457–2465.
43. Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan AW, et al.
(2008) Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3: e3081. doi:10.1371/journal.pone.0003081.
44. Silagy CA, Middleton P, Hopewell S (2002) Publishing protocols of systematic reviews: Comparing what was done to what was planned. JAMA 287: 2831–2834.
45. Centre for Reviews and Dissemination (2009) Research projects. York: University of York. Available: http://www.crd.york.ac.uk/crdweb. Accessed 26 May 2009.
46. The Joanna Briggs Institute (2009) Protocols & work in progress. Available: http://www.joannabriggs.edu.au/pubs/systematic_reviews_prot.php. Accessed 26 May 2009.
47. Bagshaw SM, McAlister FA, Manns BJ, Ghali WA (2006) Acetylcysteine in the prevention of contrast-induced nephropathy: A case study of the pitfalls in the evolution of evidence. Arch Intern Med 166: 161–166.
48. Biondi-Zoccai GG, Lotrionte M, Abbate A, Testa L, Remigi E, et al. (2006) Compliance with QUOROM and quality of reporting of overlapping meta-analyses on the role of acetylcysteine in the prevention of contrast associated nephropathy: Case study. BMJ 332: 202–209.
49. Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC (1987) Meta-analyses of randomized controlled trials. N Engl J Med 316: 450–455.
50. Schroth RJ, Hitchon CA, Uhanova J, Noreddin A, Taback SP, et al. (2004) Hepatitis B vaccination for patients with chronic renal failure. Cochrane Database Syst Rev Issue 3: CD003775. doi:10.1002/14651858.CD003775.pub2.
51. Egger M, Zellweger-Zahner T, Schneider M, Junker C, Lengeler C, et al. (1997) Language bias in randomised controlled trials published in English and German. Lancet 350: 326–329.
52. Gregoire G, Derderian F, Le Lorier J (1995) Selecting the language of the publications included in a meta-analysis: Is there a Tower of Babel bias? J Clin Epidemiol 48: 159–163.
53. Jüni P, Holenstein F, Sterne J, Bartlett C, Egger M (2002) Direction and impact of language bias in meta-analyses of controlled trials: Empirical study. Int J Epidemiol 31: 115–123.
54. Moher D, Pham B, Klassen TP, Schulz KF, Berlin JA, et al. (2000) What contributions do languages other than English make on the results of meta-analyses? J Clin Epidemiol 53: 964–972.
55. Pan Z, Trikalinos TA, Kavvoura FK, Lau J, Ioannidis JP (2005) Local literature bias in genetic epidemiology: An empirical evaluation of the Chinese literature. PLoS Med 2: e334. doi:10.1371/journal.pmed.0020334.
56. Hopewell S, McDonald S, Clarke M, Egger M (2007) Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database Syst Rev Issue 2: MR000010. doi:10.1002/14651858.MR000010.pub3.
57. Melander H, Ahlqvist-Rastad J, Meijer G, Beermann B (2003) Evidence b(i)ased medicine—Selective reporting from studies sponsored by pharmaceutical industry: Review of studies in new drug applications. BMJ 326: 1171–1173.
58. Sutton AJ, Duval SJ, Tweedie RL, Abrams KR, Jones DR (2000) Empirical assessment of effect of publication bias on meta-analyses. BMJ 320: 1574–1577.
59. Gøtzsche PC (2006) Believability of relative risks and odds ratios in abstracts: Cross sectional study. BMJ 333: 231–234.
60. Bhandari M, Devereaux PJ, Guyatt GH, Cook DJ, Swiontkowski MF, et al. (2002) An observational study of orthopaedic abstracts and subsequent full-text publications. J Bone Joint Surg Am 84-A: 615–621.
61. Rosmarakis ES, Soteriades ES, Vergidis PI, Kasiakou SK, Falagas ME (2005) From conference abstract to full paper: Differences between data presented in conferences and journals. FASEB J 19: 673–680.
62. Toma M, McAlister FA, Bialy L, Adams D, Vandermeer B, et al. (2006) Transition from meeting abstract to full-length journal article for randomized controlled trials. JAMA 295: 1281–1287.
63. Saunders Y, Ross JR, Broadley KE, Edmonds PM, Patel S (2004) Systematic review of bisphosphonates for hypercalcaemia of malignancy. Palliat Med 18: 418–431.
64. Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, et al. (2007) How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med 147: 224–233.
65. Bergerhoff K, Ebrahim S, Paletta G (2004) Do we need to consider 'in process citations' for search strategies? 12th Cochrane Colloquium; 2–6 October 2004; Ottawa, Ontario, Canada. Available: http://www.cochrane.org/colloquia/abstracts/ottawa/P-039.htm. Accessed 26 May 2009.
66. Zhang L, Sampson M, McGowan J (2006) Reporting of the role of expert searcher in Cochrane reviews. Evid Based Libr Info Pract 1: 3–16.
67. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R (2008) Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 358: 252–260.
68. Alejandria MM, Lansang MA, Dans LF, Mantaring JB (2002) Intravenous immunoglobulin for treating sepsis and septic shock. Cochrane Database Syst Rev Issue 1: CD001090. doi:10.1002/14651858.CD001090.
69. Golder S, McIntosh HM, Duffy S, Glanville J (2006) Developing efficient search strategies to identify reports of adverse effects in MEDLINE and EMBASE. Health Info Libr J 23: 3–12.
70. Sampson M, McGowan J, Cogo E, Grimshaw J, Moher D, et al. (2009) An evidence-based practice guideline for the peer review of electronic search strategies. J Clin Epidemiol, E-pub 2009 February 18.
71. Flores-Mir C, Major MP, Major PW (2006) Search and selection methodology of systematic reviews in orthodontics (2000–2004). Am J Orthod Dentofacial Orthop 130: 214–217.
72. Major MP, Major PW, Flores-Mir C (2006) An evaluation of search and selection methods used in dental systematic reviews published in English. J Am Dent Assoc 137: 1252–1257.
73. Major MP, Major PW, Flores-Mir C (2007) Benchmarking of reported search and selection methods of systematic reviews by dental speciality. Evid Based Dent 8: 66–70.
74. Shah MR, Hasselblad V, Stevenson LW, Binanay C, O'Connor CM, et al. (2005) Impact of the pulmonary artery catheter in critically ill patients: Meta-analysis of randomized clinical trials. JAMA 294: 1664–1670.
75. Edwards P, Clarke M, DiGuiseppi C, Pratap S, Roberts I, et al. (2002) Identification of randomized controlled trials in systematic reviews: Accuracy and reliability of screening records. Stat Med 21: 1635–1640.
76. Cooper HM, Ribble RG (1989) Influences on the outcome of literature searches for integrative research reviews. Knowledge 10: 179–201.
77. Mistiaen P, Poot E (2006) Telephone follow-up, initiated by a hospital-based health professional, for postdischarge problems in patients discharged from hospital to home. Cochrane Database Syst Rev Issue 4: CD004510. doi:10.1002/14651858.CD004510.pub3.
78. Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL (2005) High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. J Clin Epidemiol 58: 741–742.
79. Clarke M, Hopewell S, Juszczak E, Eisinga A, Kjeldstrom M (2006) Compression stockings for preventing deep vein thrombosis in airline passengers. Cochrane Database Syst Rev Issue 2: CD004002. doi:10.1002/14651858.CD004002.pub2.
80. Tramer MR, Reynolds DJ, Moore RA, McQuay HJ (1997) Impact of covert duplicate publication on meta-analysis: A case study. BMJ 315: 635–640.
81. von Elm E, Poglia G, Walder B, Tramer MR (2004) Different patterns of duplicate publication: An analysis of articles used in systematic reviews. JAMA 291: 974–980.
82. Gøtzsche PC (1989) Multiple publication of reports of drug trials. Eur J Clin Pharmacol 36: 429–432.
83. Allen C, Hopewell S, Prentice A (2005) Non-steroidal anti-inflammatory drugs for pain in women with endometriosis.
Cochrane Database Syst Rev Issue 4: CD004753. doi:10.1002/14651858.CD004753.pub2.
84. Glasziou P, Meats E, Heneghan C, Shepperd S (2008) What is missing from descriptions of treatment in trials and reviews? BMJ 336: 1472–1474.
85. Tracz MJ, Sideras K, Bolona ER, Haddad RM, Kennedy CC, et al. (2006) Testosterone use in men and its effects on bone health. A systematic review and meta-analysis of randomized placebo-controlled trials. J Clin Endocrinol Metab 91: 2011–2016.
86. Bucher HC, Hengstler P, Schindler C, Guyatt GH (2000) Percutaneous transluminal coronary angioplasty versus medical treatment for non-acute coronary heart disease: Meta-analysis of randomised controlled trials. BMJ 321: 73–77.
87. Gluud LL (2006) Bias in clinical intervention research. Am J Epidemiol 163: 493–501.
88. Pildal J, Hróbjartsson A, Jorgensen KJ, Hilden J, Altman DG, et al. (2007) Impact of allocation concealment on conclusions drawn from meta-analyses of randomized trials. Int J Epidemiol 36: 847–857.
89. Moja LP, Telaro E, D'Amico R, Moschetti I, Coe L, et al. (2005) Assessment of methodological quality of primary studies by systematic reviews: Results of the metaquality cross sectional study. BMJ 330: 1053.
90. Moher D, Jadad AR, Tugwell P (1996) Assessing the quality of randomized controlled trials. Current issues and future directions. Int J Technol Assess Health Care 12: 195–208.
91. Sanderson S, Tatt ID, Higgins JP (2007) Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: A systematic review and annotated bibliography. Int J Epidemiol 36: 666–676.
92. Greenland S (1994) Invited commentary: A critical look at some popular meta-analytic methods. Am J Epidemiol 140: 290–296.
93. Jüni P, Altman DG, Egger M (2001) Systematic reviews in health care: Assessing the quality of controlled clinical trials. BMJ 323: 42–46.
94. Kunz R, Oxman AD (1998) The unpredictability paradox: Review of empirical comparisons of randomised and non-randomised clinical trials. BMJ 317: 1185–1190.
95. Balk EM, Bonis PA, Moskowitz H, Schmid CH, Ioannidis JP, et al. (2002) Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 287: 2973–2982.
96. Devereaux PJ, Beattie WS, Choi PT, Badner NH, Guyatt GH, et al. (2005) How strong is the evidence for the use of perioperative beta blockers in non-cardiac surgery? Systematic review and meta-analysis of randomised controlled trials. BMJ 331: 313–321.
97. Devereaux PJ, Bhandari M, Montori VM, Manns BJ, Ghali WA, et al. (2002) Double blind, you are the weakest link—Good-bye! ACP J Club 136: A11.
98. van Nieuwenhoven CA, Buskens E, van Tiel FH, Bonten MJ (2001) Relationship between methodological trial quality and the effects of selective digestive decontamination on pneumonia and mortality in critically ill patients. JAMA 286: 335–340.
99. Guyatt GH, Cook D, Devereaux PJ, Meade M, Straus S (2002) Therapy. Users' guides to the medical literature. AMA Press. pp 55–79.
100. Sackett DL, Gent M (1979) Controversy in counting and attributing events in clinical trials. N Engl J Med 301: 1410–1412.
101. Montori VM, Devereaux PJ, Adhikari NK, Burns KE, Eggert CH, et al. (2005) Randomized trials stopped early for benefit: A systematic review. JAMA 294: 2203–2209.
102. Guyatt GH, Devereaux PJ (2002) Therapy and validity: The principle of intention-to-treat. In: Guyatt GH, Rennie DR, eds. Users' guides to the medical literature. AMA Press. pp 267–273.
103. Berlin JA (1997) Does blinding of readers affect the results of meta-analyses? University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 350: 185–186.
104. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, et al.
(1996) Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Control Clin Trials 17: 1–12.
105. Pittas AG, Siegel RD, Lau J (2004) Insulin therapy for critically ill hospitalized patients: A meta-analysis of randomized controlled trials. Arch Intern Med 164: 2005–2011.
106. Lakhdar R, Al-Mallah MH, Lanfear DE (2008) Safety and tolerability of angiotensin-converting enzyme inhibitor versus the combination of angiotensin-converting enzyme inhibitor and angiotensin receptor blocker in patients with left ventricular dysfunction: A systematic review and meta-analysis of randomized controlled trials. J Card Fail 14: 181–188.
107. Bobat R, Coovadia H, Stephen C, Naidoo KL, McKerrow N, et al. (2005) Safety and efficacy of zinc supplementation for children with HIV-1 infection in South Africa: A randomised double-blind placebo-controlled trial. Lancet 366: 1862–1867.
108. Deeks JJ, Altman DG (2001) Effect measures for meta-analysis of trials with binary outcomes. In: Egger M, Smith GD, Altman DG, eds. Systematic reviews in healthcare: Meta-analysis in context. 2nd edition. London: BMJ Publishing Group.
109. Deeks JJ (2002) Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Stat Med 21: 1575–1600.
110. Engels EA, Schmid CH, Terrin N, Olkin I, Lau J (2000) Heterogeneity and statistical significance in meta-analysis: An empirical study of 125 meta-analyses. Stat Med 19: 1707–1728.
111. Tierney JF, Stewart LA, Ghersi D, Burdett S, Sydes MR (2007) Practical methods for incorporating summary time-to-event data into meta-analysis. Trials 8: 16.
112. Michiels S, Piedbois P, Burdett S, Syz N, Stewart L, et al. (2005) Meta-analysis when only the median survival times are known: A comparison with individual patient data results. Int J Technol Assess Health Care 21: 119–125.
113. Briel M, Studer M, Glass TR, Bucher HC (2004) Effects of statins on stroke prevention in patients with and without coronary heart disease: A meta-analysis of randomized controlled trials. Am J Med 117: 596–606.
114. Jones M, Schenkel B, Just J, Fallowfield L (2004) Epoetin alfa improves quality of life in patients with cancer: Results of metaanalysis. Cancer 101: 1720–1732.
115. Elbourne DR, Altman DG, Higgins JP, Curtin F, Worthington HV, et al. (2002) Meta-analyses involving cross-over trials: Methodological issues. Int J Epidemiol 31: 140–149.
116. Follmann D, Elliott P, Suh I, Cutler J (1992) Variance imputation for overviews of clinical trials with continuous response. J Clin Epidemiol 45: 769–773.
117. Wiebe N, Vandermeer B, Platt RW, Klassen TP, Moher D, et al. (2006) A systematic review identifies a lack of standardization in methods for handling missing variance data. J Clin Epidemiol 59: 342–353.
118. Hróbjartsson A, Gøtzsche PC (2004) Placebo interventions for all clinical conditions. Cochrane Database Syst Rev Issue 2: CD003974. doi:10.1002/14651858.CD003974.pub2.
119. Shekelle PG, Morton SC, Maglione M, Suttorp M, Tu W, et al. (2004) Pharmacological and surgical treatment of obesity. Evid Rep Technol Assess (Summ). pp 1–6.
120. Chan AW, Altman DG (2005) Identifying outcome reporting bias in randomised trials on PubMed: Review of publications and survey of authors. BMJ 330: 753.
121. Williamson PR, Gamble C (2005) Identification and impact of outcome selection bias in meta-analysis. Stat Med 24: 1547–1561.
122. Williamson PR, Gamble C, Altman DG, Hutton JL (2005) Outcome selection bias in meta-analysis. Stat Methods Med Res 14: 515–524.
123. Ioannidis JP, Trikalinos TA (2007) The appropriateness of asymmetry tests for publication bias in meta-analyses: A large survey. CMAJ 176: 1091–1096.
124. Briel M, Schwartz GG, Thompson PL, de Lemos JA, Blazing MA, et al.
(2006) Effects of early treatment with statins on short-term clinical outcomes in acute coronary syndromes: A meta-analysis of randomized controlled trials. JAMA 295: 2046–2056.\n125. Song F, Eastwood AJ, Gilbody S, Duley L, Sutton AJ (2000) Publication and related biases. Health Technol Assess 4: 1–115.\n126. Schmid CH, Stark PC, Berlin JA, Landais P, Lau J (2004) Meta-regression detected associations between heterogeneous treatment effects and study-level, but not patient-level, factors. J Clin Epidemiol 57: 683–697.\n127. Higgins JP, Thompson SG (2004) Controlling the risk of spurious findings from meta-regression. Stat Med 23: 1663–1682.\n128. Thompson SG, Higgins JP (2005) Treating individuals 4: Can meta-analysis help target interventions at individuals most likely to benefit? Lancet 365: 341–346.\n129. Uitterhoeve RJ, Vernooy M, Litjens M, Potting K, Bensing J, et al. (2004) Psychosocial interventions for patients with advanced cancer—A systematic review of the literature. Br J Cancer 91: 1050–1062.\n130. Fuccio L, Minardi ME, Zagari RM, Grilli D, Magrini N, et al. (2007) Meta-analysis: Duration of first-line proton-pump inhibitor based triple therapy for Helicobacter pylori eradication. Ann Intern Med 147: 553–562.\n131. Egger M, Smith GD (1998) Bias in location and selection of studies. BMJ 316: 61–66.\n132. Ravnskov U (1992) Cholesterol lowering trials in coronary heart disease: Frequency of citation and outcome. BMJ 305: 15–19.\n133. Hind D, Booth A (2007) Do health technology assessments comply with QUOROM diagram guidance? An empirical study. BMC Med Res Methodol 7: 49.\n134. Curioni C, Andre C (2006) Rimonabant for overweight or obesity. Cochrane Database Syst Rev Issue 4: CD006162. doi:10.1002/14651858.CD006162.pub2.\n135. DeCamp LR, Byerley JS, Doshi N, Steiner MJ (2008) Use of antiemetic agents in acute gastroenteritis: A systematic review and meta-analysis. Arch Pediatr Adolesc Med 162: 858–865.\n136. 
Pakos EE, Ioannidis JP (2004) Radiotherapy vs. nonsteroidal anti-inflammatory drugs for the prevention of heterotopic ossification after major hip procedures: A meta-analysis of randomized trials. Int J Radiat Oncol Biol Phys 60: 888–895.\n137. Skalsky K, Yahav D, Bishara J, Pitlik S, Leibovici L, et al. (2008) Treatment of human brucellosis: Systematic review and meta-analysis of randomised controlled trials. BMJ 336: 701–704.\n138. Altman DG, Cates C (2001) The need for individual trial results in reports of systematic reviews. BMJ. Rapid response.\n139. Gotzsche PC, Hrobjartsson A, Maric K, Tendal B (2007) Data extraction errors in meta-analyses that use standardized mean differences. JAMA 298: 430–437.\n140. Lewis S, Clarke M (2001) Forest plots: Trying to see the wood and the trees. BMJ 322: 1479–1480.\n141. Papanikolaou PN, Ioannidis JP (2004) Availability of large-scale evidence on specific harms from systematic reviews of randomized trials. Am J Med 117: 582–589.\n142. Duffett M, Choong K, Ng V, Randolph A, Cook DJ (2007) Surfactant therapy for acute respiratory failure in children: A systematic review and meta-analysis. Crit Care 11: R66.\n143. Balk E, Raman G, Chung M, Ip S, Tatsioni A, et al. (2006) Effectiveness of management strategies for renal artery stenosis: A systematic review. Ann Intern Med 145: 901–912.\n144. Palfreyman S, Nelson EA, Michaels JA (2007) Dressings for venous leg ulcers: Systematic review and meta-analysis. BMJ 335: 244.\n145. Ioannidis JP, Patsopoulos NA, Evangelou E (2007) Uncertainty in heterogeneity estimates in meta-analyses. BMJ 335: 914–916.\n146. Appleton KM, Hayward RC, Gunnell D, Peters TJ, Rogers PJ, et al. (2006) Effects of n-3 long-chain polyunsaturated fatty acids on depressed mood: Systematic review of published trials. Am J Clin Nutr 84: 1308–1316.\n147. Kirsch I, Deacon BJ, Huedo-Medina TB, Scoboria A, Moore TJ, et al. 
(2008) Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration. PLoS Med 5: e45. doi:10.1371/journal.pmed.0050045.\n148. Reichenbach S, Sterchi R, Scherer M, Trelle S, Burgi E, et al. (2007) Meta-analysis: Chondroitin for osteoarthritis of the knee or hip. Ann Intern Med 146: 580–590.\n149. Hodson EM, Craig JC, Strippoli GF, Webster AC (2008) Antiviral medications for preventing cytomegalovirus disease in solid organ transplant recipients. Cochrane Database Syst Rev Issue 2: CD003774. doi:10.1002/14651858.CD003774.pub3.\n150. Thompson SG, Higgins JP (2002) How should meta-regression analyses be undertaken and interpreted? Stat Med 21: 1559–1573.\n151. Chan AW, Krleza-Jeric K, Schmid I, Altman DG (2004) Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ 171: 735–740.\n152. Hahn S, Williamson PR, Hutton JL, Garner P, Flynn EV (2000) Assessing the potential for bias in meta-analysis due to selective reporting of subgroup analyses within studies. Stat Med 19: 3325–3336.\n153. Green LW, Glasgow RE (2006) Evaluating the relevance, generalization, and applicability of research: Issues in external validation and translation methodology. Eval Health Prof 29: 126–153.\n154. Liberati A, D’Amico R, Pifferi, Torri V, Brazzi L (2004) Antibiotic prophylaxis to reduce respiratory tract infections and mortality in adults receiving intensive care. Cochrane Database Syst Rev Issue 1: CD000022. doi:10.1002/14651858.CD000022.pub2.\n155. Gonzalez R, Zamora J, Gomez-Camarero J, Molinero LM, Banares R, et al. (2008) Meta-analysis: Combination endoscopic and drug therapy to prevent variceal rebleeding in cirrhosis. Ann Intern Med 149: 109–122.\n156. D’Amico R, Pifferi S, Leonetti C, Torri V, Tinazzi A, et al. (1998) Effectiveness of antibiotic prophylaxis in critically ill adult patients: Systematic review of randomised controlled trials. BMJ 316: 1275–1285.\n157. 
Olsen O, Middleton P, Ezzo J, Gotzsche PC, Hadhazy V, et al. (2001) Quality of Cochrane reviews: Assessment of sample from 1998. BMJ 323: 829–832.\n158. Hopewell S, Wolfenden L, Clarke M (2008) Reporting of adverse events in systematic reviews can be improved: Survey results. J Clin Epidemiol 61: 597–602.\nPLoS Medicine | www.plosmedicine.org 26 July 2009 | Volume 6 | Issue 7 | e1000100\n159. Cook DJ, Reeve BK, Guyatt GH, Heyland DK, Griffith LE, et al. (1996) Stress ulcer prophylaxis in critically ill patients. Resolving discordant meta-analyses. JAMA 275: 308–314.\n160. Jadad AR, Cook DJ, Browman GP (1997) A guide to interpreting discordant systematic reviews. CMAJ 156: 1411–1416.\n161. Clarke L, Clarke M, Clarke T (2007) How useful are Cochrane reviews in identifying research needs? J Health Serv Res Policy 12: 101–103.\n162. [No authors listed] (2000) World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects. JAMA 284: 3043–3045.\n163. Clarke M, Hopewell S, Chalmers I (2007) Reports of clinical trials should begin and end with up-to-date systematic reviews of other relevant evidence: A status report. J R Soc Med 100: 187–190.\n164. Dube C, Rostom A, Lewin G, Tsertsvadze A, Barrowman N, et al. (2007) The use of aspirin for primary prevention of colorectal cancer: A systematic review prepared for the U.S. Preventive Services Task Force. Ann Intern Med 146: 365–375.\n165. Critchley J, Bates I (2005) Haemoglobin colour scale for anaemia diagnosis where there is no laboratory: A systematic review. Int J Epidemiol 34: 1425–1434.\n166. Lexchin J, Bero LA, Djulbegovic B, Clark O (2003) Pharmaceutical industry sponsorship and research outcome and quality: Systematic review. BMJ 326: 1167–1170.\n167. Als-Nielsen B, Chen W, Gluud C, Kjaergard LL (2003) Association of funding and conclusions in randomized drug trials: A reflection of treatment effect or adverse events? JAMA 290: 921–928.\n168. 
Peppercorn J, Blood E, Winer E, Partridge A (2007) Association between pharmaceutical involvement and outcomes in breast cancer clinical trials. Cancer 109: 1239–1246.\n169. Yank V, Rennie D, Bero LA (2007) Financial ties and concordance between results and conclusions in meta-analyses: Retrospective cohort study. BMJ 335: 1202–1205.\n170. Jorgensen AW, Hilden J, Gøtzsche PC (2006) Cochrane reviews compared with industry supported meta-analyses and other meta-analyses of the same drugs: Systematic review. BMJ 333: 782.\n171. Gotzsche PC, Hrobjartsson A, Johansen HK, Haahr MT, Altman DG, et al. (2007) Ghost authorship in industry-initiated randomised trials. PLoS Med 4: e19. doi:10.1371/journal.pmed.0040019.\n172. Akbari A, Mayhew A, Al-Alawi M, Grimshaw J, Winkens R, et al. (2008) Interventions to improve outpatient referrals from primary care to secondary care. Cochrane Database Syst Rev Issue 2: CD005471. doi:10.1002/14651858.CD005471.pub2.\n173. Davies P, Boruch R (2001) The Campbell Collaboration. BMJ 323: 294–295.\n174. Pawson R, Greenhalgh T, Harvey G, Walshe K (2005) Realist review—A new method of systematic review designed for complex policy interventions. J Health Serv Res Policy 10(Suppl 1): 21–34.\n175. Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O, et al. (2005) Storylines of research in diffusion of innovation: A meta-narrative approach to systematic review. Soc Sci Med 61: 417–430.\n176. Lumley T (2002) Network meta-analysis for indirect treatment comparisons. Stat Med 21: 2313–2324.\n177. Salanti G, Higgins JP, Ades AE, Ioannidis JP (2008) Evaluation of networks of randomized trials. Stat Methods Med Res 17: 279–301.\n178. Altman DG, Moher D (2005) [Developing guidelines for reporting healthcare research: Scientific rationale and procedures.]. Med Clin (Barc) 125(Suppl 1): 8–13.\n179. Delaney A, Bagshaw SM, Ferland A, Manns B, Laupland KB, et al. 
(2005) A systematic evaluation of the quality of meta-analyses in the critical care literature. Crit Care 9: R575–582.\n180. Altman DG, Simera I, Hoey J, Moher D, Schulz K (2008) EQUATOR: Reporting guidelines for health research. Lancet 371: 1149–1150.\n181. Plint AC, Moher D, Morrison A, Schulz K, Altman DG, et al. (2006) Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust 185: 263–267.\n182. Simera I, Altman DG, Moher D, Schulz KF, Hoey J (2008) Guidelines for reporting health research: The EQUATOR network’s survey of guideline authors. PLoS Med 5: e139. doi:10.1371/journal.pmed.0050139.\n183. Last JM (2001) A dictionary of epidemiology. Oxford: Oxford University Press & International Epidemiological Association.\n184. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC (1992) A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 268: 240–248.\n185. Oxman AD, Guyatt GH (1993) The science of reviewing research. Ann N Y Acad Sci 703: 125–133; discussion 133–134.\n186. O’Connor D, Green S, Higgins JPT (2008) Chapter 5: Defining the review question and developing criteria for including studies. In: Higgins JPT, Green S, eds. Cochrane handbook for systematic reviews of interventions version 5.0.0 [updated February 2008]. The Cochrane Collaboration, Available: http://www.cochrane-handbook.org/. Accessed 26 May 2009.\n187. McDonagh M, Whiting P, Bradley M, Cooper J, Sutton A, et al. (2000) A systematic review of public water fluoridation. Protocol changes (Appendix M). NHS Centre for Reviews and Dissemination. York: University of York, Available: http://www.york.ac.uk/inst/crd/pdf/appm.pdf. Accessed 26 May 2009.\n188. Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, et al. (1999) Assessing the quality of reports of randomised trials: Implications for the conduct of meta-analyses. 
Health Technol Assess 3: i–iv, 1–98.\n189. Devereaux PJ, Choi PT, El-Dika S, Bhandari M, Montori VM, et al. (2004) An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol 57: 1232–1236.\n190. Soares HP, Daniels S, Kumar A, Clarke M, Scott C, et al. (2004) Bad reporting does not mean bad methods for randomised trials: Observational study of randomised controlled trials performed by the Radiation Therapy Oncology Group. BMJ 328: 22–24.\n191. Liberati A, Himel HN, Chalmers TC (1986) A quality assessment of randomized control trials of primary treatment of breast cancer. J Clin Oncol 4: 942–951.\n192. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, et al. (1995) Assessing the quality of randomized controlled trials: An annotated bibliography of scales and checklists. Control Clin Trials 16: 62–73.\n193. Greenland S, O’Rourke K (2001) On the bias produced by quality scores in meta-analysis, and a hierarchical view of proposed solutions. Biostatistics 2: 463–471.\n194. Jüni P, Witschi A, Bloch R, Egger M (1999) The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 282: 1054–1060.\n195. Fleiss JL (1993) The statistical basis of meta-analysis. Stat Methods Med Res 2: 121–145.\n196. Villar J, Mackey ME, Carroli G, Donner A (2001) Meta-analyses in systematic reviews of randomized controlled trials in perinatal medicine: Comparison of fixed and random effects models. Stat Med 20: 3635–3647.\n197. Lau J, Ioannidis JP, Schmid CH (1998) Summing up evidence: One answer is not always enough. Lancet 351: 123–127.\n198. DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7: 177–188.\n199. Hunter JE, Schmidt FL (2000) Fixed effects vs. random effects meta-analysis models: Implications for cumulative research knowledge. Int J Sel Assess 8: 275–292.\n200. 
Deeks JJ, Altman DG, Bradburn MJ (2001) Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, eds. Systematic reviews in healthcare: Meta-analysis in context. London: BMJ Publishing Group. pp 285–312.\n201. Warn DE, Thompson SG, Spiegelhalter DJ (2002) Bayesian random effects meta-analysis of trials with binary outcomes: Methods for the absolute risk difference and relative risk scales. Stat Med 21: 1601–1623.\n202. Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327: 557–560.\n203. Higgins JP, Thompson SG (2002) Quantifying heterogeneity in a meta-analysis. Stat Med 21: 1539–1558.\n204. Huedo-Medina TB, Sanchez-Meca J, Marin-Martinez F, Botella J (2006) Assessing heterogeneity in meta-analysis: Q statistic or I2 index? Psychol Methods 11: 193–206.\n205. Thompson SG, Turner RM, Warn DE (2001) Multilevel models for meta-analysis, and their application to absolute risk differences. Stat Methods Med Res 10: 375–392.\n206. Dickersin K (2005) Publication bias: Recognising the problem, understanding its origin and scope, and preventing harm. In: Rothstein HR, Sutton AJ, Borenstein M, eds. Publication bias in meta-analysis—Prevention, assessment and adjustments. West Sussex: John Wiley & Sons. 356 p.\n207. Scherer RW, Langenberg P, von Elm E (2007) Full publication of results initially presented in abstracts. Cochrane Database Syst Rev Issue 2: MR000005. doi:10.1002/14651858.MR000005.pub3.\n208. Krzyzanowska MK, Pintilie M, Tannock IF (2003) Factors associated with failure to publish large randomized trials presented at an oncology meeting. JAMA 290: 495–501.\n209. Hopewell S, Clarke M (2001) Methodologists and their methods. Do methodologists write up their conference presentations or is it just 15 minutes of fame? Int J Technol Assess Health Care 17: 601–603.\n210. 
Ghersi D (2006) Issues in the design, conduct and reporting of clinical trials that impact on the quality of decision making. PhD thesis. Sydney: School of Public Health, Faculty of Medicine, University of Sydney.\n211. von Elm E, Rollin A, Blumle A, Huwiler K, Witschi M, et al. (2008) Publication and non-publication of clinical trials: Longitudinal study of applications submitted to a research ethics committee. Swiss Med Wkly 138: 197–203.\n212. Sterne JA, Egger M (2001) Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. J Clin Epidemiol 54: 1046–1055.\n213. Harbord RM, Egger M, Sterne JA (2006) A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat Med 25: 3443–3457.\n214. Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L (2006) Comparison of two methods to detect publication bias in meta-analysis. JAMA 295: 676–680.\n215. Rothstein HR, Sutton AJ, Borenstein M (2005) Publication bias in meta-analysis: Prevention, assessment and adjustments. West Sussex: John Wiley & Sons.\n216. Lau J, Ioannidis JP, Terrin N, Schmid CH, Olkin I (2006) The case of the misleading funnel plot. BMJ 333: 597–600.\n217. Terrin N, Schmid CH, Lau J (2005) In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. J Clin Epidemiol 58: 894–901.\n218. Egger M, Davey Smith G, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315: 629–634.\n219. Ioannidis JP, Trikalinos TA (2007) An exploratory test for an excess of significant findings. Clin Trials 4: 245–253.\n220. Sterne JAC, Egger M, Moher D (2008) Chapter 10: Addressing reporting biases. In: Higgins JPT, Green S, eds. Cochrane handbook for systematic reviews of interventions version 5.0.0 [updated February 2008]. 
The CochraneCollaboration, Available: http://www.cochrane-handbook.org/. Accessed 26May 2009.\nPLoS Medicine | www.plosmedicine.org 28 July 2009 | Volume 6 | Issue 7 | e1000100" -} -{ - "pm_id": "https://pubmed.ncbi.nlm.nih.gov/18436948", - "pdf_text": "924 BMJ | 26 April 2008 | Volu Me 336ANALYSIS\nadvantages and disadvantages but also by their confi -\ndence in these estimates. The cartoon depicting the \nweather forecaster’s uncertainty captures the difference \nbetween an assessment of the likelihood of an outcome \nand the confidence in that assessment (figure). The use -\nfulness of an estimate of the magnitude of intervention \neffects depends on our confidence in that estimate.\nExpert clinicians and organisations offering recom -\nmendations to the clinical community have often erred \nas a result of not taking sufficient account of the quality \nof evidence.2 For a decade, organisations recommended \nthat clinicians encourage postmenopausal women to use \nhormone replacement therapy.3 Many primary care phy -\nsicians dutifully applied this advice in their practices.\nA belief that such therapy substantially decreased \nwomen’s cardiovascular risk drove this recommenda -\ntion. Had a rigorous system of rating the quality of evi -\ndence been applied at the time, it would have shown \nthat because the data came from observational studies \nwith inconsistent results, the evidence for a reduction in \n cardiovascular risk was of very low quality.4 Recognition \nof the limitations of the evidence would have tempered \nthe recommendations. Ultimately, randomised \n controlled trials have shown that hormone replacement \ntherapy fails to reduce cardiovascular risk and may even \nincrease it.5 6\nThe US Food and Drug Administration licensed the \nantiarrhythmic agents encainide and flecainide for use \nin patients on the basis of the drugs’ ability to reduce \nasymptomatic ventricular arrhythmias associated with \nsudden death. 
This decision failed to acknowledge that because arrhythmia reduction reflected only indirectly on the outcome of sudden death the quality of the evidence for the drugs’ benefit was of low quality. Subsequently, a randomised controlled trial showed that the two drugs increase the risk of sudden death.7 Appropriate attention to the low quality of the evidence would have saved thousands of lives.\nFailure to recognise high quality evidence can cause similar problems. For instance, expert recommendations lagged a decade behind the evidence from well conducted randomised controlled trials that thrombolytic therapy achieved a reduction in mortality in myocardial infarction.8\nInsufficient attention to quality of evidence risks inappropriate guidelines and recommendations that may lead clinicians to act to the detriment of their \nGuideline developers around the world are inconsistent in how they rate quality of evidence and grade strength of recommendations. As a result, guideline users face challenges in understanding the messages that grading systems try to communicate. Since 2006 the BMJ has requested in its “Instructions to Authors” on bmj.com that authors should preferably use the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for grading evidence when submitting a clinical guidelines article. What was behind this decision?\nIn this first in a series of five articles we will explain why many organisations use formal systems to grade evidence and recommendations and why this is important for clinicians; we will focus on the GRADE approach to recommendations. In the next two articles we will examine how the GRADE system categorises quality of evidence and strength of recommendations. 
\nThe final two articles will focus on recommendations for diagnostic tests and GRADE’s framework for tackling the impact of interventions on use of resources.\nGRADE has advantages over previous rating systems (box 1). Other systems share some of these advantages, but none, other than GRADE, combines them all.1\nWhat is “quality of evidence” and why is it important?\nIn making healthcare management decisions, patients and clinicians must weigh up the benefits and downsides of alternative strategies. Decision makers will be influenced not only by the best estimates of the expected \nGordon H Guyatt, professor, Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada L8N 3Z5\nAndrew D Oxman, researcher, Norwegian Knowledge Centre for the Health Services, PO Box 7004, St Olavs Plass, 0130 Oslo, Norway\nGunn E Vist, researcher, Norwegian Knowledge Centre for the Health Services, PO Box 7004, St Olavs Plass, 0130 Oslo, Norway\nRegina Kunz, associate professor, Basel Institute of Clinical Epidemiology, University Hospital Basel, Hebelstrasse 10, 4031 Basel, Switzerland\nYngve Falck-Ytter, assistant professor, Division of Gastroenterology, Case Medical Center, Case Western Reserve University, Cleveland, OH 44106, USA\nPablo Alonso-Coello, researcher, Iberoamerican Cochrane Center, Servicio de Epidemiología Clínica y Salud Pública (Universidad Autónoma de Barcelona), Hospital de Sant Pau, Barcelona 08041, Spain\nHolger J Schünemann, professor, Department of Epidemiology, Italian National Cancer Institute Regina Elena, Rome, Italy\nfor the GRADE Working Group\nCorrespondence to: G H Guyatt, CLARITY Research Group, Department of Clinical Epidemiology and Biostatistics, Room 2C12, 1200 Main Street, West Hamilton, ON, Canada L8N 3Z5 guyatt@mcmaster.ca\nGuidelines are inconsistent in how they rate the quality of evidence and the strength of 
\nrecommendations. This article explores the advantages of the GRADE system, which is increasingly being adopted by organisations worldwide\nGRADE: an emerging consensus on rating quality of evidence and strength of recommendations\nRATING QUALITY OF EVIDENCE AND STRENGTH OF RECOMMENDATIONS\nThis is the first in a series of five articles that explain the GRADE system for rating the quality of evidence and strength of recommendations.\nBox 1 | Advantages of GRADE over other systems\n• Developed by a widely representative group of international guideline developers\n• Clear separation between quality of evidence and strength of recommendations\n• Explicit evaluation of the importance of outcomes of alternative management strategies\n• Explicit, comprehensive criteria for downgrading and upgrading quality of evidence ratings\n• Transparent process of moving from evidence to recommendations\n• Explicit acknowledgment of values and preferences\n• Clear, pragmatic interpretation of strong versus weak recommendations for clinicians, patients, and policy makers\n• Useful for systematic reviews and health technology assessments, as well as guidelines\nindicate whether (a) the evidence is high quality and the desirable effects clearly outweigh the undesirable effects, or (b) there is a close or uncertain balance. A simple, transparent grading of the recommendation can effectively convey this key information.\nThere are limitations to formal grading of recommendations. Like the quality of evidence, the balance between desirable and undesirable effects reflects a continuum. 
Some arbitrariness will therefore be associated with placing particular recommendations in categories such as “strong” and “weak.” Most organisations producing guidelines have decided that the merits of an explicit grade of recommendation outweigh the disadvantages.\nWhat makes a good grading system?\nNot all grading systems separate decisions regarding the quality of evidence from strength of recommendations. Those that fail to do so create confusion. High quality evidence doesn’t necessarily imply strong recommendations, and strong recommendations can arise from low quality evidence.\nFor example, patients who experience a first deep venous thrombosis with no obvious provoking factor must, after the first months of anticoagulation, decide whether to continue taking warfarin long term. High quality randomised controlled trials show that continuing warfarin will decrease the risk of recurrent thrombosis but at the cost of increased risk of bleeding and inconvenience. Because patients with varying values and preferences will make different choices, guideline panels addressing whether patients should continue or terminate warfarin should—despite the high quality evidence—offer a weak recommendation.\nConsider the decision to administer aspirin or paracetamol (acetaminophen) to children with chicken pox. Observational studies have observed an association between aspirin administration and Reye’s syndrome.9 Because aspirin and paracetamol are similar in their analgesic and antipyretic effects, the low quality evidence regarding the association between aspirin and Reye’s syndrome does not preclude a strong recommendation for paracetamol.\nSystems that classify “expert opinion” as a category of evidence also create confusion. Judgment is necessary for interpretation of all evidence, whether that evidence is high or low quality. 
Expert reports of their clinical experience should be explicitly labelled as very low quality evidence, along with case reports and other uncontrolled clinical observations.\nGrading systems that are simple with respect to judgments both about the quality of the evidence and the strength of recommendations facilitate use by patients, clinicians, and policy makers.1 Detailed and explicit criteria for ratings of quality and grading of strength will make judgments more transparent to those using guidelines and recommendations.\nAlthough many grading systems to some extent meet these criteria,1 a plethora of systems makes their use difficult for frontline clinicians. Understanding a variety of systems is neither an efficient nor a realistic use of clinicians’ time. The GRADE system is used \npatients. Recognising the quality of evidence will help to prevent these errors.\nHow should guideline developers alert clinicians to quality of evidence?\nA formal system that categorises quality of evidence— for example, from high to very low—represents an obvious strategy for conveying quality of evidence to clinicians. Some limitations, however, do exist. Quality of evidence is a continuum; any discrete categorisation involves some degree of arbitrariness. Nevertheless, advantages of simplicity, transparency, and vividness outweigh these limitations.\nWhat is “strength of recommendation” and why is it important?\nA recommendation to offer patients a particular treatment may arise from large, rigorous randomised controlled trials that show consistent impressive benefits with few side effects and minimal inconvenience and cost. Such is the case with using a short course of oral steroids in patients with exacerbations of asthma. 
\nClinicians can offer such treatments to almost all their patients with little or no hesitation.\nAlternatively, treatment recommendations may arise from observational studies and may involve appreciable harms, burdens, or costs. Deciding whether to use antithrombotic therapy in pregnant women with prosthetic heart valves involves weighing the magnitude of reduction in valve thrombosis against inconvenience, cost, and risk of teratogenesis. Clinicians offering such treatments must help patients to weigh up the desirable and undesirable effects carefully according to their values and preferences.\nGuidelines and recommendations must therefore \nDetails of the GRADE working group, contributors, and competing interests appear in the version on bmj.com\n1. Atkins D, Eccles M, Flottorp S, Guyatt GH, Henry D, Hill S, et al. Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches. The GRADE Working Group. BMC Health Serv Res 2004;4(1):38.\n2. Lacchetti C, Guyatt G. Surprising results of randomized trials. In: Guyatt G, Drummond R, eds. Users’ guides to the medical literature: a manual of evidence-based clinical practice. Chicago, IL: AMA Press, 2002.\n3. American College of Physicians. Guidelines for counseling postmenopausal women about preventive hormone therapy. Ann Intern Med 1992;117:1038-41.\n4. Humphrey LL, Chan BK, Sox HC. Postmenopausal hormone replacement therapy and the primary prevention of cardiovascular disease. Ann Intern Med 2002;137:273-84.\n5. Hulley S, Grady D, Bush T, Furberg C, Herrington D, Riggs B, et al. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Heart and Estrogen/progestin Replacement Study (HERS) Research Group. 
JAMA 1998;280:605-13.\n6. Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, et al. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA 2002;288:321-33.\n7. Echt DS, Liebson PR, Mitchell LB, Peters RW, Obias-Manno D, Barker AH, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo. The cardiac arrhythmia suppression trial. N Engl J Med 1991;324:781-8.\n8. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 1992;268:240-8.\n9. Committee on Infectious Diseases. Aspirin and Reye syndrome. Pediatrics 1982;69:810-2.\nwidely: the World Health Organization, the American College of Physicians, the American Thoracic Society, UpToDate (an electronic resource widely used in North America, www.uptodate.com), and the Cochrane Collaboration are among the more than 25 organisations that have adopted GRADE. This widespread adoption of GRADE reflects GRADE’s success as a methodologically rigorous, user friendly grading system.\nHow does the GRADE system classify quality of evidence?\nTo achieve transparency and simplicity, the GRADE system classifies the quality of evidence in one of four levels—high, moderate, low, and very low (box 2). Some of the organisations using the GRADE system have chosen to combine the low and very low categories. 
Evidence based on randomised controlled trials begins as high quality evidence, but our confidence in the evidence may be decreased for several reasons, including:\n• Study limitations\n• Inconsistency of results\n• Indirectness of evidence\n• Imprecision\n• Reporting bias.\nAlthough observational studies (for example, cohort and case-control studies) start with a "low quality" rating, grading upwards may be warranted if the magnitude of the treatment effect is very large (such as severe hip osteoarthritis and hip replacement), if there is evidence of a dose-response relation, or if all plausible biases would decrease the magnitude of an apparent treatment effect.\nHow does the GRADE system consider strength of recommendation?\nThe GRADE system offers two grades of recommendations: "strong" and "weak" (though guideline panels may prefer terms such as "conditional" or "discretionary" instead of weak). When the desirable effects of an intervention clearly outweigh the undesirable effects, or clearly do not, guideline panels offer strong recommendations.
On the other hand, when the trade-offs are less certain—either because of low quality evidence or because evidence suggests that desirable and undesirable effects are closely balanced—weak recommendations become mandatory.\nIn addition to the quality of the evidence, several other factors affect whether recommendations are strong or weak (table 1).\nSUMMARY POINTS\nFailure to consider the quality of evidence can lead to misguided recommendations; hormone replacement therapy for post-menopausal women provides an instructive example\nHigh quality evidence that an intervention's desirable effects are clearly greater than its undesirable effects, or are clearly not, warrants a strong recommendation\nUncertainty about the trade-offs (because of low quality evidence or because the desirable and undesirable effects are closely balanced) warrants a weak recommendation\nGuidelines should inform clinicians what the quality of the underlying evidence is and whether recommendations are strong or weak\nThe Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach provides a system for rating quality of evidence and strength of recommendations that is explicit, comprehensive, transparent, and pragmatic and is increasingly being adopted by organisations worldwide\nBox 2 | Quality of evidence and definitions\nHigh quality—Further research is very unlikely to change our confidence in the estimate of effect\nModerate quality—Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate\nLow quality—Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate\nVery low quality—Any estimate of effect is very uncertain\nTable 1 | Factors that affect the strength of a recommendation\nFactor: Quality of evidence. Example of strong recommendation: Many high quality randomised trials have shown the benefit of inhaled steroids in asthma. Example of weak recommendation: Only case series have examined the utility of pleurodesis in pneumothorax.\nFactor: Uncertainty about the balance between desirable and undesirable effects. Example of strong recommendation: Aspirin in myocardial infarction reduces mortality with minimal toxicity, inconvenience, and cost. Example of weak recommendation: Warfarin in low risk patients with atrial fibrillation results in small stroke reduction but increased bleeding risk and substantial inconvenience.\nFactor: Uncertainty or variability in values and preferences. Example of strong recommendation: Young patients with lymphoma will invariably place a higher value on the life prolonging effects of chemotherapy than on treatment toxicity. Example of weak recommendation: Older patients with lymphoma may not place a higher value on the life prolonging effects of chemotherapy than on treatment toxicity.\nFactor: Uncertainty about whether the intervention represents a wise use of resources. Example of strong recommendation: The low cost of aspirin as prophylaxis against stroke in patients with transient ischaemic attacks. Example of weak recommendation: The high cost of clopidogrel and of combination dipyridamole and aspirin as prophylaxis against stroke in patients with transient ischaemic attacks." -} -{ - "pm_id": "https://pubmed.ncbi.nlm.nih.gov/20003500", - "pdf_text": "BioMed Central\nPage 1 of 9\n(page number not for citation purposes)\nBMC Bioinformatics\nOpen Access Software\nBLAST+: architecture and applications\nChristiam Camacho, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer and Thomas L Madden*\nAddress: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA\nEmail: Christiam Camacho - camacho@ncbi.nlm.nih.gov; George Coulouris - coulouri@ncbi.nlm.nih.gov; Vahram Avagyan - avagyanv@ncbi.nlm.nih.gov; Ning Ma - maning@ncbi.nlm.nih.gov; Jason Papadopoulos - jasonp@boo.net; Kevin Bealer - kevinbealer@gmail.com; Thomas L Madden* - 
madden@ncbi.nlm.nih.gov\n* Corresponding author\nAbstract\nBackground: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications.\nResults: We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.\nConclusion: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.\nBackground\nBasic Local Alignment Search Tool (BLAST) [1,2] is a sequence similarity search program that can be used to quickly search a sequence database for matches to a query sequence. Several variants of BLAST exist to compare all combinations of nucleotide or protein queries against a nucleotide or protein database. 
In addition to performing alignments, BLAST provides an "expect" value, statistical information about the significance of each alignment. BLAST is one of the more popular bioinformatics tools. Researchers use command-line applications to perform searches locally, often searching custom databases and performing searches in bulk, possibly distributing the searches on their own computer cluster.\nPublished: 15 December 2009\nBMC Bioinformatics 2009, 10:421 doi:10.1186/1471-2105-10-421\nReceived: 28 July 2009 Accepted: 15 December 2009\nThis article is available from: http://www.biomedcentral.com/1471-2105/10/421\n© 2009 Camacho et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.\nThe current BLAST command-line applications (i.e., blastall and blastpgp) were available to the public in late 1997. They are part of the NCBI C toolkit [3] and are supported on a number of platforms that currently includes Linux, various flavors of UNIX (including Mac OS X), and Microsoft Windows.\nThe initial BLAST applications from 1997 lacked many features that are presently taken for granted. Within three years of the initial public release, BLAST was modified to handle databases with more than 2 billion letters, to limit a search by a list of GenInfo Identifiers (GIs), and to simultaneously search multiple databases. 
PHI-BLAST [4],\nIMPALA [5], and composition-based statistics [6] were\nalso introduced within this time period, followed by\nMegaBLAST [7] and the concept of query-concatenation\n(whereby the database is scanned once for many queries).\nChris Joerg of Compaq Computer Corporation suggested\nperformance enhancements in 1999. A group at Apple,\nInc. suggested other enhancements in 2002 [8]. These and\nother features were of great importance to BLAST users,\nbut the continual addition of unforeseen modifications\nmade the BLAST code fragile and difficult to maintain.\nMany mammalian genomes contain a large fraction of\ninterspersed repeats, with 38.5% of the mouse genome\nand 46% of the human genome reported as interspersed\nrepeats [9]. Traditionally, the only supported method\navailable to mask interspersed repeats in stand-alone\nBLAST has been to execute a separate tool (e.g., Repeat-\nMasker [10]) on a query, produce a FASTA file with the\nmasked region in lower-case letters, and have BLAST treat\nthe lower-case letters as masked query sequence. This\nrequires separate processing on each query before the\nBLAST search.\nNCBI recently redesigned the BLAST web site [11] to\nimprove usability [12], which helped to identify issues\nthat might also occur in the stand-alone BLAST com-\nmand-line applications. These changes have, unfortu-\nnately, made it more difficult to match parameters used in\na stand-alone search with default parameters on the NCBI\nweb site.\nThe advent of complete genomes resulted in much longer\nquery and subject sequences, leading to new challenges\nthat the current framework cannot handle. At the same\ntime, increases in generally available computer memory\nmade other approaches to similarity searching viable.\nBLAT [13] uses an index stored in memory. Cameron and\ncollaborators designed a \"cache-conscious\" implementa-\ntion of the initial word finding module of BLAST [14]. 
The\nconcerns listed in this section and the start of a new C++toolkit at the NCBI [15] motivated us to rewrite the BLAST\ncode and release a completely new set of command-line\napplications. Here we report on the design of the new\nBLAST code, the resulting improvements, and a new set of\nBLAST command-line applications.\nIn this article, a search type is described by a word or two\nin all upper-case letters. For example, a BLASTX search\ntranslates the nucleotide query in six frames and compares\nit to a protein database.\nImplementation\nThis section reports first on the overall design of the new\nsoftware and then discusses several enhancements to\nBLAST.\nOverall design\nTwo criteria were most important in the design of the new\nBLAST code: 1.) the code structure should be modular\nenough to allow easy modification; and 2.) the same\nBLAST code should be embedded in at least two different\nhost toolkits. This would allow both the new NCBI C++\ntoolkit and the older NCBI C toolkit to use the same\nBLAST source code.\nAt a high level, the BLAST process can be broken down\ninto three modules (Figure 1). The \"setup\" module sets up\nthe search. The \"scanning\" module scans each subject\nsequence for word matches and extends them. The \"trace-\nback\" module produces a full gapped alignment with\ninsertions and deletions.\nThe setup phase reads the query sequence, applies low-\ncomplexity or other filtering to it, and builds a \"lookup\"\ntable (i.e., perfect hashing). The lookup table contains\nonly words from the query for nucleotide-nucleotide\nsearches such as BLASTN or MEGABLAST. DISCONTIGU-\nOUS MEGABLAST allows non-consecutive matches in the\ninitial seed. Protein-protein searches such as BLASTP\nallow \"neighboring\" words. The neighboring words are\nsimilar to a word in the query, as judged by the scoring\nmatrix and a threshold value.\nThe scanning phase scans the database and performs\nextensions. 
Each subject sequence is scanned for words ("hits") matching those in the lookup table. These hits are used to initiate a gap-free alignment. Gap-free alignments that exceed a threshold score then initiate a gapped alignment, and those gapped alignments that exceed another threshold score are saved as "preliminary" matches for further processing. The scanning phase employs a few optimizations. The gapped alignment returns only the score and extent of the alignment. The number and position of insertions, deletions and matching letters are not stored (no "trace-back"), reducing the CPU time and memory demands. Searches against nucleotide subject sequences consider only unambiguous bases (A, C, G, T), with ambiguous bases (e.g., N) replaced at random during preparation of the BLAST database or subject sequence. A four letter alphabet allows packing of four bases into one byte, and the subject sequences are scanned four letters at a time. Finally, less sensitive heuristic parameters are employed for the gapped alignment, and the full extent of a gapped alignment may, in rare cases, not be found.\nThe final phase of the BLAST search is the trace-back. Insertions and deletions are calculated for the alignments found in the scanning phase. Ambiguous bases are restored for nucleotide subject sequences, and more sensitive heuristic parameters are used for the gapped alignment. Composition-based statistics [6] may also be applied for BLASTP (protein-protein) and TBLASTN (protein compared against translated nucleotide subject sequences).\nIdeally, one should be able to independently replace the functionality described in each of the small rectangles of Figure 1 (e.g., "build lookup table") with another implementation. 
Some coordination is required: for example, the lookup table is used when finding word matches, so both "build lookup table" and "find word matches" need to be changed together. Finding word matches is the most computationally intensive part of the BLAST search, so the implementation should be as fast as possible. To address this, the author of the lookup table implementation must provide the scanning routine for finding word hits. Other modules can be changed independently.\nThe selection of ISO C99 allows use of the new BLAST code in both C and C++ environments. The host toolkit provides a software layer to allow BLAST to communicate with the rest of each toolkit. This design requires a clean separation between the algorithmic part of BLAST and the module that retrieves subject sequences from the database.\nFigure 1. Schematic of a BLAST search. The first phase is "setup". The query is read, low-complexity or other filtering might be applied to the query, and a "lookup" table is built. The next phase is "scanning". Each subject sequence is scanned for words ("hits") matching those in the lookup table. These hits are further processed, extended by gap-free and gapped alignments, and scored. Significant "preliminary" matches are saved for further processing. The final phase in the BLAST algorithm, called the "trace-back", finds the locations of insertions and deletions for alignments saved in the scanning phase.\nTo allow this, the retrieval of subject sequences for processing by the core of the BLAST code is performed through an Abstract Data Type (ADT), which specifies a set of data values and permitted operations. The actual retrieval occurs through an implementation of the ADT in the host toolkit. The implementation can be changed depending upon the need and requires no changes to the BLAST algorithm code itself.\nThe subject sequence information required by BLAST is quite simple. It consists of the total number of sequences to be searched, the length of any given sequence, as well as methods to retrieve the actual sequence. The total database length is needed for calculation of expect values. A database name and the length of the longest subject sequence are also required to implement some functions in an efficient manner. In order to satisfy the above requirements, an ADT, called the BlastSeqSrc [16], was implemented.\nDatabase masking\nLow-complexity regions and interspersed repeats typically match many sequences. These matches are normally not of biological interest, may lead to spurious results, and confound the statistics used by BLAST. BLAST offers two query masking modes to avoid such matches. One is known as "hard-masking" and replaces the masked portion of the query by X's or N's for all phases of the search. On the other hand, "soft-masking" makes the masked portion of the query unavailable for finding the initial word hits, but the masked portion is available for the gap-free and gapped extensions once an initial word hit has been found.\nThe BLAST databases can also be masked. Masking information is stored as a series of intervals, so that masking can be switched on or off. Information from multiple masking algorithms can be stored in the same BLAST database and accessed separately. 
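The difference between the two query masking modes can be illustrated with a toy sketch. The helper names and the half-open `(start, end)` interval format are assumptions for illustration, not the format BLAST itself uses:

```python
def hard_mask(seq, intervals, mask_char="N"):
    """Hard-masking: masked letters are replaced outright (X for protein,
    N for nucleotide), so they are invisible to every phase of the search."""
    out = list(seq)
    for start, end in intervals:
        for i in range(start, end):
            out[i] = mask_char
    return "".join(out)

def soft_mask(seq, intervals):
    """Soft-masking: masked letters are lower-cased; they are skipped when
    finding initial word hits but remain available to the gap-free and
    gapped extensions."""
    out = list(seq.upper())
    for start, end in intervals:
        for i in range(start, end):
            out[i] = out[i].lower()
    return "".join(out)
```

Database masking as described here behaves like the soft variant: masked stretches are skipped during scanning but extensions may still run through them.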
Currently, database masking\nconsists of skipping masked portions of the database dur-\ning the scanning phase, but it is still possible to extend\nthrough masked portions of the database; as such, data-\nbase masking is analogous to soft-masking a query.\nMinimizing memory and cache footprint\nModifications that reduce the CPU time and memory\nfootprint of BLAST searches with long query or subject\nsequences are examined. First, an optimization for the\nscanning phase of the BLAST search is presented. Then, an\nimprovement for the trace-back phase is described.\nBLAST searches with very large queries are routine, but\nsome of the data structures scale with the query length.\nThe following analysis examines the scanning phase (Fig-\nure 1) of the BLAST search.Two large structures are frequently accessed during the\nscanning phase. The first is the \"lookup table\", which\nmaps words in a subject sequence to positions in the\nquery. The second is the \"diag-array\", which tracks how far\nBLAST has already extended word hits on any given diag-\nonal; its size scales with the query length. The scanning\nphase is a large fraction of the time of most BLAST\nsearches, so these structures must be accessed quickly.\nContemporary CPUs typically communicate with main\nmemory through several levels of cache, called a \"memory\nhierarchy\". For example, the L1 cache is the smallest and\nhas the lowest latency; the L2 cache is larger but slower.\nOn a machine with an Intel Xeon CPU, the L1 cache might\nbe around 16 kB and the L2 cache can range in size from\n0.5-4 MB. If the CPU does not find data or an instruction\nin the cache, it must fetch it from main memory; a \"cache\nmiss\". 
Performance could be improved by making the lookup table and diag-array small enough to fit into L2 cache, still leaving room for instructions and other data. In order to be specific, the discussion in the next two paragraphs is limited to a BLASTX search, which translates a nucleotide query in six frames (three frames on each strand) and compares it to a protein database.\nThe lookup table contains a long array (the "backbone"), with each cell mapping to a unique word. The lookup table translates each residue type to a number between 1 and 24, so a three-letter word maps to an integer between 1 and 24³. For a three-letter word, an array of 32768 (32³) cells allows a quick calculation of the offset into the backbone while scanning the database for word matches. Each cell of the backbone consists of four integers. The first integer specifies how many times that word appears in the query; the other three can have one of two functions. For three or fewer occurrences, the three integers simply specify the positions of the word in the query. If there are more than three occurrences, however, the integers are an index into another array containing the positions of the word in the query. The total memory occupied by the backbone is 16 bytes × 32768, or about 524 kB. Finally, there is a bit vector occupying 4096 bytes (32768/8). The corresponding bit is set in the bit vector for backbone cells containing entries. For a short query, where the backbone may be sparsely populated, this allows a quick check whether a cell contains any information.\nA BLASTX query of N nucleotides becomes twice as long when it is represented as six protein sequences. The diag-array consumes one four-byte integer per letter in the query. 
An estimate of the total memory occupied by the lookup table backbone and the diag-array, in bytes, for a nucleotide query of length N is:\n528,384 + 8N\nFor a query of N = 50 k, this is close to a million bytes, already the total size of L2 cache in many computers used for BLAST searching. Modifications to these structures might permit larger queries, but for contigs and chromosomes the structures would still overflow the L2 cache. To overcome this, the query is split into smaller overlapping pieces for the scanning phase of the search. BLAST then merges the results and aligns the entire query during the trace-back phase, obtaining the same results as a search that was not split. Splitting the query has an additional advantage; since the sub-query used during the scanning phase is of bounded length, it is possible to use a smaller data type in the lookup table (specifically, a two byte rather than a four byte integer). This reduces the first term in the above equation from 528,384 to 266,240 bytes.\nThe final phase of the BLAST search, the trace-back, processes the preliminary matches, producing an alignment with insertions and deletions. Additionally, heuristic parameters may be assigned a more sensitive value, ambiguities in a nucleotide database sequence are resolved, and the composition of the subject sequences may be taken into account when calculating expect values. Some subject sequences must be retrieved again for this calculation, but since the preliminary phase finds the rough extent of any alignment, the entire sequence is often not needed. This is most important for short queries searched against a database of much longer sequences. 
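The memory estimate can be checked with a few lines. This is only a restatement of the arithmetic in the text (backbone of 32768 cells of four integers, a 4096-byte bit vector, and a diag-array of one integer per letter of the 2N-letter six-frame translation); the function name is hypothetical:

```python
def blastx_scan_memory(n, offset_bytes=4):
    """Approximate bytes used by the scanning-phase structures for a
    BLASTX query of n nucleotides, per the estimate 528,384 + 8N."""
    backbone = 32768 * 4 * offset_bytes   # 524,288 bytes with 4-byte offsets
    bit_vector = 32768 // 8               # 4,096 bytes
    diag_array = 4 * (2 * n)              # 4 bytes per translated letter
    return backbone + bit_vector + diag_array
```

With `offset_bytes=2` the constant term drops from 528,384 to 266,240 bytes, matching the smaller data type made possible by query splitting.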
Only part of\nthe subject sequences, when appropriate, is now retrieved,\nand performance results are presented under \"Partial sub-\nject sequence retrieval\" below.\nResults and discussion\nFirst, we introduce a set of BLAST command-line applica-\ntions built with the software library discussed above.\nThen, we present an example use of database masking as\nwell as two performance analyses that demonstrate\nimprovements in search time: searches with very long\nqueries and searches of chromosome-sized database\nsequences. For each performance analysis, we prepared a\nbaseline application that disables the new feature being\ntested. Finally, we discuss an example of retrieving subject\nsequences from an arbitrary source.\nA SUSE Linux machine with an Intel Xeon 3.6 GHz CPU,\n16 kB of L1 cache, 1 MB of L2 cache, and 8 GB of RAM,\nprovided data for the comparisons described here.\nBLAST+ command-line applications\nNew command-line applications have been developed\nusing the NCBI C++ toolkit, and they are referred to as the\nBLAST+ command-line applications (or BLAST+ applica-\ntions). Extensive documentation about the different com-\nmand-line options is available [17], so only general\ncomments about the interface are presented here. TheNCBI C++ toolkit argument parser permitted the use of\nmulti-letter command-line arguments. New BLAST+ com-\nmand-line applications were introduced, dependent upon\nthe molecule types of the query and subject sequences. For\nexample, there is a \"blastx\" application that translates a\nnucleotide query and compares it to a protein database,\nand a \"blastn\" application that compares a nucleotide\nquery to a nucleotide database. The command-line\noptions and help messages are specific to each applica-\ntion. In contrast, the current C toolkit command-line\napplication (\"blastall\") presents usage instructions about\nnucleotide match and mismatch scores, needed only for\nBLASTN, even if the user wants to perform a BLASTX\nsearch. 
Users also need to optimize for different tasks\nwithin a single command-line application. For example,\nMEGABLAST compares a nucleotide query to a nucleotide\ndatabase, but is optimized for closely related sequences\n(e.g., searching for sequencing errors), using a large word\nsize and a linear gap penalty. BLASTN, on the other hand,\nis the traditional nucleotide-nucleotide search program\nand uses a smaller word size and affine gapping by\ndefault. The concept of a \"task\" allows a user to optimize\nthe search for different scenarios within one application.\nSetting the task for the blastn application changes the\ndefault value of a number of command-line arguments,\nsuch as the word size, but also the default scoring param-\neters for insertions, deletions, and mismatches. These val-\nues are changed to typical values that would be used with\nthe selected task. For the MEGABLAST task, the nucleotide\nmatch and mismatch values are 1 and -2, as this corre-\nsponds to 95% identity matches. In contrast, for BLASTN\nand DISCONTIGUOUS MEGABLAST, the values are 2 and\n-3 as they correspond to 85% identity [18].\nPower users of BLAST often have a specially crafted set of\ncommand-line options that they find useful for their par-\nticular task. However, lacking a method to save these, they\nmust write scripts or simply re-type them for each search.\nThe BLAST+ applications can write the query, database,\nand command-line options for a BLAST search into a\n\"strategy\" file. A user may then rerun a set of commands\nby specifying the strategy file, though a new query and\ndatabase can be specified with the command-line. This\nfile is currently written as ASN.1 (Abstract Syntax Nota-\ntion, a structured language similar to XML), but an XML\noption could be added in the future. 
Users can also\nupload this file to the NCBI BLAST web site to populate a\nBLAST search form, or download a strategy file for a search\nperformed at the NCBI BLAST web site.\nThe BLAST+ applications have a number of new features.\nA GI or accession may be used as the query, with the actual\nsequence automatically retrieved from a BLAST database\n(the sequence must be available in a BLAST database) or\nfrom GenBank. The applications can send a search to BMC Bioinformatics 2009, 10:421 http://www.biomedc entral.com/1471-2105/10/421\nPage 6 of 9\n(page number not for citation purposes)NCBI servers as well as locally search a set of queries\nagainst a set of FASTA subject sequences [17].\nTables listing the command-line options, as well as their\ntypes and defaults, were provided as additional file 1 for\nthis article.\nDatabase masking\nApplying masking information to the BLAST database\nrather than the query will improve the workflow for\nBLAST users. A specialized tool, such as WindowMasker\n[19] or RepeatMasker [10], can provide masking informa-\ntion for a single-species database when it is created, and it\nbecomes unnecessary to mask every query. Adding mask-\ning information to a BLAST database is a two step process.\nA file containing masking intervals in either XML or ASN.1\nformat is first produced, and then the information is\nadded to the BLAST database. The NCBI C++ toolkit pro-\nvides tools to produce this information for seg [20], dust\n[21], and WindowMasker [19]. Users may also provide\nintervals for algorithms not supported by the NCBI C++\ntoolkit; see the BLAST+ manual [17] for further informa-\ntion on how to produce a masked database. Currently,\ndatabase masking is only available in soft-masking mode.\nTo test the performance of database masking, 163 human\nESTs from UniGene cluster 235935 were searched against\nthe build 36.1 reference assembly of the human genome\n[22]. 
RepeatMasker processed the EST queries, producing FASTA files with repeats identified in lower-case. RepeatMasker also processed the human genome FASTA files, locations of repeats were produced from that data, and those locations were then added as masking information to the BLAST database. Two sets of searches were run. One used the lower-case query masking to filter out interspersed repeats; the other used the database masking to do the same. Alignments with a score of 100 or more were retained. Table 1 presents the results, which indicate that differences in query masking with RepeatMasker caused extra matches. For example, GI 14400848 is only 145 bases long and is not masked by RepeatMasker at all, but the portion of the genome it matches is masked. For GI 13529935 the last 78 bases are not masked, but the portion of the genome it matches is masked by RepeatMasker. Currently, database masking is not supported for searches of translated database sequences (i.e., tblastn and tblastx), but it will be supported in the near future.\nDatabase masking is not a new concept. Kent [13] mentions cases where BLAT users might find repeat masking of the database useful. Morgulis et al. [23] also allow users to apply soft-masking to their database. In both of these cases, it is not simple to turn the masking on or off or to switch the type of masking (e.g., from RepeatMasker to WindowMasker). The implementation presented here allows this flexibility.\nQuery splitting\nBreaking longer queries into smaller pieces for processing can lead to significantly shorter search times. At the same time, splitting the query into pieces makes it possible to guarantee that the query length is always bounded, allowing the use of smaller data types in the lookup table. 
Use of smaller data types with a BLASTP search (protein-protein) shows no improvement for sequences under 500 residues, but performance increases by up to 2% as the sequence length increases to 8000 residues. Use of a smaller data type never makes performance worse, so it is used in the tests described in this section.\nBLAST searches of differently-sized chunks of zebrafish chromosome 2 [Genbank:NC_007113.2] against a set of human proteins were performed to test the query splitting implementation. A baseline blastx application that does not split the query was prepared. Figure 2 presents the speedup for these searches, with speedup defined as (T_baseline/T_blastx) - 1. Query splitting decreases the search time for queries longer than 20 kbases, and the improvement continues with increasing query length. The Cachegrind memory profiling tool [24] confirmed a smaller number of cache misses with query splitting. Figure 3 presents those results. Figures 2 and 3 reflect an expect value cutoff of 1.0e-6.\nCameron et al. [14] replaced the BLAST lookup table with a DFA (Deterministic Finite Automaton) to improve the cache behavior. They reported a 10-15% reduction in search time for BLASTP (protein-protein) searches. Most proteins are too short to split, so no significant BLASTP improvements were apparent in the work presented here. This work emphasized improving the worst-case behavior typically seen with very long nucleotide queries. 
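The speedup metric used for Figure 2 is simply a relative runtime improvement; it can be written out directly (illustrative timings below are invented, not measurements from the paper):

```python
def speedup(t_baseline, t_blastx):
    """Speedup as defined in the text: (T_baseline / T_blastx) - 1.
    0.0 means no change; 0.5 means the split-query search ran 50% faster."""
    return t_baseline / t_blastx - 1
```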
The query splitting approach does not preclude the use of a DFA or some other optimization instead of a lookup table.\nTable 1: Comparison of query versus database masking.\nType of masking | Number of alignments found | GIs of extra sequences found\nQuery | 387 | 13529935, 14400848, 14430244, 14430457\nDatabase | 383 |\nBMC Bioinformatics 2009, 10:421 http://www.biomedcentral.com/1471-2105/10/421\nPartial subject sequence retrieval\nPartial retrieval of subject sequences is most effective when a small fraction of the subject sequence is required in the trace-back phase, such as in a search of ESTs against chromosomes. A baseline blastn application that retrieves the entire subject sequence in the trace-back phase was prepared. 163 human ESTs from UniGene cluster 235935 were searched against the masked human genome database from build 36.1 of the reference assembly [22]. Figure 4 presents search times with the standard blastn application and a baseline application. A word size of 24 and database masking (with RepeatMasker) was used. The ESTs with matches to the largest number of subject sequences showed the best improvement. The three rightmost data points on Figure 4 are for GIs 14429426, 13529935, and 34478925 (left to right). These three ESTs match four, six, and eight database sequences respectively. Overall, 158 sequences matched only one subject sequence, two matched two sequences and there was one match each for four, six, and eight sequences. As expected, performance did not improve for ESTs searched against a database of ESTs (data not shown).\nRetrieving subject sequences from an arbitrary source\nAn Abstract Data Type (ADT) supplies the subject sequences to be searched in the new BLAST code. This abstraction avoids coupling the BLAST engine to a particular database format. 
It permits a search of sequences in the "Short Read Archive" (SRA) at the NCBI through the SRA Software Development Kit [25]. An SRA BLAST web page accessible from the BLAST web site [11] was also created.\nFuture development\nFuture developments include adding hard-masking support for databases, and making database masking available for programs with translated database sequences (tblastn and tblastx). At this point, only the scanning phase of the BLAST search is multi-threaded; we also plan to make the trace-back phase multi-threaded.\nConclusions\nWe have reported on a new modular software library for BLAST. The design allows the addition of features that greatly benefit performance, such as query splitting and partial retrieval of subject sequences. It also allows the replacement of the lookup table with another design, so that new implementations can easily be added. An indexed version of MEGABLAST [23] was implemented using these libraries. The new library also supports a framework for retrieving subject sequences from arbitrary data sources. This framework, an Abstract Data Type (ADT), allows the use of different modules to read the BLAST databases in the NCBI C++ and the C toolkits. It is possible to write a new module to supply subject sequences to the BLAST engine using this ADT [16] without any modifications of the BLAST algorithm code. An ADT implementation has been written to support production searches of SRA sequences at the NCBI.\nFigure 3. L2 data cache misses for BLASTX searches with and without query splitting. Cache misses were measured by Cachegrind [24] and only misses reading from the cache are shown. On the x-axis are different query lengths in kbases. The number of L2 cache misses is shown on the y-axis. The top line is for the baseline application without query splitting, the bottom line is for the blastx application. 
The queries are different sized pieces of [Genbank:NC_007113.2] searched against the set of human proteins used for Figure 2.\nFigure 2. Speedup of BLASTX searches for differently sized queries with and without query splitting. Different sized pieces of [Genbank:NC_007113.2] were searched against a set of human proteins. The query length in kbases is on the x-axis, with a log scale. On the y-axis is the fractional speedup, which is defined as (Tbaseline/Tblastx) - 1. Three searches were performed with both the baseline and the blastx applications (for each data point), and the lowest time for each application was used.\nWe also described a new set of BLAST command-line applications. The applications have a new, more logical organization that groups together similar types of searches in one application. The concept of a task allows a user to specify an optimal parameter set for a given task. Strategy files were also introduced, allowing a user to record the parameters of a search in order to later rerun it in stand-alone mode or at the NCBI web site.\nAvailability and requirements\nBLAST is Public Domain software [26]. The latest version of BLAST can be retrieved from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST. This software was implemented with the C and C++ programming languages and was tested under Microsoft Windows, Linux, and Mac OS X. There are no restrictions on use by non-academics. 
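The subject-sequence Abstract Data Type described above can be pictured as a minimal interface. The class and function names below are illustrative inventions, not the actual BlastSeqSrc API [16]:

```python
from abc import ABC, abstractmethod

class SeqSource(ABC):
    """Illustrative stand-in for a subject-sequence ADT: the engine
    sees only this interface, never a concrete database format."""
    @abstractmethod
    def num_seqs(self): ...
    @abstractmethod
    def get_seq(self, oid): ...  # sequence for subject ordinal id `oid`

class ListSource(SeqSource):
    """Trivial in-memory implementation of the interface."""
    def __init__(self, seqs):
        self._seqs = list(seqs)
    def num_seqs(self):
        return len(self._seqs)
    def get_seq(self, oid):
        return self._seqs[oid]

def count_word_hits(source, word):
    """Toy 'engine' that works against any SeqSource implementation."""
    return sum(word in source.get_seq(i) for i in range(source.num_seqs()))

print(count_word_hits(ListSource(["ACGTACGT", "TTTT", "GACGTA"]), "ACGT"))
```

Because the toy engine touches only the interface, a new source (a BLAST database reader, an SRA reader) can be swapped in without changing the engine, which is the property the library exploits for SRA searches.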
Query files and BLAST databases used for tests are available at ftp://ftp.ncbi.nih.gov/blast/demo/bmc.\nAuthors' contributions\nAll authors participated in the design and coding of the software. TLM drafted the manuscript and the other authors provided feedback. All authors read and approved the final version of the manuscript.\nAdditional material\nAcknowledgements\nA number of people contributed to this project. Richa Agarwala, Alejandro Schäffer, and Mike DiCuccio offered ideas and feedback. Mike Gertz, Aleksandr Morgulis, and Ilya Dondoshansky contributed some of the code used in the core of BLAST. Denis Vakatov, Aaron Ucko and other members of the NCBI C++ toolkit group offered assistance as well as the C++ toolkit used to build BLAST+. Eugene Yaschenko, Kurt Rodarmer and Ty Roach provided help in using the NCBI SRA Software Development Toolkit. David Lipman and Jim Ostell originally suggested the need for a rewritten version of BLAST and provided encouragement and feedback. Greg Boratyn, Maureen Madden and John Spouge read the manuscript and offered helpful suggestions.\nThis research was supported by the Intramural Research Program of the NIH, National Library of Medicine. Funding to pay the Open Access publication charges for this article was provided by the National Institutes of Health.\nReferences\n1. Altschul S, Gish W, Miller W, Myers E, Lipman D: Basic local alignment search tool. J Mol Biol 1990, 215(3):403-410.\n2. Altschul S, Madden T, Schäffer A, Zhang J, Zhang Z, Miller W, Lipman D: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.\n3. NCBI C toolkit [http://www.ncbi.nlm.nih.gov/IEB/ToolBox/SDKDOCS/INDEX.HTML]\n4. Zhang Z, Schäffer A, Miller W, Madden T, Lipman D, Koonin E, Altschul S: Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res 1998, 26(17):3986-3990.\n5. 
Schäffer A, Wolf Y, Ponting C, Koonin E, Aravind L, Altschul S: IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 1999, 15(12):1000-1011.\n6. Schäffer A, Aravind L, Madden T, Shavirin S, Spouge J, Wolf Y, Koonin E, Altschul S: Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res 2001, 29(14):2994-3005.\n7. Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol 2000, 7(1-2):203-214.\n8. A/G BLAST [http://www.apple.com/downloads/macosx/math_science/agblast.html]\n9. Waterston R, Lindblad-Toh K, Birney E, Rogers J, Abril J, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al.: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420(6915):520-562.\n10. RepeatMasker Web site [http://www.repeatmasker.org/]\n11. NCBI BLAST web site [http://blast.ncbi.nlm.nih.gov/Blast.cgi]\n12. Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden T: NCBI BLAST: a better web interface. Nucleic Acids Res 2008, 36(Web Server issue):W5-9.\nAdditional file 1\nEight tables list the command-line application options, as well as their types, default values, and a short explanation. The first table has information common to the search applications blastn, blastp, blastx, tblastn, and tblastx. The next five tables describe options for those applications. The last two tables list the options for makeblastdb (used to build a blast database) and blastdbcmd (used to read a database).\nClick here for file\n[http://www.biomedcentral.com/content/supplementary/1471-2105-10-421-S1.PDF]\nFigure 4. Scatter plot of MEGABLAST search times with and without partial retrieval. 
163 human ESTs from UniGene cluster 235935 were searched against all human chromosomes [22]. On the x-axis are times for the baseline application; on the y-axis are times for the new blastn application. Sequences with the best improvement are those furthest to the right, and they also matched the largest number of subject sequences. A word size of 24 was used for the runs as well as database masking with RepeatMasker. Three searches were done with both the baseline and blastn application for each data point, and the lowest time for each application was used.\n13. Kent W: BLAT--the BLAST-like alignment tool. Genome Res 2002, 12(4):656-664.\n14. Cameron M, Williams H, Cannane A: A deterministic finite automaton for faster protein hit detection in BLAST. J Comput Biol 2006, 13(4):965-978.\n15. NCBI C++ toolkit documentation [http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=toolkit]\n16. Implementing a BlastSeqSrc [http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/_impl_blast_seqsrc_howto.html]\n17. BLAST+ Command Line Applications User Manual [http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpblast]\n18. 
States DJ, Gish W, Altschul SF: Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. METHODS: A Companion to Methods in Enzymology 1991, 3:66-70.\n19. Morgulis A, Gertz E, Schäffer A, Agarwala R: WindowMasker: window-based masker for sequenced genomes. Bioinformatics 2006, 22(2):134-141.\n20. Wootton JC, Federhen S: Analysis of compositionally biased regions in sequence databases. Computer Methods for Macromolecular Sequence Analysis 1996, 266:554-571.\n21. Morgulis A, Gertz E, Schäffer A, Agarwala R: A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J Comput Biol 2006, 13(5):1028-1040.\n22. Reference assembly for Human genome build 36.1 [http://www.ncbi.nlm.nih.gov/genome/guide/human/release_notes.html#b36]\n23. Morgulis A, Coulouris G, Raytselis Y, Madden T, Agarwala R, Schäffer A: Database indexing for production MegaBLAST searches. Bioinformatics 2008, 24(16):1757-1764.\n24. Cachegrind [http://valgrind.org/docs/manual/cg-manual.html]\n25. NCBI SRA Software Development Kit [http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software]\n26. 
PUBLIC DOMAIN NOTICE for NCBI [http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=toolkit&part=toolkit.fm#A3]" -} -{ - "pm_id": "https://pubmed.ncbi.nlm.nih.gov/12117397", - "pdf_text": "UC San Diego\nUC San Diego Previously Published Works\nTitle\nRisks and Benefits of Estrogen Plus Progestin in Healthy Postmenopausal Women: Principal Results From the Women's Health Initiative Randomized Controlled Trial\nPermalink\nhttps://escholarship.org/uc/item/3mr6f93p\nJournal\nJAMA, 288(3)\nISSN\n0098-7484\nAuthors\nRossouw, Jacques E\nAnderson, Garnet L\nPrentice, Ross L\net al.\nPublication Date\n2002-07-17\nDOI\n10.1001/jama.288.3.321\nPeer reviewed\nORIGINAL CONTRIBUTION JAMA-EXPRESS\nRisks and Benefits of Estrogen Plus Progestin in Healthy Postmenopausal Women\nPrincipal Results From the Women's Health Initiative Randomized Controlled Trial\nWriting Group for the Women's Health Initiative Investigators\nThe Women's Health Initiative (WHI) focuses on defining the risks and benefits of strategies that could potentially reduce the incidence of heart disease, breast and colorectal cancer, and fractures in postmenopausal women. Between 1993 and 1998, the WHI enrolled 161809 postmenopausal women in the age range of 50 to 79 years into a set of clinical trials (trials of low-fat dietary pattern, calcium and vitamin D supplementation, and 2 trials of postmenopausal hormone use) and an observational study at 40 clinical centers in the United States.1 This article reports principal results for the trial of combined estrogen and progestin in women with a uterus. The trial was stopped early based on health risks that exceeded health benefits over an average follow-up of 5.2 years. 
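The "risks exceeded benefits" judgment rests, in absolute terms, on differences in annualized event rates between the treatment groups. A minimal sketch of that arithmetic, using hypothetical event counts and person-year totals rather than the trial's actual data:

```python
def excess_per_10000_py(events_rx, py_rx, events_placebo, py_placebo):
    """Absolute excess risk per 10,000 person-years: the difference
    between the two groups' annualized event rates, scaled up."""
    return (events_rx / py_rx - events_placebo / py_placebo) * 10_000

# Hypothetical counts chosen only to illustrate the calculation.
print(round(excess_per_10000_py(164, 44_000, 122, 41_000), 1))  # → 7.5
```

Scaling to events per 10 000 person-years is how the trial's abstract expresses its excess-risk figures.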
A parallel trial of estrogen alone in women who have had a hysterectomy is being continued, and the planned end of this trial is March 2005, by which time the average follow-up will be about 8.5 years.\nThe WHI clinical trials were designed in 1991-1992 using the accumulated evidence at that time. The primary outcome for the trial of estrogen plus progestin was designated as coronary heart disease (CHD). Potential cardioprotection was based on generally\nAuthor Information and Financial Disclosures appear at the end of this article.\nContext Despite decades of accumulated observational evidence, the balance of risks and benefits for hormone use in healthy postmenopausal women remains uncertain.\nObjective To assess the major health benefits and risks of the most commonly used combined hormone preparation in the United States.\nDesign Estrogen plus progestin component of the Women's Health Initiative, a randomized controlled primary prevention trial (planned duration, 8.5 years) in which 16608 postmenopausal women aged 50-79 years with an intact uterus at baseline were recruited by 40 US clinical centers in 1993-1998.\nInterventions Participants received conjugated equine estrogens, 0.625 mg/d, plus medroxyprogesterone acetate, 2.5 mg/d, in 1 tablet (n=8506) or placebo (n=8102).\nMain Outcomes Measures The primary outcome was coronary heart disease (CHD) (nonfatal myocardial infarction and CHD death), with invasive breast cancer as the primary adverse outcome. 
A global index summarizing the balance of risks and benefits included the 2 primary outcomes plus stroke, pulmonary embolism (PE), endometrial cancer, colorectal cancer, hip fracture, and death due to other causes.\nResults On May 31, 2002, after a mean of 5.2 years of follow-up, the data and safety monitoring board recommended stopping the trial of estrogen plus progestin vs placebo because the test statistic for invasive breast cancer exceeded the stopping boundary for this adverse effect and the global index statistic supported risks exceeding benefits. This report includes data on the major clinical outcomes through April 30, 2002. Estimated hazard ratios (HRs) (nominal 95% confidence intervals [CIs]) were as follows: CHD, 1.29 (1.02-1.63) with 286 cases; breast cancer, 1.26 (1.00-1.59) with 290 cases; stroke, 1.41 (1.07-1.85) with 212 cases; PE, 2.13 (1.39-3.25) with 101 cases; colorectal cancer, 0.63 (0.43-0.92) with 112 cases; endometrial cancer, 0.83 (0.47-1.47) with 47 cases; hip fracture, 0.66 (0.45-0.98) with 106 cases; and death due to other causes, 0.92 (0.74-1.14) with 331 cases. Corresponding HRs (nominal 95% CIs) for composite outcomes were 1.22 (1.09-1.36) for total cardiovascular disease (arterial and venous disease), 1.03 (0.90-1.17) for total cancer, 0.76 (0.69-0.85) for combined fractures, 0.98 (0.82-1.18) for total mortality, and 1.15 (1.03-1.28) for the global index. Absolute excess risks per 10000 person-years attributable to estrogen plus progestin were 7 more CHD events, 8 more strokes, 8 more PEs, and 8 more invasive breast cancers, while absolute risk reductions per 10000 person-years were 6 fewer colorectal cancers and 5 fewer hip fractures. The absolute excess risk of events included in the global index was 19 per 10000 person-years.\nConclusions Overall health risks exceeded benefits from use of combined estrogen plus progestin for an average 5.2-year follow-up among healthy postmenopausal US women. 
All-cause mortality was not affected during the trial. The risk-benefit profile found in this trial is not consistent with the requirements for a viable intervention for primary prevention of chronic diseases, and the results indicate that this regimen should not be initiated or continued for primary prevention of CHD.\nJAMA. 2002;288:321-333 www.jama.com\nFor editorial comment see p 366.\n©2002 American Medical Association. All rights reserved. (Reprinted) JAMA, July 17, 2002, Vol 288, No. 3\nDownloaded From: https://jamanetwork.com/ by a University of California - San Diego User on 05/16/2019\nsupportive data on lipid levels in intermediate outcome clinical trials, trials in nonhuman primates, and a large body of observational studies suggesting a 40% to 50% reduction in risk among users of either estrogen alone or, less frequently, combined estrogen and progestin.2-5 Hip fracture was designated as a secondary outcome, supported by observational data as well as clinical trials showing benefit for bone mineral density.6,7 Invasive breast cancer was designated as a primary adverse outcome based on observational data.3,8 Additional clinical outcomes chosen as secondary outcomes that may plausibly be affected by hormone therapy include other cardiovascular diseases; endometrial, colorectal, and other cancers; and other fractures.3,6,9\nThe effect of hormones on overall health was an important consideration in the design and conduct of the WHI clinical trial. In an attempt to summarize important aspects of health benefits vs risks, a global index was defined as the earliest occurrence of CHD, invasive breast cancer, stroke, pulmonary embolism (PE), endometrial cancer, colorectal cancer, hip fracture, or death due to other causes. 
Compared with total mortality, which may be too insensitive, this index assigns additional weight to the 7 listed diseases. Procedures for monitoring the trial involved semiannual comparisons of the estrogen plus progestin and placebo groups with respect to each of the elements of the global index and to the overall global index.\nThis report pertains primarily to estrogen plus progestin use among healthy postmenopausal women, since only 7.7% of participating women reported having had prior cardiovascular disease. During the course of the WHI trial, the Heart and Estrogen/progestin Replacement Study (HERS) reported its principal results.10 HERS was another blinded, randomized controlled trial comparing the same regimen of estrogen plus progestin with placebo among women with a uterus; however, in HERS, all 2763 participating women had documented CHD prior to randomization. The HERS findings of no overall effect on CHD but an apparent increased risk in the first year after randomization seemed surprising given preceding observational studies of hormone use in women with CHD.3 Subsequent to HERS, some investigators reanalyzed their observational study data and were able to detect an early elevation in CHD risk among women with prior CHD11-13 but not in ostensibly healthy women,14 prompting speculation that any early adverse effect of hormones on CHD incidence was confined to women who have experienced prior CHD events.\nThe WHI is the first randomized trial to directly address whether estrogen plus progestin has a favorable or unfavorable effect on CHD incidence and on overall risks and benefits in predominantly healthy women.\nMETHODS\nStudy Population\nDetailed eligibility criteria and recruitment methods have been published.1 Briefly, most women were recruited by population-based direct mailing campaigns to age-eligible women, in conjunction with media awareness programs. 
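The "earliest occurrence" rule for the global index described above reduces, per participant, to a minimum over observed component event times. A minimal sketch (times in days from randomization, None meaning the event was not observed):

```python
def global_index_time(event_times):
    """Return (time, event_observed) for the global index: the earliest
    observed component event, or (None, False) if none occurred, in
    which case the participant is censored at last follow-up."""
    observed = [t for t in event_times.values() if t is not None]
    return (min(observed), True) if observed else (None, False)

# One participant's component events (illustrative values).
times = {"CHD": None, "breast_cancer": 812, "stroke": None, "PE": 1490,
         "endometrial_cancer": None, "colorectal_cancer": None,
         "hip_fracture": None, "other_death": None}
print(global_index_time(times))  # → (812, True)
```

The monitoring comparisons then treat this composite like any other time-to-event outcome.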
Eligibility was defined as age 50 to 79 years at initial screening, postmenopausal, likelihood of residence in the area for 3 years, and provision of written informed consent. A woman was considered postmenopausal if she had experienced no vaginal bleeding for 6 months (12 months for 50- to 54-year-olds), had had a hysterectomy, or had ever used postmenopausal hormones. Major exclusions were related to competing risks (any medical condition likely to be associated with a predicted survival of <3 years), safety (eg, prior breast cancer, other prior cancer within the last 10 years except nonmelanoma skin cancer, low hematocrit or platelet counts), and adherence and retention concerns (eg, alcoholism, dementia).\nA 3-month washout period was required before baseline evaluation of women using postmenopausal hormones at initial screening. Women with an intact uterus at initial screening were eligible for the trial of combined postmenopausal hormones, while women with a prior hysterectomy were eligible for the trial of unopposed estrogen. This report is limited to the 16608 women with an intact uterus at baseline who were enrolled in the trial component of estrogen plus progestin vs placebo. The protocol and consent forms were approved by the institutional review boards for all participating institutions (see Acknowledgment).\nStudy Regimens, Randomization, and Blinding\nCombined estrogen and progestin was provided in 1 daily tablet containing conjugated equine estrogen (CEE), 0.625 mg, and medroxyprogesterone acetate (MPA), 2.5 mg (Prempro, Wyeth Ayerst, Philadelphia, Pa). A matching placebo was provided to the control group. Eligible women were randomly assigned to receive estrogen plus progestin or placebo after eligibility was established and baseline assessments made (FIGURE 1). 
The randomization procedure was developed at the WHI Clinical Coordinating Center and implemented locally through a distributed study database, using a randomized permuted block algorithm, stratified by clinical center site and age group. All study medication bottles had a unique bottle number and bar code to allow for blinded dispensing.\nFigure 1. Profile of the Estrogen Plus Progestin Component of the Women's Health Initiative. 373 092 women initiated screening; 18 845 provided consent and reported no hysterectomy; 16 608 were randomized, with 8506 assigned to receive estrogen + progestin and 8102 assigned to receive placebo. Status on April 30, 2002, estrogen + progestin group: 7968 alive and outcomes data submitted in last 18 mo, 307 unknown vital status, 231 deceased; placebo group: 7608 alive and outcomes data submitted in last 18 mo, 276 unknown vital status, 218 deceased.\nInitially, the design allowed women with a uterus to be randomized to receive unopposed estrogen, estrogen plus progestin, or placebo. After the release of the Postmenopausal Estrogen/Progestin Intervention (PEPI) trial results15 indicating that long-term adherence to unopposed estrogen was not feasible in women with a uterus, the WHI protocol was changed to randomize women with a uterus to only estrogen plus progestin or placebo in equal proportions. The 331 women previously randomized to unopposed estrogen were unblinded and reassigned to estrogen plus progestin. These women are included in the estrogen plus progestin group in this report, resulting in 8506 participants in the estrogen plus progestin group vs 8102 in the placebo group. Analysis of the data excluding the women randomized before this protocol change did not affect the results. 
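A permuted-block scheme of the general kind described can be sketched as follows; the block size, arm labels, and seeding are illustrative assumptions, not the WHI's actual parameters:

```python
import random

def permuted_block_assignments(n, block_size=4, seed=0):
    """Generate n treatment assignments for one stratum using
    randomized permuted blocks: each block holds equal numbers of
    'E+P' and 'placebo' entries, shuffled, so that the two arms
    stay balanced throughout enrollment."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        block = ["E+P", "placebo"] * (block_size // 2)
        rng.shuffle(block)
        out.extend(block)
    return out[:n]

arms = permuted_block_assignments(8)
print(arms.count("E+P"), arms.count("placebo"))  # → 4 4
```

In the trial this would run independently for each clinical-center-by-age-group stratum, which is what the stratification guarantees.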
Considerable effort was made to maintain blinding of other participants and clinic staff. When required for safety or symptom management, an unblinding officer provided the clinic gynecologist, who was not involved with study outcomes activities, with the treatment assignment.\nFollow-up\nStudy participants were contacted by telephone 6 weeks after randomization to assess symptoms and reinforce adherence. Follow-up for clinical events occurred every 6 months, with annual in-clinic visits required. At each semiannual contact, a standardized interview collected information on designated symptoms and safety concerns, and initial reports of outcome events were obtained using a self-administered questionnaire. Adherence to study interventions was assessed by weighing of returned bottles. The study protocol required annual mammograms and clinical breast examinations; study medications were withheld if safety procedures were not performed, but these participants continued to be followed up. Electrocardiograms were collected at baseline and at follow-up years 3 and 6.\nData Collection, Management, and Quality Assurance\nAll data were collected on standardized study forms by certified staff according to documented study procedures. Study data were entered into a local clinical center database developed and maintained by the Clinical Coordinating Center and provided to each site in the form of a local area network connected to the Clinical Coordinating Center through a wide area network. Data quality was ensured through standard data entry mechanisms, routine reporting and database checks, random chart audits, and routine site visits.\nMaintenance/Discontinuation of Study Medications\nDuring the trial, some flexibility of the dosages of both estrogen and progestin was allowed to manage symptoms such as breast tenderness and vaginal bleeding. 
Vaginal bleeding was managed according to an algorithm that accounted for the time since randomization, severity of the bleeding, treatment assignment, and endometrial histology. Women who had a hysterectomy after randomization for indications other than cancer were switched to unopposed estrogen or the corresponding placebo without unblinding. These women are included in the original randomization group for analyses.\nPermanent discontinuation of study medication was required by protocol for women who developed breast cancer, endometrial pathologic state (hyperplasia not responsive to treatment, atypia, or cancer), deep vein thrombosis (DVT) or PE, malignant melanoma, meningioma, triglyceride level greater than 1000 mg/dL (11.3 mmol/L), or prescription of estrogen, testosterone, or selective estrogen-receptor modulators by their personal physician. Medications were temporarily discontinued in participants who had acute myocardial infarction (MI), stroke, fracture, or major injury involving hospitalization, surgery involving use of anesthesia, any illness resulting in immobilization for more than 1 week, or any other severe illness in which hormone use is temporarily inappropriate.\nOutcome Ascertainment\nCardiovascular Disease. Coronary heart disease was defined as acute MI requiring overnight hospitalization, silent MI determined from serial electrocardiograms (ECGs), or CHD death. The diagnosis of acute MI was established according to an algorithm adapted from standardized criteria16 that included cardiac pain, cardiac enzyme and troponin levels, and ECG readings. The primary analyses included both definite and probable MIs as defined by the algorithm. Myocardial infarction occurring during surgery and aborted MIs were included. An aborted MI was defined as chest pain and ECG evidence of acute MI at presentation, an intervention (eg, thrombolysis) followed by resolution of ECG changes, and all cardiac enzyme levels within normal ranges. 
Silent MI was diagnosed by comparing baseline and follow-up ECGs at 3 and 6 years after randomization. Coronary death was defined as death consistent with CHD as underlying cause plus 1 or more of the following: preterminal hospitalization with MI within 28 days of death, previous angina or MI and no potentially lethal noncoronary disease, death resulting from a procedure related to coronary artery disease, or death certificate consistent with CHD as the underlying cause. Stroke diagnosis was based on rapid onset of a neurologic deficit lasting more than 24 hours, supported by imaging studies when available. Pulmonary embolism and DVT required clinical symptoms supported by relevant diagnostic studies.\nCancer. Breast, colorectal, endometrial, and other cancers were confirmed by pathological reports when available. Current data indicate that at least 98% of breast, colorectal, and endometrial cancers and 92% of other cancers were documented with pathological reports.\nFractures. Reports of hip, vertebral, and other osteoporotic fractures (including all fractures except those of the ribs, chest/sternum, skull/face, fingers, toes, and cervical vertebrae) were routinely ascertained. All fracture outcomes were verified by radiology reports. Study radiographs were not obtained to ascertain subclinical vertebral fractures.\nThis report is based on outcomes adjudicated by clinical center physician adjudicators, as used for trial-monitoring purposes. 
Clinical center physician adjudicators were centrally trained and blinded to treatment assignment and participants' symptoms. Future communications will report results based on centrally adjudicated outcomes and will include a broader range of outcomes with more extensive explanatory analyses. Since this report is presented before the planned study closeout, outcome information is still being collected and adjudicated. Local adjudication is complete for approximately 96% of the designated self-reported events. To date, agreement rates between local and central adjudication are: MI, 84%; revascularization procedures, 97%; PE, 89%; DVT, 84%; stroke, 94%; invasive breast cancer, 98%; endometrial cancer, 96%; colorectal cancer, 98%; hip fracture, 95%; and specific cause of death, 82%. When related cardiovascular conditions are combined (eg, when unstable angina or congestive heart failure is grouped with MI), agreement rates exceed 94% for cardiovascular disease and 90% for specific cause of death.

Table 1. Baseline Characteristics of the Women's Health Initiative Estrogen Plus Progestin Trial Participants (N = 16 608) by Randomization Assignment*

Characteristics                                Estrogen + Progestin (n = 8506)   Placebo (n = 8102)   P Value†
Age at screening, mean (SD), y                 63.2 (7.1)                        63.3 (7.1)           .39
Age group at screening, y
  50-59                                        2839 (33.4)                       2683 (33.1)
  60-69                                        3853 (45.3)                       3657 (45.1)          .80
  70-79                                        1814 (21.3)                       1762 (21.7)
Race/ethnicity
  White                                        7140 (83.9)                       6805 (84.0)
  Black                                        549 (6.5)                         575 (7.1)
  Hispanic                                     472 (5.5)                         416 (5.1)            .33
  American Indian                              26 (0.3)                          30 (0.4)
  Asian/Pacific Islander                       194 (2.3)                         169 (2.1)
  Unknown                                      125 (1.5)                         107 (1.3)
Hormone use
  Never                                        6280 (73.9)                       6024 (74.4)
  Past                                         1674 (19.7)                       1588 (19.6)          .49
  Current‡                                     548 (6.4)                         487 (6.0)
Duration of prior hormone use, y
  <5                                           1538 (69.1)                       1467 (70.6)
  5-10                                         426 (19.1)                        357 (17.2)           .25
  ≥10                                          262 (11.8)                        253 (12.2)
Body mass index, mean (SD), kg/m2§             28.5 (5.8)                        28.5 (5.9)           .66
Body mass index, kg/m2
  <25                                          2579 (30.4)                       2479 (30.8)
  25-29                                        2992 (35.3)                       2834 (35.2)          .89
  ≥30                                          2899 (34.2)                       2737 (34.0)
Systolic BP, mean (SD), mm Hg                  127.6 (17.6)                      127.8 (17.5)         .51
Diastolic BP, mean (SD), mm Hg                 75.6 (9.1)                        75.8 (9.1)           .31
Smoking
  Never                                        4178 (49.6)                       3999 (50.0)
  Past                                         3362 (39.9)                       3157 (39.5)          .85
  Current                                      880 (10.5)                        838 (10.5)
Parity
  Never pregnant/no term pregnancy             856 (10.1)                        832 (10.3)           .67
  ≥1 term pregnancy                            7609 (89.9)                       7233 (89.7)
Age at first birth, y||
  <20                                          1122 (16.4)                       1114 (17.4)
  20-29                                        4985 (73.0)                       4685 (73.0)          .11
  ≥30                                          723 (10.6)                        621 (9.7)
Treated for diabetes                           374 (4.4)                         360 (4.4)            .88
Treated for hypertension or BP ≥140/90 mm Hg   3039 (35.7)                       2949 (36.4)          .37
Elevated cholesterol levels requiring medication   944 (12.5)                    962 (12.9)           .50
Statin use at baseline¶                        590 (6.9)                         548 (6.8)            .66
Aspirin use (≥80 mg/d) at baseline             1623 (19.1)                       1631 (20.1)          .09
History of myocardial infarction               139 (1.6)                         157 (1.9)            .14
History of angina                              238 (2.8)                         234 (2.9)            .73
History of CABG/PTCA                           95 (1.1)                          120 (1.5)            .04
History of stroke                              61 (0.7)                          77 (1.0)             .10
History of DVT or PE                           79 (0.9)                          62 (0.8)             .25
Female relative had breast cancer              1286 (16.0)                       1175 (15.3)          .28
Fracture at age ≥55 y                          1031 (13.5)                       1029 (13.6)          .87
(continued)

Statistical Analyses

All primary analyses use time-to-event methods and are based on the intention-to-treat principle. For a given outcome, the time of event was defined as the number of days from randomization to the first postrandomization diagnosis, as determined by the local adjudicator. For silent MIs, the date of the follow-up ECG applied. Participants without a diagnosis were censored for that event at the time of last follow-up contact. Primary outcome comparisons are presented as hazard ratios (HRs) and 95% confidence intervals (CIs) from Cox proportional hazards analyses,17 stratified by clinical center, age, prior disease, and randomization status in the low-fat diet trial.

Two forms of CIs are presented, nominal and adjusted. Nominal 95% CIs describe the variability in the estimates that would arise from a simple trial for a single outcome. Although traditional, these CIs do not account for the multiple statistical testing issues (across time and across outcome categories) that occurred in this trial, so the probability is greater than .05 that at least 1 of these CIs will exclude unity under an overall null hypothesis. The adjusted 95% CIs presented herein use group sequential methods to correct for multiple analyses over time. A Bonferroni correction for 7 outcomes as specified in the monitoring plan (described herein) was applied to all clinical outcomes other than CHD and breast cancer, the designated primary and primary adverse effect outcomes, and the global index.
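The nominal versus adjusted distinction can be made concrete with a short calculation. This is an illustrative sketch, not the trial's code: the function name is ours, the standard error is back-calculated from the published nominal CI, and the trial's published adjusted intervals also incorporate the group sequential correction for repeated looks, which is omitted here.

```python
import math
from statistics import NormalDist

def hr_ci(log_hr, se, alpha=0.05, n_outcomes=1):
    """Confidence interval for a hazard ratio from its log-scale estimate.

    n_outcomes > 1 applies a Bonferroni correction (alpha / n_outcomes),
    analogous to the correction used for the 7 monitored outcomes other
    than CHD and breast cancer.
    """
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_outcomes))
    return math.exp(log_hr - z * se), math.exp(log_hr + z * se)

# Illustration with the CHD result (HR 1.29; nominal 95% CI, 1.02-1.63).
# Recover the standard error of log(HR) from the reported nominal CI.
se_chd = (math.log(1.63) - math.log(1.02)) / (2 * 1.96)
lo, hi = hr_ci(math.log(1.29), se_chd)                   # nominal interval
lo7, hi7 = hr_ci(math.log(1.29), se_chd, n_outcomes=7)   # wider Bonferroni interval
```

The Bonferroni interval is strictly wider than the nominal one; for CHD itself the published adjusted CI (0.85-1.97) reflects only the group sequential correction, so the Bonferroni interval here is purely illustrative.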
The adjusted CIs are closely related to the monitoring procedures and, as such, represent a conservative assessment of the evidence. This report focuses primarily on results using the unadjusted statistics and also relies on consistency across diagnostic categories, supportive data from other studies, and biologic plausibility for interpretation of the findings.

Data and Safety Monitoring

Trial monitoring guidelines for early stopping considerations were based on O'Brien-Fleming boundaries18 using asymmetric upper and lower boundaries: a 1-sided, .025-level upper boundary for benefit and 1-sided, .05-level lower boundaries for adverse effects. The adverse-effect boundaries were further adjusted with a Bonferroni correction for the 7 major outcomes other than breast cancer that were specifically monitored (CHD, stroke, PE, colorectal cancer, endometrial cancer, hip fracture, and death due to other causes). The global index of monitored outcomes played a supportive role as a summary measure of the overall balance of risks and benefits. Trial monitoring for early stopping considerations was conducted semiannually by an independent data and safety monitoring board (DSMB). Aspects of the monitoring plan have been published.19

RESULTS

Trial Monitoring and Early Stopping

Formal monitoring began in the fall of 1997 with the expectation of final analysis in 2005 after an average of approximately 8.5 years of follow-up. Late in 1999, with 5 interim analyses completed, the DSMB observed small but consistent early adverse effects in cardiovascular outcomes and in the global index. None of the disease-specific boundaries had been crossed.
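For intuition about why O'Brien-Fleming-type boundaries make early stopping hard, here is a rough sketch using the textbook boundary shape z_k = z_final * sqrt(K/k) for K equally spaced looks. This is not the WHI plan's computation (that plan used asymmetric, Bonferroni-adjusted boundaries), and a real design calibrates the constant so total type I error equals alpha; the function name and simplifications are ours.

```python
from statistics import NormalDist

def obf_boundaries(n_looks, alpha=0.025):
    """Approximate one-sided O'Brien-Fleming z boundaries for equally
    spaced interim looks, using the crude shape z_k = z_final * sqrt(K/k).

    A production group sequential design would instead solve for the
    boundary constant so the overall type I error is exactly alpha.
    """
    z_final = NormalDist().inv_cdf(1 - alpha)
    return [z_final * (n_looks / k) ** 0.5 for k in range(1, n_looks + 1)]

bounds = obf_boundaries(10)  # 10 looks, as with the 10th interim analysis here
# The first look's boundary is sqrt(10) times the final one, so only an
# extreme early result can stop the trial.
```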
In the spring of 2000 and again in the spring of 2001, at the direction of the DSMB, hormone trial participants were given information indicating that increases in MI, stroke, and PE/DVT had been observed and that the trial continued because the balance of risks and benefits remained uncertain.

In reviewing the data for the 10th interim analyses on May 31, 2002, the DSMB found that the adverse effects in cardiovascular diseases persisted, although these results were still within the monitoring boundaries. However, the design-specified weighted log-rank test statistic for breast cancer (z = −3.19) crossed the designated boundary (z = −2.32) and the global index was supportive of a finding of overall harm (z = −1.62). Updated analyses including 2 months of additional data, available by the time of the meeting, did not appreciably change the overall results. On the basis of these data, the DSMB concluded that the evidence for breast cancer harm, along with evidence for some increase in CHD, stroke, and PE, outweighed the evidence of benefit for fractures and possible benefit for colon cancer over the average 5.2-year follow-up period. Therefore, the DSMB recommended early stopping of the estrogen plus progestin component of the trial. Because the balance of risks and benefits in the unopposed-estrogen component remains uncertain, the DSMB recommended continuation of that component of the WHI. Individual trial participants have been informed.

Baseline Characteristics

There were no substantive differences between study groups at baseline; 8506 women were randomized into the estrogen plus progestin group and 8102 into the placebo group (Table 1). The mean (SD) age was 63.3 (7.1) years. Two thirds of the women who reported prior or current hormone use had taken combined hormones and one third had used unopposed estrogen.

Table 1. Baseline Characteristics of the Women's Health Initiative Estrogen Plus Progestin Trial Participants (N = 16 608) by Randomization Assignment* (cont)

Characteristics                           Estrogen + Progestin (n = 8506)   Placebo (n = 8102)   P Value
Gail model 5-year risk of breast cancer, %
  <1                                      1290 (15.2)                       1271 (15.7)
  1-<2                                    5384 (63.3)                       5139 (63.4)          .64
  2-<5                                    1751 (20.6)                       1621 (20.0)
  ≥5                                      81 (1.0)                          71 (0.9)
No. of falls in last 12 mo
  0                                       5168 (66.2)                       5172 (67.5)
  1                                       1643 (21.0)                       1545 (20.2)          .18
  2                                       651 (8.3)                         645 (8.4)
  ≥3                                      349 (4.5)                         303 (4.0)

*Data are presented as number (percentage) of patients unless otherwise noted. BP indicates blood pressure; CABG/PTCA, coronary artery bypass graft/percutaneous transluminal coronary angioplasty; DVT, deep vein thrombosis; and PE, pulmonary embolism.
†Based on χ2 tests (categorical variables) or t tests (continuous variables).
‡Required a 3-month washout prior to randomization.
§Total number of participants with data available was 8470 for estrogen plus progestin and 8050 for placebo.
||Among women who reported having a term pregnancy.
¶Statins are 3-hydroxy-3-methylglutaryl coenzyme A reductase inhibitors.

Prevalence of prior cardiovascular disease was low and levels of cardiovascular risk factors were consistent with a generally healthy population of postmenopausal women.
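The P values in Table 1 come from χ2 tests for categorical variables. A hand-rolled sketch of the Pearson statistic for one such comparison (function name ours; for 2 degrees of freedom the χ2 survival function has the closed form exp(−x/2), so no distribution library is needed for this example):

```python
import math

def chi2_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Age-group rows from Table 1 (columns: E+P, placebo)
stat = chi2_statistic([[2839, 2683], [3853, 3657], [1814, 1762]])
# 3 x 2 table has 2 df, so the P value is exp(-stat/2); this reproduces
# the reported P = .80 for age group at screening.
p = math.exp(-stat / 2)
```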
An assessment of commonly studied breast cancer risk factors, both individually and combined using the Gail model,20 indicates that the cohort in general was not at increased risk of breast cancer.

Follow-up, Adherence, and Unblinding

Vital status is known for 16 025 randomized participants (96.5%), including 449 (2.7%) known to be deceased. A total of 583 (3.5%) participants were lost to follow-up or stopped providing outcomes information for more than 18 months. The remaining 15 576 (93.8%) provided recent outcome information (Figure 1).

At the time of this report, all women had been enrolled for at least 3.5 years, with an average follow-up of 5.2 years and a maximum of 8.5 years. A substantial number of women had stopped taking study drugs at some time (42% of estrogen plus progestin and 38% of placebo). Dropout rates over time (Figure 2) exceeded design projections, particularly early on, but compare favorably with community-based adherence to postmenopausal hormones.21 Some women in both groups initiated hormone use through their own clinician (6.2% in the estrogen plus progestin group and 10.7% in the placebo group cumulatively by the sixth year). These "drop-in" rates were also greater than expected.

Figure 2. Cumulative Dropout and Drop-in Rates by Randomization Assignment and Follow-up Duration. Dropout refers to women who discontinued study medication; drop-in, women who discontinued study medication and received postmenopausal hormones through their own clinician.

At the time of this report, clinic gynecologists had been unblinded to treatment assignment for 3444 women in the estrogen plus progestin group and 548 women in the placebo group, primarily to manage persistent vaginal bleeding. During the trial, 248 women in the estrogen plus progestin group and 183 in the placebo group had a hysterectomy.

Intermediate Cardiovascular Disease End Points

Blood lipid levels, assessed in an 8.6% subsample of fasting blood specimens collected from women at baseline and year 1, showed greater reductions in low-density lipoprotein cholesterol (−12.7%) and greater increases in high-density lipoprotein cholesterol (7.3%) and triglycerides (6.9%) with estrogen plus progestin relative to placebo (data not shown), consistent with HERS and PEPI.10,22 Systolic blood pressure was, on average, 1.0 mm Hg higher in women taking estrogen plus progestin at 1 year, rising to 1.5 mm Hg at 2 years and beyond (data not shown). Diastolic blood pressures did not differ.

Table 2. Clinical Outcomes by Randomization Assignment*

                                   No. of Patients (Annualized %)
Outcomes                           E + P (n = 8506)   Placebo (n = 8102)   Hazard Ratio   Nominal 95% CI   Adjusted 95% CI
Follow-up time, mean (SD), mo      62.2 (16.1)        61.2 (15.0)          NA             NA               NA
Cardiovascular disease†
  CHD                              164 (0.37)         122 (0.30)           1.29           1.02-1.63        0.85-1.97
    CHD death                      33 (0.07)          26 (0.06)            1.18           0.70-1.97        0.47-2.98
    Nonfatal MI                    133 (0.30)         96 (0.23)            1.32           1.02-1.72        0.82-2.13
  CABG/PTCA                        183 (0.42)         171 (0.41)           1.04           0.84-1.28        0.71-1.51
  Stroke                           127 (0.29)         85 (0.21)            1.41           1.07-1.85        0.86-2.31
    Fatal                          16 (0.04)          13 (0.03)            1.20           0.58-2.50        0.32-4.49
    Nonfatal                       94 (0.21)          59 (0.14)            1.50           1.08-2.08        0.83-2.70
  Venous thromboembolic disease    151 (0.34)         67 (0.16)            2.11           1.58-2.82        1.26-3.55
    Deep vein thrombosis           115 (0.26)         52 (0.13)            2.07           1.49-2.87        1.14-3.74
    Pulmonary embolism             70 (0.16)          31 (0.08)            2.13           1.39-3.25        0.99-4.56
  Total cardiovascular disease     694 (1.57)         546 (1.32)           1.22           1.09-1.36        1.00-1.49
Cancer
  Invasive breast                  166 (0.38)         124 (0.30)           1.26           1.00-1.59        0.83-1.92
  Endometrial                      22 (0.05)          25 (0.06)            0.83           0.47-1.47        0.29-2.32
  Colorectal                       45 (0.10)          67 (0.16)            0.63           0.43-0.92        0.32-1.24
  Total                            502 (1.14)         458 (1.11)           1.03           0.90-1.17        0.86-1.22
Fractures
  Hip                              44 (0.10)          62 (0.15)            0.66           0.45-0.98        0.33-1.33
  Vertebral                        41 (0.09)          60 (0.15)            0.66           0.44-0.98        0.32-1.34
  Other osteoporotic‡              579 (1.31)         701 (1.70)           0.77           0.69-0.86        0.63-0.94
  Total                            650 (1.47)         788 (1.91)           0.76           0.69-0.85        0.63-0.92
Death
  Due to other causes              165 (0.37)         166 (0.40)           0.92           0.74-1.14        0.62-1.35
  Total                            231 (0.52)         218 (0.53)           0.98           0.82-1.18        0.70-1.37
Global index§                      751 (1.70)         623 (1.51)           1.15           1.03-1.28        0.95-1.39

*CI indicates confidence interval; NA, not applicable; CHD, coronary heart disease; MI, myocardial infarction; CABG, coronary artery bypass grafting; and PTCA, percutaneous transluminal coronary angioplasty.
†CHD includes acute MI requiring hospitalization, silent MI determined from serial electrocardiograms, and coronary death. There were 8 silent MIs. Total cardiovascular disease is limited to events during hospitalization except venous thromboembolic disease reported after January 1, 2000.
‡Other osteoporotic fractures include all fractures other than chest/sternum, skull/face, fingers, toes, and cervical vertebrae, as well as hip and vertebral fractures reported separately.
§The global index represents the first event for each participant from among the following types: CHD, stroke, pulmonary embolism, breast cancer, endometrial cancer, colorectal cancer, hip fracture, and death due to other causes.

Clinical Outcomes

Cardiovascular Disease. Overall CHD rates were low (Table 2).
The rate of women experiencing CHD events was increased by 29% for women taking estrogen plus progestin relative to placebo (37 vs 30 per 10 000 person-years), reaching nominal statistical significance (at the .05 level). Most of the excess was in nonfatal MI. No significant differences were observed in CHD deaths or revascularization procedures (coronary artery bypass grafting or percutaneous transluminal coronary angioplasty). Stroke rates were also higher in women receiving estrogen plus progestin (41% increase; 29 vs 21 per 10 000 person-years), with most of the elevation occurring in nonfatal events. Women in the estrogen plus progestin group had 2-fold greater rates of venous thromboembolism (VTE), as well as DVT and PE individually, with almost all associated CIs excluding 1. Rates of VTE were 34 and 16 per 10 000 person-years in the estrogen plus progestin and placebo groups, respectively. Total cardiovascular disease, including other events requiring hospitalization, was increased by 22% in the estrogen plus progestin group.

Cancer. The invasive breast cancer rates in the placebo group were consistent with design expectations. The 26% increase (38 vs 30 per 10 000 person-years) observed in the estrogen plus progestin group almost reached nominal statistical significance and, as noted herein, the weighted test statistic used for monitoring was highly significant. No significant difference was observed for in situ breast cancers. Follow-up rates for mammography were comparable in the estrogen plus progestin and placebo groups. Colorectal cancer rates were reduced by 37% (10 vs 16 per 10 000 person-years), also reaching nominal statistical significance. Endometrial cancer incidence was not affected, nor was lung cancer incidence (54 vs 50; HR, 1.04; 95% CI, 0.71-1.53) or total cancer incidence.

Fractures. This cohort experienced low hip fracture rates (10 per 10 000 person-years in the estrogen plus progestin group vs 15 per 10 000 person-years in the placebo group).
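The per-10 000 person-year rates quoted in this section are Table 2's annualized percentages scaled by 100. A sketch of the arithmetic (function name ours; total person-years is approximated as the number randomized times mean follow-up, so small rounding differences from the published figures are expected):

```python
def annualized_pct(events, n, mean_fu_months):
    """Annualized event rate as a percentage (events per 100 person-years)."""
    person_years = n * mean_fu_months / 12
    return 100 * events / person_years

# CHD in Table 2: 164 events among 8506 women (mean follow-up, 62.2 mo)
# vs 122 events among 8102 women (61.2 mo)
ep = annualized_pct(164, 8506, 62.2)   # reported as 0.37
pl = annualized_pct(122, 8102, 61.2)   # reported as 0.30
per_10k_ep = ep * 100                  # about 37 per 10 000 person-years
per_10k_pl = pl * 100                  # about 30 per 10 000 person-years
```

The crude rate ratio ep/pl is close to, but not identical with, the published HR of 1.29, which comes from a stratified Cox model rather than a simple ratio of rates.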
Estrogen plus progestin reduced the observed hip and clinical vertebral fracture rates by one third compared with placebo, both nominally significantly. The reductions in other osteoporotic fractures (23%) and total fractures (24%) were statistically significant (all associated CIs exclude 1).

The global index showed a nominally significant 15% increase in the estrogen plus progestin group (170 vs 151 per 10 000 person-years). There were no differences in mortality or cause of death between groups (Table 3).

Table 3. Cause of Death by Randomization Assignment

                      No. (Annualized %)
                      Estrogen + Progestin (n = 8506)   Placebo (n = 8102)
Total deaths          231 (0.52)                        218 (0.53)
Adjudicated deaths    215 (0.49)                        201 (0.49)
  Cardiovascular      65 (0.15)                         55 (0.13)
  Breast cancer       3 (0.01)                          2 (<0.01)
  Other cancer        104 (0.24)                        86 (0.21)
  Other known cause   34 (0.08)                         41 (0.10)
  Unknown cause       9 (0.02)                          17 (0.04)

Time Trends

The Kaplan-Meier estimates of cumulative hazards (Figure 3) for CHD indicate that the difference between treatment groups began to develop soon after randomization. These curves provide little evidence of convergence through 6 years of follow-up. The cumulative hazards for stroke begin to diverge between 1 and 2 years after randomization, and this difference persists beyond the fifth year. For PE, the curves separate soon after randomization and show continuing adverse effects throughout the observation period. For breast cancer, the cumulative hazard functions are comparable through the first 4 years, at which point the curve for estrogen plus progestin begins to rise more rapidly than that for placebo. Curves for colorectal cancer show benefit beginning at 3 years, and curves for hip fracture show increasing cumulative benefit over time. The difference in hazard rates for the global index (Figure 4) suggests a gradual increase in adverse effects compared with benefits for estrogen plus progestin through year 5, with a possible narrowing of the difference by year 6; however, HR estimates tend to be unstable beyond 6 years after randomization. Total mortality rates are indistinguishable between estrogen plus progestin and placebo.

Tests for linear trends with time since randomization, based on a Cox proportional hazards model with a time-dependent covariate, detected no trend with time for CHD, stroke, colorectal cancer, hip fracture, total mortality, or the global index (Table 4). There was some evidence for an increasing risk of breast cancer over time with estrogen plus progestin (z = 2.56 compared with a nominal z score for statistical significance of 1.96) and a decreasing risk of VTE with time (z = −2.45). These results must be viewed cautiously because the number of events in each interval is modest, the data in later years are still incomplete, and later year comparisons are limited to women still at risk of their first event for that outcome.

Subgroup Analyses

Cardiovascular Disease. A small subset of women (n = 400; average follow-up, 57.4 months) in WHI reported conditions at baseline that would have made them eligible for HERS, ie, prior MI or revascularization procedures. Among these women with established coronary disease, the HR for subsequent CHD for estrogen plus progestin relative to placebo was 1.28 (95% CI, 0.64-2.56) with 19 vs 16 events. The remaining women, those without prior CHD, had an identical HR for CHD (145 vs 106; HR, 1.28; 95% CI, 1.00-1.65).
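The cumulative hazard curves discussed under Time Trends are derived from Kaplan-Meier survival estimates. A minimal, unstratified sketch of the estimator on toy data (the published curves come from stratified analyses at scale; the function name and data here are ours):

```python
import math

def km_cumulative_hazard(times, events):
    """Cumulative hazard -ln(S_KM(t)) from (time, event-indicator) data.

    times: follow-up time per participant; events: 1 = event, 0 = censored.
    Returns (event_time, cumulative_hazard) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, out, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        d = n = 0
        while i < len(data) and data[i][0] == t:  # pool ties at time t
            d += data[i][1]
            n += 1
            i += 1
        if d:  # survival drops only at event times
            surv *= 1 - d / n_at_risk
            out.append((t, -math.log(surv)))
        n_at_risk -= n  # events and censorings both leave the risk set
    return out

# Toy data: events at t=2 and t=4, censoring at t=3 and t=5
curve = km_cumulative_hazard([2, 3, 4, 5], [1, 0, 1, 0])
```

Censoring at t=3 shrinks the risk set without dropping the survival curve, so the event at t=4 is weighted 1/2 rather than 1/3; that handling of censoring is exactly what the text's "censored for that event at the time of last follow-up contact" refers to.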
Few women with a history of VTE were enrolled, but these data suggest a possibility that these women may be at greater risk of future VTE events when taking estrogen plus progestin (7 vs 1; HR, 4.90; 95% CI, 0.58-41.06) than those without a history of VTE (144 vs 66; HR, 2.06; 95% CI, 1.54-2.76). For stroke, prior history did not confer additional risk (1 vs 5 in women with prior stroke; HR, 0.46; 95% CI, 0.05-4.51; 126 vs 80 with no prior stroke; HR, 1.47; 95% CI, 1.11-1.95). No noteworthy interactions with age, race/ethnicity, body mass index, prior hormone use, smoking status, blood pressure, diabetes, aspirin use, or statin use were found for the effect of estrogen plus progestin on CHD, stroke, or VTE.

Breast Cancer. Women reporting prior postmenopausal hormone use had higher HRs for breast cancer associated with estrogen plus progestin use than those who never used postmenopausal hormones (among never users, 114 vs 102; HR, 1.06; 95% CI, 0.81-1.38; for women with <5 years of prior use, 32 vs 15; HR, 2.13; 95% CI, 1.15-3.94; for women with 5-10 years of prior use, 11 vs 2; HR, 4.61; 95% CI, 1.01-21.02; and for women with ≥10 years of prior use, 9 vs 5; HR, 1.81; 95% CI, 0.60-5.43; test for trend, z = 2.17). No interactions between estrogen plus progestin and age, race/ethnicity, family history, parity, age at first birth, body mass index, or Gail-model risk score were observed for invasive breast cancer.

Figure 3. Kaplan-Meier Estimates of Cumulative Hazards for Selected Clinical Outcomes. Coronary heart disease: HR, 1.29; 95% nCI, 1.02-1.63; 95% aCI, 0.85-1.97. Stroke: HR, 1.41; 95% nCI, 1.07-1.85; 95% aCI, 0.86-2.31. Pulmonary embolism: HR, 2.13; 95% nCI, 1.39-3.25; 95% aCI, 0.99-4.56. Invasive breast cancer: HR, 1.26; 95% nCI, 1.00-1.59; 95% aCI, 0.83-1.92. Colorectal cancer: HR, 0.63; 95% nCI, 0.43-0.92; 95% aCI, 0.32-1.24. Hip fracture: HR, 0.66; 95% nCI, 0.45-0.98; 95% aCI, 0.33-1.33. HR indicates hazard ratio; nCI, nominal confidence interval; and aCI, adjusted confidence interval.

Further Analyses

Because a number of women stopped study medications during follow-up, several analyses were performed to examine the sensitivity of the principal HR estimates to actual use of study medications. Analyses that censored a woman's event history 6 months after becoming nonadherent (using <80% of or stopping study drugs) produced the largest changes to estimated effect sizes. This approach increased HRs to 1.51 for CHD, to 1.49 for breast cancer, to 1.67 for stroke, and to 3.29 for VTE. Analyses attributing events to actual hormone use ("as treated," allowing for a 6-month lag) produced more modest changes to these estimates. Analyses excluding women randomized during the period when the unopposed-estrogen component was open to women with a uterus and analyses stratifying by enrollment period did not substantially affect the results. These analyses suggest that the intention-to-treat estimates of HRs may somewhat underestimate the effect sizes relative to what would be observed with full adherence to study medications.

Figure 4. Kaplan-Meier Estimates of Cumulative Hazards for Global Index and Death. Global index: HR, 1.15; 95% nCI, 1.03-1.28; 95% aCI, 0.95-1.39. Death: HR, 0.98; 95% nCI, 0.82-1.18; 95% aCI, 0.70-1.37. HR indicates hazard ratio; nCI, nominal confidence interval; and aCI, adjusted confidence interval.

Table 4. Selected Clinical Outcomes by Follow-up Year and Randomization Assignment*

                           Year 1                           Year 2                            Year 3
Outcomes                   E + P       Placebo     Ratio    E + P       Placebo     Ratio     E + P       Placebo     Ratio
No. of participant-years   8435        8050                 8353        7980                  8268        7888
Coronary heart disease     43 (0.51)   23 (0.29)   1.78     36 (0.43)   30 (0.38)   1.15      20 (0.24)   18 (0.23)   1.06
Stroke                     17 (0.20)   17 (0.21)   0.95     27 (0.32)   15 (0.19)   1.72      30 (0.36)   16 (0.20)   1.79
Venous thromboembolism     49 (0.58)   13 (0.16)   3.60     26 (0.31)   11 (0.14)   2.26      21 (0.25)   12 (0.15)   1.67
Invasive breast cancer     11 (0.13)   17 (0.21)   0.62     26 (0.31)   30 (0.38)   0.83      28 (0.34)   23 (0.29)   1.16
Endometrial cancer         2 (0.02)    2 (0.02)    0.95     4 (0.05)    4 (0.05)    0.96      4 (0.05)    5 (0.06)    0.76
Colorectal cancer          10 (0.12)   15 (0.19)   0.64     11 (0.13)   9 (0.11)    1.17      6 (0.07)    8 (0.10)    0.72
Hip fracture               6 (0.07)    9 (0.11)    0.64     8 (0.10)    13 (0.16)   0.59      11 (0.13)   12 (0.15)   0.87
Total death                22 (0.26)   17 (0.21)   1.24     30 (0.36)   30 (0.38)   0.96      39 (0.47)   35 (0.44)   1.06
Global index               123 (1.46)  96 (1.19)   1.22     134 (1.60)  117 (1.47)  1.09      127 (1.54)  107 (1.36)  1.13

                           Year 4                           Year 5                            Year 6 and Later                  z Score
Outcomes                   E + P       Placebo     Ratio    E + P       Placebo     Ratio     E + P       Placebo     Ratio     for Trend†
No. of participant-years   7926        7562                 5964        5566                  5129        4243
Coronary heart disease     25 (0.32)   24 (0.32)   0.99     23 (0.39)   9 (0.16)    2.38      17 (0.33)   18 (0.42)   0.78      −1.19
Stroke                     25 (0.32)   14 (0.19)   1.70     16 (0.27)   8 (0.14)    1.87      12 (0.23)   15 (0.35)   0.66      −0.51
Venous thromboembolism     27 (0.34)   14 (0.19)   1.84     16 (0.27)   6 (0.11)    2.49      12 (0.23)   11 (0.26)   0.90      −2.45
Invasive breast cancer     40 (0.50)   22 (0.29)   1.73     34 (0.57)   12 (0.22)   2.64      27 (0.53)   20 (0.47)   1.12      2.56
Endometrial cancer         10 (0.13)   5 (0.07)    1.91     1 (0.02)    4 (0.07)    0.23      1 (0.02)    5 (0.12)    0.17      −1.58
Colorectal cancer          9 (0.11)    20 (0.26)   0.43     4 (0.07)    8 (0.14)    0.47      5 (0.10)    7 (0.16)    0.59      −0.81
Hip fracture               8 (0.10)    11 (0.15)   0.69     5 (0.08)    8 (0.14)    0.58      6 (0.12)    9 (0.21)    0.55      0.25
Total death                55 (0.69)   48 (0.63)   1.09     41 (0.69)   44 (0.79)   0.87      44 (0.86)   44 (1.04)   0.83      −0.79
Global index               155 (1.96)  127 (1.68)  1.16     112 (1.88)  77 (1.38)   1.36      100 (1.95)  99 (2.33)   0.84      −0.87

*E + P indicates estrogen plus progestin. All outcome data are number of patients (annualized percentage).
†Tests for trends are based on Cox proportional hazards models with time-dependent treatment effects. The z scores shown indicate trends across all years.

COMMENT

The WHI provides evidence from a large randomized trial that addresses the important issue of whether most women with an intact uterus in the decades of life following menopause should consider hormone therapy to prevent chronic disease.
The WHI enrolled a cohort of mostly healthy, ethnically diverse women, spanning a large age range (50-79 years at baseline). It is noteworthy that the increased risks for cardiovascular disease and invasive breast cancer were present across racial/ethnic and age strata and were not influenced by antecedent risk status or prior disease. Hence, the results are likely to be generally applicable to healthy women in this age range. At the time the trial was stopped, the increases in numbers of invasive breast cancers, CHD, stroke, and PE made approximately equal contributions to harm in the estrogen plus progestin group compared with placebo, which were not counterbalanced by the smaller reductions in numbers of hip fractures and colorectal cancers.

Cardiovascular Disease

Even though the trial was stopped early for harm from breast cancer, a sufficient number of CHD events had occurred by 5.2 years of average follow-up to suggest that continuation to the planned end would have been unlikely to yield a favorable result for the primary outcome of CHD. Even if there were a reversal of direction toward benefit of a magnitude seen in the observational studies (ie, a risk reduction of 55%) during the remaining years, conditional power analyses indicate that less than 10% power remained for showing potential benefit if the trial continued.

The WHI finding that estrogen plus progestin does not confer benefit for preventing CHD among women with a uterus concurs with HERS findings among women with clinically apparent CHD,10 with the Estrogen Replacement for Atherosclerosis trial, in which estrogen plus progestin did not inhibit progression,23 and with a trial in women with unstable angina that did not observe a reduction in ischemic events.24 The finding of an increased risk after initiation of treatment in WHI is similar to HERS.
In HERS, after 4.1 and 6.8 years of follow-up, hormone therapy did not increase or decrease risk of cardiovascular events in women with CHD.25 The WHI extends these findings to include a wider range of women, including younger women and those without clinically apparent CHD, and indicates that the risk may persist for some years.

Unlike CHD, the excess risk of stroke in the estrogen plus progestin group was not present in the first year but appeared during the second year and persisted through the fifth year. Preliminary analyses indicate that the modest difference in blood pressure between groups does not contribute much to an explanation of the increase in strokes (data not shown). The findings in WHI for stroke are consistent with but somewhat more extreme than those of HERS, which reported a nonsignificant 23% increase in the treatment group.26 The results were also more extreme than those of the Women's Estrogen and Stroke Trial of estradiol (without progestin) in women with prior stroke, which found no effect of estrogen on recurrent strokes overall but some increase in the first 6 months.27 Trials of the effect of estradiol on carotid intima-media thickness have yielded conflicting results.28,29 At least 1 observational study has suggested that use of estrogen plus progestin is associated with higher risk of stroke than estrogen alone.14 In WHI, there was no indication that excess strokes due to estrogen plus progestin were more likely to occur in older women, in women with prior stroke history, by race/ethnicity, or in women with high blood pressure at baseline. Therefore, it appears that estrogen plus progestin increases the risk of strokes in apparently healthy women.

Venous thromboembolism is an expected complication of postmenopausal hormones, and the pattern over time in WHI is consistent with the findings from HERS and several observational studies.30,31

Cancer

The WHI is the first randomized controlled trial to confirm that combined
es-trogen plus progestin does increase therisk of incident breast cancer and toquantify the degree of risk. The WHIcould not address the risk of death dueto breast cancer because with the rela-tively short follow-up time, few womenin the WHI have thus far died as a re-sult of breast cancer (3 in the active treat-ment group and 2 in the placebo group).The risk of breast cancer emerged sev-eral years after randomization. After anaverage follow-up of about 5 years, theadverse effect on breast cancer hadcrossed the monitoring boundary. The26% excess of breast cancer is consis-tent with estimates from pooled epide-miological data, which reported a 15%increase for estrogen plus progestin usefor less than 5 years and a 53% increasefor use for more than 5 years.\n32It is also\nconsistent with the (nonsignificant) 27%increase found after 6.8 years of fol-low-up in HERS.\n33\nWith more common use of estrogen\nplus progestin, several epidemiologicalstudies have reported that estrogen plusprogestin appears to be associated withgreater risk of breast cancer than estro-gen alone.\n34-37In the PEPI trial, women\nin the 3 estrogen plus progestin groupshad much greater increases in mammo-graphic density (a predictor of breastcancer) than women in the estrogen orplacebo groups.\n38In WHI, the HR for es-\ntrogen plus progestin was not higher inwomen with a family history or otherrisk factors for breast cancer, except forreported prior use of postmenopausalhormones. This may suggest a cumula-tive effect of years of exposure to post-menopausal hormones.\nEndometrial cancer rates were low\nand were not increased by 5 years of es-RISKS AND BENEFITS OF ESTROGEN PLUS PROGESTIN\n330 JAMA, July 17, 2002 —Vol 288, No. 3 (Reprinted) ©2002 American Medical Association. All rights reserved.\nDownloaded From: https://jamanetwork.com/ by a University of California - San Diego User on 05/16/2019 trogen plus progestin exposure. 
Close monitoring for bleeding and treatment of hyperplasia may contribute to the absence of increased risk of endometrial cancer.

The reduction in colorectal cancer in the hormone group is consistent with observational studies, which have suggested fairly consistently that users of postmenopausal hormones may be at lower risk of colorectal cancer.39 The mechanisms by which hormone use might reduce risk are unclear. Results from other trials of postmenopausal hormones will help resolve the effects of hormones on colorectal cancer.40

Fractures

The reductions in clinical vertebral fractures, other osteoporotic fractures, and combined fractures supported the benefit for hip fractures found in this trial. These findings are consistent with the observational data and limited data from clinical trials41 and are also consistent with the known ability of estrogen (with or without progestin) to maintain bone mineral density.42 The WHI is the first trial with definitive data supporting the ability of postmenopausal hormones to prevent fractures at the hip, vertebrae, and other sites.

Overall Risks and Benefits

At the end of the trial, the global index indicated that there were more harmful than beneficial outcomes in the estrogen plus progestin group vs the placebo group. The monitored outcomes included in the global index were selected to represent diseases of serious import that estrogen plus progestin treatment might affect, and do not include a variety of other conditions and measures that may be affected in unfavorable or favorable ways (eg, gallbladder disease, diabetes, quality of life, and cognitive function). The data on these and other outcomes will be the subject of future publications. All-cause mortality was balanced between the groups; however, longer follow-up may be needed to assess the impact of the incident diseases on total mortality.

The absolute excess risk (or risk reduction) attributable to estrogen plus progestin was low.
Over 1 year, 10 000 women taking estrogen plus progestin compared with placebo might experience 7 more CHD events, 8 more strokes, 8 more PEs, 8 more invasive breast cancers, 6 fewer colorectal cancers, and 5 fewer hip fractures. Combining all the monitored outcomes, women taking estrogen plus progestin might expect 19 more events per year per 10 000 women than women taking placebo. Over a longer period, more typical of the duration of treatment that would be needed to prevent chronic disease, the absolute numbers of excess outcomes would increase proportionately.

During the 5.2 years of this trial, the number of women experiencing a global index event was about 100 more per 10 000 women taking estrogen plus progestin than taking placebo. If the current findings can be extrapolated to an even longer treatment duration, the absolute risks and benefits associated with estrogen plus progestin for each of these conditions could be substantial and on a population basis could account for tens of thousands of conditions caused, or prevented, by hormone use.

Limitations

This trial tested only 1 drug regimen, CEE, 0.625 mg/d, plus MPA, 2.5 mg/d, in postmenopausal women with an intact uterus. The results do not necessarily apply to lower dosages of these drugs, to other formulations of oral estrogens and progestins, or to estrogens and progestins administered through the transdermal route. It remains possible that transdermal estradiol with progesterone, which more closely mimics the normal physiology and metabolism of endogenous sex hormones, may provide a different risk-benefit profile. The WHI findings for CHD and VTE are supported by findings from HERS, but there is no other evidence from clinical trials for breast cancer and colorectal cancer, and only limited data from trials concerning fractures.

Importantly, this trial could not distinguish the effects of estrogen from those of progestin.
The effects of progestin may be important for breast cancer and atherosclerotic diseases, including CHD and stroke. Per protocol, in a separate and adequately powered trial, WHI is testing whether oral estrogen will prevent CHD in 10 739 women who have had a hysterectomy. The monitoring of this trial is similar to that for the trial of estrogen plus progestin. At an average follow-up of 5.2 years, the DSMB has recommended that this trial continue because the balance of overall risks and benefits remains uncertain. These results are expected to be available in 2005, at the planned termination.

The relatively high rates of discontinuation in the active treatment arm (42%) and crossover to active treatment in the placebo arm (10.7%) are a limitation of the study; however, the lack of adherence would tend to decrease the observed treatment effects. Thus, the results presented here may underestimate the magnitude of both adverse effects on cardiovascular disease and breast cancer and the beneficial effects on fractures and colorectal cancer among women who adhere to treatment.

The fact that the trial was stopped early decreases the precision of estimates of long-term treatment effects. A longer intervention period might have shown more pronounced benefit for fractures and might have yielded a more precise test of the hypothesis that treatment reduces colorectal cancer. Nonetheless, it appears unlikely that benefit for CHD would have emerged by continuing the trial to its planned termination. The trial results indicate that treatment for up to 5.2 years is not beneficial overall and that there is early harm for CHD, continuing harm for stroke and VTE, and increasing harm for breast cancer with increasing duration of treatment.
This risk-benefit profile is not consistent with the requirements for a viable intervention for the primary prevention of chronic diseases.

Implications

The WHI trial results provide the first definitive data on which to base treatment recommendations for healthy postmenopausal women with an intact uterus. This trial did not address the short-term risks and benefits of hormones given for the treatment of menopausal symptoms. On the basis of HERS and other secondary prevention trials, the American Heart Association recommended against initiating postmenopausal hormones for the secondary prevention of cardiovascular disease.43 The American Heart Association made no firm recommendation for primary prevention while awaiting the results from randomized clinical trials such as WHI, and stated that continuation of the treatment should be considered on the basis of established noncoronary benefits and risks, possible coronary benefits and risks, and patient preference.

Results from WHI indicate that the combined postmenopausal hormones CEE, 0.625 mg/d, plus MPA, 2.5 mg/d, should not be initiated or continued for the primary prevention of CHD. In addition, the substantial risks for cardiovascular disease and breast cancer must be weighed against the benefit for fracture in selecting from the available agents to prevent osteoporosis.

Writing Group for the Women's Health Initiative Investigators: Jacques E. Rossouw, MBChB, MD, National Heart, Lung, and Blood Institute, Bethesda, Md; Garnet L. Anderson, PhD, Ross L. Prentice, PhD, Andrea Z. LaCroix, PhD, and Charles Kooperberg, PhD, Fred Hutchinson Cancer Research Center, Seattle, Wash; Marcia L.
Stefanick, PhD, Stanford University Clinical Center, Stanford, Calif; Rebecca D. Jackson, MD, Ohio State University Clinical Center, Columbus; Shirley A. A. Beresford, PhD, Fred Hutchinson Cancer Research Center, Seattle, Wash; Barbara V. Howard, PhD, MedStar Research Institute, Washington, DC; Karen C. Johnson, MD, MPH, University of Tennessee, Memphis; Jane Morley Kotchen, MD, Medical College of Wisconsin, Milwaukee; Judith Ockene, PhD, University of Massachusetts Medical School, Worcester.

Financial Disclosures: Dr LaCroix is an investigator on 2 osteoporosis studies separately funded by Merck and Pfizer. Dr Jackson is an investigator on 1 osteoporosis study funded by Merck and 1 hormone study on libido in women funded by Procter & Gamble Pharmaceuticals.

For Correspondence: Jacques E. Rossouw, MBChB, MD, Division of Women's Health Initiative, National Heart, Lung, and Blood Institute, 6705 Rockledge Dr, One Rockledge Ctr, Suite 300, MS/7966, Bethesda, MD 20817 (e-mail: rossouw@nih.gov); Garnet L. Anderson, PhD, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, MP-1002, PO Box 19024, Seattle, WA 98109-1024 (e-mail: garnet@whi.org).

Reprints: WHI Clinical Coordinating Center, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, MP-1002, PO Box 19024, Seattle, WA 98109-1024.

Author Contributions: Dr Anderson, as co-principal investigator of the Women's Health Initiative Clinical Coordinating Center, had full access to the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analyses.

Study concept and design: Rossouw, Anderson, Prentice.

Acquisition of data: Anderson, Prentice, LaCroix, Kooperberg, Stefanick, Jackson, Beresford, Howard, Johnson, Kotchen, Ockene.

Analysis and interpretation of data: Rossouw, Anderson, Prentice, LaCroix, Kooperberg, Stefanick, Jackson.

Drafting of the manuscript: Rossouw,
Anderson, Prentice.

Critical revision of the manuscript for important intellectual content: Rossouw, Anderson, LaCroix, Kooperberg, Stefanick, Jackson, Beresford, Howard, Johnson, Kotchen, Ockene.

Statistical expertise: Anderson, Prentice, Kooperberg.

Obtained funding: Rossouw, Anderson, Prentice, Stefanick, Beresford, Howard, Kotchen, Ockene.

Administrative, technical, or material support: Rossouw, Anderson, Prentice, LaCroix, Kooperberg, Stefanick, Jackson, Beresford, Howard, Johnson, Kotchen, Ockene.

Study supervision: Rossouw, Anderson, LaCroix, Stefanick, Jackson, Beresford, Howard, Johnson, Kotchen, Ockene.

WHI Steering Committee: Clinical Centers: Marcia L. Stefanick (chair), Stanford Center for Research in Disease Prevention, Stanford University; Rebecca D. Jackson (vice chair), Ohio State University; Catherine I. Allen, University of Wisconsin, Madison; Annlouise R. Assaf, Brown University, Providence, RI; Tamsen Bassford, University of Arizona, Tucson/Phoenix; Shirley A. A. Beresford, Fred Hutchinson Cancer Research Center; Henry Black, Rush-Presbyterian-St Luke's Medical Center, Chicago, Ill; Robert Brunner, University of Nevada, Reno; Gregory L. Burke, Wake Forest University School of Medicine, Winston-Salem, NC; Bette Caan, Kaiser Permanente Division of Research, Oakland, Calif; Rowan T. Chlebowski, Harbor-UCLA Research and Education Institute, Torrance, Calif; David Curb, University of Hawaii, Honolulu; Margery Gass, University of Cincinnati, Cincinnati, Ohio; Jennifer Hays, Baylor College of Medicine, Houston, Tex; Gerardo Heiss, University of North Carolina, Chapel Hill; Susan Hendrix, Wayne State University School of Medicine/Hutzel Hospital, Detroit, Mich; Barbara V. Howard, MedStar Research Institute, Washington, DC; Judith Hsia, George Washington University, Washington, DC; F.
Allan Hubbell, University of California, Irvine, Orange; Karen C. Johnson, University of Tennessee, Memphis; Howard Judd, University of California, Los Angeles; Jane Morley Kotchen, Medical College of Wisconsin, Milwaukee; Lewis Kuller, University of Pittsburgh, Pittsburgh, Pa; Dorothy Lane, State University of New York at Stony Brook; Robert D. Langer, University of California, San Diego, La Jolla/Chula Vista; Norman Lasser, University of Medicine and Dentistry of New Jersey, Newark; Cora E. Lewis, University of Alabama at Birmingham; Marian Limacher, University of Florida, Gainesville/Jacksonville; JoAnn Manson, Brigham and Women's Hospital, Harvard Medical School, Boston, Mass; Karen Margolis, University of Minnesota, Minneapolis; Judith Ockene, University of Massachusetts Medical School, Worcester; Mary Jo O'Sullivan, University of Miami, Miami, Fla; Lawrence Phillips, Emory University, Atlanta, Ga; Cheryl Ritenbaugh, Kaiser Permanente Center for Health Research, Portland, Ore; John Robbins, University of California, Davis, Sacramento; Robert Schenken, University of Texas Health Science Center, San Antonio; Sylvia Wassertheil-Smoller, Albert Einstein College of Medicine, Bronx, NY; Maurizio Trevisan, State University of New York at Buffalo; Linda Van Horn, Northwestern University, Chicago/Evanston, Ill; and Robert Wallace, University of Iowa, Iowa City/Davenport; Program Office: Jacques E. Rossouw, National Heart, Lung, and Blood Institute; Clinical Coordinating Center: Andrea Z. LaCroix, Ruth E. Patterson, and Ross L. Prentice, Fred Hutchinson Cancer Research Center.

Data and Safety Monitoring Board: Janet Wittes (chair), Eugene Braunwald, Margaret Chesney, Harvey Cohen, Elizabeth Barrett-Connor, David DeMets, Leo Dunn, Johanna Dwyer, Robert P. Heaney, Victor Vogel, LeRoy Walters, and Salim Yusuf.

Funding/Support: The National Heart, Lung, and Blood Institute funds the WHI program.
Wyeth-Ayerst Research provided the study medication (active and placebo).

Acknowledgment: The WHI Steering Committee gratefully acknowledges the dedicated efforts of the WHI participants and of key WHI investigators and staff at the clinical centers and the Clinical Coordinating Center. A full listing of the WHI investigators can be found at http://www.whi.org.

REFERENCES
1. The Women's Health Initiative Study Group. Design of the Women's Health Initiative clinical trial and observational study. Control Clin Trials. 1998;19:61-109.
2. Stampfer M, Colditz G. Estrogen replacement therapy and coronary heart disease: a quantitative assessment of the epidemiologic evidence. Prev Med. 1991;20:47-63.
3. Grady D, Rueben SB, Pettiti DB, et al. Hormone therapy to prevent disease and prolong life in postmenopausal women. Ann Intern Med. 1992;117:1016-1037.
4. Rijpkema AH, van der Sanden AA, Ruijs AH. Effects of postmenopausal estrogen-progesterone therapy on serum lipids and lipoproteins: a review. Maturitas. 1990;12:259-285.
5. Adams MR, Kaplan JR, Manuck SB, et al. Inhibition of coronary artery atherosclerosis by 17-beta estradiol in ovariectomized monkeys: lack of an effect of added progesterone. Arteriosclerosis. 1990;10:1051-1057.
6. Weiss NS, Ure CL, Ballard JH, Williams AR, Daling JR. Decreased risk of fractures of the hip and lower forearm with postmenopausal use of estrogen. N Engl J Med. 1980;303:1195-1198.
7. Genant HK, Baylink DJ, Gallagher JC, Harris ST, Steiger P, Herber M. Effect of estrone sulfate on postmenopausal bone loss. Obstet Gynecol. 1990;76:579-584.
8. Steinberg KA, Thacker SB, Smith SJ, et al. A meta-analysis of the effect of estrogen replacement therapy on the risk of breast cancer. JAMA. 1991;265:1985-1990.
9. Gerhardsson de VM, London S. Reproductive factors, exogenous female hormones, and colorectal cancer by subsite. Cancer Causes Control. 1992;3:355-360.
10. Hulley S, Grady D, Bush T, et al.
Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. JAMA. 1998;280:605-613.
11. Grodstein F, Manson JE, Stampfer MJ. Postmenopausal hormone use and secondary prevention of coronary events in the Nurses' Health Study. Ann Intern Med. 2001;135:1-8.
12. Alexander KP, Newby LK, Hellkamp AS, et al. Initiation of hormone replacement therapy after acute myocardial infarction is associated with more cardiac events during follow-up. J Am Coll Cardiol. 2001;38:1-7.
13. Heckbert SR, Kaplan RC, Weiss NS, et al. Risk of recurrent coronary events in relation to use and recent initiation of postmenopausal hormone therapy. Arch Intern Med. 2001;161:1709-1173.
14. Grodstein F, Manson JE, Colditz GA, Willit WC, Speizer FE, Stampfer MJ. A prospective, observational study of postmenopausal hormone therapy and primary prevention of cardiovascular disease. Ann Intern Med. 2000;133:933-941.
15. The Writing Group for the PEPI Trial. Effects of hormone replacement therapy on endometrial histology in postmenopausal women: the Postmenopausal Estrogen/Progestin Interventions (PEPI) Trial. JAMA. 1996;275:370-375.
16. Ives DG, Fitzpatrick AL, Bild DE, et al. Surveillance and ascertainment of cardiovascular events: the Cardiovascular Health Study. Ann Epidemiol. 1995;5:275-285.
17. Cox DR. Regression analysis and life tables. J R Stat Soc. 1972;34:187-220.
18. O'Brien PC, Fleming RT. A multiple testing procedure for clinical trials. Biometrics. 1979;35:549-556.
19. Freedman LS, Anderson G, Kipnis V, et al. Approaches to monitoring the results of long-term disease prevention trials: examples from the Women's Health Initiative. Control Clin Trials.
1996;17:509-525.
20. Gail MH, Brinton LA, Byar DP, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989;81:1879-1886.
21. Pilon D, Castilloux AM, Lelorier J. Estrogen replacement therapy: determinants of persistence with treatment. Obstet Gynecol. 2001;97:97-100.
22. Writing Group for the PEPI Trial. Effects of estrogen or estrogen/progestin regimens on heart disease risk factors in postmenopausal women. JAMA. 1995;273:199-208.
23. Herrington DM, Reboussin DM, Brosnihan KB, et al. Effects of estrogen replacement on the progression of coronary artery atherosclerosis. N Engl J Med. 2000;343:522-529.
24. Schulman SP, Thiemann DR, Ouyang P, et al. Effects of acute hormone therapy on recurrent ischemia in postmenopausal women with unstable angina. J Am Coll Cardiol. 2002;39:231-237.
25. Grady D, Herrington D, Bittner V, et al, for the HERS Research Group. Cardiovascular disease outcomes during 6.8 years of hormone therapy: Heart and Estrogen/progestin Replacement Study Follow-up (HERS II). JAMA. 2002;288:49-57.
26. Simon JA, Hsia J, Cauley JA, et al. Postmenopausal hormone therapy and risk of stroke: the Heart and Estrogen-progestin Replacement Study (HERS). Circulation. 2001;103:638-642.
27. Viscoli CM, Brass LM, Kernan WN, Sarrel PM, Suissa S, Horwitz RI. A clinical trial of estrogen-replacement therapy after ischemic stroke. N Engl J Med. 2001;345:1243-1249.
28. Hodis HN, Mack WJ, Lobo RA, et al. Estrogen in the prevention of atherosclerosis: a randomized, double-blind controlled trial. Ann Intern Med. 2001;135:939-953.
29. Angerer P, Stork S, Kothny W, Schmitt P, von Schacky C. Effect of oral postmenopausal hormone replacement on progression of atherosclerosis: a randomized, controlled trial. Arterioscler Thromb Vasc Biol. 2001;21:262-268.
30. Castellsague J, Perez Gutthann S, Garcia Rodriguez LA.
Recent epidemiological studies of the association between hormone replacement therapy and venous thromboembolism: a review. Drug Saf. 1998;18:117-123.
31. Grady D, Wenger NK, Herrington D, et al. Postmenopausal hormone therapy increases risk for venous thromboembolic disease: the Heart and Estrogen/progestin Replacement Study. Ann Intern Med. 2000;132:689-696.
32. Collaborative Group on Hormonal Factors in Breast Cancer. Breast cancer and hormone replacement therapy: collaborative reanalysis of data from 51 epidemiological studies of 52,705 women with breast cancer and 108,411 women without breast cancer. Lancet. 1997;350:1047-1059.
33. Hulley S, Furberg C, Barrett-Connor E, et al, for the HERS Research Group. Noncardiovascular disease outcomes during 6.8 years of hormone therapy: Heart and Estrogen/progestin Replacement Study Follow-up (HERS II). JAMA. 2002;288:58-66.
34. Schairer C, Lubin J, Troisi R, Sturgeon S, Brinton L, Hoover R. Menopausal estrogen and estrogen-progestin replacement therapy and breast cancer risk. JAMA. 2000;283:485-491.
35. Ross RK, Paganini-Hill A, Wan PC, Pike MC. Effect of hormone replacement therapy on breast cancer risk: estrogen versus estrogen plus progestin. J Natl Cancer Inst. 2000;92:328-332.
36. Colditz GA, Hankinson SE, Hunter DJ, et al. The use of estrogens and progestins and the risk of breast cancer in postmenopausal women. N Engl J Med. 1995;332:1589-1593.
37. Magnusson C, Baron JA, Correia N, Bergstrom R, Adami H-O, Persson I. Breast-cancer risk following long-term oestrogen and oestrogen-progestin-replacement therapy. Int J Cancer. 1999;81:339-344.
38. Greendale GA, Reboussin BA, Sie A, et al. Effects of estrogen and estrogen-progestin on mammographic parenchymal density. Ann Intern Med. 1999;130:262-269.
39. Grodstein F, Newcomb PA, Stampfer MJ. Postmenopausal hormone therapy and the risk of colorectal cancer: a review and meta-analysis. Am J Med. 1999;106:574-582.
40. Vickers MR, Meade TW, Wilkes HC.
Hormone replacement therapy and cardiovascular disease: the case for a randomized controlled trial. Ciba Found Symp. 1995;191:150-160.
41. Torgerson DJ, Bell-Seyer SE. Hormone replacement therapy and prevention of nonvertebral fractures: a meta-analysis of randomized trials. JAMA. 2001;285:2891-2897.
42. Writing Group for the PEPI Trial. Effects of hormone therapy on bone mineral density: results from the Postmenopausal Estrogen/Progestin Interventions (PEPI) Trial. JAMA. 1996;275:1389-1396.
43. Mosca L, Collins P, Herrington DM, et al. Hormone replacement therapy and cardiovascular disease: a statement for healthcare professionals from the American Heart Association. Circulation. 2001;104:499-503." -} -{ - "pm_id": "https://pubmed.ncbi.nlm.nih.gov/6546423", - "pdf_text": "Volume 12 Number 1 1984
Nucleic Acids Research

A comprehensive set of sequence analysis programs for the VAX

John Devereux, Paul Haeberli* and Oliver Smithies

Laboratory of Genetics, University of Wisconsin, Madison, WI 53706, USA

Received 18 August 1983

ABSTRACT
The University of Wisconsin Genetics Computer Group (UWGCG) has been organized to develop computational tools for the analysis and publication of biological sequence data. A group of programs that will interact with each other has been developed for the Digital Equipment Corporation VAX computer using the VMS operating system. The programs available and the conditions for transfer are described.

INTRODUCTION
The rapid advances in the field of molecular genetics and DNA sequencing have made it imperative for many laboratories to use computers to analyze and manage sequence data. UWGCG was founded when it became clear to several faculty members at the University of Wisconsin that there was no set of sequence analysis programs that could be used together as a coherent system and be modified easily in response to new ideas.

With intramural support a computer group was organized to build a strong foundation of software upon which future programs in molecular genetics could be based. This initial project has been completed and the resulting programs, written in Fortran 77, are available for VAX computers using the VMS operating system. Most of the programs can be used with only a terminal, although several require a Hewlett Packard plotter.

UWGCG software has been installed for testing at eight different institutions. A simple method has been developed for transferring and maintaining this system on other VAX computers.

DESIGN PRINCIPLES
UWGCG program design is based on the "software tools" approach of Kernighan and Plauger (1). Each program performs a simple function and is easy to use. The programs can be used independently in different combinations so

© IRL Press Limited, Oxford, England. 387
that complex problems are solved by the use of several programs in succession. New programming is simplified since less effort is required to bridge a gap between existing programs.

UWGCG software is designed to be maintained and modified at sites other than the University of Wisconsin. The program manual is extensive and the source codes are organized to make modification convenient. Scientists using UWGCG software are encouraged to use existing programs as a framework for developing new ones. Our copyright can be removed from any program modified by more than 25% of our original effort.

PROGRAMS AVAILABLE FROM UWGCG
The programs described below are named and defined individually in Table 1. Program names in the text are underlined.

Comparisons
Comparisons may be done with "dot plots" using the method of Maizel and Lenk (2). Optimal alignments can be generated by the methods of Needleman and Wunsch (3), of Sellers (4), and the "local homology" method of Smith and Waterman (5). The Smith and Waterman alignment algorithm is also the most sensitive method available for identifying similarities between weakly related sequences.

Mapping and Searching
Mapping is available in several formats. Graphic maps display all of the cuts for each restriction enzyme on parallel lines. This graphic map facilitates selection of enzymes for isolating any region of a sequenced DNA molecule. Sorted maps in tabular format arrange the fragments from any digestion in order of molecular weight to show which fragments are similar in size and thus likely to be confused in gels. Another frequently used mapping format, designed by Frederick Blattner (6), displays the enzyme cuts above the original DNA sequence. Both strands of the DNA and all six frames of translation are shown.

All mapping programs will search for user-specified sequences, allowing features to be marked at the appropriate position on a restriction map. The mapping and searching programs can be used to aid site-specific mutagenesis experiments by showing where mutations could generate new restriction sites. All of the positions in a sequence where a synthetic probe could pair with one or more mismatches can also be located. Sequences related to less precisely defined features, such as promoters or intervening sequence splice sites, can be located with a program that uses a consensus sequence as a probe.

Table 1
Programs Available from UWGCG

Name               Function
DotPlot+           makes a dot plot by method of Maizel and Lenk (2)
Gap                finds optimal alignment by method of Needleman and Wunsch (3)
BestFit            finds optimal alignment by method of Smith and Waterman (5)
MapPlot+           shows restriction map for each enzyme graphically
MapSort            tabulates maps sorted by fragment position and size
Map                displays restriction sites and protein translations above and below the original sequence (Blattner, 6)
Consensus          creates a consensus table from pre-aligned sequences
FitConsensus       finds sequences similar to a consensus sequence using a consensus table as a probe
Find               finds sites specified interactively
Stemloop           finds all possible stems (inverted repeats) and loops
Fold*              finds an RNA secondary structure of minimum free energy by the method of Zuker (7)
CodonPreference+   plots the similarity between the codon choices in each reading frame and a codon frequency table (8)
CodonFrequency     tabulates codon frequencies
Correspond         finds similar patterns of codon choice by comparing codon frequency tables (Grantham et al, 9)
TestCode+          finds possible coding regions by plotting the "TestCode" statistic of Fickett (10)
Frame+             plots rare codons and open reading frames (8)
PlotStatistics+    plots asymmetries of composition for one strand
Composition        measures composition, di- and trinucleotide frequencies
Repeat             finds repeats (direct, not inverted)
Fingerprint        shows the labelled fragments expected for an RNA fingerprint
Seqed              screen oriented sequence editor for entering, editing and checking sequences
Assemble           joins sequences together
Shuffle            randomizes a sequence maintaining composition
Reverse            reverses and/or complements a sequence
Reformat           converts a sequence file from one format to another
Translate          translates a nucleotide into a peptide sequence
BackTranslate      translates a peptide into a nucleotide sequence
Spew               sends a sequence to another computer
GetSeq             accepts a sequence from another computer
Crypt              encrypts a file for access only by password
Simplify           substitutes one of six chemically similar amino acid families for each residue in a peptide sequence
Publish            arranges sequences for publication
Poster+            plots text (for labelling figures and posters)
OverPrint          prints darkened text for figures with a daisywheel printer

+ requires a Hewlett Packard Series 7221 terminal plotter
* Fold is distributed by Dr. Michael Zuker, not UWGCG.

The mapping programs can also be used on protein sequences to identify the peptides resulting from proteolytic cleavage.

Secondary Structure
Three programs are available to examine secondary structure in nucleic acids. The program StemLoop identifies all inverted repeats. An implementation of Dr. Michael Zuker's Fold program (7) finds an RNA secondary structure of minimum free energy based on published values of stacking and loop destabilizing energies. The "dot plot" comparison (mentioned above) of a sequence compared to its opposite strand gives a graphic picture of the pattern of inverted repeats in a sequence.

Analysis of Composition and the Location of Genetic Domains
Regions of a sequence with non-random base distribution can be displayed with three graphic tools designed to identify genetic domains. The program CodonPreference (8) identifies potential coding regions by searching through each reading frame for a pattern of preferred codon choices. The CodonPreference plot predicts the level of translational expression of mRNAs and helps identify frameshifts in DNA sequence data. Patterns of codon choice can be compared with the program Correspond (9). When a strong pattern of codon preferences is not expected, the "TestCode" statistic of Fickett (10) can be plotted to show regions of compositional constraint at every third base. Another program plots asymmetries of composition by strand. Strand asymmetries have been associated with genetic domains by several authors (11)(12).
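The codon-counting step that underlies tools such as CodonFrequency can be sketched in a few lines. This is an illustrative Python fragment under stated assumptions (the function name and interface are hypothetical; the UWGCG programs themselves are Fortran 77):

```python
# Hypothetical sketch of codon-frequency tabulation of the kind the
# composition tools described above perform; not the UWGCG Fortran.
from collections import Counter

def codon_frequencies(seq, frame=0):
    """Count codons in one reading frame of a nucleotide sequence."""
    seq = seq.upper()
    # Step through the chosen frame three bases at a time, keeping
    # only complete codons.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return Counter(codons)
```

Tabulating each reading frame separately (frame = 0, 1, 2) is what lets a pattern of preferred codons single out the likely coding frame.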
A fourth program called Frame marks the positions of rare codons and open reading frames on a graph showing all six reading frames. Several tools are available to measure content and to count dinucleotide, trinucleotide, neighbor and repeat frequencies. A program that predicts RNA fingerprint patterns and another that tabulates codon frequencies complete the group of programs that analyze composition.

Sequence Manipulation
Sequences may be entered, assembled, edited, reversed, randomized, reformatted, translated, back-translated, documented, transferred, or encrypted rapidly with a large set of sequence manipulation tools.

A screen-oriented editor is available that allows sequences to be entered and checked. After a sequence is entered, it may be reentered for proofreading. Whenever a reentered base is at variance with the original, the terminal bell rings and the position is marked. Existing sequences can be edited quickly by moving directly to a sequence position specified by either a coordinate or a sequence pattern. The program can reassign the terminal's keys to place G, A, T and C conveniently under the fingers of one hand in the same order as the lanes of a sequencing gel.

Programs are available for changing sequence file format. Sequence data from any source can be used in UWGCG programs, and sequence files maintained with UWGCG software can be converted for use in other non-UWGCG programs. For instance, the programs of Roger Staden (13) or Intelligenetics Inc. (14) could be used to assemble a sequence from the sequences of many small sub-fragments generated by DNAase I digestion. The assembled sequence could then be reformatted for use in any UWGCG program. A program is available that transfers sequences to and from other computers.

Sequence Publication
A program, Publish, will format sequences into figures. Publish has alternatives for line size, numbering, scaling, translation and comparison to other sequences. Poster is a program that will plot text on figures.

GENERAL FEATURES OF UWGCG SOFTWARE
Interactive Style
Each program is run by simply typing its name. Every parameter required by the program is obtained interactively. Questions are answered with a file name, a yes, a no, a number, or a letter from a menu. Default answers are displayed. Programs are insensitive to absurd answers and will ask the question again if, for instance, you name a file that does not exist or if you use a nonnumeric character when typing a number. Special features, such as plotting features oriented to publication, are obtained by using an extra word next to the program's name when the program is run. Thus parameter queries are kept to a minimum for the normal use of each program.

Data
Both the NIH-GenBank (15) and the EMBL (16) nucleotide sequence data libraries are available "on-line" to any UWGCG program. A Search utility will locate sequences in the libraries by keyword. A Find utility will locate library entries containing any specified sequence. A program is available that installs the new data sent periodically from GenBank and EMBL to update their data libraries.

All of the data in the system are stored in text files that can be read and modified easily. Every data file has an English heading describing the contents. The data files may be copied by each user for analysis or modification. Programs recognize and read user-modified input data automatically. Data files can be modified with any text editor.
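Two of the simpler manipulations in the set described above, reversing/complementing and translating a sequence, can be sketched as follows. This is a hypothetical Python illustration with a deliberately truncated codon table, not the UWGCG Fortran implementation:

```python
# Minimal sketches of two sequence manipulations of the kind described
# above: reverse-complementing a nucleotide sequence and translating it
# into a peptide. Illustrative only, not UWGCG code.

# Lowercase letters are preserved, matching the convention of using
# lowercase for uncertain bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# A few codons only, for illustration; a real table covers all 64.
CODON_TABLE = {"ATG": "M", "TGG": "W", "TTT": "F", "AAA": "K", "TAA": "*"}

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate codons to one-letter amino acid codes ('X' if unknown)."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))
```

Note that applying the reverse complement twice returns the original sequence, a handy sanity check for any implementation.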
Sequence File Structure
Sequences are maintained in files that allow documentation and numbering both above and within the sequence. This file format is compatible with both of the nucleic acid sequence libraries and has been adopted as the standard sequence file format by the database project at the European Molecular Biology Lab. Because genetic manipulations commonly involve linking several molecules of known sequence, UWGCG sequence files are designed to support concatenation by allowing comments to appear within the sequences at any location. Coding sequences or the boundaries between cloning vector and insert, for instance, can be marked within the sequence itself for immediate identification.
Sequence Symbols
All possible nucleotide ambiguities and all standard one-letter amino acid codes are part of the UWGCG symbol set that includes all alphabetic characters plus five additional characters. The proposed IUB-IUPAC standard nucleotide ambiguity symbols (17) are used for the mapping, searching and comparison programs. Lowercase characters are used in sequences to indicate uncertainty as distinct from ambiguity. This allows the entire lexicon of symbols to be reused with the same meaning, but with the prefix "maybe-." This reuse of the symbol set in lowercase makes the uncertainty symbols more complete, understandable and visible.
Symbol Comparison
Sequence analysis programs generally make comparisons between sequence symbols (bases or amino acids) in order to find enzyme sites, create alignments, locate inverted repeats, etc. These symbol comparisons are handled in several ways.
Symbol comparisons for alignment, comparison and secondary structure analysis are made by looking up a value in a symbol comparison table for the quality of the match. The table might contain 1's for matches and 0's for mismatches. If amino acids are being compared, however, a real number could be assigned at each position based on some previously assigned chemical similarity of the pair of residues or on the mutational distance between their codons. Standard symbol tables are provided by UWGCG, but the system is designed to allow each user to specify his own values.
Symbol comparisons for mapping and searching operations in nucleic acids are made by converting the IUB-IUPAC symbols into a binary code. The bits of this code represent G, A, T and C, with ambiguity symbols causing more than one bit to be set. A group of library functions identify overlap between the bits for each IUB-IUPAC symbol.
Documentation
Documentation is available both in printed form and on the terminal screen. A 350 page manual describes the operation of each program in detail, gives practical considerations and shows what will appear on the screen during a session with the program. Output files and plots are shown for the session. The data for the session shown in the documentation are included with the system so that each program's operation can be checked. The "on-line" documentation is the same as the manual, but can be changed immediately when a program is modified.
All programs write output to files that are completely documented and sensibly organized for input to other programs. The input data, the program and the parameters used are clearly identified in every output file.
Procedure Library
UWGCG programs are written largely as calls to a library of 250 procedures designed to manipulate biological sequences. These procedures use data and file structures which have been designed to simplify program modification. For instance, standard operations such as reading sequences from files are always handled by a single library procedure. Thus a change in sequence file format requires only one subroutine to be modified for the new format to be acceptable to all of the programs in the system. Command procedures are available to help modify the library. The procedure library can be used by programs written in any language.
DISTRIBUTION OF UWGCG SOFTWARE
Intent
The intent of UWGCG is to make its software available at the lowest possible cost to as many scientists as possible.
Fees
A fee of $2,000 for non-profit institutions or $4,000 for industries is being charged for a tape and documentation for each computer on which UWGCG software is installed. While no continuing fee is required, UWGCG software, like the field it supports, is changing very rapidly. A consortium of industries and academic laboratories is planned to support the project in the future. The consortium will entitle its members to periodic updates and to influence the direction of new programming undertaken by UWGCG in return for a pledge of continuing financial support.
Copyrights
UWGCG retains the copyrights to all of its software, and UWGCG must be contacted before all or any part of its software package is copied or transferred to any machine. UWGCG is, however, mandated to provide research tools to help scientists working in the area of molecular genetics, and we are glad to see our source codes become the basis of further programming efforts by other scientists. Copyright can be removed for any program modified by more than 25% of its original effort.
Tape Format
The UWGCG package is usually distributed in VAX/VMS "backup" format on a 9 track magnetic tape recorded at 1600 bits/inch. The system consists of about 1000 files using about 20,000 blocks at 512 bytes/block. The current versions of the GenBank and EMBL nucleotide sequence databases are normally included, which add another 3,000 files and require another 20,000 blocks. Upon request UWGCG will make a card image tape of all of the Fortran 77 programs and procedures for reading on computers other than the VAX. The card image tape is usually provided at 1600 bits/inch with 80 characters/record and 10 records/block. Adaptation of UWGCG software to systems other than VAX/VMS may take considerable effort.
Equipment Required
UWGCG programs and command procedures will run on a Digital Equipment Corporation (DEC) VAX computer that is using version 3.0 or greater of the DEC VMS operating system. A tape drive is necessary; a floating point accelerator and a DEC Fortran compiler are helpful, but not required. All programs can be run from a DEC VT52 or VT100 terminal. Seven programs, as noted in Table 1, require a Hewlett Packard 7221 terminal plotter wired in series with the terminal. Several utilities support a daisywheel compatible printer attached to the terminal's pass-through port; however, all programs write output files suitable for printing on any standard device.
Inquiries
Inquiries may be sent to John Devereux at the Laboratory of Genetics, University of Wisconsin, Madison, WI, USA 53706, (608) 263-8970. UWGCG is not licensed to distribute Fold (7), but the UWGCG implementation is available from Michael Zuker, Division of Biological Sciences, National Research Council of Canada, 100 Sussex Drive, Ottawa, Canada, K1A 0R6, (613) 992-4182.
ACKNOWLEDGEMENTS
UWGCG was started with software written for Oliver Smithies' laboratory with NIH support from grants GM20069 and AM20120. UWGCG is directed by John Devereux and is operated as a part of the Laboratory of Genetics with the advice of a steering committee consisting of Richard Burgess, James Dahlberg, Walter Fitch, Oliver Smithies and Millard Susman. UWGCG is currently supported with intramural funds and with fees paid by the faculty and industries using the facility in Madison. This article is paper number 2684 from the Laboratory of Genetics, University of Wisconsin.
*Current address: Silicon Graphics Inc., 630 Clyde Court, Mountain View, CA 94043, USA
REFERENCES
1. Kernighan, B.W. and Plauger, P.J. (1976) Software Tools, Addison-Wesley Publishing Company, Reading, Massachusetts.
2. Maizel, J.V. and Lenk, R.P. (1981) Proceedings of the National Academy of Sciences USA 78, 7665-7669.
3. Needleman, S.B. and Wunsch, C.D. (1970) Journal of Molecular Biology 48, 443-453.
4. Sellers, P.H. (1974) SIAM Journal on Applied Mathematics 26, 787-793.
5. Smith, T.F. and Waterman, M.S. (1981) Advances in Applied Mathematics 2, 482-489.
6. Schroeder, J.L. and Blattner, F.R. (1982) Nucleic Acids Research 10, 69-84, Figure 1.
7. Zuker, M. and Stiegler, P. (1981) Nucleic Acids Research 9, 133-148.
8. Gribskov, M., Devereux, J. and Burgess, R.R. "The Codon Preference Plot: Graphic Analysis of Protein Coding Sequences and Gene Expression," submitted to Nucleic Acids Research.
9. Grantham, R., Gautier, C., Gouy, M., Jacobzone, M. and Mercier, R. (1981) Nucleic Acids Research 9(1), r43-r74.
10. Fickett, J.W. (1982) Nucleic Acids Research 10, 5303-5318.
11. Smithies, O., Engels, W.R., Devereux, J.R., Slightom, J.L. and Shen, S. (1981) Cell 26, 345-353.
12. Smith, T.F., Waterman, M.S. and Sadler, J.R. (1983) Nucleic Acids Research 11, 2205-2220.
13. Staden, R. (1980) Nucleic Acids Research 8, 3673-3694.
14. Clayton, J. and Kedes, L. (1982) Nucleic Acids Research 10, 305-321.
15. The GenBank(TM) Genetic Sequence Data Bank is available from Wayne Rindone, Bolt Beranek and Newman Inc., 10 Moulton Street, Cambridge, Massachusetts 02238, USA.
16. The EMBL Nucleotide Sequence Data Library is available from Greg Hamm, European Molecular Biology Laboratory, Postfach 10.2209, Meyerhofstrasse 1, 6900 Heidelberg, West Germany.
17. Personal communication from Dr. Richard Lathe, Transgene SA, 11 Rue Humann, 67000 Strasbourg, France." -}
-{
- "pm_id": "https://pubmed.ncbi.nlm.nih.gov/28463186",
- "pdf_text": "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Liang-Chieh Chen, George Papandreou, Senior Member, IEEE, Iasonas Kokkinos, Member, IEEE, Kevin Murphy, and Alan L. Yuille, Fellow, IEEE
Abstract—In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance.
Our proposed "DeepLab" system sets the new state-of-the-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
Index Terms—Convolutional Neural Networks, Semantic Segmentation, Atrous Convolution, Conditional Random Fields.
1 INTRODUCTION
Deep Convolutional Neural Networks (DCNNs) [1] have pushed the performance of computer vision systems to soaring heights on a broad array of high-level problems, including image classification [2], [3], [4], [5], [6] and object detection [7], [8], [9], [10], [11], [12], where DCNNs trained in an end-to-end manner have delivered strikingly better results than systems relying on hand-crafted features. Essential to this success is the built-in invariance of DCNNs to local image transformations, which allows them to learn increasingly abstract data representations [13]. This invariance is clearly desirable for classification tasks, but can hamper dense prediction tasks such as semantic segmentation, where abstraction of spatial information is undesired.
In particular we consider three challenges in the application of DCNNs to semantic image segmentation: (1) reduced feature resolution, (2) existence of objects at multiple scales, and (3) reduced localization accuracy due to DCNN invariance. Next, we discuss these challenges and our approach to overcome them in our proposed DeepLab system.
The first challenge is caused by the repeated combination of max-pooling and downsampling ('striding') performed at consecutive layers of DCNNs originally designed for image classification [2], [4], [5]. This results in feature maps with significantly reduced spatial resolution when the DCNN is employed in a fully convolutional fashion [14]. In order to overcome this hurdle and efficiently produce denser feature maps, we remove the downsampling operator from the last few max pooling layers of DCNNs and instead upsample the filters in subsequent convolutional layers, resulting in feature maps computed at a higher sampling rate. Filter upsampling amounts to inserting holes ('trous' in French) between nonzero filter taps. This technique has a long history in signal processing, originally developed for the efficient computation of the undecimated wavelet transform in a scheme also known as "algorithme à trous" [15]. We use the term atrous convolution as a shorthand for convolution with upsampled filters. Various flavors of this idea have been used before in the context of DCNNs by [3], [6], [16]. In practice, we recover full resolution feature maps by a combination of atrous convolution, which computes feature maps more densely, followed by simple bilinear interpolation of the feature responses to the original image size. This scheme offers a simple yet powerful alternative to using deconvolutional layers [13], [14] in dense prediction tasks. Compared to regular convolution with larger filters, atrous convolution allows us to effectively enlarge the field of view of filters without increasing the number of parameters or the amount of computation.
• L.-C. Chen, G. Papandreou, and K. Murphy are with Google Inc. I. Kokkinos is with University College London. A. Yuille is with the Departments of Cognitive Science and Computer Science, Johns Hopkins University. The first two authors contributed equally to this work.
The second challenge is caused by the existence of objects at multiple scales.
A standard way to deal with this is to present to the DCNN rescaled versions of the same image and then aggregate the feature or score maps [6], [17], [18].
arXiv:1606.00915v2 [cs.CV] 12 May 2017
We show that this approach indeed increases the performance of our system, but comes at the cost of computing feature responses at all DCNN layers for multiple scaled versions of the input image. Instead, motivated by spatial pyramid pooling [19], [20], we propose a computationally efficient scheme of resampling a given feature layer at multiple rates prior to convolution. This amounts to probing the original image with multiple filters that have complementary effective fields of view, thus capturing objects as well as useful image context at multiple scales. Rather than actually resampling features, we efficiently implement this mapping using multiple parallel atrous convolutional layers with different sampling rates; we call the proposed technique "atrous spatial pyramid pooling" (ASPP).
The third challenge relates to the fact that an object-centric classifier requires invariance to spatial transformations, inherently limiting the spatial accuracy of a DCNN. One way to mitigate this problem is to use skip-layers to extract "hyper-column" features from multiple network layers when computing the final segmentation result [14], [21]. Our work explores an alternative approach which we show to be highly effective. In particular, we boost our model's ability to capture fine details by employing a fully-connected Conditional Random Field (CRF) [22]. CRFs have been broadly used in semantic segmentation to combine class scores computed by multi-way classifiers with the low-level information captured by the local interactions of pixels and edges [23], [24] or superpixels [25]. Even though works of increased sophistication have been proposed to model the hierarchical dependency [26], [27], [28] and/or high-order dependencies of segments [29], [30], [31], [32], [33], we use the fully connected pairwise CRF proposed by [22] for its efficient computation, and ability to capture fine edge details while also catering for long range dependencies. That model was shown in [22] to improve the performance of a boosting-based pixel-level classifier. In this work, we demonstrate that it leads to state-of-the-art results when coupled with a DCNN-based pixel-level classifier.
A high-level illustration of the proposed DeepLab model is shown in Fig. 1. A deep convolutional neural network (VGG-16 [4] or ResNet-101 [11] in this work) trained in the task of image classification is re-purposed to the task of semantic segmentation by (1) transforming all the fully connected layers to convolutional layers (i.e., fully convolutional network [14]) and (2) increasing feature resolution through atrous convolutional layers, allowing us to compute feature responses every 8 pixels instead of every 32 pixels in the original network. We then employ bi-linear interpolation to upsample by a factor of 8 the score map to reach the original image resolution, yielding the input to a fully-connected CRF [22] that refines the segmentation results.
From a practical standpoint, the three main advantages of our DeepLab system are: (1) Speed: by virtue of atrous convolution, our dense DCNN operates at 8 FPS on an NVidia Titan X GPU, while Mean Field Inference for the fully-connected CRF requires 0.5 secs on a CPU. (2) Accuracy: we obtain state-of-the-art results on several challenging datasets, including the PASCAL VOC 2012 semantic segmentation benchmark [34], PASCAL-Context [35], PASCAL-Person-Part [36], and Cityscapes [37]. (3) Simplicity: our system is composed of a cascade of two very well-established modules, DCNNs and CRFs.
The updated DeepLab system we present in this paper features several improvements compared to its first version reported in our original conference publication [38]. Our new version can better segment objects at multiple scales, via either multi-scale input processing [17], [39], [40] or the proposed ASPP. We have built a residual net variant of DeepLab by adapting the state-of-the-art ResNet [11] image classification DCNN, achieving better semantic segmentation performance compared to our original model based on VGG-16 [4]. Finally, we present a more comprehensive experimental evaluation of multiple model variants and report state-of-the-art results not only on the PASCAL VOC 2012 benchmark but also on other challenging tasks. We have implemented the proposed methods by extending the Caffe framework [41]. We share our code and models at a companion web site http://liangchiehchen.com/projects/DeepLab.html.
2 RELATED WORK
Most of the successful semantic segmentation systems developed in the previous decade relied on hand-crafted features combined with flat classifiers, such as Boosting [24], [42], Random Forests [43], or Support Vector Machines [44]. Substantial improvements have been achieved by incorporating richer information from context [45] and structured prediction techniques [22], [26], [27], [46], but the performance of these systems has always been compromised by the limited expressive power of the features. Over the past few years the breakthroughs of Deep Learning in image classification were quickly transferred to the semantic segmentation task.
Since this task involves both segmentation and classification, a central question is how to combine the two tasks.
The first family of DCNN-based systems for semantic segmentation typically employs a cascade of bottom-up image segmentation, followed by DCNN-based region classification. For instance the bounding box proposals and masked regions delivered by [47], [48] are used in [7] and [49] as inputs to a DCNN to incorporate shape information into the classification process. Similarly, the authors of [50] rely on a superpixel representation. Even though these approaches can benefit from the sharp boundaries delivered by a good segmentation, they also cannot recover from any of its errors.
The second family of works relies on using convolutionally computed DCNN features for dense image labeling, and couples them with segmentations that are obtained independently. Among the first have been [39] who apply DCNNs at multiple image resolutions and then employ a segmentation tree to smooth the prediction results. More recently, [21] propose to use skip layers and concatenate the computed intermediate feature maps within the DCNNs for pixel classification. Further, [51] propose to pool the intermediate feature maps by region proposals. These works still employ segmentation algorithms that are decoupled from the DCNN classifier's results, thus risking commitment to premature decisions.
The third family of works uses DCNNs to directly provide dense category-level pixel labels, which makes it possible to even discard segmentation altogether.
Fig. 1: Model Illustration. A Deep Convolutional Neural Network such as VGG-16 or ResNet-101 is employed in a fully convolutional fashion, using atrous convolution to reduce the degree of signal downsampling (from 32x down to 8x). A bilinear interpolation stage enlarges the feature maps to the original image resolution. A fully connected CRF is then applied to refine the segmentation result and better capture the object boundaries.
The segmentation-free approaches of [14], [52] directly apply DCNNs to the whole image in a fully convolutional fashion, transforming the last fully connected layers of the DCNN into convolutional layers. In order to deal with the spatial localization issues outlined in the introduction, [14] upsample and concatenate the scores from intermediate feature maps, while [52] refine the prediction result from coarse to fine by propagating the coarse results to another DCNN. Our work builds on these works, and as described in the introduction extends them by exerting control on the feature resolution, introducing multi-scale pooling techniques and integrating the densely connected CRF of [22] on top of the DCNN. We show that this leads to significantly better segmentation results, especially along object boundaries. The combination of DCNN and CRF is of course not new but previous works only tried locally connected CRF models. Specifically, [53] use CRFs as a proposal mechanism for a DCNN-based reranking system, while [39] treat superpixels as nodes for a local pairwise CRF and use graph-cuts for discrete inference. As such their models were limited by errors in superpixel computations or ignored long-range dependencies. Our approach instead treats every pixel as a CRF node receiving unary potentials by the DCNN. Crucially, the Gaussian CRF potentials in the fully connected CRF model of [22] that we adopt can capture long-range dependencies and at the same time the model is amenable to fast mean field inference. We note that mean field inference had been extensively studied for traditional image segmentation tasks [54], [55], [56], but these older models were typically limited to short-range connections.
In independent work, [57] use a very similar densely connected CRF model to refine the results of DCNN for the problem of material classification. However, the DCNN module of [57] was only trained by sparse point supervision instead of dense supervision at every pixel.
Since the first version of this work was made publicly available [38], the area of semantic segmentation has progressed drastically. Multiple groups have made important advances, significantly raising the bar on the PASCAL VOC 2012 semantic segmentation benchmark, as reflected in the high level of activity in the benchmark's leaderboard¹ [17], [40], [58], [59], [60], [61], [62], [63]. Interestingly, most top-performing methods have adopted one or both of the key ingredients of our DeepLab system: atrous convolution for efficient dense feature extraction and refinement of the raw DCNN scores by means of a fully connected CRF. We outline below some of the most important and interesting advances.
1. http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6
End-to-end training for structured prediction has more recently been explored in several related works. While we employ the CRF as a post-processing method, [40], [59], [62], [64], [65] have successfully pursued joint learning of the DCNN and CRF. In particular, [59], [65] unroll the CRF mean-field inference steps to convert the whole system into an end-to-end trainable feed-forward network, while [62] approximates one iteration of the dense CRF mean field inference [22] by convolutional layers with learnable filters. Another fruitful direction pursued by [40], [66] is to learn the pairwise terms of a CRF via a DCNN, significantly improving performance at the cost of heavier computation. In a different direction, [63] replace the bilateral filtering module used in mean field inference with a faster domain transform module [67], improving the speed and lowering the memory requirements of the overall system, while [18], [68] combine semantic segmentation with edge detection.
Weaker supervision has been pursued in a number of papers, relaxing the assumption that pixel-level semantic annotations are available for the whole training set [58], [69], [70], [71], achieving significantly better results than weakly-supervised pre-DCNN systems such as [72]. In another line of research, [49], [73] pursue instance segmentation, jointly tackling object detection and semantic segmentation.
What we call here atrous convolution was originally developed for the efficient computation of the undecimated wavelet transform in the "algorithme à trous" scheme of [15]. We refer the interested reader to [74] for early references from the wavelet literature. Atrous convolution is also intimately related to the "noble identities" in multi-rate signal processing, which builds on the same interplay of input signal and filter sampling rates [75]. Atrous convolution is a term we first used in [6]. The same operation was later called dilated convolution by [76], a term they coined motivated by the fact that the operation corresponds to regular convolution with upsampled (or dilated in the terminology of [15]) filters. Various authors have used the same operation before for denser feature extraction in DCNNs [3], [6], [16]. Beyond mere resolution enhancement, atrous convolution allows us to enlarge the field of view of filters to incorporate larger context, which we have shown in [38] to be beneficial. This approach has been pursued further by [76], who employ a series of atrous convolutional layers with increasing rates to aggregate multiscale context. The atrous spatial pyramid pooling scheme proposed here to capture multiscale objects and context also employs multiple atrous convolutional layers with different sampling rates, which we however lay out in parallel instead of in serial. Interestingly, the atrous convolution technique has also been adopted for a broader set of tasks, such as object detection [12], [77], instance-level segmentation [78], visual question answering [79], and optical flow [80].
We also show that, as expected, integrating into DeepLab more advanced image classification DCNNs such as the residual net of [11] leads to better results. This has also been observed independently by [81].
3 METHODS
3.1 Atrous Convolution for Dense Feature Extraction and Field-of-View Enlargement
The use of DCNNs for semantic segmentation, or other dense prediction tasks, has been shown to be simply and successfully addressed by deploying DCNNs in a fully convolutional fashion [3], [14]. However, the repeated combination of max-pooling and striding at consecutive layers of these networks reduces significantly the spatial resolution of the resulting feature maps, typically by a factor of 32 across each direction in recent DCNNs.
A partial remedy is to use 'deconvolutional' layers as in [14], which however requires additional memory and time.
We advocate instead the use of atrous convolution, originally developed for the efficient computation of the undecimated wavelet transform in the "algorithme à trous" scheme of [15] and used before in the DCNN context by [3], [6], [16]. This algorithm allows us to compute the responses of any layer at any desirable resolution. It can be applied post-hoc, once a network has been trained, but can also be seamlessly integrated with training.
Considering one-dimensional signals first, the output y[i] of atrous convolution² of a 1-D input signal x[i] with a filter w[k] of length K is defined as:
y[i] = Σ_{k=1}^{K} x[i + r·k] w[k].  (1)
The rate parameter r corresponds to the stride with which we sample the input signal. Standard convolution is a special case for rate r = 1. See Fig. 2 for illustration.
2. We follow the standard practice in the DCNN literature and use non-mirrored filters in this definition.
Fig. 2: Illustration of atrous convolution in 1-D. (a) Sparse feature extraction with standard convolution on a low resolution input feature map. (b) Dense feature extraction with atrous convolution with rate r = 2, applied on a high resolution input feature map.
Fig. 3: Illustration of atrous convolution in 2-D. Top row: sparse feature extraction with standard convolution on a low resolution input feature map. Bottom row: dense feature extraction with atrous convolution with rate r = 2, applied on a high resolution input feature map.
We illustrate the algorithm's operation in 2-D through a simple example in Fig. 3: Given an image, we assume that we first have a downsampling operation that reduces the resolution by a factor of 2, and then perform a convolution with a kernel - here, the vertical Gaussian derivative. If one implants the resulting feature map in the original image coordinates, we realize that we have obtained responses at only 1/4 of the image positions. Instead, we can compute responses at all image positions if we convolve the full resolution image with a filter 'with holes', in which we upsample the original filter by a factor of 2, and introduce zeros in between filter values. Although the effective filter size increases, we only need to take into account the non-zero filter values, hence both the number of filter parameters and the number of operations per position stay constant. The resulting scheme allows us to easily and explicitly control the spatial resolution of neural network feature responses.
In the context of DCNNs one can use atrous convolution in a chain of layers, effectively allowing us to compute the final DCNN network responses at an arbitrarily high resolution. For example, in order to double the spatial density of computed feature responses in the VGG-16 or ResNet-101 networks, we find the last pooling or convolutional layer that decreases resolution ('pool5' or 'conv5_1' respectively), set its stride to 1 to avoid signal decimation, and replace all subsequent convolutional layers with atrous convolutional layers having rate r = 2. Pushing this approach all the way through the network could allow us to compute feature responses at the original image resolution, but this ends up being too costly.
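The 1-D operation of Eq. (1) and the 'filter with holes' view described above can be sketched in a few lines. This is an illustrative NumPy snippet, not the paper's Caffe/TensorFlow implementation; it is written with 0-based k (Eq. (1) uses 1-based indexing) and 'valid' boundary handling:

```python
import numpy as np

def atrous_conv1d(x, w, r):
    """y[i] = sum_k x[i + r*k] * w[k]: Eq. (1) with 0-based k, 'valid' output."""
    K = len(w)
    n_out = len(x) - r * (K - 1)          # positions where the dilated filter fits
    return np.array([np.dot(x[i:i + r * K:r], w) for i in range(n_out)])

x = np.arange(10.0)
w = np.array([1.0, 2.0, 3.0])

# Rate r = 1 is ordinary (non-mirrored, correlation-style) convolution.
assert np.allclose(atrous_conv1d(x, w, 1), np.correlate(x, w, mode="valid"))

# Rate r = 2 equals ordinary convolution with the zero-stuffed filter
# [w0, 0, w1, 0, w2]: the "filter with holes" described in the text.
w_holes = np.zeros(2 * (len(w) - 1) + 1)
w_holes[::2] = w
assert np.allclose(atrous_conv1d(x, w, 2), np.correlate(x, w_holes, mode="valid"))
```

The second assertion is exactly the point made above: only the non-zero filter taps are touched, so the parameter count and per-position cost do not grow with the rate.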
We have adopted instead a hybrid\napproach that strikes a good efficiency/accuracy trade-off,\nusing atrous convolution to increase by a factor of 4 the\ndensity of computed feature maps, followed by fast bilinear\ninterpolation by an additional factor of 8 to recover feature\nmaps at the original image resolution. Bilinear interpolation\nis sufficient in this setting because the class score maps\n(corresponding to log-probabilities) are quite smooth, as\nillustrated in Fig. 5. Unlike the deconvolutional approach\nadopted by [14], the proposed approach converts image\nclassification networks into dense feature extractors without\nrequiring learning any extra parameters, leading to faster\nDCNN training in practice.\nAtrous convolution also allows us to arbitrarily enlarge\nthefield-of-view of filters at any DCNN layer. State-of-the-\nart DCNNs typically employ spatially small convolution\nkernels (typically 3×3) in order to keep both computation\nand number of parameters contained. Atrous convolution\nwith raterintroducesr−1zeros between consecutive filter\nvalues, effectively enlarging the kernel size of a k×kfilter\ntoke=k+ (k−1)(r−1)without increasing the number\nof parameters or the amount of computation. It thus offers\nan efficient mechanism to control the field-of-view and\nfinds the best trade-off between accurate localization (small\nfield-of-view) and context assimilation (large field-of-view).\nWe have successfully experimented with this technique:\nOur DeepLab-LargeFOV model variant [38] employs atrous\nconvolution with rate r= 12 in VGG-16 ‘fc6’ layer with\nsignificant performance gains, as detailed in Section 4.\nTurning to implementation aspects, there are two effi-\ncient ways to perform atrous convolution. 
The first is to implicitly upsample the filters by inserting holes (zeros), or equivalently to sparsely sample the input feature maps [15]. We implemented this in our earlier work [6], [38], followed by [76], within the Caffe framework [41] by adding to the im2col function (which extracts vectorized patches from multi-channel feature maps) the option to sparsely sample the underlying feature maps. The second method, originally proposed by [82] and used in [3], [16], is to subsample the input feature map by a factor equal to the atrous convolution rate r, deinterlacing it to produce r^2 reduced-resolution maps, one for each of the r×r possible shifts. This is followed by applying standard convolution to these intermediate feature maps and reinterlacing them to the original image resolution. By reducing atrous convolution to regular convolution, it allows us to use off-the-shelf highly optimized convolution routines. We have implemented the second approach in the TensorFlow framework [83].

Fig. 4: Atrous Spatial Pyramid Pooling (ASPP). To classify the center pixel (orange), ASPP exploits multi-scale features by employing multiple parallel 3×3 filters with different rates (6, 12, 18, and 24). The effective Field-Of-Views are shown in different colors.

3.2 Multiscale Image Representations using Atrous Spatial Pyramid Pooling
DCNNs have shown a remarkable ability to implicitly represent scale, simply by being trained on datasets that contain objects of varying size. Still, explicitly accounting for object scale can improve the DCNN’s ability to successfully handle both large and small objects [6].
We have experimented with two approaches to handling scale variability in semantic segmentation. The first approach amounts to standard multiscale processing [17], [18].
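The two atrous implementation routes described earlier (sparse sampling of the input vs. subsample–convolve–reinterlace) can be checked against each other numerically. A minimal NumPy sketch, assuming single-channel inputs, odd kernels, and zero padding:

```python
import numpy as np

def conv2d_same(x, w):
    """Plain 'same' cross-correlation with zero padding."""
    k = w.shape[0]; p = k // 2
    xp = np.pad(x, p)
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

def atrous_direct(x, w, r):
    """Method 1: sample the input sparsely under the dilated kernel taps."""
    k = w.shape[0]; ke = k + (k - 1) * (r - 1); p = ke // 2
    xp = np.pad(x, p)
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + ke:r, j:j + ke:r] * w)
    return out

def atrous_via_shifts(x, w, r):
    """Method 2: deinterlace into r*r shifted low-res maps, run a
    standard convolution on each, then reinterlace the results."""
    out = np.empty_like(x, dtype=float)
    for di in range(r):
        for dj in range(r):
            sub = x[di::r, dj::r]                    # one of the r*r phases
            out[di::r, dj::r] = conv2d_same(sub, w)  # off-the-shelf conv
    return out
```

Both routes produce identical dense responses; the second trades r^2 small convolutions for one sparse one, which is why it can reuse highly optimized convolution routines.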
We extract DCNN score maps from multiple (three in our experiments) rescaled versions of the original image using parallel DCNN branches that share the same parameters. To produce the final result, we bilinearly interpolate the feature maps from the parallel DCNN branches to the original image resolution and fuse them by taking at each position the maximum response across the different scales. We do this both during training and testing. Multiscale processing significantly improves performance, but at the cost of computing feature responses at all DCNN layers for multiple scales of input.
The second approach is inspired by the success of the R-CNN spatial pyramid pooling method of [20], which showed that regions of an arbitrary scale can be accurately and efficiently classified by resampling convolutional features extracted at a single scale. We have implemented a variant of their scheme which uses multiple parallel atrous convolutional layers with different sampling rates. The features extracted for each sampling rate are further processed in separate branches and fused to generate the final result. The proposed “atrous spatial pyramid pooling” (DeepLab-ASPP) approach generalizes our DeepLab-LargeFOV variant and is illustrated in Fig. 4.

3.3 Structured Prediction with Fully-Connected Conditional Random Fields for Accurate Boundary Recovery
A trade-off between localization accuracy and classification performance seems to be inherent in DCNNs: deeper models with multiple max-pooling layers have proven most successful in classification tasks; however, the increased invariance and the large receptive fields of top-level nodes can only yield smooth responses. As illustrated in Fig. 5, DCNN

Fig. 5: Score map (input before softmax function) and belief map (output of softmax function) for Aeroplane (columns: image/ground truth, DCNN output, and CRF iterations 1, 2, and 10).
We show the score (1st row) and belief (2nd row) maps after each mean field iteration. The output of the last DCNN layer is used as input to the mean field inference.

score maps can predict the presence and rough position of objects but cannot really delineate their borders.
Previous work has pursued two directions to address this localization challenge. The first approach is to harness information from multiple layers in the convolutional network in order to better estimate the object boundaries [14], [21], [52]. The second is to employ a super-pixel representation, essentially delegating the localization task to a low-level segmentation method [50].
We pursue an alternative direction based on coupling the recognition capacity of DCNNs and the fine-grained localization accuracy of fully connected CRFs, and show that it is remarkably successful in addressing the localization challenge, producing accurate semantic segmentation results and recovering object boundaries at a level of detail that is well beyond the reach of existing methods. This direction has been extended by several follow-up papers [17], [40], [58], [59], [60], [61], [62], [63], [65] since the first version of our work was published [38].
Traditionally, conditional random fields (CRFs) have been employed to smooth noisy segmentation maps [23], [31]. Typically these models couple neighboring nodes, favoring same-label assignments to spatially proximal pixels. Qualitatively, the primary function of these short-range CRFs is to clean up the spurious predictions of weak classifiers built on top of local hand-engineered features.
Compared to these weaker classifiers, modern DCNN architectures such as the one we use in this work produce score maps and semantic label predictions which are qualitatively different. As illustrated in Fig. 5, the score maps are typically quite smooth and produce homogeneous classification results.
In this regime, using short-range CRFs can be detrimental, as our goal should be to recover detailed local structure rather than further smooth it. Using contrast-sensitive potentials [23] in conjunction with local-range CRFs can potentially improve localization but still misses thin structures and typically requires solving an expensive discrete optimization problem.
To overcome these limitations of short-range CRFs, we integrate into our system the fully connected CRF model of [22]. The model employs the energy function

E(x) = Σ_i θ_i(x_i) + Σ_{ij} θ_ij(x_i, x_j)   (2)

where x is the label assignment for pixels. We use as unary potential θ_i(x_i) = −log P(x_i), where P(x_i) is the label assignment probability at pixel i as computed by a DCNN. The pairwise potential has a form that allows for efficient inference while using a fully-connected graph, i.e., when connecting all pairs of image pixels i, j. In particular, as in [22], we use the following expression:

θ_ij(x_i, x_j) = μ(x_i, x_j) [ w_1 exp( −||p_i − p_j||² / (2σ_α²) − ||I_i − I_j||² / (2σ_β²) ) + w_2 exp( −||p_i − p_j||² / (2σ_γ²) ) ]   (3)

where μ(x_i, x_j) = 1 if x_i ≠ x_j, and zero otherwise, which, as in the Potts model, means that only nodes with distinct labels are penalized. The remaining expression uses two Gaussian kernels in different feature spaces; the first, ‘bilateral’ kernel depends on both pixel positions (denoted as p) and RGB color (denoted as I), and the second kernel only depends on pixel positions. The hyperparameters σ_α, σ_β, and σ_γ control the scale of the Gaussian kernels. The first kernel forces pixels with similar color and position to have similar labels, while the second kernel only considers spatial proximity when enforcing smoothness.
Crucially, this model is amenable to efficient approximate probabilistic inference [22]. The message passing updates under a fully decomposable mean field approximation b(x) = Π_i b_i(x_i) can be expressed as Gaussian convolutions in bilateral space.
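For concreteness, the pairwise term of Eq. (3) for a single pixel pair can be written out directly. The weights and kernel widths below are illustrative placeholders, not the cross-validated values used in the experiments:

```python
import numpy as np

def pairwise_potential(xi, xj, pi, pj, Ii, Ij,
                       w1=4.0, w2=3.0,
                       s_alpha=50.0, s_beta=5.0, s_gamma=3.0):
    """Eq. (3): Potts-gated sum of a bilateral (position + colour) kernel
    and a position-only smoothness kernel. Hyper-parameters here are
    illustrative defaults, not the paper's cross-validated values."""
    if xi == xj:
        return 0.0                      # mu(xi, xj) = 0 for equal labels
    dp2 = np.sum((np.asarray(pi, float) - np.asarray(pj, float)) ** 2)
    dI2 = np.sum((np.asarray(Ii, float) - np.asarray(Ij, float)) ** 2)
    bilateral = w1 * np.exp(-dp2 / (2 * s_alpha**2) - dI2 / (2 * s_beta**2))
    spatial = w2 * np.exp(-dp2 / (2 * s_gamma**2))
    return bilateral + spatial
```

Both kernels decay with spatial distance, and the bilateral one additionally decays with colour difference, so distinct labels are penalized most for nearby, similar-looking pixels.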
High-dimensional filtering algorithms [84] significantly speed up this computation, resulting in an algorithm that is very fast in practice, requiring less than 0.5 sec on average for PASCAL VOC images using the publicly available implementation of [22].

4 EXPERIMENTAL RESULTS
We finetune the model weights of the ImageNet-pretrained VGG-16 or ResNet-101 networks to adapt them to the semantic segmentation task in a straightforward fashion, following the procedure of [14]. We replace the 1000-way ImageNet classifier in the last layer with a classifier having as many targets as the number of semantic classes of our task (including the background, if applicable). Our loss function is the sum of cross-entropy terms for each spatial position in the CNN output map (subsampled by 8 compared to the original image). All positions and labels are equally weighted in the overall loss function (except for unlabeled pixels, which are ignored). Our targets are the ground truth labels (subsampled by 8). We optimize the objective function with respect to the weights at all network layers by the standard SGD procedure of [2]. We decouple the DCNN and CRF training stages, assuming the DCNN unary terms are fixed when setting the CRF parameters.
We evaluate the proposed models on four challenging datasets: PASCAL VOC 2012, PASCAL-Context, PASCAL-Person-Part, and Cityscapes. We first report the main results of our conference version [38] on PASCAL VOC 2012, and then move on to the latest results on all datasets.

4.1 PASCAL VOC 2012
Dataset: The PASCAL VOC 2012 segmentation benchmark [34] involves 20 foreground object classes and one background class.
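The training objective described above (a sum of per-position cross-entropy terms over the subsampled score map, with unlabeled pixels contributing nothing) can be sketched as follows; the IGNORE sentinel value is an assumed convention for illustration:

```python
import numpy as np

IGNORE = 255  # assumed sentinel marking unlabeled pixels

def segmentation_loss(logits, labels):
    """Sum of per-position cross-entropy terms over a (C, H, W) score
    map; positions labeled IGNORE are skipped, all others weigh equally."""
    C, H, W = logits.shape
    z = logits - logits.max(axis=0, keepdims=True)   # numerically stable
    logp = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    loss = 0.0
    for i in range(H):
        for j in range(W):
            if labels[i, j] != IGNORE:
                loss -= logp[labels[i, j], i, j]
    return loss
```

With uniform logits over C classes, each labeled position contributes log C, which gives a quick sanity check.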
The original dataset contains 1,464 (train), 1,449 (val), and 1,456 (test) pixel-level labeled images for training, validation, and testing, respectively. The dataset is augmented by the extra annotations provided by [85], resulting in 10,582 (trainaug) training images. The performance is measured in terms of pixel intersection-over-union (IOU) averaged across the 21 classes.

Kernel  Rate  FOV  Params  Speed  bef/aft CRF
7×7     4     224  134.3M  1.44   64.38 / 67.64
4×4     4     128  65.1M   2.90   59.80 / 63.74
4×4     8     224  65.1M   2.90   63.41 / 67.14
3×3     12    224  20.5M   4.84   62.25 / 67.64

TABLE 1: Effect of Field-Of-View, obtained by adjusting the kernel size and atrous sampling rate r at the ‘fc6’ layer. We show the number of model parameters, training speed (img/sec), and val set mean IOU before and after CRF. DeepLab-LargeFOV (kernel size 3×3, r = 12) strikes the best balance.

4.1.1 Results from our conference version
We employ the VGG-16 network pre-trained on ImageNet, adapted for semantic segmentation as described in Section 3.1. We use a mini-batch of 20 images and an initial learning rate of 0.001 (0.01 for the final classifier layer), multiplying the learning rate by 0.1 every 2000 iterations. We use momentum of 0.9 and weight decay of 0.0005.
After the DCNN has been fine-tuned on trainaug, we cross-validate the CRF parameters along the lines of [22]. We use default values of w_2 = 3 and σ_γ = 3, and we search for the best values of w_1, σ_α, and σ_β by cross-validation on 100 images from val. We employ a coarse-to-fine search scheme. The initial search ranges of the parameters are w_1 ∈ [3 : 6], σ_α ∈ [30 : 10 : 100], and σ_β ∈ [3 : 6] (MATLAB notation), and then we refine the search step sizes around the first round’s best values. We employ 10 mean field iterations.
Field of View and CRF: In Tab.
1, we report experiments with DeepLab model variants that use different field-of-view sizes, obtained by adjusting the kernel size and atrous sampling rate r in the ‘fc6’ layer, as described in Sec. 3.1. We start with a direct adaptation of the VGG-16 net, using the original 7×7 kernel size and r = 4 (since we use no stride for the last two max-pooling layers). This model yields performance of 67.64% after CRF, but is relatively slow (1.44 images per second during training). We have improved model speed to 2.9 images per second by reducing the kernel size to 4×4. We have experimented with two such network variants with smaller (r = 4) and larger (r = 8) FOV sizes; the latter one performs better. Finally, we employ kernel size 3×3 and an even larger atrous sampling rate (r = 12), also making the network thinner by retaining a random subset of 1,024 out of the 4,096 filters in layers ‘fc6’ and ‘fc7’. The resulting model, DeepLab-CRF-LargeFOV, matches the performance of the direct VGG-16 adaptation (7×7 kernel size, r = 4). At the same time, DeepLab-LargeFOV is 3.36 times faster and has significantly fewer parameters (20.5M instead of 134.3M).
The CRF substantially boosts performance of all model variants, offering a 3–5% absolute increase in mean IOU.
Test set evaluation: We have evaluated our DeepLab-CRF-LargeFOV model on the PASCAL VOC 2012 official test set. It achieves 70.3% mean IOU performance.

Learning policy  Batch size  Iteration  mean IOU
step             30          6K         62.25
poly             30          6K         63.42
poly             30          10K        64.90
poly             10          10K        64.71
poly             10          20K        65.88

TABLE 2: PASCAL VOC 2012 val set results (%) (before CRF) as different learning hyper-parameters vary.
Employing the “poly” learning policy is more effective than “step” when training DeepLab-LargeFOV.

4.1.2 Improvements after conference version of this work
After the conference version of this work [38], we have pursued three main improvements of our model, which we discuss below: (1) a different learning policy during training, (2) atrous spatial pyramid pooling, and (3) employment of deeper networks and multi-scale processing.
Learning rate policy: We have explored different learning rate policies when training DeepLab-LargeFOV. Similar to [86], we also found that employing a “poly” learning rate policy (the learning rate is multiplied by (1 − iter/max_iter)^power) is more effective than a “step” learning rate (reduce the learning rate at a fixed step size). As shown in Tab. 2, employing “poly” (with power = 0.9) and using the same batch size and the same number of training iterations yields 1.17% better performance than employing the “step” policy. Fixing the batch size and increasing the number of training iterations to 10K improves the performance to 64.90% (a 1.48% gain); however, the total training time increases due to more training iterations. We then reduced the batch size to 10 and found that comparable performance is still maintained (64.90% vs. 64.71%). In the end, we employ batch size = 10 and 20K iterations in order to maintain a similar training time to the previous “step” policy. Surprisingly, this gives us a performance of 65.88% (a 3.63% improvement over “step”) on val, and 67.7% on test, compared to 65.1% for the original “step” setting for DeepLab-LargeFOV before CRF. We employ the “poly” learning rate policy for all experiments reported in the rest of the paper.
Atrous Spatial Pyramid Pooling: We have experimented with the proposed Atrous Spatial Pyramid Pooling (ASPP) scheme, described in Sec. 3.2. As shown in Fig.
7, ASPP for VGG-16 employs several parallel fc6-fc7-fc8 branches. They all use 3×3 kernels but different atrous rates r in ‘fc6’ in order to capture objects of different sizes. In Tab. 3, we report results with several settings: (1) our baseline LargeFOV model, having a single branch with r = 12; (2) ASPP-S, with four branches and smaller atrous rates (r = {2, 4, 8, 12}); and (3) ASPP-L, with four branches and larger rates (r = {6, 12, 18, 24}). For each variant we report results before and after CRF. As shown in the table, ASPP-S yields a 1.22% improvement over the baseline LargeFOV before CRF. However, after CRF both LargeFOV and ASPP-S perform similarly. On the other hand, ASPP-L yields consistent improvements over the baseline LargeFOV both before and after CRF. We evaluate the proposed ASPP-L + CRF model on test, attaining 72.6%. We visualize the effect of the different schemes in Fig. 8.
Deeper Networks and Multiscale Processing: We have experimented with building DeepLab around the recently pro-

Fig. 6: PASCAL VOC 2012 val results. Input image and our DeepLab results before/after CRF.

(a) DeepLab-LargeFOV (b) DeepLab-ASPP. [Architecture diagram: LargeFOV has a single Pool5 → Fc6 (3×3, rate = 12) → Fc7 (1×1) → Fc8 (1×1) branch; ASPP has four parallel Fc6 (3×3, rates 6/12/18/24) → Fc7 (1×1) → Fc8 (1×1) branches combined by sum-fusion.]

Fig.
7: DeepLab-ASPP employs multiple filters with different rates to capture objects and context at multiple scales.

Method    before CRF  after CRF
LargeFOV  65.76       69.84
ASPP-S    66.98       69.73
ASPP-L    68.96       71.57

TABLE 3: Effect of ASPP on PASCAL VOC 2012 val set performance (mean IOU) for the VGG-16 based DeepLab model. LargeFOV: single branch, r = 12. ASPP-S: four branches, r = {2, 4, 8, 12}. ASPP-L: four branches, r = {6, 12, 18, 24}.

MSC  COCO  Aug  LargeFOV  ASPP  CRF | mIOU
-    -     -    -         -     -   | 68.72
✓    -     -    -         -     -   | 71.27
✓    ✓     -    -         -     -   | 73.28
✓    ✓     ✓    -         -     -   | 74.87
✓    ✓     ✓    ✓         -     -   | 75.54
✓    ✓     ✓    -         ✓     -   | 76.35
✓    ✓     ✓    -         ✓     ✓   | 77.69

TABLE 4: Employing ResNet-101 for DeepLab on PASCAL VOC 2012 val set. MSC: employing multi-scale inputs with max fusion. COCO: models pretrained on MS-COCO. Aug: data augmentation by randomly rescaling inputs.

(a) Image (b) LargeFOV (c) ASPP-S (d) ASPP-L

Fig. 8: Qualitative segmentation results with ASPP compared to the baseline LargeFOV model. The ASPP-L model, employing multiple large FOVs, can successfully capture objects as well as image context at multiple scales.

posed residual net ResNet-101 [11] instead of VGG-16. Similar to what we did for the VGG-16 net, we re-purpose ResNet-101 by atrous convolution, as described in Sec. 3.1. On top of that, we adopt several other features, following recent work of [17], [18], [39], [40], [58], [59], [62]: (1) Multi-scale inputs: we separately feed to the DCNN images at scale = {0.5, 0.75, 1}, fusing their score maps by taking the maximum response across scales at each position separately [17]. (2) Models pretrained on MS-COCO [87]. (3) Data augmentation by randomly scaling the input images (from 0.5 to 1.5) during training. In Tab. 4, we evaluate how each of these factors, along with LargeFOV and atrous spatial pyramid pooling (ASPP), affects val set performance.
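The sum-fused parallel atrous branches that ASPP builds on can be sketched as a single-channel NumPy toy. This is an illustrative sketch: the per-branch fc7/fc8 1×1 layers, multiple channels, and learned weights of the actual model are omitted:

```python
import numpy as np

def dilated_conv(x, w, rate):
    """Single-channel atrous cross-correlation with zero padding."""
    k = w.shape[0]; ke = k + (k - 1) * (rate - 1); p = ke // 2
    xp = np.pad(x, p)
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + ke:rate, j:j + ke:rate] * w)
    return out

def aspp(x, branch_weights, rates=(6, 12, 18, 24)):
    """ASPP-L sketch: parallel 3x3 atrous branches with different rates,
    combined by sum-fusion into one score map."""
    return sum(dilated_conv(x, w, r) for w, r in zip(branch_weights, rates))
```

Each branch sees the same feature map through a different effective field-of-view, and the fused output aggregates evidence across those scales.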
Adopting ResNet-101 instead of VGG-16 significantly improves DeepLab performance (e.g., our simplest ResNet-101 based model attains 68.72%, compared to 65.76% for our DeepLab-LargeFOV VGG-16 based variant, both before CRF). Multiscale fusion [17] brings an extra 2.55% improvement, while pretraining the model on MS-COCO gives another 2.01% gain. Data augmentation during training is effective (about 1.6% improvement). Employing LargeFOV (adding an atrous convolutional layer on top of ResNet, with 3×3 kernel and rate = 12) is beneficial (about 0.6% improvement). A further 0.8% improvement is achieved by atrous spatial pyramid pooling (ASPP). Post-processing our best model with the dense CRF yields a performance of 77.69%.
Qualitative results: We provide qualitative visual comparisons of DeepLab’s results (our best model variant) before and after CRF in Fig. 6. The visualization results obtained by DeepLab before CRF already yield excellent segmentation results, while employing the CRF further improves the performance by removing false positives and refining object boundaries.
Test set results: We have submitted the result of our final best model to the official server, obtaining test set performance of 79.7%, as shown in Tab. 5.
The model substantially outperforms previous DeepLab variants (e.g., DeepLab-LargeFOV with the VGG-16 net) and is currently the top performing method on the PASCAL VOC 2012 segmentation leaderboard.

Method                                mIOU
DeepLab-CRF-LargeFOV-COCO [58]        72.7
MERL DEEP GCRF [88]                   73.2
CRF-RNN [59]                          74.7
POSTECH DeconvNet CRF VOC [61]        74.8
BoxSup [60]                           75.2
Context + CRF-RNN [76]                75.3
QO_4^mres [66]                        75.5
DeepLab-CRF-Attention [17]            75.7
CentraleSuperBoundaries++ [18]        76.0
DeepLab-CRF-Attention-DT [63]         76.3
H-ReNet + DenseCRF [89]               76.8
LRR 4x COCO [90]                      76.8
DPN [62]                              77.5
Adelaide Context [40]                 77.8
Oxford TVG HO CRF [91]                77.9
Context CRF + Guidance CRF [92]       78.1
Adelaide VeryDeep FCN VOC [93]        79.1
DeepLab-CRF (ResNet-101)              79.7

TABLE 5: Performance on the PASCAL VOC 2012 test set. We have added some results from recent arXiv papers on top of the official leaderboard results.

VGG-16 vs. ResNet-101: We have observed that DeepLab based on ResNet-101 [11] delivers better segmentation results along object boundaries than employing VGG-16 [4], as visualized in Fig. 9. We think the identity mapping [94] of ResNet-101 has a similar effect to hyper-column features [21], which exploit the features from the intermediate layers to better localize boundaries. We further quantify this effect in Fig. 10 within the “trimap” [22], [31] (a narrow band along object boundaries). As shown in the figure, employing ResNet-101 before CRF has almost the same accuracy along object boundaries as employing VGG-16 in conjunction with a CRF. Post-processing the ResNet-101 result with a CRF further improves the segmentation result.

4.2 PASCAL-Context
Dataset: The PASCAL-Context dataset [35] provides detailed semantic labels for the whole scene, including both object (e.g., person) and stuff (e.g., sky). Following [35], the proposed models are evaluated on the most frequent 59 classes along with one background category.
The training set and validation set contain 4,998 and 5,105 images, respectively.
Evaluation: We report the evaluation results in Tab. 6. Our VGG-16 based LargeFOV variant yields 37.6% before and 39.6% after CRF. Repurposing ResNet-101 [11] for

Fig. 9: DeepLab results based on the VGG-16 net or ResNet-101 before and after CRF (columns: image, VGG-16 before, VGG-16 after, ResNet before, ResNet after). The CRF is critical for accurate prediction along object boundaries with VGG-16, whereas ResNet-101 has acceptable performance even before CRF.

Fig. 10: (a) Trimap examples (top-left: image; top-right: ground truth; bottom-left: trimap of 2 pixels; bottom-right: trimap of 10 pixels). (b) Pixel mean IOU as a function of the band width around the object boundaries when employing VGG-16 or ResNet-101, before and after CRF.

Method         MSC  COCO  Aug  LargeFOV  ASPP  CRF | mIOU
VGG-16
DeepLab [38]   -    -     -    ✓         -     -   | 37.6
DeepLab [38]   -    -     -    ✓         -     ✓   | 39.6
ResNet-101
DeepLab        -    -     -    -         -     -   | 39.6
DeepLab        ✓    ✓     -    -         -     -   | 41.4
DeepLab        ✓    ✓     ✓    -         -     -   | 42.9
DeepLab        ✓    ✓     ✓    ✓         -     -   | 43.5
DeepLab        ✓    ✓     ✓    -         ✓     -   | 44.7
DeepLab        ✓    ✓     ✓    -         ✓     ✓   | 45.7
O2P [45]                                           | 18.1
CFM [51]                                           | 34.4
FCN-8s [14]                                        | 37.8
CRF-RNN [59]                                       | 39.3
ParseNet [86]                                      | 40.4
BoxSup [60]                                        | 40.5
HO CRF [91]                                        | 41.3
Context [40]                                       | 43.3
VeryDeep [93]                                      | 44.5

TABLE 6: Comparison with other state-of-the-art methods on the PASCAL-Context dataset.

DeepLab improves 2% over the VGG-16 LargeFOV. Similar to [17], employing multi-scale inputs and max-pooling to merge the results improves the performance to 41.4%. Pretraining the model on MS-COCO brings an extra 1.5% improvement. Employing atrous spatial pyramid pooling is more effective than LargeFOV. After further employing dense CRF as post-processing, our final model yields 45.7%, outperforming the current state-of-the-art method [40] by 2.4% without using their non-linear pairwise term.
Our final model is also slightly better (by 1.2%) than the concurrent work [93], which likewise employs atrous convolution to repurpose the residual net of [11] for semantic segmentation.

Fig. 11: PASCAL-Context results. Input image, ground truth, and our DeepLab results before/after CRF.

Method           MSC  COCO  Aug  LFOV  ASPP  CRF | mIOU
ResNet-101
DeepLab          -    -     -    -     -     -   | 58.90
DeepLab          ✓    ✓     -    -     -     -   | 63.10
DeepLab          ✓    ✓     ✓    -     -     -   | 64.40
DeepLab          ✓    ✓     ✓    -     -     ✓   | 64.94
DeepLab          ✓    ✓     ✓    ✓     -     -   | 62.18
DeepLab          ✓    ✓     ✓    -     ✓     -   | 62.76
Attention [17]                                   | 56.39
HAZN [95]                                        | 57.54
LG-LSTM [96]                                     | 57.97
Graph LSTM [97]                                  | 60.16

TABLE 7: Comparison with other state-of-the-art methods on the PASCAL-Person-Part dataset.

Qualitative results: We visualize the segmentation results of our best model with and without CRF as post-processing in Fig. 11. DeepLab before CRF can already predict most of the objects/stuff with high accuracy. Employing the CRF, our model is able to further remove isolated false positives and improve the prediction along object/stuff boundaries.

4.3 PASCAL-Person-Part
Dataset: We further perform experiments on semantic part segmentation [98], [99], using the extra PASCAL VOC 2010 annotations of [36]. We focus on the person part of the dataset, which contains more training data and large variation in object scale and human pose. Specifically, the dataset contains detailed part annotations for every person, e.g., eyes and nose. We merge the annotations into Head, Torso, Upper/Lower Arms, and Upper/Lower Legs, resulting in six person part classes and one background class. We only use those images containing persons for training (1,716 images) and validation (1,817 images).
Evaluation: The human part segmentation results on PASCAL-Person-Part are reported in Tab. 7. [17] has already conducted experiments on this dataset with a re-purposed VGG-16 net for DeepLab, attaining 56.39% (with multi-scale inputs). Therefore, in this part, we mainly focus on the effect of repurposing ResNet-101 for DeepLab.
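The multi-scale max fusion used in several of these experiments (take the per-position maximum over per-scale score maps, as in Sec. 3.2's first approach) can be sketched as follows; nearest-neighbour resizing stands in for the bilinear interpolation used in the actual system:

```python
import numpy as np

def fuse_multiscale(score_maps, out_hw):
    """Max-fuse per-scale (num_classes, h, w) score maps after resizing
    each to out_hw. Nearest-neighbour resize is a stand-in for the
    bilinear interpolation used in practice."""
    H, W = out_hw
    fused = None
    for s in score_maps:
        ri = np.arange(H) * s.shape[1] // H   # source row per output row
        ci = np.arange(W) * s.shape[2] // W   # source col per output col
        up = s[:, ri[:, None], ci[None, :]]   # (num_classes, H, W)
        fused = up if fused is None else np.maximum(fused, up)
    return fused
```

At each position the most confident scale wins, which is what makes the fusion robust to objects of very different sizes.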
With ResNet-101, DeepLab alone yields 58.9%, significantly outperforming DeepLab-LargeFOV (VGG-16 net) and DeepLab-Attention (VGG-16 net) by about 7% and 2.5%, respectively. Incorporating multi-scale inputs and fusion by max-pooling further improves performance to 63.1%. Additionally pretraining the model on MS-COCO yields another 1.3% improvement. However, we do not observe any improvement when adopting either LargeFOV or ASPP on this dataset. Employing the dense CRF to post-process our final output substantially outperforms the concurrent work [97], by 4.78%.
Qualitative results: We visualize the results in Fig. 12.

Method                                mIOU
pre-release version of dataset
Adelaide Context [40]                 66.4
FCN-8s [14]                           65.3
DeepLab-CRF-LargeFOV-StrongWeak [58]  64.8
DeepLab-CRF-LargeFOV [38]             63.1
CRF-RNN [59]                          62.5
DPN [62]                              59.1
Segnet basic [100]                    57.0
Segnet extended [100]                 56.1
official version
Adelaide Context [40]                 71.6
Dilation10 [76]                       67.1
DPN [62]                              66.8
Pixel-level Encoding [101]            64.3
DeepLab-CRF (ResNet-101)              70.4

TABLE 8: Test set results on the Cityscapes dataset, comparing our DeepLab system with other state-of-the-art methods.

4.4 Cityscapes
Dataset: Cityscapes [37] is a recently released large-scale dataset, which contains high-quality pixel-level annotations of 5,000 images collected in street scenes from 50 different cities. Following the evaluation protocol [37], 19 semantic labels (belonging to 7 super-categories: ground, construction, object, nature, sky, human, and vehicle) are used for evaluation (the void label is not considered for evaluation). The training, validation, and test sets contain 2,975, 500, and 1,525 images, respectively.

Fig. 12: PASCAL-Person-Part results. Input image, ground truth, and our DeepLab results before/after CRF.

Fig. 13: Cityscapes results.
Input image, ground truth, and our DeepLab results before/after CRF.

Full  Aug  LargeFOV  ASPP  CRF | mIOU
VGG-16
-     -    ✓         -     -   | 62.97
-     -    ✓         -     ✓   | 64.18
✓     -    ✓         -     -   | 64.89
✓     -    ✓         -     ✓   | 65.94
ResNet-101
✓     -    -         -     -   | 66.6
✓     -    ✓         -     -   | 69.2
✓     -    -         ✓     -   | 70.4
✓     ✓    -         ✓     -   | 71.0
✓     ✓    -         ✓     ✓   | 71.4

TABLE 9: Val set results on the Cityscapes dataset. Full: model trained with full-resolution images.

Test set results of pre-release: We participated in benchmarking on the Cityscapes dataset pre-release. As shown in the top of Tab. 8, our model attained third place, with performance of 63.1% and 64.8% (with training on additional coarsely annotated images).
Val set results: After the initial release, we further explored the validation set, as shown in Tab. 9. The images of Cityscapes have resolution 2048×1024, making it a challenging problem to train deeper networks with limited GPU memory. While benchmarking the pre-release of the dataset, we downsampled the images by 2. However, we have found that it is beneficial to process the images at their original resolution. With the same training protocol, using images of original resolution brings significant improvements of 1.9% and 1.8% before and after CRF, respectively. In order to perform inference on this dataset with high-resolution images, we split each image into overlapping regions, similar to [37]. We have also replaced the VGG-16 net with ResNet-101. We do not exploit multi-scale inputs due to the limited GPU memory at hand. Instead, we only explore (1) deeper networks (i.e., ResNet-101), (2) data augmentation, (3) LargeFOV or ASPP, and (4) CRF as post-processing on this dataset. We first find that employing ResNet-101 alone is better than using the VGG-16 net. Employing LargeFOV brings a 2.6% improvement, and using ASPP further improves results by 1.2%. Adopting data augmentation and CRF as post-processing brings another 0.6% and 0.4%, respectively.

(a) Image (b) G.T. (c) Before CRF (d) After CRF

Fig. 14: Failure modes.
Input image, ground truth, and our DeepLab results before/after CRF.

Current test result: We have uploaded our best model to the evaluation server, obtaining a performance of 70.4%. Note that our model is only trained on the train set.
Qualitative results: We visualize the results in Fig. 13.

4.5 Failure Modes
We further qualitatively analyze some failure modes of our best model variant on the PASCAL VOC 2012 val set. As shown in Fig. 14, our proposed model fails to capture the delicate boundaries of objects, such as bicycle and chair. The details cannot be recovered even by the CRF post-processing, since the unary term is not confident enough. We hypothesize that the encoder-decoder structure of [100], [102] may alleviate the problem by exploiting the high-resolution feature maps in the decoder path. How to efficiently incorporate this method is left as future work.

5 CONCLUSION
Our proposed “DeepLab” system re-purposes networks trained on image classification to the task of semantic segmentation by applying ‘atrous convolution’ with upsampled filters for dense feature extraction. We further extend it to atrous spatial pyramid pooling, which encodes objects as well as image context at multiple scales. To produce semantically accurate predictions and detailed segmentation maps along object boundaries, we also combine ideas from deep convolutional neural networks and fully-connected conditional random fields. Our experimental results show that the proposed method significantly advances the state-of-the-art on several challenging datasets, including the PASCAL VOC 2012 semantic image segmentation benchmark, PASCAL-Context, PASCAL-Person-Part, and Cityscapes.

ACKNOWLEDGMENTS
This work was partly supported by the ARO 62250-CS, FP7-RECONFIG, FP7-MOBOT, and H2020-ISUPPORT EU projects. We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.

REFERENCES
[1] Y. LeCun, L.
Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, 1998.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NIPS, 2012.
[3] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “Overfeat: Integrated recognition, localization and detection using convolutional networks,” arXiv:1312.6229, 2013.
[4] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” arXiv:1409.4842, 2014.
[6] G. Papandreou, I. Kokkinos, and P.-A. Savalle, “Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection,” in CVPR, 2015.
[7] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in CVPR, 2014.
[8] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, “Scalable object detection using deep neural networks,” in CVPR, 2014.
[9] R. Girshick, “Fast R-CNN,” in ICCV, 2015.
[10] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in NIPS, 2015.
[11] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385, 2015.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, “SSD: Single shot multibox detector,” arXiv:1512.02325, 2015.
[13] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in ECCV, 2014.
[14] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015.
[15] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P.
Tchamitchian, “A real-time algorithm for signal analysis with the help of the wavelet transform,” in Wavelets: Time-Frequency Methods and Phase Space, 1989, pp. 289–297.
[16] A. Giusti, D. Ciresan, J. Masci, L. Gambardella, and J. Schmidhuber, “Fast image scanning with deep max-pooling convolutional neural networks,” in ICIP, 2013.
[17] L.-C. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille, “Attention to scale: Scale-aware semantic image segmentation,” in CVPR, 2016.
[18] I. Kokkinos, “Pushing the boundaries of boundary detection using deep learning,” in ICLR, 2016.
[19] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in CVPR, 2006.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in ECCV, 2014.
[21] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, “Hypercolumns for object segmentation and fine-grained localization,” in CVPR, 2015.
[22] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected CRFs with Gaussian edge potentials,” in NIPS, 2011.
[23] C. Rother, V. Kolmogorov, and A. Blake, “GrabCut: Interactive foreground extraction using iterated graph cuts,” in SIGGRAPH, 2004.
[24] J. Shotton, J. Winn, C. Rother, and A. Criminisi, “TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context,” IJCV, 2009.
[25] A. Lucchi, Y. Li, X. Boix, K. Smith, and P. Fua, “Are spatial and global constraints really necessary for segmentation?” in ICCV, 2011.
[26] X. He, R. S. Zemel, and M. Carreira-Perpiñán, “Multiscale conditional random fields for image labeling,” in CVPR, 2004.
[27] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical CRFs for object class image segmentation,” in ICCV, 2009.
[28] V. Lempitsky, A. Vedaldi, and A.
Zisserman, “Pylon model for semantic segmentation,” in NIPS, 2011.
[29] A. Delong, A. Osokin, H. N. Isack, and Y. Boykov, “Fast approximate energy minimization with label costs,” IJCV, 2012.
[30] J. M. Gonfaus, X. Boix, J. Van de Weijer, A. D. Bagdanov, J. Serrat, and J. Gonzalez, “Harmony potentials for joint classification and segmentation,” in CVPR, 2010.
[31] P. Kohli, P. H. Torr et al., “Robust higher order potentials for enforcing label consistency,” IJCV, vol. 82, no. 3, pp. 302–324, 2009.
[32] L.-C. Chen, G. Papandreou, and A. Yuille, “Learning a dictionary of shape epitomes with applications to image labeling,” in ICCV, 2013.
[33] P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price, and A. Yuille, “Towards unified depth and semantic prediction from a single image,” in CVPR, 2015.
[34] M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes challenge: A retrospective,” IJCV, 2014.
[35] R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille, “The role of context for object detection and semantic segmentation in the wild,” in CVPR, 2014.
[36] X. Chen, R. Mottaghi, X. Liu, S. Fidler, R. Urtasun, and A. Yuille, “Detect what you can: Detecting and representing objects using holistic models and body parts,” in CVPR, 2014.
[37] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in CVPR, 2016.
[38] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected CRFs,” in ICLR, 2015.
[39] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” PAMI, 2013.
[40] G. Lin, C. Shen, I. Reid et al.
, “Efficient piecewise training of deep structured models for semantic segmentation,” arXiv:1504.01013, 2015.
[41] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv:1408.5093, 2014.
[42] Z. Tu and X. Bai, “Auto-context and its application to high-level vision tasks and 3d brain image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 10, pp. 1744–1757, 2010.
[43] J. Shotton, M. Johnson, and R. Cipolla, “Semantic texton forests for image categorization and segmentation,” in CVPR, 2008.
[44] B. Fulkerson, A. Vedaldi, and S. Soatto, “Class segmentation and object localization with superpixel neighborhoods,” in ICCV, 2009.
[45] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Semantic segmentation with second-order pooling,” in ECCV, 2012.
[46] J. Carreira and C. Sminchisescu, “CPMC: Automatic object segmentation using constrained parametric min-cuts,” PAMI, vol. 34, no. 7, pp. 1312–1328, 2012.
[47] P. Arbeláez, J. Pont-Tuset, J. T. Barron, F. Marques, and J. Malik, “Multiscale combinatorial grouping,” in CVPR, 2014.
[48] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, “Selective search for object recognition,” IJCV, 2013.
[49] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, “Simultaneous detection and segmentation,” in ECCV, 2014.
[50] M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich, “Feedforward semantic segmentation with zoom-out features,” in CVPR, 2015.
[51] J. Dai, K. He, and J. Sun, “Convolutional feature masking for joint object and stuff segmentation,” arXiv:1412.1283, 2014.
[52] D. Eigen and R. Fergus, “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” arXiv:1411.4734, 2014.
[53] M. Cogswell, X. Lin, S. Purushwalkam, and D.
Batra, “Combining the best of graphical models and convnets for semantic segmentation,” arXiv:1412.4313, 2014.
[54] D. Geiger and F. Girosi, “Parallel and deterministic algorithms from MRFs: Surface reconstruction,” PAMI, vol. 13, no. 5, pp. 401–412, 1991.
[55] D. Geiger and A. Yuille, “A common framework for image segmentation,” IJCV, vol. 6, no. 3, pp. 227–243, 1991.
[56] I. Kokkinos, R. Deriche, O. Faugeras, and P. Maragos, “Computational analysis and learning for a biologically motivated model of boundary detection,” Neurocomputing, vol. 71, no. 10, pp. 1798–1812, 2008.
[57] S. Bell, P. Upchurch, N. Snavely, and K. Bala, “Material recognition in the wild with the materials in context database,” arXiv:1412.0623, 2014.
[58] G. Papandreou, L.-C. Chen, K. Murphy, and A. L. Yuille, “Weakly- and semi-supervised learning of a DCNN for semantic image segmentation,” in ICCV, 2015.
[59] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr, “Conditional random fields as recurrent neural networks,” in ICCV, 2015.
[60] J. Dai, K. He, and J. Sun, “BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation,” in ICCV, 2015.
[61] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in ICCV, 2015.
[62] Z. Liu, X. Li, P. Luo, C. C. Loy, and X. Tang, “Semantic image segmentation via deep parsing network,” in ICCV, 2015.
[63] L.-C. Chen, J. T. Barron, G. Papandreou, K. Murphy, and A. L. Yuille, “Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform,” in CVPR, 2016.
[64] L.-C. Chen, A. Schwing, A. Yuille, and R. Urtasun, “Learning deep structured models,” in ICML, 2015.
[65] A. G. Schwing and R. Urtasun, “Fully connected deep structured networks,” arXiv:1503.02351, 2015.
[66] S. Chandra and I.
Kokkinos, “Fast, exact and multi-scale inference for semantic image segmentation with deep Gaussian CRFs,” arXiv:1603.08358, 2016.
[67] E. S. L. Gastal and M. M. Oliveira, “Domain transform for edge-aware image and video processing,” in SIGGRAPH, 2011.
[68] G. Bertasius, J. Shi, and L. Torresani, “High-for-low and low-for-high: Efficient boundary detection from deep object features and its applications to high-level vision,” in ICCV, 2015.
[69] P. O. Pinheiro and R. Collobert, “Weakly supervised semantic segmentation with convolutional networks,” arXiv:1411.6228, 2014.
[70] D. Pathak, P. Krähenbühl, and T. Darrell, “Constrained convolutional neural networks for weakly supervised segmentation,” 2015.
[71] S. Hong, H. Noh, and B. Han, “Decoupled deep neural network for semi-supervised semantic segmentation,” in NIPS, 2015.
[72] A. Vezhnevets, V. Ferrari, and J. M. Buhmann, “Weakly supervised semantic segmentation with a multi-image model,” in ICCV, 2011.
[73] X. Liang, Y. Wei, X. Shen, J. Yang, L. Lin, and S. Yan, “Proposal-free network for instance-level object segmentation,” arXiv:1509.02636, 2015.
[74] J. E. Fowler, “The redundant discrete wavelet transform and additive noise,” IEEE Signal Processing Letters, vol. 12, no. 9, pp. 629–632, 2005.
[75] P. P. Vaidyanathan, “Multirate digital filters, filter banks, polyphase networks, and applications: a tutorial,” Proceedings of the IEEE, vol. 78, no. 1, pp. 56–93, 1990.
[76] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in ICLR, 2016.
[77] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” arXiv:1605.06409, 2016.
[78] J. Dai, K. He, Y. Li, S. Ren, and J. Sun, “Instance-sensitive fully convolutional networks,” arXiv:1603.08678, 2016.
[79] K. Chen, J. Wang, L.-C. Chen, H. Gao, W. Xu, and R.
Nevatia, “ABC-CNN: An attention based convolutional neural network for visual question answering,” arXiv:1511.05960, 2015.
[80] L. Sevilla-Lara, D. Sun, V. Jampani, and M. J. Black, “Optical flow with semantic segmentation and localized layers,” arXiv:1603.03911, 2016.
[81] Z. Wu, C. Shen, and A. van den Hengel, “High-performance semantic segmentation using very deep fully convolutional networks,” arXiv:1604.04339, 2016.
[82] M. J. Shensa, “The discrete wavelet transform: wedding the à trous and Mallat algorithms,” IEEE Transactions on Signal Processing, vol. 40, no. 10, pp. 2464–2482, 1992.
[83] M. Abadi, A. Agarwal et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv:1603.04467, 2016.
[84] A. Adams, J. Baek, and M. A. Davis, “Fast high-dimensional filtering using the permutohedral lattice,” in Eurographics, 2010.
[85] B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik, “Semantic contours from inverse detectors,” in ICCV, 2011.
[86] W. Liu, A. Rabinovich, and A. C. Berg, “ParseNet: Looking wider to see better,” arXiv:1506.04579, 2015.
[87] T.-Y. Lin et al., “Microsoft COCO: Common objects in context,” in ECCV, 2014.
[88] R. Vemulapalli, O. Tuzel, M.-Y. Liu, and R. Chellappa, “Gaussian conditional random field network for semantic segmentation,” in CVPR, 2016.
[89] Z. Yan, H. Zhang, Y. Jia, T. Breuel, and Y. Yu, “Combining the best of convolutional layers and recurrent layers: A hybrid network for semantic segmentation,” arXiv:1603.04871, 2016.
[90] G. Ghiasi and C. C. Fowlkes, “Laplacian reconstruction and refinement for semantic segmentation,” arXiv:1605.02264, 2016.
[91] A. Arnab, S. Jayasumana, S. Zheng, and P. Torr, “Higher order potentials in end-to-end trainable conditional random fields,” arXiv:1511.08119, 2015.
[92] F. Shen and G.
Zeng, “Fast semantic image segmentation with high order context and guided filtering,” arXiv:1605.04068, 2016.
[93] Z. Wu, C. Shen, and A. van den Hengel, “Bridging category-level and instance-level semantic image segmentation,” arXiv:1605.06885, 2016.
[94] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” arXiv:1603.05027, 2016.
[95] F. Xia, P. Wang, L.-C. Chen, and A. L. Yuille, “Zoom better to see clearer: Human part segmentation with auto zoom net,” arXiv:1511.06881, 2015.
[96] X. Liang, X. Shen, D. Xiang, J. Feng, L. Lin, and S. Yan, “Semantic object parsing with local-global long short-term memory,” arXiv:1511.04510, 2015.
[97] X. Liang, X. Shen, J. Feng, L. Lin, and S. Yan, “Semantic object parsing with graph LSTM,” arXiv:1603.07063, 2016.
[98] J. Wang and A. Yuille, “Semantic part segmentation using compositional model combining shape and appearance,” in CVPR, 2015.
[99] P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price, and A. Yuille, “Joint object and part segmentation using deep learned potentials,” in ICCV, 2015.
[100] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” arXiv:1511.00561, 2015.
[101] J. Uhrig, M. Cordts, U. Franke, and T. Brox, “Pixel-level encoding and depth layering for instance-level semantic labeling,” arXiv:1604.05096, 2016.
[102] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI, 2015.

Liang-Chieh Chen received his B.Sc. from National Chiao Tung University, Taiwan, his M.S. from the University of Michigan, Ann Arbor, and his Ph.D. from the University of California, Los Angeles. He is currently working at Google. His research interests include semantic image segmentation, probabilistic graphical models, and machine learning.

George Papandreou (S'03–M'09–SM'14) holds a Diploma (2003) and a Ph.D.
(2009) in Electrical Engineering and Computer Science, both from the National Technical University of Athens (NTUA), Greece. He is currently a Research Scientist at Google, following appointments as Research Assistant Professor at the Toyota Technological Institute at Chicago (2013-2014) and Postdoctoral Research Scholar at the University of California, Los Angeles (2009-2013). His research interests are in computer vision and machine learning, with a current emphasis on deep learning. He regularly serves as a reviewer and program committee member for the main journals and conferences in computer vision, image processing, and machine learning. He has been a co-organizer of the NIPS 2012, 2013, and 2014 Workshops on Perturbations, Optimization, and Statistics and co-editor of a book on the same topic (MIT Press, 2016).

Iasonas Kokkinos (S'02–M'06) obtained the Diploma of Engineering in 2001 and the Ph.D. degree in 2006 from the School of Electrical and Computer Engineering of the National Technical University of Athens in Greece, and the Habilitation degree in 2013 from Université Paris-Est. In 2006 he joined the University of California at Los Angeles as a postdoctoral scholar, and in 2008 joined the Department of Applied Mathematics of Ecole Centrale Paris (CentraleSupelec) as faculty, working as an associate professor in the Center for Visual Computing of CentraleSupelec and affiliate researcher at INRIA-Saclay. In 2016 he joined University College London and Facebook Artificial Intelligence Research. His current research activity is on deep learning for computer vision, focusing in particular on structured prediction for deep learning, shape modeling, and multi-task learning architectures.
He has been awarded a young researcher grant by the French National Research Agency, has served as associate editor for the Image and Vision Computing and Computer Vision and Image Understanding journals, and serves regularly as a reviewer and area chair for all major computer vision conferences and journals.

Kevin Murphy was born in Ireland, grew up in England, went to graduate school in the USA (MEng from U. Penn, PhD from UC Berkeley, Postdoc at MIT), and then became a professor at the Computer Science and Statistics Departments at the University of British Columbia in Vancouver, Canada in 2004. After getting tenure, Kevin went to Google in Mountain View, California for his sabbatical. In 2011, he converted to a full-time research scientist at Google. Kevin has published over 50 papers in refereed conferences and journals related to machine learning and graphical models. He has recently published an 1100-page textbook called "Machine Learning: A Probabilistic Perspective" (MIT Press, 2012).

Alan L. Yuille (F'09) received the BA degree in mathematics from the University of Cambridge in 1976. His PhD in theoretical physics, supervised by Prof. S. W. Hawking, was approved in 1981. He was a research scientist in the Artificial Intelligence Laboratory at MIT and the Division of Applied Sciences at Harvard University from 1982 to 1988. He served as an assistant and associate professor at Harvard until 1996. He was a senior research scientist at the Smith-Kettlewell Eye Research Institute from 1996 to 2002. He joined the University of California, Los Angeles, as a full professor with a joint appointment in statistics and psychology in 2002, and computer science in 2007. He was appointed a Bloomberg Distinguished Professor at Johns Hopkins University in January 2016. He holds a joint appointment between the Departments of Cognitive Science and Computer Science.
His research interests include computational models of vision, mathematical models of cognition, and artificial intelligence and neural networks.

pm_id: https://pubmed.ncbi.nlm.nih.gov/16928733

HAL Id: hal-01577887
https://hal.science/hal-01577887
Submitted on 28 Aug 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments

Béatrice Lecroq, Franck Lejzerowicz, Dipankar Bachar, Richard Christen, Magne Østerås, Philippe Esling, Loïc Baerlocher, Laurent Farinelli, Jan Pawlowski

To cite this version:
Béatrice Lecroq, Franck Lejzerowicz, Dipankar Bachar, Richard Christen, Magne Østerås, et al.. Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments. Proceedings of the National Academy of Sciences of the United States of America, 2011, 108 (32), pp. 13177-13182. 10.1073/pnas.1018426108.
hal-01577887

Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments

Béatrice Lecroq (a,b,1), Franck Lejzerowicz (a,1), Dipankar Bachar (c,d), Richard Christen (c,d), Philippe Esling (e), Loïc Baerlocher (f), Magne Østerås (f), Laurent Farinelli (f), and Jan Pawlowski (a,2)

(a) Department of Genetics and Evolution, University of Geneva, CH-1211 Geneva 4, Switzerland; (b) Institute of Biogeosciences, Japan Agency for Marine-Earth Science and Technology, Yokosuka 237-0061, Japan; (c) Centre National de la Recherche Scientifique, UMR 6543 and (d) Université de Nice-Sophia-Antipolis, Unité Mixte de Recherche 6543, Centre de Biochimie, Faculté des Sciences, F-06108 Nice, France; (e) Institut de Recherche et Coordination Acoustique/Musique, 75004 Paris, France; and (f) FASTERIS SA, 1228 Plan-les-Ouates, Switzerland

Edited* by James P. Kennett, University of California, Santa Barbara, CA, and approved June 20, 2011 (received for review December 8, 2010)

Deep-sea floors represent one of the largest and most complex ecosystems on Earth but remain essentially unexplored. The vastness and remoteness of this ecosystem make deep-sea sampling difficult, hampering traditional taxonomic observations and diversity assessment. This problem is particularly true in the case of the deep-sea meiofauna, which largely comprises small-sized, fragile, and difficult-to-identify metazoans and protists. Here, we introduce an ultra-deep sequencing-based metagenetic approach to examine the richness of benthic foraminifera, a principal component of deep-sea meiofauna. We used Illumina sequencing technology to assess foraminiferal richness in 31 unsieved deep-sea sediment samples from five distinct oceanic regions. We sequenced an extremely short fragment (36 bases) of the small subunit ribosomal DNA hypervariable region 37f, which has been shown to accurately distinguish foraminiferal species.
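The 36-base microbarcode idea described in the abstract amounts to matching very short reads against a table of reference sequences. A minimal sketch in Python, with invented barcode sequences and hypothetical entries (the study's actual reference database and assignment procedure are described in its Materials and Methods):

```python
# Hypothetical reference table: 36-nt 37f microbarcodes -> taxon names.
# The sequences below are invented for illustration only.
REFERENCE_BARCODES = {
    "ACGTACGTACGTACGTACGTACGTACGTACGTACGT": "Reticulomyxa filosa",
    "TTGACCTTGACCTTGACCTTGACCTTGACCTTGACC": "Vanhoeffenella sp.",
}

def assign_read(read, references=REFERENCE_BARCODES):
    """Return the taxon for an exact 36-nt match, else None."""
    read = read.strip().upper()
    if len(read) != 36:
        raise ValueError("expected a 36-nt microbarcode read")
    return references.get(read)

print(assign_read("acgtacgtacgtacgtacgtacgtacgtacgtacgt"))  # Reticulomyxa filosa
```

Exact-match lookup is only an illustration; the study's real assignment works at several taxonomic levels and tolerates sequence variation.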
In total, we obtained 495,978 unique sequences that were grouped into 1,643 operational taxonomic units, of which about half (841) could be reliably assigned to foraminifera. The vast majority of the operational taxonomic units (nearly 90%) were either assigned to early (ancient) lineages of soft-walled, single-chambered (monothalamous) foraminifera or remained undetermined and yet possibly belong to unknown early lineages. Contrasting with the classical view of multichambered taxa dominating foraminiferal assemblages, our work reflects an unexpected diversity of monothalamous lineages that are as yet unknown using conventional micropaleontological observations. Although we can only speculate about their morphology, the immense richness of deep-sea phylotypes revealed by this study suggests that ultra-deep sequencing can improve understanding of deep-sea benthic diversity, considered until now as unknowable based on a traditional taxonomic approach.

DNA barcoding | next-generation sequencing | small subunit ribosomal RNA | microbial eukaryote | cosmopolitanism

Deep-sea sediments are home for a wide range of small-sized metazoan and protistan taxa. The diversity of this meiofaunal community is difficult to estimate because its study suffers from undersampling, difficult access, and the problems involved in culturing deep-sea organisms. Additionally, most of the deep-sea species are tiny, fragile, and difficult to identify. Benthic foraminifera form one of the most abundant and diverse groups of deep-sea meiofauna, found even in the deepest ocean trenches (1). Particularly in abyssal areas, a large proportion of deep-sea foraminifera belongs to early lineages characterized by simple, single-chambered (monothalamous), organic-walled or agglutinated tests, which are poorly preserved in the fossil record (2).
These early monothalamous lineages, traditionally classified in the orders Allogromiida and Astrorhizida, have been proposed to form a large radiation in the Neoproterozoic, well before the first multichambered foraminifera appeared (3). However, the assessment of their diversity is hampered by the fragmentation of their delicate tests, a lack of distinctive morphological characters, and their unfamiliarity to meiofaunal workers, which means that they are often overlooked.

During the past decade, molecular studies revealed an astonishing diversity of early foraminifera (4), along with numerous descriptions of new deep-sea monothalamous species and genera (5). The sequences of early lineages were particularly abundant in environmental DNA surveys of marine (6), freshwater (7), and soil (8) ecosystems. Eight new family-rank clades branching at the base of the foraminiferal tree were distinguished in DNA extracts of deep Southern Ocean sediments (9). However, all these studies relied on clone libraries, limiting the number of sequences available for analysis.

By reducing cloning limitations, next-generation sequencing (NGS) methods profoundly altered our perception of microbial ecosystems (10), but only few studies focused on eukaryotes (11, 12). Until now, most environmental DNA sequence data were generated by using 454 technology (13). It is only recently that Illumina ultra-deep sequencing technology was used for environmental microbial diversity assessment (14, 15).

Here, we present a unique application of Illumina technology for the assessment of eukaryotic diversity. Taking advantage of the exceptionally high divergence of some short hypervariable regions of foraminiferal small subunit (SSU) ribosomal RNA (rRNA) genes (16), we used the Illumina platform to examine foraminiferal species richness in deep-sea sediments.
We massively sequenced the 36-bp-long fragment situated at the foraminiferal-specific helix 37f of the SSU ribosomal DNA (rDNA) for a set of 31 samples from the Arctic, North Atlantic, Southern, and Pacific Oceans and the Caribbean Sea, with the cultured species Reticulomyxa filosa used as a control. Using ultra-deep sequencing, the targeted diversity was thoroughly covered, with the majority of obtained sequences assigned to early foraminifera, including monothalamids and undetermined basal lineages. Our study suggests that these inconspicuous simple foraminifera by far outnumber the well-known multichambered species, challenging our current view of foraminiferal diversity.

Author contributions: B.L. and J.P. designed research; B.L., M.Ø., L.F., and J.P. performed research; P.E., M.Ø., and L.F. contributed new reagents/analytic tools; B.L., F.L., D.B., R.C., P.E., L.B., and J.P. analyzed data; and B.L., F.L., R.C., and J.P. wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Data deposition: The sequence data reported in this paper have been deposited at the NCBI Sequence Read Archive.

1 B.L. and F.L. contributed equally to this work.
2 To whom correspondence should be addressed. E-mail: jan.pawlowski@unige.ch.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1018426108/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1018426108 | PNAS Early Edition | 1 of 6 | ENVIRONMENTAL SCIENCES

Results

Short Tags Analysis. In total we analyzed 78,613,888 reads of 36 nucleotides (nt) for 31 samples of surficial deep-sea sediment (0.5- or 5-mL volume; Table S1). After quality filtration, 41,534,552 reads were retained, with the percentage of filtered sequences ranging from 16.3% (SFA31) to 97.8% (SFA04).
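The quality-filtration step reported above is not specified in this excerpt; a simple per-read mean-Phred cutoff is one common stand-in. A hedged sketch under that assumption (the threshold, the Phred+33 encoding, and the filter itself are illustrative, not the paper's actual procedure):

```python
# Illustrative only: filter reads by mean Phred score, assuming Phred+33
# (Sanger) quality encoding. The paper's real filter may differ.
def mean_phred(qualities):
    """Mean Phred score of a Phred+33-encoded quality string."""
    return sum(ord(c) - 33 for c in qualities) / len(qualities)

def quality_filter(reads, threshold=30.0):
    """Keep (sequence, quality) pairs whose mean Phred score passes."""
    return [(seq, qual) for seq, qual in reads if mean_phred(qual) >= threshold]

reads = [("ACGT" * 9, "I" * 36),   # Phred 40 everywhere: kept
         ("TTTT" * 9, "#" * 36)]   # Phred 2 everywhere: discarded
kept = quality_filter(reads)
print(len(kept), "of", len(reads), "reads retained")  # 1 of 2 reads retained
```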
All identical reads were combined into unique sequence tags, of which the number ranged from 5,065 (SFA33) to 39,647 (SFA03). After removing singletons, the number of tags per sample averaged 6,549, with a number of reads per sample ranging from 404,160 (SFA31) to 3,392,323 (SFA03). For each sample, we reduced Illumina sequencing errors with an original approach combining a filter based on a second strict dereplication of tags and a one-pass clustering, as explained in Materials and Methods. The number of OTUs per sample averaged 116 and ranged from 17 (SFA33) to 211 (SFA09), with a maximum number of reads for an OTU ranging from 35,207 (SFA22) to 1,244,352 (SFA17). Saturation curves (Fig. S1) clearly showed that most samples were thoroughly sequenced. In general, 20,000 filtered reads were sufficient to cover >95% of the OTUs' diversity present in a single sample.

One of the samples (SFA06) comprised only the cultured species R. filosa. The sequencing of this sample produced 2,416,756 reads, corresponding to 1,689 dereplicated tags with at least 2 reads per tag. After filtering and clustering, we recovered one OTU, which was identical to the 37f hypervariable sequence of R. filosa previously obtained by using classical Sanger technology, confirming the efficiency of our filtering approach.

Foraminiferal Richness. Each of the 3,609 OTUs obtained for the 31 samples was taxonomically assigned based on our refined database containing 1,048 reference sequences. The undetermined OTUs (UNK) that could not be placed confidently in any high-level taxonomic groups represented 47% of all OTUs, ranging from 23.5% (SFA33) to 57% (SFA18; Table S2). The remaining 1,611 OTUs were successively assigned to three taxonomic levels (Materials and Methods). At least one-third of the OTUs assigned at the highest taxonomic level (corresponding to the order) were placed in the group of monothalamous foraminifera (MON; Fig. 1A).
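The dereplication and singleton-removal step described under Short Tags Analysis can be sketched as follows (a simplification: the paper's error-reduction pipeline additionally applies a second strict dereplication and a one-pass clustering, which are not reproduced here):

```python
# Collapse identical reads into unique tags with counts, then drop
# singletons (tags seen only once), as in the dereplication step above.
from collections import Counter

def dereplicate(reads, min_count=2):
    """Return {tag: count} for reads occurring at least min_count times."""
    counts = Counter(read.upper() for read in reads)
    return {tag: n for tag, n in counts.items() if n >= min_count}

reads = ["ACGT" * 9] * 3 + ["TTGA" * 9] * 2 + ["GGCC" * 9]  # last is a singleton
tags = dereplicate(reads)
print(len(tags), "unique tags after removing singletons")  # 2 unique tags ...
```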
Together with UNK, they formed >80% of OTUs in most of the samples. The multichambered orders were much less represented, with numbers of OTUs assigned to Textulariida (TEX) ranging from 0 (SFA33) to 15 (SFA29) and those assigned to Rotaliida (ROT) ranging from 1 (SFA33) to 24 (SFA29). The OTUs belonging to Miliolida (MIL), Spirillinida (SPI), Lagenida (LAG), and Robertinida (ROB) were represented by a single OTU at most. In many samples (18 of 31), we also found OTUs assigned to the planktonic order Globigerinida (GLO), but their number never exceeded 4 OTUs. The proportion of major groups was similar between the samples, except for the Pacific Ocean where GLO were particularly diverse.

In total, after combining all samples and average linkage clustering, the MON and UNK reached nearly 90% of all 1,643 OTUs, whereas the proportion of ROT, TEX, and GLO remained relatively low, at 6.5%, 4.2%, and 0.6%, respectively (Fig. 1B). In terms of reads, the proportion of UNK diminished (32%), whereas that of ROT and TEX increased. The proportion of reads assigned to MON remained close to 40% of all reads (Fig. 1B).

The proportion of OTUs assigned to the second taxonomic level was high, with up to 74.2% for MON, 57.5% for ROT, 62% for TEX, 66% for ROB, 90% for GLO, and 100% for SPI and LAG, although the last four orders were only represented by few OTUs (Fig. 1C). Conversely, the proportion of OTUs assigned to the third taxonomic level, or species level, was low, except for GLO (80%). In other groups, this depth of identification applied to 25% of the OTUs for both MON and ROT and 18% of the OTUs for TEX.

Fig. 1. Taxonomic composition of deep-sea Foraminifera assemblages based on microbarcode sequences. (A) Proportion of OTUs (solid bars) and reads (hatched bars) for deep-sea samples grouped according to their geographic origins; replicate samples are grouped above dashed lines. (B) Proportion of OTUs (solid bars) and reads (hatched bars) for the whole data set. (C) Proportion of OTUs identified at species (solid colored areas), family-clade (hatched colored areas), and order (solid black areas) level for each foraminiferal order. The numbers above or beside bars indicate the number of OTUs (A and B) or million reads (B). Colors correspond to foraminiferal orders: red, MON; green, ROT; dark blue, TEX; white, ROB; dark green, MIL; blue, SPI; pink, GLO.

Unresolved identification conflicts were responsible for up to 2.2% (SFA25) of OTUs assigned to the UNK category (0.6% on average). The OTUs that were downgraded to the first level because of conflicts at the second level and to the second level because of conflicts at the third level averaged 1.6% and 1.7%, respectively. Only 4 OTUs across all samples were identified at the order level because of conflicts at the third level.

As shown in the maximum likelihood (ML) tree based on sequences identified by microbarcodes (Fig. 2), the phylogenetic diversity of assigned OTUs is impressive. In addition to rotaliids and textulariids, almost all previously defined clades of monothalamous foraminifera, including well-supported environmental foraminifera (ENFOR) clades, are represented in our samples. The number of OTUs assigned to some of these clades (clade G, ENFOR 5) reached the number of OTUs found in the order-level clades of TEX or ROT. The OTUs that could not be properly assigned to any monothalamous clade were combined in a group of "unresolved lineages". The large size of this group (200 OTUs) indicates that the diversity of monothalamids is much higher than represented in the tree (Fig. 2).

Geographic Ranges.
To test the possible extent of cosmopolitanism in the deep sea, we analyzed reads and OTUs found in the five distinct geographic zones sampled. In total, 1,643 OTUs and 30,193,893 reads were examined (Table S3). The number of OTUs present in different zones decreased with the number of zones, from 181 occurring in two zones to 16 in five zones (Fig. 3B). The proportion of reads in cosmopolitan OTUs (CosmoOTUs) was relatively high (10% of all reads). Half of the CosmoOTUs were assigned to MON, whereas the remaining 8 OTUs could not be assigned (Table S4). Five cosmopolitan monothalamids, including the two most abundant ones, were assigned confidently to the third level, but none of them was characterized morphologically, except CosmoOTU 01, which was assigned to the Vanhoeffenella clade, a widely distributed deep-sea genus.
The relative abundance of the 10 most sequenced CosmoOTUs among geographic zones is shown in Fig. 3A. Six OTUs were abundantly (>50%) sequenced in one geographic zone (Table S4). Among them, 4 were most abundant in the Arctic Ocean, ranging from 60% to 84.3%. Two OTUs (CosmoOTU 08 and 11) were evenly distributed, with no less than 10% and no more than one-third of the reads in a zone.
Fig. 2. Phylogenetic diversity of assigned OTUs based on ML analysis of corresponding partial SSU rDNA sequences. The unresolved lineages leaf includes monothalamid sequences that could not be confidently placed in any of the illustrated clades. Horizontal bars represent the number of OTUs corresponding to each clade. Sequences identified up to the order level only (MON) were not included. Bootstraps of 95% or more are indicated. Colors correspond to foraminiferal orders as in Fig. 1.
The most abundantly sequenced cosmopolitan OTU (CosmoOTU 01), identified as Vanhoeffenella, was found in at least four samples per zone. Overall, Baffin Bay appeared as the most representative area for cosmopolitanism because reads belonging to the 16 cosmopolitan OTUs occurred in four samples of this zone on average. There was no obvious trend between geographic dispersal and taxonomic composition at the order level (Fig. 3C), except for the absence of multichambered orders (ROT and TEX) among five zones' taxa. Within each geographic zone, station replicates displayed heterogeneous taxonomic compositions. Indeed, the most sequenced cosmopolitan OTU of a given geographic area was not necessarily the most sequenced OTU in the samples, except for Vanhoeffenella in Baffin Bay. Moreover, the number of CosmoOTUs detected in each sample varied greatly, as evidenced by SFA33, where no CosmoOTU was found.
Discussion
DNA Microbarcodes Offer High Taxonomic Resolution. By using a short hypervariable region of the SSU rRNA gene as a foraminiferal barcode, we could reduce the size of the fragment necessary for species identification down to 36 nt, corresponding to the Illumina read length at the beginning of this study. Use of such microbarcodes was possible because foraminiferal rRNA genes evolve rapidly (17) and possess specific hypervariable regions that make it possible to distinguish species in the majority of taxonomic groups (16). Although the SSU rRNA genes evolve more slowly in other eukaryotic groups, for each of them it should be possible to find similar hypervariable regions in the internal transcribed spacer and large subunit rDNA, and therefore this approach can be applied more generally to assess eukaryotic diversity in group-specific studies.
Despite their short length, the taxonomic resolution of foraminiferal microbarcodes was relatively good.
Approximately 50% of sequences could be assigned to the order-level taxa, and almost all of them could be reliably placed in a family or clade (second level in our analyses). Conversely, the number of sequences assigned to the species level (third level in our analyses) was relatively low. This result can be explained by the limited size of our reference database, which currently contains 1,048 unique microbarcode sequences. Indeed, the high proportion (80%) of sequences assigned to the species level in GLO is related to the fact that many modern globigerinid species have already been sequenced (18). The remaining globigerinid sequences assigned only to the first or second level corresponded either to cryptic genetic types or tiny species that were not yet sequenced or to intragenomic polymorphisms that were not detected using the clonal approach. Because the taxonomic resolution of the analyzed hypervariable region is limited for some genera (16), our analyses most likely underestimate the true foraminiferal richness.
By using the Illumina system to sequence foraminiferal microbarcodes, we have reached a sequencing depth rarely approached before. More than 1 million reads were obtained for each sample, dramatically increasing the number of distinct foraminiferal tag sequences detected in deep-sea sediments. However, these sequences may have multiple origins. Some may correspond to rare taxa escaping detection by morphological or molecular cloning approaches. Others may be produced by the amplification of extracellular DNA, which is abundant in deep-sea sediments (19). Finally, some tags may result from the intragenomic diversity of rDNA copies reported in some foraminifera (20). To take such natural polymorphisms into account, we developed a very stringent filtering system removing the less abundant tags that could represent rare copies of rRNA genes as well as Illumina sequencing errors.
This filter and its associated thresholds have been successfully ground-truthed on our control sequencing (SFA6).
Cosmopolitan Taxa Are Widespread in the Deep Sea. Several deep-sea foraminiferal morphospecies are known to have wide geographic ranges (21), and some of them have been shown to be genetically identical across the global ocean (22, 23). Our study confirms this observation, showing that some OTUs were present in all sampling areas (Table S3). Many ubiquitous species could have been overlooked because of the small volume of analyzed samples. Nevertheless, the proportion of CosmoOTUs is probably not much higher, although this assumption should be tested by increased coverage of the sampling area.
The CosmoOTUs did not always occur in all samples of a given area, which can be explained by the patchiness of deep-sea species or the small sizes of our samples. The high abundance of reads belonging to CosmoOTUs could be explained by the widespread occurrence of generalist species with large populations being more likely to be sampled. Such species flourish wherever conditions are favorable but can be found virtually everywhere because of the global dispersal of their numerous specimens, propagules, or resting stages (24).
Fig. 3. Cosmopolitanism patterns in deep-sea foraminiferal OTUs. (A) Relative abundance of the 10 most sequenced CosmoOTUs across the five geographic zones. Proportion of reads is indicated by circle size. Each circle in a cluster corresponds to a CosmoOTU, in decreasing order of read abundance (from 1 to 10). (B) Proportions of reads and OTUs exclusively recovered in two, three, four, or five different geographic zones. (C) Proportions of order-level taxa found in two, three, four, or five different geographic zones. Total numbers of OTUs are indicated. Colors correspond to foraminiferal orders as in Fig. 1.
However, the correlation between the number of reads and the amount of DNA in the sample must be carefully addressed because of the variable number of ribosomal copies in eukaryotic cells (25). As far as we are aware, this possibility still awaits experimental testing under NGS conditions.
Early Lineages Dominate Deep-Sea Foraminiferal Assemblage. Sequences assigned to early foraminiferal lineages, grouped here in a paraphyletic assemblage of monothalamids, by far outnumbered those assigned to multichambered rotaliids and textulariids, confirming previous environmental DNA surveys of foraminiferal diversity (6, 26). Moreover, we predict that the proportion of monothalamids is even greater because many UNK probably also belong to this group. This prediction is based on the fact that the rotaliids and textulariids have an easily recognizable, specific signature at the beginning of the 37f region (16), and all sequences having this signature were properly assigned to these groups. Some UNK could belong to the LAG, an order of calcareous foraminiferans known to be present in the deep sea but whose genetic diversity has not yet been probed. However, the variability of UNK was so large that it is highly improbable that all of them represent a single group.
If our prediction is correct, >80% of OTUs found in deep-sea sediments represent the early foraminiferal lineages. However, the identification of these lineages is not straightforward. Some of them could be assigned to previously established monothalamous clades (27), which comprise soft-walled allogromiids and agglutinated astrorhizids. Others were assigned to the environmental clades (ENFOR 1-8; Fig. 2). These clades are composed almost exclusively of sequences obtained in previous environmental DNA surveys of foraminiferal diversity (4, 9).
They also include the so-called hermit or squatter sequences that belong to species living inside or outside the empty tests of other foraminifera (28). These undetermined sequences are particularly abundant in DNA extractions of komokiaceans, xenophyophores, and some large astrorhizids, the tests of which create an ideal habitat for a rich microbial community (29).
We can only speculate about the possible morphology and ecology of the members of these environmental clades. Probably, they are tiny amoeboid cells that remained undetected because of their small size, passing through the 63-μm sieve routinely used to study deep-sea foraminifera. They may be naked and dwelling in the interstitial water, or living as parasites inside or outside other foraminiferans. Some of them could be similar to the new cercozoan species isolated from environmental samples (30). Characterization of the environmental clades is needed, but we can already assume that our perception of foraminiferal diversity and our understanding of their role in the functioning of deep-sea ecosystems will profoundly change as a result of this study.
Ultra-Deep Sequencing Offers a Powerful Tool for Exploring Deep-Sea Richness. There is an increasing body of information about the richness of macro- and megafaunal species living in and on deep-sea sediments (31, 32), but much less is known about the richness of benthic meiofaunal-sized metazoans (particularly nematodes and harpacticoid copepods) and protists. Our study shows that NGS provides an extremely powerful tool for investigating deep-sea meiofauna. Among the NGS technologies, Illumina ultra-deep sequencing seems particularly well adapted to group-specific studies. Like foraminifera, other meiofaunal taxa most likely also possess a short DNA region that can be used to distinguish between closely related species.
The barcodes do not need to be as short as 36 bases, given that the length of Illumina reads is >100 bp in the latest version. It should be easy to adapt existing barcodes, such as the V9 region of the SSU rRNA gene used for assessment of eukaryotic diversity (12).
The NGS technologies have the potential to overcome the intrinsic difficulties of deep-sea research. They offer the capacity to process a higher number of samples, balancing the critical problem of chronic undersampling of deep-sea habitats. Moreover, they generate the minimum amount of sequences necessary for group-specific surveys, recovering a part of the diversity considered to be unknowable (33). We believe that further optimization of the experimental design and enlargement of the reference database will lead to broader applications of NGS for biomonitoring and exploring the evolution of the deep-sea environment.
Materials and Methods
Sampling. Thirty-one surface sediment samples (SFA2-5 and SFA7-33), for which depth, coordinates, and volume are presented in Table S5, were collected either with a box corer or a multicorer during RV Polarstern cruises ARK XXII-2 (Arctic Ocean, 2007) and ANTXXIV-2 (Southern Ocean, 2007-2008); RV Merian cruise MS MERIAN 09/02 (Baffin Bay, 2008); RV Tansei Maru cruise KT07-14 (Pacific Ocean, 2007); and RV Galathea 3 (Caribbean Sea, 2007). For each geographic region, at least three samples were replicates coming from the same station and the same gear. One to 5 mL of sediment was taken from the upper layer (0- to 2-cm depth) of the multicore/boxcore samples and frozen at −20 °C immediately after collection.
DNA Extraction, Amplification, and Massive Sequencing. Metagenomic DNA extracts were obtained by using either the MO Bio PowerSoil kit for small deep-sea sediment samples (0.5 mL) or the MO Bio PowerMax Soil kit for bigger volumes (5 mL), both according to the protocol, except for cell lysis, which was extended to 40 min.
Extraction products were then stored at −20 °C. Additionally, DNA from cultured R. filosa was extracted and processed like the other environmental samples (SFA6). In the first step, a fragment of SSU rDNA (∼400 bp) was amplified by PCR (15 cycles, 50 °C annealing temperature) with a set of foraminiferal-specific primers (30). In the second step, PCR products were reamplified for an additional 10 cycles (in a remote laboratory) and attached to the surface of the Solexa flow cell channels by adaptors. After solid-phase bridge amplification, the DNA colonies were sequenced on a Genome Analyzer GAII instrument (Illumina) for 36 cycles by using a Chrysalis 36 Cycles Version 2 kit.
Reads Filtering. Base calling was performed by using GAPipeline (Version 1.0; Illumina). Low-quality reads were removed, based both on single-base score evaluations (reads with one base <10, fastq-solexa scoring scheme) and on scores averaged over the entire length (quality value <20). Reads with >30 identical bases, as well as reads containing an undetermined base (N), were discarded. After strict dereplication, singletons were excluded. We developed a two-step filtering method to remove sequencing errors and to temper intragenomic polymorphisms. The first step was based on the assumptions that (i) Illumina quality scores decrease toward the end of the sequencing-by-synthesis reaction and (ii) intraspecific variation occurs near the 3′ end rather than the 5′ end of the sequenced fragment (16). After trimming of the six 3′-terminal bases and a strict dereplication on the 30 remaining bases, sequence tags with an unchanged number of occurrences, as well as tags with a number of occurrences <0.01% of the total number of reads in the sample, were removed. The second step consisted of a one-pass clustering designed to reduce the noise due to (i) random errors and (ii) interoperon variations.
Less abundant sequences were clustered with the more abundant sequence from which they derived; based on the analysis of both reference sequences and the deep sequencing of a single specimen (SFA6), we allowed up to four differences. Edit distances were calculated by using a global alignment procedure (Needleman-Wunsch algorithm; 3′-terminal gaps not counted as differences). Finally, clusters with a number of occurrences <0.01% of all reads kept after the first step were discarded. The most abundant sequence of a cluster was retained for downstream analyses and assigned the total number of occurrences found in the cluster.
Reads Identification. A curated database comprising 1,048 different 37f hypervariable regions of the foraminiferal SSU rDNA was built. Each sequence was extracted from the 3′ position off the sequencing primer, to a length between the natural minimum length of the region (19 nt) and up to 30 nt, which corresponded to the length of the filtered reads. Each entry was annotated to three taxonomic levels. The first level corresponded to the order category of the morphology-based classification of Foraminifera, modified by combining Allogromiida and Astrorhizida into a group of monothalamids according to ref. 34. The second level corresponded to a family or clade defined by previous phylogenetic studies (9, 27). Finally, the third level corresponded either to the genus level or to the species level for well-described voucher specimens. The taxonomic resolution of the barcoding region was assessed, and identical sequences corresponding to different isolates of the same taxa were kept to analyze taxonomy conflicts (Fig. S2). For each filtered OTU, the best global alignments with a reference sequence were searched by using a penalty of 1 for mismatch, gap open, and gap extend when calculating the edit distance.
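The distance described above (unit penalties for mismatch, gap open, and gap extend; 3′-terminal gaps not counted as differences) can be sketched as a standard Needleman-Wunsch dynamic program in which the alignment may end anywhere on the last row or column. This is an illustrative sketch only, not the authors' code; the function name and the choice to make trailing gaps free on either sequence are our assumptions about the description.

```python
def edit_distance_free_3prime(a: str, b: str) -> int:
    """Global-alignment edit distance with unit costs (mismatch, gap open,
    gap extend all cost 1), where gaps at the 3' (right) end are free.
    Illustrative sketch of the distance described in the text."""
    n, m = len(a), len(b)
    # dp[i][j] = cost of aligning a[:i] with b[:j]; 5'-leading gaps cost 1 each.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # match / mismatch
                dp[i - 1][j] + 1,                           # gap in b
                dp[i][j - 1] + 1,                           # gap in a
            )
    # 3'-terminal gaps are free: take the best cost anywhere on the last
    # row or last column rather than only at dp[n][m].
    return min(min(dp[n][j] for j in range(m + 1)),
               min(dp[i][m] for i in range(n + 1)))
```

Under this scheme two tags that differ only by a 3′ extension have distance 0, matching the stated rule, while a substitution or 5′ gap still counts as one difference; tags within four differences of a more abundant tag would then be merged into its cluster.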
We assigned sequences to the consensus of the third taxonomic level by successively using alignments found at 0, 1, 2, and 3 differences over the 30 nt. Unassigned sequences were then assigned similarly to the second level, but only with up to 2 differences over the first 20 nt. The remaining unassigned sequences were assigned to the first taxonomic level by progressively searching exact matches against the first 19-12 nt of the reference sequences. Sequences involved in conflicts at a given level were assigned to the level above, or to the UNK category for order-level conflicts.
OTU Delineation. Although OTUs were already delineated after the clustering embedded in the filtering procedure, these OTUs did not allow for a global estimation of the diversity across samples. Thus, a nonsupervised clustering of all filtered OTUs as well as reference sequences was conducted. A distance matrix was built by using pairwise distances calculated as described above, followed by average linkage at percentages of 80, 85, 90, 95, 96, 97, 98, and 99%. Analyses of clusters containing reference sequences involved in conflicts led us to choose the 96% average linkage. Within each cluster, the previous assignation of each sequence was used to assign the cluster to the deepest consensus taxonomic level when no reference sequence was included in the cluster. When a reference sequence was present, no conflict with previous assignments was found.
Rarefaction analyses were computed with a Python script by using the random module to randomize the reads. Curves were drawn for all samples separately or, when pooling all samples, after combining by average linkage.
Phylogenetic Reconstruction. A set of partial (853-1,317 nt) SSU rDNA sequences commonly used in foraminiferal phylogenies (9) was selected based on microbarcode identification.
In total, 82 sequences were analyzed, including 8 rotaliids, 7 textulariids, and up to 5 sequences for each of the 20 well-defined clades of monothalamids. The ClustalW alignment (35) was refined manually by using SeaView 4.0 (36). The ML tree was built by using RAxML (37), with the GTR+I+Γ substitution model and 100 bootstraps.
ACKNOWLEDGMENTS. We thank the captain, officers, crew, and chief scientists of R/V Polarstern (ARK XXII/2 and ANTXXIV-2), R/V Merian (09/02), R/V Tansei Maru (KT07-14), and HDMS Vædderen (Galathea 3-Winmargin); Angelika Brandt, Michal Kucera, and Marit-Solveig Seidenkrantz for providing samples; Alexandra Weber, José Fahrni, and Jackie Guiard for technical assistance; and Davor Trumbić for help with the analyses. This work was supported by Swiss National Science Foundation Grant 31003A-125372 and by a G. and L. Claraz donation. R. Christen acknowledges support of the Aquaparadox project financed by the Agence Nationale de la Recherche programme "Biodiversité" and the Pôle Mer PACA.
1. Todo Y, Kitazato H, Hashimoto J, Gooday AJ (2005) Simple foraminifera flourish at the ocean's deepest point. Science 307:689.
2. Gooday AJ (2002) Organic-walled allogromiids: aspects of their occurrence, diversity and ecology in marine habitats. J Foraminiferal Res 32:384-399.
3. Pawlowski J, et al. (2003) The evolution of early Foraminifera. Proc Natl Acad Sci USA 100:11494-11498.
4. Pawlowski J, Fahrni JF, Brykczynska U, Habura A, Bowser SS (2002) Molecular data reveal high taxonomic diversity of allogromiid Foraminifera in Explorers Cove (McMurdo Sound, Antarctica). Polar Biol 25:96-105.
5. Gooday AJ, Holzmann M, Guiard J, Cornelius N, Pawlowski J (2004) A new monothalamous foraminiferan from 1000 to 6300 m water depth in the Weddell Sea: Morphological and molecular characterisation. Deep Sea Res Part II Top Stud Oceanogr 51:1603-1616.
6.
Habura A, Pawlowski J, Hanes SD, Bowser SS (2004) Unexpected foraminiferal diversity revealed by small-subunit rDNA analysis of Antarctic sediment. J Eukaryot Microbiol 51:173-179.
7. Holzmann M, Habura A, Giles H, Bowser SS, Pawlowski J (2003) Freshwater foraminiferans revealed by analysis of environmental DNA samples. J Eukaryot Microbiol 50:135-139.
8. Lejzerowicz F, Pawlowski J, Fraissinet-Tachet L, Marmeisse R (2010) Molecular evidence for widespread occurrence of Foraminifera in soils. Environ Microbiol 12:2518-2526.
9. Pawlowski J, Fontaine D, da Silva AA, Guiard J (2010) Novel lineages of Southern Ocean deep-sea foraminifera revealed by environmental DNA sequencing. Deep-Sea Res II: in press.
10. Sogin ML, et al. (2006) Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci USA 103:12115-12120.
11. Stoeck T, et al. (2009) Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities. BMC Biol 7:72-91.
12. Amaral-Zettler LA, McCliment EA, Ducklow HW, Huse SM (2009) A method for studying protistan diversity using massively parallel sequencing of V9 hypervariable regions of small-subunit ribosomal RNA genes. PLoS ONE 4:e6372.
13. Edgcomb V, et al. (March 10, 2011) Protistan microbial observatory in the Cariaco Basin, Caribbean. I. Pyrosequencing vs Sanger insights into species richness. ISME J, 10.1038/ismej.2011.6.
14. Lazarevic V, et al. (2009) Metagenomic study of the oral microbiota by Illumina high-throughput sequencing. J Microbiol Methods 79:266-271.
15. Caporaso JG, et al. (June 3, 2010) Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci USA, 10.1073/pnas.1000080107.
16. Pawlowski J, Lecroq B (2010) Short rDNA barcodes for species identification in foraminifera. J Eukaryot Microbiol 57:197-205.
17. Pawlowski J, et al.
(1997) Extreme differences in rates of molecular evolution of foraminifera revealed by comparison of ribosomal DNA sequences and the fossil record. Mol Biol Evol 14:498-505.
18. Darling KF, Kucera M, Wade CM (2007) Global molecular phylogeography reveals persistent Arctic circumpolar isolation in a marine planktonic protist. Proc Natl Acad Sci USA 104:5002-5007.
19. Dell'Anno A, Danovaro R (2005) Extracellular DNA plays a key role in deep-sea ecosystem functioning. Science 309:2179.
20. Pawlowski J (2000) Introduction to the molecular systematics of foraminifera. Micropaleontology 46:1-12.
21. Douglas RG, Woodruff F (1981) The Sea, The Oceanic Lithosphere, ed Emiliani C (Wiley, NY), pp 1233-1327.
22. Pawlowski J, et al. (2007) Bipolar gene flow in deep-sea benthic foraminifera. Mol Ecol 16:4089-4096.
23. Lecroq B, Gooday AJ, Pawlowski J (2009) Global genetic homogeneity in deep-sea foraminiferan Epistominella exigua (Rotaliida: Pseudoparrellidae). Zootaxa 2096:23-32.
24. Alve E, Goldstein ST (2010) Dispersal, survival and delayed growth of benthic foraminiferal propagules. J Sea Res 63:36-51.
25. Medinger R, et al. (2010) Diversity in a hidden world: Potential and limitation of next-generation sequencing for surveys of molecular diversity of eukaryotic microorganisms. Mol Ecol 19(Suppl 1):32-40.
26. Habura A, Goldstein ST, Broderick S, Bowser SS (2008) A bush, not a tree: The extraordinary diversity of cold-water basal foraminiferans extends to warm-water environments. Limnol Oceanogr 53:1339-1351.
27. Pawlowski J, et al. (2002) Phylogeny of allogromiid Foraminifera inferred from SSU rRNA gene sequences. J Foraminiferal Res 32:334-343.
28. Grimm GW, et al. (2007) Diversity of rDNA in Chilostomella: Molecular differentiation patterns and putative hermit types. Mar Micropaleontol 62:75-90.
29.
Lecroq B, Gooday AJ, Cedhagen T, Sabbatini A, Pawlowski J (2010) Molecular analyses reveal high levels of eukaryotic richness associated with enigmatic deep-sea protists (Komokiacea). Marine Biodiversity 39:45-55.
30. Bass D, et al. (2009) Phylogeny of novel naked Filose and Reticulose Cercozoa: Granofilosea cl. n. and Proteomyxidea revised. Protist 160:75-109.
31. Brandt A, et al. (2007) First insights into the biodiversity and biogeography of the Southern Ocean deep sea. Nature 447:307-311.
32. Ebbe B, et al. (2010) Life in the World's Oceans: Diversity, Distribution and Abundance, ed McIntyre AD (Blackwell Publishing, Oxford), pp 139-160.
33. Danovaro R, et al. (2010) Deep-sea biodiversity in the Mediterranean Sea: The known, the unknown, and the unknowable. PLoS ONE 5:e11832.
34. Pawlowski J (2009) Encyclopedia of Microbiology, ed Schaechter M (Elsevier, Oxford), pp 646-662.
35. Larkin MA, et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23:2947-2948.
36. Gouy M, Guindon S, Gascuel O (2010) SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 27:221-224.
37. Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688-2690."
-} -{ - "pm_id": "https://pubmed.ncbi.nlm.nih.gov/10797032", - "pdf_text": "Papers
Establishing a standard definition for child overweight and obesity worldwide: international survey
Tim J Cole, Mary C Bellizzi, Katherine M Flegal, William H Dietz
Abstract
Objective To develop an internationally acceptable definition of child overweight and obesity, specifying the measurement, the reference population, and the age and sex specific cut off points.
Design International survey of six large nationally representative cross sectional growth studies.
Setting Brazil, Great Britain, Hong Kong, the Netherlands, Singapore, and the United States.
Subjects 97 876 males and 94 851 females from birth to 25 years of age.
Main outcome measure Body mass index (weight/height²).
Results For each of the surveys, centile curves were drawn that at age 18 years passed through the widely used cut off points of 25 and 30 kg/m² for adult overweight and obesity. The resulting curves were averaged to provide age and sex specific cut off points from 2-18 years.
Conclusions The proposed cut off points, which are less arbitrary and more internationally based than current alternatives, should help to provide internationally comparable prevalence rates of overweight and obesity in children.
Introduction
The prevalence of child obesity is increasing rapidly worldwide.¹ It is associated with several risk factors for later heart disease and other chronic diseases including hyperlipidaemia, hyperinsulinaemia, hypertension, and early atherosclerosis.²⁻⁴ These risk factors may operate through the association between child and adult obesity, but they may also act independently.⁵
Because of their public health importance, the trends in child obesity should be closely monitored. Trends are, however, difficult to quantify or to compare internationally, as a wide variety of definitions of child obesity are in use, and no commonly accepted standard has yet emerged.
The ideal definition, based on percentage body fat, is impracticable for epidemiological use. Although less sensitive than skinfold thicknesses,⁶ the body mass index (weight/height²) is widely used in adult populations, and a cut off point of 30 kg/m² is recognised internationally as a definition of adult obesity.⁷
Body mass index in childhood changes substantially with age.⁸ ⁹ At birth the median is as low as 13 kg/m², increases to 17 kg/m² at age 1, decreases to 15.5 kg/m² at age 6, then increases to 21 kg/m² at age 20. Clearly a cut off point related to age is needed to define child obesity, based on the same principle at different ages, for example, using reference centiles.¹⁰ In the United States, the 85th and 95th centiles of body mass index for age and sex based on nationally representative survey data have been recommended as cut off points to identify overweight and obesity.¹¹
For wider international use this definition raises two questions: why base it on data from the United States, and why use the 85th or 95th centile? Other countries are unlikely to base a cut off point solely on American data, and the 85th or 95th centile is intrinsically no more valid than the 90th, 91st, 97th, or 98th centile. Regardless of centile or reference population, the cut off point can still be criticised as arbitrary.
A reference population could be obtained by pooling data from several sources, if sufficiently homogeneous.
A centile cut off point could in theory be identified as the point on the distribution of body mass index where the health risk of obesity starts to rise steeply. Unfortunately such a point cannot be identified with any precision: children have less disease related to obesity than adults, and the association between child obesity and adult health risk may be mediated through adult obesity, which is associated both with child obesity and adult disease.
The adult cut off points in widest use, a body mass index of 25 kg/m² for overweight and 30 kg/m² for obesity, are related to health risk¹ but are also convenient round numbers. A workshop organised by the International Obesity Task Force proposed that these adult cut off points be linked to body mass index centiles for children to provide child cut off points.¹² ¹³ We describe the development of age and sex specific cut off points for body mass index for overweight and obesity in children, using dataset specific centiles linked to adult cut off points.
Subjects and methods
Subjects
We obtained data on body mass index for children from six large nationally representative cross sectional surveys on growth from Brazil, Great Britain, Hong Kong, the Netherlands, Singapore, and the United States (table 1).
Each survey had over 10 000 subjects, with ages ranging from 6-18 years, and quality control measures to minimise measurement error. Four of the datasets were based on single samples whereas the British and American data consisted of pooled samples collected over a period of time. We omitted the most recent survey data from the United States (1988-94) because we preferred to use data predating the recent increase in prevalence of obesity.¹⁹ In practice this decision made virtually no difference to the final cut off points.
Centile curves
Centile curves for body mass index were constructed for each dataset by sex using the LMS method,¹⁵ which summarises the data in terms of three smooth age specific curves called L (lambda), M (mu), and S (sigma). The M and S curves correspond to the median and coefficient of variation of body mass index at each age, whereas the L curve allows for the substantial age dependent skewness in the distribution of body mass index. The values for L, M, and S can be tabulated for a series of ages.
Department of Epidemiology and Public Health, Institute of Child Health, London WC1N 1EH: Tim J Cole, professor of medical statistics. International Obesity Task Force Secretariat, London NW1 2NS: Mary C Bellizzi, health policy officer. National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville MD 20782, USA: Katherine M Flegal, senior research scientist. Division of Nutrition and Physical Activity, Centers for Disease Control and Prevention, Atlanta GA 30341-3724, USA: William H Dietz, director. Correspondence to: T J Cole, tim.cole@ich.ucl.ac.uk
BMJ VOLUME 320, 6 MAY 2000, bmj.com. BMJ 2000;320:1240. First published as 10.1136/bmj.320.7244.1240 on 6 May 2000.
The Brazilian and US surveys (table 1) used a weighted sampling design, and their data were analysed accordingly.

The assumption underlying the LMS method is that after Box-Cox power transformation the data at each age are normally distributed. The points on each centile curve are defined in terms of the formula15 (equation 1):

BMI = M(1 + LSz)^(1/L)   (1)

where L, M, and S are the values of the fitted curves at each age, and z indicates the z score for the required centile; for example, z = 1.33 for the 91st centile. Figure 1 shows centiles for body mass index by sex based on the British reference,9 with seven centiles spaced two thirds of a z score apart, that is, z = −2, −1.33, −0.67, 0, +0.67, +1.33, and +2.

Figure 1 also shows body mass index values of 25 and 30 kg/m2 at age 18; 25 kg/m2 is just below the 91st centile in both sexes, whereas 30 kg/m2 is above the 98th centile. The body mass index (BMI) values can be converted to exact z scores from the L, M, and S values at age 18, with the formula (equation 2):

z = [(BMI/M)^L − 1] / (LS)   (2)

The body mass index of 25 kg/m2 at age 18 is z score +1.19 in females, corresponding to the 88th centile, and +1.30 in males, on the 90th centile. Therefore the prevalence of overweight at age 18 is 10-12%. A body mass index of 30 kg/m2 at age 18 is on the 99th centile in both sexes, an obesity prevalence of about 1%.

Each z score substituted into equation 1 provides the formula for an extra centile curve passing through the specified point (dotted line in fig 1).
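Equations 1 and 2 are mutual inverses. A minimal sketch of the two conversions in Python; the L, M, S values in the example are illustrative placeholders, not the fitted British reference values:

```python
import math

def lms_bmi(L, M, S, z):
    """Equation 1: BMI value at z score z, from fitted L, M, S at one age."""
    if L == 0:
        return M * math.exp(S * z)  # limiting form of the Box-Cox transform
    return M * (1 + L * S * z) ** (1 / L)

def lms_z(bmi, L, M, S):
    """Equation 2: z score of a given BMI, from fitted L, M, S at one age."""
    if L == 0:
        return math.log(bmi / M) / S
    return ((bmi / M) ** L - 1) / (L * S)

# Illustrative values only: round-trip a z score through both equations.
L, M, S = -1.4, 21.7, 0.11
bmi_91st = lms_bmi(L, M, S, 1.33)  # 91st centile lies 1.33 z scores above the median
print(round(lms_z(bmi_91st, L, M, S), 2))
```

Applying equation 2 to the value produced by equation 1 recovers the original z score, which is how the adult cut off points are converted to dataset specific centiles below.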
Table 1 Six nationally representative datasets of body mass indices in childhood

Country        Year     Description                                     Males: age range  Males: n  Females: age range  Females: n  Reference
Brazil         1989     Second national anthropometric survey           2-25              15 947    2-25                15 859      14
Great Britain  1978-93  Data pooled from five national growth surveys   0-23              16 491    0-23                15 731      15
Hong Kong      1993     National growth survey                          0-18              11 797    0-18                12 168      16
Netherlands    1980     Third nationwide growth survey                  0-20              21 521    0-20                20 245      17
Singapore      1993     School health service survey                    6-19              17 356    6-20                16 616      18
United States  1963-80  Data pooled from four national surveys          2-20              14 764    2-20                14 232      19

Fig 1 Centiles for body mass index for British males and females. Centile curves are spaced two thirds of a z score apart. Also shown are body mass index values of 25 and 30 kg/m2 at age 18, with extra centile curves drawn through them

Fig 2 Median body mass index by age and sex in six nationally representative datasets

Each centile curve defines cut off points through childhood that correspond in prevalence of overweight or obesity to that of the adult cut off point: the curve joins up points where the prevalence matches that seen at age 18.

This process is repeated for all six datasets, by sex. Superimposing their curves leads to a cluster of centile curves that all pass through the adult cut off point yet
represent a wide range of overweight and obesity. The hypothesis is that the relation between cut off point and prevalence at different ages gives the same curve shape irrespective of country or obesity. If sufficiently similar, the curves can be averaged to provide a single smooth curve passing through the adult cut off point. The curve is representative of all the datasets involved but is unrelated to their obesity: the cut off point is effectively independent of the spectrum of obesity in the reference data.

Results
Figure 2 shows the median curves for body mass index in the six datasets by sex from birth to 20 years. A wide range of values spans several units of body mass index in both sexes. These show the different extents of overweight across datasets, reflecting national differences in fatness. The median curves are all about the same shape, although the curve for Singaporean males is more curved, being lowest at ages 6 and 19 and highest at age 11.

Averaging the median curves would be a simple way to summarise the age trend in body mass index through childhood. But the resulting position of the curve at each age would depend on the overweight prevalence of the countries in the reference set, and so would be comparatively arbitrary. In any case the median is not an extreme centile and is ineffective as a cut off point. So averaging the median curves is not the answer.

Instead the centile curves are linked to adult cut off points of 25 and 30 kg/m2, positioned at age 18 to maximise the available data. These values are expressed as centiles for each dataset, and the corresponding centile curves are drawn. Figure 1 shows the centile curves for overweight and obesity for the British reference.

Figure 3 presents the centile curves for overweight for the six datasets by sex, passing through the adult cut off point of 25 kg/m2 at age 18.
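The construction just described can be sketched as follows: express the adult cut off point as a z score at age 18 within one dataset (equation 2), then carry that z score down the age range (equation 1). The L, M, S values by age here are illustrative placeholders, not fitted values from any of the six surveys:

```python
import math

def lms_bmi(L, M, S, z):
    # Equation 1: BMI at z score z from fitted L, M, S
    return M * (1 + L * S * z) ** (1 / L) if L else M * math.exp(S * z)

def lms_z(bmi, L, M, S):
    # Equation 2: z score of a BMI from fitted L, M, S
    return ((bmi / M) ** L - 1) / (L * S) if L else math.log(bmi / M) / S

# Illustrative L, M, S by age for one dataset and sex (placeholders).
lms_by_age = {10: (-1.6, 16.9, 0.12), 14: (-1.5, 19.3, 0.12), 18: (-1.4, 21.7, 0.11)}

# Anchor the adult cut off point at age 18, then draw the centile curve through it.
z_adult = lms_z(25.0, *lms_by_age[18])
cutoff_curve = {age: round(lms_bmi(*lms, z_adult), 2)
                for age, lms in lms_by_age.items()}
print(cutoff_curve)  # the curve passes through 25.0 exactly at age 18
```

Repeating this for each dataset and sex, then averaging the resulting curves, gives the international cut off points.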
They are much closer together than the median curves (fig 2), particularly above age 10, because the national differences in overweight prevalence have been largely adjusted out. The divergence of the Singaporean curve is more pronounced than in figure 2.

Figure 4 gives the corresponding centile curves for obesity in each dataset, all passing through a body mass index of 30 kg/m2 at age 18. There is less agreement than for the centiles for overweight, and again Singapore stands out.

Table 2 gives the centiles for overweight corresponding to a body mass index of 25 kg/m2 at age 18 for each dataset by sex. For example, they approximate the 95th centile for Dutch males and the 90th centile for British males, that is, prevalences of overweight of 5-10%. The centiles for obesity corresponding to a body mass index of 30 kg/m2 in table 3 are mainly above the 97th centile, less than 3% prevalence, and they show more variability.

The curves in figures 3 and 4 are reasonably consistent across countries between ages 8 and 18, although those for Singapore are higher between ages 10 and 15. This is due partly to the increased median (fig 2) and partly to greater variability.
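The "% above cut off point" figures in tables 2 and 3 are normal tail areas above the quoted z scores. A minimal sketch of the conversion, using the published z of 1.30 for British males (small differences from the tabulated 9.6% reflect rounding of z):

```python
from math import erf, sqrt

def pct_above(z):
    """Percentage of a normal distribution lying above z (the upper tail area)."""
    return 100 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

print(round(pct_above(1.30), 1))  # close to the 9.6% quoted for Great Britain males
```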
Fig 3 Centiles for overweight by sex for each dataset, passing through body mass index of 25 kg/m2 at age 18

Fig 4 Centiles for obesity by sex for each dataset, passing through body mass index of 30 kg/m2 at age 18

Table 2 Centiles and z scores for overweight corresponding to body mass index of 25 kg/m2 at age 18 years in six datasets, derived from fitted LMS curves

Country        Males: centile  z score  % above cut off  Females: centile  z score  % above cut off
Brazil         95.3            1.68     4.7              84.8              1.03     15.2
Great Britain  90.4            1.30     9.6              88.3              1.19     11.7
Hong Kong      88.3            1.19     11.7             90.2              1.29     9.8
Netherlands    94.5            1.60     5.5              93.5              1.52     6.5
Singapore      89.5            1.25     10.5             93.0              1.48     7.0
United States  81.9            0.91     18.1             83.5              0.97     16.5

Table 3 Centiles and z scores for obesity corresponding to body mass index of 30 kg/m2 at age 18 years in six datasets, derived from fitted LMS curves

Country        Males: centile  z score  % above cut off  Females: centile  z score  % above cut off
Brazil         99.9            3.1      0.1              98.0              2.1      2.0
Great Britain  99.1            2.37     0.9              98.8              2.25     1.2
Hong Kong      96.9            1.86     3.1              98.2              2.10     1.8
Netherlands    99.7            2.71     0.3              99.7              2.73     0.3
Singapore      98.3            2.12     1.7              99.0              2.33     1.0
United States  96.7            1.84     3.3              96.0              1.76     4.0

The LMS method estimates the coefficient of variation (or S curve) of body mass index during the centile fitting process, and figure 5 compares the S curves for the six datasets.
Between ages 6 and 15 the coefficient of variation in Singapore is greater than for the other countries. The range of values for the coefficient of variation in puberty is greater for males than females, and for Brazil, Singapore, and the United States the curves for both sexes show a peak in puberty.

The amount of skewness, as measured by the sample L curves, is similar across countries. The Box-Cox powers are consistently between −1 and −2, indicating extreme skewness (not shown).

Table 4 shows international cut off points for body mass index for overweight and obesity from 2-18 years, obtained by averaging the centile curves in figures 3 and 4. From 2-6 years the cut off points do not include Singapore because its data start at age 6 years. Figure 6 shows the cut off points, with the values at 5.5 and 6 years adjusted slightly to ensure a smooth join between the two sets of curves.

Discussion
Our method addresses the two main problems of defining internationally acceptable cut off points for body mass index for overweight and obesity in children. The reference population was obtained by averaging across a heterogeneous mix of surveys from different countries, with widely differing prevalence rates for obesity, whereas the appropriate cut off point was defined in body mass index units in young adulthood and extrapolated to childhood, conserving the corresponding centile in each dataset. This principle, proposed at a meeting in 1997,13 was discussed in a recent editorial.12

Although less arbitrary and potentially more internationally acceptable than other cut off points, this approach still provides a statistical definition, with all the advantages and disadvantages that that implies.20 Our terminology corresponds to adult cut off points, but the health consequences for children above the cut off points may differ from those for adults.
Children who are overweight but not obese should be evaluated for other factors as well.11 Nonetheless, the cut off points based on a heterogeneous worldwide population can be applied widely to determine whether the children and adolescents they identify are at increased risk of morbidity related to obesity.

Agreement of the centile curves
The major uncertainty with our approach, and the test of its validity, is the extent to which the centile curves for the datasets are of the same shape. Figures 3 and 4 show that although the agreement is reasonable it is not perfect. If it were perfect, that is, all the curves were superimposed, the reference cut off points applied to a given dataset would give the same prevalence for obesity at all ages, which could be predicted from the prevalence at age 18. So the different shapes in figures 3 and 4 show to what extent the age specific prevalence deviates from the age 18 prevalence within datasets.

We did consider six other datasets for our analysis (Canada, France, Japan, Russia, Sweden, and Venezuela) but we excluded them because they were either too small or nationally unrepresentative. Their centile curves for overweight in figure 7 are similar to those in figures 3 and 4. (Data for Japan and girls in Sweden and Venezuela are omitted as they do not extend to age 18.) Singapore and Canada are clear outliers during puberty, whereas Russia stands out earlier in childhood.
Fig 5 Plots of coefficient of variation of body mass index by age and sex for each dataset

Table 4 International cut off points for body mass index for overweight and obesity by sex between 2 and 18 years, defined to pass through body mass index of 25 and 30 kg/m2 at age 18, obtained by averaging data from Brazil, Great Britain, Hong Kong, Netherlands, Singapore, and United States

Age (years)  Overweight (25 kg/m2): Males  Females  Obesity (30 kg/m2): Males  Females
2            18.41   18.02   20.09   19.81
2.5          18.13   17.76   19.80   19.55
3            17.89   17.56   19.57   19.36
3.5          17.69   17.40   19.39   19.23
4            17.55   17.28   19.29   19.15
4.5          17.47   17.19   19.26   19.12
5            17.42   17.15   19.30   19.17
5.5          17.45   17.20   19.47   19.34
6            17.55   17.34   19.78   19.65
6.5          17.71   17.53   20.23   20.08
7            17.92   17.75   20.63   20.51
7.5          18.16   18.03   21.09   21.01
8            18.44   18.35   21.60   21.57
8.5          18.76   18.69   22.17   22.18
9            19.10   19.07   22.77   22.81
9.5          19.46   19.45   23.39   23.46
10           19.84   19.86   24.00   24.11
10.5         20.20   20.29   24.57   24.77
11           20.55   20.74   25.10   25.42
11.5         20.89   21.20   25.58   26.05
12           21.22   21.68   26.02   26.67
12.5         21.56   22.14   26.43   27.24
13           21.91   22.58   26.84   27.76
13.5         22.27   22.98   27.25   28.20
14           22.62   23.34   27.63   28.57
14.5         22.96   23.66   27.98   28.87
15           23.29   23.94   28.30   29.11
15.5         23.60   24.17   28.60   29.29
16           23.90   24.37   28.88   29.43
16.5         24.19   24.54   29.14   29.56
17           24.46   24.70   29.41   29.69
17.5         24.73   24.85   29.70   29.84
18           25      25      30      30

The median curves for Japan and Hong
Kong are similar in shape (not shown), suggesting that Singapore is atypical of Asia.

Nothing obvious explains Singapore's unusual pattern of overweight in puberty. Omitting it from the averaged country curves would lower the cut off points for both sexes by up to 0.4 body mass index units or 0.14 z score units at age 11-12. This compares to a range of three units between the lowest and highest curves at this age. Therefore, even though Singapore looks different from the other countries, its impact on the cut off points is only modest. Because there is no a priori reason to exclude Singapore, and because so little is known about growth patterns across countries, we have chosen to retain it in the reference population.

Extending the dataset
We recognise that the reference population made up of these countries is less than ideal. It probably reflects Western populations adequately but lacks representation from other parts of the world. The Hong Kong sample may, however, be fairly representative of the Chinese, and the Brazilian and US datasets include many subjects of African descent. Although additional datasets from Africa and Asia would be helpful, our stringent inclusion criteria of a large sample, national representativeness, minimum age range 6-18 years, and data quality control mean that further datasets are unlikely to emerge from these continents in the foreseeable future. To our knowledge no other available surveys satisfy the criteria. It is not realistic to wait for them because there is an urgent need for international cut off points now. Also, our methodology aims to adjust for differences in overweight between countries, so it could be argued that adding other countries to the reference set would make little difference to the cut off points.
None the less, further research is needed to explore patterns of body mass index in children in Africa and Asia.

Puberty
The body mass index curves in figure 6 show a fairly linear pattern for males but a higher and more concave shape for females. This sex difference can also be seen in the individual curves of figures 2 to 4, reflecting earlier puberty in females. The sensitivity of the curve's shape to the timing of puberty may affect the performance of the cut off points in countries where puberty is appreciably delayed,21 although delays of less than two years are unlikely to make much difference.

Use of cut off points
The cut off points in table 4 are tabulated at exact half year ages and for clinical use need to be linearly interpolated to the subject's age. For epidemiological use, with age groups of one year width, the cut off point at the mid year value (for example, at age 7.5 for the 7.0-8.0 age group) will give an essentially unbiased estimate of the prevalence.

The centiles for obesity involve more extrapolation than the centiles for overweight, which may explain the greater variability across datasets in figure 4 compared with figure 3. For this reason the obesity cut off points in figure 6 are fairly imprecise and are likely to be less useful than the cut off points for overweight.

The approximate prevalence values for overweight and obesity in tables 2 and 3 are calculated as the tail areas of the body mass index distribution in each sample at age 18, as estimated by the LMS method. This assumes that the distribution is normal after adjusting for skewness, which is inevitably only an approximation. In the British data there was slight kurtosis (heavy tails) in the distribution of body mass index,15 with 2.8% of the sample rather than the 2.3% expected exceeding a z score of 2.
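The interpolation rule for clinical use can be sketched with two adjacent male overweight entries from table 4 (ages 7 and 7.5); this is plain linear interpolation, not part of any published software:

```python
def interpolate_cutoff(age, age0, cut0, age1, cut1):
    """Linearly interpolate a tabulated cut off point to the subject's exact age."""
    if not age0 <= age <= age1:
        raise ValueError("age outside the tabulated interval")
    return cut0 + (cut1 - cut0) * (age - age0) / (age1 - age0)

# Male overweight cut off points at ages 7 and 7.5 from table 4
cutoff = interpolate_cutoff(7.2, 7.0, 17.92, 7.5, 18.16)
print(round(cutoff, 2))
is_overweight = 18.5 >= cutoff  # a boy aged 7.2 years with BMI 18.5 exceeds the cut off
```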
Therefore the true prevalences for the other datasets here may differ slightly from the values quoted.

The principle used to obtain cut off points for overweight and obesity in children could also provide a cut off point for underweight in children, based on the World Health Organisation's cut off point of a body mass index of 18.5 kg/m2 for adult underweight. A body mass index of 18.5 kg/m2 in a young adult is, however, equivalent to the British 12th centile,9 an unacceptably high prevalence of child underweight. A possible alternative would be a cut off point of a body mass index of 17 kg/m2, on the British second centile at age 18.9 Although substantial data link cut off points of 25 and 30 kg/m2 to morbidity in adults22 and the corresponding centile cut off points are associated with morbidity in children,23 the health effects of cut off points corresponding to a body mass index below 17 or 18.5 kg/m2 have not been studied.

Fig 6 International cut off points for body mass index by sex for overweight and obesity, passing through body mass index 25 and 30 kg/m2 at age 18 (data from Brazil, Britain, Hong Kong, Netherlands, Singapore, and United States)

Fig 7 Centiles for overweight by sex for 11 datasets
These cut off points for underweight need validating as markers of disease risk.

Based on cross sectional data, the curves give no information about centile crossing over time, a weakness of most "growth" charts. Longitudinal data are needed to derive correlations of body mass index from one age to another, which then define the likely variability of centile crossing.24 25

Conclusions
Our analysis provides cut off points for body mass index in childhood that are based on international data and linked to the widely accepted adult cut off points of a body mass index of 25 and 30 kg/m2. Our approach avoids some of the usual arbitrariness of choosing the reference data and cut off point. Applying the cut off points to the national datasets on which they are based gives a wide range of prevalence estimates at age 18 of 5-18% for overweight and 0.1-4% for obesity. A similar range of estimates is likely to be seen from age 2-18. The cut off points are recommended for use in international comparisons of prevalence of overweight and obesity.

We thank Carlos Monteiro (Brazil), Sophie Leung (Hong Kong), Machteld Roede (the Netherlands), Uma Rajan (Singapore), Claude Bouchard (Canada), Marie Françoise Rolland Cachera (France), Yuji Matsuzawa (Japan), Barry Popkin (USA, for the Russian data), Gunilla Tanner-Lindgren (Sweden), and Mercedes Lopez de Blanco (Venezuela) for allowing us access to their data.

Contributors: TJC had the original idea, did most of the statistical analyses, and wrote the first draft of the paper. TJC, MCB, KMF, and WHD provided the data. KMF did further analyses of the US data. All authors attended the original childhood obesity workshop, participated in the design and planning of the study, discussed the interpretation of the results, and contributed to the final paper. TJC will act as guarantor for the paper.

Funding: This work was supported by the Childhood Obesity Working Group of the International Obesity Task Force.
TJC is supported by a Medical Research Council programme grant.

Competing interests: None declared.

References
1 World Health Organisation. Obesity: preventing and managing the global epidemic. Report of a WHO consultation, Geneva, 3-5 Jun 1997. Geneva: WHO, 1998. (WHO/NUT/98.1.)
2 Berenson GS, Srinivasan SR, Wattigney WA, Harsha DW. Obesity and cardiovascular risk in children. Ann NY Acad Sci 1993;699:93-103.
3 Berenson GS, Srinivasan SR, Bao W, Newman WP, Tracy RE, Wattigney WA. Association between multiple cardiovascular risk factors and atherosclerosis in children and young adults. The Bogalusa heart study. New Engl J Med 1998;338:1650-6.
4 Mahoney LT, Burns TL, Stanford W. Coronary risk factors measured in childhood and young adult life are associated with coronary artery calcification in young adults: the Muscatine study. J Am Coll Cardiol 1996;27:277-84.
5 Must A, Jacques PF, Dallal GE, Bajema CJ, Dietz WH. Long-term morbidity and mortality of overweight adolescents. A follow-up of the Harvard growth study of 1922 to 1935. New Engl J Med 1992;327:1350-5.
6 Malina RM, Katzmarzyk PT. Validity of the body mass index as an indicator of the risk and presence of overweight in adolescents. Am J Clin Nutr 1999;70:131-6S.
7 World Health Organisation. Physical status: the use and interpretation of anthropometry. Geneva: WHO, 1995.
8 Rolland-Cachera MF, Sempé M, Guilloud-Bataille M, Patois E, Pequignot-Guggenbuhl F, Fautrad V. Adiposity indices in children. Am J Clin Nutr 1982;36:178-84.
9 Cole TJ, Freeman JV, Preece MA. Body mass index reference curves for the UK, 1990. Arch Dis Child 1995;73:25-9.
10 Power C, Lake JK, Cole TJ. Measurement and long-term health risks of child and adolescent fatness. Int J Obesity 1997;21:507-26.
11 Barlow SE, Dietz WH. Obesity evaluation and treatment: expert committee recommendations.
The Maternal and Child Health Bureau, Health Resources and Services Administration, and the Department of Health and Human Services. Pediatrics 1998;102:E29.
12 Dietz WH, Robinson TN. Use of the body mass index (BMI) as a measure of overweight in children and adolescents. J Pediatr 1998;132:191-3.
13 Bellizzi MC, Dietz WH. Workshop on childhood obesity: summary of the discussion. Am J Clin Nutr 1999;70:173-5S.
14 Monteiro CA, Benicio MHDA, Iunes RF, Gouveia NC, Taddei JAAC, Cardoso MAP. Nutritional status of Brazilian children: trends from 1975 to 1989. Bull WHO 1992;70:657-66.
15 Cole TJ, Freeman JV, Preece MA. British 1990 growth reference centiles for weight, height, body mass index and head circumference fitted by maximum penalized likelihood. Stat Med 1998;17:407-29.
16 Leung SSF, Cole TJ, Tse LY, Lau JTF. Body mass index reference curves for Chinese children. Ann Hum Biol 1998;25:169-74.
17 Cole TJ, Roede MJ. Centiles of body mass index for Dutch children aged 0-20 years in 1980: a baseline to assess recent trends in obesity. Ann Hum Biol 1999;26:303-8.
18 Rajan U. Obesity among Singapore students. Int J Obesity 1994;18(suppl 2):27.
19 Troiano RP, Flegal KM. Overweight children and adolescents: description, epidemiology, and demographics. Pediatrics 1998;101:497-504.
20 Flegal KM. Defining obesity in children and adolescents: epidemiologic approaches. Crit Rev Food Sci Nutr 1993;33:307-12.
21 D'Amato M, Ferro-Luzzi A, Gundry S, Wright J, Worrall J, Mucavele P. A new approach to assessing adolescent malnutrition in low income countries. A case study in Zimbabwe. Abstract presented at the International Biometric Society, Italian region. Rome, 7-9 Jul 1999.
22 National Institutes of Health, National Heart, Lung, and Blood Institute. Clinical guidelines on the identification, evaluation, and treatment of overweight and obesity in adults; the evidence report.
Obes Res 1998;6(suppl 2):51-209S.
23 Freedman DS, Dietz WH, Srinivasan SR, Berenson GS. The relation of overweight to cardiovascular risk factors among children and adolescents: the Bogalusa heart study. Pediatrics 1999;103:1175-82.
24 Cole TJ. Growth charts for both cross-sectional and longitudinal data. Stat Med 1994;13:2477-92.
25 Cole TJ. Presenting information on growth distance and conditional velocity in one chart: practical issues of chart design. Stat Med 1998;17:2697-707.

(Accepted 21 January 2000)

What is already known on this topic
Child obesity is a serious public health problem that is surprisingly difficult to define.
The 95th centile of the US body mass index reference has recently been proposed as a cut off point for child obesity, but like previous definitions it is far from universally accepted.

What this study adds
A new definition of overweight and obesity in childhood, based on pooled international data for body mass index and linked to the widely used adult obesity cut off point of 30 kg/m2, has been proposed.
The definition is less arbitrary and more international than others, and should encourage direct comparison of trends in child obesity worldwide.
" -} -{ - "pm_id": "https://pubmed.ncbi.nlm.nih.gov/18956000", - "pdf_text": "HAL Id: hal-02417326 (https://hal.science/hal-02417326), submitted on 18 Dec 2019.
Materials for electrochemical capacitors
Patrice Simon1,2 and Yury Gogotsi3
1Université Paul Sabatier, CIRIMAT, UMR-CNRS 5085, 31062 Toulouse Cedex 4, France
2Institut Universitaire de France, 103 Boulevard Saint Michel, 75005 Paris, France
3Department of Materials Science & Engineering, Drexel University, 3141 Chestnut Street, Philadelphia 19104, USA
e-mail: simon@chimie.ups-tlse.fr; gogotsi@drexel.edu
Published in Nature Materials, 2008, 7(11):845-854. ISSN 1476-1122. DOI: 10.1038/nmat2297.

Climate change and the decreasing availability of fossil fuels require society to move towards sustainable and renewable resources. As a result, we are observing an increase in renewable energy production from sun and wind, as well as the development of electric vehicles or hybrid electric vehicles with low CO2 emissions.
Because the sun does not shine during the night, wind does not blow on demand, and we all expect to drive our car with at least a few hours of autonomy, energy storage systems are starting to play a larger part in our lives. At the forefront of these are electrical energy storage systems, such as batteries and electrochemical capacitors (ECs)1. However, we need to improve their performance substantially to meet the higher requirements of future systems, ranging from portable electronics to hybrid electric vehicles and large industrial equipment, by developing new materials and advancing our understanding of the electrochemical interfaces at the nanoscale. Figure 1 shows the plot of power against energy density, also called a Ragone plot2, for the most important energy storage systems.

Lithium-ion batteries were introduced in 1990 by Sony, following pioneering work by Whittingham, Scrosati and Armand (see ref. 3 for a review). These batteries, although costly, are the best in terms of performance, with energy densities that can reach 180 watt hours per kilogram. Although great efforts have gone into developing high-performance Li-ion and other advanced secondary batteries that use nanomaterials or organic redox couples4-6, ECs have attracted less attention until very recently. Because Li-ion batteries suffer from a somewhat slow power delivery or uptake, faster and higher-power energy storage systems are needed in a number of applications, and this role has been given to the ECs7. Also known as supercapacitors or ultracapacitors, ECs are power devices that can be fully charged or discharged in seconds; as a consequence, their energy density (about 5 Wh kg-1) is lower than in batteries, but a much higher power delivery or uptake (10 kW kg-1) can be achieved for shorter times (a few seconds)1.
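The energy figure quoted above can be checked from the capacitor energy relation E = ½CV². The 5 F per gram device-level capacitance and 2.7 V cell voltage used below are assumed typical values for a packaged EDLC cell, not numbers taken from this article:

```python
def specific_energy_wh_per_kg(c_farad_per_gram, voltage):
    """E = 1/2 C V^2, converted from joules per gram to watt hours per kilogram."""
    joules_per_gram = 0.5 * c_farad_per_gram * voltage ** 2
    return joules_per_gram * 1000.0 / 3600.0  # 1 Wh = 3600 J; 1 kg = 1000 g

print(round(specific_energy_wh_per_kg(5.0, 2.7), 1))  # about 5 Wh per kg
```

Because energy grows with the square of voltage, widening the electrochemical stability window of the electrolyte is one of the main levers on energy density.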
They have had an important role in complementing or replacing batteries in the energy storage field, such as for uninterruptible power supplies (back-up supplies used to protect against power disruption) and load-levelling. A more recent example is the use of electrochemical double layer capacitors (EDLCs) in emergency doors (16 per plane) on an Airbus A380, thus proving that in terms of performance, safety and reliability ECs are definitely ready for large-scale implementation. A recent report by the US Department of Energy8 assigns equal importance to supercapacitors and batteries for future energy storage systems, and articles on supercapacitors appearing in business and popular magazines show increasing interest by the general public in this topic.

Several types of ECs can be distinguished, depending on the charge storage mechanism as well as the active materials used. EDLCs, the most common devices at present, use carbon-based active materials with high surface area (Fig. 2). A second group of ECs, known as pseudo-capacitors or redox supercapacitors, uses fast and reversible surface or near-surface reactions for charge storage. Transition metal oxides as well as electrically conducting polymers are examples of

Abstract: Electrochemical capacitors, also called supercapacitors, store energy using either ion adsorption (electrochemical double layer capacitors) or fast surface redox reactions (pseudo-capacitors). They can complement or replace batteries in electrical energy storage and harvesting applications, when high power delivery or uptake is needed. A notable improvement in performance has been achieved through recent advances in understanding charge storage mechanisms and the development of advanced nanostructured materials.
The discovery that ion desolvation occurs in pores smaller than \nthe solvated ions has led to higher capacitance for electrochemical double layer capacitors using \ncarbon electrodes with subnanometre pores, and opened the door to designing high-energy density \ndevices using a variety of electrolytes. Combination of pseudo-capacitive nanomaterials, including \noxides, nitrides and polymers, with the latest generation of nanostructured lithium electrodes has \nbrought the energy density of electrochemical capacitors closer to that of batteries. The use of carbon \nnanotubes has further advanced micro-electrochemical capacitors, enabling flexible and adaptable \ndevices to be made. Mathematical modelling and simulation will be the key to success in designing \ntomorrow’s high-energy and high-power devices.\n pseudo-capacitive active materials. Hybrid capacitors, combining a \ncapacitive or pseudo-capacitive electrode with a battery electrode, are the latest kind of EC, which benefit from both the capacitor and the \nbattery properties.\nElectrochemical capacitors currently fill the gap between batteries \na\nnd conventional solid state and electrolytic capacitors (Fig. 1). They \nstore hundreds or thousands of times more charge (tens to hundreds of farads per gram) than the latter, because of a much larger surface \na\nrea (1,000–2,000 m2 g–1) available for charge storage in EDLC. \nHowever, they have a lower energy density than batteries, and this \nlimits the optimal discharge time to less than a minute, whereas \nmany applications clearly need more9. Since the early days of EC \ndevelopment in the late 1950s, there has not been a good strategy \nfor increasing the energy density; only incremental performance \nimprovements were achieved from the 1960s to 1990s. 
The impressive increase in performance that has been demonstrated in the past couple of years is due to the discovery of new electrode materials and improved understanding of ion behaviour in small pores, as well as the design of new hybrid systems combining faradic and capacitive electrodes. Here we give an overview of past and recent findings as well as an analysis of what the future holds for ECs.

ELECTROCHEMICAL DOUBLE-LAYER CAPACITORS

The first patent describing the concept of an electrochemical capacitor was filed in 1957 by Becker9, who used carbon with a high specific surface area (SSA) coated on a metallic current collector in a sulphuric acid solution. In 1971, NEC (Japan) developed aqueous-electrolyte capacitors under the energy company SOHIO’s licence for power-saving units in electronics, and this application can be considered as the starting point for electrochemical capacitor use in commercial devices9. New applications in mobile electronics, transportation (cars, trucks, trams, trains and buses), renewable energy production and aerospace systems10 bolstered further research.

MECHANISM OF DOUBLE-LAYER CAPACITANCE

EDLCs are electrochemical capacitors that store the charge electrostatically using reversible adsorption of ions of the electrolyte onto active materials that are electrochemically stable and have high accessible SSA.
Charge separation occurs on polarization at the electrode–electrolyte interface, producing what Helmholtz described in 1853 as the double layer capacitance C:

C = εrε0A/d, or C/A = εrε0/d (1)

where εr is the dielectric constant of the electrolyte, ε0 is the dielectric constant of the vacuum, d is the effective thickness of the double layer (charge separation distance) and A is the electrode surface area.

This capacitance model was later refined by Gouy and Chapman, and Stern and Geary, who suggested the presence of a diffuse layer in the electrolyte due to the accumulation of ions close to the electrode surface. The double layer capacitance is between 5 and 20 µF cm–2 depending on the electrolyte used11. Specific capacitance achieved with aqueous alkaline or acid solutions is generally higher than in organic electrolytes11, but organic electrolytes are more widely used as they can sustain a higher operating voltage (up to 2.7 V in symmetric systems). Because the energy stored is proportional to the voltage squared according to

E = ½CV2 (2)

a three-fold increase in voltage, V, results in about an order of magnitude increase in the energy, E, stored at the same capacitance.

As a result of the electrostatic charge storage, there is no faradic (redox) reaction at EDLC electrodes. A supercapacitor electrode must be considered as a blocking electrode from an electrochemical point of view. This major difference from batteries means that there is no limitation by the electrochemical kinetics through a polarization resistance. In addition, this surface storage mechanism allows very fast energy uptake and delivery, and better power performance. The absence of faradic reactions also eliminates the swelling in the active material that batteries show during charge/discharge cycles. EDLCs can sustain millions of cycles whereas batteries survive a few thousand at best.
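Equation (2) is why the electrolyte voltage window matters so much. A short numerical check (the capacitance value is illustrative):

```python
def stored_energy_j(capacitance_f: float, voltage_v: float) -> float:
    """Equation (2): E = 1/2 * C * V^2, energy in joules."""
    return 0.5 * capacitance_f * voltage_v ** 2

# Same 100 F of capacitance (illustrative), aqueous (~0.9 V) versus organic
# (~2.7 V) electrolyte: tripling the voltage gives nine times the energy.
ratio = stored_energy_j(100.0, 2.7) / stored_energy_j(100.0, 0.9)  # 9.0
```

This quadratic payoff is the motivation, taken up at the end of this review, for moving from aqueous to organic and eventually ionic liquid electrolytes.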
Finally, the solvent of the electrolyte is not involved in the charge storage mechanism, unlike in Li-ion batteries where it contributes to the solid–electrolyte interphase when graphite anodes or high-potential cathodes are used. This does not limit the choice of solvents, and electrolytes with high power performance at low temperatures (down to –40 °C) can be designed for EDLCs. However, as a consequence of the electrostatic surface charging mechanism, these devices suffer from a limited energy density. This explains why today’s EDLC research is largely focused on increasing their energy performance and widening the temperature limits into the range where batteries cannot operate9.

Figure 1 Specific power against specific energy, also called a Ragone plot, for various electrical energy storage devices. If a supercapacitor is used in an electric vehicle, the specific power shows how fast one can go, and the specific energy shows how far one can go on a single charge. Times shown are the time constants of the devices, obtained by dividing the energy density by the power.

HIGH SURFACE AREA ACTIVE MATERIALS

The key to reaching high capacitance by charging the double layer is in using high-SSA blocking and electronically conducting electrodes. Graphitic carbon satisfies all the requirements for this application, including high conductivity, electrochemical stability and open porosity12. Activated, templated and carbide-derived carbons13, carbon fabrics, fibres, nanotubes14, onions15 and nanohorns16 have been tested for EDLC applications11, and some of these carbons are shown in Fig. 2a–d. Activated carbons are the most widely used materials today, because of their high SSA and moderate cost.

Activated carbons are derived from carbon-rich organic precursors by carbonization (heat treatment) in an inert atmosphere, with subsequent selective oxidation in CO2, water vapour or KOH to increase the SSA and pore volume. Natural materials, such as coconut shells, wood, pitch or coal, or synthetic materials, such as polymers, can be used as precursors. A porous network in the bulk of the carbon particles is produced after activation; micropores (<2 nm in size), mesopores (2–50 nm) and macropores (>50 nm) can be created in carbon grains. Accordingly, the porous structure of carbon is characterized by a broad distribution of pore size; longer activation time or higher temperature leads to a larger mean pore size. The double layer capacitance of activated carbon reaches 100–120 F g–1 in organic electrolytes; this value can exceed 150–300 F g–1 in aqueous electrolytes, but at a lower cell voltage because the electrolyte voltage window is limited by water decomposition. A typical cyclic voltammogram of a two-electrode EDLC laboratory cell is presented in Fig. 2e. Its rectangular shape is characteristic of a pure double layer capacitance mechanism for charge storage according to

I = C × (dV/dt) (3)

where I is the current, dV/dt is the potential scan rate and C is the double layer capacitance. Assuming a constant value for C, for a given scan rate the current I is constant, as can be seen from Fig. 2e, where the cyclic voltammogram has a rectangular shape.

As previously mentioned, many carbons have been tested for EDLC applications and a recent paper11 provides an overview of what has been achieved. Untreated carbon nanotubes17 or nanofibres have a lower capacitance (around 50–80 F g–1) than activated carbon in organic electrolytes. This can be increased up to 100 F g–1 or greater by grafting oxygen-rich groups, but these are often detrimental to cyclability.
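Equations (1) and (3) connect the microscopic double layer to the measured values quoted above. A sketch with illustrative mid-range inputs (10 µF cm–2 and 1,500 m2 g–1 are assumptions drawn from the ranges in the text; real electrodes fall short of this upper bound because not all of the surface is accessible to ions):

```python
def gravimetric_capacitance_f_per_g(c_dl_uf_per_cm2: float,
                                    ssa_m2_per_g: float) -> float:
    """Upper bound from equation (1): area-normalized double layer
    capacitance times surface area (1 m^2 = 1e4 cm^2, 1 uF = 1e-6 F)."""
    return c_dl_uf_per_cm2 * 1e-6 * ssa_m2_per_g * 1e4

def cv_current_a(capacitance_f: float, scan_rate_v_per_s: float) -> float:
    """Equation (3): I = C * dV/dt, the constant current of an ideal EDLC."""
    return capacitance_f * scan_rate_v_per_s

c_upper = gravimetric_capacitance_f_per_g(10.0, 1500.0)  # 150 F/g upper bound
i_cell = cv_current_a(120.0, 0.020)  # 2.4 A for 120 F swept at 20 mV/s
```

The rectangular voltammogram of Fig. 2e is exactly this constant-current behaviour: at a fixed scan rate, an ideal double layer draws the same current across the whole voltage window.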
Activated carbon fabrics can reach the same capacitance as activated carbon powders, as they have similar SSA, but their high price limits their use to speciality applications. The carbons used in EDL capacitors are generally pre-treated to improve stability during cycling by removing moisture and most of the surface functional groups present on the carbon surface, both of which can be responsible for capacitance fading during capacitor ageing, as demonstrated by Azaïs et al.18 using NMR and X-ray photoelectron spectroscopy techniques. Pandolfo et al.11, in their review article, concluded that the presence of oxygenated groups also contributes to capacitor instability, resulting in an increased series resistance and deterioration of capacitance. Figure 3 presents a schematic of a commercial EDLC, showing the positive and the negative electrodes as well as the separator in rolled design (Fig. 3a,b) and flat design (button cell, Fig. 3c).

Figure 2 Carbon structures used as active materials for double layer capacitors. a, Typical transmission electron microscopy (TEM) image of a disordered microporous carbon (SiC-derived carbon, 3 hours chlorination at 1,000 °C). b, TEM image of onion-like carbon. Reproduced with permission from ref. 80. © 2007 Elsevier. c, Scanning electron microscopy image of an array of carbon nanotubes (labelled CNT) on SiC produced by annealing for 6 h at 1,700 °C; inset, d, shows a TEM image of the same nanotubes72. e, Cyclic voltammetry of a two-electrode laboratory EDLC cell in 1.5 M tetraethylammonium tetrafluoroborate (NEt4+,BF4–) in acetonitrile-based electrolyte, containing activated carbon powders coated on aluminium current collectors. Cyclic voltammetry was recorded at room temperature and a potential scan rate of 20 mV s–1.

CAPACITANCE AND PORE SIZE

Initial research on activated carbon was directed towards increasing the pore volume by developing high SSA and refining the activation process. However, the capacitance increase was limited even for the most porous samples. From a series of activated carbons with different pore sizes in various electrolytes, it was shown that there was no linear relationship between the SSA and the capacitance19–21. Some studies suggested that pores smaller than 0.5 nm were not accessible to hydrated ions20,22 and that even pores under 1 nm might be too small, especially in the case of organic electrolytes, where the size of the solvated ions is larger than 1 nm (ref. 23). These results were consistent with previous work showing that ions carry a dynamic sheath of solvent molecules, the solvation shell24, and that some hundreds of kilojoules per mole are required to remove it25 in the case of water molecules. A pore size distribution in the range 2–5 nm, which is larger than the size of two solvated ions, was then identified as a way to improve the energy density and the power capability. Despite all efforts, only a moderate improvement has been made. Gravimetric capacitance in the range of 100–120 F g–1 in organic and 150–200 F g–1 in aqueous electrolytes has been achieved26,27 and ascribed to improved ionic mass transport inside mesopores. It was assumed that a well balanced micro- or mesoporosity (according to IUPAC classification, micropores are smaller than 2 nm, whereas mesopores are 2–50 nm in diameter) was needed to maximize capacitance28.

Although fine-tuned mesoporous carbons failed to achieve high capacitance performance, several studies reported an important capacitive contribution from micropores.
From experiments using activated carbon cloth, Salitra et al.29 suggested that a partial desolvation of ions could occur, allowing access to small pores (<2 nm). High capacitance was observed for a mesoporous carbon containing large numbers of small micropores30–32, suggesting that partial ion desolvation could lead to an improved capacitance. High capacitances (120 F g–1 and 80 F cm–3) were found in organic electrolytes for microporous carbons (<1.5 nm)33,34, contradicting the solvated ion adsorption theory. Using microporous activated coal-based carbon materials, Raymundo-Piñero et al.35 observed the same effect and found a maximum capacitance for pore sizes of 0.7 and 0.8 nm in aqueous and organic electrolytes, respectively. However, the most convincing evidence of capacitance increase in pores smaller than the solvated ion size was provided by experiments using carbide-derived carbons (CDCs)36–38 as the active material. These are porous carbons obtained by extraction of metals from carbides (TiC, SiC and others) by etching in halogens at elevated temperatures39:

TiC + 2Cl2 → TiCl4 + C (4)

In this reaction, Ti is leached out from TiC, and carbon atoms self-organize into an amorphous or disordered, mainly sp2-bonded40, structure with a pore size that can be fine-tuned by controlling the chlorination temperature and other process parameters. Accordingly, a narrow uni-modal pore size distribution can be achieved in the range 0.6–1.1 nm, and the mean pore size can be controlled with sub-ångström accuracy41. These materials were used to understand the charge storage in micropores using a 1 M solution of NEt4BF4 in acetonitrile-based electrolyte42. The normalized capacitance (µF cm–2) decreased with decreasing pore size until a critical value close to 1 nm was reached (Fig. 4), and then sharply increased when the pore size approached the ion size.
As the CDC samples were exclusively microporous, the capacitance increase for subnanometre pores clearly shows the role of micropores. Moreover, the gravimetric and volumetric capacitances achieved by CDCs were, respectively, 50% and 80% higher than those of conventional activated carbons19–21. The change of capacitance with current density was also found to be stable, demonstrating the high power capabilities these materials can achieve42. As the solvated ion sizes in this electrolyte were 1.3 and 1.16 nm for the cation and anion16, respectively, it was proposed that partial or complete removal of their solvation shell was allowing the ions to access the micropores. As a result, the change of capacitance was a linear function of 1/b (where b is the pore radius), confirming that the distance between the ion and the carbon surface, d, was shorter for the smaller pores. This dependence published by Chmiola et al.42 has since been confirmed by other studies, and an analysis of literature data is provided in refs 43 and 44.

CHARGE-STORAGE MECHANISM IN SUBNANOMETRE PORES

From a fundamental point of view, there is a clear lack of understanding of the double layer charging in the confined space of micropores, where there is no room for the formation of the Helmholtz layer and diffuse layer expected at a solid–electrolyte interface. To address this issue, a three-electrode cell configuration, which discriminates between anion and cation adsorption, was used45. The double layer capacitance in 1.5 M NEt4BF4–acetonitrile electrolyte caused by the anion and cation at the positive and negative electrodes, respectively, had maxima at different pore sizes45. The peak in capacitance shifted to smaller pores for the smaller ion (the anion). This behaviour cannot be explained by purely electrostatic reasons, because all pores in this study were the same size as or smaller than a single ion with a single associated solvent molecule.
It thus confirmed that ions must be at least partially stripped of solvent molecules in order to occupy the carbon pores. These results point to a charge storage mechanism whereby partial or complete removal of the solvation shell and increased confinement of ions lead to increased capacitance.

Figure 3 Electrochemical capacitors. a, Schematic of a commercial spirally wound double layer capacitor, with carbon electrodes coated onto Al foil and a separator. b, Assembled device weighing 500 g and rated at 2,600 F. (Photo courtesy of Batscap, Groupe Bolloré, France.) c, A small button cell, which is just 1.6 mm in height and stores 5 F. (Photo courtesy of Y-Carbon, US.) Both devices operate at 2.7 V.

A theoretical analysis published by Huang et al.43 proposed splitting the capacitive behaviour into two different parts depending on the pore size. For mesoporous carbons (pores larger than 2 nm), the traditional model describing the charging of the double layer was used43:

C/A = εrε0 / [b ln(b/(b – d))] (5)

where b is the pore radius and d is the distance of approach of the ion to the carbon surface. Data from Fig. 4 in the mesoporous range (zone III) were fitted with equation (5). For micropores (<1 nm), it was assumed that ions enter a cylindrical pore and line up, thus forming the ‘electric wire in cylinder’ model of a capacitor. Capacitance was calculated from43

C/A = εrε0 / [b ln(b/a0)] (6)

where a0 is the effective size of the (desolvated) ion. This model perfectly matches the normalized capacitance change versus pore size (zone I in Fig. 4). Calculations using density functional theory gave consistent values for the size, a0, of unsolvated NEt4+ and BF4– ions43.

This work suggests that removal of the solvation shell is required for ions to enter the micropores. Moreover, the ionic radius a0 found by using equation (6) was close to the bare ion size, suggesting that ions could be fully desolvated.
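Equations (5) and (6) can be compared numerically. The parameter values below (εr, d, a0) are illustrative assumptions, not the fitted values of ref. 43; what matters is the qualitative behaviour: the wire-in-cylinder capacitance (6) rises steeply as the pore radius b approaches the bare ion size a0, while the double-cylinder capacitance (5) tends to the planar limit εrε0/d for large pores.

```python
import math

EPS0 = 8.854e-12  # F/m, permittivity of free space

def c_per_a_double_cylinder(b_m: float, d_m: float, eps_r: float) -> float:
    """Equation (5): electric double-cylinder capacitor (mesopores, zone III)."""
    return eps_r * EPS0 / (b_m * math.log(b_m / (b_m - d_m)))

def c_per_a_wire_in_cylinder(b_m: float, a0_m: float, eps_r: float) -> float:
    """Equation (6): electric wire-in-cylinder capacitor (micropores, zone I)."""
    return eps_r * EPS0 / (b_m * math.log(b_m / a0_m))

nm = 1e-9
# Assumed illustrative parameters: eps_r = 10, d = 0.35 nm, a0 = 0.34 nm.
c_meso = c_per_a_double_cylinder(2.0 * nm, 0.35 * nm, 10.0)   # F/m^2
c_micro = c_per_a_wire_in_cylinder(0.5 * nm, 0.34 * nm, 10.0)  # F/m^2
# c_micro > c_meso: subnanometre pores give the larger normalized capacitance,
# as in zone I of Fig. 4; for large b, equation (5) approaches eps_r*EPS0/d.
```

With these inputs the micropore value exceeds the mesopore one, reproducing the anomalous rise of the normalized capacitance at the left of Fig. 4.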
A study carried out with CDCs in a solvent-free electrolyte ([EMI+,TFSI–] ionic liquid at 60 °C), in which both ions have a maximum size of about 0.7 nm, showed the maximum capacitance for samples with a 0.7-nm pore size46, demonstrating that a single ion per pore produces the maximum capacitance (Fig. 5). This suggests that ions cannot be adsorbed on both pore surfaces, in contrast with traditional supercapacitor models.

MATERIALS BY DESIGN

The recent findings of the micropore contribution to capacitive storage highlight the lack of fundamental understanding of the electrochemical interfaces at the nanoscale and of the behaviour of ions confined in nanopores. In particular, the results presented above rule out the generally accepted description of the double layer with solvated ions adsorbed on both sides of the pore walls, consistent with the absence of a diffuse layer in subnanometre pores. Although recent studies45,46 provide some guidance for developing materials with improved capacitance, such as elimination of macro- and mesopores and matching the pore size with the ion size, further material optimization by Edisonian or combinatorial electrochemistry methods may take a very long time. The effects of many parameters, such as carbon bonding (sp versus sp2 or sp3), pore shape, defects or adatoms, are difficult to determine experimentally. Clearly, computational tools and atomistic simulation will be needed to help us understand the charge storage mechanism in subnanometre pores and to propose strategies for designing the next generation of high-capacitance materials and material–electrolyte systems47.
Recasting the theory of double layers in electrochemistry to take into account solvation and desolvation effects could lead to a better understanding of charge storage as well as ion transport in ECs, and even open up new opportunities in areas such as biological ion channels and water desalination.

REDOX-BASED ELECTROCHEMICAL CAPACITORS

MECHANISM OF PSEUDO-CAPACITIVE CHARGE STORAGE

Some ECs use fast, reversible redox reactions at the surface of active materials, thus defining what is called pseudo-capacitive behaviour. Metal oxides such as RuO2, Fe3O4 or MnO2 (refs 48, 49), as well as electronically conducting polymers50, have been extensively studied in the past decades. The specific pseudo-capacitance exceeds that of carbon materials using double layer charge storage, justifying interest in these systems. But because redox reactions are used, pseudo-capacitors, like batteries, often suffer from a lack of stability during cycling.

Figure 4 Specific capacitance normalized by SSA as a function of pore size for different carbon samples. All samples were tested in the same electrolyte (NEt4+,BF4– in acetonitrile; concentrations are shown in the key). Symbols show experimental data for CDCs, templated mesoporous carbons and activated carbons, and lines show model fits43. A huge normalized capacitance increase is observed for microporous carbons with the smallest pore sizes in zone I, which would not be expected in the traditional view. The partial or complete loss of the solvation shell explains this anomalous behaviour42. As the schematics show, zones I and II can be modelled as an electric wire-in-cylinder capacitor, an electric double-cylinder capacitor should be considered for zone III, and the commonly used planar electric double layer capacitor can be considered for larger pores, where the curvature/size effect becomes negligible (zone IV). A mathematical fit in the mesoporous range (zone III) is obtained using equation (5). Equation (6) was used to model the capacitive behaviour in zone I, where confined micropores force ions to desolvate partially or completely44. A, B: templated mesoporous carbons; C: activated mesoporous carbon; D, F: microporous CDC; E: microporous activated carbon. Reproduced with permission from ref. 44. © 2008 Wiley.

Figure 5 Normalized capacitance change as a function of the pore size of carbide-derived carbon samples. Samples were prepared at different temperatures and tested in ethyl-methylimidazolium/trifluoro-methane-sulphonylimide (EMI,TFSI) ionic liquid at 60 °C. Inset shows the structure and size of the EMI and TFSI ions. The maximum capacitance is obtained when the pore size is in the same range as the maximum ion dimension. Reproduced with permission from ref. 46. © 2008 ACS.

Ruthenium oxide, RuO2, is widely studied because it is conductive and has three distinct oxidation states accessible within 1.2 V. The pseudo-capacitive behaviour of RuO2 in acidic solutions has been the focus of research for the past 30 years1. It can be described as a fast, reversible electron transfer together with an electro-adsorption of protons on the surface of RuO2 particles, according to equation (7), in which the Ru oxidation state can change from (ii) up to (iv):

RuO2 + xH+ + xe– ↔ RuO2–x(OH)x (7)

where 0 ≤ x ≤ 2.
The continuous change of x during proton insertion or de-insertion occurs over a window of about 1.2 V and leads to a capacitive behaviour with ion adsorption following a Frumkin-type isotherm1. Specific capacitance of more than 600 F g–1 has been reported51, but Ru-based aqueous electrochemical capacitors are expensive, and the 1-V voltage window limits their applications to small electronic devices. Organic electrolytes with proton surrogates (for example Li+) must be used to go past 1 V. Less expensive oxides of iron, vanadium, nickel and cobalt have been tested in aqueous electrolytes, but none has been investigated as much as manganese oxide52. The charge storage mechanism is based on surface adsorption of electrolyte cations C+ (K+, Na+, …) as well as proton incorporation according to the reaction:

MnO2 + xC+ + yH+ + (x+y)e– ↔ MnOOCxHy (8)

Figure 6 shows a cyclic voltammogram of a single MnO2 electrode in mild aqueous electrolyte; the fast, reversible successive surface redox reactions define the behaviour of the voltammogram, whose shape is close to that of an EDLC. MnO2 micro-powders or micrometre-thick films show a specific capacitance of about 150 F g–1 in neutral aqueous electrolytes within a voltage window of <1 V. Accordingly, there is limited interest in MnO2 electrodes for symmetric devices, because there are no oxidation states available at less than 0 V. However, it is suitable as a pseudo-capacitive positive electrode in hybrid systems, which we will describe below.
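The order of magnitude of the reported RuO2 capacitance can be cross-checked against equation (7) with Faraday's law: exchanging x electrons per formula unit over a potential window ΔV corresponds to a mean specific capacitance C = xF/(MΔV). This back-of-the-envelope estimate is our illustration, not a calculation from the text, and the choice x = 1 is an assumption (the reaction allows 0 ≤ x ≤ 2):

```python
F = 96485.0      # C/mol, Faraday constant
M_RUO2 = 133.07  # g/mol, molar mass of RuO2

def faradaic_capacitance_f_per_g(x_electrons: float, molar_mass_g: float,
                                 window_v: float) -> float:
    """Mean specific capacitance C = x*F / (M * dV): the total charge x*F/M
    per gram, spread uniformly over the voltage window dV."""
    return x_electrons * F / (molar_mass_g * window_v)

# x = 1 electron per RuO2 over the 1.2 V window gives roughly 600 F/g,
# the same range as the >600 F/g reported for Ru-based capacitors.
c_ruo2 = faradaic_capacitance_f_per_g(1.0, M_RUO2, 1.2)
```

The same arithmetic shows why pseudo-capacitance can exceed double layer storage: one electron per formula unit dwarfs the fraction of a charge per surface site that electrostatic adsorption provides.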
Other transition metal oxides with various oxidation degrees, such as molybdenum oxides, should also be explored as active materials for pseudo-capacitors.

Many kinds of conducting polymers (polyaniline, polypyrrole, polythiophene and their derivatives) have been tested in EC applications as pseudo-capacitive materials50,53,54 and have shown high gravimetric and volumetric pseudo-capacitance in various non-aqueous electrolytes at operating voltages of about 3 V. When used as bulk materials, conducting polymers suffer from a limited stability during cycling that reduces the initial performance9. Research efforts with conducting polymers for supercapacitor applications are nowadays directed towards hybrid systems.

NANOSTRUCTURING REDOX-ACTIVE MATERIALS TO INCREASE CAPACITANCE

Given that nanomaterials have helped to improve Li-ion batteries55, it is not surprising that nanostructuring has also affected ECs. Because pseudo-capacitors store charge in the first few nanometres from the surface, decreasing the particle size increases active material usage. Thanks to a thin, electrically conducting surface layer of oxide and oxynitride, the charging mechanism of nanocrystalline vanadium nitride (VN) includes a combination of an electric double layer and a faradic reaction (ii/iv) at the surface of the nanoparticles, leading to specific capacitance of up to 1,200 F g–1 at a scan rate of 2 mV s–1 (ref. 56). A similar approach can be applied to other nano-sized transition metal nitrides or oxides. In another example, the cycling stability and the specific capacitance of RuO2 nanoparticles were increased by depositing a thin conducting polymer coating that enhanced proton exchange at the surface57. The design of specific surface functionalization to improve interfacial exchange could be suggested as a generic approach for other pseudo-redox materials.

MnO2 and RuO2 films have been synthesized at the nanometre scale.
Thin MnO2 deposits of tens to hundreds of nanometres have been produced on various substrates such as metal collectors, carbon nanotubes or activated carbons. Specific capacitances as high as 1,300 F g–1 have been reported58, as reaction kinetics were no longer limited by the electrical conductivity of MnO2. In the same way, Sugimoto’s group have prepared hydrated RuO2 nano-sheets with capacitance exceeding 1,300 F g–1 (ref. 59). The RuO2 specific capacitance also increased sharply when the film thickness was decreased. The deposition of RuO2 thin films onto carbon supports60,61 both increased the capacitance and decreased the RuO2 consumption. Thin-film synthesis, or decoration of high-SSA capacitive materials with nano-sized pseudo-capacitive active material, as in the examples presented in Fig. 7a,b, offers an opportunity to increase energy density and compete with carbon-based EDLCs. Particular attention must be paid to further processing of nano-sized powders into active films, because they tend to re-agglomerate into large grains. An alternative way to produce porous films from powders is by growing nanotubes, as has been shown for V2O5 (ref. 62), or nanorods. These allow easy access to the active material, but can only be produced in thin films so far, and the manufacturing cost will probably limit the use of these sophisticated nanostructures to small electronic devices.

HYBRID SYSTEMS TO ACHIEVE HIGH ENERGY DENSITY

Hybrid systems offer an attractive alternative to conventional pseudo-capacitors or EDLCs by combining a battery-like electrode (energy source) with a capacitor-like electrode (power source) in the same cell. An appropriate electrode combination can even increase the cell voltage, further contributing to improvement in energy and power densities.
Currently, two different approaches to hybrid systems have emerged: (i) pseudo-capacitive metal oxides with a capacitive carbon electrode, and (ii) lithium-insertion electrodes with a capacitive carbon electrode.

Numerous combinations of positive and negative electrodes have been tested in the past in aqueous or inorganic electrolytes. In most cases, the faradic electrode led to an increase in the energy density at the cost of cyclability (for balanced positive and negative electrode capacities). This is certainly the main drawback of hybrid devices, as compared with EDLCs, and it is important to avoid transforming a good supercapacitor into a mediocre battery63.

Figure 6 Cyclic voltammetry. This schematic of cyclic voltammetry for a MnO2-electrode cell in mild aqueous electrolyte (0.1 M K2SO4) shows the successive multiple surface redox reactions leading to the pseudo-capacitive charge storage mechanism. The red (upper) part is related to the oxidation from Mn(iii) to Mn(iv) and the blue (lower) part refers to the reduction from Mn(iv) to Mn(iii).

MnO2 is one of the most studied materials as a low-cost alternative to RuO2. Its pseudo-capacitance arises from the iii/iv oxidation state change at the surface of MnO2 particles58. The association of a negative EDLC-type electrode with a positive MnO2 electrode leads to a 2-V cell in aqueous electrolytes, thanks to the apparent water decomposition overvoltage on MnO2 and high-surface-area carbon. The low-cost carbon–MnO2 hybrid system combines high capacitance in neutral aqueous electrolytes with high cell voltage, making it a green alternative to EDLCs using acetonitrile-based solvents and fluorinated salts.
Moreover, the use of MnO2 nano-powders and nanostructures offers the potential for further improvement in capacitance64. Another challenge for this system is to use organic electrolytes to reach higher cell voltage, thus improving the energy density.

A combination of a carbon electrode with a PbO2 battery-like electrode using H2SO4 solution can work at 2.1 V (ref. 65), offering a low-cost EC device for cost-sensitive applications in which the weight of the device is of minor concern.

The hybrid concept originated from the Li-ion battery field. In 1999, Amatucci’s group combined a nanostructured lithium titanate anode, Li4Ti5O12, with an activated carbon positive electrode, designing a 2.8-V system that for the first time exceeded 10 Wh kg–1 (ref. 66). The titanate electrode ensured high power capability and no solid–electrolyte interphase formation, as well as long cycle life thanks to the low volume change during cycling. Following this pioneering work, many studies have been conducted on various combinations of a lithium-insertion electrode with a capacitive carbon electrode. The Li-ion capacitor developed by Fuji Heavy Industry is an example of this concept, using a pre-lithiated high-SSA carbon anode together with an activated carbon cathode63,67. It achieved an energy density of more than 15 Wh kg–1 at 3.8 V. Capacity retention was increased by unbalancing the electrode capacities, allowing a low depth of charge/discharge at the anode. Systems with an activated carbon anode and an anion intercalation cathode are also under development. The advent of nanomaterials55 as well as fast advances in the area of Li-ion batteries should lead to the design of high-performance ECs. Combining newly developed high-rate conversion-reaction anodes or Li-alloying anodes with a positive supercapacitor electrode could fill the gap between Li-ion batteries and EDLCs.
These systems could be of particular interest in applications where high power and medium cycle life are needed.

Current collectors

Because ECs are power devices, their internal resistance must be kept low. Particular attention must be paid to the contact impedance between the active film and the current collector. ECs designed for organic electrolytes use treated aluminium foil or grid current collectors. Surface treatments have already been shown to decrease ohmic drops at this interface68, and coatings on aluminium that improve electrochemical stability at high potentials and interface conductivity are of great interest.

The design of nanostructured current collectors with an increased contact area is another way to control the interface between current collector and active material.

Figure 7 Possible strategies to improve both energy and power densities for electrochemical capacitors. a, b, Decorating activated carbon grains (a) with pseudo-capacitive materials (b). c, d, Achieving conformal deposit of pseudo-capacitive materials (d) onto highly ordered high-surface-area carbon nanotubes (c).

For example, carbon can be produced in a variety of morphologies12, including porous films and nanotube brushes that can be grown on various current collectors69 and that can serve as substrates for further conformal deposition (Fig. 7c and d) of active material. These nano-architectured electrodes could outperform the existing systems by confining a highly pseudo-capacitive material to a thin film with a high SSA, as has been done for Li-ion batteries70 where, by growing Cu nano-pillars on a planar Cu foil, a six-fold improvement in the energy density over planar electrodes has been achieved70.
Long's group64 successfully applied a similar approach to supercapacitors by coating a porous carbon nano-foam with a 20-nm pseudo-capacitive layer of MnO2. As a result, the area-normalized capacitance doubled to reach 1.5 F cm–2, together with an outstanding volumetric capacitance of 90 F cm–3. Electrophoretic deposition from stable colloidal suspensions of RuO2 (ref. 71) or other active material can be used for filling the inter-tube space to design high-energy-density devices that are just a few micrometres thick. The nano-architectured electrodes also find applications in micro-systems, where micro-ECs can complement micro-batteries for energy harvesting or energy generation. In this specific field, it is often advantageous to grow self-supported, binder-less nano-electrodes directly on semiconductor wafers, such as Si or SiC (ref. 72; Fig. 2c).

An attractive material for current collectors is carbon in the form of a highly conductive nanotube or graphene paper. It does not corrode in aqueous electrolytes and is very flexible. The use of nanotube paper for manufacturing flexible supercapacitors is expected to grow as the cost of the small-diameter nanotubes required for making paper decreases. The same thin sheet of nanotubes14 could potentially act as an active material and current collector at the same time. Thin-film, printable and wearable ECs could find numerous applications.

From organic to ionic liquid electrolytes

EC cell voltage is limited by electrolyte decomposition at high potentials. Accordingly, the larger the electrolyte stability voltage window, the higher the supercapacitor cell voltage. Moving from aqueous to organic electrolytes increased the cell voltage from 0.9 V to 2.5–2.7 V for EDLCs.
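The stored energy of a capacitor scales as E = ½CV², so this jump in cell voltage matters far more than its size suggests. A minimal sketch of the arithmetic, using a hypothetical 100-F, 50-g cell (these numbers are illustrative and are not taken from the article):

```python
def specific_energy_wh_per_kg(capacitance_f, voltage_v, mass_kg):
    """E = 0.5 * C * V^2 in joules, converted to Wh and normalized by mass."""
    return 0.5 * capacitance_f * voltage_v ** 2 / 3600.0 / mass_kg

# Same hypothetical cell (100 F, 0.05 kg); only the electrolyte voltage differs.
e_aqueous = specific_energy_wh_per_kg(100, 0.9, 0.05)   # aqueous, 0.9 V
e_organic = specific_energy_wh_per_kg(100, 2.7, 0.05)   # organic, 2.7 V

# The gain depends only on the voltage ratio: (2.7 / 0.9)^2 = 9.
print(e_aqueous, e_organic, e_organic / e_aqueous)
```

Tripling the voltage window multiplies the attainable energy ninefold, which is why the electrolyte work discussed next targets the voltage window rather than the capacitance.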
Because the energy density is proportional to the voltage squared (equation (2)), numerous research efforts have been directed at the design of highly conducting, stable electrolytes with a wider voltage window. Today, the state of the art is the use of organic electrolyte solutions in acetonitrile or propylene carbonate, the latter becoming more popular because of its higher flash point and lower toxicity compared with acetonitrile.

Ionic liquids are room-temperature liquid, solvent-free electrolytes; their voltage window is thus driven only by the electrochemical stability of the ions. A careful choice of both the anion and the cation allows the design of high-voltage supercapacitors, and 3-V, 1,000-F commercial devices are already available73. However, the ionic conductivity of these liquids at room temperature is just a few millisiemens per centimetre, so they are mainly used at higher temperatures. For example, CDC with an EMI/TFSI ionic liquid electrolyte has been shown46 to have a capacitance of 160 F g–1 and ~90 F cm–3 at 60 °C. In this area, hybrid activated carbon/conducting polymer devices also show improved performance, with cell voltages higher than 3 V (refs 74–76).

For applications in the temperature range –30 °C to +60 °C, where batteries and supercapacitors are mainly used, ionic liquids still fail to satisfy the requirements because of their low ionic conductivity. However, the huge variety of possible combinations of anions and cations offers the potential for designing an ionic liquid electrolyte with an ionic conductivity of 40 mS cm–1 and a voltage window of >4 V at room temperature77. A challenge is, for instance, to find an alternative to the imidazolium cation, which, despite its high conductivity, undergoes a reduction reaction at potentials <1.5 V versus Li+/Li.
Replacing the heavy bis(trifluoromethanesulphonyl)imide (TFSI) anion by the lighter bis(fluorosulphonyl)imide (FSI) and preparing ionic liquid eutectic mixtures would improve both the cell voltage (because a protecting layer of AlF3 can be formed on the Al surface, shifting the de-passivation potential of Al above 4 V) and the ionic conductivity77. However, FSI shows poor cyclability at elevated temperatures. Supported by the efforts of the Li-ion community to design safer systems using ionic liquids, research on ionic liquids for ECs is expected to have an important role in the improvement of capacitor performance in the coming years.

Applications of electrochemical capacitors

ECs are electrochemical energy sources with high power delivery and uptake and an exceptional cycle life. They are used when high power demands arise, such as for power-buffer and power-saving units, but are also of great interest for energy recovery. Recent articles by Miller et al.7,10 present an overview of the opportunities for ECs in a variety of applications, complementing an earlier review by Kötz et al.9. Small devices (a few farads) are widely used for power-buffer applications or for memory back-up in toys, cameras, video recorders, mobile phones and so forth. Cordless tools such as screwdrivers and electric cutters using EDLCs are already available on the market. Such systems, using devices of a few tens of farads, can be fully charged or discharged in less than 2 minutes, which is particularly suited to these applications, with the cycle life of the EDLC exceeding that of the tool. As mentioned before, the Airbus A380 jumbo jets use banks of EDLCs for emergency door opening. The modules consist of an in-series/parallel assembly of 100-F, 2.7-V cells that are directly integrated into the doors to limit the use of heavy copper cables.
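The series/parallel arithmetic behind such module assemblies is simple to sketch. The cell values (100 F, 2.7 V) come from the article, but the 10-series × 2-parallel layout below is a made-up example, assuming ideal, identical cells; the actual Airbus module design is not given in the text:

```python
def module_params(cell_c_f, cell_v, n_series, n_parallel):
    """Ideal identical cells: voltages add in series, capacitances add in
    parallel, and a series string divides the per-cell capacitance by n."""
    voltage = cell_v * n_series
    capacitance = cell_c_f * n_parallel / n_series
    energy_j = 0.5 * capacitance * voltage ** 2   # stored energy, joules
    return voltage, capacitance, energy_j

# Hypothetical 10s2p module built from the article's 100-F, 2.7-V cells:
v, c, e = module_params(100, 2.7, n_series=10, n_parallel=2)
print(v, c, e)   # a 27-V, 20-F module
```

Stacking cells in series buys voltage (and hence energy, through the V² term) at the cost of module capacitance, which is why real modules balance string length against the load's current demand.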
This application is obviously a niche market, but it demonstrates that EDLC technology is mature in terms of performance, reliability and safety.

The main market targeted by EDLC manufacturers for the next years is the transportation market, including hybrid electric vehicles, as well as metro trains and tramways. There continues to be debate about the advantage of using high-power Li-ion batteries instead of ECs (or vice versa) for these applications. Most of these discussions have been initiated by Li-ion battery manufacturers who would like their products to cover the whole range of applications. However, ECs and Li-ion batteries should not necessarily be seen as competitors, because their charge-storage mechanisms, and thus their characteristics, are different. The availability of the stored charge will always be faster for a supercapacitor (surface storage) than for a Li-ion battery (bulk storage), with a larger stored energy for the latter. Both devices must be used in their respective time-constant domains (see Fig. 1). Using a Li-ion battery for repeated high-power delivery/uptake applications of short duration (10 s or less) will quickly degrade the cycle life of the system10. The only way to avoid this is to oversize the battery, increasing the cost and volume. In the same way, using ECs for power delivery longer than 10 s requires oversizing. However, some applications use ECs as the main power and energy source, benefiting from the fast charge/discharge capability of these systems as well as their outstanding cycle life.
Several train manufacturers have clearly identified the tramway/metro market segment as extremely relevant for EC use: to power trains over short distances in big cities, where electric cables are clearly undesirable for aesthetic and other reasons, but also to recover the braking energy of another train on the same line, thanks to the ECs' symmetric high power delivery/uptake characteristics.

For automotive applications, manufacturers are already proposing solutions for electrical power steering, where ECs are used for load-levelling in stop-and-go traffic78. The general trend is to increase the degree of hybridization of the engines in hybrid electric vehicles, to allow fast acceleration (boost) and braking-energy recovery. The on-board energy storage systems will be in higher demand, and a combination of batteries and EDLCs will increase the battery cycle life, explaining why EDLCs are viewed as a partner to Li-ion batteries for this market78. Currently, high price limits the use of both Li-ion batteries and EDLCs in large-scale applications (for example, load levelling). But the surprisingly high cost of materials used for EDLCs is due to a limited number of suppliers rather than the intrinsically high cost of porous carbon. Decreasing the price of carbon materials for ECs, including CDC and AC, would remove the main obstacle to their wider use79.

Summary and outlook

The most recent advances in supercapacitor materials include nanoporous carbons with the pore size tuned to fit the size of the electrolyte ions with ångström accuracy, carbon nanotubes for flexible and printable devices with a short response time, and transition-metal oxide and nitride nanoparticles for pseudo-capacitors with a high energy density. An improved understanding of charge storage and ion desolvation in subnanometre pores has helped to overcome a barrier that had been hampering progress in the field for decades.
It has also shown how important it is to match the active materials with specific electrolytes, and to use a cathode and anode with different pore sizes that match the anion or cation size. Nano-architecture of electrodes has led to further improvements in power delivery. The very large number of possible active materials and electrolytes means that better theoretical guidance is needed for the design of future ECs.

Future generations of ECs are expected to come close to current Li-ion batteries in energy density while maintaining their high power density. This may be achieved by using ionic liquids with a voltage window of more than 4 V, by discovering new materials that combine double-layer capacitance and pseudo-capacitance, and by developing hybrid devices. ECs will have a key role in energy storage and harvesting, decreasing the total energy consumption and minimizing the use of hydrocarbon fuels. Capacitive energy storage leads to a lower energy loss (higher cycle efficiency) than batteries, compressed air, flywheels or other devices, helping to improve storage economy further. Flexible, printable and wearable ECs are likely to be integrated into smart clothing, sensors, wearable electronics and drug-delivery systems. In some instances they will replace batteries, but in many cases they will either complement batteries, increasing their efficiency and lifetime, or serve as energy solutions where an extremely large number of cycles, long lifetime and fast power delivery are required.

doi:10.1038/nmat2297

References
1. Conway, B. E. Electrochemical Supercapacitors: Scientific Fundamentals and Technological Applications (Kluwer, 1999).
2. Service, R. F. New 'supercapacitor' promises to pack more electrical punch. Science 313, 902–905 (2006).
3. Tarascon, J.-M. & Armand, M. Issues and challenges facing rechargeable lithium batteries. Nature 414, 359–367 (2001).
4. Brodd, R. J. et al. Batteries, 1977 to 2002.
J. Electrochem. Soc. 151, K1–K11 (2004).
5. Armand, M. & Tarascon, J.-M. Building better batteries. Nature 451, 652–657 (2008).
6. Armand, M. & Johansson, P. Novel weakly coordinating heterocyclic anions for use in lithium batteries. J. Power Sources 178, 821–825 (2008).
7. Miller, J. R. & Simon, P. Electrochemical capacitors for energy management. Science 321, 651–652 (2008).
8. US Department of Energy. Basic Research Needs for Electrical Energy Storage (2007).
9. Kötz, R. & Carlen, M. Principles and applications of electrochemical capacitors. Electrochim. Acta 45, 2483–2498 (2000).
10. Miller, J. R. & Burke, A. F. Electrochemical capacitors: Challenges and opportunities for real-world applications. Electrochem. Soc. Interf. 17, 53–57 (2008).
11. Pandolfo, A. G. & Hollenkamp, A. F. Carbon properties and their role in supercapacitors. J. Power Sources 157, 11–27 (2006).
12. Gogotsi, Y. (ed.) Carbon Nanomaterials (CRC, 2006).
13. Kyotani, T., Chmiola, J. & Gogotsi, Y. in Carbon Materials for Electrochemical Energy Storage Systems (eds Beguin, F. & Frackowiak, E.) Ch. 13 (CRC/Taylor and Francis, in the press).
14. Futaba, D. N. et al. Shape-engineerable and highly densely packed single-walled carbon nanotubes and their application as super-capacitor electrodes. Nature Mater. 5, 987–994 (2006).
15. Portet, C., Chmiola, J., Gogotsi, Y., Park, S. & Lian, K. Electrochemical characterizations of carbon nanomaterials by the cavity microelectrode technique. Electrochim. Acta 53, 7675–7680 (2008).
16. Yang, C.-M. et al. Nanowindow-regulated specific capacitance of supercapacitor electrodes of single-wall carbon nanohorns. J. Am. Chem. Soc. 129, 20–21 (2007).
17. Niu, C., Sichel, E. K., Hoch, R., Moy, D. & Tennent, H. High power electrochemical capacitors based on carbon nanotube electrodes. Appl. Phys. Lett. 70, 1480 (1997).
18. Azaïs, P. et al.
Causes of supercapacitors ageing in organic electrolyte. J. Power Sources 171, 1046–1053 (2007).
19. Gamby, J., Taberna, P. L., Simon, P., Fauvarque, J. F. & Chesneau, M. Studies and characterization of various activated carbons used for carbon/carbon supercapacitors. J. Power Sources 101, 109–116 (2001).
20. Shi, H. Activated carbons and double layer capacitance. Electrochim. Acta 41, 1633–1639 (1995).
21. Qu, D. & Shi, H. Studies of activated carbons used in double-layer capacitors. J. Power Sources 74, 99–107 (1998).
22. Qu, D. Studies of the activated carbons used in double-layer supercapacitors. J. Power Sources 109, 403–411 (2002).
23. Kim, Y. J. et al. Correlation between the pore and solvated ion size on capacitance uptake of PVDC-based carbons. Carbon 42, 1491 (2004).
24. Izutsu, K. Electrochemistry in Nonaqueous Solution (Wiley, 2002).
25. Marcus, Y. Ion Solvation (Wiley, 1985).
26. Jurewicz, K. et al. Capacitance properties of ordered porous carbon materials prepared by a templating procedure. J. Phys. Chem. Solids 65, 287 (2004).
27. Fernández, J. A. et al. Performance of mesoporous carbons derived from poly(vinyl alcohol) in electrochemical capacitors. J. Power Sources 175, 675 (2008).
28. Fuertes, A. B., Lota, G., Centeno, T. A. & Frackowiak, E. Templated mesoporous carbons for supercapacitor application. Electrochim. Acta 50, 2799 (2005).
29. Salitra, G., Soffer, A., Eliad, L., Cohen, Y. & Aurbach, D. Carbon electrodes for double-layer capacitors. I. Relations between ion and pore dimensions. J. Electrochem. Soc. 147, 2486–2493 (2000).
30. Vix-Guterl, C. et al. Electrochemical energy storage in ordered porous carbon materials. Carbon 43, 1293–1302 (2005).
31. Eliad, L., Salitra, G., Soffer, A. & Aurbach, D. On the mechanism of selective electroadsorption of protons in the pores of carbon molecular sieves. Langmuir 21, 3198–3202 (2005).
32. Eliad, L. et al.
Assessing optimal pore-to-ion size relations in the design of porous poly(vinylidene chloride) carbons for EDL capacitors. Appl. Phys. A 82, 607–613 (2006).
33. Arulepp, M. et al. The advanced carbide-derived carbon based supercapacitor. J. Power Sources 162, 1460–1466 (2006).
34. Arulepp, M. et al. Influence of the solvent properties on the characteristics of a double layer capacitor. J. Power Sources 133, 320–328 (2004).
35. Raymundo-Pinero, E., Kierzek, K., Machnikowski, J. & Beguin, F. Relationship between the nanoporous texture of activated carbons and their capacitance properties in different electrolytes. Carbon 44, 2498–2507 (2006).
36. Janes, A. & Lust, E. Electrochemical characteristics of nanoporous carbide-derived carbon materials in various nonaqueous electrolyte solutions. J. Electrochem. Soc. 153, A113–A116 (2006).
37. Shanina, B. D. et al. A study of nanoporous carbon obtained from ZC powders (Z = Si, Ti, and B). Carbon 41, 3027–3036 (2003).
38. Chmiola, J., Dash, R., Yushin, G. & Gogotsi, Y. Effect of pore size and surface area of carbide derived carbon on specific capacitance. J. Power Sources 158, 765–772 (2006).
39. Dash, R. et al. Titanium carbide derived nanoporous carbon for energy-related applications. Carbon 44, 2489–2497 (2006).
40. Urbonaite, S. et al. EELS studies of carbide derived carbons. Carbon 45, 2047–2053 (2007).
41. Gogotsi, Y. et al. Nanoporous carbide-derived carbon with tunable pore size. Nature Mater. 2, 591–594 (2003).
42. Chmiola, J. et al. Anomalous increase in carbon capacitance at pore size below 1 nm. Science 313, 1760–1763 (2006).
43. Huang, J. S., Sumpter, B. G. & Meunier, V. Theoretical model for nanoporous carbon supercapacitors. Angew. Chem. Int. Ed. 47, 520–524 (2008).
44. Huang, J., Sumpter, B. G. & Meunier, V. A universal model for nanoporous carbon supercapacitors applicable to diverse pore regimes, carbons, and electrolytes.
Chem. Eur. J. 14, 6614–6626 (2008).
45. Chmiola, J., Largeot, C., Taberna, P.-L., Simon, P. & Gogotsi, Y. Desolvation of ions in subnanometer pores, its effect on capacitance and double-layer theory. Angew. Chem. Int. Ed. 47, 3392–3395 (2008).
46. Largeot, C. et al. Relation between the ion size and pore size for an electric double-layer capacitor. J. Am. Chem. Soc. 130, 2730–2731 (2008).
47. Weigand, G., Davenport, J. W., Gogotsi, Y. & Roberto, J. in Scientific Impacts and Opportunities for Computing Ch. 5, 29–35 (DOE Office of Science, 2008).
48. Wu, N.-L. Nanocrystalline oxide supercapacitors. Mater. Chem. Phys. 75, 6–11 (2002).
49. Brousse, T. et al. Crystalline MnO2 as possible alternatives to amorphous compounds in electrochemical supercapacitors. J. Electrochem. Soc. 153, A2171–A2180 (2006).
50. Rudge, A., Raistrick, I., Gottesfeld, S. & Ferraris, J. P. Conducting polymers as active materials in electrochemical capacitors. J. Power Sources 47, 89–107 (1994).
51. Zheng, J. P. & Jow, T. R. High energy and high power density electrochemical capacitors. J. Power Sources 62, 155–159 (1996).
52. Lee, H. Y. & Goodenough, J. B. Supercapacitor behavior with KCl electrolyte. J. Solid State Chem. 144, 220–223 (1999).
53. Laforgue, A., Simon, P. & Fauvarque, J.-F. Chemical synthesis and characterization of fluorinated polyphenylthiophenes: application to energy storage. Synth. Met. 123, 311–319 (2001).
54. Naoi, K., Suematsu, S. & Manago, A. Electrochemistry of poly(1,5-diaminoanthraquinone) and its application in electrochemical capacitor materials. J. Electrochem. Soc. 147, 420–426 (2000).
55. Arico, A. S., Bruce, P., Scrosati, B., Tarascon, J.-M. & Schalkwijk, W. V. Nanostructured materials for advanced energy conversion and storage devices. Nature Mater. 4, 366–377 (2005).
56. Choi, D., Blomgren, G. E. & Kumta, P. N.
Fast and reversible surface redox reaction in nanocrystalline vanadium nitride supercapacitors. Adv. Mater. 18, 1178–1182 (2006).
57. Machida, K., Furuuchi, K., Min, M. & Naoi, K. Mixed proton–electron conducting nanocomposite based on hydrous RuO2 and polyaniline derivatives for supercapacitors. Electrochemistry 72, 402–404 (2004).
58. Toupin, M., Brousse, T. & Belanger, D. Charge storage mechanism of MnO2 electrode used in aqueous electrochemical capacitor. Chem. Mater. 16, 3184–3190 (2004).
59. Sugimoto, W., Iwata, H., Yasunaga, Y., Murakami, Y. & Takasu, Y. Preparation of ruthenic acid nanosheets and utilization of its interlayer surface for electrochemical energy storage. Angew. Chem. Int. Ed. 42, 4092–4096 (2003).
60. Miller, J. M., Dunn, B., Tran, T. D. & Pekala, R. W. Deposition of ruthenium nanoparticles on carbon aerogels for high energy density supercapacitor electrodes. J. Electrochem. Soc. 144, L309–L311 (1997).
61. Min, M., Machida, K., Jang, J. H. & Naoi, K. Hydrous RuO2/carbon black nanocomposites with 3D porous structure by novel incipient wetness method for supercapacitors. J. Electrochem. Soc. 153, A334–A338 (2006).
62. Wang, Y., Takahashi, K., Lee, K. H. & Cao, G. Z. Nanostructured vanadium oxide electrodes for enhanced lithium-ion intercalation. Adv. Funct. Mater. 16, 1133–1144 (2006).
63. Naoi, K. & Simon, P. New materials and new configurations for advanced electrochemical capacitors. Electrochem. Soc. Interf. 17, 34–37 (2008).
64. Fischer, A. E., Pettigrew, K. A., Rolison, D. R., Stroud, R. M. & Long, J. W. Incorporation of homogeneous, nanoscale MnO2 within ultraporous carbon structures via self-limiting electroless deposition: Implications for electrochemical capacitors. Nano Lett. 7, 281–286 (2007).
65. Kazaryan, S. A., Razumov, S. N., Litvinenko, S. V., Kharisov, G. G. & Kogan, V. I.
Mathematical model of heterogeneous electrochemical capacitors and calculation of their parameters. J. Electrochem. Soc. 153, A1655–A1671 (2006).
66. Amatucci, G. G., Badway, F. & DuPasquier, A. in Intercalation Compounds for Battery Materials (ECS Proc. Vol. 99) 344–359 (Electrochemical Society, 2000).
67. Burke, A. R&D considerations for the performance and application of electrochemical capacitors. Electrochim. Acta 53, 1083–1091 (2007).
68. Portet, C., Taberna, P. L., Simon, P. & Laberty-Robert, C. Modification of Al current collector surface by sol-gel deposit for carbon-carbon supercapacitor applications. Electrochim. Acta 49, 905–912 (2004).
69. Talapatra, S. et al. Direct growth of aligned carbon nanotubes on bulk metals. Nature Nanotech. 1, 112–116 (2006).
70. Taberna, L., Mitra, S., Poizot, P., Simon, P. & Tarascon, J. M. High rate capabilities Fe3O4-based Cu nano-architectured electrodes for lithium-ion battery applications. Nature Mater. 5, 567–573 (2006).
71. Jang, J. H., Machida, K., Kim, Y. & Naoi, K. Electrophoretic deposition (EPD) of hydrous ruthenium oxides with PTFE and their supercapacitor performances. Electrochim. Acta 52, 1733 (2006).
72. Cambaz, Z. G., Yushin, G., Osswald, S., Mochalin, V. & Gogotsi, Y. Noncatalytic synthesis of carbon nanotubes, graphene and graphite on SiC. Carbon 46, 841–849 (2008).
73. Tsuda, T. & Hussey, C. L. Electrochemical applications of room-temperature ionic liquids. Electrochem. Soc. Interf. 16, 42–49 (2007).
74. Balducci, A. et al. High temperature carbon–carbon supercapacitor using ionic liquid as electrolyte. J. Power Sources 165, 922–927 (2007).
75. Balducci, A. et al. Cycling stability of a hybrid activated carbon//poly(3-methylthiophene) supercapacitor with N-butyl-N-methylpyrrolidinium bis(trifluoromethanesulfonyl)imide ionic liquid as electrolyte. Electrochim. Acta 50, 2233–2237 (2005).
76. Balducci, A., Soavi, F.
& Mastragostino, M. The use of ionic liquids as solvent-free green electrolytes for hybrid supercapacitors. Appl. Phys. A 82, 627–632 (2006).
77. Endres, F., MacFarlane, D. & Abbott, A. (eds) Electrodeposition from Ionic Liquids (Wiley-VCH, 2008).
78. Faggioli, E. et al. Supercapacitors for the energy management of electric vehicles. J. Power Sources 84, 261–269 (1999).
79. Chmiola, J. & Gogotsi, Y. Supercapacitors as advanced energy storage devices. Nanotechnol. Law Bus. 4, 577–584 (2007).
80. Portet, C., Yushin, G. & Gogotsi, Y. Electrochemical performance of carbon onions, nanodiamonds, carbon black and multiwalled nanotubes in electrical double layer capacitors. Carbon 45, 2511–2518 (2007).

Acknowledgements
We thank our students and collaborators, including J. Chmiola, C. Portet, R. Dash and G. Yushin (Drexel University), P. L. Taberna and C. Largeot (Université Paul Sabatier), and J. E. Fischer (University of Pennsylvania) for experimental help and discussions, H. Burnside (Drexel University) for editing the manuscript and S. Cassou (Toulouse) for help with illustrations. This work was partially funded through the Department of Energy, Office of Basic Energy Science, grant DE-FG01-05ER05-01, and through the Délégation Générale pour l'Armement.
In this example, attitudes and habits are potential mediators of the relationship between the management training program and employee satisfaction.

A variable may be called a mediator "to the extent that it accounts for the relation between the predictor and the criterion" (Baron & Kenny, 1986, p. 1176).1 Panel A of Figure 1 represents the effect of some proposed cause (X) on some outcome (Y). Panel B of Figure 1 represents the simplest form of mediation, the type that occurs when one variable (M) mediates the effect of X on Y. We term this model simple mediation. More complex mediation models are possible, but we limit our discussion here to simple mediation because it is by far the most commonly employed type of mediation model.

The simple relationship between X and Y is often referred to as the total effect of X on Y (see Figure 1, panel A); we denote the total effect c to distinguish it from c′, the direct effect of X on Y after controlling for M (see Figure 1, panel B). The formal heuristic analysis often used to detect simple mediation effects is straightforward and follows directly from the definition of a mediator provided by Baron and Kenny (1986). Variable M is considered a mediator if (1) X significantly predicts Y (i.e., c ≠ 0 in Figure 1), (2) X significantly predicts M (i.e., a ≠ 0 in Figure 1), and (3) M significantly predicts Y controlling for X (i.e., b ≠ 0 in Figure 1). Baron and Kenny discuss several analyses that should be performed and the results assessed with respect to the criteria just described. These criteria are assessed by estimating Equations 1-3, where i is an intercept coefficient. When the effect of X on Y decreases to zero with the inclusion of M, perfect mediation is said to have occurred (James & Brett, 1984, call this situation complete mediation).
When the effect of X on Y decreases by a nontrivial amount, but not to zero, partial mediation is said to have occurred.2 In addition to satisfying these requirements, two further assumptions must be met in order to claim that mediation has occurred, according to Baron and Kenny; namely, there should be no measurement error in M, and Y should not cause M. The criteria above are estimated with the following regression equations:

Ŷ = i1 + cX  (1)
M̂ = i2 + aX  (2)
Ŷ = i3 + c′X + bM  (3)

SPSS and SAS procedures for estimating indirect effects in simple mediation models

KRISTOPHER J. PREACHER
University of North Carolina, Chapel Hill, North Carolina

and

ANDREW F. HAYES
Ohio State University, Columbus, Ohio

Copyright 2004 Psychonomic Society, Inc. We thank Nancy Briggs, Donna Coffman, Jinyan Fan, and Robert MacCallum for helpful comments. Correspondence regarding this article should be addressed to K. J. Preacher, Department of Psychology, CB #3270 Davie Hall, University of North Carolina, Chapel Hill, NC 27599-3270 (e-mail: preacher@unc.edu).

Researchers often conduct mediation analysis in order to indirectly assess the effect of a proposed cause on some outcome through a proposed mediator. The utility of mediation analysis stems from its ability to go beyond the merely descriptive to a more functional understanding of the relationships among variables. A necessary component of mediation is a statistically and practically significant indirect effect. Although mediation hypotheses are frequently explored in psychological research, formal significance tests of indirect effects are rarely conducted. After a brief overview of mediation, we argue the importance of directly testing the significance of indirect effects and provide SPSS and SAS macros that facilitate estimation of the indirect effect with a normal theory approach and a bootstrap approach to obtaining confidence intervals, as well as the traditional approach advocated by Baron and Kenny (1986).
We hope that this discussion and the macros will enhance the frequency of formal mediation tests in the psychology literature. Electronic copies of these macros may be downloaded from the Psychonomic Society's Web archive at www.psychonomic.org/archive/.

The first of these assumptions is routinely violated, but that is not the focus of our discussion here. At the end of this article, we shall emphasize that, ultimately, the validity of one's conclusions about mediation is determined by the design of the study as much as by statistical criteria.

Mediation hypotheses are frequently tested in both basic and applied psychological research, and mediation analyses are most often guided by the procedures outlined by Baron and Kenny (1986). For example, in an informal content analysis of the 2000, 2001, and 2002 issues of the Journal of Applied Psychology, we found that 22% of the articles reported an analysis focused on mediation, and the overwhelming majority of them were based on the Baron and Kenny procedure. We believe this to be fairly representative of the major journals in psychology, not only with respect to the frequency of mediation hypotheses but also to the use of the Baron and Kenny criteria for assessing mediation. Indeed, their paper is one of the most frequently cited in the modern psychological literature, with nearly 5,300 citations as of September 2004, according to the Science Citation Index.

There are more statistically rigorous methods by which mediation hypotheses may be assessed. Baron and Kenny (1986) describe a procedure developed by Sobel (1982; hereafter referred to as the Sobel test) that provides a more direct test of an indirect effect. In the case of simple mediation, the Sobel test is conducted by comparing the strength of the indirect effect of X on Y to the point null hypothesis that it equals zero.
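The Baron and Kenny criteria amount to fitting three OLS regressions. A minimal numpy sketch (our own illustration, not the authors' SPSS or SAS macros) simulates a simple mediation model with made-up true paths a = 0.5, b = 0.7, and c′ = 0.2, and recovers the estimates of c, a, b, and c′:

```python
import numpy as np

# Simulate simple mediation: X -> M -> Y plus a direct X -> Y path.
# True paths (hypothetical): a = 0.5, b = 0.7, c' = 0.2.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)              # M = i2 + a*X + error
y = 0.2 * x + 0.7 * m + rng.normal(size=n)    # Y = i3 + c'*X + b*M + error

def ols(response, *predictors):
    """Least-squares coefficients (intercept first) of response on predictors."""
    design = np.column_stack([np.ones_like(response), *predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

c = ols(y, x)[1]                 # total effect: Y regressed on X alone
a = ols(m, x)[1]                 # X -> M path
c_prime, b = ols(y, x, m)[1:3]   # direct effect c' and M -> Y path b

# For OLS estimates the decomposition is exact in-sample: a*b = c - c'.
print(a * b, c - c_prime)
```

The last line illustrates the identity the next paragraph relies on: the estimated indirect effect ab equals the drop in the X coefficient, c − c′, when M enters the model.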
The indirect effect of X on Y in this situation is defined as the product of the X→M path (a) and the M→Y path (b), or ab. In most situations,³ ab = (c − c′), where c is the simple (i.e., total) effect of X on Y, not controlling for M, and c′ is the X→Y path coefficient after the addition of M to the model (see Figure 1). Standard errors of a and b are represented, respectively, by s_a and s_b. The standard error of the indirect effect (s_ab) is given by Aroian (1944), Mood, Graybill, and Boes (1974), and Sobel (1982) as

s_ab = √(b²s_a² + a²s_b² + s_a²s_b²).    (4)

In order to conduct the test, ab is divided by s_ab to yield a critical ratio that is traditionally compared with the critical value from the standard normal distribution appropriate for a given alpha level. One of the assumptions necessary for the Sobel test is that the sample size is large, so the rough critical value for the two-tailed version of the test, assuming that the sampling distribution of ab is normal and that α = .05, is ±1.96. As sample size becomes smaller, the Sobel test becomes less conservative. One variation of the Sobel test subtracts the last term of the standard error (s_a²s_b² in Equation 4) rather than adds it (Goodman, 1960). Another version omits s_a²s_b² altogether because it is likely to be trivial (Baron & Kenny, 1986; Goodman, 1960; MacKinnon & Dwyer, 1993; MacKinnon, Warsi, & Dwyer, 1995; Sobel, 1982). Sobel (1982) describes a general procedure whereby more complicated indirect effects may be tested. The utility and performance of the Sobel test have been discussed and demonstrated frequently (Hoyle & Kenny, 1999; MacKinnon, 1994; MacKinnon & Dwyer, 1993; MacKinnon et al., 2001; MacKinnon et al., 1995; Stone & Sobel, 1990). MacKinnon, Lockwood, Hoffman, West, and Sheets (2002), in their comparison of 14 methods of assessing mediation effects, settle on the Sobel test (and its variants) as superior in terms of power and intuitive appeal.
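Equation 4 and its two variants can be sketched in a few lines of Python. This is our own illustrative sketch (function names are ours), not the authors' SPSS/SAS code:

```python
import math

def sobel_se(a, b, s_a, s_b, version="sobel"):
    """Standard error of the indirect effect ab (Equation 4 and its variants)."""
    core = b**2 * s_a**2 + a**2 * s_b**2
    cross = s_a**2 * s_b**2
    if version == "sobel":      # Equation 4 (Aroian, 1944; Sobel, 1982): add the cross term
        return math.sqrt(core + cross)
    if version == "goodman":    # Goodman (1960): subtract the cross term
        return math.sqrt(core - cross)
    return math.sqrt(core)      # third variant: omit the s_a^2 s_b^2 term altogether

def sobel_z(a, b, s_a, s_b, version="sobel"):
    """Critical ratio, compared with +/-1.96 for a two-tailed test at alpha = .05."""
    return (a * b) / sobel_se(a, b, s_a, s_b, version)
```

For instance, `sobel_z(0.8186, 0.4039, 0.2990, 0.1808)` reproduces the critical ratio of about 1.67 reported for the worked example later in the paper.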
But, as we discuss below, there is reason to be suspicious of the use of the normal distribution for computing the p value for the Sobel test because the sampling distribution of ab may not be normal.

Figure 1. Panel A: Illustration of a direct effect. X affects Y. Panel B: Illustration of a mediation design. X affects Y indirectly through M.

Curiously, the Sobel test is discussed, with requisite formulas, by Baron and Kenny (1986), but it is rarely used in practice (cf. MacKinnon et al., 2002). We cannot say for sure why the significance of the indirect effect is rarely tested formally, but at least two possibilities suggest themselves. First, the statistical significance of the difference between the total effect (c) and the direct effect (c′) of X on Y is not formally stated by Baron and Kenny as a requirement for mediation. Instead, Baron and Kenny simply state that perfect mediation has occurred if c′ becomes nonsignificant after controlling for M, so researchers have focused on that requirement. Second, whereas most popular programs used for regression (such as SPSS and SAS) will conduct all the tests required to establish mediation according to the Baron and Kenny criteria, few of the commonly used programs conduct a test of the null hypothesis that the indirect effect (c − c′) is 0 (or, equivalently, that ab = 0). Although these programs provide all the information needed for the researcher to conduct the Sobel test manually, some extra hand computation is required, and researchers simply may not see the point in bothering with those computations given that the significance of the indirect effect is not listed by Baron and Kenny as one of the criteria for establishing mediation.

Before we proceed, it is important to clarify a potentially confusing point.
Although the terms mediated effects and indirect effects are sometimes used interchangeably, an important distinction should be drawn between them in general (Holmbeck, 1997). A mediated effect is usually thought of as the special case of indirect effects when there is only one intervening variable. However, a conclusion that a mediation effect is present implies that the total effect X→Y was present initially. There is no such assumption in the assessment of indirect effects. It is quite possible to find that an indirect effect is significant even when there is no evidence for a significant total effect. Whether or not the effect also represents mediation should be judged through examination of the total effect. For contrasting views on the requirement that X→Y be significant, see Collins, Graham, and Flaherty (1998), MacKinnon (2000), and Shrout and Bolger (2002).

In the remainder of this article, we provide arguments favoring estimation of the indirect effect of X on Y through M and end with a description of an SPSS macro that will formally test the significance of the indirect effect both parametrically and nonparametrically, while simultaneously providing the output relevant to assessing mediation with the Baron and Kenny criteria, all in a few lines of output. An equivalent SAS version of the macro is also provided. We hope that access to these macros will make it more likely that researchers will include a formal test of the indirect effect as part of simple mediation analyses.

The Need for a Formal Test

Given that Baron and Kenny (1986) provide a conceptually appealing recipe to follow in order to determine the presence or absence of a mediation effect, one may well ask why it is necessary to perform a formal significance test of the indirect effect if the Baron and Kenny criteria have been met. Two broad benefits of formal testing may be suggested. First, there are shortcomings inherent in the Baron and Kenny method.
For example, Holmbeck (2002) points out that it is possible to observe a change from a significant X→Y path to a nonsignificant X→Y path upon the addition of a mediator to the model with a very small change in the absolute size of the coefficient. This pattern of results may lead a researcher to erroneously conclude that a mediation effect is present (Type I error). Conversely, it is possible to observe a large change in the X→Y path upon the addition of a mediator to the model without observing an appreciable drop in statistical significance (Type II error). The latter situation is especially likely to occur when large samples are employed, because those are the conditions under which even small regression weights may remain statistically significant. Finally, it is possible for a Type I error about mediation to occur if either a or b appears to be statistically different from zero when one of them is in fact zero in the population. A Type I error in the test of either a or b (or both) could lead to an incorrect conclusion about mediation.

Second, testing the hypothesis of no difference between the total effect (c) and the direct effect (c′) more directly addresses the mediation hypothesis than does the series of regression analyses recommended by Baron and Kenny (1986). In the case of simple mediation, the indirect effect of X on Y through M is measured as the product of the X→M and M→Y paths (ab), which is equivalent to (c − c′) in most situations. Therefore, a significance test associated with ab should address mediation more directly than a series of separate significance tests not directly involving ab.

In addition, it has been found that the method described by Baron and Kenny (1986) suffers from low statistical power in most situations (MacKinnon et al., 2002). Intuition suggests that this may be the result of the requirement that both the a and b coefficients be statistically significant, according to the Baron and Kenny criteria.
Especially in small samples, it is possible that either the a or the b coefficient (or both) may be nonsignificant only because of low statistical power. If either of these parameters fails to meet the Baron and Kenny criteria even though they are in fact nonzero in the population, the investigator cannot claim mediation by the Baron and Kenny criteria, and thus a Type II error results. In contrast, testing the null hypothesis that (c − c′) = 0 requires one fewer hypothesis test, and thus a Type II error in the testing of mediation would be less likely. Indeed, significance tests involving the product of coefficients, such as the Sobel test, have been found to have greater statistical power than other formal methods of assessing mediation, including the Baron and Kenny approach (MacKinnon et al., 2002). Thus, a more powerful strategy for testing mediation may be to require only (1) that there exists an effect to be mediated (i.e., c ≠ 0) and (2) that the indirect effect be statistically significant in the direction predicted by the mediation hypothesis.

Estimating the Size and Significance of the Indirect Effect

Appendixes A and B contain macros for SPSS and SAS that provide a test of the indirect effect using the Sobel test (the version that uses the standard error in Equation 4) as well as a version that relies on a nonparametric bootstrapping procedure. The macros also provide all the output that one needs in order to assess mediation using the Baron and Kenny (1986) criteria. SPSS and SAS are very widely used throughout the social sciences, and in psychology in particular, and we hope that these macros will increase the likelihood that researchers will conduct a formal test of the significance of the indirect effect in simple mediation models. A macro is a program that will run when a shortcut command is given to execute it.
Rather than running the entire program for each analysis, the program simply needs to be "activated" by running it once or requesting that it be executed first in a batch using the INCLUDE command (see the SPSS or SAS manuals for guidance on the use of the INCLUDE command). The user needs to run the macro only once when SPSS or SAS is first executed; the macro will remain active until the user quits the program. The use of the macros is documented in the appendixes, and electronic copies of the macros themselves can be obtained at http://www.comm.ohio-state.edu/ahayes/sobel.htm.

Suppose an investigator is interested in the effects of a new cognitive therapy on life satisfaction after retirement. Residents of a retirement home diagnosed as clinically depressed are randomly assigned to receive 10 sessions of a new cognitive therapy (X = 1) or 10 sessions of an alternative therapeutic method (X = 0). After Session 8, the positivity of the attributions the residents make for a recent failure experience is assessed (M). Finally, at the end of Session 10, the residents are given a measure of life satisfaction (Y). The question is whether the cognitive therapy's effect on life satisfaction is mediated by the positivity of their causal attributions of negative experiences.

Output of the SPSS version of the macro is displayed in the top half of Figure 2 using hypothetical data.⁴ In accordance with the instructions in Appendix A, this output was generated with the following command:

sobel y = satis / x = therapy / m = attrib / boot = 5000.

The macros provide unstandardized coefficients for regression Equations 1–3 given above and discussed by Baron and Kenny (1986) as required to test mediation. The rows of output are interpreted as follows: b(YX) is the total effect of the independent variable X on the dependent variable Y (c in Figure 1).
This effect is statistically different from zero in this example; residents who received the cognitive therapy felt more satisfied with life after 10 sessions than did those who did not receive the therapy. The next row of the output, b(MX), is the effect of the independent variable on the proposed mediator M (a in Figure 1), also statistically different from zero; residents who received the cognitive therapy made more positive attributions for a recent failure experience. The third row of the output, b(YM.X), is the effect of the mediator on the dependent variable, controlling for the independent variable (b in Figure 1). Residents who made more positive attributions for a prior failure tended to be more satisfied with life, even after controlling for whether or not they received the therapy. Finally, b(YX.M) is the direct effect of the independent variable on the dependent variable, controlling for the mediator (c′ in Figure 1). This effect is not statistically different from zero, indicating no relationship between method of therapy and life satisfaction after controlling for the positivity of attributions for failure. In this example, all of Baron and Kenny's criteria for mediation are established, and the evidence is that positivity of attributions completely mediates the effect of cognitive therapy on life satisfaction.

The output also contains the estimate of the indirect effect of X on Y through M. In this example, the indirect effect is 0.3306, which is both ab in Figure 1, or b(MX) × b(YM.X) from the output, as well as c − c′ in Figure 1, or b(YX) − b(YX.M) from the output. A formal two-tailed test of the significance of this indirect effect follows, based on the assumption that the ratio of the indirect effect to its standard error is normal. From Equation 4, the standard error is estimated as

s_ab = √[(0.4039)²(0.2990)² + (0.8186)²(0.1808)² + (0.2990)²(0.1808)²] = 0.1985.
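This arithmetic is easy to verify; here is a short Python check using the coefficients reported in the macro output (our illustrative sketch with our own variable names, not the authors' code):

```python
import math

# Path coefficients and standard errors reported in the example output
a, s_a = 0.8186, 0.2990   # X -> M path (a) and its standard error
b, s_b = 0.4039, 0.1808   # M -> Y path controlling for X (b) and its standard error

ab = a * b                 # indirect effect; 0.3306 in the output
s_ab = math.sqrt(b**2 * s_a**2 + a**2 * s_b**2 + s_a**2 * s_b**2)  # Equation 4
z = ab / s_ab              # Sobel critical ratio; about 1.67, so p > .05 two-tailed
ci = (ab - 1.96 * s_ab, ab + 1.96 * s_ab)  # normal-theory 95% CI; includes zero
```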
The macro also produces a 95% confidence interval⁵ for the size of the indirect effect, again on the assumption that the sampling distribution of the effect is normal. Whereas the procedure outlined by Baron and Kenny involves combining the results of several hypothesis tests, the Sobel test directly addresses the primary question of interest: whether or not the total effect of X on Y is significantly reduced upon the addition of a mediator to the model.

The Sobel test contradicts the Baron and Kenny (1986) strategy and suggests no mediation (z = 1.67, p > .05). However, there is reason to be suspicious of the results of the Sobel test in this case. As alluded to earlier, the two-tailed p value [under "Sig(two)" in the output] is based on the assumption that the distribution of ab (or c − c′) follows a normal distribution under the null hypothesis. But this assumption has been seriously questioned. Not only is the distribution not necessarily normal, often it is not even symmetrical, especially in small samples (Bollen & Stine, 1990). Because the distribution of products is usually positively skewed, the symmetric confidence interval based on the assumption of normality will typically yield underpowered tests of mediation. As a consequence of these problems, MacKinnon et al. (2002) argue against the use of the normal distribution for assessing significance and instead suggest either comparing the obtained product with a table of critical values established through simulation research, or using an alternative approach that also requires a table of critical values in order to assess significance.
Unfortunately, those tables and the research on which they are based are not published in a source conveniently available to many (Meeker, Cornwell, & Aroian, 1981).

Figure 2. SPSS macro output and a graphical depiction of the bootstrapped sampling distribution of the indirect effect.

An alternative approach is to bootstrap the sampling distribution of ab and derive a confidence interval from the empirically derived bootstrapped sampling distribution. Bootstrapping is a nonparametric approach to effect-size estimation and hypothesis testing that makes no assumptions about the shape of the distributions of the variables or the sampling distribution of the statistic (see, e.g., Efron & Tibshirani, 1993; Mooney & Duval, 1993). This approach has been suggested by others as a way of circumventing the power problem introduced by asymmetries and other forms of nonnormality in the sampling distribution of ab (Bollen & Stine, 1990; Lockwood & MacKinnon, 1998; Shrout & Bolger, 2002). It also produces a test that is not based on large-sample theory, meaning it can be applied to small samples with more confidence. The macros provide a bootstrap estimate of the indirect effect ab, an estimated standard error, and both 95% and 99% confidence intervals for the population value of ab. The bootstrapping is accomplished by taking a large number of samples of size n (where n is the original sample size) from the data, sampling with replacement, and computing the indirect effect, ab, in each sample. Assume for the sake of illustration that 1,000 bootstrap samples have been requested. The point estimate of ab is simply the mean ab computed over the 1,000 samples, and the estimated standard error is the standard deviation of the 1,000 ab estimates. To derive the 95% confidence interval, the elements of the vector of 1,000 estimates of ab are sorted from low to high.
The lower limit of the confidence interval is defined as the 25th score in this sorted distribution, and the upper limit is defined as the 976th score in the distribution. Using the same logic, the upper and lower bounds of a 99% confidence interval correspond to the 5th and 996th scores in the sorted distribution of 1,000 estimates, respectively.

As can be seen in the output, the bootstrapped estimate of the indirect effect is similar to the point estimate computed from the conventional regression analysis of the raw data, and the true indirect effect is estimated to lie between 0.0334 and 0.7008 with 95% confidence. Because zero is not in the 95% confidence interval, we can conclude that the indirect effect is indeed significantly different from zero at p < .05 (two-tailed). Observe the slight asymmetry in the confidence interval, evidenced by the fact that the upper and lower bounds of the confidence interval are not equidistant from the point estimate. Through a modification of the macro as described in Appendix A, the bootstrap estimates of ab can be output as a new data file, and the distribution of the estimates then depicted graphically, as in the lower half of Figure 2. The asymmetry of the sampling distribution of the indirect effect is evident visually, and a formal test that skew = 0 can be rejected in this case. Bollen and Stine (1990) and Shrout and Bolger (2002) provide other examples with real and simulated data sets illustrating that the sampling distribution of the indirect effect is not always symmetrical or normal.

A second example illustrates the distinction between mediation and indirect effects, and how different analytical strategies can produce different results. Suppose patients with Alzheimer's disease are randomly assigned to receive a drug (X = 1) or placebo (X = 0) that purportedly can increase a patient's long-term memory (Y) through its effect on the rate of neural regeneration (M).
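As an aside, the percentile bootstrap described above is easy to sketch in plain Python. This is our own illustrative code (names and simulated data are ours), not the published SPSS/SAS macro:

```python
import random
import statistics as st

def cov(u, v):
    """Sample covariance; cov(u, u) gives the sample variance."""
    mu, mv = st.fmean(u), st.fmean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """OLS estimate of ab: a from regressing M on X; b from Y on X and M."""
    a = cov(x, m) / cov(x, x)
    b = ((cov(m, y) * cov(x, x) - cov(x, m) * cov(x, y))
         / (cov(x, x) * cov(m, m) - cov(x, m) ** 2))
    return a * b

def bootstrap_ci(x, m, y, reps=1000, alpha=.05, seed=12345):
    """Percentile bootstrap CI for ab: resample cases with replacement."""
    rng, n, est = random.Random(seed), len(x), []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        est.append(indirect_effect([x[i] for i in idx],
                                   [m[i] for i in idx],
                                   [y[i] for i in idx]))
    est.sort()
    # e.g., the 25th and 976th of 1,000 sorted estimates for a 95% interval
    return est[round(reps * alpha / 2) - 1], est[round(reps * (1 - alpha / 2))]
```

With simulated data in which X affects Y only through M, the resulting interval excludes zero.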
Figure 3 displays the output from the SPSS macro. As can be seen, neural regeneration could not possibly be a mediator of the drug's effect by the Baron and Kenny (1986) criteria, because the drug has no initial direct effect on memory. However, there is evidence that the drug does have an indirect effect on memory, with the effect occurring through neural regeneration. The positive, albeit nonsignificant, relationship between receipt of the drug and memory (c = 0.27) is smaller after controlling for rate of neural regeneration (c′ = −0.02). The bootstrap output shows that the indirect effect is different from zero with 95% confidence, but the Sobel test (which incorrectly assumes normality of ab) does not. The lower half of Figure 3 clearly shows that the assumption of normality of the sampling distribution is unwarranted. Indeed, a formal statistical test of the normality of the sampling distribution estimated with the bootstrap leads to a rejection of that assumption.

DISCUSSION

Our discussion and the macros presented here apply only to the case of the simple mediation model, depicted in Figure 1, panel B. Many extensions to the simple mediation model are, of course, possible, as noted earlier. MacKinnon, Krull, and Lockwood (2000) have demonstrated that mediation, suppression, and confounding effects are mathematically equivalent, although they are assessed by looking for different patterns of relationships among variables.
Given this equivalence, the method and macros described here for the determination of mediation effects may also be useful in the context of determining the presence and strength of suppression or confounding effects.

In addition, it has been recommended that structural equation modeling (SEM) be considered for assessing mediation because it offers a reasonable way to control for measurement error as well as some interesting alternative ways to explore the mediation effect (Baron & Kenny, 1986; Holmbeck, 1997; Hoyle & Kenny, 1999; Judd & Kenny, 1981; Kline, 1998). Models involving latent variables with multiple measured indicators inherently correct for measurement error by estimating common and unique variance separately. This, in turn, increases the likelihood that indirect effects, if present, will be discovered. More complicated mediation models, such as those with several mediators linked serially or operating in parallel (or both), can be explored in the context of SEM with any combination of latent or measured variables. The normal theory approach developed by Sobel (1982) has been incorporated in popular SEM software applications such as LISREL (Jöreskog & Sörbom, 1996) and EQS (Bentler, 1997), and it is discussed in the context of SEM by Bollen (1987) and Brown (1997). A bootstrapping approach to assessing indirect effects is implemented in the current version of AMOS (Arbuckle & Wothke, 1999). In addition, Shrout and Bolger (2002) provide syntax that enables EQS to conduct tests of indirect effects with bootstrapping. The macros described in this article bring this method of analysis to users of SPSS and SAS with a simple command, but researchers should be aware that options exist for exploring mediation in more complex models.

Finally, it is important to emphasize that finding a statistically significant indirect effect supportive of mediation does not prove the pattern of causation shown in panel B of Figure 1.
For example, a model similar to that in panel B, but with the a path reversed in direction, may be theoretically equally reasonable to specify. The two models could be distinguished only on the basis of the causal priority of X and M. This causal priority may be established in a number of ways, such as (1) manipulating X and measuring M, (2) measuring X before M, allowing enough time for X to exert an effect on M, or (3) arguing on the basis of theory or prior research that X is always causally prior to M. As in almost any scientific undertaking, the results of a statistical analysis can only disprove or lend support to a hypothesis, never prove it.

Figure 3. SPSS macro output showing an indirect effect without satisfying the Baron and Kenny criteria for mediation.

REFERENCES

American Psychological Association (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Arbuckle, J. L., & Wothke, W. (1999). AMOS 4.0 user's guide. Chicago: SPSS.
Aroian, L. A. (1944). The probability function of the product of two normally distributed variables. Annals of Mathematical Statistics, 18, 265-271.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality & Social Psychology, 51, 1173-1182.
Bentler, P. M. (1997). EQS for Windows (Version 5.6) [Computer software]. Encino, CA: Multivariate Software.
Bollen, K. A. (1987). Total, direct, and indirect effects in structural equation models. In C. C. Clogg (Ed.), Sociological methodology 1987 (pp. 37-69). Washington, DC: American Sociological Association.
Bollen, K. A., & Stine, R. (1990). Direct and indirect effects: Classical and bootstrap estimates of variability. Sociological Methodology, 20, 115-140.
Brown, R. L. (1997).
Assessing specific mediational effects in complex theoretical models. Structural Equation Modeling, 4, 142-156.
Collins, L. M., Graham, J. W., & Flaherty, B. P. (1998). An alternative framework for defining mediation. Multivariate Behavioral Research, 33, 295-312.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Boca Raton, FL: Chapman & Hall.
Goodman, L. A. (1960). On the exact variance of products. Journal of the American Statistical Association, 55, 708-713.
Holmbeck, G. N. (1997). Toward terminological, conceptual, and statistical clarity in the study of mediators and moderators: Examples from the child-clinical and pediatric psychology literatures. Journal of Consulting & Clinical Psychology, 65, 599-610.
Holmbeck, G. N. (2002). Post-hoc probing of significant moderational and mediational effects in studies of pediatric populations. Journal of Pediatric Psychology, 27, 87-96.
Hoyle, R. H., & Kenny, D. A. (1999). Sample size, reliability, and tests of statistical mediation. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 195-222). Thousand Oaks, CA: Sage.
James, L. R., & Brett, J. M. (1984). Mediators, moderators, and tests for mediation. Journal of Applied Psychology, 69, 307-321.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8 user's reference guide. Uppsala, Sweden: Scientific Software International.
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treatment evaluations. Evaluation Review, 5, 602-619.
Kline, R. B. (1998). Principles and practice of structural equation modeling. New York: Guilford.
Lockwood, C. M., & MacKinnon, D. P. (1998). Bootstrapping the standard error of the mediated effect. Proceedings of the 23rd annual meeting of SAS Users Group International (pp. 997-1002). Cary, NC: SAS Institute.
MacKinnon, D. P. (1994). Analysis of mediating variables in prevention and intervention research. In A. Cazares & L. A. Beatty (Eds.), Scientific methods for prevention intervention research (NIDA Research Monograph 139, DHHS Pub. No. 94-3631, pp. 127-153). Washington, DC: U.S. Government Printing Office.
MacKinnon, D. P. (2000). Contrasts in multiple mediator models. In J. S. Rose, L. Chassin, C. C. Presson, & S. J. Sherman (Eds.), Multivariate applications in substance use research (pp. 141-160). Mahwah, NJ: Erlbaum.
MacKinnon, D. P., & Dwyer, J. H. (1993). Estimating mediated effects in prevention studies. Evaluation Review, 17, 144-158.
MacKinnon, D. P., Goldberg, L., Clarke, G. N., Elliot, D. L., Cheong, J., Lapin, A., Moe, E., & Krull, J. L. (2001). Mediating mechanisms in a program to reduce intentions to use anabolic steroids and improve exercise self-efficacy and dietary behavior. Prevention Science, 2, 15-28.
MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1, 173-181.
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7, 83-104.
MacKinnon, D. P., Warsi, G., & Dwyer, J. H. (1995). A simulation study of mediated effect measures. Multivariate Behavioral Research, 30, 41-62.
Meeker, W. Q., Cornwell, L. W., & Aroian, L. A. (1981). Selected tables in mathematical statistics: Vol. VII. The product of two normally distributed random variables. Providence, RI: American Mathematical Society.
Mood, A., Graybill, F. A., & Boes, D. C. (1974). Introduction to the theory of statistics. New York: McGraw-Hill.
Mooney, C. Z., & Duval, R. D. (1993). Bootstrapping: A nonparametric approach to statistical inference. Newbury Park, CA: Sage.
Rozeboom, W. W. (1956). Mediation variables in scientific theory. Psychological Review, 63, 249-264.
Shrout, P. E., & Bolger, N. (2002).
Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7, 422-445.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhart (Ed.), Sociological methodology 1982 (pp. 290-312). San Francisco: Jossey-Bass.
Stone, C. A., & Sobel, M. E. (1990). The robustness of estimates of total indirect effects in covariance structure models estimated by maximum likelihood. Psychometrika, 55, 337-352.
Wilkinson, L., & APA Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

NOTES

1. Rozeboom (1956) coined the term mediation to describe a particular pattern of linear prediction among measured variables, but Judd and Kenny (1981) and Baron and Kenny (1986) are mainly responsible for popularizing mediation models in psychology.

2. We regard the determination of complete versus partial mediation as relying on the pattern of observed coefficients in Figure 1, panel B. Another way to test a complete mediation model is to estimate the model X→M→Y as a structural equation model, constraining the X→Y path to zero. If the χ² statistic is significant, then constraining the X→Y path to zero is regarded as unreasonable given the data, ruling out the possibility of complete mediation by Baron and Kenny's criteria.

3. It is straightforward to show how ab = (c − c′). MacKinnon, Warsi, and Dwyer (1995) provide the following simple proof:

c = Cov(Y, X)/Var(X)
  = [c′Var(X) + b Cov(X, M)]/Var(X)
  = c′ + b[Cov(X, M)/Var(X)]
  = c′ + ab.

Therefore, ab = (c − c′), and a test of the significance of the former is equivalent to a test of the latter. The only assumptions necessary for this
equality to hold are that there are no missing data and the model is a saturated simple mediation model, such as can be specified in linear regression, path analysis, and structural equation modeling. If there are missing data, or if parameters are estimated in a multilevel modeling context (for example), this equality no longer holds, although ab is likely to be quite close to (c − c′).

4. The data set used in this example is available at http://www.comm.ohio-state.edu/ahayes/sobel.htm.

5. In line with recommendations of the APA Task Force on Statistical Inference (Wilkinson & the APA Task Force, 1999) and recommendations in the latest APA Publication Manual (American Psychological Association, 2001), we strongly encourage the use of confidence intervals in the investigation of mediation hypotheses and reporting of results.

ARCHIVED MATERIALS

The following materials and links may be accessed through the Psychonomic Society's Norms, Stimuli, and Data archive, http://www.psychonomic.org/archive/.

To access these files and links, search the archive using the journal (Behavior Research Methods, Instruments, & Computers), the first author's name (Preacher), and the publication year (2004).

File: Preacher-BRMIC-2004.zip
Description: The compressed archive file contains three files:
sobel_spss.txt, containing the SPSS macro developed by Preacher and Hayes (2004). Instructions for using this macro can be found in the file sobel_instr.txt.
sobel_sas.txt, containing the SAS macro developed by Preacher and Hayes (2004).
Instructions for using this macro can be found in the file sobel_instr.txt.
sobel_instr.txt, containing instructions for using both the SPSS and SAS versions of the macro developed by Preacher and Hayes (2004).

Link: http://www.comm.ohio-state.edu/ahayes/sobel.htm.
Description: The authors' Web site, with macros and instructions.

Authors' e-mail addresses: preacher@unc.edu; hayes.338@osu.edu.
Authors' Web sites: http://www.unc.edu/~preacher/; http://www.comm.ohio-state.edu/ahayes/.

APPENDIX A
Instructions for Use of SPSS Macro

To activate the macro, execute the command set at the end of this appendix by typing it verbatim into an SPSS syntax file or downloading an electronic copy from http://www.comm.ohio-state.edu/ahayes/sobel.htm. Once the command set is executed, a new SPSS syntax command, sobel, will be available for use. This command is available until SPSS is closed. To run the mediation analysis on a data set, execute the following command in SPSS:

SOBEL y = yvar/x = xvar/m = mvar/boot = z.

where yvar is the name of the dependent variable in your data file, xvar is the name of the independent variable, mvar is the name of the proposed mediating variable, and z specifies the number of bootstrap resamples desired, in increments of 1,000 up to a maximum of 1,000,000. For example, if z is set to 3,000, the bootstrap estimates will be based on 3,000 resamples. If z is set to 0 (or any number less than 1,000), the bootstrapping module is deactivated.

All four of these arguments must be provided. Any cases that are system missing on any of the three variables will be deleted from the mediation analysis (i.e., listwise deletion), but they will remain in the active SPSS data file. If the user desires any kind of imputation of missing values, imputation must be completed prior to running the sobel command.
The SPSS matrix language does not recognize user-defined missing values, so any cases with user-defined missing values will be treated as valid data.

There are no error-checking procedures in the macro, so the output should be examined carefully to ensure there are no errors printed. The most likely causes of errors include entering the command (or the original macro) incorrectly, using a variable that is actually a constant in the data file, or requesting a bootstrapped estimate when the original sample is very small. The latter error stems from the fact that bootstrap resampling is done with replacement, and it is possible for a variable resulting from a bootstrap sample to end up being a constant even though none of the variables are actually constants. The minimum sample size will depend on a number of factors, but in testing, the macro usually worked as long as n was at least 25 or so. Depending upon processor speed and the size of the sample, it may appear that SPSS has locked up or crashed once the SOBEL command is executed. Be patient.

Because bootstrapping is based on random sampling from the data set, each run of the program will generate slightly different estimates of the indirect effect and its standard error, and the upper and lower bounds of confidence intervals will vary from run to run. The larger the number of bootstrap samples taken, the less variable these estimates will be over consecutive runs of the program. However, it is possible to replicate a set of bootstrap resamples by setting the random number seed prior to executing the SOBEL command. This is accomplished by preceding the SOBEL command with the command SET SEED seedval, where seedval is a number between 1 and 2,000,000. If the same seed and number of bootstrap samples are requested over multiple runs on the same data, the output from those runs will be identical.
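The procedure the macro carries out — estimate path a (M regressed on X) and path b (M's coefficient when Y is regressed on X and M), form the product ab with its Sobel standard error, then resample the rows with replacement and read percentile confidence limits off the sorted bootstrap estimates — can be sketched in plain Python. This is an illustrative analogue under those stated steps, not the published macro; the function names (indirect_effect, bootstrap_ci) and argument defaults are ours:

```python
# Illustrative sketch of the macro's logic, not the published SPSS/SAS code.
import math
import random

def indirect_effect(x, m, y):
    """Return (a, b, c_prime, ab, se_ab) for the simple mediation model.

    a  : slope from regressing M on X
    b  : slope for M in the regression of Y on X and M
    c' : slope for X in that same regression (the direct effect)
    ab : the indirect effect; se_ab is its Sobel standard error,
         sqrt(b^2*sa^2 + a^2*sb^2 + sa^2*sb^2), the form used for seind.
    """
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    cx = [v - mx for v in x]
    cm = [v - mm for v in m]
    cy = [v - my for v in y]
    sxx = sum(v * v for v in cx)
    smm = sum(v * v for v in cm)
    syy = sum(v * v for v in cy)
    sxm = sum(p * q for p, q in zip(cx, cm))
    sxy = sum(p * q for p, q in zip(cx, cy))
    smy = sum(p * q for p, q in zip(cm, cy))
    # Path a (M on X) and its squared standard error.
    a = sxm / sxx
    sa2 = ((smm - a * sxm) / (n - 2)) / sxx
    # Paths c' and b from the two-predictor normal equations.
    det = sxx * smm - sxm * sxm
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    sse = syy - c_prime * sxy - b * smy
    sb2 = (sse / (n - 3)) * sxx / det
    se_ab = math.sqrt(b * b * sa2 + a * a * sb2 + sa2 * sb2)
    return a, b, c_prime, a * b, se_ab

def bootstrap_ci(x, m, y, btn=1000, seed=12345, level=0.95):
    """Percentile bootstrap CI for ab; a fixed seed makes runs replicable."""
    rng = random.Random(seed)
    n = len(x)
    est = []
    for _ in range(btn):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows w/ replacement
        est.append(indirect_effect([x[i] for i in idx],
                                   [m[i] for i in idx],
                                   [y[i] for i in idx])[3])
    est.sort()
    alpha = 1.0 - level
    # Order statistics from the sorted estimates, e.g., for a 95% interval
    # from 1,000 resamples, the 25th and 976th sorted values.
    return est[int(alpha / 2 * btn) - 1], est[int((1 - alpha / 2) * btn)]
```

The fixed seed plays the role of SET SEED above. Because this sketch has complete data and a saturated model, ab equals c - c' exactly (cf. note 3). As the appendix warns, with a very small sample a bootstrap resample can turn a variable into a constant; in this sketch that would surface as a division by zero.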
726 PREACHER AND HAYES

APPENDIX A (Continued)

It is possible to save the bootstrapped estimates of the indirect effect as an SPSS data file for later examination. To do this, the following command should be added just before the END MATRIX command at the end of the macro: SAVE RES/OUTFILE = filename, where filename is any valid SPSS file name, including its storage path or file handle. Each command in the macro should be typed on a single line, even if it appears on multiple lines in the code below. Hit the return key when typing in the macro only after a command-terminating period (".").

DEFINE SOBEL (y = !charend('/')/x = !charend('/')/m = !charend('/')/boot = !charend('/')).
SET MXLOOPS = 10000001.
MATRIX.
/* READ ACTIVE SPSS DATA FILE */.
get dd/variables = !y !x !m/MISSING = OMIT.
compute n = nrow(dd).
/* DEFINE NUMBER OF BOOTSTRAP SAMPLES */.
do if (!boot > 999).
compute btn = trunc(!boot/1000)*1000.
compute btnp = btn+1.
else.
compute btn = 1000.
compute btnp = btn+1.
end if.
compute res = make(btnp,1,0).
compute dat = dd.
/* START OF THE LOOP FOR BOOTSTRAPPING */.
loop #j = 1 to btnp.
do if (#j = 2 and !boot < 1000).
BREAK.
end if.
/* DO THE RESAMPLING OF THE DATA */.
do if (#j > 1).
loop #m = 1 to n.
compute v = trunc(uniform(1,1)*n)+1.
compute dat(#m,1:3) = dd(v,1:3).
end loop.
end if.
/* SET UP THE DATA COLUMNS FOR PROCESSING */.
compute y = dat(:,1).
compute x = dat(:,2).
compute z = dat(:,3).
compute xz = dat(:,2:3).
/* CALCULATE REGRESSION STATISTICS NEEDED TO COMPUTE c-c' */.
/* c-c' is held as variable 'ind' */.
compute con = make(n,1,1).
compute xo = {con,x}.
compute bzx = inv(t(xo)*xo)*t(xo)*z.
compute bzx = bzx(2,1).
compute xzo = {con,xz}.
compute byzx2 = inv(t(xzo)*xzo)*t(xzo)*y.
compute byzx = byzx2(3,1).
compute byxz = byzx2(2,1).
compute ind = bzx*byzx.
compute res(#j,1) = ind.
/* GENERATE STATISTICS FOR BARON AND KENNY AND NORMAL SOBEL SECTION OF OUTPUT */.
do if (#j = 1).
compute sd = sqrt(((n*cssq(dat))-(csum(dat)&**2))/((n-1)*n)).
compute num = (n*sscp(dat)-(transpos(csum(dat))*(csum(dat)))).
compute den = sqrt(transpos((n*cssq(dat))-(csum(dat)&**2))*((n*cssq(dat))-(csum(dat)&**2))).
compute r = num&/den.
compute sdbzx = (sd(1,3)/sd(1,2))*sqrt((1-(r(3,2)*r(3,2)))/(n-2)).
compute ryi = r(2:3,1).
compute rii = r(2:3,2:3).
compute bi = inv(rii)*ryi.
compute rsq = t(ryi)*bi.
compute sec = sqrt((1-rsq)/(n-3))*sqrt(1/(1-(r(3,2)*r(3,2)))).
compute sdyzx = (sd(1,1)/sd(1,3))*sec.
compute sdyxz = (sd(1,1)/sd(1,2))*sec.
compute seind = sqrt(((byzx*byzx)*(sdbzx*sdbzx))+((bzx*bzx)*(sdyzx*sdyzx))+((sdbzx*sdbzx)*(sdyzx*sdyzx))).
compute byx = r(2,1)*sd(1,1)/sd(1,2).
compute sebyx = (sd(1,1)/sd(1,2))*sqrt((1-(r(2,1)*r(2,1)))/(n-2)).
compute se = {sebyx; sdbzx; sdyzx; sdyxz}.
compute bb = {byx; bzx; byzx; byxz}.
compute tt = bb&/se.
compute p = 2*(1-tcdf(abs(tt),n-2)).
compute p(3,1) = 2*(1-tcdf(abs(tt(3,1)),n-3)).
compute p(4,1) = 2*(1-tcdf(abs(tt(4,1)),n-3)).
compute tst = ind/seind.
compute bw = {bb,se,tt,p}.
compute p2 = 2*(1-cdfnorm(abs(tst))).
compute LL95 = ind-1.96*seind.
compute UL95 = ind+1.96*seind.
compute op = {ind, seind, LL95, UL95, tst, p2}.
end if.
end loop.
/* END OF BOOTSTRAPPING LOOP */.
/* COMPUTE MEAN AND STANDARD DEV OF INDIRECT EFFECT ACROSS BOOTSTRAP SAMPLES */.
compute res = res(2:btnp,1).
compute mnbt = csum(res)/btn.
compute se = (sqrt(((btn*cssq(res))-(csum(res)&**2))/((btn-1)*btn))).
/* SORT THE BOOTSTRAP ESTIMATES */.
do if (!boot > 999).
compute res = {-999;res}.
loop #i = 2 to btnp.
compute ix = res(#i,1).
loop #k = #i to 2 by -1.
compute k = #k.
do if (res(#k-1,1) > ix).
compute res(#k,1) = res(#k-1,1).
else if (res(#k-1,1) <= ix).
BREAK.
end if.
end loop.
compute res(k,1) = ix.
end loop.
compute res = res(2:btnp,1).
end if.
/* GENERATE BOOTSTRAP CONFIDENCE INTERVAL FOR
INDIRECT EFFECT */.
compute lower99 = res(.005*btn,1).
compute lower95 = res(.025*btn,1).
compute upper95 = res(1+.975*btn,1).
compute upper99 = res(1+.995*btn,1).
compute bt = {mnbt, se, lower95, upper95, lower99, upper99}.
/* GENERATE OUTPUT */.
print bw/title = "DIRECT AND TOTAL EFFECTS"/clabels = "Coeff" "s.e." "t" "Sig(two)"/rlabels = "b(YX)" "b(MX)" "b(YM.X)" "b(YX.M)"/format f9.4.
print op/title = "INDIRECT EFFECT AND SIGNIFICANCE USING NORMAL DISTRIBUTION"/rlabels = "Sobel"/clabels = "Value" "s.e." "LL 95 CI" "UL 95 CI" "Z" "Sig(two)"/format f9.4.
do if (!boot > 999).
print bt/title = "BOOTSTRAP RESULTS FOR INDIRECT EFFECT"/rlabels = "Effect"/clabels = "Mean" "s.e." "LL 95 CI" "UL 95 CI" "LL 99 CI" "UL 99 CI"/format f9.4.
print n/title = "SAMPLE SIZE"/format F8.0.
print btn/title = "NUMBER OF BOOTSTRAP RESAMPLES"/format F8.0.
end if.
END MATRIX.
!END DEFINE.

APPENDIX B
Instructions for Use of SAS Macro

The procedures for using the SAS version of the macro are largely the same as for the SPSS version. The user should first execute the command set at the end of this appendix (available online at http://www.comm.ohio-state.edu/ahayes/sobel.htm). This will activate a command called %sobel, with syntax:

%sobel(data=file, y=dv, x=iv, m=med, boot=z);

where file is the name of a SAS data file containing the data to be analyzed, dv is the name of the dependent variable in the data file, iv is the name of the independent variable, med is the name of the proposed mediating variable, and z specifies the number of bootstrap resamples desired. Except for command format, usage is the same as for the SPSS version of the macro.

The macro will exclude all cases from the analysis missing on any of the three variables, where missing is defined as the period character ("."). There is no error checking in the macro, so examine the log file carefully for errors.
It will be obvious if an error occurs because a line marked ERROR will appear in the SAS log file. The same conditions described in Appendix A will produce errors in the SAS version of the macro.

To save the bootstrapped estimates of the indirect effect as a SAS data file for later examination, the following commands should be added just before the quit command at the end of the macro:

create filename from res [colname='indirect'];
append from res;

where filename is any valid SAS file name.

As currently listed, the random number generator will be seeded randomly. To set the seed, thus allowing you to replicate a set of bootstrap samples, change the "0" in the line that reads v = int(uniform(0)*n)+1 to any positive integer less than 2^32 - 1.

%macro sobel(data=,y=,x=,m=,boot=);
/* READ ACTIVE SAS DATA FILE */
proc iml;
use &data where (&y ^= . & &x ^= . & &m ^= .);
read all var {&y &x &m};
n = nrow(&y);
dd = &y||&x||&m;
dat = dd;
/* DEFINE NUMBER OF BOOTSTRAP SAMPLES */
if (&boot > 999) then do;
btn = floor(&boot/1000)*1000;
btnp = btn+1;
end;
if (&boot < 1000) then do;
btn = 1000;
btnp = btn+1;
end;
res = j(btnp,1,0);
/* START OF THE LOOP FOR BOOTSTRAPPING */
do mm = 1 to btnp;
/* DO THE RESAMPLING OF THE DATA */
if (mm > 1) then if (&boot > 999) then do;
do nn = 1 to n;
v = int(uniform(0)*n)+1;
dat[nn,1:3] = dd[v,1:3];
end;
end;
con = j(n,1,1);
/* SET UP THE DATA COLUMNS FOR PROCESSING */
x = dat[,2];
y = dat[,1];
m = dat[,3];
xt = dat-J(n,1)*dat[:,];
cv = (xt`*xt)/(n-1);
sd = sqrt(diag(cv));
r = inv(sd)*cv*inv(sd);
/* CALCULATE REGRESSION STATISTICS NEEDED TO COMPUTE c-c' */
/* c-c' is held as variable 'ind' */
xo = con||x;
bzx = inv(xo`*xo)*xo`*m;
bzx = bzx[2,1];
xzo = con||x||m;
byzx2 = inv(xzo`*xzo)*xzo`*y;
byzx = byzx2[3,1];
byxz = byzx2[2,1];
ind = bzx*byzx;
res[mm,1] = ind;
/* GENERATE STATISTICS FOR BARON AND KENNY AND NORMAL SOBEL SECTION OF OUTPUT */
if (mm = 1)
then do;
sdbzx = (sd[3,3]/sd[2,2])*sqrt((1-(r[3,2]*r[3,2]))/(n-2));
ryi = r[2:3,1];
rii = r[2:3,2:3];
bi = inv(rii)*ryi;
rsq = ryi`*bi;
sec = sqrt((1-rsq)/(n-3))*sqrt(1/(1-(r[3,2]*r[3,2])));
sdyzx = (sd[1,1]/sd[3,3])*sec;
sdyxz = (sd[1,1]/sd[2,2])*sec;
seind = sqrt(((byzx*byzx)*(sdbzx*sdbzx))+((bzx*bzx)*(sdyzx*sdyzx))+((sdbzx*sdbzx)*(sdyzx*sdyzx)));
byx = r[2,1]*sd[1,1]/sd[2,2];
sebyx = (sd[1,1]/sd[2,2])*sqrt((1-(r[2,1]*r[2,1]))/(n-2));
se = sebyx//sdbzx//sdyzx//sdyxz;
bb = byx//bzx//byzx//byxz;
tt = bb/se;
df = j(4,1,n-2);
df[3,1] = n-3;
df[4,1] = n-3;
p = 2*(1-probt(abs(tt),df));
bw = bb||se||tt||p;
tst = ind/seind;
pv = 2*(1-probnorm(abs(tst)));
LL95 = ind-1.96*seind;
UL95 = ind+1.96*seind;
op = ind||seind||LL95||UL95||tst||pv;
end;
end;
/* END OF BOOTSTRAPPING LOOP */
/* COMPUTE MEAN AND STANDARD DEV OF INDIRECT EFFECT ACROSS BOOTSTRAP SAMPLES */
res = res[2:btnp,1];
mnbt = sum(res)/btn;
res = -999//res;
/* SORT THE BOOTSTRAP ESTIMATES */
do i = 2 to btnp;
ix = res[i,1];
do k = i to 2 by -1;
m = k;
if res[k-1,1] > ix then do;
res[k,1] = res[k-1,1];
end;
else if res[k-1,1] <= ix then do;
goto stpit;
end;
end;
stpit: res[m,1] = ix;
end;
res = res[2:btnp,1];
btpt = sum(abs(res) >= abs(op[1,1]))/btn;
if op[1,1] <= 0 then do;
btpo = sum(res <= op[1,1])/btn;
end;
else if op[1,1] >= 0 then do;
btpo = sum(res >= op[1,1])/btn;
end;
/* GENERATE BOOTSTRAP CONFIDENCE INTERVAL FOR INDIRECT EFFECT */
lower99 = res[.005*btn,1];
lower95 = res[.025*btn,1];
upper95 = res[1+.975*btn,1];
upper99 = res[1+.995*btn,1];
xt = res-J(btn,1)*res[:,];
cv = (xt`*xt)/(btn-1);
se = sqrt(diag(cv));
bt = mnbt||se||lower95||upper95||lower99||upper99;
/* GENERATE OUTPUT */
rn = {"b(YX)" "b(MX)" "b(YM.X)" "b(YX.M)"};
cn = {"Coeff" "s.e." "t" "Sig(Two)"};
print "DIRECT AND TOTAL EFFECTS";
print bw [rowname = rn colname = cn format = 9.4];
rn = {"Sobel"};
cn = {"Value" "s.e." "LL 95 CI" "UL 95 CI" "z" "Sig(Two)"};
print "ESTIMATE AND TEST OF
INDIRECT EFFECT";
print op [rowname = rn colname = cn format = 9.4];
if (&boot > 999) then do;
print "BOOTSTRAP RESULTS FOR INDIRECT EFFECT";
rn = {"Effect"};
cn = {"Mean" "s.e." "LL 95 CI" "UL 95 CI" "LL 99 CI" "UL 99 CI"};
print bt [rowname = rn colname = cn format = 9.4];
print "NUMBER OF BOOTSTRAP RESAMPLES" btn;
end;
print "SAMPLE SIZE" n;
quit;
%mend sobel;

(Manuscript received February 18, 2003;
revision accepted for publication May 9, 2004.)" -}