|
|
|
Fredric M. Wolf, Ph.D. |
|
University of Washington |
|
Dept of Medical Education
& Biomedical Informatics |
|
http://www.dme.washington.edu/ |
|
|
|
|
|
|
To provide a conceptual understanding of the
quantitative methods used to synthesize evidence across independent trials
or studies |
|
|
|
|
|
|
Methods for pooling evidence across independent
studies |
|
Pooling binary and continuous outcomes |
|
Differences between fixed and random effects
models |
|
Guidelines for appraising published systematic
reviews/meta-analyses |
|
|
|
|
|
|
What is a systematic review/meta-analysis? |
|
History of meta-analysis (very abbreviated) |
|
Methods for pooling evidence across independent
studies - examples |
|
Pooling binary and continuous outcomes |
|
|
|
|
Cumulative meta-analysis vs. standard
meta-analysis |
|
Differences between fixed and random effects
models; heterogeneity |
|
Sources of bias |
|
Guidelines for appraising published systematic
reviews/meta-analyses |
|
|
|
|
|
|
|
|
Why do it? Why pool? |
|
What is a systematic review/meta-analysis? |
|
Brief history of meta-analysis (very
abbreviated) |
|
Examples |
|
|
|
|
|
|
“. . . the mass of new information makes
it difficult for practicing physicians to follow the literature in all
areas that might be relevant to their practices. New methods to synthesize and present information from widely
dispersed publications are needed
. . . .”
Jerome Kassirer. Clinical trials
and meta-analysis: what do they do for us? N Engl J Med 1992; 327:273-4. |
|
|
|
|
10-fold Increase in Number of Professional
Journals |
|
Psychology Journals:
91 (1951) --> 1,175 (1992) |
|
Math Science Journals:
91 (1953) --> 920 (1992) |
|
Biomedical Journals:
2,300 (1940) --> 23,000 (1993) |
|
|
|
|
Not only is there more information, but . . . |
|
Not all information is of equal quality |
|
Information does not necessarily = evidence |
|
There is often conflicting information &
reports |
|
Traditional narrative reviews can be very
“impressionistic” |
|
|
|
|
Selective inclusion of studies, often based on
the reviewer's own impressionistic view of the quality of the study |
|
Differential subjective weighting of studies in
the interpretation of a set of findings |
|
Misleading interpretations of study findings |
|
Failure to examine characteristics of the
studies as potential explanations for disparate or inconsistent results
across studies |
|
Failure to examine moderating variables in the
relationship under examination |
|
|
|
|
1900s: Karl Pearson pooled correlation
coefficients (enteric fever & inoculation rates, British Army) |
|
1930s: Ronald Fisher pooled p-values |
|
1972: Richard Peto: log rank test for combining
(binary) data from different trials; Mantel, Thomas Chalmers |
|
1976: Gene Glass “meta-analysis” effects of
psychotherapy |
|
1989: Murray Enkin, Marc Keirse, Iain Chalmers: Effective
Care in Pregnancy & Childbirth |
|
1992/1993: UK Cochrane Centre/Cochrane
Collaboration created |
|
|
|
|
|
|
“Primary
analysis is the original analysis of data in a research study . . . . |
|
Secondary analysis is the re-analysis of data
for the purpose of answering the original research question with better
statistical techniques, or answering new questions with old data . . . .” |
|
|
|
GV Glass 1976, p. 3 |
|
|
|
|
“Meta-analysis
refers to the analysis of analyses . . . . the statistical analysis of a
large collection of analysis results from individual studies for the
purpose of integrating the findings.
It connotes a rigorous alternative to the casual, narrative
discussions of research studies which typify our attempts to make sense of
the rapidly expanding research literature.” |
|
GV Glass 1976, p. 3 |
|
|
|
|
“provide summaries of what we know, and do not
know, that are as free from bias as possible.” (Chalmers et al 1999) |
|
“research that uses explicit & transparent
methods to synthesise relevant studies, allowing others to comment on,
criticise or attempt to replicate the conclusions reached. Systematic
reviews follow same set of procedures as any individual study, & are
often reported in the same way. . . .” (Petrosino et al 1999) |
|
|
|
|
Are the results of the different studies
similar? |
|
To the extent that they are similar, what is the
best overall estimate of effect? |
|
How precise and robust is this estimate? |
|
Can dissimilarities be explained? |
|
|
|
Lau J, Ioannidis JPA, Schmid CH. Quantitative
Synthesis in Systematic Reviews. Annals of Internal Medicine 1997;
127:820-826. |
|
|
|
|
|
|
State objectives of the review, & outline
eligibility criteria |
|
Search for studies that seem to meet eligibility
criteria |
|
Tabulate characteristics of each study
identified & assess its methodological quality |
|
Apply eligibility criteria & justify any
exclusions |
|
|
|
|
Assemble the most complete dataset feasible,
with involvement of investigators |
|
Analyse results of eligible studies.
Use statistical synthesis of data
(meta-analysis) if appropriate & possible |
|
Perform sensitivity analyses, if appropriate
& possible (including subgroup analyses) |
|
Prepare a structured report of the review,
stating aims, describing materials & methods, & reporting results |
|
|
|
|
|
Comparison with human genome project in
potential impact for clinical medicine |
|
Naylor CD.
Grey zones of clinical practice: some limits to evidence-based
medicine. Lancet 1995; 345:840-2. |
|
What is the Cochrane Collaboration & why is
it important? |
|
|
|
|
Cochrane Database of
Systematic Reviews
(CDSR) |
|
Database of Abstracts of Reviews of
Effectiveness (DARE) |
|
Cochrane Central Register of Controlled Trials
(CENTRAL) |
|
Cochrane Review Methodology Database |
|
Health Technology Assessment DB (HTA) |
|
NHS Economic/Evaluation Database (NHS EED) |
|
|
|
|
|
|
|
CDSR (Cochrane
Database of Systematic Reviews) |
|
>50 CRGs,
>1500 complete reviews,
>1200 protocols, >200 new reviews/year |
|
DARE (DB of Abstracts of Reviews of
Effectiveness) |
|
>3800
abstracts |
|
Cochrane Central Register of Controlled Trials (CENTRAL)
>353,000 RCTs/CCTs |
|
HTA (Health Technology Assessment Database) >2800 |
|
NHS Economic /Evaluation DB >10,000 |
|
|
|
|
|
50% of RCTs not found in MEDLINE |
|
44% of all RCTs (17% - 76%) |
|
72% of RCTs in MEDLINE (32% - 91%) |
|
Publication Type (PT) indexing |
|
RCT (1991) |
|
CCT (1995) |
|
Meta-analysis (1993) |
|
|
|
|
>630 journals |
|
19,266 RCTs tagged (1991-93) |
|
14,964 CCTs tagged (1985-90) |
|
MEDLINE updated annually to include or re-tag
these studies as RCTs or CCTs |
|
|
|
|
“The fundamental distinction between Cochrane
Reviews and other reports of systematic reviews is that their authors are
expected to update and amend them in the light of relevant additional data,
and criticisms or other comments.”
Sir Iain Chalmers
1999 |
|
|
|
|
|
Consistency in format across all reviews, both
in the sections of the review & particularly for tables/figures |
|
Enables readers to better understand without
shifting cognitive frame of reference for each review |
|
However there are costs/trade-offs associated
with this standardization |
|
Wolf FM. Lessons to be learned from
evidence-based medicine: Practice and promise of evidence-based medicine
and evidence-based education. Medical Teacher 2000; 22(3): 251-259. |
|
|
|
|
|
|
|
|
“Antenatal corticosteroid therapy is associated
with a significant reduction in the incidence of neonatal death or infant
death. |
|
The magnitude of this effect was greater in the
earlier years of antenatal corticosteroid use, when the case-fatality rates
were high, however even with increasingly low fatality rates for
respiratory distress syndrome the effect is still statistically significant.” |
|
|
|
|
|
|
James P Guevara, MD, MPH (a), Fredric M
Wolf, PhD (b), Cyril M Grum, MD (c), & Noreen M Clark,
PhD (c) |
|
(a) University of Pennsylvania,
Philadelphia, USA |
|
(b) University of Washington, Seattle,
USA |
|
(c) University of Michigan, Ann Arbor,
USA |
|
|
|
Cochrane Library 2003 (1); BMJ 2003; in press. |
|
|
|
|
To determine the effectiveness of asthma
self-management education programs on health outcomes in children |
|
Outcomes: |
|
Lung Function |
|
Asthma Morbidity |
|
Health Care Utilization |
|
Self-reported Perceptions of Self-care Abilities |
|
|
|
|
|
|
|
To conduct subgroup analyses to examine the
impact of |
|
type of educational intervention (individual vs.
group) |
|
intensity of educational intervention (no. of
sessions) |
|
self-management strategy (symptom vs. peak flow) |
|
degree of asthma severity |
|
length of follow-up |
|
study quality (adequacy of allocation
concealment, withdrawal rates, and type of trial, i.e., RCT vs. CCT) |
|
|
|
|
Randomized controlled trials (RCTs) or
controlled clinical trials (CCT) |
|
Children & adolescents ages 2 to 18 years
old |
|
Educational intervention designed to teach one
or more self-management strategies related to prevention, attack
management, or social skills |
|
Included outcomes on pulmonary function tests,
morbidity, or health care utilization |
|
|
|
|
|
Studies were identified from |
|
Cochrane Airways Group's Special Register of
Controlled Trials, comprising references from |
|
MEDLINE (1966-2000) |
|
EMBASE (1980-2000) |
|
CINAHL (1982-2000) |
|
hand searched airways-related journals |
|
PsycINFO |
|
Reference lists from relevant review articles
that were identified (ancestry approach) |
|
|
|
|
asthma OR wheez* |
|
AND |
|
education* OR self management OR self-management |
|
AND |
|
placebo* OR trial* OR random* OR double-blind OR
double blind OR single-blind OR single blind OR controlled study OR
comparative study. |
|
|
|
|
|
|
Potentially relevant studies from literature
search and hand searches (n=318) |
|
Excluded on basis of abstract, e.g., not
randomised or controlled clinical trials (n=273) |
|
Articles selected for full text review (n=45) |
|
Excluded after full text review (n=13) |
|
Eligible trials (n=32) |
|
|
|
|
|
Lung Function (pulmonary function tests): |
|
FEV1 |
|
PEFR (peak flow) |
|
Asthma Morbidity: |
|
asthma exacerbations |
|
days of school absence |
|
days of restricted activity |
|
nights disturbed by asthma |
|
asthma severity |
|
|
|
|
|
|
|
Health Care Utilization: |
|
physician visits |
|
emergency department (ED) visits |
|
hospitalizations |
|
Self-reported Perceptions of Self-care
Abilities: |
|
self-efficacy |
|
|
|
|
Time of enrollment in the intervention
(1-6 mo, 7-12 mo, or ≥12 months) |
|
Self-management strategy
(peak flow-based vs. symptom-based) |
|
Intervention type
(individual vs. group) |
|
Intensity of intervention
(single vs. multiple sessions) |
|
Study quality
(RCT vs. CCT; random allocation procedures) |
|
|
|
|
32 eligible trials were abstracted & coded
(by 2 independent coders) |
|
Approximately 3706 patients |
|
All eligible studies were abstracted onto
preprinted data collection forms |
|
Authors of studies were contacted & asked to
provide missing data |
|
|
|
|
Standardized mean differences (SMD) for
continuous outcomes (vs. weighted mean differences, WMD) |
|
Pooled odds ratios for binary outcome measures
+ estimates of NNT
(number of patients needed to treat to prevent 1 bad outcome or result in
1 favorable outcome) |
|
95% confidence intervals & tests of
homogeneity of effects are reported |
|
Both fixed-effects & random effects models |
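The effect measures listed above can be sketched in a few lines. This is a minimal illustration, assuming a pooled-SD standardized mean difference (Cohen's d), a log-scale confidence interval for the odds ratio, and NNT as the reciprocal of the absolute risk reduction. The function names and the trial numbers are hypothetical and are not data from the review.

```python
import math

def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (SMD, variance) using the pooled standard deviation (Cohen's d)."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    # Common large-sample approximation to the variance of d.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return d, var_d

def odds_ratio(events_t, n_t, events_c, n_c, z=1.96):
    """Return (OR, lower, upper) with a 95% CI computed on the log scale."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def number_needed_to_treat(risk_control, risk_treated):
    """NNT = 1 / absolute risk reduction (conventionally rounded up when reported)."""
    return 1 / (risk_control - risk_treated)

# Hypothetical two-arm trial, for illustration only.
smd, smd_var = standardized_mean_difference(5.0, 2.0, 50, 6.0, 2.0, 50)
or_, or_lo, or_hi = odds_ratio(10, 100, 20, 100)
nnt = number_needed_to_treat(20 / 100, 10 / 100)
```

The SMD is unit-free, which is why it suits outcomes measured on different scales across trials; a WMD keeps the original units and only makes sense when every trial used the same scale.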
|
|
|
|
|
Significant improvement associated with
self-management education: |
|
FEV1 (1 trial, n = 110 )
(SMD –0.46 , 95% CI –0.84 to –0.08) |
|
Peak flow (PEFR) (3 trials, n = 148) |
|
(SMD
–0.53, 95% CI –0.87 to –0.20) |
|
Pooled FEV1 & PEFR (4 trials, n = 258) |
|
(SMD
–0.50 , 95% CI –0.76 to –0.25) |
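As a rough check on how the pooled lung-function figure relates to its two components, the sketch below recovers standard errors from the reported 95% confidence intervals and combines the FEV1 and PEFR estimates with inverse-variance (fixed-effect) weights. It assumes the intervals are symmetric normal-theory intervals and that the two reported estimates can be pooled directly; the slides do not state how the pooled value was computed, so this is an approximation, not the review's method.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval

def se_from_ci(lower, upper, z=Z95):
    """Back-calculate a standard error from a symmetric confidence interval."""
    return (upper - lower) / (2 * z)

def pool_fixed(estimates, std_errors, z=Z95):
    """Inverse-variance fixed-effect pooled estimate with its 95% CI."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, pooled - z * se, pooled + z * se

# Estimates and intervals as reported on the slide (FEV1, then PEFR).
smds = [-0.46, -0.53]
ses = [se_from_ci(-0.84, -0.08), se_from_ci(-0.87, -0.20)]
pooled, ci_low, ci_high = pool_fixed(smds, ses)
print(round(pooled, 2), round(ci_low, 2), round(ci_high, 2))
```

The result lands close to the reported pooled SMD of –0.50; small differences in the interval limits are expected because the inputs are rounded to two decimals.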
|
|
|
|
|
|
Significant reductions in |
|
days of restricted activity (6 trials, n = 378)
(SMD –0.25, 95% CI
–0.46 to –0.05) |
|
days of school absence (16 trials, n = 1626)
(SMD –0.14, 95% CI –0.23 to –0.04) |
|
nights disturbed by asthma (3 trials, n = 202)
fixed effects: SMD –0.34, 95% CI –0.62 to –0.05;
random effects: SMD –0.39, 95% CI –1.07 to +0.28 |
|
No reduction in proportion of patients
experiencing an asthma exacerbation |
|
|
|
|
|
|
|
|
|
|
Significant reductions in |
|
number of emergency dept visits (11 trials)
(SMD –0.19, 95% CI –0.31 to
–0.07) |
|
No reduction in |
|
proportion of patients visiting emergency dept |
|
number of physician visits |
|
risk of hospitalization |
|
number of hospitalizations |
|
|
|
|
|
|
|
|
Significant improvement associated with
self-management education: |
|
Self-efficacy (4 trials involving 272
patients)
(SMD –0.47, 95% CI –0.71 to –0.23) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
There is not enough evidence to reliably discern
differences in effectiveness of self-management education as a function of |
|
asthma severity |
|
number of educational sessions |
|
individual versus group sessions |
|
peak flow vs. symptom management interventions |
|
|
|
|
|
|
|
Evidence from existing clinical trials supports the
conclusion that self-management educational interventions for children with
asthma, compared to usual care, result in |
|
improved lung function |
|
declines in health care utilization |
|
decreased asthma morbidity (limited effects) |
|
This suggests desirability of incorporating
self-management education into routine asthma care for children |
|
|
|
|
|
Almost all educational programs included
prevention & attack management planning components. A small subset
included a social skills component. |
|
Many studies were either poorly reported or of
less than desired quality, or both |
|
When designing future studies, much more
attention needs to be paid to better reporting & higher quality study
design |
|
|
|
|
|
|
|
|
|
|
Evidence was unavailable for sufficient numbers
of patients to reliably estimate effects for many important subgroups that
could inform provider
& patient decision making |
|
Because evidence supports the conclusion that
education is effective when compared to no education, future studies should
directly test alternative interventions against one another rather than
against no education controls |
|
|
|
|
Cumulative meta-analysis vs. standard
meta-analysis |
|
Differences between fixed and random effects
models |
|
Sources of bias |
|
Guidelines for appraising published systematic
reviews/meta-analyses |
|
|
|
|
|
|
|
|
RCTs (primary studies) |
|
Selection bias |
|
Performance bias |
|
Exclusion bias |
|
Detection bias |
|
|
|
Meta-analyses |
|
Publication bias |
|
Language bias |
|
Coder bias |
|
“Apples & oranges” vs “fruit”
(heterogeneity) |
|
Quality bias (small studies) |
|
Multiple publication bias |
|
|
|
|
|
|
|
|
|
Publication bias |
|
Fail-safe N |
|
Funnel Plots |
|
Language bias |
|
Coder bias |
|
“Apples & oranges” vs. “fruit”
(heterogeneity) |
|
Quality bias (small studies) |
|
Multiple publication bias |
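The fail-safe N mentioned under publication bias can be illustrated as follows. This sketch assumes Rosenthal's formulation, which asks how many unpublished null studies (mean Z of zero) would be needed to bring a combined one-tailed test above p = .05; the slides do not say which version is intended, and the z-scores here are hypothetical.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: (sum of Z)^2 / z_alpha^2 - k."""
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_alpha ** 2) - k

# Hypothetical per-study z-scores, for illustration only.
z_scores = [2.0, 1.5, 2.5, 1.0]
n_fs = fail_safe_n(z_scores)
print(round(n_fs, 1))
```

A small fail-safe N relative to the number of included studies suggests the pooled result could be overturned by a modest file drawer of null findings. A funnel plot addresses the same concern graphically by plotting each study's effect against its precision.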
|
|
|
|
|
|
|
|
|
General principles |
|
Effect size |
|
Confidence Intervals |
|
Types of data |
|
Sources of potential bias |
|
Interpreting results |
|
Examples |
|
|
|
|
|
|
|
|
|
|
So what’s in a model? |
|
Why does it matter? |
|
How to deal with heterogeneity? |
|
|
|
|
|
|
|
Common, to be expected, not the exception |
|
Should do test for homogeneity, but . . .
interpret heterogeneity cautiously in spirit of exploratory data analysis |
|
Exploring sources of heterogeneity can lead to
insights about modification of apparent associations by various aspects of |
|
Study design |
|
Exposure measurements |
|
Study populations |
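One common form of the homogeneity test referred to above is Cochran's Q, sketched here. The I² statistic is included as a descriptive companion; it is not mentioned in the slides and is an addition for illustration. The effects and variances are hypothetical.

```python
def cochran_q(effects, variances):
    """Return (Q, df, I2) for study effects and their within-study variances."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the fixed-effect mean.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Hypothetical standardized mean differences from three trials.
effects = [-0.2, -0.5, -0.8]
variances = [0.04, 0.04, 0.04]
q, df, i2 = cochran_q(effects, variances)
```

Q is referred to a chi-square distribution with k − 1 degrees of freedom. With only a few trials the test has low power, which is one reason to interpret a non-significant result cautiously rather than as proof of homogeneity.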
|
|
|
|
|
|
Relations discovered in process of exploring
heterogeneity may be useful in planning & carrying out new studies |
|
Excluding outliers solely on basis of
disagreement with other studies can lead to seriously biased summary
estimates (avoid) |
|
Easier to interpret sources of heterogeneity
when identified in advance of data analysis
(not when suggested only by data) |
|
|
|
|
|
|
Fixed effects models assume that an intervention
has a single true effect |
|
Random effects models assume that an effect may
vary across studies |
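The practical difference between the two models can be seen by pooling the same studies both ways. The sketch below assumes inverse-variance weighting and the DerSimonian-Laird moment estimator for the between-study variance; the slides do not name an estimator, so that choice is an assumption, and the three studies are hypothetical.

```python
import math

def pool(effects, variances, tau2=0.0):
    """Inverse-variance pooled estimate and standard error.

    tau2 = 0 gives the fixed-effect result; tau2 > 0 gives random effects.
    """
    weights = [1 / (v + tau2) for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1 / total)

def dersimonian_laird_tau2(effects, variances):
    """Moment estimate of between-study variance, truncated at zero."""
    w = [1 / v for v in variances]
    fixed, _ = pool(effects, variances)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    return max(0.0, (q - (len(effects) - 1)) / c)

# Hypothetical studies: large, medium, small (the smallest has the largest effect).
effects = [-0.1, -0.4, -0.9]
variances = [0.01, 0.04, 0.09]

tau2 = dersimonian_laird_tau2(effects, variances)
fixed_est, fixed_se = pool(effects, variances)
random_est, random_se = pool(effects, variances, tau2)
```

In this example the random-effects estimate moves toward the small study's larger effect and its standard error is wider, which is the pattern described in the slides that follow.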
|
|
|
|
|
Assumes sample of studies randomly drawn from
population of studies |
|
This is NOT typically true because: |
|
All trials are included |
|
Trials are systematically (e.g., conveniently)
sampled and not randomly sampled |
|
|
|
|
|
|
Primary value of M-A is in search for predictors
of between-study heterogeneity |
|
Random-effects summary is last resort only when
predictors or causes of between-study heterogeneity cannot be identified |
|
Random-effects can conceal fact that summary
estimate or fitted model is poor summary of the data |
|
|
|
Sander Greenland. Am J Epidemiol 1994;140:290-6. |
|
|
|
|
Sometimes needed, but more sensitive to
publication bias than fixed-effects |
|
Random effects weights vary less across studies
than fixed-effects weights |
|
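Fixed effects: $w = 1/v$
versus random effects: $w^{*} = 1/(v + \tau^{2})$
(v = within-study variance, τ² = between-study variance) |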
|
Leads to reduced variation in weights |
|
Thus smaller studies given larger relative
weights when random effects models used |
|
Thus influenced more strongly by any tendency
NOT to publish small statistically insignificant studies
--> biased estimate, spuriously strong
associations |
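The effect of the extra variance term on the weights is easy to see numerically. This is a minimal sketch with hypothetical within-study variances and an assumed between-study variance of 0.05; `relative_weights` is an illustrative helper, not part of any library.

```python
def relative_weights(variances, tau2=0.0):
    """Normalized inverse-variance weights; tau2 = 0 is the fixed-effect case."""
    raw = [1 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical studies ordered large, medium, small (small = largest variance).
variances = [0.01, 0.04, 0.09]
tau2 = 0.05  # assumed between-study variance

fixed = relative_weights(variances)
random_ = relative_weights(variances, tau2)

for label, f, r in zip(["large", "medium", "small"], fixed, random_):
    print(f"{label:6s} fixed={f:.2f} random={r:.2f}")
```

Adding the same constant to every study's variance compresses the weights toward equality, so the small study's share rises and the large study's share falls. If small null studies are missing from the literature, that shift pulls the random-effects summary further toward the published small-study results.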
|
|
|
|
|
|
Fixed effects weights vs. random effects weights |
|
$w = 1/v$
versus $w^{*} = 1/(v + \tau^{2})$ |
|
Identical when there is little or no between
study variation |
|
When differ, confidence intervals are larger for
random-effects than fixed effects |
|
Smaller studies given larger relative weights in
random effects models, and hence greater
influence |
|
Conversely, influence of larger studies is less |
|
May result in a type II (beta) error, i.e.,
finding no significant difference when one truly exists |
|
|
|
|
|
|
|
|
|
|
User’s guide criteria |
|
Sources of bias |
|
|
|
|
|
|
Three broad issues need to be considered when
appraising research: |
|
A: Are
the results of the study valid? |
|
B: What
are the results? |
|
C:
Will the results help locally? |
|
|
|
Based on worksheet developed by the Critical
Appraisal Skills Programme (CASP).
http://www.phru.org.uk/~casp/ |
|
Questions are adapted from Oxman AD et al.
Users’ Guides to the Medical Literature: VI. How to use an overview. JAMA
1994; 272 (17): 1367-1371. |
|
|
|
|
|
|
|
A: Are the results of the study valid?
(Yes/ Can’t tell/ No) |
|
1. Did the review address a clearly focused
research question? |
|
HINT: A research question should be “focused” in
terms of: 1. population studied, 2. intervention given or exposure,
3. outcomes considered |
|
2. Did the review include the right type
of studies? |
|
HINT: These would: 1. address the review’s
research question, 2. have an appropriate study design |
|
Is it worth continuing based on answers to
above? |
|
|
|
|
|
|
|
3. Did the reviewers try to identify all
relevant studies? |
|
HINT: Look for: 1. which bibliographic
databases were used, 2. follow-up from reference lists, 3. personal contact
with experts, 4. search for unpublished studies, 5. search for non-English
language studies |
|
4. Did reviewers assess quality of the included
studies? |
|
HINT: A clear, pre-determined strategy
should be used to determine which studies are included. Look for: 1. a scoring system, 2. more
than one assessor |
|
|
|
|
|
|
|
5. If
results of studies have been combined, was it reasonable to do so? |
|
HINT: Consider whether: 1. results of each study
are clearly displayed, 2. results were similar from study to study (look
for tests of heterogeneity),
3. reasons for any variations in results are discussed |
|
|
|
|
|
|
|
6. What are the main results of the review? |
|
HINT:
Consider 1. how the results are expressed (e.g., odds ratio,
relative risks etc.),
2. what the results are |
|
7. Could these results be due to chance? |
|
HINT: Look for tests of statistical significance
(p-values) and confidence intervals (CIs) |
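When a review reports only an estimate and its 95% confidence interval, an approximate p-value can be recovered, as sketched below. The sketch assumes a symmetric normal-theory interval on the scale of the estimate (for odds ratios or relative risks the calculation would be done on the log scale). The inputs are the school-absence result and the random-effects nights-disturbed result from the asthma example earlier in this deck.

```python
import math

def p_value_from_ci(estimate, lower, upper, z_crit=1.96):
    """Return (z, two-sided p) implied by an estimate and its 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Days of school absence: SMD -0.14, 95% CI -0.23 to -0.04.
z_school, p_school = p_value_from_ci(-0.14, -0.23, -0.04)

# Nights disturbed (random effects): SMD -0.39, 95% CI -1.07 to +0.28.
z_nights, p_nights = p_value_from_ci(-0.39, -1.07, 0.28)
```

The two views agree: an interval that excludes zero corresponds to p below 0.05, and one that spans zero corresponds to p above 0.05. The interval is usually the more informative of the two because it also shows the range of effect sizes compatible with the data.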
|
|
|
|
|
|
|
8. Can the results be applied to the local
population? Can the results be applied to your cousin? |
|
HINT: Consider whether: 1. the population sample
covered by the review could be sufficiently different from your population
to cause concern, 2. your local setting is likely to differ much from that
of the review |
|
9. Were all important outcomes considered? |
|
HINT: Consider outcomes from the point of view
of the:
1. individual, 2. policy makers and practitioners, 3. family/carers, 4.
wider community |
|
|
|
|
|
10. Should policy or practice change as a result
of evidence
contained in this review? |
|
HINT: Consider whether benefits are worth the
harms and costs |
|
|
|
|
Meta-regression |
|
Bayesian MA |
|
Pooling non-randomized data |
|
Pooling/MA of diagnostic test data |
|
Individual patient data MA |
|
Consort MA guidelines |
|
Missing data –imputation, bias? |
|
|
|
|
|
Fredric M. Wolf, Ph.D. |
|
Professor & Chair, Dept. of Medical
Education
& Biomedical Informatics |
|
Adjunct Professor of Health Services |
|
University of Washington |
|
E-312 Health Sciences/Box 357240 |
|
Seattle, WA
98195-7240 USA |
|
Tel: 206.543.2259 FAX: 206.543.3461 |
|
E-mail:
wolf@u.washington.edu |
|
|
|
|
|
|
|