
Meta-Analysis of Cognitive–Behavioral Treatments for Generalized Anxiety Disorder: A Comparison With Pharmacotherapy

Kristin Mitte
University of Jena

The efficacy of (cognitive) behavioral therapy ([C]BT) for generalized anxiety disorder was investigated and compared with the efficacy of pharmacological therapy using meta-analytic techniques. A total of 65 (C)BT studies and pharmacological studies were included. (C)BT was more effective than control conditions. The results of the comparison between (C)BT and pharmacotherapy varied according to the meta-analytic methods used. Conclusions about differences in efficacy between therapy approaches are limited when all available studies are included owing to a number of factors that influence effect sizes. When only those studies that directly compared both therapies were included in the analysis, there were no significant differences in efficacy. Attrition rates were lower for (C)BT, indicating that it is better tolerated by patients.

Keywords: generalized anxiety disorder, cognitive–behavior therapy, drug therapy, meta-analysis

Generalized anxiety disorder (GAD) is a common mental disorder. Data collected in the United States indicate that approximately 5% of the general population suffers from the disorder at least once in their lifetime (Kessler et al., 1994). Women are more often affected than men, with prevalence rates approximately twice as high (Wittchen, Zhao, Kessler, & Eaton, 1994). Furthermore, Kessler, Mickelson, Barber, and Wang (2001) found that in comparison with those experiencing 25 other common physical conditions and mental disorders, people with GAD reported the highest number of days off work, with an average of 6 days per month prior to taking part in the study.

In the third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM–III; American Psychiatric Association, 1980), GAD was defined as uncontrollable and diffuse anxiety or worry with several related psychophysiological symptoms that persists for 1 month or longer. The third edition, revised (DSM–III–R; American Psychiatric Association, 1987), and the fourth edition (DSM–IV; American Psychiatric Association, 1994) included some changes in the diagnostic criteria. First, the minimum duration of the symptoms was increased from 1 month to 6 months. Second, the cognitive symptoms of chronic worrying became the defining features of GAD. In contrast, in DSM–III the most important criteria related to symptoms of arousal, for example, somatic symptoms and muscle tension. However, even now, some authors continue to emphasize the associated symptoms of arousal and question the utility of focusing on worry (see also Rickels & Rynn, 2001).

(Cognitive) behavior therapy ([C]BT) is commonly used to treat patients with GAD. Such treatments include, for example, applied relaxation, cognitive restructuring of dysfunctional beliefs, and cognitive exposure to worry. However, compared with psychopharmacological approaches, research has paid little attention to psychological treatments. Although benzodiazepines have long been used, drugs of the azapirone class (e.g., buspirone) and antidepressants (e.g., selective serotonin reuptake inhibitors; SSRIs) are now used as well. Thus, there is a need to address the question of whether (C)BT is effective in the treatment of GAD and also to compare its efficacy with that of pharmacotherapy.

Although a meta-analysis including the results of 35 studies has already been carried out on this issue (Gould, Otto, Pollack, & Yap, 1997), a new one is now required, as a number of additional relevant efficacy studies have since been conducted and new meta-analytic techniques have been developed. The present meta-analysis includes the following improvements. First, it provides a comprehensive quantitative summary of 65 controlled studies. Second, the random-effects model (REM) was used to compute average effect sizes and regression analyses. Two statistical models are used in meta-analyses, which differ in their statistical and sampling assumptions and in the conclusions drawn: the fixed-effects model (FEM) and the REM. In the FEM, the results of a meta-analysis are restricted to the studies included. Results “apply to this collection of studies and say nothing about other studies that may be done later, could have been done earlier, or may have already been done but are not included among the observed studies” (Hedges & Vevea, 1998, p. 487). In contrast, when the REM is used, results can be generalized beyond the studies selected and inferences apply to treatment efficacy in general, which is more appropriate in current research syntheses. Third, the present meta-analysis included several sensitivity analyses. In research syntheses, many meta-analytic decisions are made that can affect the results, for example, which studies are included or how effect sizes

The research reported in this article was supported by a personal grant of the State of Thuringia (Germany). Results were partly presented at the 33rd Congress of the European Association of Behavioural and Cognitive Therapy, September 10–13, 2003, Prague, Czech Republic. I thank Kersten Schäfer for developing the statistical program, Thomas Bär for providing data, Rainer Riemann for providing feedback, and Deidre Winter and Matthew White for proofreading.

Correspondence concerning this article should be addressed to Kristin Mitte, Department of Psychology, University of Jena, Humboldtstr. 11, 07743, Jena, Germany. E-mail: [email protected]

Psychological Bulletin, 2005, Vol. 131, No. 5, 785–795. Copyright 2005 by the American Psychological Association. 0033-2909/05/$12.00 DOI: 10.1037/0033-2909.131.5.785


This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

are computed. Analyses are therefore conducted to evaluate the effects of these decisions on mean effect sizes by changing the meta-analytic methods. That is, sensitivity analyses are performed to examine the robustness of results and conclusions. In the present meta-analysis, sensitivity analyses were conducted on, for example, modifications in methods of calculating effect sizes (including only those effect sizes calculated by means and standard deviations), modifications in the effect size distribution (including outliers), or modifications in the modes of comparing different treatment approaches (comparing [C]BT with all available drug studies vs. comparing [C]BT only with currently used drug classes, i.e., excluding benzodiazepines and any drug not marketed after completion of clinical trials). Fourth, publication bias, that is, a bias against studies with nonsignificant findings, was assessed and taken into account by using trim-and-fill analysis to compare (C)BT and pharmacotherapy. Trim-and-fill analysis (Duval & Tweedie, 2000) is used in meta-analysis to correct the funnel plot by estimating the number of missing studies and the effect sizes of these studies. Trim-and-fill analysis thus allows the calculation of mean effects corrected for publication bias. Fifth, several methodological differences between studies, such as the dropout rate or sample size, were controlled for when different therapy approaches were compared. And finally, a method was used that allows the comparison of studies with different control groups.

Method

An extensive literature search was conducted in various databases (MEDLINE and PsycINFO from the 1st available year to May 2002) using the search terms general* anxiety, treatment, and *therapy. Additional titles were identified by a manual search in important journals and lists of references given in previous meta-analyses or primary studies. With a view to reducing file-drawer effects, I attempted to locate unpublished work using the Internet and by contacting researchers and pharmaceutical companies. However, I found only one unpublished study investigating the efficacy of pharmacotherapy (lesopitron). Suitable studies were selected for inclusion according to the following criteria.

1. Studies were published in either English or German.

2. All adult participants had been diagnosed with GAD according to a standardized diagnostic classification system (e.g., DSM), or an exact description of the disorder including the duration of symptoms was presented. It was considered insufficient if the patients were described as “anxious” or “neurotic.” Studies with children and adolescents or older persons were not included.

3. Participants had received some form of (C)BT or pharmacotherapy. Behavioral therapy was defined as “direct attempts to reduce dysfunctional emotions and behavior by altering behavior” (Brewin, 1996, p. 34) and cognitive therapy as “attempts to reduce dysfunctional emotions and behavior by altering individual appraisals and thinking patterns” (Brewin, 1996, p. 34). Cognitive–behavioral therapy used both methods. Pharmacotherapy was required to last at least 14 days, and the substance used had to be designated by its nonproprietary name.

4. No case studies were included.

5. Studies used an adequate control group including waiting lists, pill placebo, or therapy placebo. Therapy placebo was defined as the realization of factors common to all psychological therapies; studies using methods of behavioral or cognitive therapy, including relaxation, as the control condition were excluded. Studies investigating both cognitive–behavior therapy and pharmacological therapy were included.

6. No reports on the results of a subsample or double publications were included if they could be clearly assigned to a larger study.

7. Treatment outcome was established by self-report; observer-rated measures; or behavioral tests of anxiety, depression, quality of life, or clinical significance. The latter applied to anxiety only and included responder status, defined as a meaningful improvement (e.g., a 50% change on an assessment scale); end-state status, defined as a comparison with the normative population (e.g., no diagnosis); or both. Data were excluded if they could not be assigned to one of these categories.

8. Studies were required to give sufficient information to permit calculation of effect sizes, including means and standard deviations, t or F values, change scores, frequencies, and probability levels. If these indices were not reported and the study had been published within the past few years, I attempted to contact the authors. Some meta-analysts have included only studies that give means and standard deviations. However, it should be noted that excluding a study because no means and standard deviations are reported may result in a greater bias than if the study is included and the best available estimator is calculated. The reasons why these values were omitted from a study may plausibly be directly linked to its result (e.g., studies that failed to find a significant effect may tend to omit means and standard deviations). Excluding such studies would therefore reduce the representativeness of the sample of studies selected for the meta-analysis and thus also the generalizability of its results.

No additional exclusion criteria assessing the quality of the studies were applied. However, most studies had used a randomized design (for two studies this variable was not codable, and one was nonrandomized). Nonrandomized studies were not excluded because the relationship between study quality and effect size is still a point at issue. In the pioneering meta-analysis conducted by Smith, Glass, and Miller (1980), there were no significant differences in effect sizes between randomized studies and nonequivalent control group designs. The same results were found by Lipsey and Wilson (1993) across 74 meta-analyses and Heinsman and Shadish (1996) across 98 studies after controlling for other methodological variables. In contrast, in a review of 100 studies of marital or family psychotherapy, Shadish and Ragsdale (1996) found a significantly greater effect size for randomized studies, even after controlling for covariates. Several authors therefore have recommended that the difference between randomized and nonrandomized studies be investigated in meta-analyses and taken into account when interpreting the findings (e.g., Hunter & Schmidt, 1990). However, the number of nonrandomized studies included in the present meta-analysis was too small to evaluate this difference.

In the present research, a coding form consisting of items assessing methodological and clinical aspects of the studies was used; I completed it. To assess interrater reliability, three trained psychology students who were working on their diploma theses on therapy research and I independently coded 10 randomly selected studies. The coding form had an overall reliability of greater than .75 across all variables and individual item reliabilities of not less than .50 (according to Fleiss, 1981, this is excellent reliability). A total of 19 publications—13 studies comparing (C)BT with a control group and 6 studies comparing (C)BT with pharmacotherapy—met the selection criteria for (C)BT trials, and the data of 869 patients were included in the meta-analysis. The average age was 37.36 years, the average duration of GAD was 6.31 years, and 61.7% of the patients were women. Table 1 shows the effect sizes of the studies selected. It should be noted that some of the studies included in the meta-analysis by Gould et al. (1997) were not included in the current meta-analysis because of the strict exclusion criteria (e.g., Borkovec et al., 1987, in which the treatment of the control group consisted of relaxation, or Rice, Blanchard, & Purcell, 1993, in which some participants were diagnosed with subclinical GAD). The effect sizes of the (C)BT studies were compared with those of clinical drug trials of GAD (Mitte, Noack, Steil, & Hautzinger, 2005), so that a total of 65 studies with 7,739 patients were included in the analysis.

With regard to statistical analyses, computation of effect sizes was carried out using the standardized mean difference statistic Hedges’s g between groups (Hedges’s g is the difference between the means divided by the pooled standard deviation; Hedges & Olkin, 1985) or the algebraically equivalent effect sizes computed from the t and F values (Ray &



Shadish, 1996). When the relevant means and standard deviations or t and F values were not reported, effect sizes were calculated by various other methods. Odds ratios were computed and transformed into Hedges’s g on the basis of dichotomous data (e.g., responder status; Haddock, Rindskopf, & Shadish, 1998; Rosenthal, 1994). In cases in which change scores alone were reported, the correlation between pre- and posttest scores is required for computation of Hedges’s g. However, most of the studies failed to report these correlations, and it was thus necessary to use an estimator of r = .81 (this is the mean retest reliability of 23 frequently used instruments for anxiety disorders). This estimator is higher than that used by Smith et al. (1980), that is, r = .50 for a period of 2 to 6 months. However, Ray and Shadish showed that if an estimator of r = .50 is used, a higher result is obtained for effect sizes based on change scores than for effect sizes based on means and standard deviations. Because the effect size is higher when computed from a lower pre–post correlation, Ray and Shadish postulated that Smith et al.’s (1980) estimator may well be too small.
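The two fallback computations just described can be sketched in Python. This is an illustrative sketch, not the paper's actual code: the function names are mine, the odds-ratio conversion uses the logit method associated with Haddock, Rindskopf, and Shadish (1998), and the change-score conversion assumes the standard relation sd_change = sd_raw · √(2(1 − r)) for equally reliable pre- and posttests.

```python
import math

def g_from_odds_ratio(odds_ratio):
    """Logit method: convert an odds ratio from dichotomous data
    (e.g., responder status) to a standardized mean difference,
    d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

def g_from_change_scores(dc_t, dc_c, sd_change_t, sd_change_c,
                         n_t, n_c, r=0.81):
    """Standardized mean difference from pre-post change scores.

    The raw-score SD is recovered from the change-score SD via
    sd_change = sd_raw * sqrt(2 * (1 - r)), where r is the pre-post
    correlation (default: the r = .81 estimator from the text).
    """
    k = math.sqrt(2 * (1 - r))
    sd_t, sd_c = sd_change_t / k, sd_change_c / k
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    return (dc_t - dc_c) / pooled_sd
```

Note that a lower assumed pre–post correlation r inflates the recovered raw-score SD less, and therefore yields a larger g, which is the pattern Ray and Shadish observed.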

In addition, if the studies included in the present meta-analysis did not publish the exact statistical results of analyses, the effect sizes were inferred from the t values corresponding to the significance levels used to describe the results (“highly significant”: p = .01; “significant”: p = .05; “marginally significant”: p = .10; “nonsignificant”: g = 0). It should, however, be noted that effect sizes computed by this method may underestimate the real effect (Ray & Shadish, 1996). An additional sensitivity analysis was therefore carried out to examine the impact of these effect sizes on the average effect size across the studies. Only effect sizes based on means and standard deviations were included in this sensitivity analysis.

In the case of the same results being presented with various statistical values (e.g., both means with standard deviations and t values), Hedges’s g was computed using means and standard deviations. The correction for small sample bias was applied (Hedges & Olkin, 1985).
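The between-groups effect size and its small-sample bias correction might look roughly as follows. This is a sketch under the usual Hedges and Olkin (1985) formulas (correction factor J ≈ 1 − 3/(4·df − 1)); the function name is mine, not from the paper.

```python
import math

def hedges_g(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Between-groups Hedges's g: the mean difference divided by the
    pooled standard deviation, with the small-sample bias correction."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    g = (m_t - m_c) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)   # shrinks g slightly in small samples
    return correction * g
```

The correction factor approaches 1 as the sample grows, so it matters mainly for the small trials that are common in this literature.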

Where data were reported for both completer and intent-to-treat analyses, only the latter was included. Intent-to-treat analysis includes patients having dropped out during the course of the study, for example, because of lack of compliance, lack of treatment success, or the occurrence of serious side effects. Thus, results of an intent-to-treat analysis allow conclusions on the general applicability of a treatment and are regarded as more

Table 1
Effect Sizes of Included Studies

Study | Groups | Anxiety effect size | Depression effect size
Barlow et al. (1984) | CBT, waiting list | 2.24 | —
Barlow et al. (1992) | BT, waiting list | 0.93 | 0.54
 | CT, waiting list | 1.02 | 0.85
 | CBT, waiting list | 0.65 | 0.41
Biswas et al. (1995) | BT, benzodiazepines | −0.52 | —
 | CBT, benzodiazepines | 0.04 | —
Blowers et al. (1987) | Anxiety management training, waiting list | 0.65 | —
 | Anxiety management training, therapy placebo | 0.09 | —
Borkovec & Costello (1993) | BT, therapy placebo | 0.60 | 0.45
 | CBT, therapy placebo | 0.68 | 0.81
Bowman et al. (1997)a | Self-examination therapy, waiting list | 0.95 | —
Butler et al. (1987) | Anxiety management training, waiting list | 1.00 | 1.00
Butler et al. (1991) | BT, waiting list | 0.44 | 0.23
 | CBT, waiting list | 1.00 | 0.66
Cragan & Deffenbacher (1984) | Anxiety management training, waiting list | 1.14 | 0.90
 | BT, waiting list | 1.24 | 0.73
Jannoun et al. (1982) | Anxiety management training, waiting list | 1.07 | 0.87
Kohli et al. (2000) | Relaxation, pharmacotherapy | 0.64 | —
Ladouceur et al. (2000) | CBT, waiting list | 1.39 | 1.48
Linden et al. (2002) | CBT, therapy placebo | 0.48 | —
Lindsay et al. (1987) | Anxiety management therapy, waiting list | 0.73 | 0.42
 | Anxiety management therapy, benzodiazepines | 0.17 | −0.09
 | CBT, waiting list | 1.27 | 1.65
 | CBT, benzodiazepines | 0.77 | 0.86
Power et al. (1989) | CBT, pill placebo | 1.12 | —
 | CBT, benzodiazepines | 0.85 | —
Power et al. (1990) | CBT, pill placebo | 1.22 | 0.51
 | CBT, benzodiazepines | 0.53 | —
Sarkar et al. (1999) | BT, pharmacotherapy | −0.06 | —
White et al. (1992) | BT, waiting list | 0.50 | 0.83
 | BT, therapy placebo | 0.38 | 0.56
 | CT, waiting list | 0.55 | 1.07
 | CT, therapy placebo | 0.30 | 0.42
 | CBT, waiting list | 0.44 | 0.64
 | CBT, therapy placebo | 0.37 | 0.33
Woodward & Jones (1980) | BT, waiting list | 0.46 | —
 | CT, waiting list | 0.29 | —
 | CBT, waiting list | 0.77 | —

Note. CBT = cognitive–behavioral therapy; BT = behavioral therapy; CT = cognitive therapy without behavioral techniques.
a Study was excluded in the subsequent analyses because therapy was restricted to use of a booklet; in contrast, other studies in which a self-help treatment was investigated included longer lasting contact with the therapist.



meaningful than results of an analysis based on the data of treatment completers only. With a few exceptions, most studies investigating the efficacy of a psychopharmacological therapy used intent-to-treat analysis (of the studies included in this meta-analysis, nearly all [C]BT studies used completer analysis only, as compared with only one fourth of clinical drug trials). However, an intent-to-treat analysis could show lower effect sizes. This seems plausible, given that patients with severe symptomatologies were included—leading to increases in the means and variances of the symptoms. However, Mitte et al. (2005) found no significant differences between the results of completer and intent-to-treat analyses of trials of pharmacotherapy for GAD.

The direction of the effect sizes was standardized so that positive effect sizes always represented a better result for the treatment group (comparison of control and treatment group), for psychotherapy (comparison of pharmaco- and psychotherapy), and for a combination of therapies (comparison of psychotherapy and a combination of psycho- and pharmacotherapy).

Hedges’s g was calculated separately for each assessment scale used in a study. Then, a mean g for the clinical variables was calculated by averaging across all of the dependent measures. Thus, each instrument was equally weighted. There are other approaches to the combination of stochastically dependent effect sizes (Gleser & Olkin, 1994); however, additional information (intercorrelations between the assessment scales) is required. Because these were not available, the more conservative method for computing the mean effect was used.

Some of the studies had used more than one treatment condition to investigate efficacy. In such cases, separate effect sizes were calculated wherever the various treatments represented different techniques. Otherwise, effect sizes were averaged across the treatment groups.

A random-effects analysis was then carried out to compute the mean effect sizes across all studies and the subsequent regression analyses (see, e.g., Erez, Bloom, & Wells, 1995; Hedges & Vevea, 1998; Overton, 1998). FEMs and REMs can be distinguished as follows. In the FEM, the variation between studies results only from the subjects included in the studies (within-study variance). All effect sizes are assumed to be estimates of a common population effect size; results cannot be generalized beyond the included studies. In contrast, the REM includes a variance component (between-studies or random-effects variance) that results from drawing studies from a universe of possible studies in addition to the variation due to the sampling of subjects in the original studies. The results of REM can therefore be generalized to treatment conditions not exactly resembling the conditions in the studies used for the data analyses. The REM takes into account several uncontrollable variables that could influence study effect sizes, such as therapist variables or setting. The REM was adopted both for the computation of the average effect sizes and for the regression analyses. Simulations show that interpretation of the results of the REM is only to be recommended when it is based on the data of at least 5 studies (Hedges & Vevea, 1998). When more than 20 studies are included, the performance of analyses (power of significance test, confidence intervals; CIs) is close to nominal (Hedges & Vevea, 1998). An FEM was therefore carried out in an additional sensitivity analysis to compute average effect sizes (Shadish & Haddock, 1994).

In order to test for variation in study effect sizes, I applied the Q test for homogeneity of effect sizes (Hedges, 1994). This test is used to determine whether study effect sizes are all equal or whether at least one effect size differs from the remainder. The Q statistic is also used to examine whether the average effect size computed with the FEM is representative of the population effect, which is a main assumption of the FEM. With the REM, the Q statistic tests whether the random-effects variance is significantly different from 0. Because the Q test has a low statistical power (Harwell, 1997), I tested for significance at a probability level of .10. When the Q test is nonsignificant, FEM and REM yield similar results for mean effect size.

In the computation of both the mean effect sizes and the regression analyses, the individual effect sizes of each study were weighted with the reciprocal of the variance components (i.e., with the within-study variance in the FEM and with the sum of the within-study and between-studies variance in the REM; Raudenbush, 1994; Shadish & Haddock, 1994). In view of the association between the sample size and the variance of an effect size, this method was used to ensure that the inaccuracy due to small sample size was taken into consideration. Studies with large sample sizes and thus more precise estimates of effect sizes have greater weights than studies with small sample sizes.
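The inverse-variance weighting scheme described above can be sketched as follows. The text cites Raudenbush (1994) and Shadish and Haddock (1994) without naming the between-studies variance estimator, so the DerSimonian–Laird moment estimator used here is one common choice and is my assumption, as are the function and variable names.

```python
def pool(gs, vs):
    """Inverse-variance pooling of study effect sizes.

    gs: per-study effect sizes (Hedges's g); vs: their within-study
    variances. Returns the fixed-effect mean, random-effects mean,
    Q homogeneity statistic, and tau^2 (between-studies variance,
    DerSimonian-Laird moment estimator).
    """
    w = [1.0 / v for v in vs]                          # FEM weights
    fixed = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, gs))
    df = len(gs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                      # 0 when homogeneous
    w_re = [1.0 / (v + tau2) for v in vs]              # REM weights
    random = sum(wi * gi for wi, gi in zip(w_re, gs)) / sum(w_re)
    return fixed, random, q, tau2
```

When Q is small and tau² is estimated at zero, the FEM and REM weights coincide and the two models return the same mean, which matches the behavior the text notes for nonsignificant Q tests.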

Results

Preliminary Analyses

In the first step of the analysis, a correction of the effect size distribution was performed. This was based on the work of Hunter and Schmidt (1990), who described several artifacts in meta-analyses, including the influence of bad data resulting from a variety of possible data handling errors (e.g., publishing erroneous data). Hunter and Schmidt drew attention to the difficulty of dealing with these artifacts when performing meta-analyses. They proposed excluding outlying values to reduce the impact of the faulty data on the meta-analysis. For the purposes of the present study, outliers were defined as those values that deviate more than two standard deviations from the unweighted mean, and appropriate correction was carried out across all studies included. Although it is not known whether outliers do in fact result from false data, this is the only way to control for this problem. For this analysis of the outliers, the effect sizes were calculated for each individual assessment scale and not averaged across groups. A total of 3.57% of the individual effect sizes for posttest measures were excluded in the outlier analysis comparing (C)BT and a no-treatment control, 1.47% in that comparing (C)BT and a common-factors control, and 3.45% in that comparing psychotherapy with pharmacotherapy. The impact of excluding outliers on the average effect size was investigated in a subsequent sensitivity analysis.
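The two-standard-deviation screening rule might look roughly like this in Python. This is an illustrative sketch: the function name is mine, and the study applied the rule per assessment scale within each comparison rather than to one flat list.

```python
def screen_outliers(effects, k=2.0):
    """Split effect sizes into kept and flagged values, where flagged
    values deviate more than k standard deviations from the unweighted
    mean (k = 2 in the text)."""
    n = len(effects)
    mean = sum(effects) / n
    sd = (sum((e - mean) ** 2 for e in effects) / (n - 1)) ** 0.5
    keep = [e for e in effects if abs(e - mean) <= k * sd]
    drop = [e for e in effects if abs(e - mean) > k * sd]
    return keep, drop
```

Because the mean and standard deviation are computed on the full distribution, a single extreme value inflates both, so only markedly deviant effect sizes are flagged.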

How Effective Is (C)BT?

The majority of the studies comparing (C)BT with a control group investigated a cognitive–behavioral approach, usually in conjunction with some type of exposure. The effect sizes were positive in all studies, that is, the control groups never achieved a better result than the treatment groups. Table 2 shows the average weighted effect sizes, 95% CIs, random-effects variances, and homogeneity statistics for each symptom category. The results of both the REM and the FEM are presented. The results of the REM can be generalized beyond the studies included in the review; however, when only a small number of studies are included, the results of the REM may be biased. As shown in Table 2, (C)BT yielded significant medium-to-large effect sizes in comparison with both a waiting list control and a common-factors control (both psychological and pill placebo). The efficacy of (C)BT therefore exceeds the realization of common factors. This result holds true not only for the main symptom of anxiety but also for associated depressive symptoms and quality of life. The zero random-effects variance indicates that it was not necessary to conduct a regression analysis to investigate the impact of moderator variables. For the same reason, the two statistical models (FEM and REM) yielded



equal effect sizes. The CIs for the REM are wider, despite a zero random-effects variance, because the number of studies is used to compute the CIs, and not the number of subjects, as in the FEM.

In a first sensitivity analysis, all effect sizes from studies that did not provide means and standard deviations were excluded. This step was based on findings described by Ray and Shadish (1996), who reported notable differences between various computational variants of Hedges’s g. They found, for example, that mean effect sizes and variances based on means and standard deviations tend to be larger than those based on probability levels and concluded that researchers should pay more attention to this in their analyses. In the sensitivity analysis conducted in the present meta-analysis, in which only effect sizes computed with means and standard deviations were included, results for the main category …