A critical assessment of NICE guidelines for treatment of depression
The UK National Institute for Health and Care Excellence (NICE) recently updated its recommendations for the treatment of depression 1 . This effort has many strengths, including the meticulous documentation of the process; systematic reviews, meta‐analyses and cost‐effectiveness analyses; and inclusion of stakeholder comments that feed into the guidelines. Here we attempt a constructive critical appraisal of areas where improvements are feasible, both for this guideline and for similar initiatives, with a special focus on psychotherapies for depression.
We first note that the methods and analyses of the NICE guidelines were not subjected to formal external peer review for any of the addressed questions. Asking stakeholders for comments is welcome, but it is unlikely to be equally rigorous, because the guideline committee itself decides how these comments are considered. External peer review is recommended as a default quality standard for treatment guidelines 2 .
Furthermore, study protocols were pre‐registered only for some conditions (e.g., for new episodes of depression and treatment‐resistant depression), but not for others (including chronic depression, depression with personality disorder, and psychotic depression). Pre‐registration should be established as a default standard in guidelines for all reviewed conditions.
For the primary analysis concerning new episodes of depression, network meta‐analysis (NMA) was chosen 1 . NMA has the advantage of incorporating both direct and indirect evidence, but complex assumptions need to be fulfilled, and the level of evidence provided is still debated 3 . For these reasons, NMA results and the derived inferences require extra caution.
For treatment ranking, the guideline committee primarily focused on effect sizes from NMA treatment comparisons with placebo or treatment‐as‐usual, and compared these effect sizes between treatments. From these comparisons, the committee concluded that some treatments appeared to be “more effective” than others 1 . For most treatments, however, the differences between their treatment‐control effect sizes were below the minimal clinically significant difference defined by the committee (standardized mean difference, SMD >0.5 or <–0.5) 1 . This applies to comparisons between individual cognitive or cognitive‐behavioral therapy (CT/CBT), individual interpersonal therapy (IPT), individual problem solving, individual short‐term psychodynamic psychotherapy (STPP), and group behavioral activation. Thus, with only subtle effect size differences, treatment ranking carries large uncertainty. Furthermore, one should avoid inferring a difference between two treatments merely because one shows a descriptively larger effect size than the other against a control condition, without comparing the two directly 4 .
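The caution about indirect inference can be illustrated with a minimal sketch of an adjusted indirect comparison, assuming that two treatments were each compared with a common control in independent sets of trials. The function name and all numerical inputs are hypothetical and are not taken from the NICE analyses. They only show that two treatment‐control effect sizes that look different descriptively need not differ from each other once the uncertainty of both estimates is combined.

```python
import math


def indirect_comparison(smd_a, se_a, smd_b, se_b, z_crit=1.96):
    """Adjusted indirect comparison of treatment A vs B via a common control.

    Assumption: the A-vs-control and B-vs-control estimates are independent,
    so their variances add.
    """
    diff = smd_a - smd_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, ci


# Hypothetical inputs, for illustration only (not NICE data):
# A vs control: SMD = -0.60 (SE 0.15); B vs control: SMD = -0.40 (SE 0.15)
diff, se, (lo, hi) = indirect_comparison(-0.60, 0.15, -0.40, 0.15)
compatible_with_no_difference = lo <= 0.0 <= hi

print(f"A vs B: SMD = {diff:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
print("Interval includes zero:", compatible_with_no_difference)
```

With these illustrative inputs, the interval for the A‐versus‐B difference includes zero, although A shows the descriptively larger effect against control.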
The guideline committee reported head‐to‐head comparisons of active treatments only in a supplement. These comparisons show that, in more severe depression, the differences between individual behavioral therapy, individual CBT, individual IPT and individual STPP are neither statistically nor clinically significant (SMDs <0.50) 1 . In less severe depression, only a few clinically significant differences were found: for example, in a pairwise comparison, STPP was statistically and clinically significantly superior to counselling (SMD=–0.61, 95% CI: –1.05 to –0.17), but was ranked below counselling.
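As a rough check of the STPP versus counselling example, the reported SMD and 95% CI can be set against the committee's threshold. The sketch below assumes a symmetric normal‐approximation interval, so that the standard error can be recovered from the CI width. The function name and this back‐calculation are illustrative assumptions and are not part of the NICE analysis.

```python
import math

CLINICAL_THRESHOLD = 0.5  # committee's threshold: SMD > 0.5 or < -0.5


def assess(smd, ci_low, ci_high, z_crit=1.96):
    """Summarize an SMD with its 95% CI (normal approximation assumed)."""
    se = (ci_high - ci_low) / (2 * z_crit)   # SE recovered from CI width
    z = smd / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return {
        "se": se,
        "z": z,
        "p": p,
        "statistically_significant": not (ci_low <= 0.0 <= ci_high),
        "point_estimate_beyond_threshold": abs(smd) > CLINICAL_THRESHOLD,
    }


# Values as reported in the text: SMD = -0.61, 95% CI -1.05 to -0.17
result = assess(-0.61, -1.05, -0.17)
for key, value in result.items():
    print(key, value)
```

Under these assumptions, the interval excludes zero and the point estimate lies beyond the threshold of 0.5 in absolute value, which matches the description of this comparison as statistically and clinically significant.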
Thus, the committee's conclusions about differences in efficacy between active treatments are not consistent with its own head‐to‐head comparisons. They are also not compatible with independent peer‐reviewed evidence of no substantial differences in efficacy between psychotherapies 5 . The committee, however, erroneously interpreted this independent evidence 5 as confirming its treatment ranking 1,B, p.165 . In summary, procedures for treatment ranking need to be pre‐defined, and subtle differences below the threshold of clinically meaningful values should not be overstated.
In principle, possible allegiance and conflicts of interest need to be controlled for 2 , for example by including methodologists, patients, and different‐field experts, and by limiting the involvement of field specialists to a consultation role 6 . Avoiding committee stacking is also essential, so that believers in one or another treatment modality are not over‐represented among guideline developers 6 .
The guideline committee based the hierarchy of treatment recommendations on both efficacy and cost‐effectiveness, which is useful in trying to optimize the use of treatments for conditions with high prevalence 1 . For cost‐effectiveness, however, peer review and pre‐registration are missing. Moreover, the cost‐effectiveness literature is notoriously replete with biases. This further complicates matters in a field such as depression where the primary studies are often also biased (e.g., sponsor bias in pharmacotherapy trials and allegiance bias in psychotherapy trials). Furthermore, the studies used by the committee for cost‐effectiveness analysis did not cover all relevant treatment types. For those not covered, it is not clear whether cost‐effectiveness estimates are valid. Additional cost‐effectiveness analyses commissioned by the committee were based on the NMA treatment‐control effect sizes shown above to be questionable, which further limits the derived treatment ranking.
Another challenge is whether extrapolations from new episodes of depression to other conditions are valid, when there is no solid evidence for these other categories of depression. For example, in depression with personality disorder, the committee recommends combining antidepressants and psychotherapy. For the choice between psychotherapies, readers are referred to the treatments for new episodes of depression. Then, for patients not sufficiently responding to pharmacotherapy alone, switching to psychotherapies listed for new episodes of more severe depression is recommended as one option. In reviewing new episodes of depression, however, the committee excluded depression with personality disorder and treatment‐resistant depression. Thus, the committee's ranking of psychotherapies for new episodes of depression may not be valid for these other conditions. Finally, for the cost‐effectiveness of chronic depression and depression with personality disorder, the committee also used the economic data for new episodes of depression.
A further problem is that the guideline committee found the quality of studies to be quite low. The committee tried to adjust results for bias, but a pre‐registered threshold analysis for assessing confidence in recommendations was not carried out. Quality of evidence was evaluated narratively using the GRADE system, but without assessing confidence. Assessing confidence in evidence is essential for guidelines 6 .
The committee also draws an arbitrary distinction between the more complex forms of depression, which not only reduces generalizability to clinical practice but appears to have led to the exclusion of relevant studies. Available randomized controlled trials have not clearly distinguished between chronic depression and treatment‐resistant depression. For chronic depression, the committee recommends CBT, antidepressants or their combination 1 . However, these recommendations do not take into account the evidence for STPP and long‐term psychodynamic therapy in treatment‐resistant depression and in depression with personality disorder 7, 8 , conditions highly associated with chronic depression. Guidelines need to avoid arbitrary distinctions of disorders.
Moreover, the committee did not sufficiently consider the limitations of the available evidence 2 , especially the limited remission rates (about 30%) of short‐term psychotherapies (4‐20 sessions), with SMDs of 0.30 9 . Aggravating this problem, most effect sizes of short‐term treatments are not stable at follow‐up 1 . Especially for chronic depression, success rates may be improved with longer‐term treatments 9 . The committee, however, considered long‐term treatments only as an option for depression with personality disorder.
Finally, an explicit link between evidence and recommendations is missing 2 . We acknowledge that the evidence in this field is uncertain, and this may be the reason why the committee found it “difficult… to link the recommendations directly to the NMA results” 1,B, pp.48,66 , and based its recommendations ultimately on “clinical experience” 1,B, p.66 . However, it is unclear whether clinical experience can offer any solid guidance when treatment differences are modest, uncertainty is high and bias is substantial. Guidelines should fully admit this uncertainty and avoid over‐simplified, over‐confident recommendations 6 .