Skip to main content

Confounding and missing data in cost-effectiveness analysis: comparing different methods



Common approaches in cost-effectiveness analyses do not adjust for confounders. In nonrandomized studies this can result in biased results. Parametric models such as regression models are commonly applied to adjust for confounding, but there are several issues which need to be accounted for. The distribution of costs is often skewed and there can be a considerable proportion of observations of zero costs, which cannot be well handled using simple linear models. Associations between costs and effectiveness cannot usually be explained using observed background information alone, which also requires special attention in parametric modeling. Furthermore, in longitudinal panel data, missing observations are a growing problem also with nonparametric methods when cumulative outcome measures are used.


We compare two methods, which can handle the aforementioned issues, in addition to the standard unadjusted bootstrap techniques for assessing cost-effectiveness in the Helsinki Psychotherapy Study based on five repeated measurements of the Global Severity Index (SCL-90-GSI) and direct costs during one year of follow-up in two groups defined by the Defence Style Questionnaire (DSQ) at baseline. The first method models cumulative costs and effectiveness using generalized linear models, multiple imputation and bootstrap techniques. The second method deals with repeated measurement data directly using a hierarchical two-part logistic and gamma regression model for costs, a hierarchical linear model for effectiveness, and Bayesian inference.


The adjustment for confounders mitigated the differences of the DSQ groups. Our method, based on Bayesian inference, revealed the unexplained association of costs and effectiveness. Furthermore, the method also demonstrated strong heteroscedasticity in positive costs.


Confounders should be accounted for in cost-effectiveness analyses, if the comparison groups are not randomized.

JEL classification

C1; C3; I1


Cost-effectiveness analyses are often based on comparing average costs with the effectiveness of the treatments, and on bootstrap methods [1]. Bootstrap methods have been shown to have good properties when compared with parametric methods[24]. A standard application of this approach does not adjust for possible confounding effects, which can have a considerable influence on the results not only in observational but also in randomised studies [5]. In randomized controlled trials spurious differences in the distributions of background factors can emerge due to random variation especially with small sample sizes. This can induce a need to adjust for confounding. Some statistical methods have been developed to estimate adjusted means using (generalized) linear models, but it seems that these methods have seldom been applied in cost-effectiveness analyses. In the predictive margins approach [6, 7] the mean of individual predictions based on a generalized linear regression model are calculated, which allows for comparison of different scenarios by modification of the covariate values. In Bayesian inference, predictive distributions (e.g. [8]) can be applied for calculating adjusted means.

Longitudinal data have often been compressed by using different cumulative measures of costs and effectiveness outcomes (e.g. area under curve, AUC). The application of cumulative costs can also reduce the number of observations with zero costs in repeated measurement data and skewness, therefore, reducing the problems related to the application of linear models.

A drawback of cumulative measures is that a missing value in the outcome variable at a given measurement point will result in missing values in the cumulative outcome values of the subsequent measurement points. Multiple imputation (MI) [9] and data augmentation [10] can be applied to deal with missing outcome values at single measurement points, so that all of the information for the observed outcome values can be utilised for the cumulative outcomes.

The distribution of costs is generally skewed, with a proportion of observations having zero costs and the remaining portion having positive costs. Direct application of linear regression models is not suitable for these kinds of data, because if the model is used to predict costs, the predictions could suggest negative costs or other unrealistic results in some scenarios. The usual method of reducing skewness using logarithmic transformation is not sensible when some of the study subjects have zero costs. Two-part models [11], for example, have been proposed for solving this problem.

Missing values may be present not only in the dependent variable but also in the independent variables in the regression model. These variables can be, for example, skewed or categorical. Multiple imputation of missing costs with possibly zero values is more complicated, and two-part models can be more useful. In these cases, methods which assume a multinormal distribution for the variables are not ideal. More suitable methods, which can handle the aforementioned complications, can be based on data augmentation and Bayesian inference and implemented, for example, by using the OpenBUGS software [12, 13].

In the case of an observational study and non-intervention-based comparison groups based on, for example, a patient characteristic the cost-effectiveness analysis demonstrates the change of the average costs (relative to the change in the average effectiveness). This information allows a decision-maker to plan more cost-effective interventions, as not only different patient-specific characteristics but also other baseline factors can be compared in terms of the standardized average cost differences.

The Helsinki Psychotherapy Study (HPS) is a randomised clinical trial comparing three therapy treatments [14]. In addition to comparisons of the randomized therapy groups, the data set can be utilized to conduct cost-effectiveness analyses by comparing groups defined by nonrandomized baseline factors, in which case the randomization no longer plays any part and confounding factors generally need to be adjusted for. In the present work, we compared two groups based on the Defence Style Questionnaire (DSQ) [15], resulting in the identification of several potential confounders.

The aims of the study were as follows: a) We handle confounding by applying predictive margins in the frequentist inference and predictive distributions in the Bayesian inference to produce adjusted means of cost and effectiveness, and their differences. b) We address missing data at single measurements of a repeated measurement study by using the multiple imputation and data augmentation techniques. c) We assess the unexplained associations between costs and effectiveness by applying Bayesian hierarchical models. d) We handle nonnegative costs by applying a two-part model using a logistic regression model as an indicator of zero or positive costs, and a hierarchical gamma regression model for positive costs to avoid unrealistic negative predictive costs, which can be a result of an application of simple linear models. Effectiveness is analysed by using a hierarchical linear model.



The Helsinki Psychotherapy Study recruited 326 outpatients, aged 20–46 years, from the Helsinki region over the period 1994–2000 [14]. A statement describing explicitly the ethical background to this study and an approval by the Helsinki University Central Hospital’s ethics council can also be found in [14] (p. 31). Of these patients, 101 were randomly assigned to short-term psychodynamic psychotherapy, 97 to solution-focused therapy and 128 to long-term psychodynamic psychotherapy. In the present study, we restrict our analyses to the former two groups containing 198 patients having received short-term therapies. Of these patients, 7 refused to participate after being assigned to the treatment group and 21 discontinued their treatment. The cost-effectiveness analysis of the intervention groups has been reported elsewhere [16].

The measurement points (MP) at which the patients were measured were the baseline, and 3, 7, 9 and 12 months after the start of therapy. The Defence Style Questionnaire (DSQ) was dichotomized using the median value of 4.0 as the threshold. A total of 93 patients had DSQ <4, 100 had DSQ ≥4, and 5 had a missing DSQ value.

The effectiveness measure was the Global Severity Index, SCL-90-GSI, psychiatric symptoms, hereafter abbreviated as GSI [17]. The cost variable of the incremental cost-effectiveness analysis was the direct costs resulting from psychiatric health problems (DCP) during the 12-month follow-up period. The DCP was the sum of seven cost items in euros (€), which were: the costs accruing from 1) the protocol-driven study of psychotherapy treatments, 2) auxiliary study treatment visits, 3) other psychotherapy sessions, 4) outpatient visits to physicians and to other health care personnel concerning mental health problems, 5) inpatient care in psychiatric hospitals or with psychiatric diagnosis, 6) psychotropic medication, and 7) travel costs due to therapy visits. All costs were included in the analysis regardless of the payer. Effectiveness was assessed at baseline and at 3, 7, 9 and 12 months after baseline. Cost data based on information obtained from patients by questionnaires covered periods 0–7 months and 8–12 months whereas cost data based on patient level registers covered periods 0–3, 4–7, 8–9 and 10–12 months. Table 1 presents descriptive statistics of the outcome variables.

Table 1 Descriptive statistics of outcomes

The DSQ was associated with several baseline variables, which were also predictors of the effectiveness measure or the costs. The potential confounders in modelling both effectiveness and costs were gender, ‘psychiatric diagnostic category on Axis I’ (DSM IV) [18], ‘IIP-C, total score’ [19], ‘SOC, Sense of Coherence scale’ [20] and ‘SAS-SR, work subscale’ [21] (Table 2). These potential confounders were chosen based both on a priori judgement on psychology and on Kendall’s τ with DSQ (p-value smaller than 0.1), and with the AUC or with DCP (p-value smaller than 0.2) (data not shown). Also, we have applied baseline adjustment of the effectiveness by adding the baseline GSI value in the effectiveness model [22]. Due to randomization, there was no association between therapy group (the intervention) and DSQ, thus the therapy group was not included in the model.

Table 2 Descriptive statistics of confounders


The basic bootstrap method, which did not adjust for confounders, was compared with two model-based approaches, which adjust for confounders by using regression models. The first model-based method was based on cumulative measures of cost and effectiveness as outcomes, multiple imputation, bootstrap, generalized linear models and frequentist inference. The other method was based on hierarchical regression modelling of GSI and DCP at each MP using the Bayesian inference. These latter methods are described below, and referred to as the frequentist model and the Bayesian model although the models were not restricted to any particular inferential paradigm.

Multiple imputation and bootstrap methods

Multiple imputation [9] of the missing data was performed using procedure MI of the Sas System 9.2 [23]. The numerical Markov chain Monte Carlo (MCMC) method of this procedure was chosen because the missing data pattern was not monotonic. The MCMC method, in which the variables of the imputation model were assumed to be multinormally distributed, was applied separately for the DSQ groups. Effectiveness GSI was imputed for the repeated measurement data, and after the imputation the AUC was constructed from the imputed values using equation (4) given in the Appendix. The DCP were log-transformed in the MI. Auxiliary variables were not included in the imputation model due to convergence problems of the EM algorithm, which was used to estimate the initial values of imputation model parameters, of procedure MI.

In order to calculate the confidence intervals, the bootstrap method [1] was applied with 500 bootstrap samples. Multiple imputation was performed as a single imputation separately for each bootstrap sample [24].

Models for cumulative outcomes with frequentist inference

The observed values of the cumulative costs variable were all positive in this case, and a gamma regression model was therefore applied as the analysis model for the cumulative costs. The AUC was modelled using a linear regression model.

Bayesian model for repeated measurement data

The Bayesian method [8] for costs was based on a two-part hierarchical model because there were a considerable number of zero costs in the MP-specific cost data. The costs of patient i at MP k were factored as a product of two terms: C ik = C ik Z C ik P . The first term in the product had a value of one if the costs were positive between MP’s k-1 and k, and zero if no costs had been incurred. A logistic regression model for the binary outcome C ik Z was defined as

P C ik Z = 1 β k Z , X ik = 1 1 + exp { - X ik β k Z } ,

where X i k denoted the row vector of the intercept, the confounders, MP k and the binary DSQ group indicator DSQ i . β k Z were the corresponding regression coefficients, and exp{ β k Z } were the corresponding odds ratios (OR).

The second term of the positive costs C ik P was defined in a similar fashion. Because the distribution of the positive costs was skewed, the Gamma distribution was applied.

C ik P Gamma ( exp { X ik β k P + U i P } τ DSQ i k , τ DSQ i k ) .

Random effect U i P was individual. The regression coefficients represented the proportional changes in the expected value of the positive outcomes, i.e. a one-unit increase in the value of a covariate corresponded to a exp{β}-fold increase in the expected value. The possible heteroscedasticity was accounted for by allowing dispersion (inverse of variance-to-mean ratio) parameter τ DSQ i k vary over the DSQ groups and the MPs. Note that if τ DSQ i k = 1 then the expected value of the positive costs was equal to the variance, and if τ DSQ i k>1 then the variance was smaller.

Effectiveness E i k was modelled by using a linear, hierarchical model:

E ik = X ik β k E + U i 1 E + U i 2 E t+ ik E ,

where t is the follow-up time at MP k. There was assumed to be an individual linear trend, which was modelled by the random effects part U i 1 E + U i 2 E t.

The prior distributions of the regression coefficients β k · and the residual error terms ik E were N(0, 100) and N(0, σ E,DSQ i k 2), respectively. The variance parameters σ E , · , · 2 and the dispersion parameters τ ·,· were distributed as InverseGamma(3,1) and Gamma(3,1)a priori, respectively.

The possible associations between measurement points, and the costs and the effects were accounted for using hierarchical models. The prior distribution for the three random effects were U i := ( U i 1 , U i 2 , U i 3 ) T := ( U i 1 E , U i 2 E , U i P ) T N(0, Σ U ) and for the corresponding covariance matrix Σ U=(σ i j ) i,j InvW10(I 3), where InvW10 was an abbreviation for the inverse Wishart distribution with 10 degrees of freedom and I 3 the 3×3 identity matrix. Thus the correlation between the random effects was assumed to be zero a priori, and the costs and effectiveness were therefore assumed to be independent given the observed background information and the random effects. The correlation coefficients of the distribution of the random effects were defined as ρ ij := σ ij / σ ii σ jj .

The details of the model specifications, estimation, data augmentation and the model assessment are described below.

Mathematical description of the model


Let i{1,2,,n}=:I index the n=198 subjects and k=1,2,…,K:=5 the K intervention points where the measurements were made. Let time t(1)≡0 denote the baseline and t(K)=12 the duration of the follow-up in months. As there was some individual variation in the times when the actual measurements were made, let t i (k) denote the actual kth measurement time when the outcomes C i k (DCP during ( t i (k-1), t i (k)]) and E i k (GSI) of subject i were recorded.

The cumulative cost variable was defined as the sum C i Cum := k = 2 K C ik . The effectiveness AUC was defined as:

E i AUC := 1 t ( K ) k = 2 K E i , k - 1 + E ik 2 ( t ( k ) - t ( k - 1 ) ) .

The incremental cost-effectiveness ratio was defined as

ICER:= C ̄ DSQ < 4 Cum - C ̄ DSQ 4 Cum Ē DSQ < 4 AUC - Ē DSQ 4 AUC ,

where C ̄ · Cum and Ē · AUC are the DSQ group specific cumulative cost and effectiveness AUC means, respectively.

Cumulative outcomes

The model adjustment for controlling confounding was based on the ideas of Lee [6]. The individual predictions, which were based on the parameter values, the covariate values and the expected value based on the gamma regression model, were:

E C i Cum β C , X i C :=exp{ X i C β C }i,

where X i C denotes the row vector of the covariates and βC the corresponding column vector of the regression coefficients.

The analysis model for the effectiveness contains the group DSQ i and the confounders. The linear regression model was applied, and the individual predictions were:

E E i AUC β E , σ 2 , X i E := X i E β E i.

The predicted margin [7] was the average of the predictions (6):

P M C ( x DSQ ):= 1 n i = 1 n E C i Cum β C , σ 2 , X i C , .

In (8) X i C , was a modified version of the original covariate values. In this modification group variable DSQ i was set to value xDSQ{DSQ <4, DSQ ≥4} for all patients i and the values of the other covariates remained at their original values. The adjusted difference of groups was difference PMC(DSQ <4)-PMC(DSQ ≥4). The predictive margins PME(xDSQ) and the difference between the groups PME(DSQ <4)-PME(DSQ ≥4) were defined as in (8), but by using the individual predictions defined in (7). The adjusted ICER was calculated by using the predictive margins PME(·) and PMC(·):

ICER PM := PM C ( DSQ < 4 ) - PM C ( DSQ 4 ) PM E ( DSQ < 4 ) - PM E ( DSQ 4 ) .

Repeated measurements

In the Bayesian model, the posterior distribution was proportional to the product of the likelihood terms based on equations (1), (2) and (3), the joint density of the random effects and the joint density of all model parameters (denoted here by θ):

p θ data i , k P C ik Z β k Z , X ik Z p C ik P X ik P , β k P , U i P , τ DSQ i k × i , k p E ik X ik E , β k E , U i E , σ E , DSQ i k 2 × i p U i θ × p { θ } .

The predictive distribution of the outcomes for a set I of hypothetical subjects i was defined as

( C i k Z , C i k P , E i k ) i I | data = p θ data i p U i θ × k P C i k Z β k Z , X i k Z p C i k P X i k P , β k P , U i P , τ DSQ i k × p E i k X i k E , β k E , U i E , σ E , DSQ i k 2 d U i d θ.

Predictive distributions (11) were applied to calculate posterior predictive expectations and quantile points of functionals of

( C i k Z , C i k P , E i k ) i I , k ,

such as PMC(xDSQ) in (8) or the ICER PM in (9).


The frequentist regression analyses were performed using the SAS System 9.2 [23] procedures Genmod and Mixed for cumulative costs and AUC, respectively.

The Bayesian analyses were conducted using the OpenBugs software [13], which applies MCMC methods. The data management and the predictive distributions were handled using the R software [25].

Data augmentation to handle missing data

Predictive distributions (11) were also applied in the data augmentation procedures to handle missing outcome values. The corresponding predictive distribution for the missing baseline covariate value X i j , which belongs to at least one of the covariate vectors X ik Z , X ik P or X ik E , was

X ij | data p θ data × p X ij θ p U i θ k P C ik Z β k Z , X ik Z × p C ik P X ik P , β k P , U i P , τ DSQ i k × p E ik X ik E , β k E , U i E , σ E , DSQ i k 2 d U i d θ.

The prior distribution for the discretized baseline DSQ was defined as Bernoulli(pDSQ), where the hyperprior for pDSQ was chosen to be Uniform(0, 1). The other baseline covariates X i j having missing values were continuous, and they were assigned N(μ j , 1/τ j ) priors with hyperpriors μ j N(0, 1000) and τ j Gamma(2, 1). As the number of missing values in these covariates was small, we chose not to elaborate the prior distributions further.

Convergence checks of MCMC and assessment of model assumptions

Two parallel chains were simulated with 60,000 iterations in addition to 10,000 iterations of burn-in in both chains. The chains were thinned by factor 15 resulting in 4,000 sampled values for both chains. Autocorrelations vanished quickly, which suggests good convergence (data not shown).

The Bayesian model was assessed using the Q-Q plots of standardised predictive errors (SPE). A total of 40 subjects denoted by set I were excluded from the data, and the predictions for these 40 subjects were calculated based on their baseline information and all observed information of the remaining subjects in I I C . The SPE was defined as

E E i k - E i k Obs σ E , | data I I C = E i k - E i k Obs σ E , p E i k X i k E , β k E , U i E , σ E , , DSQ i 2 × p U i | θ p θ data { I I C } d E i k d U i d θ i I

where E i k Obs denotes the observed value of effectiveness GSI. A similar approach for the positive costs C ik P was applied.


The adjusted absolute differences of the cumulative costs (-161 and -166 based on the frequentist and Bayesian methods, respectively) appeared to be smaller than the unadjusted difference of -309 based on the bootstrap method (Table 3). The unadjusted difference was weakly significantly different from zero (P-value 0.094) whereas the corresponding adjusted difference based on the frequentist model was not significant (P-value 0.432). The Bayesian method, which modelled the measurement points separately, appeared to produce slightly lower predictive cumulative cost means than the frequentist predictive margins method. The estimates of the difference were close to each others.

Table 3 The observed and model-based estimates

The AUC differences for effectiveness were highly significant for the unadjusted bootstrapped difference (-0.38), but the adjusted mean -0.04 based on predictive margins was not significant and close to zero (P-values 0.000 and 0.566, respectively), whereas the predictive mean difference based on the Bayesian inference produced a much smaller difference of -0.03 with credible interval (CrI) (-0.27, 0.20) (Table 3).

The unadjusted ICER estimate was slightly higher than the adjusted estimate (Table 3). The standard errors, however, were large. The unadjusted ICER was weakly significant (P-value 0.098), but the adjusted ICER was not significant (P-value 0.772). The adjusted ICER and the posterior expectation of the ICER were not applicable, because the denominator of the ICER (the effectiveness difference) was near zero, in which case the ICER was undefined [2].

The cost–effectiveness plane shows that in the group where DSQ ≥4 both the costs and the symptoms were on a higher level than in the group where DSQ <4 (Figure 1).

Figure 1
figure 1

The cost-effectiveness plane. The Bayesian method was applied to calculate the posterior predictive differences of effectiveness means Ē DSQ < 4 AUC - Ē DSQ 4 AUC and cost means C ̄ DSQ < 4 Cum - C ̄ DSQ 4 Cum .

Due to the large number of parameters in the regression models defined above, which were merely a technical utility as the objective was to adjust the average costs and effectiveness for possible confounding, we focus on a limited number of parameters of the Bayesian model in the following.

There were significant correlations between average levels of individual costs and effectiveness over time (posterior expectation E[ ρ 1 , 3 |data]=0.32 and CrI (0.00,0.57)), and the individual slope of effectiveness (E[ ρ 1 , 2 |data]=-0.17 CrI (-0.33, -0.00)), which cannot be explained by the background information included in the regression models. The correlations between the intercept of positive costs, and the slope of effectiveness, however, were practically zero (E[ ρ 1 , 3 |data]=-0.03).

The dispersions of the positive costs showed considerable variation over the measurement points, but the residual variances of effectiveness were close to each other (Table 4). During the first measurement interval the positive costs varied relatively little (E[ τ · , 2 |data] were large) because the majority of patients took the study treatments, each of which lasted approximately six months and was covered by the first two measurement intervals. Some patients took auxiliary therapies or other psychiatric treatments both during and after the study therapies, which resulted in higher costs, whereas other patients took no auxiliary treatments, which resulted in lower costs. This resulted in greater individual variation during the last measurement interval (E[ τ · , 5 |data] were small).

Table 4 Variance parameter estimates

The posterior predictive checks based on the Q-Q plots showed that the model for costs fitted well for intervals 0 to 3 months and 3 to 7 months, but after that the predictions were too low. The model for effectiveness did not fit well at the last two measurement points (Figures 2 and 3).

Figure 2
figure 2

Q-Q plots of effectiveness outcome GSI. The four measurement points at 3, 7, 9 and 12 months were used.

Figure 3
figure 3

Q-Q plots of positive costs. The measurement intervals were 0–3, 3–7, 7–9 and 9–12 months. The measurements with zero costs were excluded.


This paper compared the standard bootstrap-based method in a cost-effectiveness analysis with two model-based methods based on the frequentist and the Bayesian inferences. The skewed, non-negative distribution of cost variables was analysed using the gamma regression and the two-part models, and model adjustment for potential confounding was performed using the predictive margins and predictive distributions in the frequentist and the Bayesian inferences, respectively.

The benefit of using the predictive margins method or the predictive distributions of the Bayesian inference to calculate the adjusted averages of costs and effectiveness is that there is that also other link functions than the identity function can be used in regression modeling. For example, Nixon and Thompson [5] applied identity link functions both for gamma distribution of the costs and for the normal distribution of the effectiveness. The average of the covariates over all individuals multiplied by the corresponding regression coefficients were restricted to zero in order to interpret the intercept terms as the group averages. There is no need to make such restrictions or to stick with the identity link function when using predictive margins or Bayesian predictive distributions. Other link functions, which are commonly used in generalized linear models, can be applied with these methods as well thus avoiding the possibility of negative linear predictors (and predictive values) in case of the gamma distribution.

Both methods based on the regression models could have been applied in the case of repeated measurements by using either frequentist or Bayesian inference. However, the prevalence of zero costs during the first two follow-up periods was very low, thus an application of a logistic regression model and frequentist inference in the two-part model was not plausible. Furthermore, application of a joint hierarchical model would have been difficult because effectiveness and positive costs were modelled using different families of distributions, normal and gamma distributions, respectively. The Bayesian method avoided the numerical instabilities by using informative prior distributions for the regression coefficients and the missing data values were augmented during the MCMC simulation in a straightforward manner.

Bayesian model averaging [26, 27] has been proposed to handle the skewness of cost variables or for the selection of important predictors, respectively, but simultaneous handling of skewness, heteroscedasticity and adjustment for confounding factors can be challenging. We modelled the heteroscedastic residual variance of the cost variables in the case of the repeated measurement data, which allows the model to adapt to the distribution of the costs. The posterior predictive checks suggested that the model fit was good.

The predictive margins approach for the cumulative outcomes was flexible, and the procedure described in this paper can be extended to the case of zero costs and two-part models.

Olsen and Schafer [28] introduced a model with random effects for both parts of the two-part model for costs. They reported that the random effects were usually correlated. In our study, however, the sample size was considerably smaller and the proportion of zero costs was very low during the first seven months of follow-up. Therefore, application of separate random effects for the zero costs (and possible positive correlation with the random effect(s) of the positive costs) was suppressed.

Potential extensions of the model include, for example, using a higher order random effects model for positive costs and adding a random effect to the logistic model of zero costs. These extensions would, however, require more measurements or more individuals. Another extension of the model would be to handle the skewed distribution of the effectiveness outcome, especially at the later stages of the follow-up by using, for example, the skewed normal distribution [29].

The patient groups were not defined using (randomized) intervention groups, which is the case in most cost-effectiveness analyses, but with the DSQ, which complicates the interpretation of the ICER statistic. The DSQ is a rather stable characteristic of a patient, which cannot be altered by a researcher, whereas interventions in general can be altered. The interpretation of the ICER statistic is the change in average costs per one unit in the change of the average effectiveness measured in terms of the AUC. It is also important to bear in mind that in this work large values of the effectiveness outcome GSI correspond to more severe symptoms (less benefits) whereas in commonly used effectiveness outcomes such as the quality adjusted life years (QALY) large values correspond to greater benefits. Therefore the usual interpretation of the quadrants of the cost-effectiveness plane[30] is reversed with respect to the vertical axis. In this case the unadjusted ICER estimates were positive indicating that in the group DSQ¡4 the effectiveness was better and the costs were lower.

Our results allow a decision-maker to assess the importance of patient characteristics such as the DSQ in this study. If a patient in the group DSQ ≥4 become similar to a patient in the group DSQ¡4, the cost would decrease by 816 euros (the bootstrapped point estimate) per one unit decrease in the AUC based on the GSI on average. However, the adjusted estimates indicate that not only the effectiveness difference but also the cost difference vanished thus it is not reasonable to present such a standardized estimate of the change in costs as the ICER.

Limitations of the proposed methods are mainly related to the various modelling assumptions, but there are few alternatives to parametric models in order to adjust for confounders. For example, the effectiveness measure was nonnegative, but the model was based on normality assumptions, thus predictive effectiveness values can be negative. The poor performance of the effectiveness model was likely to be due to the skewed distribution, which was not accounted for using a model based on normal distribution. SCL-90-GSI can have only positive values, and the reduction in symptoms caused a considerable proportion of observations to lie close to zero. Linearity assumptions or possible interactions were not tested, but could be done in future work.


Our paper demonstrates how to combine several methods for performing cost-effectiveness analyses in observational studies, which are often subject to effects of confounding and missing data. Our results based on regression modelling confirmed that there was a need to adjust for the confounders in this study, thus the standard unadjusted methods based on the bootstrap method, were not adequate. Unadjusted methods showed significant differences between the groups, but the adjustment for confounders showed no the significant differences thus yielding different conclusions. Not all associations between the costs and effectiveness could not, however, be explained by the observed confounders only, thus the hierarchical model showed clear non-zero correlations between the random effects. The OpenBUGS code is available from the corresponding author upon request.


  1. Efron B: Bootstrap methods: Another look at the jackknife. Ann Stat 1979, 7: 1–26. 10.1214/aos/1176344552

    Article  Google Scholar 

  2. Briggs AH, Mooney CZ, Wonderling DE: Constructing confidence intervals for cost-effectiveness ratios: An evaluation of parametric and non-parametric techniques using Monte Carlo simulation. Stat Med 1999, 18: 3245–3262. 10.1002/(SICI)1097-0258(19991215)18:23<3245::AID-SIM314>3.0.CO;2-2

    Article  CAS  PubMed  Google Scholar 

  3. Briggs AH, Wonderling DE, Mooney CZ: Pulling cost-effectiveness analysis up Bby its bootstraps: A non-parametric approach to confidence interval estimation. Health Econ 1997, 6: 327–340. 10.1002/(SICI)1099-1050(199707)6:4<327::AID-HEC282>3.0.CO;2-W

    Article  CAS  PubMed  Google Scholar 

  4. Polsky D, Glick HA, Willke R, Schulman K: Confidence intervals for cost-effectiveness ratios: A comparison of four methods. Health Econ 1997, 6: 243–252. 10.1002/(SICI)1099-1050(199705)6:3<243::AID-HEC269>3.0.CO;2-Z

    Article  CAS  PubMed  Google Scholar 

  5. Nixon RM, Thompson SG: Methods for incorporating covariate adjustment, subgroup analysis and between centre differences into cost-effectiveness evaluations. Health Econ 2005, 14: 1217–1229. 10.1002/hec.1008

    Article  PubMed  Google Scholar 

  6. Lee J: Covariance adjustment of rates based on the multiple logistic regression model. J Chronic Dis 1981,34(8):415–26. 10.1016/0021-9681(81)90040-0

    Article  CAS  PubMed  Google Scholar 

  7. Graubard BI, Korn EL: Predictive Margins with Survey Data. Biometrics 1999, 55: 652–659. 10.1111/j.0006-341X.1999.00652.x

    Article  CAS  PubMed  Google Scholar 

  8. Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian Data Analysis. London: Chapman & Hall; 1995.

    Google Scholar 

  9. Rubin DB: Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.

    Book  Google Scholar 

  10. Tanner MA, Wong WH: The calculation of posterior distributions by data augmentation. J Am Stat Assoc 1987, 82: 528–550. 10.1080/01621459.1987.10478458

    Article  Google Scholar 

  11. Mullahy J: Much ado about two: Reconsidering retransformation and the two-part model in health econometrics. J Health Econ 1998,17(3):247–281. 10.1016/S0167-6296(98)00030-7

    Article  CAS  PubMed  Google Scholar 

  12. Lambert PC, Billingham LJ, Cooper NJ, Sutton AJ, Abrams KR: Estimating the cost-effectiveness of an intervention in a clinical trial when partial cost information is available: a Bayesian approach. Health Econ 2008,17(1):67–81. 10.1002/hec.1243

    Article  PubMed  Google Scholar 

  13. Thomas A, O’Hara B, Ligges U, Sturtz S: Making BUGS open. R News 2006, 6: 12–17.

    Google Scholar 

  14. Knekt P, Lindfors O (Eds): A randomized trial of four forms of the psychotherapy on depressive and anxiety disorders. Finland: The Social Insurance Institution; 2004.

    Google Scholar 

  15. Andrews G, Pollock C, Stewart G: The determination of defense style by questionnaire. Arch Gen Psychiatry 1989, 46: 455–460. 10.1001/archpsyc.1989.01810050069011

    Article  CAS  PubMed  Google Scholar 

  16. Maljanen T, Tillman P, Hrknen T, Lindfors O, Laaksonen MA, Haaramo P, Knekt P: The cost-effectiveness of short-term psychodynamic psychotherapy and solution-focused therapy in the treatment of depressive and anxiety disorders during a one-year follow-up. J Ment Health Policy Econ 2012, 15: 13–23.

    PubMed  Google Scholar 

  17. Derogatis LR, Lipman RS, Covi L: The SCL-90: An outpatient psychiatric rating scale - Preliminary report. Psychopharmacol Bull 1973, 9: 13–28.

    CAS  PubMed  Google Scholar 

  18. American Psychiatric Association: Diagnostic and statistical manual of mental disorders. Washington, DC: American Psychiatric Association, 4th edition; 1994.

    Google Scholar 

  19. Horowitz LM, Rosenberg SE, Baer BA, Ureno G, Villasenor VS: The inventory of interpersonal problems: Psychometric properties and clinical applications. J Consult Clin Psychol 1988, 56: 885–895.

    Article  CAS  PubMed  Google Scholar 

  20. Antonovsky A: The structure and properties of the sense of coherence scale. Soc Sci Med 1993, 36: 725–733. 10.1016/0277-9536(93)90033-Z

    Article  CAS  PubMed  Google Scholar 

  21. Weissman MM, Bothwell S: Assessment of social adjustment by patient self-report. Arch Gen Psychiatry 1976, 33: 1111–1115. 10.1001/archpsyc.1976.01770090101010

    Article  CAS  PubMed  Google Scholar 

  22. Manca A, Hawkins N, Sculpher MJ: Estimating mean QALYs in trial-based cost-effectiveness analysis: the importance of controlling for baseline utility. Health Econ 2005, 14: 487–496. 10.1002/hec.944

    Article  PubMed  Google Scholar 

  23. SAS/STAT 9.2 User’s Guide 9.2, Cary, NC. 2009.

  24. Shao J, Sitter RR: Bootstrap for imputed survey data. J Am Stat Assoc 1996, 91: 1278–1288. 10.1080/01621459.1996.10476997

    Article  Google Scholar 

  25. R Development CoreTeam: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Austria: Vienna; 2011. . [ISBN 3–900051–07–0] []

    Google Scholar 

  26. Conigliani C, Tancredi A: Semi-parametric modelling for costs of health care technologies. Stat Med 2005,24(20):3171–3184. 10.1002/sim.2012

    Article  CAS  PubMed  Google Scholar 

  27. Negrin MA, Vazquez-Polo FJ: Incorporating model uncertainty in cost-effectiveness analysis: A Bayesian model averaging approach. J Health Econ 2008,27(5):1250–1259. 10.1016/j.jhealeco.2008.03.005

    Article  PubMed  Google Scholar 

  28. Olsen MK, Schafer JL: A two-part random-effects model for semicontinuous longitudinal data. J Am Stat Assoc 2001,96(454):730–745. 10.1198/016214501753168389

    Article  Google Scholar 

  29. O’Hagan T, Leonhard T: Bayes estimation subject to uncertainty about parameter constraints. Biometrika 1976, 63: 201–202. 10.1093/biomet/63.1.201

    Article  Google Scholar 

  30. Cohen DJ, Reynolds MR: Interpreting the results of cost-effectiveness studies. J Am Coll Cardiol 2008,52(25):2119–2126. 10.1016/j.jacc.2008.09.018

    Article  PubMed Central  PubMed  Google Scholar 

Download references


The work is supported by the Academy of Finland (#138876) and Social Insurance Institution.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Tommi Härkänen.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

PK, TH and TM conceived the study. TH developed the methods, conducted the analyses and wrote the manuscript. TM provided expertise in cost-effectiveness analyses. OL and PK provided expertice in psychotherapy. OL, PK ja EV provided the data. EV programmed the SAS macro. All authors read and approved the final manuscript.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Härkänen, T., Maljanen, T., Lindfors, O. et al. Confounding and missing data in cost-effectiveness analysis: comparing different methods. Health Econ Rev 3, 8 (2013).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: