Adjusting health spending for the presence of comorbidities: an application to United States national inpatient data

Background One of the major challenges in estimating health care spending on each cause of illness is allocating spending for a health care event to a single cause of illness in the presence of comorbidities. Comorbidities, the secondary diagnoses, are common across many causes of illness and often correlate with worse health outcomes and more expensive health care. In this study, we propose a method for measuring the average spending for each cause of illness with and without comorbidities.
Methods Our strategy for measuring cause of illness-specific spending and adjusting for the presence of comorbidities uses a regression-based framework to estimate excess spending due to comorbidities. We consider multiple causes simultaneously, allowing causes of illness to appear as either a primary diagnosis or a comorbidity. Our adjustment method distributes excess spending away from primary diagnoses (outflows), where spending is exaggerated due to the presence of comorbidities, and allocates that spending towards causes of illness that appear as comorbidities (inflows). We apply this framework for spending adjustment to the National Inpatient Sample data in the United States for years 1996-2012 to generate comorbidity-adjusted health care spending estimates for 154 causes of illness by age and sex.
Results The primary diagnoses with the greatest number of comorbidities in the NIS dataset were acute renal failure, septicemia, and endocarditis. Hypertension, diabetes, and ischemic heart disease were the most common comorbidities across all age groups. After adjusting for comorbidities, spending on chronic kidney diseases, atrial fibrillation and flutter, and chronic obstructive pulmonary disease increased by 74.1%, 40.9%, and 21.0%, respectively, while spending on pancreatitis, lower respiratory infections, and septicemia decreased by 21.3%, 17.2%, and 16.0%. For many diseases, comorbidity adjustments had varying effects on spending for different age groups.
Conclusions Our methodology takes a unified approach to account for excess spending caused by the presence of comorbidities. Adjusting for comorbidities provides a substantially altered, more accurate estimate of the spending attributed to specific causes of illness. Making these adjustments supports improved resource tracking, accountability, and planning for future resource allocation.
Electronic supplementary material The online version of this article (doi:10.1186/s13561-017-0166-2) contains supplementary material, which is available to authorized users.


Introduction
A regression-based framework was used to model the share of spending for a health system encounter that is attributable to comorbidities. In this model, spending was transferred away from an encounter's primary diagnosis and systematically redistributed across comorbidities to more accurately reflect the true cost of treating each cause.

Extracting, mapping, and cleaning data
The National Inpatient Sample (NIS) survey was used to demonstrate our method for comorbidity adjustment because it contains multiple secondary diagnoses in addition to a primary diagnosis. The unit of analysis is an encounter, which corresponds to a single inpatient hospital stay. The NIS dataset contains on average 5.3 secondary diagnoses per inpatient encounter (Table 1). Age, sex, primary diagnosis, secondary diagnoses (comorbidities), patient weights, and health spending were extracted from the NIS data for each inpatient encounter. Diagnoses are recorded using the International Classification of Diseases, Ninth Revision (ICD9) (1), (2). All diagnoses were mapped from ICD9 codes to the Global Burden of Disease (GBD) 2013 cause classification. Our analysis was based on level III of the GBD cause hierarchy, which classifies causes of illness and health spending across 169 causes. These causes of illness are listed in Table 2, stratified by age group. Observations that failed to map were removed. This occurred in 17% of cases and was due to reporting error.
ICD9 uses two coding systems to classify injuries: N-codes and E-codes. E-codes, which state the external cause of injury or poisoning, are most similar to the way GBD classifies injuries. However, in the NIS, E-codes are not listed as primary diagnoses. To accurately capture health system encounters resulting from injuries classified using E-codes, the E-code listed first was considered the primary diagnosis. The one exception to this rule was E-codes corresponding to adverse medical treatment. These E-codes were not allowed to be primary diagnoses because they are injuries generally occurring due to treatment complications, and are thus not typically the underlying reason for the health system encounter. If multiple E-codes were listed for an observation, the first E-code that was not adverse medical treatment was selected as the primary diagnosis.
ICD9 codes corresponding to non-disease well person care were not allowed to be primary diagnoses for inpatient services. Similarly, encounters that violated GBD age or sex restrictions (shown in Table 2), such as females with prostate cancer or adults with neonatal illnesses, were excluded. Finally, encounters with a primary diagnosis that mapped to an intermediate cause of illness rather than an underlying cause were also removed. When a primary diagnosis mapped to a viable cause but some secondary diagnoses did not, only those secondary diagnoses were removed.
After mapping from ICD9 codes to GBD causes, the data still contained N-codes for injuries. A probabilistic replacement was used to replace these remaining N-codes with E-codes, which were then mapped to GBD causes. Probability maps for this assignment were created by pooling data across all years to make age-specific E-code probabilities, conditional on having each N-code. The conditional probabilities used in this assignment were calculated using full four- or five-digit codes from the NIS.

EXAMPLE. N-code proportions and replacement
Among 55-year-olds, the GBD injury causes and the corresponding probability weights associated with the N-code N11 (Dislocation of hip) are:
• Animal contact (0.008)
• Exposure to mechanical forces (0.017)
• Other transport injuries (0.011)
• Other unintentional injuries (0.16)
• Road injuries (0.370)
• Falls (0.440)
Thus, whenever N11 appeared in the diagnosis list for 55-year-olds, it was remapped as falls in 44% of observations, as road injuries in 37% of observations, etc.
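The probabilistic replacement in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `replace_n_code` and the fixed seed are hypothetical, and the weights are copied from the example. Because the printed weights sum to slightly more than 1 after rounding, the sketch treats them as relative weights.

```python
import random

# Probability weights for N-code N11 among 55-year-olds, copied from the example.
# They are treated as relative weights because rounding makes them sum to about 1.006.
N11_WEIGHTS_AGE_55 = {
    "Animal contact": 0.008,
    "Exposure to mechanical forces": 0.017,
    "Other transport injuries": 0.011,
    "Other unintentional injuries": 0.16,
    "Road injuries": 0.370,
    "Falls": 0.440,
}


def replace_n_code(weights, rng):
    """Draw one GBD injury cause in proportion to its weight (hypothetical helper)."""
    causes = list(weights)
    return rng.choices(causes, weights=[weights[c] for c in causes], k=1)[0]


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
draws = [replace_n_code(N11_WEIGHTS_AGE_55, rng) for _ in range(10_000)]
share_falls = draws.count("Falls") / len(draws)  # should be close to 0.44
```

Over many encounters, the share of N11 diagnoses reassigned to each cause approaches its weight, which is the behavior the example describes.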
An additional challenge of mapping ICD9 codes to GBD causes occurred when abbreviated ICD9 codes (not full, 5-digit ICD9 codes) led to a diagnosis mapping to a level I or II GBD cause, rather than level III. We referred to these cases as "not elsewhere classified" (NEC). For the few NEC that existed in the NIS, a probabilistic replacement was used to replace NEC causes with viable GBD level III causes. The data were combined across all years to make age-specific probability maps. These maps were stratified by age because disease burden and the distribution of causes are a function of age. These maps were used to probabilistically reassign NEC causes to non-NEC causes.

EXAMPLE. NEC proportions and replacement
Among 55-year-olds, the causes in the cardiovascular disease (CVD) family and their corresponding probabilities of occurrence are:
If a single observation had multiple diagnoses with the same GBD cause (for example, two or more secondary diagnoses of septicemia), the duplicate comorbidities were removed.
Encounters were divided into four age categories: (1) 0-14 years, (2) 15-44 years, (3) 45-64 years, and (4) 65 years and above. These age groups are intended to roughly identify major life stages and are valuable because age groups broader than 5-year groupings have distinct patterns of comorbidity and health care utilization. Although the groups do not correspond precisely to life stages, they have been used previously for reporting spending stratified by distinct life stages (4). We split our data by primary diagnosis and age group, pooling across sex and time. Despite pooling across these dimensions, there were several rare causes of illness, such as malaria and leprosy, with few observations. Rather than trying to estimate the effect of comorbidities on spending for these causes of illness, causes with fewer than 1,000 reported encounters across all years and both sexes within an age category were excluded from analysis.
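The age grouping and the 1,000-encounter minimum can be illustrated with a short Python sketch. The helper names (`age_category`, `eligible_strata`) and the toy encounter list are hypothetical; only the age cut-points and the threshold come from the text.

```python
from collections import Counter

MIN_ENCOUNTERS = 1000  # threshold stated in the text


def age_category(age):
    """Map age in years to one of the four age categories."""
    if age < 15:
        return "0-14"
    if age < 45:
        return "15-44"
    if age < 65:
        return "45-64"
    return "65+"


def eligible_strata(encounters, min_encounters=MIN_ENCOUNTERS):
    """Return the (primary cause, age category) pairs that have enough encounters.

    `encounters` is an iterable of (primary_cause, age) tuples, assumed to be
    already pooled across years and sexes.
    """
    counts = Counter((cause, age_category(age)) for cause, age in encounters)
    return {stratum for stratum, n in counts.items() if n >= min_encounters}


# Toy data (made up): one common cause and one rare cause.
toy = [("IHD", 60)] * 1200 + [("Malaria", 30)] * 12
kept = eligible_strata(toy)  # only the IHD stratum meets the threshold
```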
To generate uncertainty intervals, 1,000 bootstrap samples were drawn. All subsequent steps in the comorbidity analysis were carried out 1,000 times, once for each sample. All reported results are the mean estimates across the bootstrap samples, and uncertainty is reported as the 2.5th and 97.5th percentiles across the bootstrap draws.
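A minimal sketch of this bootstrap summary follows, assuming a simple resampling of encounters with replacement and a generic estimator. In the study the whole comorbidity analysis is rerun on each sample; here a plain mean stands in for that pipeline, and the function name and toy spending values are hypothetical.

```python
import random
import statistics


def bootstrap_interval(values, estimator, n_draws=1000, seed=0):
    """Return (mean estimate, 2.5th percentile, 97.5th percentile) across draws."""
    rng = random.Random(seed)
    estimates = [
        estimator(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_draws)
    ]
    cuts = statistics.quantiles(estimates, n=40)  # 39 cut points, one every 2.5%
    return statistics.fmean(estimates), cuts[0], cuts[-1]


# Toy encounter-level spending values (made up for illustration).
spending = [1000.0 + 10.0 * i for i in range(200)]
point, lower, upper = bootstrap_interval(spending, statistics.fmean)
```

The reported point estimate is the mean across draws, and `lower` and `upper` bound the 95% uncertainty interval.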

Comorbidity selection
To be comprehensive, nearly all conditions present in the data were included in analysis. In a few cases, the comorbidities that were allowed for a primary diagnosis were restricted due to research aims and data availability.
Infrequently occurring comorbidities were considered less likely to be systematically related to price, and including them greatly increased the computation needed for this analysis. To address this, comorbidities were excluded for a given primary diagnosis if their probability of occurring was less than 0.1. This threshold ensured that only the most relevant and common comorbidities were included in the analysis for each primary diagnosis. Figure 1 shows the distribution of comorbidities included for each primary diagnosis. 75% of primary diagnoses have at least one comorbidity. 68% of all primary diagnoses have at least four comorbidities. Figure 3 shows the mean number of comorbidities per person that were omitted from the analysis as a result of the frequency threshold that was applied. Using a threshold of 0.1, the average number of comorbidities excluded per patient was 0.6. The average number of comorbidities excluded per patient did not vary dramatically at different frequency thresholds.
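The frequency threshold can be sketched as follows. This is an illustrative Python example under assumptions: the probability is taken to be the share of encounters with a given primary diagnosis that also list the comorbidity, and the function name and toy encounters are hypothetical.

```python
from collections import Counter

THRESHOLD = 0.1  # minimum probability of co-occurrence stated in the text


def select_comorbidities(encounters, threshold=THRESHOLD):
    """Return {primary: {comorbidity: probability}} for pairs at or above threshold.

    `encounters` is a list of (primary_cause, secondary_causes) tuples.
    """
    primary_counts = Counter()
    pair_counts = Counter()
    for primary, secondary in encounters:
        primary_counts[primary] += 1
        for cause in set(secondary):  # a duplicated comorbidity counts once
            pair_counts[(primary, cause)] += 1

    selected = {}
    for (primary, cause), n in pair_counts.items():
        probability = n / primary_counts[primary]
        if probability >= threshold:
            selected.setdefault(primary, {})[cause] = probability
    return selected


# Toy data (made up): 20 IHD encounters, 7 with diabetes and 1 with asthma.
toy = ([("IHD", ["Diabetes"])] * 7
       + [("IHD", ["Asthma"])] * 1
       + [("IHD", [])] * 12)
selected = select_comorbidities(toy)  # asthma (0.05) falls below the threshold
```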

EXAMPLE. Comorbidity pairs selection
Among 45-64 year olds, ischemic heart disease (IHD) occurs as both a primary and secondary diagnosis.
As a primary diagnosis, IHD had 145 associated secondary diagnoses. Of these, nine had probabilities of co-occurrence greater than or equal to 0.1. Therefore, only the following secondary diagnoses were considered as comorbidities for IHD:
We further refine the primary cause-comorbidity pairing to ensure that resources are reallocated from a primary cause only to causes that are true comorbidities, rather than manifestations of the same conditions. To do this, we apply three exclusion criteria:
1. Exclude intermediate causes. For example, remove skin and subcutaneous diseases as comorbidities when the primary diagnosis is for diabetes, and remove heart failure as a comorbidity when CVD is the primary diagnosis.
2. Exclude residual "other" categories, such as other indirect maternal causes and other infectious diseases.
3. Exclude risk factors, impairments, and well care causes, such as hyperlipidemia, renal failure, and well pregnancies.
These restrictions were set in consultation with medical professionals who have experience using ICD9 coding in clinical settings. The full list of restrictions is outlined in Table 3. Funds were not permitted to flow from the primary diagnoses in the left column to the comorbidities in the right column.

Modeling risk of excess spending
A log-linear regression model was used to estimate the risk of excess spending due to comorbidities. Log-linear regression is one of the most commonly used methods for modeling health care spending data (3). A model was fit separately for each primary condition and age category. Spending for a health system encounter was the dependent variable, and binary indicators identifying whether an encounter was coded with the relevant comorbidities were the independent variables. The simplest form of the model is illustrated by Equation (1):
$$\ln(\text{spending}_{e}) = \beta_{0i} + \sum_{j=1}^{J} \beta_{ij}\,\text{comorbidity}_{je} + \sum_{t} \gamma_{it}\,\text{year}_{te} + \delta_{i}\,\text{sex}_{e} + \varepsilon_{e} \tag{1}$$
Here e indexes encounters with primary diagnosis i, and $\text{comorbidity}_{je}$ equals 1 if encounter e was coded with comorbidity j. Using this equation, excess spending was estimated independently for each primary diagnosis i, using age category-specific encounter-level data. The set of J comorbidities was included. Binary indicators for year and sex were included to control for heterogeneous spending across time and sex. The relative risk of excess spending for i induced by comorbidity j is the coefficient on the primary diagnosis-comorbidity pair ($\beta_{ij}$). Only statistically significant pairs (p < 0.05) were included in the final comorbidity adjustment.
The presence of a comorbidity generally increases health spending for a given primary diagnosis. In these cases, the coefficient on the primary diagnosis-comorbidity pair is positive ($\beta_{ij} > 0$), and the comorbid condition raised the cost of managing the primary condition, on average. Conversely, when the coefficient is negative ($\beta_{ij} < 0$), costs of managing the primary condition decreased because of a comorbid condition. While empirically rare, this occurs if a comorbid condition makes standard treatment for the primary condition ineffective, unsafe, or poorly tolerated, necessitating less complex, and therefore less expensive, treatment.

EXAMPLE. Understanding regression results
Among 45-64 year olds, IHD appeared as a comorbidity when diabetes mellitus was the primary diagnosis. In addition, diabetes was a comorbidity for IHD as a primary diagnosis. After regression, the IHD-diabetes pair had a coefficient of 0.050: the presence of diabetes as a comorbidity made IHD more expensive to treat. The diabetes-IHD pair had a coefficient of 0.006: the presence of IHD as a comorbidity made diabetes more expensive to treat, but the effect of IHD on diabetes was smaller than the effect of diabetes on IHD.
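To make the regression step concrete, the sketch below fits a log-linear model by ordinary least squares on a tiny, noise-free toy dataset using only the Python standard library. It is an illustration under assumptions rather than the authors' code: the `ols` helper, the choice of regressors, and all coefficient values other than the 0.050 taken from the example are hypothetical, and year indicators and the significance filter (p < 0.05) are omitted for brevity.

```python
import itertools
import math


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    `rows` is a list of regressor lists (first entry 1.0 for the intercept);
    `y` is the list of outcomes.
    """
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    return [a[p][k] / a[p][p] for p in range(k)]


# Toy encounters for one primary diagnosis and age category (made up).
# Columns: intercept, diabetes indicator, hypertension indicator, female indicator.
TRUE_COEFS = [9.0, 0.050, 0.020, -0.010]  # 0.050 echoes the example; the rest are invented
rows = [[1.0, float(d), float(h), float(f)]
        for d, h, f in itertools.product([0, 1], repeat=3)]
spending = [math.exp(sum(b * x for b, x in zip(TRUE_COEFS, r))) for r in rows]

# Log-linear model: regress the natural log of spending on the indicators.
coefs = ols(rows, [math.log(s) for s in spending])
```

Because the toy data contain no noise, `coefs` recovers the coefficients used to generate them; with real encounter data the estimates would carry sampling error, which is why the study restricts the adjustment to statistically significant pairs.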

Calculating attributable fractions
The relative risk of excess spending due to comorbidities was then used to calculate the attributable fraction for each primary diagnosis-comorbidity pair. An attributable fraction is the proportion of disease spending on a primary diagnosis that is attributable to a specific comorbidity. The share of total spending for primary condition i attributable to comorbidity j is the product of the pair-specific relative risk of excess spending and the conditional probability of i and j co-occurring. This is illustrated by Equation (2):
$$AF_{ij} = \beta_{ij} \times P(j \mid i) \tag{2}$$
where $\beta_{ij}$ is the regression coefficient for the pair and $P(j \mid i)$ is the probability that comorbidity j is coded when i is the primary diagnosis.

EXAMPLE. Calculating attributable fractions
As seen in previous examples, the IHD-diabetes pair for 45-64 year olds has a probability of occurrence of 0.318 and a regression coefficient of 0.050. The attributable fraction for this pair is as follows:
$$AF_{\text{IHD, diabetes}} = 0.050 \times 0.318 \approx 0.016$$
This means that 1.6% of spending on the treatment and prevention of IHD is attributable to diabetes.
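The same arithmetic can be checked with a short Python sketch. The function name is hypothetical; the inputs are the two values quoted in the example.

```python
def attributable_fraction(coefficient, probability):
    """Share of spending on a primary diagnosis attributed to one comorbidity."""
    return coefficient * probability


# IHD as primary diagnosis, diabetes as comorbidity, ages 45-64 (values from the example).
af_ihd_diabetes = attributable_fraction(coefficient=0.050, probability=0.318)
print(f"{af_ihd_diabetes:.4f}")  # prints 0.0159, i.e. about 1.6%
```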

Generating flows and adjustment scalars
The attributable fractions for all primary diagnosis-comorbidity pairs were then used to reallocate spending from primary diagnoses to comorbidities. We applied our estimates of attributable fractions to more granular, sex-specific categories disaggregated using 5-year age groups instead of the four highly aggregated age groups.
The outflows, resources transferred away from primary diagnosis i to comorbidity j, were calculated as the product of the attributable fraction $AF_{ij}$ and the total spending on diagnosis i. The total outflow of resources from primary condition i due to all comorbidities is the sum of the outflows from i to all comorbidities under consideration, illustrated in Equation (4):
$$\text{Outflow}_{i} = \sum_{j} AF_{ij} \times \text{Spending}_{i} \tag{4}$$
A primary diagnosis in one health system encounter is often a comorbidity for another primary diagnosis in a different health system encounter. Thus, it was important to calculate not only the share of primary diagnosis i attributable to comorbidity j, but also the share of primary diagnosis j attributable to comorbidity i. These funds are inflows, or the resources transferred to i when it is listed as a comorbidity for each of the j other causes. The total inflow of resources from all comorbidities to primary diagnosis i was calculated as the sum of the products of the total spending for j and the corresponding attributable fractions. Equation (5) illustrates the calculation of inflows:
$$\text{Inflow}_{i} = \sum_{j} AF_{ji} \times \text{Spending}_{j} \tag{5}$$
Because the comorbidity adjustment was a redistribution of resources, the total outflows across all causes in an age category should equal the total inflows in that age category. That is, money flowing out of the primary diagnoses should be the same as the money flowing to the comorbidities. This identity was used to verify the calculation of outflows and inflows by age category.
The netflow of resources for a primary condition captures the transfer of resources to and from that cause. That is, the netflow for cause i is the difference between the total inflows and total outflows for i, as illustrated in Equation (6):
$$\text{Netflow}_{i} = \text{Inflow}_{i} - \text{Outflow}_{i} \tag{6}$$
The netflow can be positive or negative. A positive netflow indicates that inflow for a cause is greater than outflow. Causes with positive netflows generally appeared often as comorbidities, and spending typically increased as a result of comorbidity adjustment. A negative netflow indicates outflow for a cause is greater than inflow. Causes with negative netflows generally appeared often as primary diagnoses and rarely as comorbidities. Spending on these causes typically decreased after comorbidity adjustment.
The final, comorbidity-adjusted spending for cause i was the sum of the pre-comorbidity-adjusted spending for i and its corresponding netflow, as shown in Equation (7):
$$\text{AdjustedSpending}_{i} = \text{Spending}_{i} + \text{Netflow}_{i} \tag{7}$$
Relative increases and decreases in spending were described using comorbidity adjustment scalars. The scalar for cause i is defined as the netflow for i as a percent of the total spending on i. This is shown in Equation (8):
$$\text{Scalar}_{i} = \frac{\text{Netflow}_{i}}{\text{Spending}_{i}} \tag{8}$$
The value of the scalar represented the percent change in spending for that cause. For a given cause, a positive scalar represented an increase in spending, while a negative scalar represented a decrease in spending. The scalars provided a common metric for comparing comorbidity adjustments between causes and across age categories.
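The flow calculations described in this section can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name, the three causes, their spending totals, and the attributable fractions are all made up, and the scalar is computed as netflow divided by pre-adjustment spending, following the definition given in the text.

```python
def comorbidity_adjust(spending, af):
    """Reallocate spending between causes using attributable fractions.

    `spending[i]` is pre-adjustment spending on cause i as a primary diagnosis.
    `af[(i, j)]` is the fraction of spending on primary i attributed to comorbidity j.
    """
    outflow = {cause: 0.0 for cause in spending}
    inflow = {cause: 0.0 for cause in spending}
    for (i, j), fraction in af.items():
        flow = fraction * spending[i]  # resources moved from primary i to comorbidity j
        outflow[i] += flow
        inflow[j] += flow

    results = {}
    for cause, total in spending.items():
        netflow = inflow[cause] - outflow[cause]
        results[cause] = {
            "outflow": outflow[cause],
            "inflow": inflow[cause],
            "netflow": netflow,
            "adjusted": total + netflow,
            "scalar": netflow / total,  # netflow as a share of pre-adjustment spending
        }
    return results


# Toy inputs (made up): spending in arbitrary units and a few attributable fractions.
spending = {"IHD": 100.0, "Diabetes": 40.0, "Septicemia": 60.0}
af = {
    ("IHD", "Diabetes"): 0.016,
    ("Diabetes", "IHD"): 0.002,
    ("Septicemia", "IHD"): 0.05,
    ("Septicemia", "Diabetes"): 0.03,
}
results = comorbidity_adjust(spending, af)
total_in = sum(r["inflow"] for r in results.values())
total_out = sum(r["outflow"] for r in results.values())
```

Because every outflow from one cause is recorded as an inflow to another, `total_in` equals `total_out` and total spending is unchanged by the adjustment, which mirrors the verification step described for each age category.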

EXAMPLE. Calculating outflows, inflows, netflows and adjusted spending
The attributable fractions for 45-64 year olds with a primary diagnosis of IHD are:
Thus, the total outflow from IHD to other causes was the sum of these three outflows, or approximately $18.1 billion.
The inflow for IHD was the sum of the outflows to IHD from the 72 diseases for which IHD was a comorbidity. The inflow to IHD was $31.6 billion.
Thus, the netflow for IHD was $31.6 billion - $18.1 billion, or $13.5 billion. The final spending for IHD among 45-64 year olds was $685.5 billion ($672 billion pre-comorbidity spending + $13.5 billion netflow), after adjusting for all comorbidities. After comorbidity adjustment, spending on IHD in this age group increased slightly, by about 2% ($13.5 billion / $672 billion ≈ 0.02).
Because IHD occurred frequently as a comorbidity, it had a net increase in spending due to comorbidity adjustment.