
Mendelian randomization: estimation of inpatient hospital costs attributable to obesity



Mendelian Randomization is a type of instrumental variable (IV) analysis that uses inherited genetic variants as instruments to estimate the causal effect of a modifiable exposure on an outcome. This study aims to estimate the impact of obesity on annual inpatient healthcare costs in the UK using linked data from the UK Biobank and Hospital Episode Statistics (HES).


UK Biobank data for 482,127 subjects was linked with HES inpatient admission records, and costs were assigned to episodes of care. A two-stage least squares (TSLS) IV model and a TSLS two-part cost model were compared to a naïve regression of inpatient healthcare costs on body mass index (BMI).


The naïve regression of annual cost on continuous BMI estimated that each unit increase in BMI was associated with £21.61 [95% CI £20.33 – £22.89] greater annual cost. The TSLS IV model estimated £14.36 [95% CI £0.31 – £28.42] greater annual cost per unit increase in BMI. With obesity modelled as a binary variable, the naïve analysis estimated that obese subjects incurred £205.53 [95% CI £191.45 – £219.60] greater annual costs than non-obese subjects, and the TSLS model estimated £201.58 [95% CI £4.32 – £398.84] greater annual costs for obese subjects.


The IV models provide evidence for a causal relationship between obesity and higher inpatient healthcare costs. Compared to the naïve models, the binary IV model found a slightly smaller marginal effect of obesity, and the continuous IV model found a smaller marginal effect of a single-unit increase in BMI.


The global prevalence of obesity has increased significantly since 1980 and represents a significant economic burden worldwide [1,2,3,4,5]. Analysis of Global Burden of Disease data (2015) estimated that 603.7 million adults were obese, representing 12% of adults globally [6]. According to the World Health Organization criteria used internationally and by the UK National Health Service (NHS), an adult with a body mass index (BMI) of 18.5 to 25.0 kg/m2 is considered to be of a normal weight. Individuals with a BMI of 25 to 30 kg/m2 are considered to be overweight, and individuals with a BMI at or above 30 kg/m2 are considered to be obese [7]. Obesity has a significant impact on health through secondary consequences such as type 2 diabetes, coronary heart disease, cancer, stroke, and depression [8,9,10].

Observational studies have established a positive correlation between elevated BMI and healthcare costs, but they cannot definitively establish causation because of the high likelihood of unobserved confounding factors and the possibility of reverse causation [1, 4, 11]. Randomized controlled trials may establish causation, but there are challenges to conducting randomized controlled trials in obesity [12,13,14]. Mendelian Randomization (MR) is a type of instrumental variable analysis that uses genetic variants as instruments to mitigate the effects of unobserved confounding factors in statistical models. In this study, MR is used to evaluate the effect of elevated BMI (the risk factor) on inpatient healthcare costs (the outcome).

To our knowledge, at the time this study was designed, no published studies had investigated the link between obesity and healthcare costs using MR methods, though general instrumental variable analysis has been used extensively to investigate the relationship between obesity and other outcomes such as cancer and cardiovascular disease [15,16,17,18]. A few studies have used general instrumental variable analysis to investigate the relationship between obesity and healthcare costs, but these studies did not use genetic variants as instruments. Cawley and Meyerhoefer [19] conducted an instrumental variable analysis of US Medical Expenditure Panel Survey data and estimated that obesity increases annual medical costs by $2741, using the weight of a biological relative as an instrument for the weight of the primary subject. An Australian instrumental variable study by Black et al. on the impact of childhood obesity on healthcare costs, which used a biological parent’s BMI as an instrument for the child’s BMI, found that obese individuals incurred healthcare costs $102.90 AUD greater than their normal-weight counterparts [20].

Our study aims to use MR to estimate the impact of obesity on annual inpatient healthcare costs in the UK using linked data from the UK Biobank and Hospital Episode Statistics (HES). The use of MR methods to investigate the relationship between obesity and healthcare costs is a novel approach, and this analysis was intended to fill a gap in the literature. A study using similar methods, conducted independently of ours, was recently published and reported marginal effects per additional unit of BMI ranging from £18.85 [95% CI: 9.05–28.65] to £21.22 [95% CI: 14.35–28.07] [21]. There are some key differences in approach: our study used a larger sample and a longer follow-up period, and the two studies used different modelling strategies. Nevertheless, the results of both studies suggest a significant causal relationship between obesity and inpatient healthcare costs. This study offers an applied example of the use of Mendelian Randomization in health economics research and is a useful independent replication of the recent results reported by Dixon et al. [21].


Instrumental variable analysis

An instrumental variable, in this case a genetic variant, must fulfill three assumptions: 1) the instrument must be associated with the risk factor (relevance assumption); 2) the instrument must be associated with the outcome only through its association with the risk factor (exclusion-restriction assumption); and 3) the instrument must be independent of factors that may affect the outcome (independence assumption) [14, 22].

Alleles are randomly allocated, conditional on parental genotypes, according to Mendel’s laws of inheritance, and are therefore generally independent of external factors [14, 22]. The presence of obesity does not alter the genotype, so reverse causation is also not a concern [14, 23]. These characteristics make genetic variants viable instrumental variables. Figure 1 presents a directed acyclic graph that depicts the relationship between the instrument, exposure, and outcome.
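To make the logic of the DAG concrete, the following is a minimal Python simulation under assumed, purely illustrative parameters (the effect sizes, scales, and the confounder `u` are hypothetical and are not estimates from UK Biobank). With an unobserved confounder affecting both BMI and cost, the naïve OLS slope is biased, while the single-instrument Wald ratio cov(Z, Y) / cov(Z, X) recovers the assumed causal effect.

```python
import numpy as np

# Illustrative data-generating process following the DAG: Z -> X -> Y,
# plus an unobserved confounder U that affects both X and Y.
# All parameter values are hypothetical.
rng = np.random.default_rng(0)
n = 100_000
TRUE_EFFECT = 15.0  # assumed annual cost (GBP) per unit of BMI

z = rng.normal(size=n)  # instrument, e.g. a standardized genetic risk score
u = rng.normal(size=n)  # unobserved confounder, e.g. lifestyle
x = 27.0 + 0.5 * z + 2.0 * u + rng.normal(size=n)  # exposure: BMI
y = 100.0 + TRUE_EFFECT * x + 40.0 * u + rng.normal(scale=50.0, size=n)  # outcome: cost

# Naive OLS slope of Y on X: biased because U is omitted.
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Wald ratio for a single instrument: cov(Z, Y) / cov(Z, X).
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"naive OLS: {ols_slope:.1f}  IV: {iv_slope:.1f}  assumed truth: {TRUE_EFFECT}")
```

With these assumed parameters the OLS slope lands near 30, roughly double the assumed effect, because U raises both BMI and cost; the IV estimate lands close to 15 because Z is unrelated to U.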

Fig. 1 Directed acyclic graph (DAG) of the relationship between instrumental variable Z, exposure X and outcome Y

Data sources

The UK Biobank is a medical research database supported by the NHS. Between 2006 and 2010, the UK Biobank collected detailed demographics, health data and biological samples from over 500,000 participants between the ages of 40 and 69. The UK Biobank reports patient-level genotype data [24]. The UK Biobank is also linkable to inpatient admissions data from the HES database, which contains information on the diagnoses and procedures associated with each hospital admission [25]. These admissions are structured as hospital spells and episodes. A hospital spell represents the time from hospital admission to discharge. Within each hospital spell, subjects may experience several episodes of care, which are continuous periods of care under a single consultant. The linkage between the UK Biobank and HES records connects the genotype data necessary for MR analysis with the outcome of interest, healthcare resource use.


This study included all UK Biobank subjects with data for age, gender, baseline BMI, and the selected genetic variants. The UK Biobank data were linked with HES records using a unique patient identifier. A small number of patients who received bariatric surgery were excluded from the analysis because the surgery produces rapid weight loss while incurring elevated inpatient healthcare costs, which may obscure the relationship between BMI and inpatient healthcare cost. Female subjects with a confirmed pregnancy within 9 months of the baseline BMI measurement were also excluded due to possible confounding from pregnancy-related weight gain. The final sample included 482,127 subjects. A flowchart depicting dataset construction and subject exclusions is presented in Fig. 2.
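A minimal sketch of the inclusion and exclusion rules described above, assuming a hypothetical in-memory record layout (the `Participant` fields and `is_eligible` are illustrative names, not UK Biobank field IDs). The 9-month pregnancy window is approximated as 274 days on either side of the baseline measurement; this is an assumption, since the text does not say whether the window runs before, after, or around baseline.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Approximation of "within 9 months"; an assumption, not the study's exact rule.
PREGNANCY_WINDOW_DAYS = 274


@dataclass
class Participant:
    # Hypothetical record layout for illustration only.
    eid: int
    age: Optional[float]
    sex: Optional[str]  # "F" or "M"
    bmi: Optional[float]
    has_grs_snps: bool  # all selected genetic variants available
    baseline_date: date
    bariatric_surgery: bool = False
    pregnancy_dates: List[date] = field(default_factory=list)


def is_eligible(p: Participant) -> bool:
    # Require complete age, sex, baseline BMI and genotype data.
    if p.age is None or p.sex is None or p.bmi is None or not p.has_grs_snps:
        return False
    # Exclude bariatric surgery (rapid weight loss plus high inpatient cost).
    if p.bariatric_surgery:
        return False
    # Exclude women with a confirmed pregnancy near the baseline BMI measurement.
    if p.sex == "F" and any(
        abs((d - p.baseline_date).days) <= PREGNANCY_WINDOW_DAYS
        for d in p.pregnancy_dates
    ):
        return False
    return True


cohort = [
    Participant(1, 55.0, "F", 31.2, True, date(2008, 5, 1)),
    Participant(2, 41.0, "F", 26.0, True, date(2009, 1, 10),
                pregnancy_dates=[date(2009, 6, 1)]),
    Participant(3, 62.0, "M", 35.4, True, date(2007, 3, 3), bariatric_surgery=True),
    Participant(4, 60.0, "M", None, True, date(2010, 2, 2)),
]
eligible = [p.eid for p in cohort if is_eligible(p)]
print(eligible)  # [1]
```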

Fig. 2 Flowchart of dataset construction and subject exclusion


Genome-wide association studies (GWASs) are an invaluable resource for investigating diseases mediated by multiple genes. The objective of a GWAS is to identify specific genetic variants, typically single-nucleotide polymorphisms (SNPs), that are associated with a particular trait or disease. Dozens of obesity-associated genetic variants have been identified in GWASs; individually, each has low but statistically significant explanatory power [26,27,28]. Alone, these variants would be weak instruments, but when combined into a genetic risk score (GRS), the variants can explain a greater amount of variance in BMI and reduce weak-instrument bias [29, 30].

A 2013 study by Belsky et al. synthesized data from 16 GWAS studies with the objective of developing a GRS to efficiently and effectively predict subjects’ predisposition toward obesity. SNPs were systematically chosen for inclusion in Belsky’s GRS based on both strength of association with obesity and the number of times the specific SNP was identified in different GWAS studies (i.e., replicability). The instrument was found to be a statistically significant predictor of BMI and obesity [31]. Belsky’s GRS formed the basis of the instrument used in this study, though not all of the specific SNPs were available in the UK Biobank data. In cases where the exact SNPs were unavailable in the genotyping platform implemented by the UK Biobank, SNPs on the same gene were chosen from the largest, most recent GWAS study included in Belsky’s analysis (i.e., Speliotes et al.) [27, 31]. Following the method employed by Belsky et al., each SNP was weighted according to its effect size and summed to generate the GRS.
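The weighting step can be sketched as follows: each person’s GRS is the sum, over SNPs, of the number of risk alleles carried (0, 1 or 2) multiplied by that SNP’s effect size. This is a minimal sketch; the SNP identifiers and weights below are hypothetical placeholders rather than the Belsky et al. values, and the proxy-SNP substitution described above is not shown.

```python
from typing import Dict


def weighted_grs(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Effect-size-weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP)."""
    missing = [snp for snp in weights if snp not in dosages]
    if missing:
        raise KeyError(f"missing genotype for: {missing}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical SNP identifiers and per-allele effect sizes (BMI units per allele).
weights = {"snp_A": 0.39, "snp_B": 0.23, "snp_C": 0.10}
person = {"snp_A": 2, "snp_B": 1, "snp_C": 0}
print(round(weighted_grs(person, weights), 2))  # 1.01
```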

MR analysis is only useful if the instrument is both valid and strong, that is, if it does not violate any of the three instrumental variable assumptions. The relevance assumption requires that the instrument (i.e. the GRS) be strongly associated with the exposure (i.e. BMI). The F-statistic is a common metric of instrument strength in MR analysis; an F-statistic above 10 generally suggests a strong instrument [32]. The independence assumption requires that the instrument not be associated with any factors that may influence the outcome (i.e. inpatient healthcare cost), such as age, sex, ethnicity, or socioeconomic status. Genotype is generally unrelated to these factors, though systematic differences in allele frequency can occur in certain subpopulations, which is referred to as population stratification. Independence can be tested by linear regression of the instrument on each covariate [22, 29, 33].
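As a sketch of how the relevance check can be computed, the partial F-statistic for the instrument compares the residual sum of squares of a first-stage regression of BMI on the covariates alone with one that also includes the GRS. The simulated data and coefficients below are hypothetical; in practice a dedicated IV routine reports this statistic directly.

```python
import numpy as np


def first_stage_f(exposure, instrument, covariates):
    """Partial F-statistic for the excluded instrument(s) in a first-stage OLS.

    Compares the residual sum of squares of the exposure regressed on covariates
    only (restricted) with covariates plus instrument(s) (unrestricted).
    """
    n = len(exposure)
    x_restricted = np.column_stack([np.ones(n), covariates])
    x_full = np.column_stack([x_restricted, instrument])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, exposure, rcond=None)
        resid = exposure - design @ beta
        return float(resid @ resid)

    q = x_full.shape[1] - x_restricted.shape[1]  # number of instruments
    df_resid = n - x_full.shape[1]
    return ((rss(x_restricted) - rss(x_full)) / q) / (rss(x_full) / df_resid)


# Simulated illustration (all values hypothetical).
rng = np.random.default_rng(42)
n = 5_000
age = rng.uniform(40, 69, size=n)
sex = rng.integers(0, 2, size=n)
grs = rng.normal(3.2, 0.5, size=n)
bmi = 20.0 + 1.6 * grs + 0.03 * age + 0.5 * sex + rng.normal(scale=4.0, size=n)

f_stat = first_stage_f(bmi, grs, np.column_stack([age, sex]))
print(f"first-stage F = {f_stat:.0f} (rule of thumb: > 10)")
```

Because the F-statistic scales roughly with sample size times the share of BMI variance explained by the GRS, even a weakly predictive score can clear the threshold of 10 in a cohort of several hundred thousand.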

The exclusion-restriction assumption requires that the GRS be associated with healthcare cost only through its association with BMI. Linkage disequilibrium and pleiotropy can both violate this assumption. Linkage disequilibrium is a non-random association between alleles at two genetic loci, which are then inherited together more often than expected by chance, often because they lie close together on a chromosome. Linkage disequilibrium violates the exclusion-restriction assumption if a variant in the GRS is in linkage disequilibrium with another variant that is associated with traits that also influence cost [14, 22]. For example, the assumption is violated if a variant in the GRS is in linkage disequilibrium with a variant associated with breast cancer, which is also associated with greater inpatient healthcare cost. Pleiotropy occurs when a genetic variant has multiple functions that may be related to the outcome variable [14, 22]. For example, the assumption is violated if a variant in the GRS is associated with both elevated BMI and increased risk of cardiovascular disease through a pathway other than BMI, because cardiovascular disease also affects inpatient healthcare cost. There is no direct test for these violations, but searching published GWASs provides information on the variants’ associations with other genes and traits that could indicate violation by pleiotropy or linkage disequilibrium. The National Human Genome Research Institute’s (NHGRI) GWAS Catalog is a curated database of published GWASs that can be searched for currently known variant–trait associations [22, 34].


According to the World Health Organization criteria used internationally and by the UK NHS, an adult with BMI of 18.5 to 25.0 kg/m2 is considered to be of a normal weight. Individuals with a BMI of 25 to 30 kg/m2 are considered to be overweight, and individuals with a BMI at or above 30 kg/m2 are considered to be obese [7]. Although BMI is a commonly used measure for obesity, it is an imperfect measure that does not distinguish between lean and fat mass [35, 36].

BMI may be treated as a continuous or a categorical variable. The categorical variable tested in this analysis was binary: subjects with BMI below 30 kg/m2 were categorized as non-obese, and subjects with BMI at or above 30 kg/m2 were categorized as obese. All models in this analysis were executed twice, once with BMI as a continuous variable and once with the binary obesity variable.
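A minimal sketch of the BMI definition (weight in kilograms divided by height in metres squared), the WHO categories, and the binary obesity indicator described above. Treating a BMI of exactly 25.0 as overweight follows the usual WHO convention and is an assumption here, since the text gives overlapping ranges; the 30 kg/m2 boundary follows the “at or above” rule.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2


def who_category(b: float) -> str:
    # WHO adult cut-offs as described in the text (kg/m2).
    if b < 18.5:
        return "underweight"
    if b < 25.0:
        return "normal weight"
    if b < 30.0:
        return "overweight"
    return "obese"


def is_obese(b: float) -> int:
    # Binary exposure used in the binary models: 1 if BMI >= 30 kg/m2, else 0.
    return int(b >= 30.0)


for w, h in [(70.0, 1.75), (95.0, 1.75)]:
    b = bmi(w, h)
    print(f"{b:.1f} {who_category(b)} obese={is_obese(b)}")
# 22.9 normal weight obese=0
# 31.0 obese obese=1
```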

Inpatient healthcare costs

The linked HES data provide episode-level information about diagnoses, procedures, and hospital length of stay. This information was used to classify each episode of care by Healthcare Resource Group (HRG). HRGs are groupings of clinically similar diagnoses, procedures, and treatments that use similar resources and may be linked to a national unit cost [37]. An NHS application referred to as the HRG Reference Costs Grouper was used to process hospital data and assign HRGs to episodes of care [38]. Each HRG code was assigned a cost based on the 2017/2018 NHS National Schedule of Reference Costs [39]. In some cases, records were assigned two HRG codes, one of which was an “unbundled” code, which separately captures high-cost specialized care such as chemotherapy, specialist palliative care, and renal dialysis [40]. The cost of the unbundled code was included in the total estimated cost for the episode. Episodes of care that could not be processed by the Cost Grouper due to missing data elements were assigned the average cost of episodes in the same diagnosis category. Episode costs were then summed to give a total inpatient healthcare cost per patient, and an average annual inpatient cost per patient was derived by dividing this total by the patient’s years of follow-up. Subjects with no hospital visits during the relevant time period were assigned an annual inpatient healthcare cost of zero.
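The costing steps can be sketched as below, assuming a hypothetical episode record in which the grouper’s HRG cost is already attached (or missing). The field names and the helper `annual_inpatient_costs` are illustrative. One further assumption: the category average used for imputation is taken over the full cost (core plus unbundled) of successfully costed episodes, which the text does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    # Hypothetical episode record; field names are illustrative.
    patient_id: int
    diagnosis_category: str
    hrg_cost: Optional[float]  # None if the grouper could not assign an HRG
    unbundled_cost: float = 0.0  # e.g. chemotherapy or renal dialysis


def annual_inpatient_costs(episodes: List[Episode],
                           follow_up_years: Dict[int, float]) -> Dict[int, float]:
    # 1. Mean cost of costed episodes per diagnosis category (for imputation).
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for ep in episodes:
        if ep.hrg_cost is not None:
            totals[ep.diagnosis_category] += ep.hrg_cost + ep.unbundled_cost
            counts[ep.diagnosis_category] += 1

    # 2. Sum episode costs per patient, imputing uncosted episodes.
    patient_total: Dict[int, float] = defaultdict(float)
    for ep in episodes:
        if ep.hrg_cost is not None:
            cost = ep.hrg_cost + ep.unbundled_cost
        elif counts[ep.diagnosis_category] > 0:
            cost = totals[ep.diagnosis_category] / counts[ep.diagnosis_category]
        else:
            raise ValueError(f"no costed episodes in category {ep.diagnosis_category!r}")
        patient_total[ep.patient_id] += cost

    # 3. Annualize; patients with no admissions get an annual cost of zero.
    return {pid: patient_total.get(pid, 0.0) / years
            for pid, years in follow_up_years.items()}


episodes = [
    Episode(1, "A", 2000.0, unbundled_cost=500.0),
    Episode(1, "B", 1000.0),
    Episode(2, "A", 3000.0),
    Episode(2, "A", None),  # imputed with the category-A mean (2750.0)
]
result = annual_inpatient_costs(episodes, {1: 10.0, 2: 8.0, 3: 5.0})
print(result)  # {1: 350.0, 2: 718.75, 3: 0.0}
```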

IV models

Three models were developed in this analysis to estimate the impact of BMI on inpatient healthcare costs.

1. A “naïve” regression that is a standard ordinary least squares regression of inpatient healthcare costs on observed BMI.

2. An instrumental variable model that uses the GRS instrument as a proxy for BMI to evaluate the association between inpatient healthcare costs and BMI.

3. An instrumental variable model with a two-part model of inpatient healthcare cost that uses the GRS instrument as a proxy for BMI but accounts for the high proportion of subjects with zero inpatient healthcare costs.

The naïve, non-instrumental variable model was developed for comparison to the instrumental variable model. The regression of inpatient healthcare cost on BMI was adjusted for age, age squared, sex, ethnicity, socioeconomic status, smoking status, and the interaction of age and sex. Previous research has shown a U-shaped relationship between BMI and mortality, with underweight and obese patients at higher risk of death [41, 42]. A corresponding non-linear relationship between BMI and healthcare cost was also anticipated, so a BMI-squared term was also included in the regression. Age is also expected to have a non-linear relationship with healthcare cost because healthcare costs tend to increase with age [43, 44]. Non-significant covariates were excluded from the final naïve model.
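A minimal sketch of the naïve specification’s core terms (BMI, age, age squared, sex, and age × sex), fitted by ordinary least squares with NumPy. Ethnicity, socioeconomic status, smoking status, and the BMI-squared term are omitted for brevity, and the simulated data and assumed effect of £20 per BMI unit are hypothetical.

```python
import numpy as np


def naive_ols_bmi_coef(cost, bmi, age, female):
    """OLS of annual cost on BMI with age, age^2, sex and age x sex adjustment.

    Returns the coefficient on BMI (cost per unit BMI). Ethnicity, socioeconomic
    status and smoking are omitted here for brevity.
    """
    design = np.column_stack([
        np.ones_like(bmi), bmi, age, age ** 2, female, age * female,
    ])
    beta, *_ = np.linalg.lstsq(design, cost, rcond=None)
    return beta[1]


# Simulated illustration with an assumed effect of 20 GBP per BMI unit.
rng = np.random.default_rng(7)
n = 20_000
age = rng.uniform(40, 69, size=n)
female = rng.integers(0, 2, size=n).astype(float)
bmi = rng.normal(27.4, 4.5, size=n)
cost = (50 + 20 * bmi + 3 * age + 0.1 * age ** 2 + 100 * female
        + 2 * age * female + rng.normal(scale=500, size=n))
bmi_coef = naive_ols_bmi_coef(cost, bmi, age, female)
print(f"{bmi_coef:.1f}")
```

Here BMI is simulated independently of any confounder, so OLS is unbiased by construction; in observational data the same regression would absorb any unmeasured confounding, which is what the IV models are meant to address.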

Several model types were tested in the development of the instrumental variable model. The first was a two-stage least squares (TSLS) model, which is the most common instrumental variable model used in MR studies [14]. The first stage of the TSLS model is a linear regression of BMI on the GRS and other covariates. The second-stage linear regression predicts total inpatient healthcare costs from predicted BMI and the same covariates. An instrumental variable probit model was also tested to determine whether it would better fit the binary obesity variable. Best-fit models were chosen based on the F-statistic of the instrument and the magnitude of R2.
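The two stages can be written out explicitly as below. This is a sketch of the TSLS point estimate only, using NumPy least squares on simulated, hypothetical data with an unobserved confounder. Standard errors from a hand-run second-stage OLS are not valid because they ignore first-stage estimation; a dedicated IV routine (for example Stata’s `ivregress 2sls`) computes them correctly.

```python
import numpy as np


def tsls(y, x, z, w):
    """Two-stage least squares point estimate of the effect of x on y.

    y: outcome (cost), x: endogenous exposure (BMI), z: instrument (GRS),
    w: exogenous covariates (n x k, or a 1-D array). Returns the coefficient on x.
    """
    n = len(y)
    exog = np.column_stack([np.ones(n), w])
    # Stage 1: regress the exposure on the instrument and covariates.
    stage1 = np.column_stack([exog, z])
    gamma, *_ = np.linalg.lstsq(stage1, x, rcond=None)
    x_hat = stage1 @ gamma
    # Stage 2: regress the outcome on predicted exposure and the same covariates.
    stage2 = np.column_stack([exog, x_hat])
    beta, *_ = np.linalg.lstsq(stage2, y, rcond=None)
    return beta[-1]


# Simulated illustration: confounder u biases OLS; assumed true effect 14.
rng = np.random.default_rng(3)
n = 200_000
age = rng.uniform(40, 69, size=n)
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 27 + 0.4 * z + 1.5 * u + 0.02 * age + rng.normal(size=n)
y = 14 * x + 30 * u + 2 * age + rng.normal(scale=200, size=n)
estimate = tsls(y, x, z, age)
print(f"TSLS estimate: {estimate:.1f}")
```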

In addition to these standard instrumental variable models, two-part instrumental variable cost models were tested because a large proportion of patients incurred no inpatient healthcare costs. The first stage of the model was identical to the standard TSLS model. The second stage was constructed as a two-part model. The first part of the cost model was a logistic regression, which predicted whether subjects had non-zero costs. The second part of the cost model was an ordinary least squares or generalized linear model regression of cost on predicted BMI, conditional on the subject having non-zero costs. The generalized linear model used a log link and gamma distribution to account for the skewness of the cost data. The main outcome of all models is the marginal effect of obesity on inpatient healthcare costs; in this analysis, marginal effects refer to the incremental costs associated with obesity. All analyses were executed in STATA® version 14.
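In a two-part model the expected cost is the product of the two parts, E[cost] = P(cost > 0) × E[cost | cost > 0], so the marginal effect of BMI follows from the product rule. The sketch below assumes a logit first part and a log-link second part, as described above, with hypothetical coefficients `a`, `b`, `c`, `d`; in the IV version, BMI would be the first-stage predicted value. The analytic derivative is checked against a central finite difference.

```python
import math


def _parts(bmi, a, b, c, d):
    p = 1.0 / (1.0 + math.exp(-(a + b * bmi)))  # Part 1 (logit): P(cost > 0)
    mu = math.exp(c + d * bmi)                   # Part 2 (log link): E[cost | cost > 0]
    return p, mu


def expected_cost(bmi, a, b, c, d):
    p, mu = _parts(bmi, a, b, c, d)
    return p * mu


def marginal_effect(bmi, a, b, c, d):
    # Product rule: dE/dBMI = (dp/dBMI) * mu + p * (dmu/dBMI)
    p, mu = _parts(bmi, a, b, c, d)
    return b * p * (1.0 - p) * mu + p * d * mu


# Hypothetical coefficients, for illustration only.
coefs = dict(a=-2.0, b=0.03, c=7.0, d=0.01)
me = marginal_effect(27.0, **coefs)
h = 1e-4
fd = (expected_cost(27.0 + h, **coefs) - expected_cost(27.0 - h, **coefs)) / (2 * h)
print(f"analytic {me:.3f} vs finite difference {fd:.3f}")
```

The decomposition shows why a two-part model can give a marginal effect similar to a single linear model while splitting it between a change in the probability of any admission and a change in cost among those admitted.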


The mean BMI of the 482,127 subjects included in the analysis was 27.4 kg/m2. The average age of subjects was 56.5 years, 24% of subjects had BMI > 30 kg/m2, 94% were white, and 54% were female. Additional baseline characteristics are presented in Table 1.

Table 1 Baseline Characteristics

The instrument (GRS) was then checked for violation of each of the three instrumental variable assumptions (relevance, independence, exclusion-restriction). The first-stage F-statistics of our TSLS models were 3999 and 2453 for continuous and binary BMI, respectively. These F-statistics were well above the commonly used IV threshold of 10, which satisfies the relevance assumption [22, 32]. Linear regressions of the instrument on age, sex, and socioeconomic status found no significant relationship between the instrument and these potential confounders. The instrument was significantly associated with ethnicity (p < 0.001), which suggests a possible violation of the independence assumption by population stratification, which occurs when allele frequencies differ across population subgroups [22]. This finding is consistent with previous genetic studies that have shown varying effect sizes for obesity-related genes among different ethnicities [45, 46]. However, there was no significant association of ethnicity with inpatient healthcare cost (p = 0.896). This non-significance and the ethnically homogeneous population (94% white) provide evidence against violation by population stratification. Assortative mating can also violate the independence assumption, though this cannot be directly tested. Alleles are randomly assigned conditional on parental genotypes, but research suggests that individuals tend to select mates who are phenotypically similar to themselves [47,48,49]. This may violate the independence assumption if the mother’s and father’s genetic traits are associated with each other and with the outcome of inpatient healthcare costs [49].

Violation of the exclusion-restriction assumption by linkage disequilibrium or pleiotropy cannot be directly tested. A search of NHGRI’s GWAS Catalog showed that some variants in the GRS may have multiple functions (pleiotropy) or be in linkage disequilibrium with variants associated with confounding factors or comorbidities. In a sensitivity analysis, these variants were removed from the GRS, and the direction of the estimated effect was the same as in the primary analysis. The final naïve model is presented in Table 2.
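The sensitivity analysis amounts to recomputing the weighted GRS over the retained SNPs only and re-running the models. A minimal sketch, in which the SNP identifiers, weights, and flagged set are hypothetical placeholders rather than the variants actually flagged in the GWAS Catalog search:

```python
from typing import Dict, Iterable, List, Tuple


def grs_without_flagged(dosages: Dict[str, int], weights: Dict[str, float],
                        flagged: Iterable[str]) -> Tuple[float, List[str]]:
    """Weighted GRS recomputed after dropping SNPs flagged for possible
    pleiotropy or linkage disequilibrium. Returns (score, SNPs retained)."""
    flagged = set(flagged)
    kept = [snp for snp in weights if snp not in flagged]
    return sum(weights[snp] * dosages[snp] for snp in kept), kept


weights = {"snp_A": 0.39, "snp_B": 0.23, "snp_C": 0.10}  # hypothetical
person = {"snp_A": 2, "snp_B": 1, "snp_C": 2}
score, kept = grs_without_flagged(person, weights, flagged={"snp_B"})
print(round(score, 2), kept)  # 0.98 ['snp_A', 'snp_C']
```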

Table 2 Naïve ordinary least squares regression results

The association between healthcare cost and BMI was significant (p < 0.05). BMI-squared was also tested as an additional explanatory variable in the naïve analysis because of the anticipated non-linear relationship between BMI and cost. However, adding the squared term did not improve model fit (R2), probably because so few (under 1%) of the study population were underweight (BMI < 18.5 kg/m2). The best-fit instrumental variable model was a TSLS model, presented in Table 3 for both continuous and binary BMI.

Table 3 Two-stage least squares regression results

The relationship between predicted BMI and inpatient healthcare costs was significant (p < 0.05) in the TSLS model, but was not significant in the two-part cost models.

The naïve analysis of annual cost on continuous BMI predicts an additional annual healthcare cost of £21.61 [95% CI £20.33 – £22.89] for each additional BMI unit. The TSLS instrumental variable model predicts a smaller cost difference of £14.36 [95% CI £0.31 – £28.42] per additional BMI unit. When the same models were executed with obesity expressed as a binary variable, the naïve analysis estimated £205.53 [95% CI £191.45 – £219.60] greater inpatient healthcare costs for obese subjects compared to non-obese subjects. The TSLS instrumental variable model found that inpatient healthcare costs were £201.58 [95% CI £4.32 – £398.84] greater for obese subjects compared to non-obese subjects. The two-part cost models did not find a significant difference in healthcare costs between obese and non-obese subjects, though their marginal effects were similar to those of the standard TSLS model. In the first stage of the binary TSLS model, the predicted probability of being obese effectively ranges from 16% to 34%. In the logistic part of the two-part cost model, the association between predicted BMI and incurring any inpatient healthcare cost is more significant (p = 0.003), and the odds of incurring hospital costs for obese patients are 1.02 [95% CI: 1.01–1.04] times those of non-obese patients. Marginal effects of each of these models are compared in Table 4.

Table 4 Model Comparison


The results of both the TSLS and naïve models show significantly greater inpatient healthcare costs for obese patients in all versions of the model compared to non-obese patients. The impact of using instrumental variable methods compared to naïve models depends on whether BMI is expressed as a continuous or binary variable. When BMI was expressed as a binary categorical variable (above or below 30 kg/m2), the naïve and instrumental variable models estimated very similar marginal effects (£206 vs. £202), which suggests that the greater healthcare cost is almost entirely causal. When BMI was expressed as a continuous variable, the instrumental variable model estimated a smaller marginal effect than the naïve model (£14 vs £22), suggesting that only a portion of elevated costs are directly caused by greater BMI.

The IV model is more robust to residual confounding and reverse causality than the naïve model, yet in the binary specification both models show very similar marginal effects. This similarity suggests that the naïve model is not strongly influenced by residual confounding or reverse causality; if the naïve model had found a much larger cost difference between obese and non-obese individuals than the IV model, that would suggest that the naïve estimate was inflated by residual confounding. The naïve model has the advantage of a much tighter confidence interval than the IV model, because the naïve model uses all of the observed variation in BMI, whereas the IV model uses only the smaller portion of BMI variation that is explained by the GRS.

In the TSLS model, predicted BMI is significantly (p < 0.05) associated with inpatient healthcare cost, but the p-value of 0.045 may be considered high given the large sample size. A large sample size is important to the strength of the instrument, because it may reduce the likelihood of weak-instrument bias and offset the low explanatory power of the GRS [32, 50]. Note also that because 95% of the weighted GRS values lie between 2.16 and 4.19, predicted BMI effectively ranges only between 26 and 29 kg/m2, just below the obesity threshold of 30 kg/m2.
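As a consistency check, assuming the reported 95% intervals are symmetric normal-theory (Wald) intervals, which their midpoints suggest, the standard error and p-value can be backed out from each interval. This is a sketch of that arithmetic; `se_and_p_from_ci` is a hypothetical helper name.

```python
import math


def se_and_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Back out a standard error, z statistic and two-sided p-value from a
    symmetric normal-theory 95% confidence interval."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return se, z, p


for label, est, lo, hi in [
    ("TSLS, continuous BMI", 14.36, 0.31, 28.42),
    ("TSLS, binary obesity", 201.58, 4.32, 398.84),
]:
    se, z, p = se_and_p_from_ci(est, lo, hi)
    print(f"{label}: SE {se:.2f}, z {z:.2f}, p {p:.3f}")
```

Both TSLS intervals imply z ≈ 2.0 and p ≈ 0.045, in line with the p-value quoted above, and show how wide the IV standard errors are relative to the naïve model’s.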

The MR approach used in this study makes the results less vulnerable to bias than standard regression analysis in scenarios with a high potential for unobserved confounding, and the large sample size reduces weak-instrument bias [32, 50]. Results of both the instrumental variable and naïve models show a significantly greater inpatient cost for patients with BMI ≥ 30 kg/m2 than for those with BMI < 30 kg/m2 in all versions of the model, which is consistent with other studies of the impact of obesity on healthcare costs. A 2016 study of linked general practice and HES data found that annual healthcare costs were £456 [95% CI £344–£568] higher for subjects with BMI ≥ 40 kg/m2 [1]. An observational study by Tigbe et al. using data from the UK Counterweight Programme found an annual healthcare cost greater by £16 [95% CI: £11–£21] per unit of BMI, after adjustment for age, sex, smoking status, alcohol intake, and physical activity [4]. Our analysis estimated a remarkably similar cost of £14.36 [95% CI £0.31 – £28.42] per additional unit of BMI; however, our more recent analysis focused only on hospital expenditures and did not assess total healthcare expenditures, which have increased over the years.

A 2020 study by Dixon et al., conducted at the same time as and completely independently of the present analysis, undertook an MR analysis of linked UK Biobank and HES data [21]. Dixon and colleagues found marginal effects per additional unit of BMI ranging from £18.85 [95% CI: 9.05–28.65] to £21.22 [95% CI: 14.35–28.07], depending on the type of instrumental variable model used [21]. These results are not materially different from those presented in the current study, which found a marginal effect of £14.36 [95% CI £0.31 – £28.42] per additional unit of BMI. The minor differences between our results could be due to several factors. Dixon et al. included a greater number of alleles in the instrument, whereas the current study used an instrument with fewer alleles that had been developed and validated as a measure of obesity risk [31]. The F-statistic (a measure of instrument strength) is higher in our study, which suggests a stronger instrument, though the statistic is strongly influenced by sample size and the difference likely reflects that factor [50]. The current study had the benefit of approximately two additional years of follow-up data and included approximately 100,000 more subjects. Both strategies produced instrumental variable analyses with F-statistics well above the threshold of 10 for a strong instrument. Both studies adopted the same costing methods, but used different versions of the NHS Cost Grouper software and national reference costs [21, 51]. Our study used the 2017/2018 Grouper, which updated the valuation of procedure codes and included a larger net number of HRGs than the 2016/2017 Grouper [52].

The types of IV models chosen also differed between the studies. Our analysis included a standard TSLS model and a two-part cost model similar to the one used by Cawley and Meyerhoefer, while Dixon et al. implemented inverse variance weighted and penalized weighted median models. Notably, our IV models found smaller effect sizes than the naïve models, except in the case of the two-part cost models using binary obesity. This differs from the results of Dixon et al. [21] and Cawley and Meyerhoefer [19], where the opposite was true. Any combination of these differences in covariates, sample size, and modelling strategy could be responsible for the differences in our results, but the fact that the results are similar in magnitude suggests that they stand up to independent replication.

Only inpatient healthcare costs are considered in this analysis because outpatient, accident and emergency, and pharmacy data were not yet linkable to the UK Biobank data at the time the analyses were conducted. However, Tigbe et al. found that the positive association of healthcare cost with BMI extends beyond the inpatient setting: subjects with a BMI greater than 40 kg/m2 incurred significantly greater costs for prescription medication, primary care, and outpatient care than subjects with a BMI less than 20 kg/m2 [4]. These additional costs are a subject for further research once additional ambulatory health record data are available for linkage with the UK Biobank.

This study also does not consider social care costs associated with obesity. In England, health and social care services cost about £17 billion per year, and about 70% of these annual costs are attributable to the care of individuals with long-term conditions [53, 54]. Obese individuals are at greater risk for diseases that require long-term care such as diabetes, cardiovascular disease, musculoskeletal disorders, and mental health disorders. These individuals often experience functional limitations and require long-term assistance with personal care, domestic tasks, transportation, housing and finances, which increases social care costs [53, 54]. A 2017 study by Copley et al. found that a 1 unit (kg/m2) increase in BMI was associated with a 5% increase in the odds of requiring social care. They also estimated that a BMI of 40 is associated with a nearly £500 increase in average annual social care cost compared to an individual with a normal BMI [52].

There are limitations to the generalizability of these results, including the ‘healthy volunteer’ bias of the UK Biobank data and the age limit of the patients recruited (40–69 years) [53]. Specifically, UK Biobank subjects were less likely to be obese, to report health conditions, or to smoke and drink [53, 55]. These individuals also had a lower mortality rate than the general population [55]. The non-representativeness of the UK Biobank sample introduces the potential for collider bias [56, 57]. In this case, the most severely obese individuals (who are also those at the highest genetic risk for obesity) are less likely to be represented in the data. If higher BMI and higher healthcare costs both reduce the likelihood of participation in the Biobank, then restricting the analysis to participants can induce an association between the genetic instrument and other determinants of cost, which violates the instrumental variable assumptions. Non-representative samples are common in large-scale databases, and this source of bias is typically small compared to other types of bias [56].
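The collider mechanism can be illustrated with a small simulation under deliberately exaggerated, hypothetical assumptions: BMI has no causal effect on cost, and participation becomes less likely as both BMI and cost rise. In the full population the instrument is independent of the other determinants of cost (`u`) and the IV estimate is near zero; among participants, selection induces a correlation between the instrument and `u`, and the IV estimate moves away from zero.

```python
import numpy as np

# Illustrative simulation of selection (collider) bias; all parameters are assumed.
rng = np.random.default_rng(1)
n = 200_000
z = rng.normal(size=n)  # genetic risk score (standardized)
u = rng.normal(size=n)  # other determinants of cost, e.g. general health
bmi = 27 + z + u + rng.normal(size=n)
cost = 500 + 200 * u + rng.normal(scale=100, size=n)  # BMI has NO effect on cost here

# Assumed participation rule: higher BMI and higher cost make taking part less likely.
score = (bmi - 27) + (cost - 500) / 200 + rng.normal(size=n)
took_part = score < 0


def wald_ratio(zz, xx, yy):
    # Single-instrument IV estimate: cov(Z, Y) / cov(Z, X).
    return np.cov(zz, yy)[0, 1] / np.cov(zz, xx)[0, 1]


r_full = np.corrcoef(z, u)[0, 1]
r_sel = np.corrcoef(z[took_part], u[took_part])[0, 1]
iv_full = wald_ratio(z, bmi, cost)
iv_sel = wald_ratio(z[took_part], bmi[took_part], cost[took_part])
print(f"corr(Z, U): full {r_full:.3f}, participants {r_sel:.3f}")
print(f"IV estimate: full {iv_full:.1f}, participants {iv_sel:.1f}")
```

The selection effect here is far stronger than anything plausible for UK Biobank and is meant only to show the direction of the mechanism, not its likely size.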

The costs reflect the recent standard of care within the UK National Health Service, which may differ from that under other healthcare systems. Any MR analysis is limited by current knowledge of gene associations. The GRS used as an instrument in this analysis explained a significant but limited amount of variation in BMI. Future GWASs of obesity-related traits may reveal additional BMI-associated alleles that could explain more of the variation in BMI and create a stronger instrument. Unknown pleiotropy or linkage disequilibrium violations could also bias the analysis [13, 22]. Not all BMI-associated alleles are necessarily related to body fat: they were identified in general populations in which half or more of individuals are likely to have a BMI below 30 kg/m2, and in which BMI variation is therefore influenced to a greater degree by variation in muscle mass, especially in younger people. The influence of parental behavior on their offspring may also violate the exclusion-restriction assumption. For example, a mother’s high-risk genotype may influence her behavior and preferences, which in turn may affect the child’s behavior either directly or through intrauterine effects of maternal adiposity [13, 58, 59]. If the mother’s genotype influences both the genotype of the child and the child’s inpatient healthcare cost, this represents a violation of the exclusion-restriction assumption [13, 19].

BMI is the most commonly used measure of obesity, but it is imperfect. BMI is calculated from a person’s height and weight and does not distinguish between lean and fat mass. BMI may therefore overstate adiposity in people with higher muscle mass [35, 36]. Despite these limitations, BMI is appropriate for use in this study because it was a common measure across the GWAS used to generate the GRS [26, 27]. There is, however, a potentially important source of error arising from the fact that the disease process of obesity (including its genetic elements) is present in large numbers of people before their BMI has risen to the conventional public health obesity threshold of 30 kg/m2. Thus, some subjects in the sample who carry risk alleles may not yet have developed obesity or its comorbidities. Increases in the size and number of fat cells trigger changes in metabolic and inflammatory processes, but comorbidities such as diabetes, atherosclerosis, and osteoarthritis take time to develop [11, 60]. This may result in an underestimation of the costs attributable to obesity.
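Because BMI depends only on weight and height, two people with the same weight and height receive the same BMI and category whatever their proportions of lean and fat mass. The sketch below computes BMI and applies the WHO adult cut-offs quoted in the introduction, plus the standard underweight cut-off of 18.5 kg/m². The function names and example values are hypothetical, and the binary indicator simply assumes that "obese" means a BMI at or above 30 kg/m².

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """Adult WHO BMI category (kg/m2)."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


def is_obese(bmi_value: float) -> bool:
    """Binary obesity indicator: BMI at or above 30 kg/m2."""
    return bmi_value >= 30.0


# Two hypothetical adults weighing 100 kg at 1.80 m, one muscular and one
# with high body fat, receive the same BMI and category: body composition
# is not an input to the formula.
value = bmi(100, 1.80)
print(round(value, 1), who_category(value), is_obese(value))  # prints: 30.9 obese True
```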


The continuous and binary IV models provide evidence for a causal relationship between obesity and higher inpatient healthcare costs that, provided the IV assumptions hold, is robust to unobserved confounding factors such as lifestyle. Compared with the naïve model, the binary IV model found a slightly smaller marginal effect of obesity (£201.58 vs. £205.53). The continuous IV model found a smaller marginal effect of a single unit increase in BMI than the naïve model (£14.36 vs. £21.61 annual cost). We believe that this analysis is an important step in understanding the role of endogeneity in establishing causation in healthcare cost studies; further analyses using additional instrumental variables alongside the weighted GRS would strengthen the results. At the time this analysis was conducted, general practice data were not yet linked to UK Biobank. Once the linked data become available, the MR analysis should be extended to all healthcare costs to provide a more complete picture of the healthcare costs attributable to obesity.
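One possible explanation for a naïve estimate above the IV estimate is confounding by unobserved factors, such as lifestyle, that raise both BMI and costs. The hypothetical simulation below (invented true effect of 15 cost units per BMI unit, invented confounder, no covariates and no two-part cost structure) illustrates this pattern with explicit two-stage least squares: stage 1 regresses BMI on a genetic score, and stage 2 regresses cost on the fitted BMI values. It does not reproduce the study's models or data, and the helper names are illustrative.

```python
import random


def ols_fit(xs, ys):
    """Simple linear regression of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=200_000, true_effect=15.0, seed=3):
    """Hypothetical data in which a confounder raises both BMI and cost."""
    rng = random.Random(seed)
    grs, bmi, cost = [], [], []
    for _ in range(n):
        g = rng.gauss(0, 1)  # standardized genetic risk score (instrument)
        u = rng.gauss(0, 1)  # unmeasured confounder, e.g. lifestyle
        b = 27 + 0.5 * g + 2 * u + rng.gauss(0, 3)
        grs.append(g)
        bmi.append(b)
        cost.append(true_effect * b + 100 * u + rng.gauss(0, 50))
    return grs, bmi, cost


grs, bmi, cost = simulate()

# Naive model: regress cost directly on observed BMI (absorbs confounding).
_, naive_slope = ols_fit(bmi, cost)

# Stage 1: regress BMI on the instrument and keep the fitted values.
a0, a1 = ols_fit(grs, bmi)
bmi_hat = [a0 + a1 * g for g in grs]

# Stage 2: regress cost on fitted BMI; the slope is the TSLS estimate.
_, tsls_slope = ols_fit(bmi_hat, cost)

print(f"Naive OLS slope: {naive_slope:.2f}")
print(f"TSLS slope:      {tsls_slope:.2f}")
```

With a single instrument, the second-stage slope equals the Wald ratio cov(G, cost) / cov(G, BMI). Standard errors from a manual second-stage regression are not valid, because its residuals are computed from fitted rather than observed BMI; dedicated IV routines apply the correction.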

Availability of data and materials

The data that support the findings of this study are available from the UK Biobank but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the UK Biobank.



Abbreviations

MR: Mendelian Randomization
IV: Instrumental variable
HES: Hospital Episode Statistics
TSLS: Two-stage least squares
NHS: National Health Service
BMI: Body mass index
RCT: Randomized controlled trial
GWAS: Genome-wide association study
SNP: Single-nucleotide polymorphism
GRS: Genetic risk score
NHGRI: National Human Genome Research Institute
HRG: Healthcare Resource Group
SD: Standard deviation

  1. Rudisill C, Charlton J, Booth HP, Gulliford MC. Are healthcare costs from obesity associated with body mass index, comorbidity or depression? Cohort study using electronic health records: drivers of health care costs in obesity. Clin Obes. 2016;6(3):225–31.

  2. Wang YC, McPherson K, Marsh T, Gortmaker SL, Brown M. Health and economic burden of the projected obesity trends in the USA and the UK. Lancet. 2011;378(9793):815–25.

  3. Tremmel M, Gerdtham U-G, Nilsson PM, Saha S. Economic burden of obesity: a systematic literature review. Int J Environ Res Public Health. 2017;14(4):435.

  4. Tigbe WW, Briggs AH, Lean ME. A patient-centred approach to estimate total annual healthcare cost by body mass index in the UK Counterweight programme. Int J Obes. 2013;37(8):1135–9.

  5. Finucane MM, Stevens GA, Cowan MJ, Danaei G, Lin JK, Paciorek CJ, et al. National, regional, and global trends in body-mass index since 1980: systematic analysis of health examination surveys and epidemiological studies with 960 country-years and 9.1 million participants. Lancet. 2011;377(9765):557–67.

  6. Afshin A, Forouzanfar MH, Reitsma MB, Sur P, Estep K, Lee A, et al. Health Effects of Overweight and Obesity in 195 Countries over 25 Years. N Engl J Med. 2017;377(1):13–27.

  7. Body mass index - BMI: World Health Organization (WHO); Available from:

  8. Bhaskaran K, Douglas I, Forbes H, dos Santos-Silva I, Leon DA, Smeeth L. Body-mass index and risk of 22 specific cancers: a population-based cohort study of 5.24 million UK adults. Lancet. 2014;384(9945):755–65.

  9. Lavie CJ, De Schutter A, Parto P, Jahangir E, Kokkinos P, Ortega FB, et al. Obesity and prevalence of cardiovascular diseases and prognosis-the obesity paradox updated. Prog Cardiovasc Dis. 2016;58(5):537–47.

  10. Singh GM, Danaei G, Farzadfar F, Stevens GA, Woodward M, Wormser D, et al. The age-specific quantitative effects of metabolic risk factors on cardiovascular diseases and diabetes: a pooled analysis. PLoS One. 2013;8(7):e65174.

  11. Kent S, Fusco F, Gray A, Jebb SA, Cairns BJ, Mihaylova B. Body mass index and healthcare costs: a systematic literature review of individual participant data studies. Obes Rev. 2017;18(8):869–79.

  12. Didelez V, Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat Methods Med Res. 2007;16(4):309–30.

  13. von Hinke S, Davey Smith G, Lawlor DA, Propper C, Windmeijer F. Genetic markers as instrumental variables. J Health Econ. 2016;45:131–48.

  14. Boef AGC, Dekkers OM, le Cessie S. Mendelian randomization studies: a review of the approaches used and the quality of reporting. Int J Epidemiol. 2015;44(2):496–511.

  15. Chatterjee NA, Giulianini F, Geelhoed B, Lunetta KL, Misialek JR, Niemeijer MN, et al. Genetic obesity and the risk of atrial fibrillation: causal estimates from Mendelian randomization. Circulation. 2017;135(8):741–54.

  16. Cole CB, Nikpay M, Stewart AF, McPherson R. Increased genetic risk for obesity in premature coronary artery disease. Eur J Hum Genet. 2016;24(4):587–91.

  17. Fall T, Hägg S, Ploner A, Mägi R, Fischer K, Draisma HH, et al. Age- and sex-specific causal effects of adiposity on cardiovascular risk factors. Diabetes. 2015;64(5):1841–52.

  18. Thrift AP, Gong J, Peters U, Chang-Claude J, Rudolph A, Slattery ML, et al. Mendelian randomization study of body mass index and colorectal Cancer risk. Cancer Epidemiol Biomark Prev. 2015;24(7):1024–31.

  19. Cawley J, Meyerhoefer C. The medical care costs of obesity: an instrumental variables approach. J Health Econ. 2012;31(1):219–30.

  20. Black N, Hughes R, Jones AM. The health care costs of childhood obesity in Australia: an instrumental variables approach. Econ Hum Biol. 2018;31:1–13.

  21. Dixon P, Hollingworth W, Harrison S, Davies NM, Davey Smith G. Mendelian randomization analysis of the causal effect of adiposity on hospital costs. J Health Econ. 2020;70:102300.

  22. Lawlor DA, Harbord RM, Sterne JA, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008;27(8):1133–63.

  23. Davey Smith G, Ebrahim S. Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol. 2004;33(1):30–42.

  24. About UK Biobank: UK Biobank; 2019 [Available from:

  25. Hospital Episode Statistics Data in Showcase 2013 [updated December 2013. Version 1.0:[Available from:

  26. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518(7538):197–206.

  27. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet. 2010;42(11):937–48.

  28. Thorleifsson G, Walters GB, Gudbjartsson DF, Steinthorsdottir V, Sulem P, Helgadottir A, et al. Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet. 2009;41(1):18–24.

  29. Burgess S, Thompson SG. Use of allele scores as instrumental variables for Mendelian randomization. Int J Epidemiol. 2013;42(4):1134–44.

  30. Burgess S, Dudbridge F, Thompson SG. Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Stat Med. 2016;35(11):1880–906.

  31. Belsky DW, Moffitt TE, Sugden K, Williams B, Houts R, McCarthy J, et al. Development and evaluation of a genetic risk score for obesity. Biodemography Soc Biol. 2013;59(1):85–100.

  32. Boef AG, Dekkers OM, Vandenbroucke JP, le Cessie S. Sample size importantly limits the usefulness of instrumental variable methods, depending on instrument strength and level of confounding. J Clin Epidemiol. 2014;67(11):1258–64.

  33. Palmer TM, Lawlor DA, Harbord RM, Sheehan NA, Tobias JH, Timpson NJ, et al. Using multiple genetic variants as instrumental variables for modifiable risk factors. Stat Methods Med Res. 2012;21(3):223–42.

  34. The NHGRI-EBI GWAS Catalog of published genome-wide association studies. National Human Genome Research Institute. Available from: [cited 07 Aug 2019].

  35. Adab P, Pallan M, Whincup PH. Is BMI the best measure of obesity? BMJ. 2018;360:k1274.

  36. Romero-Corral A, Somers VK, Sierra-Johnson J, Thomas RJ, Collazo-Clavell ML, Korinek J, et al. Accuracy of body mass index in diagnosing obesity in the adult general population. Int J Obes. 2008;32(6):959–66.

  37. Canavan C, West J, Card T. Calculating Total health service utilisation and costs from routinely collected electronic health records using the example of patients with irritable bowel syndrome before and after their first gastroenterology appointment. PharmacoEconomics. 2016;34(2):181–94.

  38. HRG4+ 2017/18 Reference Costs Grouper [updated 15 January 2019. Available from:

  39. 2017/18 Reference Cost Data: NHS; 2017 [updated 17 December 2018. Available from:

  40. Grašič K, Mason AR, Street A. Paying for the quantity and quality of hospital care: the foundations and evolution of payment policy in England. Health Econ Rev. 2015;5(1):50.

  41. Flegal KM, Graubard BI, Williamson DF, Gail MH. Excess deaths associated with underweight, overweight, and obesity. JAMA. 2005;293(15):1861–7.

  42. Flegal KM, Kit BK, Orpana H, Graubard BI. Association of all-cause mortality with overweight and obesity using standard body mass index categories: a systematic review and meta-analysis. JAMA. 2013;309(1):71–82.

  43. Kelly E, Stoye G, Vera-Hernández M. Public hospital spending in England: Evidence from National Health Service Administrative Records. Fisc Stud. 2016;37(3–4):433–59.

  44. Storey A. Living longer: how our population is changing and why it matters. UK: Office for National Statistics; 2018.

  45. Shinozaki K, Okuda M. The effects of fat mass and obesity-associated gene variants on the body mass index among ethnic groups and in children and adults. Indian J Endocrinol Metab. 2012;16(Suppl 3):S588–S95.

  46. Peng S, Zhu Y, Xu F, Ren X, Li X, Lai M. FTO gene polymorphisms and obesity risk: a meta-analysis. BMC Med. 2011;9:71.

  47. Howe LJ, Lawson DJ, Davies NM, St Pourcain B, Lewis SJ, Davey Smith G, et al. Genetic evidence for assortative mating on alcohol consumption in the UK Biobank. Nat Commun. 2019;10(1):5039.

  48. Tenesa A, Rawlik K, Navarro P, Canela-Xandri O. Genetic determination of height-mediated mate choice. Genome Biol. 2016;16:269.

  49. Hartwig FP, Davies NM, Davey Smith G. Bias in Mendelian randomization due to assortative mating. Genet Epidemiol. 2018;42(7):608–20.

  50. Burgess S, Thompson SG, Collaboration CCG. Avoiding bias from weak instruments in Mendelian randomization studies. Int J Epidemiol. 2011;40(3):755–64.

  51. Dixon P, Davey Smith G, Hollingworth W. The association between adiposity and inpatient hospital costs in the UK biobank cohort. Appl Health Econ Health Policy. 2019;17(3):359–70.

  52. Copley VR, Cavill N, Wolstenholme J, Fordham R, Rutter H. Estimating the variation in need for community-based social care by body mass index in England and associated cost: population-based cross-sectional study. BMC Public Health. 2017;17(1):667.

  53. Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of Sociodemographic and health-related characteristics of UK biobank participants with those of the general population. Am J Epidemiol. 2017;186(9):1026–34.

  54. Gatineau M, Hancock C, Dent M. Obesity and disability: adults. London: Public Health England; 2013.

  55. Batty GD, Gale CR, Kivimäki M, Deary IJ, Bell S. Comparison of risk factor associations in UK biobank against representative, general population based studies with conventional response rates: prospective cohort study and individual participant meta-analysis. BMJ. 2020;368:m131.

  56. Gkatzionis A, Burgess S. Contextualizing selection bias in Mendelian randomization: how bad is it likely to be? Int J Epidemiol. 2019;48(3):691–701.

  57. Munafò M, Davey Smith G. Biased estimates in Mendelian randomization studies conducted in unrepresentative samples. JAMA Cardiol. 2018;3(2):181.

  58. Castillo-Laura H, Santos IS, Quadros LCM, Matijasevich A. Maternal obesity and offspring body composition by indirect methods: a systematic review and meta-analysis. Cad Saude Publica. 2015;31(10):2073–92.

  59. Wang Y, Min J, Khuri J, Li M. A systematic examination of the association between parental and child obesity across countries. Adv Nutr. 2017;8(3):436–48.

  60. Bray GA, Kim KK, Wilding JPH, World Obesity Federation. Obesity: a chronic relapsing progressive disease process. A position statement of the World Obesity Federation. Obes Rev. 2017;18(7):715–23.


This research has been conducted using the UK Biobank Resource under Application Number 44371.


This study was funded by Novartis AG and study results were not contingent on the sponsor’s approval or censorship of the manuscript.

Author information

Authors and Affiliations



KD conducted the statistical analysis and drafted the manuscript. All authors contributed to study design, results interpretation, and manuscript editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Katherine Dick.

Ethics declarations

Ethics approval and consent to participate

All UK Biobank participants gave informed consent for the collection and use of their data for research purposes. UK Biobank approved the use of data for this study under Application Number 44371.

Consent for publication

Not applicable.

Competing interests

Katherine Dick is an employee of Avalon Health Economics which received compensation from Novartis AG for this work. John Schneider is CEO and a shareholder of Avalon Health Economics which received compensation from Novartis AG for this work. Andrew Briggs is a Director and shareholder of Avalon Health Economics which received compensation from Novartis AG for this work. Pascal Lecomte is employed by, owns stock in and has stock options in Novartis Pharma AG. Stephane A. Regnier is an employee and a shareholder of Novartis Pharma AG. Michael Lean declares that he has no competing interests for this work.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Dick, K., Schneider, J.E., Briggs, A. et al. Mendelian randomization: estimation of inpatient hospital costs attributable to obesity. Health Econ Rev 11, 16 (2021).
