Risk selection into supplemental private health insurance in China

Background: Information on risk selection is important for the regulation and development of supplemental private health insurance (PHI). Risk selection into supplemental PHI has been documented in several developed countries where regulation of PHI markets was relatively mature. However, evidence on this important aspect of the supplemental PHI market in China is still absent from the literature. Private insurers in China were not prohibited from discriminating against pre-existing conditions and did not guarantee ongoing enrolment. Therefore, the direction and degree of risk selection could not be inferred from evidence in other countries. To provide evidence on risk selection into supplemental PHI in China, we conducted a cross-sectional analysis using data from the 2015 wave of the China Health and Retirement Longitudinal Study (CHARLS).

Results: Using probit models, we found that individuals with better self-reported general health were more likely to enrol in PHI in China, suggesting advantageous selection. This result was confirmed by an alternative analysis using an instrumental variable. We also adjusted the realized occurrence of hospitalization to exclude the potential moral hazard effect and showed that the adjusted hospitalization risk was negatively associated with PHI enrolment, which also indicated advantageous selection.

Conclusions: The findings suggest potential over-insurance of healthier individuals or under-insurance of less healthy individuals. Regulation of the PHI market in China should aim to address this inefficiency. The current study could also contribute to the information base for policymakers in countries where PHI markets similarly lack strong regulation.


Introduction
China has experienced rapid economic growth since the late 1970s. However, health insurance coverage did not expand alongside the economy [1,2]. In fact, the population health insurance coverage rate dropped from 73% to 50% between 1993 and 2003 [1]. To reverse this, the Chinese government has deployed programs towards universal coverage since 2003, which collectively raised the basic health insurance coverage rate to 95% by 2011 [2][3][4]. Despite this achievement, the high out-of-pocket (OOP) rate under public programs, ranging from 40% to 70%, remains a financial challenge for the covered population [3][5][6][7]. The reimbursement rate of public insurance is not expected to increase substantially in the near future because the public medical insurance fund in China is already on the verge of a deficit [4,8]. To close the coverage gap, some people purchase private health insurance (PHI) in addition to their public health insurance [2,7]. Indeed, the Chinese government is engaging private health insurers to provide supplemental coverage of expensive healthcare services [3,4]. This is reflected in the fact that current PHI plans predominantly target critical illnesses that incur high inpatient costs [2,6,9].
A mere 3% of the Chinese population is covered by PHI [7]. Although total PHI premiums in China are growing by 30% annually [5,8], PHI products still lack important features of supplemental health insurance. For example, PHI plans in China usually do not reimburse based on actual expenditure. Instead, they pay a lump sum for a narrow set of conditions and make no commitment to ongoing enrolment [5,9]. Such policies account for over 70% of the market [5]. In addition, there is still no regulation of price discrimination against individuals with unfavourable health conditions in the PHI market. At initial enrolment, insurers may require a physical examination report from enrolees; however, the more common approach is to ask enrolees to disclose any pre-existing conditions [10]. An observation period after enrolment, mostly 90 days, is used to verify whether enrolees had critical illnesses at the time of enrolment. The list of critical illnesses varies across insurers and plans. Pre-existing conditions become exclusions in the individual's plan. Furthermore, renewal is extremely unlikely once the benefit for a high-cost hospitalization due to a critical illness has been claimed [5]. Hence, the PHI market in China is still immature and needs improvement in numerous respects [4,5]. To that end, it is important to understand whether voluntary enrolment into supplemental PHI is driven by adverse or advantageous selection [5,9]. However, there is currently a dearth of evidence on risk selection into supplemental PHI in China. Such evidence can help regulators and policymakers establish an efficient insurance market. Recent government efforts to promote PHI as a supplement to basic public health insurance, including tax breaks for purchasing PHI, underscore the need for evidence on risk selection [11]. As such, the current study aims to examine whether PHI in China is associated with adverse or advantageous selection.
The classic Rothschild-Stiglitz (R-S) model of insurance markets [12] predicts adverse selection at the competitive equilibrium when a consumer's risk type is private information. Indeed, the literature on adverse selection in basic health insurance markets is extensive [13][14][15]. However, studies on risk selection in supplemental PHI markets are relatively sparse. In addition, the current evidence in this field arguably supports advantageous selection or a lack of risk selection more than adverse selection.
Wolfe and Goddeeris [16] found that those with better self-reported health were more likely to purchase Medigap, a type of policy sold by private insurers to supplement basic Medicare plans in the US. Similarly, Fang et al. [17] showed that Medicare beneficiaries covered by Medigap tended to be healthier than those without Medigap. They further identified income, education, financial planning horizon, and cognitive function, including financial literacy, as sources of advantageous selection. Extending the work of Fang et al., Keane and Stavrunova [18] found advantageous selection into Medigap when no variables were controlled for, but found adverse selection, consistent with the R-S model, once race and marital status were controlled for in addition to the variables in the analysis by Fang et al. Advantageous selection has also been documented outside the US.
Buchmueller et al. [19] found evidence of advantageous selection in the Australian supplemental PHI market when testing both the correlation between self-reported health and PHI enrolment and the correlation between realized hospital expenditure and PHI enrolment.
Cutler et al. [20] attributed advantageous selection into Medigap to the hypothesis that insurance demand in a market is driven more by heterogeneity in risk tolerance than by heterogeneity in health risk. To test this, they showed that proxies for risk aversion were positively correlated with owning Medigap policies but negatively correlated with ex post health risk. The theory of Cutler et al. was echoed by evidence from the German supplemental PHI market [21], where risk-averse attitudes have been shown to be positively associated with purchasing PHI.
However, a lack of risk selection into supplemental PHI has also been found, as recently documented for several European countries and Israel [22,23]. In both studies, self-reported health was not significantly associated with PHI enrolment. Risk selection into PHI therefore varies across countries. A possible reason for this variation is that basic insurance provided by governments aiming to optimize social efficiency can address adverse selection to a certain extent. Therefore, PHI markets have fewer high-risk consumers than basic insurance markets [15].
The heterogeneity in the direction of risk selection highlights the importance of examining PHI markets country by country, to provide specific evidence on potential social inefficiency and implications for regulation. So far, the English-language literature on risk selection into supplemental insurance has focused exclusively on developed countries. The current study provides initial evidence on risk selection in the PHI market of a developing country. In addition, all previously investigated markets prohibited insurers, partially or entirely, from price discrimination based on health conditions. Thus, our study also contributes to the literature by documenting risk selection into PHI in a market lacking such regulation.

Theoretical background
The R-S model of insurance purchase applies most appropriately to a fully competitive market with asymmetric information [12]. In this scenario, the insured keep their health status as private information, so insurers cannot practise medical underwriting. Hence, sicker individuals would buy more insurance. This framework does not apply directly to China, because PHI is supplemental to public health insurance and insurers are allowed to set prices using the health information available to them. Suppose there is a high-risk individual H whose probability of incurring a healthcare expenditure E not covered by public health insurance is $p_H$, and a low-risk individual L whose corresponding probability is $p_L$, such that $0 \le p_L < p_H \le 1$. Assuming insurers can successfully screen out bad risks, we then have $\Pr(L) \ge \Pr(H)$ in an R-S framework without heterogeneity in risk preferences, where Pr is the probability of holding PHI for the corresponding risk type.
Moreover, the R-S model does not allow for correlation between the risk-aversion parameter τ and the probability $p \in [0, 1]$ of incurring a healthcare expenditure E. To illustrate how such a correlation affects risk selection, let the expected utilities of purchasing and not purchasing PHI, for an individual with wealth W facing a PHI premium m, be respectively

$$V_{PHI}(p, \tau) = u(W - m) + e$$

$$V_{NoPHI}(p, \tau) = p\,u(W - E) + (1 - p)\,u(W)$$

where e represents the non-monetary costs of obtaining PHI and follows a logistic distribution independent of p and τ, and u is the constant relative risk aversion (CRRA) utility function $u(Y) = \frac{Y^{1-\tau}}{1-\tau}$ [17]. Taking e to be standard logistic, the probability that this individual buys PHI is then

$$\Pr_{PHI}(p, \tau) = \Pr\left(V_{PHI} \ge V_{NoPHI}\right) = \frac{\exp\left[u(W - m) - p\,u(W - E) - (1 - p)\,u(W)\right]}{1 + \exp\left[u(W - m) - p\,u(W - E) - (1 - p)\,u(W)\right]}$$

which takes the logistic form. Since τ measures risk aversion, $\Pr_{PHI}(p, \tau)$ is increasing in τ [17]; holding τ fixed, it is also increasing in p, because $u(W) > u(W - E)$. Fang et al. have shown that, under the additional assumption of a negative correlation between p and τ, the probability of holding PHI may be decreasing in p if corr(p, τ) is not accounted for in the analysis [17], leading to advantageous selection.
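The purchase model above can be explored numerically. The sketch below is illustrative only and is not the authors' code: it assumes e is standard logistic and uses hypothetical, normalized values W = 1, m = 0.1 and E = 0.5 (not estimated from CHARLS). It checks that, holding τ fixed, a riskier individual is more likely to buy PHI, while a high-risk but less risk-averse type can still be less likely to buy than a low-risk but more risk-averse type.

```python
import math

# Hypothetical, normalized values (not from CHARLS): wealth W, premium m,
# and the uncovered expenditure E.
W, M, E = 1.0, 0.1, 0.5


def crra(y: float, tau: float) -> float:
    """CRRA utility u(y) = y^(1 - tau) / (1 - tau); log utility in the limit tau -> 1."""
    if math.isclose(tau, 1.0):
        return math.log(y)
    return y ** (1.0 - tau) / (1.0 - tau)


def prob_buy_phi(p: float, tau: float, w: float = W, m: float = M, e: float = E) -> float:
    """Pr(V_PHI >= V_NoPHI), assuming the non-monetary term is standard logistic."""
    v_phi = crra(w - m, tau)  # deterministic part of V_PHI
    v_no_phi = p * crra(w - e, tau) + (1.0 - p) * crra(w, tau)
    return 1.0 / (1.0 + math.exp(-(v_phi - v_no_phi)))


# Holding tau fixed, the riskier individual is more likely to buy.
same_tau = (prob_buy_phi(0.15, 5.0), prob_buy_phi(0.40, 5.0))

# With p and tau negatively correlated, type H (p = 0.40, tau = 1.5) can be
# less likely to buy than type L (p = 0.15, tau = 5.0).
pr_h = prob_buy_phi(0.40, 1.5)
pr_l = prob_buy_phi(0.15, 5.0)

if __name__ == "__main__":
    print(f"fixed tau:  Pr(p=0.15) = {same_tau[0]:.3f}, Pr(p=0.40) = {same_tau[1]:.3f}")
    print(f"correlated: Pr(H) = {pr_h:.3f}, Pr(L) = {pr_l:.3f}")
```

With these illustrative values, Pr(H) ≈ 0.556 and Pr(L) ≈ 0.606, so the observed enrolment gradient in p is reversed even though each individual's purchase probability rises with p. How $\Pr_{PHI}$ responds to τ depends on the units of W, m and E, so the chosen parameter values matter for the comparison.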
Following the theoretical deviations from the classic R-S model described above, we hypothesize that there is advantageous selection in the supplemental PHI market in China when neither the price-setting variables (e.g. demographic information and physical conditions) nor the risk tolerance proxies are adjusted for in the analysis. Without knowing the correlation between p and τ a priori, and assuming no correlation between them, we further hypothesize that there is adverse selection once the price-setting factors are adjusted for. If advantageous selection persists, this would indicate that p and τ are indeed negatively correlated. If so, we hypothesize that adverse selection, as predicted by the R-S model, can be recovered once risk tolerance proxies are adjusted for.

Data
We conducted a cross-sectional analysis using data from the 2015 wave of the China Health and Retirement Longitudinal Study (CHARLS). CHARLS is a longitudinal ageing survey of people aged 45 years and older and their spouses in China, with biennial follow-up [7]. It was designed to allow nationally representative estimates using multistage probability sampling [24,25]. The first wave in 2011 included 17,708 respondents [24]. The 2015 wave included 20,284 respondents with positive cross-sectional weights. The survey collected 1) socioeconomic information, including age, sex, education, residence (rural or urban), assets, income, working status, commercial pension, and health insurance status; 2) health information, including self-reported general health, thirteen physician-diagnosed chronic conditions, memory problems, past-year total and OOP inpatient costs, and past-month total and OOP outpatient costs; and 3) behavioural information, including smoking status and frequency of alcohol consumption. Additional information on CHARLS can be found elsewhere [24,26]. To investigate the properties of PHI as supplemental insurance, we analysed the subsample with public health insurance coverage.

Empirical methods
To test for adverse or advantageous selection into PHI, we regressed PHI enrolment on self-reported general health, a proxy for health risk. The studies reviewed above differ in the proxy for health risk they used: some used self-reported health, and others used realized healthcare expenditure or medical events. Although realized utilization better reflects true health risk, this information is unknown to enrolees at the time of purchasing PHI. In contrast, self-reported health measures how individuals perceive their own health risk. Therefore, testing risk selection based on self-reported health captures the spontaneous behaviour that results from perceived health and the corresponding risk tolerance. Self-reported general health had five categories (1 excellent, 2 very good, 3 good, 4 fair, 5 poor). Because the median category was fair, we created a general health indicator (GHI) equal to 1 for fair or better health and 0 for poor health. We estimated three probit models of PHI on GHI using different specifications. First, we estimated an unadjusted model:

$$\Phi^{-1}\left[\Pr(PHI_i = 1)\right] = \alpha_0 + \alpha_1 GHI_i \tag{1}$$

where $PHI_i$ is an indicator for PHI coverage and $\Phi^{-1}$ is the probit link function. Second, we controlled for age, sex, rural or urban residence, and chronic illness. These variables represent information that insurers might use to price insurance or to select enrolees:

$$\Phi^{-1}\left[\Pr(PHI_i = 1)\right] = \alpha_0 + \alpha_1 GHI_i + X_i'\beta \tag{2}$$

where $X_i$ is the vector of demographic variables and chronic conditions. Finally, we added variables that were not observed or could not be verified at the time of enrolment but represent potential sources of advantageous selection [17,18,20]. These variables included total wealth (in ¥1000), annual income (in ¥1000), smoking, drinking alcohol daily or more often, having a commercial pension, and education (high school or above).
The full specification is

$$\Phi^{-1}\left[\Pr(PHI_i = 1)\right] = \alpha_0 + \alpha_1 GHI_i + X_i'\beta + Y_i'\theta \tag{3}$$

where Pr(·) is the probability of the event in parentheses and $Y_i$ is the vector of the additional variables listed above. In all three equations, $\alpha_1$ represents risk selection. In equation [2], $\alpha_1$ represents risk selection conditional on price-setting factors. In equation [3], $\alpha_1$ additionally conditions on possible sources of advantageous selection, which become relevant if equations [1] and [2] suggest advantageous selection. In the Results section, we present not only the coefficient estimates but also the marginal effect of GHI at the means on the probability of enrolling in PHI, because the latter has a direct practical interpretation.
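Because equations [1]-[3] are probit models, the marginal effect of GHI depends on where the covariates are evaluated. The sketch below uses hypothetical coefficients (`const`, `GHI`, `rural`), not the estimates in Tables 2-4, and ignores sampling weights. It compares two common definitions at the means: the derivative $\phi(\bar{x}'b)\,\alpha_1$ and the discrete change $\Phi(\bar{x}'b \mid GHI = 1) - \Phi(\bar{x}'b \mid GHI = 0)$. The text does not state which definition underlies the reported marginal effects; in Stata's `margins`, the discrete change is reported when GHI enters as a factor variable.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1: .cdf is Phi, .pdf is phi

# Hypothetical probit coefficients and covariate means (illustrative only).
COEFS = {"const": -2.2, "GHI": 0.3, "rural": -0.3}
MEANS = {"GHI": 0.8, "rural": 0.5}


def linear_index(coefs: dict, values: dict) -> float:
    """Return x'b, where 'const' is the intercept and values maps regressors to levels."""
    return coefs["const"] + sum(coefs[name] * level for name, level in values.items())


def me_binary_at_means(coefs: dict, means: dict, var: str = "GHI") -> tuple:
    """Marginal effect of a binary regressor at the given point: (derivative, discrete change)."""
    derivative = STD_NORMAL.pdf(linear_index(coefs, means)) * coefs[var]
    at_one = {**means, var: 1.0}
    at_zero = {**means, var: 0.0}
    discrete = STD_NORMAL.cdf(linear_index(coefs, at_one)) - STD_NORMAL.cdf(
        linear_index(coefs, at_zero)
    )
    return derivative, discrete


if __name__ == "__main__":
    deriv, change = me_binary_at_means(COEFS, MEANS)
    print(f"derivative: {deriv:.4f}, discrete change: {change:.4f}")
    # The same coefficient implies a different effect at another evaluation point.
    _, change_rural = me_binary_at_means(COEFS, {"GHI": 0.8, "rural": 1.0})
    print(f"discrete change for a rural resident: {change_rural:.4f}")
```

The second call illustrates why a marginal effect is tied to its evaluation point, while the coefficient $\alpha_1$ is not.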
Nevertheless, we rely on the coefficient estimates for statistical inference, because testing a marginal effect involves the estimates of all coefficients rather than only the coefficient of interest, and the result depends on the values at which the variables are evaluated [27]. Residual confounding that correlates with both self-reported health and PHI enrolment might still exist, however, even though we controlled for socioeconomic, medical and behavioural variables. For example, pessimism may correlate not only with self-rated health but also with risk attitude and PHI enrolment. To address this layer of endogeneity, we used an instrumental variable (IV) approach. In addition, $\alpha_1$ in equations [1]-[3] may represent a mix of ex ante risk selection and potential moral hazard, if the latter exists; this is sometimes referred to as reverse causality in the literature. To disentangle the risk selection effect from potential moral hazard, we adopted a second additional identification strategy for the specification in equation 3, using adjusted realized hospitalization as the proxy for health risk.
In the first additional strategy, an IV provides exogenous variation that is correlated with GHI but not with the error term $\varepsilon_i$ of the PHI equation [28]. To that end, we exploited an order effect arising from the design of the CHARLS survey. An order effect is the phenomenon whereby the order in which the general health question and specific health questions are presented affects the answer to the general health question [29]. In CHARLS, about 50% of respondents were randomly assigned to answer the general health question at the beginning of the health and healthcare section, and the rest answered it after the specific health items. We created an indicator variable for answering the question at the beginning and used it as the IV for GHI. Because the order was randomized, this IV is potentially ideal. The answer to the general health question can be affected by its order because a long list of health-related items presented beforehand may help respondents contextualize their health problems, or lack thereof, reminding them of how sick or healthy they are. More formally, self-perceived health can be decomposed as

$$SPH_i = SPH_i^{pre} + SPH_i^{post} + \nu_i + \eta_i \tag{4}$$

where $SPH_i^{pre}$ is the part of self-perceived health unaffected by first answering the other health-related items, $SPH_i^{post}$ is the part that was affected and is null for those randomized to report general health at the beginning, $\nu_i$ is an endogenous component correlated with both self-perception of health and PHI enrolment, and $\eta_i$ is the remaining component, assumed to be random. The order of answering the question thus affects $GHI_i$ through $SPH_i^{post}$, which itself reflects part of the true underlying health status. The correlation between the IV and GHI was tested in the IV regressions. The Staiger-Stock rule of thumb is that the F-statistic from regressing the endogenous variable on the IV in a linear model should exceed 10 [30].
We therefore also ran an auxiliary linear regression of GHI on the question-order indicator to examine the F-statistic. Note that both the response variable of interest (PHI enrolment) and the potentially endogenous treatment variable (GHI) are binary. Studies have shown that the conventional two-stage least squares implementation of IV misspecifies the second stage and is more vulnerable to bias than bivariate probit (BVP) models when both the response variable and the endogenous variable are binary [31][32][33]. We therefore used the BVP approach to implement the IV analysis. The BVP model takes the following form [28]:

$$\begin{aligned} PHI_i &= 1\{\alpha_0 + \alpha_1 GHI_i + X_i'\beta + Y_i'\theta + \varepsilon_i > 0\} \\ GHI_i &= 1\{\pi_0 + \pi_1 OI_i + X_i'\kappa + Y_i'\lambda + \xi_i > 0\} \\ (\varepsilon_i, \xi_i) &\sim BVN(0, 0, 1, 1, \rho) \end{aligned} \tag{5}$$

where $1\{\cdot\}$ is an indicator function equal to one if the condition in the curly brackets is true and zero otherwise, $OI_i$ is the indicator of answering the general health question at the beginning, and BVN denotes the bivariate normal distribution. Conditional on $OI_i$ qualifying as an IV, a test of ρ = 0 is a test of the exogeneity of GHI with respect to PHI. The model was estimated by maximum likelihood.
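The Staiger-Stock check on the auxiliary regression can be illustrated with a short sketch. It simulates a hypothetical randomized question-order indicator and a binary GHI (the probabilities 0.83 and 0.78, the sample size and the seed are assumptions, not CHARLS values), fits the linear regression of GHI on the indicator, and computes the F-statistic for the slope, which with a single regressor equals the squared t-statistic. Sampling weights are ignored.

```python
import random


def first_stage_f(instrument: list, endog: list) -> tuple:
    """OLS of endog on a constant and one instrument; return (slope, F for slope = 0)."""
    n = len(instrument)
    mean_z = sum(instrument) / n
    mean_x = sum(endog) / n
    s_zz = sum((z - mean_z) ** 2 for z in instrument)
    s_zx = sum((z - mean_z) * (x - mean_x) for z, x in zip(instrument, endog))
    slope = s_zx / s_zz
    intercept = mean_x - slope * mean_z
    ssr = sum((x - intercept - slope * z) ** 2 for z, x in zip(instrument, endog))
    sigma2 = ssr / (n - 2)  # residual variance with 2 estimated parameters
    f_stat = slope ** 2 * s_zz / sigma2  # equals the squared t-statistic of the slope
    return slope, f_stat


def simulate(n: int = 17_704, seed: int = 2015) -> tuple:
    """Hypothetical data: answering first lowers Pr(GHI = 1) from 0.83 to 0.78."""
    rng = random.Random(seed)
    oi = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    ghi = [1 if rng.random() < (0.78 if z else 0.83) else 0 for z in oi]
    return oi, ghi


if __name__ == "__main__":
    oi, ghi = simulate()
    slope, f_stat = first_stage_f(oi, ghi)
    print(f"slope = {slope:.3f}, F = {f_stat:.1f}, passes F > 10: {f_stat > 10}")
```

The linear auxiliary regression only serves the weak-instrument diagnostic; the IV estimates themselves come from the BVP model.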
The second additional identification strategy replaced GHI with the adjusted risk of hospitalization. We followed the approach used by Fang et al. [17] to analyse risk selection in the Medigap market, using the adjusted risk of hospitalization as the proxy for underlying health risk. First, the realized occurrence of hospitalization was regressed on the same set of predictors as in equation 3 using a probit model:

$$\Phi^{-1}\left[\Pr(H_i = 1)\right] = \gamma_0 + \gamma_1 PHI_i + Z_i'\gamma_2 \tag{6}$$

where $H_i$ is an indicator for any hospitalization in the past year, $Z_i$ collects the predictors of equation 3, and $\gamma_1$ represents potential moral hazard. The adjusted risk of hospitalization is then the predicted probability of hospitalization from equation 6 excluding the effect of PHI:

$$\hat{P}_i = \Phi\left(\hat{\gamma}_0 + Z_i'\hat{\gamma}_2\right) \tag{7}$$

where Φ is the standard normal cumulative distribution function. Finally, the specification in equation 3 was re-estimated as

$$\Phi^{-1}\left[\Pr(PHI_i = 1)\right] = \delta_0 + \delta_1 \hat{P}_i + X_i'\beta + Y_i'\theta \tag{8}$$

Similar to $\alpha_1$ in equations 3 and 5, $\delta_1$ represents risk selection. However, the direction of the effect of $\hat{P}_i$ is interpreted in the opposite way to that of GHI: a positive $\delta_1$ indicates adverse selection and a negative $\delta_1$ advantageous selection. Since PHI plans in China do not reimburse outpatient visits, high-risk and low-risk individuals would likely differ in underlying hospitalization risk if there were risk selection.
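The adjustment step in equations 6 and 7 amounts to dropping the $\gamma_1 PHI_i$ term before applying Φ. A minimal sketch, using hypothetical probit estimates (the covariates `age_decades` and `chronic_count` and all coefficient values are illustrative assumptions, not results from this study):

```python
from statistics import NormalDist

STD_NORMAL_CDF = NormalDist().cdf

# Hypothetical estimates of equation 6: intercept, gamma_1 on PHI,
# and two illustrative covariates standing in for Z_i.
GAMMA = {"const": -1.4, "PHI": 0.25, "age_decades": 0.10, "chronic_count": 0.15}


def hospitalization_risk(person: dict, include_phi: bool = True) -> float:
    """Predicted Pr(H = 1); with include_phi=False this is the adjusted risk P-hat."""
    xb = GAMMA["const"] + sum(
        GAMMA[name] * value for name, value in person.items() if name != "PHI"
    )
    if include_phi:
        xb += GAMMA["PHI"] * person["PHI"]  # potential moral hazard term
    return STD_NORMAL_CDF(xb)


if __name__ == "__main__":
    holder = {"PHI": 1, "age_decades": 5.5, "chronic_count": 1}
    print(f"fitted:   {hospitalization_risk(holder):.3f}")
    print(f"adjusted: {hospitalization_risk(holder, include_phi=False):.3f}")
```

For respondents without PHI the adjusted and fitted risks coincide; for PHI holders with $\hat{\gamma}_1 > 0$, the adjusted risk removes the estimated moral hazard component, so two otherwise identical people receive the same $\hat{P}_i$ regardless of PHI status.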
All analyses used respondent sampling weights and were conducted using Stata (version 14; Stata Corp, College Station, TX, USA).

Results
We identified 17,704 respondents with public health insurance in the 2015 cross-sectional data, of whom 493 also had PHI. A tabulation incorporating sampling weights showed that 3.2% of those with public health insurance also had PHI. Table 1 lists descriptive statistics of the analytical sample. The percentages in Table 1 incorporate sampling weights; the actual sample sizes in each group are therefore not reported, because they do not correspond to the reported percentages. Briefly, PHI beneficiaries were significantly younger (mean: 53.8 vs. 60.1 years, p < 0.001), less likely to live in rural areas (30.2% vs. 52.5%, p < 0.001), and more likely to be in fair or better health (90.4% vs. 80.3%, p < 0.001). PHI beneficiaries also had significantly higher annual income (mean: ¥20,695 vs. ¥6165, p = 0.001).

Estimates of coefficients and marginal effects for equations [1]-[3] are presented in Tables 2, 3 and 4, respectively. In the unadjusted equation [1], GHI was significantly associated with a 0.0252 (p < 0.001) higher probability of having PHI. Conditional on potential price-setting and consumer selection variables (equation [2]), GHI was significantly associated with a 0.0142 (p < 0.05) higher probability of having PHI. In equation [3], having a commercial pension was the only variable among the potential sources of advantageous selection that was significantly associated with PHI enrolment. GHI was still significantly associated with a 0.0182 (p < 0.05) higher probability of PHI enrolment, relatively close to the magnitude in equation [2].

Table 5 displays the coefficient and marginal effect estimates of the IV regression in equation 5 using the BVP approach. Although the model was estimated by maximum likelihood, we followed the IV convention of calling the second equation in equation 5 the first-stage regression and the first equation the second-stage regression. In the first-stage regression, the OI variable significantly and negatively predicted the response to the general health question, supporting its relevance as an IV. The F-statistic of the auxiliary first-stage linear regression was 109.42, substantially greater than the Staiger-Stock rule-of-thumb threshold of 10 for a non-weak IV. In the second-stage regression, GHI was significantly associated with a 0.244 (p < 0.001) higher probability of having PHI, confirming advantageous selection. Furthermore, ρ was statistically significant, suggesting that GHI was endogenous to PHI.

Results of the regression using equation 8 are listed in Table 6. The adjusted risk of hospitalization was significantly and negatively associated with having PHI, meaning that individuals at higher risk of hospitalization were less likely to purchase PHI. The marginal effect at the means of the adjusted risk of hospitalization on the probability of PHI enrolment was −0.108 (p < 0.05). This result also confirmed advantageous selection.

Table notes: In the regression tables, standard errors are in parentheses; * p < 0.05, ** p < 0.01, *** p < 0.001. Where GHI estimates are reported, the general health indicator was 1 if self-reported health was fair or better and 0 if self-reported health was poor; zero was the baseline category in the regressions. Table 6 reports the coefficients and marginal effects at the means of the probit regression of supplemental private health insurance enrolment on the adjusted risk of hospitalization.

Discussion
To our knowledge, the current study is the first to document risk selection associated with PHI in China. Using CHARLS data, we found advantageous selection into PHI among those with public health insurance in China, consistent with our first hypothesis.
The advantageous selection was partially explained by physical conditions, suggesting potential discrimination against, or selection of, enrolees by insurers. However, advantageous selection persisted after controlling for demographic variables and physical conditions. Previous studies were conducted in contexts with regulation against discrimination based on pre-existing conditions. Hence, it was natural to expect that physical conditions would explain advantageous selection in a setting lacking anti-discrimination legislation. Contrary to our second hypothesis, our findings suggest that advantageous selection can drive enrolment into PHI even when insurers already screen risks on their side. In the study by Fang et al. [17], advantageous selection was explained by the correlation between risk preference variables and health. We controlled for similar risk preference variables where available, but they still did not fully explain the advantageous selection. This contradicts our third hypothesis and likely indicates that there are still unidentified sources of risk preference. For example, risk preference may depend not only on static variables but also on dynamic changes in socioeconomic conditions: prospect theory predicts that baseline states and changes in wealth can affect risk tolerance [34].
Future studies should investigate sources of advantageous selection in a dynamic setting.
The IV approach aimed to address possible omitted variable bias caused by unobserved factors, such as individual attitudes that may contribute to both self-perceived health and PHI enrolment. However, the results of the IV approach should be interpreted with caution. Our approach targeted the endogeneity of self-reported health rather than of health itself; to a certain extent, this resembled measurement error. A further omitted-variable concern arises from regional variation in how well public health insurance schemes meet healthcare needs, because regional OOP rates of healthcare expenditure may affect the uptake of PHI. Although there is no ideal approach to quantify the potential omitted variable bias due to this complexity, we conducted a post hoc analysis to provide indirect evidence on whether regional variation in public health insurance schemes biased the results. First, we created a proxy for the generosity of public health insurance by calculating the city-level OOP rate of past-year inpatient costs in the subsample who had any hospitalization and only had public health insurance, and we examined whether this variable was significantly associated with having PHI. Indeed, the probability of having PHI was negatively associated with the city-level OOP rate of hospitalization: it would be 9.7 percentage points higher (p = 0.008) if the OOP rate changed from 100% to 0%. This suggests that regional variation in public insurance schemes did have the potential to cause omitted variable bias.
Second, we added the city-level OOP rate variable to the covariate list of equation 3. The GHI estimates remained similar to those in Table 4 (0.28 & 0.018), as did their statistical significance. Moreover, the city-level OOP rate variable was statistically insignificant in this regression. Hence, we did not find evidence of omitted variable bias caused by regional variation in public health insurance schemes.

In addition to internal validity, which is mainly threatened by omitted variable bias, another dimension of validity is construct validity. We created a dichotomous general health indicator to test risk selection; however, the results could be misleading if self-reported health could not meaningfully be dichotomized. We therefore re-estimated equation 3 after regrouping categories 1 and 2 of self-reported health into "excellent or very good" and categories 3 and 4 into "good or fair", with category 5 ("poor") as the reference category. The marginal effect of "excellent or very good" health was 0.023 (p = 0.043), and that of "good or fair" health was 0.0175 (p = 0.033). These results not only confirmed advantageous selection but also suggested that the better the health, the stronger the advantageous selection.

This study has several limitations. First, the data source lacked accurate ex ante health risk information. An ideal dataset would allow researchers to use the PHI enrolment date as the index date and to collect pre-index health and cost information. Second, we may not have sufficiently adjusted for the price-setting factors that insurers use for medical underwriting. For example, insurers may screen potential beneficiaries in ways whose results are not necessarily reported to, or recalled by, respondents. This may result in unaccounted-for omitted variable bias.
Third, the variables we could use as proxies for risk preferences were not as extensive as those in other studies, which restricted our understanding of the sources of advantageous selection. Fourth, some variables had substantial missing data, which reduced sample sizes and might have resulted in underpowered analyses. This could explain why income differed significantly between those with and without PHI in the descriptive statistics but was not significantly associated with PHI enrolment in the regressions.

Our findings have important policy implications. First, the existence of either adverse or advantageous selection may indicate social inefficiency [23]. In the case of advantageous selection, high-risk individuals may be underinsured [18]. Future policies regulating the PHI market should take this into consideration. Based on the evidence from the current study, anti-discrimination policies could allow more high-risk individuals to purchase PHI, thereby counterbalancing advantageous selection. In addition, to the extent that health and wealth are positively correlated, the current tax deduction incentives for purchasing supplemental PHI could be redesigned to disproportionately favour the low-income, high-risk population. Finally, the evidence from the current study could also inform policymakers in countries where supplemental PHI markets are similarly immature and public health insurance covers only a modest portion of healthcare demand.

Conclusions
To sum up, we used data from CHARLS to provide evidence of advantageous selection in the Chinese PHI market. The findings were confirmed by several identification strategies: multivariate regressions of PHI enrolment on GHI, an IV analysis implemented with BVP, and multivariate regressions of PHI enrolment on the adjusted risk of hospitalization. Our results suggest that part of the advantageous selection was not necessarily spontaneous but likely imposed by insurers, which is more likely to happen in a context lacking regulation against condition-based discrimination. However, some advantageous selection remained that could not be explained by medical conditions and risk preference proxy variables. Based on our findings, we recommend that PHI market regulation and tax deduction incentives in China favour high-risk individuals.