
Statistical actuarial estimation of the Capitation Payment Unit from copula functions and deep learning: historical comparability analysis for the Colombian health system, 2015–2021


The Capitation Payment Unit (CPU) financing mechanism accounts for more than 70% of health spending in Colombia, with a budget allocation of close to 60 trillion Colombian pesos for 2022 (approximately 15.7 billion US dollars). This article uses modern techniques to estimate actuarially the CPU for the contributory regime of the General System of Social Security in Health in Colombia, and compares it with the value estimated by the Ministry of Health and Social Protection. Using freely available information systems, pure risk premiums are calculated for 2015 to 2021 by means of statistical copula functions and artificial neural networks. The study concludes that the weights by risk category are systematically different, showing historical pure premium surpluses in the group of 0–1 years and deficits (for the normal and cities regions) in the groups over 54 years of age.


The General System of Social Security in Health (Sistema General de Seguridad Social en Salud, SGSSS for its acronym in Spanish) of Colombia has different financing mechanisms for its operation; the most important in terms of monetary magnitudes is the so-called Capitation Payment Unit (CPU). This ‘health insurance premium’, currently calculated by the Ministry of Health and Social Protection (MHSP), has, since 2006, been computed based on three variables (risk adjusters): age, sex and region [25].

The purpose of the CPU is to finance a set of health technologies (drugs, procedures, supplies, medical devices, etc.), known as the Health Benefits Plan (HBP-CPU), which configures a collective protection mechanism for the right to health under a mandatory insurance scheme [9].

For example, for the year 2020, the SGSSS Resources Administrator (ADRES for its acronym in Spanish), the entity in charge of making the recognition and payment of the CPU to health insurers (called Entities Administrators of Health Benefit Plans, EAHBP), made transfers of about 48.5 trillion Colombian pesos (COP) for the contributory (CR)Footnote 1 and subsidized (SR)Footnote 2 regimes, distributed in similar proportions [1].

In a context of budgetary restrictions – common to all countries, of any income level – and in the face of an evident growing demand for more and better health technologies, the financial sustainability of the SGSSS must be ensured, maximizing as far as possible the health outcomes of the population across the entire national territory. The spending pressures derived from extensions of the HBP-CPU are a constant challenge for health systems; therefore, the ongoing study of the sufficiency of the cost of health risk management should be a priority for the care of state finances.

In this scenario, the analysis of the CPU's pricing becomes relevant. In this regard, the specialized literature has investigated alternatives for risk adjustment in SGSSS health spending [4, 21, 43, 44]. However, no studies have been found that develop a particular method for calculating the CPU of the risk groups defined by the legislation. The only antecedents are the official documents of the MHSP and the investigation by Basto et al. [3], which focuses exclusively on SR.

Because of this, the present research aims to estimate actuarially the pure risk premiums for CRFootnote 3 by means of copula functions and deep learning approximations, and to compare the estimated monetary values with those defined by the resolutions for the years 2015 to 2021 [27,28,29,30, 32, 34, 36]. This will make it possible to review and contrast the budget allocations made over time using real-world evidence, and to identify possible improvements in the financial calculation of health risk management in Colombia.

From 2015 to date, the MHSP has estimated the pure risk premiums for 56 groups that categorize the population affiliated with the health system. For this reason, the analysis period starts from that year and the estimates are made using the same groups. The 56 groups are the combinations of the categories of two variables: i) region: normal, remote, cities and special; and ii) age/sex group: less than 1 year, 1–4 years, 5–14 years, 15–18 years (men), 15–18 years (women), 19–44 years (men), 19–44 years (women), 45–49 years, 50–54 years, 55–59 years, 60–64 years, 65–69 years, 70–74 years, and 75 years or more.

This paper is structured as follows. The first section presents the historical context for the CPU and its pricing in the SGSSS. The second section offers a descriptive analysis of the data of interest and the new methodological proposal for estimating the statistical-actuarial pricing models. The third section presents the most relevant results and findings on the actuarial variables of frequency, severity and pure risk premiums. Finally, the fourth section contains the final considerations of the research, its main limitations and some proposals for future research on the subject.

Historical context of the CR-CPU and its pricing

The social bodies responsible for establishing the values of the CPU have been in historical order: the National Council for Social Security in Health, the Health Regulation Commission, and (currently) the Directorate for the Regulation of Health Insurance Benefits, Costs and Rates of the MHSP. Since 2010, unlike previous years, the sufficiency studies use a clear actuarial concept, based on the fundamental insurance equation, assuming that the CPU can be understood as the division between the expected value of health costs and the population exposed to health risk [53].

From the statistical-actuarial approach, pricing methods are used to build premiums that cover the losses of the insured's subscribed risks, that is, that are sufficient, with a high degree of confidence [6, 12]. To estimate the CPU rate, the MHSP has used the expected loss ratio method, which is based on the quotient between the calculated loss ratio and the permissible loss ratio of the EAHBP (which according to Law 1438 of 2011 is of the order of 0.9 for CR). The result indicates the increase in the CPU needed to guarantee the financial sufficiency of the SGSSS [35].
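The quotient described above can be illustrated with a minimal sketch. It assumes the usual loss-ratio convention that the indicated increase is the quotient minus one; the function name and the cost and income figures are hypothetical and are not taken from the MHSP studies.

```python
def indicated_cpu_increase(projected_costs, projected_income, permissible_loss_ratio=0.9):
    """Indicated rate change under the expected loss ratio method (sketch).

    Assumption: increase = (calculated loss ratio / permissible loss ratio) - 1.
    """
    calculated_loss_ratio = projected_costs / projected_income
    return calculated_loss_ratio / permissible_loss_ratio - 1.0


# Hypothetical projection: costs of 94.5 against income of 100 (same monetary units).
increase = indicated_cpu_increase(94.5, 100.0)
print(f"Indicated CPU increase: {increase:.1%}")
```

With a permissible loss ratio of 0.9, a calculated loss ratio equal to 0.9 would indicate no increase at all.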

In this context, the MHSP projects costs, income and those exposed to risk. For the first variable, it applies different trend adjustment factors to emulate future conditions: increases in the price level, frequency of claims, claims that are incurred but not reported (IBNR), HBP-CPU update, among others. For the second variable, it projects the possible items that make up the income of the EAHBP of the CR: income from CPU, copayments, moderating fees, recoveries from the Occupational Risk Administrators, income from registration and affiliation fees, income from the High-Cost Account, income Agreement 026 of 2012, as well as income from health promotion and prevention, among others. For the population exposed to risk, the MHSP makes adjustments for missing compensation and for the expected growth in the following year based on the population projections of the National Administrative Department of Statistics (Departamento Administrativo Nacional de Estadística, DANE for its acronym in Spanish).

The base information for the regulator's analysis refers to the calendar year immediately prior to the year in which the study is carried out; for example, the sufficiency study for the year 2019 estimates the increase in the CPU that will be sufficient during the year 2020 to finance health technologies, using real-world data from the year 2018. These data are extracted, among other databases, from the reports on the provision of health services per affiliate issued by the EAHBP, the affiliate and compensation databases of the CR, the financial statements reported by the EAHBP to the entity for inspection, surveillance and control (National Health Superintendency) and the tariff manuals for health technologies financed by the CPU.

For this study, the conceptual approach used to estimate the pure health risk premium is the product of frequency and severity, where the first factor is the ratio between the number of distinct people served and those exposed to health risk,Footnote 4 while the second factor is the ratio between the total costs of health technologies and the number of distinct people served. Formally:

$$\text{Pure risk premium}=\text{Frequency}\times \text{Severity}=\frac{\text{Distinct people served}}{\text{Exposed to risk}}\times \frac{\text{Total costs}}{\text{Distinct people served}}.$$

This classic actuarial approach, unlike the expected loss ratio, allows the two variables of interest that describe the health risk to be modeled independently and specifically and to project a sufficient CR-CPU. The pure risk premium estimated in this way meets the theoretical properties desired in all premiums: additivity, independence, scale invariance, consistency and acceptability [55].
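As a small worked illustration of the frequency and severity decomposition described above, the following sketch computes the pure risk premium for one risk group. The function name and the input figures are hypothetical.

```python
def pure_risk_premium(people_served, exposed, total_costs):
    """Pure premium as frequency times severity for one risk group."""
    frequency = people_served / exposed      # distinct people served per exposed person
    severity = total_costs / people_served   # average cost per distinct person served
    return frequency * severity


# Hypothetical group: 1,000 exposed, 850 served, total costs of 1.02 billion COP.
premium = pure_risk_premium(people_served=850, exposed=1_000, total_costs=1_020_000_000)
print(round(premium))  # COP per exposed person
```

Because the number of people served cancels, the product equals total costs per exposed person; modeling the two factors separately is what allows each one to be adjusted and forecast on its own.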

Data and empirical strategy


For the statistical-actuarial estimation of the CR-CPU, it was necessary to have information on: i) those exposed to risk (equivalent population), from 2013 to 2020; ii) the number of distinct people served by the SGSSS, from 2013 to 2019; and iii) the severity (average costs) of health care, from 2013 to 2019. For the first variable, the Database of Affiliates (Base de Datos Única de Afiliados, BDUA for its acronym in Spanish) was used, which contains the information of the fully identified affiliates of the SGSSS who are covered by the HBP-CPU; for the second and third, Demand Management (Gestión de la Demanda, GD for its acronym in Spanish) was used through the Integrated Social Protection Information System (SISPRO for its acronym in Spanish), which includes all the expenses charged to the HBP-CPU by the EAHBP that pass the validation checks of the MHSP. GD can be considered a proxy for the Sufficiency database, a confidential database that is a fundamental input for the regulator's calculation of the CR-CPU.

Both BDUA and GD are disaggregated by sex, municipality code and department, among others, which makes this actuarial calculation feasible, in accordance with the guidelines and risk adjusters pre-established in national legislation.

Figure 1 shows the frequency of people served and the number of people exposed in the CR by region and year. The frequency is higher in the cities and normal regions, and lower in the special and remote regions. The average frequency from 2013 to 2019 was 88.4% in the normal region, followed by cities with 86.7%, special with 81.0% and remote with 68.9%. The ranges for each region over the seven years were: remote (58.1%-79%), cities (82.3%-92.4%), special (75.4%-88.9%), and normal (85.5%-92.4%).

Fig. 1 Frequency of people served by the CR by region, 2013–2019

The observed frequency associated with the age groups is shown in Fig A1 (see Appendix). There are no drastic changes in its evolution over time. On average, the age groups with the highest frequency were, in this order: 1 to 4 years, less than one year, 75 years or older, 70 to 74 years, 19 to 44 years (women) and 65 to 69 years; their values lie within the range of 89.5% to 100%. The age group from 15 to 18 years (men) had the lowest frequency of people served. The percentage variation of the frequencies between 2013 and 2019 was -0.7% in 15 to 18 years (women), -1.86% in 19 to 44 years (women) and -2.70% in 19 to 44 years (men).

Figure 2 presents the severity (in 2020 prices, COP) and the number of exposed by region in the CR. From 2013 to 2019, the remote region presents the greatest severity, on average, 1.34 million COP, followed by cities with 1.1 million, normal with 0.9 million and special with 0.7 million. During this period, severity in the remote region grew 7.9% in real terms, in cities 11.5%, in normal region 44.9% and in the special region 43.6%.

Fig. 2 CR severity by region, 2013–2019 (2020 prices)

With regard to severity by age group, Fig A2 (see Appendix) shows that, from 2013 to 2019, it is greater in groups under one year of age and groups over 60 years of age. Severity is stable over time for all age groups, except for those under one year of age and those over 70 years of age, where it decreased until 2015 and then increased until 2019. During this time interval, the severity in children under one year was, on average, 1.8 million COP; in the group from 0 to 4 years, 0.7 million COP; in the groups of men and women from 15 to 18 years and 19 to 44 years it was between 0.4 and 0.8 million COP; in the ages between 45 and 59 years it was between about 1.0 and 1.5 million COP; and in groups over 60 years of age it ranged from 2.4 million to 3.8 million COP.

Figure 3 shows that the number of people exposed to risk in the CR has grown from 2013 to 2019. In 2013 there were 19.5 million exposed and in 2019 22.3 million, which means a growth of 13.9% over the seven years of analysis.

Fig. 3 Number of people exposed to CR risk, 2013–2019

Figure 4 shows the distribution of the number of exposed according to the region between 2013 and 2019. Cities had, on average, 75% of the total exposed, normal 21.2%, special 3.6% and remote only 0.2%. The proportion of those exposed by region was similar throughout the period.

Fig. 4 Distribution of those exposed to CR risk by region, 2013–2019

Finally, Fig A3 (see Appendix) shows the share of each age group in the number of people exposed to CR risk from 2013 to 2019. On average, the share in the total number of exposed of the group from 0 to 4 years is 5.9%, of the group from 5 to 14 years is 13.8%, of the men and women from 15 to 44 years is 24.4%, of the group from 45 to 59 years is 18.1% and of the group older than 60 years is 13.2%. The transition towards aging explains the greater growth in the share of older age groups. The percentage change between 2013 and 2019 in the proportion of those exposed to CR was -12% in the group from 0 to 4 years, -15.2% from 5 to 14 years, 3.8% in men from 15 to 44 years, 0.4% in women from 15 to 44 years, 1.1% from 45 to 59 years and 15.5% in the group over 60 years.

Empirical strategy

The first stage presents the forecasts of those exposed to risk, followed by the computation of the adjustment factors for severity and frequency. The copula functions and the approach taken to price the pure risk premium of the CR are then explained in more detail.

Forecasts for those at risk

A deep learning technique called artificial neural networks (ANN) is used, which has high predictive power in demographic, financial and health topics [2, 23, 24, 41, 45, 52]. This type of nonlinear nonparametric model is considered a self-adaptive, accurate method that requires very few assumptions. By simulating the functioning of a biological neuron, ANNs allow for a flexible approach in terms of functional forms [52]. Thus, the basic architecture of a three-layer feed-forward ANN (one input, one hidden, and one output layer) is made up of a set of inputs, weights, activation functions, and outputs. Formally:

$$\widehat x=\sum_{l=1}^{rs}\varphi_l^2\cdot f\left(\sum_{k=1}^r\varphi_{lk}x_k+\vartheta_{lk}\right)+\vartheta^2,$$

where \(x_k\, (k=1,\dots ,r)\) is considered the input set, \(\widehat{x}\) the output set, \(f\) the activation function, \({\vartheta }_{lk}\), \({\vartheta }^{2}\), \({\varphi }_{lk}\) and \({\varphi }_{l}^{2}\) \((l=1,\dots ,rs)\) the model parameters and weights, and \(rs\) the number of neurons in the hidden layer [5]. The ANN is then trained by iteratively adjusting these parameters so that an error function between the forecast \(\widehat{x}\) and the observation \(x\) is minimized; the forecast is obtained from the weighted sum of the outputs of the hidden-layer neurons.

This data science technique is used to forecast, based on historical series, those exposed to risk for the 56 categories defined by the CPU and the adjustment factors.
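The forward pass of the three-layer network in the formula above can be sketched in a few lines. This is only an illustration of the functional form: it assumes one bias per hidden neuron and a tanh activation, neither of which is specified in the text, and the weights shown are hypothetical rather than trained values.

```python
import math


def ann_forecast(x, hidden_weights, hidden_biases, output_weights, output_bias, f=math.tanh):
    """Forward pass: weighted sum of hidden-neuron outputs plus an output bias.

    hidden_weights[l][k] plays the role of phi_lk, hidden_biases[l] of theta_l,
    output_weights[l] of phi_l^2 and output_bias of theta^2 (superscript 2 = second layer).
    """
    hidden = [
        f(sum(w * xk for w, xk in zip(weights_l, x)) + bias_l)
        for weights_l, bias_l in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network with r = 2 lagged inputs and rs = 2 hidden neurons.
x_hat = ann_forecast(
    x=[0.4, 0.6],
    hidden_weights=[[0.5, 0.5], [1.0, -1.0]],
    hidden_biases=[0.0, 0.1],
    output_weights=[0.8, 0.3],
    output_bias=0.05,
)
```

Training would then adjust the four parameter sets to minimize the forecast error, which is outside the scope of this sketch.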

Adjustment factors for severity and frequency

In the statistical-actuarial pricing process it is necessary to express both the mean cost of care per person and the number of people served in terms of the target year. Like the MHSP, this investigation takes real-world evidence from year \(t\) and transforms frequency and severity to year \(t+2\), so that the economic and financial conditions expected for the CR in the country can be represented.

The method for constructing these adjustment factors developed by Basto et al. [3] is closely followed. Five factors are employed for severity:

  i. Costs incurred but not reported (IBNR): adjusts for the monetary amount of care that the EAHBP had not registered by the end of the year.

  ii. Inclusion of technologies in the HBP-CPU: recognizes the new basket of health technologies that must be financed in \(t+2\), considering the update/extension of the HBP-CPU that is made every year.

  iii. Comparable: current regulations allow the financing of health technologies that are not financed by the CPU but are considered comparable with some that are; in this case the difference between that technology and its comparable is recovered.

  iv. Variation in the number of attentions per user: projects the average number of times that a person receives health technologies.

  v. Inflation: recognizes the rise in the price level.

For the first three adjustments of severity, the same information from the sufficiency studies of the regulatory entity is used. For the fourth, using GD, the number of monthly attentions per user is forecast, then averaged for the months of year \(t+2\). For the fifth factor, forecasts of the Banco de la República (the central bank of Colombia) and the Ministry of Finance and Public Credit (MFPC) are taken.

On the other hand, for frequency, two adjustments are considered:

  i. Effective coverage advance: recognizes the increase in the ratio between users and exposures, a proportion that has been rising in recent years.

  ii. Changes in the burden of disease: adjusts for the appearance of new users who had not previously used the health system, due to the occurrence of new health conditions (e.g. new infectious diseases).

The first frequency factor is forecast monthly using the information from BDUA and GD, and then averaged over the months of year \(t+2\). For the second factor, a similar calculation is made, taking as a proxy the variable of diagnoses per capita (ICD-10) from GD.
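How the adjustment factors might carry year \(t\) values to year \(t+2\) can be sketched as follows. The text does not state how the factors are combined, so the multiplicative form used here is an assumption, and every factor value and base figure is hypothetical.

```python
from math import prod


def apply_adjustment_factors(base_value, factors):
    """Scale a year-t value by the product of its factors (assumed multiplicative)."""
    return base_value * prod(factors.values())


# Hypothetical factor values, for illustration only.
severity_factors = {
    "ibnr": 1.02,
    "inclusions": 1.01,
    "comparables": 1.00,
    "attentions_per_user": 1.03,
    "inflation": 1.04,
}
frequency_factors = {"effective_coverage": 1.01, "burden_of_disease": 1.00}

severity_t2 = apply_adjustment_factors(1_200_000.0, severity_factors)   # COP per person served
frequency_t2 = apply_adjustment_factors(0.85, frequency_factors)        # served per exposed
pure_premium_t2 = frequency_t2 * severity_t2
```

Keeping the factors in named dictionaries makes it easy to report each one separately, as the appendix tables of the study do for each pricing year.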

Pricing with statistical copulas

In actuarial science, copulas gained prominence at the end of the last century and in the first decade of the current one, owing to their benefits, in particular their high flexibility for modeling the joint distribution of a random n-tuple [7, 8, 14, 15]. The technique has been applied in several research fields related to claims payment, pricing, asset valuation and, to a lesser extent, reserve calculation, where its ability to model asymmetric dependence in the tails stands out [11, 19, 20, 46, 47, 51]. More recently, copulas have been applied in collective risk models and deductible pricing; improvements in computational efficiency and ways of providing intuitive interpretations of the dependence structure have also been investigated [13, 39, 48]. In health insurance, applications in the indexed scientific literature have been few [49, 54, 56], so this work can also be considered a pioneer in the field.

In formal terms, and succinctly, a copula is a function that describes the dependence between the marginal probability distributions of two or more random variables and is expressed in terms of a multivariate distribution function. In the bivariate case, let \(\left(X,Y\right)\) be a random vector with marginal distributions \(F\left(x\right)=Pr(X\le x)\) and \(G\left(y\right)=Pr(Y\le y)\), respectively, and joint distribution function \(H\left(x,y\right)=Pr(X\le x,Y\le y)\) for \(\left(x,y\right)\in {\mathbb{R}}^{2}\), where \(F\), \(G\) and \(H\) take values in \(\left[\mathrm{0,1}\right]\). The bivariate copula \(C\) is a function of \(u=F(x)\) and \(v=G(y)\), realizations of uniform random variables, and is constructed in the following way [18, 38]:

$$\begin{array}{c}C:\left[\mathrm{0,1}\right]\times \left[\mathrm{0,1}\right] \to [\mathrm{0,1}]\\ \left(F\left(x\right),G\left(y\right)\right) \mapsto H\left(x,y\right)\end{array} ,$$

and satisfies two properties: i) \(\forall u,v\in \left[\mathrm{0,1}\right]\) then \(C(u,0)=0=C(0,v)\), \(C(u,1)=u\) and \(C(1,v)=v\); ii) \(\forall {u}_{1},{u}_{2},{v}_{1},{v}_{2}\in \left[\mathrm{0,1}\right]\) with \({u}_{1}\le {u}_{2},{v}_{1}\le {v}_{2}\) then \(C\left({u}_{2},{v}_{2}\right)-C\left({u}_{1},{v}_{2}\right)-C\left({u}_{2},{v}_{1}\right)+C\left({u}_{1},{v}_{1}\right)\ge 0\). The first property shows that the boundary conditions of the copula follow from the uniform marginal distributions; the second states that \(C(u,v)\) is non-decreasing in \(u\) and \(v\). Sklar's theorem (1959) [50] shows that the joint distribution \(H\) can be expressed in terms of the marginal distributions \(F\) and \(G\), and a copula \(C\) such that \(\forall x,y\in {\mathbb{R}}\):

$$\begin{array}{c}H\left(x, y\right)=\mathrm{Pr}\left(X\le x, Y\le y\right)=\mathrm{Pr}\left({F}^{-1}\left(U\right)\le {F}^{-1}\left(u\right), {G}^{-1}\left(V\right)\le {G}^{-1}\left(v\right)\right)\\ =\mathrm{Pr}\left(U\le u, V\le v\right)=C\left(u,v\right)=C\left(F\left(x\right),G\left(y\right)\right),\end{array}$$

where \(U=F\left(X\right)\) and \(V=G(Y)\) with \(U,V\sim U(\mathrm{0,1})\), with \(F\) and \(G\) as well as their inverse functions monotonic increasing. Moreover, if the marginal distribution functions are continuous, then there exists a unique copula \(C\left(F\left(x\right),G\left(y\right)\right)\) equal to \(H\left(x,y\right)\). The detailed implications of the different statistical properties can be reviewed in Nelsen [37].
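The two defining properties of a bivariate copula stated above can be checked numerically on a grid. The sketch below does this for a Clayton copula, whose closed form is standard; the function names, the grid size and the parameter value are illustrative assumptions, and a grid check is evidence rather than proof.

```python
def clayton(u, v, theta=2.0):
    """Clayton copula C(u, v) for theta > 0, with C = 0 on the lower boundary."""
    if u == 0.0 or v == 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def satisfies_copula_properties(C, n=20, tol=1e-12):
    """Check property i) (boundary conditions) and ii) (non-negative rectangle volumes)."""
    grid = [i / n for i in range(n + 1)]
    for t in grid:
        if abs(C(t, 0.0)) > tol or abs(C(0.0, t)) > tol:
            return False
        if abs(C(t, 1.0) - t) > tol or abs(C(1.0, t) - t) > tol:
            return False
    for u1, u2 in zip(grid, grid[1:]):
        for v1, v2 in zip(grid, grid[1:]):
            if C(u2, v2) - C(u1, v2) - C(u2, v1) + C(u1, v1) < -tol:
                return False
    return True


print(satisfies_copula_properties(clayton))                # Clayton copula
print(satisfies_copula_properties(lambda u, v: u * v))     # independence copula
print(satisfies_copula_properties(lambda u, v: max(u, v))) # not a copula
```

Only adjacent grid cells are tested in property ii), which is enough on a grid because the volume of any larger grid rectangle is the sum of the volumes of the cells it contains.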

In practice, the most used copula families are Gaussian, Student's t, mixed Gaussian and Archimedean. Among the last, Gumbel, Clayton and Frank stand out.Footnote 5 The Gaussian and Student's t copulas are derived from their own multivariate distributions, for which reason they are called implicit copulas; they exhibit symmetric dependence, with null or low dependence in the tails [38]. The Archimedean copulas, on the other hand, are constructed from a function \({\varphi }_{\theta }:[\mathrm{0,1}]\to [0,\infty ]\) that is continuous, monotone decreasing and convex such that \({\varphi }_{\theta }\left(1\right)=0\), where \({\varphi }_{\theta }\)Footnote 6 is referred to as the generator function. They describe a great variety of dependence structures and, in particular, allow asymmetric relations between random variables to be modeled [22, 37].
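The construction of an Archimedean copula from its generator, \(C(u,v)={\varphi }_{\theta }^{-1}({\varphi }_{\theta }(u)+{\varphi }_{\theta }(v))\), can be sketched for the three families named above. The generators are the standard textbook forms (Clayton with \(\theta>0\), Gumbel with \(\theta\ge 1\), Frank with \(\theta\ne 0\)); the dictionary layout and function names are illustrative assumptions, and inputs are assumed to lie in \((0,1]\).

```python
import math

# Each entry holds (generator phi, inverse generator phi_inv) for one family.
GENERATORS = {
    "clayton": (
        lambda t, th: (t ** -th - 1.0) / th,
        lambda s, th: (1.0 + th * s) ** (-1.0 / th),
    ),
    "gumbel": (
        lambda t, th: (-math.log(t)) ** th,
        lambda s, th: math.exp(-(s ** (1.0 / th))),
    ),
    "frank": (
        lambda t, th: -math.log((math.exp(-th * t) - 1.0) / (math.exp(-th) - 1.0)),
        lambda s, th: -math.log(1.0 + math.exp(-s) * (math.exp(-th) - 1.0)) / th,
    ),
}


def archimedean_copula(u, v, family, theta):
    """C(u, v) = phi_inv(phi(u) + phi(v)) for u, v in (0, 1]."""
    phi, phi_inv = GENERATORS[family]
    return phi_inv(phi(u, theta) + phi(v, theta), theta)


for family in GENERATORS:
    print(family, round(archimedean_copula(0.3, 0.7, family, theta=2.0), 4))
```

Since every generator vanishes at 1, setting \(v=1\) returns \(u\), which is the uniform-margin property of a copula.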

For the computation of the CR-CPU, defining \(X\) as severity (continuous variable) and \(Y\) as frequency (discrete variable), it is proposed to model the pure risk premium by a copula, in this case, mixed. The dependence between both variables, following the method developed by Parra [40], includes different covariables through generalized linear models (GLM) in its marginals, which means

$${X}_{i}\sim F\left({x}_{i}|{\mu }_{i},\sigma \right); \mathit{ln}\left({\mu }_{i}\right)={\alpha }_{0}+{\alpha }_{l}\sum_{l}Region+{\alpha }_{k}\sum_{k}Age/sex\_group,$$
$${Y}_{i}\sim G\left({y}_{i}|{\lambda }_{i}\right);\ \mathit{ln}\left({\lambda }_{i}\right)={\beta }_{0}+{\beta }_{l}\sum_{l}Region+{\beta }_{k}\sum_{k}Age/sex\_group + offset(Exposures).$$
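The two log-link marginals above can be read as follows for a single risk group. This sketch assumes that each region and each age/sex group contributes its own coefficient and that the exposure offset enters on the log scale, as is usual for count GLMs; all coefficient values are hypothetical.

```python
import math


def severity_mean(alpha0, alpha_region, alpha_group, region, group):
    """mu_i from ln(mu_i) = alpha_0 + alpha_region + alpha_group."""
    return math.exp(alpha0 + alpha_region[region] + alpha_group[group])


def frequency_mean(beta0, beta_region, beta_group, region, group, exposures):
    """lambda_i with a log(exposures) offset (assumed form of offset(Exposures))."""
    return math.exp(beta0 + beta_region[region] + beta_group[group] + math.log(exposures))


# Hypothetical coefficients; the reference categories carry a coefficient of zero.
mu = severity_mean(13.5, {"cities": 0.0, "remote": 0.3}, {"<1": 0.6, "5-14": 0.0}, "remote", "<1")
lam = frequency_mean(-0.2, {"cities": 0.0, "remote": -0.25}, {"<1": 0.1, "5-14": 0.0}, "remote", "<1", 1_000)
```

With the offset on the log scale, the expected count of people served is proportional to the number of exposed, so the remaining coefficients describe the frequency rate.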

Then, the two marginals are coupled through the copula, and the joint distribution and density functions of \(X\) and \(Y\) are obtained,

$$H\left({x}_{i},{y}_{i}\right)=C\left(F\left({x}_{i}|{\mu }_{i},\sigma \right),G\left({y}_{i}|{\lambda }_{i}\right)\right),$$
$$h\left({x}_{i},{y}_{i}|{\upmu }_{i},\sigma ,{\uplambda }_{i}\right)=f\left({x}_{i}|{\upmu }_{i},\sigma \right)*\left[D\left(G\left({y}_{i}|{\uplambda }_{i}\right)|F\left({x}_{i}|{\upmu }_{i},\sigma \right)\right)-D\left(G\left({y}_{i}-1|{\uplambda }_{i}\right)|F\left({x}_{i}|{\upmu }_{i},\sigma \right)\right)\right],$$

where \(D\left(v|u\right)\) is the conditional copula of \(v\) given \(u\) defined as \(\frac{\partial C\left(u,v\right)}{\partial u}\).
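The mixed continuous-discrete joint density above can be evaluated directly once a severity margin, a frequency margin and a copula are chosen. The sketch below picks a Lognormal severity, a Poisson frequency and a Clayton copula, one of the combinations considered in the study; the parameter values are hypothetical and the Lognormal is parameterized by its log-scale mean and standard deviation, which is an assumption of this example.

```python
import math


def lognormal_pdf(x, log_mean, log_sd):
    z = (math.log(x) - log_mean) / log_sd
    return math.exp(-0.5 * z * z) / (x * log_sd * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x, log_mean, log_sd):
    return 0.5 * (1.0 + math.erf((math.log(x) - log_mean) / (log_sd * math.sqrt(2.0))))


def poisson_cdf(y, lam):
    if y < 0:
        return 0.0
    term = total = math.exp(-lam)
    for k in range(1, y + 1):
        term *= lam / k
        total += term
    return min(total, 1.0)


def clayton_conditional(v, u, theta):
    """D(v|u) = dC(u, v)/du for the Clayton copula, theta > 0."""
    if v <= 0.0:
        return 0.0
    return u ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)


def joint_density(x, y, log_mean, log_sd, lam, theta):
    """h(x, y) = f(x) * [D(G(y)|F(x)) - D(G(y - 1)|F(x))]."""
    u = lognormal_cdf(x, log_mean, log_sd)
    upper = clayton_conditional(poisson_cdf(y, lam), u, theta)
    lower = clayton_conditional(poisson_cdf(y - 1, lam), u, theta)
    return lognormal_pdf(x, log_mean, log_sd) * (upper - lower)


# Summing over the counts recovers the severity density at x (telescoping sum).
marginal_at_x = sum(joint_density(1.5, y, 0.0, 1.0, 5.0, 2.0) for y in range(61))
```

The log-likelihood would be the sum of the logarithms of this density over the observations, which is the quantity maximized in the estimation step.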

From Eq. (8) the likelihood is obtained and, assuming independence between observations, the parameters of interest of the GLM and the copula are estimated jointly by maximizing it with optimization techniques. Once the final parameters are obtained, Monte Carlo techniques are applied to generate values of the random variable by sampling from the density function. In the present work, 300 samples are simulated (enough to guarantee convergence) and the median is taken as the point estimate, given its robustness. Likewise, intervals are constructed from the 2.5 and 97.5 percentiles.
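The simulation step can be sketched as follows: 300 draws, the median as point estimate and the 2.5 and 97.5 percentiles as interval bounds. The sampler is an assumption for illustration only; it draws dependent uniforms from a Clayton copula by conditional inversion and maps them to a Lognormal severity and a Poisson frequency with hypothetical parameters, which is not necessarily how the authors sampled.

```python
import math
import random
from statistics import NormalDist, median, quantiles

EPS = 1e-12  # keeps uniforms strictly inside (0, 1)


def sample_clayton(rng, theta):
    """Draw (u, v) from a Clayton copula by inverting D(v|u) = w."""
    u = min(max(rng.random(), EPS), 1.0 - EPS)
    w = min(max(rng.random(), EPS), 1.0 - EPS)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
    return u, v


def poisson_quantile(p, lam, y_max=10_000):
    """Smallest y with Poisson CDF >= p (capped at y_max as a safeguard)."""
    y = 0
    term = cdf = math.exp(-lam)
    while cdf < p and y < y_max:
        y += 1
        term *= lam / y
        cdf += term
    return y


def simulate_pure_premiums(n_draws, log_mean, log_sd, lam, exposed, theta, seed=0):
    rng = random.Random(seed)
    standard_normal = NormalDist()
    premiums = []
    for _ in range(n_draws):
        u, v = sample_clayton(rng, theta)
        severity = math.exp(log_mean + log_sd * standard_normal.inv_cdf(u))
        people_served = poisson_quantile(v, lam)
        premiums.append(people_served / exposed * severity)
    return premiums


draws = simulate_pure_premiums(300, log_mean=13.9, log_sd=0.2, lam=70.0, exposed=100, theta=2.0)
cuts = quantiles(draws, n=40)       # 39 cut points in steps of 2.5%
point_estimate = median(draws)
lower, upper = cuts[0], cuts[-1]    # 2.5th and 97.5th percentiles
```

Fixing the seed makes the draws reproducible, which helps when comparing the 72 candidate models on the same footing.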


For each year, 72 statistical-actuarial models are estimated. They come from the combination of three components: i) severity distributions: Normal, Weibull, Lognormal, Gamma, Inverse Gamma and Inverse Gaussian; ii) frequency distributions: Poisson and Negative Binomial; and iii) copula types: two implicit (normal and Student's t) and four Archimedean (Clayton, Gumbel, Frank and Joe).
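The grid of candidate models described above is the Cartesian product of the three components, which can be enumerated in a short sketch (the variable names are illustrative):

```python
from itertools import product

SEVERITY = ["Normal", "Weibull", "Lognormal", "Gamma", "Inverse Gamma", "Inverse Gaussian"]
FREQUENCY = ["Poisson", "Negative Binomial"]
COPULAS = ["Gaussian", "Student's t", "Clayton", "Gumbel", "Frank", "Joe"]

# One candidate model per (severity, frequency, copula) combination.
CANDIDATE_MODELS = list(product(SEVERITY, FREQUENCY, COPULAS))
print(len(CANDIDATE_MODELS))  # 6 x 2 x 6 combinations
```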

In each year the best model is selected according to Borda's rule, which orders and ranks the 72 rival models according to the values of i) mean square error (MSE); ii) mean absolute percentage error (MAPE); iii) the square root of the squared differences between the estimated copula and the empirical copula (RSCE) described by Novales [38]; and iv) the cross-validation copula information criterion (xvCIC) developed by Grønneberg & Hjort [17]. For each of these criteria, the best model receives 1 point, the second receives 2, and so on.

In addition, a regularized goodness-of-fit test for copulas (RGOFC), created by Genest et al. [16] and based on a statistic of the Anderson–Darling type, is applied; its null hypothesis \(({H}_{o})\) is that the copula presents a good fit. A value of one is assigned if the null hypothesis is rejected at a significance level of 5%, and zero otherwise. Finally, the winning model for each year is the one with the fewest total points after summing the points obtained for these five metrics. Table 1 shows the results of the five metrics for the models chosen for each year in which the CR-CPU is estimated.
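The selection rule described above can be sketched as a rank-sum over the criteria plus the 0/1 penalty from the goodness-of-fit test. The function name, the tie handling (ties broken by input order) and all score values are hypothetical; treating xvCIC as a criterion where higher is better is also an assumption of this sketch, exposed through the `higher_is_better` argument.

```python
def borda_select(scores, rejected, higher_is_better=()):
    """Return the model with the fewest Borda points, and the points of every model.

    scores: {model: {criterion: value}}; rejected: {model: True if RGOFC rejects at 5%}.
    """
    models = list(scores)
    criteria = list(next(iter(scores.values())))
    points = {m: int(rejected[m]) for m in models}  # 1 point if the fit is rejected
    for criterion in criteria:
        sign = -1.0 if criterion in higher_is_better else 1.0
        ranked = sorted(models, key=lambda m: sign * scores[m][criterion])
        for position, model in enumerate(ranked, start=1):
            points[model] += position  # best model gets 1 point, second gets 2, ...
    best = min(models, key=lambda m: points[m])
    return best, points


# Hypothetical scores for three candidate models.
scores = {
    "A": {"MSE": 1.0, "MAPE": 0.10, "RSCE": 0.03, "xvCIC": 120.0},
    "B": {"MSE": 1.2, "MAPE": 0.08, "RSCE": 0.02, "xvCIC": 150.0},
    "C": {"MSE": 0.9, "MAPE": 0.12, "RSCE": 0.05, "xvCIC": 100.0},
}
rejected = {"A": False, "B": False, "C": True}
best, points = borda_select(scores, rejected, higher_is_better={"xvCIC"})
print(best, points)
```

In this toy example model C has the lowest MSE but loses on the other three criteria and on the goodness-of-fit test, so the rank-sum favors model B.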

Table 1 Evaluation measures of the selected statistical-actuarial models, 2015–2021

The estimated pure premium values, in COP, can be observed graphically in Fig. 5.Footnote 7 There, clear historical patterns are evident in relation to the pure premium estimated by the MHSPFootnote 8 for each year. First, for every region and in every year, the pure premium for the group of less than 1 year given by the MHSP is higher than that computed in this work.

Fig. 5 Pure premiums estimated via copulas for the years 2015 to 2021, versus what was calculated by the MHSP

Second, in the ‘cities’ and ‘normal’ regions, at ages 15–18 years (women and men), 65–69 years, 70–74 years and more than 74 years, the pure premium estimated by this study is higher than the one computed by the MHSP. Third, for the remote and special regions, the pure premium of the MHSP is higher, although only slightly, than the one estimated by copulas at ages 19–44 years, 45–49 years, 50–54 years, 55–59 years, 60–64 years, 70–74 years and more than 74 years.Footnote 9

Discussion and conclusions

The objective of the present investigation was to estimate actuarially the CR-CPU in the SGSSS of Colombia, in a systematic and rigorous way, for the period from 2015 to 2021, using modern statistical techniques such as copulas and ANNs. This work differs from the CPU sufficiency studies developed by the regulatory entity in four respects: i) to compute the pure risk premium, severity and frequency are modeled, and copulas are then applied to define their joint dependence; ii) to forecast the exposures, deep learning approximations are used, which show benefits over other demographic forecasting methodologies; iii) goodness-of-fit and forecasting-capacity criteria are used to select the best estimations; and iv) the adjustment factors of Basto et al. [3] for severity and frequency are considered.

For the period 2015–2021, in all regions, the estimated pure premium is very close to the pure premium defined by the MHSP in the age groups 5–14, 15–18 (men and women) and 19–44 (men and women). Discrepancies are only observed in the 15–18 group in the remote region in 2017 and 2020 and in the cities region in 2016.

Compared to the authors' estimates, the MHSP underestimated the CPU in age groups 55 years and older in the remote region for the years 2017, 2018 and 2019, in the cities region for the years 2015 to 2021, and in the normal region for the years 2018, 2020 and 2021. By contrast, the premium is overestimated in age groups over 55 years in the special region for 2016 and 2017. The differences in the estimates for this age group for 2020 are mainly in the remote and normal regions.

Surpluses are observed in the estimated pure premium of the MHSP in the group of less than 1 year for the entire period in all regions, mainly remote and special. The difference in the estimates for this age group widens over the years in the remote region.

As a limitation of this study, the approximation developed here covers only the CR, since SR data on health technology spending have historically suffered from poor quality and limited representativeness, and no usable data are available. It is important to remember that this regime, for 2020, had approximately 23.9 million affiliates and that the CPU financing mechanism reached values near 24.4 trillion COP [1]. Hence the importance of paying attention to statistical-actuarial estimations based on real-world evidence.Footnote 10

An adequate estimate of future health spending, as well as the application of efficient risk management mechanisms (from a comprehensive approach) and health technology assessments, will allow better long-term financial sustainability in national public budgets for the health of the population [10, 42]. The methodological development presented here contributes to the international literature in actuarial health sciences, showing innovative analytical developments that may become applicable in other countries with pluralistic health insurance systems. Likewise, this research based on the use of real-world evidence demonstrated the versatility and functionality of statistical copulas (as an inferential modeling technique), which can contribute to informed decision-making in sector financing policy.

Finally, it is important to note that this quantitative study rests on a prospective computation that uses historical data on spending on health technologies financed with the CR-CPU. Nonetheless, the ideal scenarios of complete effective coverage and comprehensive health service provision (that is, a CPU from an opportunity/normative approach) are beyond the scope of the present investigation. This last point will require future research projects that address these problems specifically, with the corresponding scenarios. In addition, the authors consider it advisable to review in future the values of the risk weights under a Bayesian approach, which could add value when adjusting the risk categories, beyond the benefits already explained of using the statistical copulas presented in this work.

Availability of data and materials

Data available on request from the authors.


  1. This regime comprises people who have the ability to pay and who contribute to the SGSSS on a solidarity basis, together with their respective beneficiaries.

  2. This regime covers people who cannot pay for their affiliation to the SGSSS (essentially people in conditions of vulnerability and poverty), whose coverage is subsidized by the State.

  3. Unfortunately, no public financial information is available to calculate the CPU of the SR. Historically, the EAHBPs belonging to this regime have had significant quality problems in their administrative records [31, 33].

  4. A person exposed to risk is understood as an individual who was affiliated with the CR of the SGSSS for a full calendar year.

  5. A broad review of other types of copulas can be found in Nelsen [37] and Latorre [22].

  6. \(\theta\) is the parameter of the Archimedean copula, which is defined in the bivariate case as \(C(u,v)=\varphi_{\theta}^{-1}\left(\varphi_{\theta}(u)+\varphi_{\theta}(v)\right)\), where \(\varphi_{\theta}\) is the generator function of the copula.

  7. The monetary values of the pure risk premium, the proportion of distinct persons for each reference year and the values of the frequency/severity adjustment factors for each pricing year are presented in Appendix A (Tables A1, A2, A3, A4, A5, A6, A7).

  8. The pure premium of the MHSP is obtained by multiplying the CR-CPU values set out in the entity's resolutions by the share of the premium that health insurers do not allocate to profit and administration, which is 0.90 for the regime studied [26].

  9. The analysis developed here refers to the point estimates of the pure premium for the different categories, not to their confidence intervals.

  10. In 2021, for the first time in the country's history, the MHSP used the SR's own information in the actuarial estimation of the 2022 CPU.
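
As a rough illustration of notes 6 and 8, the following Python sketch evaluates a bivariate Archimedean copula from its generator and applies the 0.90 factor to a CPU value. The choice of the Clayton family, the value of theta, the function names and the CPU figure are illustrative assumptions; they are not taken from the study.

```python
def clayton_generator(t, theta):
    """Clayton generator phi_theta(t) = (t^(-theta) - 1) / theta, theta > 0.
    The Clayton family is an assumption made only for this example."""
    return (t ** (-theta) - 1.0) / theta


def clayton_generator_inverse(s, theta):
    """Inverse generator phi_theta^{-1}(s) = (1 + theta * s)^(-1 / theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def archimedean_copula(u, v, theta):
    """Bivariate form from note 6: C(u, v) = phi^{-1}(phi(u) + phi(v))."""
    total = clayton_generator(u, theta) + clayton_generator(v, theta)
    return clayton_generator_inverse(total, theta)


def mhsp_pure_premium(cr_cpu, share_for_health=0.90):
    """Note 8: CR-CPU value times the share not allocated to profit
    and administration (0.90 for the contributory regime)."""
    return cr_cpu * share_for_health


theta = 2.0  # hypothetical dependence parameter
c = archimedean_copula(0.3, 0.7, theta)

# Known closed form of the Clayton copula, used here as a cross-check.
closed_form = (0.3 ** (-theta) + 0.7 ** (-theta) - 1.0) ** (-1.0 / theta)

premium = mhsp_pure_premium(1_000_000.0)  # hypothetical CPU value in COP

print(f"C(0.3, 0.7) = {c:.6f}, closed form = {closed_form:.6f}")
print(f"Pure premium = {premium:,.0f} COP")
```

Building the copula from the generator and its inverse, rather than from the closed form, mirrors the definition in note 6 and makes it easy to swap in another Archimedean family by replacing the two generator functions.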



Capitation Payment Unit
General System of Social Security in Health (Sistema General de Seguridad Social en Salud, acronym in Spanish)
Ministry of Health and Social Protection
Health Benefits Plan
SGSSS Resources Administrator (acronym in Spanish)
Entities Administrators of Health Benefit Plans
Colombian pesos
Contributory regime
Subsidized regime
Incurred but not reported
National Administrative Department of Statistics (Departamento Administrativo Nacional de Estadística, acronym in Spanish)
Database of Affiliates (Base de Datos Única de Afiliados, acronym in Spanish)
Demand Management (Gestión de la Demanda, acronym in Spanish)
Integrated Social Protection Information System (acronym in Spanish)
Artificial neural networks
Ministry of Finance and Public Credit
Generalized linear models
Mean square error
Mean absolute percentage error
Square root of the square differences between the estimated copula and empirical copula
Cross-validation copula information criterion
Regularized goodness of fit test for copulas


  1. ADRES. (2020). Ejecución presupuestal URA a corte de 31 de diciembre 2020. Retrieved 18 February 2022, fromón-financiera/URA/Ejecución-presupuestal-URA

  2. Ayyildiz, E., Erdogan, M., & Taskin, A. (2021). Forecasting COVID-19 recovered cases with Artificial Neural Networks to enable designing an effective blood supply chain. Computers in Biology and Medicine, 139, 105029.

  3. Basto S, Bejarano V, Do Nascimento P, Espinosa O, Estrada K, Higuera S, … Barragán L. Producto 3. Estimación actuarial de la UPC del régimen subsidiado. Bogotá: Instituto de Evaluación Tecnológica en Salud y Ministerio de Hacienda y Crédito Público; 2021.

  4. Bolívar, M. (2018). Ajuste de riesgo en la prima de capitación del sistema de aseguramiento en salud de Colombia para el régimen contributivo (Trabajo Final de Posgrado - Universidad de Buenos Aires).

  5. Cao Q, Leggio K, Schniederjans M. A comparison between Fama and French’s model and artificial neural networks in predicting the Chinese stock market. Comput Oper Res. 2005;32(10):2499–512.

  6. Charpentier A, editor. Computational actuarial science with R. Boca Raton: Chapman & Hall/CRC; 2015.

  7. Cherubini U, Luciano E. Bivariate option pricing with copulas. Applied Mathematical Finance. 2002;9(2):69–85.

  8. Chiou, S., & Tsay, R. (2008). A copula-based approach to option pricing and risk assessment. Journal of Data Science, 6, 273–301.

  9. Corte Constitucional de Colombia. (2018). Sentencia SU124/18. Retrieved 12 February 2022.

  10. Drummond M, Augustovski F, Bhattacharyya D, Campbell J, Chaiyakanapruk N, Chen Y, ... Yeung K. Challenges of health technology assessment in pluralistic healthcare systems: an ISPOR Council Report. Value in Health. 2022;25(8):1257–67.

  11. Duarte G, Ozaki V. Pricing crop revenue insurance using parametric copulas. Rev Bras Econ. 2019;73(3):325–43.

  12. Duncan I. Healthcare risk adjustment and predictive modelling. 2nd ed. New Hartford: Actex Publications; 2018.

  13. Erdemir Ö, Sucu M. A modified pseudo-copula regression model for risk groups with various dependency levels. J Stat Comput Simul. 2022;92(5):1092–112.

  14. Escarela D, Carriére J. A bivariate model of claim frequencies and severities. J Appl Stat. 2006;33(8):867–83.

  15. Frees E, Valdez E. Understanding relationships using copulas. North Am Actuarial J. 1998;2(1):1–25.

  16. Genest, C., Huang, W., & Dufour, J.-M. (2013). A regularized goodness-of-fit test for copulas. Journal de La Société Française de Statistique, 154(1), 64–77.

  17. Grønneberg S, Hjort N. The copula information criteria. Scand J Stat. 2014;41(2):436–59.

  18. Hofert M, Kojadinovic I, Mächler M, Yan J. Elements of copula modeling with R. Cham: Springer; 2018.

  19. Kholifah A, Lestari D, Devila S. Premium calculation using marginal generalized linear model combined with copula. AIP Conf Proc. 2019;2168:020035.

  20. Kularatne T, Li J, Pitt D. On the use of Archimedean copulas for insurance modelling. Ann Actuarial Sci. 2021;15(1):57–81.

  21. Lancheros J. Ajuste por riesgo para el cálculo de la UPC en Colombia: ajuste desde las variables de estado de salud para las aseguradoras colombianas. 2019.

  22. Latorre L. Teoría de cópulas. Introducción y aplicaciones a Solvencia II. Madrid: Fundación MAPFRE; 2017.

  23. Lim B, Zohren S. Time-series forecasting with deep learning: a survey. Philos Trans A Math Phys Eng Sci. 2021;379(2194):20200209.

  24. Mahmoud, A., & Mohammed, A. (2021). A survey on deep learning for time-series forecasting. In A. Hassanien & A. Darwish (Eds.), Machine learning and big data analytics paradigms: analysis, applications and challenges (pp. 365–392).

  25. Ministerio de Salud y Protección Social. Estudio de la suficiencia y de los mecanismos de ajuste de riesgo para el cálculo de la Unidad de Pago por Capitación para garantizar el Plan Obligatorio de Salud en el año 2006. Bogotá, D.C.; 2006.

  26. Ministerio de Salud y Protección Social. Ley 1438 de 2011. Por medio de la cual se reforma el Sistema General de Seguridad Social en Salud y se dictan otras disposiciones. 2011.

  27. Ministerio de Salud y Protección Social. Resolución 5593 de 2015. Por la cual se fija el valor de la Unidad de Pago por Capitación (UPC) para la cobertura del Plan Obligatorio de Salud de los Regímenes Contributivo y Subsidiado para la vigencia 2016 y se dictan otras disposiciones. Bogotá D.C.; 2015a.

  28. Ministerio de Salud y Protección Social. Resolución 5925. Bogotá D.C.; 2015b.

  29. Ministerio de Salud y Protección Social. Resolución 6411 de 2016. Por la cual se fija el valor de la Unidad de Pago por Capitación -UPC para la cobertura del Plan de Beneficios en Salud de los Regímenes Contributivo y Subsidiado en la vigencia 2017 y se dictan otras disposiciones. Bogotá D.C.; 2016.

  30. Ministerio de Salud y Protección Social. Resolución 5268 de 2017. Por la cual se fija el valor de la Unidad de Pago por Capitación - UPC para el Plan de Beneficios en Salud de los Regímenes Contributivo y Subsidiado para la vigencia 2018 y se dictan otras disposiciones. Bogotá D.C.; 2017.

  31. Ministerio de Salud y Protección Social. Estudio de suficiencia y de los mecanismos de ajuste del riesgo para el cálculo de la Unidad de Pago por Capitación para garantizar el Plan de Beneficios en Salud para el año 2019. In Ministerio de Salud y Protección Social. Bogotá D.C.; 2019a.

  32. Ministerio de Salud y Protección Social. Resolución 3513 de 2019. Por la cual se fijan los recursos de la Unidad de Pago por Capitación - UPC para financiar los servicios y tecnologías de salud, de los Regímenes Contributivo y Subsidiado para la vigencia 2020 y se dictan otras disposiciones. Bogotá D.C.; 2019b.

  33. Ministerio de Salud y Protección Social. Estudio de suficiencia y de los mecanismos de ajuste del riesgo para el cálculo de la Unidad de Pago por Capitación, recursos para garantizar la financiación de tecnologías y servicios de salud en los regímenes Contributivo y Subsidiado. Año 2020. Bogotá D.C.; 2020a.

  34. Ministerio de Salud y Protección Social. Resolución 2503. Bogotá D.C.; 2020b.

  35. Ministerio de Salud y Protección Social. Estudio de suficiencia y de los mecanismos de ajuste del riesgo para el cálculo de la Unidad de Pago por Capitación, recursos para garantizar la financiación de tecnologías y servicios de salud en los regímenes Contributivo y Subsidiado. Año 2021. In Ministerio de Salud y Protección Social. Bogotá D.C.; 2021a.

  36. Ministerio de Salud y Protección Social. Resolución 2381. Bogotá D.C.; 2021b.

  37. Nelsen R. An introduction to copulas. 2nd ed. Portland: Springer; 2006.

  38. Novales, A. (2017). Cópulas.

  39. Oh R, Ahn J, Lee W. On copula-based collective risk models: from elliptical copulas to vine copulas. Scand Actuar J. 2021;2021(1):1–33.

  40. Parra, L. (2015). Modelamiento conjunto del número de siniestros y pagos por reclamación en seguros mediante una cópula mixta desde la perspectiva frecuentista y bayesiana.

  41. Petropoulos F, Apiletti D, Assimakopoulos V, Babai M, Barrow D, Ben Taieb S, Ziel F. Forecasting: theory and practice. Int J Forecast. 2022.

  42. Rao, K., Vecino, A., Roberton, T., López, A., & Noonan, C. (2022). Future health spending in Latin America and the Caribbean: health expenditure projections & scenario analysis. Washington, D.C.

  43. Riascos A, Alfonso E, Romero M. The performance of risk adjustment models in Colombian competitive health insurance market. SSRN Electron J. 2014.

  44. Riascos, A., Romero, M., & Serna, N. (2018). Risk adjustment revisited using machine learning techniques. Proceeding Series of the Brazilian Society of Computational and Applied Mathematics, 6(2).

  45. Richman R, Wüthrich M. A neural network extension of the Lee-Carter model to multiple populations. Ann Actuarial Sci. 2021;15(2):346–66.

  46. Shemyakin, A., Zhang, H., Benson, S., Burroughs, R., & Mohr, J. (2019). Copula models of economic capital for life insurance companies.

  47. Shi P, Feng X, Boucher J-P. Multilevel modeling of insurance claims using copulas. Ann Appl Stat. 2016;10(2):834–63.

  48. Shi, P., & Lee, G. (2022). Copula regression for compound distributions with endogenous covariates with applications in insurance deductible pricing. Journal of the American Statistical Association, 1–38.

  49. Shi P, Zhang W. Managed care and health care utilization: specification of bivariate models using copulas. North Am Actuarial J. 2013;17(4):306–24.

  50. Sklar A. Fonctions de répartition à n dimensions et leurs marges. Pub Inst Statist Univ Paris. 1959;8:229–31.

  51. Tamraz M. Mixture copulas and insurance applications. Ann Actuarial Sci. 2018;12(2):391–411.

  52. Torres J, Hadjout D, Sebaa A, Martínez-Álvarez F, Troncoso A. Deep learning for time series forecasting: a survey. Big Data. 2021;9(1):3–21.

  53. Werner, G., & Modlin, C. (2016). Basic ratemaking (Vol. 4; Casualty Actuarial Society, Ed.).

  54. Xie Y, Yang J, Jiang C, Cai Z, Adagblenya J. Incidence, dependence structure of disease, and rate making for health insurance. Math Probl Eng. 2018;2018:1–13.

  55. Young, V. (2014). Premium principles. In Wiley StatsRef: Statistics Reference Online.

  56. Zhao, X., & Zhou, X. (2012). Estimation of medical costs by copula models with dynamic change of health status. Insurance: Mathematics and Economics, 51(2), 480–491.


We would like to express our great appreciation to Oscar Melo, Freddy Hernández, Alejandra Sánchez, Liliana Blanco, Piedad Urdinola, Oscar López, Jaime Ramírez, Pedro do Nascimento, Raúl Macchiavelli, Angélica Ordónez, Giancarlo Romano, Esteban Orozco, Sergio Basto, Jhonathan Rodríguez, Diego Ávila, Kelly Estrada, and Juan-Camilo Vargas, for their valuable and constructive suggestions regarding this research. We are also grateful to the participants at the ‘XXXI International Statistical Symposium’ for their comments and suggestions. Oscar Espinosa acknowledges Professor Roger Nelsen for his teachings on the statistical theory of copulas.


This research did not receive specific aid from public sector agencies, the commercial sector or non-profit entities.

Author information


Concept and design: Espinosa; Acquisition of data: Ramos, Martínez; Analysis and interpretation of data: Espinosa, Martínez, Ramos, Bejarano; Drafting of the manuscript: Espinosa, Martínez; Statistical analysis: Bejarano, Espinosa; Supervision: Espinosa. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Valeria Bejarano.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

None declared by the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Figs. A1, A2, and A3.

Fig. A1 Frequency of people served by the CR by age/sex group, 2013–2019

Fig. A2 CR severity by age/sex group, 2013–2019 (2020 prices)

Fig. A3 Distribution of those exposed to CR risk by age/sex group, 2013–2019

Tables A1, A2, A3, A4, A5, A6, and A7

Table A1 Proportion of distinct persons for each reference year
Table A2 Values of the frequency adjustment factors for each pricing year
Table A3 Values of the severity adjustment factors for each pricing year
Table A4 Estimated pure premium – CR-CPU for the remote region (COP)
Table A5 Estimated pure premium – CR-CPU for the cities region (COP)
Table A6 Estimated pure premium – CR-CPU for the special region (COP)
Table A7 Estimated pure premium – CR-CPU for the normal region (COP)


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Espinosa, O., Bejarano, V., Ramos, J. et al. Statistical actuarial estimation of the Capitation Payment Unit from copula functions and deep learning: historical comparability analysis for the Colombian health system, 2015–2021. Health Econ Rev 13, 15 (2023).
