Assessing direct healthcare costs when restricted to self-reported data: a scoping review

Background In the absence of electronic health records, analysis of direct healthcare costs often relies on resource utilisation data collected from patient-reported surveys. This scoping review explored the availability, use and methodological details of self-reported healthcare service utilisation and cost data to assess healthcare costs in Ireland. Methods Population health surveys were identified from Irish data repositories and details were collated in an inventory to inform the literature search. Irish cost studies published in peer-reviewed and grey sources from 2009 to 2019 were included if they used self-reported data on healthcare utilisation or cost. Two independent researchers extracted studies’ details and the PRISMA-ScR guidelines were used for reporting. Results In total, 27 surveys were identified containing varying details of healthcare utilisation/cost, health status, demographic characteristics and health-related risk and behaviour. Of those surveys, 21 were general population surveys and six were study-specific ad-hoc surveys. Furthermore, 14 cost studies were identified which used retrospective self-reported data on healthcare utilisation or cost from ten of the identified surveys. Nine of these cost studies used ad-hoc surveys and five used data from pre-existing population surveys. Compared to population surveys, ad-hoc surveys contained more detailed information on resource use, albeit with smaller sample sizes. Recall periods ranged from 1 week for frequently used services to 1 year for rarer service use, or longer for once-off costs. A range of perspectives (societal, healthcare and public sector) and costing approaches (bottom-up costing and a mix of top-down and bottom-up) were used. The majority of studies (n = 11) determined unit prices using multiple sources, including national healthcare tariffs, literature and expert views. 
Moreover, most studies (n = 13) reported limitations concerning data availability, risk of bias and generalisability. Various sampling, data collection and analysis strategies were employed to minimise these. Conclusion Population surveys can aid cost assessments in jurisdictions that lack electronic health records, unique patient identifiers and data interoperability. To increase utilisation, researchers wanting to conduct cost analyses need to be aware of and have access to existing data sources. Future population surveys should be designed to address reported limitations and capture comprehensive health-related, demographic and resource use data. Supplementary Information The online version contains supplementary material available at 10.1186/s13561-021-00330-2.


Introduction
In the absence of electronic health records (EHR) linkable through unique patient identifiers, researchers must rely on collecting or using secondary healthcare-user data to assess healthcare utilisation and cost. This is the case in Ireland, which is one of seven Organisation for Economic Co-operation and Development countries that lack a comprehensive system of unique patient and provider identifiers [1]. Despite efforts to roll out Individual Health Identifiers and implement EHRs, data are not regularly captured in national statistics monitoring health system performance, patient safety or public health. Overall, employed systems lack interoperability, change is slow and often data are fragmented and records inaccessible [1-3], meaning that records in Ireland cannot be linked, yielding barriers to fully developing and using top-down costing approaches. Accordingly, it is difficult to obtain comprehensive descriptions of cost and resource use at a local or national level to inform policy and practice planning to prioritise prevention and match patient needs. While these barriers are not unique to Ireland, they rarely occur cumulatively [1]. Consequential resource allocation decisions can lead to suboptimal patient care [4].
Cost analyses of healthcare resources provide information about current allocation and inform debates about rational and efficient future redistribution of resources for prevention and treatment of ill-health, complementing clinical evidence in cost-effectiveness analyses for example where used [5]. Reliable and detailed data could facilitate tailored resource allocations to individual patient and population groups, based on which the staffing and funding of healthcare services can be organised to meet current and future healthcare demands. This requires the identification of health threats, healthcare need and cost drivers of healthcare in the first place, all of which can then be addressed through intervention studies (trials) to identify cost-effective strategies for health policy and clinical practice.
For planning and evaluation, it is common practice to use the principle of attributable costs to establish the costs related to risk factors or long-term cost of a disease. The attributable cost approach requires that the cost of healthcare provided to two groups of individuals can be established by aggregating different sources of data about resource use and cost for each individual [6]. To attribute costs, data are needed in relation to the individual health status (e.g. disease history and outcomes; anthropometric measures; laboratory data; quality of life) and healthcare utilisation (e.g. type, frequency and time of health service use). For a comprehensive identification of cost drivers and needs, additional data requirements include demographic information (sex; age) and information on health-related risk and behaviour (genetic predisposition; socio-economic status; environmental factors; lifestyle). To integrate a societal perspective, data on labour force participation (absenteeism; presenteeism) and opportunities (educational outcomes) are also needed. To maximise accuracy, these various data from an individual should be linked, and available over time. Ideal sources include patient record files and (disease-specific) national registries containing additional details on health-related risk factors, behaviours, and labour force participation [7].
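The attributable cost principle described above can be illustrated with a minimal sketch. It assumes an unadjusted comparison of mean per-person costs between a group with a condition and a group without it; the function name and all cost figures are hypothetical and are not drawn from any study in this review. In practice, analysts would usually adjust for demographic and health-related differences between the groups.

```python
from statistics import mean

def attributable_cost(costs_with, costs_without):
    """Unadjusted attributable cost per person: the difference in mean
    cost between individuals with and without the condition."""
    return mean(costs_with) - mean(costs_without)

# Hypothetical annual per-person healthcare costs in EUR (illustration only).
with_condition = [1200.0, 950.0, 1430.0, 1100.0]
without_condition = [600.0, 720.0, 540.0, 660.0]

excess = attributable_cost(with_condition, without_condition)
print(f"Attributable cost per person: EUR {excess:.2f}")
```

Because the comparison is a simple difference in means, any imbalance in age, sex or comorbidity between the groups would be absorbed into the estimate, which is why linked individual-level data are valuable.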
While method guides exist for some jurisdictions, primarily for conducting health economic evaluations, inconsistencies persist with different costing methods and approaches yielding different results. In response, there are calls for more consistent and efficient costing methods [5]. Generally, health system costs can be assessed using top-down approaches (gross costing), which rely on centralised data repositories, such as hospital patient record files collected prospectively. Alternatively, bottom-up approaches (micro or activity-based costing) use data collected directly from healthcare users, often through surveys, censuses or diaries. The advantages of top-down costing include the requirement of fewer resources and provision of better opportunities for generalisation, but at the expense of precision when informing economic evaluations of interventions. Bottom-up costing is more laborious, but provides greater insights into the relationship between activity and cost characteristics, the economies of scale, and the relative costs of different activities and variation in patient-related cost [8,9].
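As a rough illustration of bottom-up costing, the sketch below multiplies each self-reported quantity of service use by a unit price and sums over services for one respondent. The service names, quantities and unit prices are hypothetical placeholders, not Irish reference costs.

```python
def bottom_up_cost(utilisation, unit_prices):
    """Sum quantity x unit price over all services one respondent reported."""
    return sum(qty * unit_prices[service] for service, qty in utilisation.items())

# Hypothetical unit prices in EUR (assumed for illustration only).
unit_prices = {"gp_visit": 50.0, "outpatient_visit": 130.0, "inpatient_day": 800.0}

# Hypothetical self-reported use by one respondent over the recall period.
respondent = {"gp_visit": 4, "outpatient_visit": 1, "inpatient_day": 2}

total = bottom_up_cost(respondent, unit_prices)
print(f"Direct healthcare cost for this respondent: EUR {total:.2f}")
```

A top-down estimate would instead start from an aggregate budget and apportion it across users, which needs less individual data but cannot show how cost varies between patients.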
Thus, when analysing data on healthcare utilisation in jurisdictions without EHR or linked systems, like Ireland, researchers have limited options. Individual data may be collected from a 'top-down' perspective from individual service providers, incurring a high administrative and resource burden. Alternatively, self-reported user data may be used. Often this includes cross-sectional and longitudinal quantitative data collected from general population health surveys or disease-specific surveys and extrapolated to population level, from a 'bottom-up' approach. In comparison, trial data from intervention studies typically involve small disease- and location-specific populations and are less representative of and generalisable to regional or national populations. While data on healthcare utilisation and cost measures gathered from quantitative surveys potentially lack precision, they frequently represent the best available data to assess healthcare costs in jurisdictions like Ireland. While the existence of this issue is commonly acknowledged, here we explore the extent to which general population health surveys are used to overcome it when conducting cost analyses. In doing so, we firstly identify the self-reported health service utilisation and cost surveys/datasets available, and consider their suitability for costing direct healthcare use. Secondly, we conduct a scoping review to identify and examine Irish costing studies (including cost analysis, cost estimation and cost of illness) that applied these surveys/datasets to calculate direct healthcare costs.

Methods
An inventory of Irish population health surveys was developed and a scoping review [10,11] was conducted to map the methodologies and data sources used in Irish studies of direct healthcare costs.

Search strategy
Initially, the Irish Social Science Data Archive, safefood, Central Statistics Office and the Department of Health websites and repositories were searched to identify population surveys that contain data on health, healthcare utilisation or healthcare-related cost. To obtain additional information for each of the identified surveys, survey-specific websites and original questionnaires were reviewed.

Analysis
For each population health survey, information was extracted on data collection tools, sample characteristics and the type of healthcare services for which data were collected. Additionally, availability of demographic information, socio-economic information, health status and lifestyle variables from surveys were recorded. This information is collated in an inventory.
For the scoping review of published studies, two researchers independently reviewed the abstracts of all retrieved citations. Initial charting included basic information on the underlying dataset and availability of data on healthcare utilisation or costs. Full-text studies and reports were retained if they (1) used survey or otherwise self-reported data on healthcare utilisation or cost, including patients' direct out-of-pocket costs, and (2) used these data to assess direct healthcare costs. Here, direct healthcare costs refer only to resources used in the healthcare sector (excluding resources used in other sectors, such as social care) [12]. The following exclusion criteria were applied: publication outside the study period and region; studies reporting indirect healthcare costs only; studies using simulations, decision-trees or Markov-models only; studies applying primary data from other jurisdictions to the Irish setting; and study protocols. Intervention studies reporting treatment costs (clinical trials) were excluded as these are often not representative of the Irish population and their data are not always self-reported. Patient registry studies were excluded as these data are often not self-reported.
Studies qualifying for inclusion were reviewed in full and information on data sources and costing methodologies was extracted. In particular, details of perspectives and costing methods were identified, and areas of health services for which costing was undertaken and reflections on costing methods and datasets were synthesised. Methodological quality and risk of bias were not assessed as this is not a common component of scoping reviews [10]. Findings were reported using the PRISMA-ScR guidelines [13,14].

Survey inventory
We identified 21 population surveys or survey tools that contain data on health status, healthcare utilisation or healthcare cost, available from Irish data repositories (Table 1). The literature review identified six additional surveys that researchers used on an ad-hoc basis to inform their assessment of direct healthcare costs (included in Table 2).
The earliest of these surveys started collecting Irish health data in the 1970s (Census of Population), and approximately half include repeated longitudinal or cross-sectional data collection. Sample sizes vary, ranging from 100 participants in an ad-hoc survey (e.g. in the Enhancing Care in Alzheimer's disease (ECAD) study) to thousands of participants in the nationally representative longitudinal surveys (e.g. Growing Up in Ireland (GUI) '98 (n = 8570) and '08 cohorts (n = 11,134); the Irish Longitudinal Study on Ageing (TILDA) cohort (n = 6279)); cross-sectional repeated surveys (Irish Health Survey (n = 10,323), EU-SILC-Ireland (n = 11,130), etc.); and national single-use surveys (All Ireland Traveller Health Study (n = 8430) and Irish Study of Sexual Health and Relationships (n = 7668)).
Of those surveys assessing healthcare service use (n = 16), most include data on primary care use (n = 14); more than half provide information on specialist services (n = 11), medicinal products (n = 10) or hospital outpatient service use (n = 9). Fewer collect data on emergency department visits (n = 7), hospital inpatient admissions (n = 6) and number of hospital bed days (n = 7). Almost all surveys provide demographic (n = 24) and socio-economic information (n = 24); many report on the health status of participants (n = 19) and lifestyle variables (n = 14).
Table note: The patient economic impact questionnaire was a study-specific questionnaire developed by the research team, informed by existing instruments, interviews and focus group discussions with survivors, and consultation with health professionals.

Summary of studies
The reviewed full-texts cover the full search period.
Lastly, to estimate costs related to osteoarthritis and rheumatoid arthritis, Doherty and O'Neill (2014) compare self-reported healthcare use (primary care, inpatient and outpatient hospital care and A&E) in adults aged 50 years and older with these arthritic conditions to healthcare use in the same age group without these conditions, collected in the nationally representative TILDA study (n = 8093).
For the various healthcare services assessed, recall periods differed across studies and sometimes within one study, ranging from one week to ten years or time since relapse or diagnosis; however, most studies assessed a fixed period of six [23] or twelve months [18-20, 22, 24-27]. Where variable recall periods were assessed in one study, use of home help was assessed in the past week, medication use in the past month, diagnostic tests, outpatient visits and primary care in the past six months and inpatient hospital care in the past twelve months [15,21]. One study combined six- and twelve-month recall periods due to the structure of the underlying primary data [17].
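When recall periods differ by service, reported quantities have to be brought onto a common time base before costs are summed. The sketch below shows one simple way to do this, assuming that use is constant across the year so that counts can be scaled linearly. This is an illustrative assumption rather than a description of how the reviewed studies handled recall periods, and the service names and counts are hypothetical.

```python
# Multipliers that scale a count reported over a recall period to 12 months.
# Linear scaling assumes constant use over the year (an assumption made here).
RECALL_TO_ANNUAL = {"week": 52, "month": 12, "six_months": 2, "year": 1}

def annualise(count, recall_period):
    """Scale a self-reported count to an annual equivalent."""
    return count * RECALL_TO_ANNUAL[recall_period]

# Hypothetical responses: (service, reported count, recall period).
reported = [
    ("home_help_hours", 3, "week"),
    ("medication_items", 2, "month"),
    ("gp_visits", 2, "six_months"),
    ("inpatient_nights", 1, "year"),
]

annual = {name: annualise(count, period) for name, count, period in reported}
print(annual)
```

Linear scaling magnifies any error in short-recall items, so a single atypical week of home help would be multiplied 52 times; this is one reason why the choice of recall period matters for cost accuracy.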

Data quality and reliability
When addressing study limitations, 13 studies referred to data issues [15-23, 25-28]. Concerns included absence of resource use data [17,18,28], low response rates and potential for sampling bias and incomplete data, particularly in small dedicated ad-hoc surveys [16,21,27], lack of suitable longitudinal/cross-sectional data to inform reliable bottom-up estimates [18,22], reliance on dichotomous utilisation measures [19] and limited range of services [20] captured in population surveys. Issues surrounding the availability of data led researchers to employ a mix of top-down and bottom-up costing methodologies [17]. Where only bottom-up costing was employed, authors expressed concern regarding the generalisability of these results [22]. Additional concerns were highlighted relating to recall bias, which is inevitable when relying on retrospective patient surveys, rather than registry data [15,16,21,27]. This was raised particularly where surveys were completed on behalf of patients [23,25,26]. Risk of bias due to patient self-selection was mentioned in one study that collected information from a disease-specific patient sample [27]. As a side issue, several studies highlighted the lack of reference costs for Ireland [17,22,23] and the fragmented nature of systems preventing linking [22].
Researchers addressed some of these limitations through sensitivity and sub-analyses and by providing estimated cost ranges, in addition to single figures presented in their main analysis [17-19, 21-24, 27]. These cost ranges were typically compared to international and, where available, national literature, and reasons for potential differences between study findings and the literature were frequently explored [15-19, 21-28]. Patients with disabilities or comorbidities were explicitly excluded from some studies, or the analysis was adjusted for the added costs due to these conditions, so as not to overestimate disease-specific costs [17-19, 23]. Generally, researchers sought to use samples that were nationally or regionally representative of the wider population [15,18,19,24], or of the patient group studied [15,16,22]. Where representation was not guaranteed, data were often weighted using national statistics or prevalence data before costs were extrapolated to the population level [15, 17-19, 21, 22, 24, 27, 28]. Additionally, bootstrapping was applied to address uncertainties related to small or unbalanced samples [17,21,23], and logarithmic transformations were applied to account for skewness of data [21,22]. Where ad-hoc surveys were used, their design was often based on tested survey tools and included validated instruments (e.g. to assess disease status) [15, 17, 21, 23, 25-27]; some ad-hoc surveys were pre-piloted [22, 25-27] and mechanisms were used to increase participation (e.g. reminder letters) [15,16,25,26] and completion (e.g. telephone interviews) [27]. Lastly, the interviewers of the TILDA data used in Richardson et al. (2012) asked participants to report and subsequently show interviewers the medication they were taking, to allow interviewers to retrieve the details for all medication provided, for more accurate estimation of polypharmacy costs.
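The weighting and extrapolation step mentioned above can be sketched as follows, assuming each respondent carries a survey weight and that the weighted mean cost is multiplied by a known population size. The function name, weights, costs and population size are hypothetical; the reviewed studies used a variety of weighting sources and did not necessarily follow this exact procedure.

```python
def weighted_population_cost(sample, population_size):
    """Extrapolate a weighted mean per-person cost to a population total.

    sample: list of (cost, weight) pairs, one per respondent.
    population_size: number of people the sample is meant to represent.
    """
    total_weight = sum(weight for _, weight in sample)
    weighted_mean = sum(cost * weight for cost, weight in sample) / total_weight
    return weighted_mean * population_size

# Hypothetical respondents: (annual cost in EUR, survey weight).
sample = [(100.0, 1.0), (300.0, 3.0), (500.0, 1.0)]

total_cost = weighted_population_cost(sample, population_size=1000)
print(f"Extrapolated population cost: EUR {total_cost:,.0f}")
```

Here the middle respondent counts three times as much as the others because the assumed weight indicates that similar individuals are under-represented in the sample.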

Discussion
This scoping review identified 14 studies from 2009 to 2019 using self-reported data on healthcare use or cost to assess direct healthcare costs in Ireland. Our search identified 27 surveys that provide self-reported data on health or healthcare use; only four pre-existing population surveys and six ad-hoc surveys were used to estimate costs according to our search criteria.
Despite the inclusion of self-reported healthcare use in many Irish datasets, this review demonstrates their limited application for the estimation of healthcare costs. One explanation could be related to the concern among some researchers that existing datasets are not fit for purpose, for example due to the limited choice of healthcare services for which data are collected, therefore necessitating the supplementation with published data and dedicated data collection. However, as this review reveals, such ad-hoc data collection efforts yield small samples, which are subject to selection bias. In fact, a number of identified nationally representative datasets with large sample sizes provide data on healthcare service utilisation, along with health status, demographic, socio-economic and lifestyle information. Many of these datasets consist of repeated cross-sectional or longitudinal survey waves, but have not been subject to cost assessments in Ireland to date. This could be linked to low awareness of these datasets among researchers and health economists (representation of the latter is relatively low in Ireland), difficulties in accessing datasets through publicly available sources and limited funding opportunities for analysing existing datasets. Research funding opportunities in Ireland tend to be biased towards collecting new data. Encouragingly, however, surveys designed and administered through research institutions (e.g. GUI, TILDA) are more likely to be utilised for cost analysis, suggesting that involving (health economic) researchers early in survey design could increase the usability of population health surveys for cost analysis.
The studies included in this review assess the costs of multiple sclerosis, overweight and obesity, chronic non-cancer pain, arthritis, dementia, very preterm birth, colorectal cancer and polypharmacy. This shows a tendency towards non-acute, chronic conditions with a high prevalence that predominantly concern older people, or very young people and their future costs. It is unclear whether this implies sufficient resource coverage/provision in acute care in Ireland, or is owed to the availability (and thus awareness) of large survey data particularly for older people (TILDA) and children (GUI). While all authors mention gaps in data availability, there is a particularly strong emphasis on the dearth of unit cost data relating to older age [17] and on the demand for more accurate and reliable population-based information on health service utilisation, school/work productivity and psychosocial wellbeing of children [24].
Our survey inventory revealed mixed levels of data coverage. A number of population surveys contain a multitude of questions on use of various healthcare services, demographics, health-related risk and behaviour (GUI, SPHERE, the Irish Health Survey supplement to the QNHS and Healthy Ireland), making these surveys potentially more suitable for cost assessments relative to other surveys identified. Repeated surveys that are potentially still on-going and that collected data only for some healthcare services, e.g. primary care, should consider future extension to assess resource use more comprehensively, including use of outpatient hospital services (EU-SILC, Healthy Ireland, Household Budget Survey, Irish Health Survey-QNHS, SLAN and TILDA). This is in line with data limitations highlighted in studies that used some of these surveys for cost analysis. In particular, these studies were able to investigate GP service use [18-20, 24], inpatient hospital care [18-20], outpatient hospital services [18-20], emergency department visits [20] and medication use [28]. In contrast, studies using ad-hoc surveys additionally investigated various specialist services including imaging and laboratory investigations [15, 17, 21-23, 25-27], rehabilitation, respite care and long-term care services [15,21,23], mental health services [22,23,27], alternative therapies [22,27] and various equipment, home modification and transport costs (including ambulance care) [15,16,21,22]. Broad population surveys may accordingly benefit from the addition of an open-ended question where participants can detail the frequency and type of "other" healthcare services used during the study period.
Other (repeated) population surveys also would be more instrumental for comprehensive cost analysis if questionnaires were extended in future data collection. Namely, despite assessing healthcare use, the Household Budget Survey in its current form does not contain data on health status and lifestyle that would be needed for accurate cost analysis. Moreover, the Census and European Social Survey that are carried out routinely may be a good basis for collecting data on healthcare use and lifestyle, which is currently not the case. Similar to the estimation of healthcare use, most studies relied on a variety of sources to identify unit prices, suggesting that a central repository beyond the existing national casemix price list is needed. Some advances continue to be made in relation to this; however more healthcare services and consideration of methodologies still need to be incorporated [29].
While providing more detail on healthcare use, ad-hoc collection of data often led to relatively small, regional samples in comparison to studies using large datasets, thus impeding the studies' generalisability. Ad-hoc surveys sampled through national patient registers [15,16] or from multiple healthcare service providers [27,30] seemed to provide larger sample sizes than ad-hoc surveys sampled from individual healthcare service providers [21-23]; however, effective sampling strategies are needed to reach representation and complete survey response [16]. Additionally, bootstrapping can be used to address concerns around distribution, e.g. to handle non-parametric data and small sample sizes [23]. Nevertheless, larger population surveys or centralised health system data appear preferable when representation cannot be otherwise reached.
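To make the role of bootstrapping concrete, the following sketch computes a percentile bootstrap interval for the mean cost in a small, right-skewed sample. It is a generic illustration under stated assumptions (2000 resamples, a 95% percentile interval, a fixed random seed for reproducibility) and uses hypothetical cost data; it does not reproduce the procedure of any study cited here.

```python
import random
from statistics import mean

def bootstrap_mean_ci(costs, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean of `costs`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(costs)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(costs, k=n)) for _ in range(n_resamples))
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical annual costs in EUR for ten respondents (right-skewed).
costs = [0.0, 50.0, 50.0, 100.0, 150.0, 200.0, 250.0, 400.0, 900.0, 3200.0]

point = mean(costs)
low, high = bootstrap_mean_ci(costs)
print(f"Mean cost EUR {point:.0f}, 95% bootstrap interval EUR {low:.0f} to {high:.0f}")
```

The interval makes no normality assumption, which suits cost data dominated by a few high-cost individuals, but it cannot correct for a sample that is unrepresentative to begin with.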
Most studies collected information on resource use over a period of six to twelve months, with shorter periods or longer periods used for frequent use (e.g. medications and home care) or infrequent use (e.g. aids and home modifications), respectively. Studies identified these time periods as most appropriate for cost analysis and to involve a relatively low risk of recall bias based on previous research. However, all studies collected data retrospectively and risk of recall bias therefore cannot be fully ruled out. In response to this, use of diaries for prospective data collection was suggested for future studies [22], indicating the need for a more proactive approach to integrating healthcare use data in budgetary planning.
The challenges associated with lack of standardisation and interoperability of Irish health data are recognised and there are plans for reform [31,32]. While details about the full implementation of an EHR system in Ireland are unknown [1], a data repository (eHealth Ireland Open Data Portal) providing information on health service use in Ireland has been established. However, this is limited to public health services only and currently contains a limited set of healthcare use data [33,34]. Overall, more digital solutions to manage patient data and connect health information within and across service providers and systems are being implemented. Examples include the National Medical Laboratory Information System [35], Digital Ambulance Project [36], Primary Care IT [37] and Epilepsy Lighthouse Project [38]. Most recently, the rollout of the national COVID-19 Vaccine Information System has highlighted the need for using unique patient identifiers in healthcare services, to ensure equitable, complete and safe access to vaccinations and to enable the system's interoperability within Europe through use of standardised vaccination certificates. Out of this need, the system draws on the existing Individual Health Identifiers that were developed on the basis of the Health Identifiers Act 2014, with the explicit aim to monitor and share personal details on health status, healthcare use and demographic information among healthcare providers and national institutions [39]. While this development shows promise to fill data gaps in the future, the implementation and integration of many individual initiatives and the national EHR system remain incomplete; thus the availability of integrated, comprehensive data for research purposes is likely to take time.
Accordingly, in Ireland and other jurisdictions with similar data restrictions, researchers continue to rely on self-reported data to conduct cost analysis. While researchers' concerns around recall bias and incomplete cost data indicate that self-reported data are sub-optimal for cost analysis, a recent study [40] demonstrated there is little difference in precision of cost estimates derived from registry data compared to well-designed health survey questionnaires. Another study highlighted that cost accuracy is primarily driven by the type of healthcare service studied [8]. Nevertheless, a review that compares cost studies using top-down, bottom-up or mixed costing approaches would be interesting in future research.
Meanwhile, acknowledging existing gaps that should be addressed in future data collection, the identified surveys and studies in this review can be used as good-practice guides informing future survey development that enables comprehensive assessments of healthcare costs. In particular, longitudinal and repeated cross-sectional surveys, in their combined form, can provide representative estimates of healthcare use enabling cost comparisons across age groups and time. Future surveys need to be meaningful and fit for purpose; stakeholder approaches should be adopted where possible to enhance the value of the resulting data. Finally, to avoid future underutilisation of expensive population health surveys, generated data should be in line with the Findable, Accessible, Interoperable and Reusable (FAIR) Data Principles [41].

Limitations
Despite the use of a systematic search strategy, broad search criteria and the inclusion of both peer-reviewed and grey literature, it is possible that not all studies assessing healthcare costs using self-reported healthcare service use data were identified. While this review did not aim to retrieve a complete set of existing studies, cross-referencing and consistency checks using alternative search terms were employed to minimise the risk of bias. Similarly, despite using a broad search strategy, there is a risk that health surveys may be missing in the inventory. Therefore, we encourage other researchers to amend and publish an extended inventory if applicable. Furthermore, it should be noted that the analysis of underlying survey data in terms of their methodological quality is outside the scope of this research.

Conclusion
This scoping review identified 27 health surveys and 14 costing studies that used self-reported healthcare use or cost data to assess direct healthcare costs in Ireland. The identified surveys that have the potential for analysing healthcare costs in Ireland are collated in an inventory. Researchers in and outside of Ireland can use this inventory to review data availability in the future. In the absence of integrated EHRs and unique patient identifiers in jurisdictions like Ireland, carefully designed healthcare surveys are useful tools for cost assessments; however, barriers related to awareness, access and usability of these data must be considered.