Preference-based measures of health-related quality of life in congenital mobility impairment: a systematic review of validity and responsiveness.

Introduction
Mobility impairment is the leading cause of disability in the UK. Individuals with congenital mobility impairments have unique experiences of health, quality of life and adaptation. Preference-based outcome measures are often used to help inform decisions about healthcare funding and prioritisation; however, the applicability and accuracy of these measures in the context of congenital mobility impairment is unclear. Inaccurate outcome measures could potentially affect the care provided to these patient groups. The aim of this systematic review was to examine the performance of preference-based outcome measures for the measurement of utility values in various forms of congenital mobility impairment.
Methods
Ten databases were searched, including Science Direct, CINAHL and PubMed. Screening of reference lists and hand-searching were also undertaken. Descriptive and narrative syntheses were conducted to combine and analyse the various findings. Results were grouped by condition. Outcome measure performance indicators were adapted from COSMIN guidance and were grouped into three broad categories: validity, responsiveness and reliability. Screening, data extraction and quality appraisal were carried out by two independent reviewers.
Results
A total of 31 studies were considered eligible for inclusion in the systematic review. The vast majority of studies related to either cerebral palsy, spina bifida or childhood hydrocephalus. Other relevant conditions included muscular dystrophy, spinal muscular atrophy and congenital clubfoot. The most commonly used preference-based outcome measure was the HUI3. Reporting of performance properties predominantly centred around construct validity, through known-group analyses and assessment of convergent validity between comparable measures and different types of respondents. A small number of studies assessed responsiveness, but assessment of reliability was not reported. Increased clinical severity appears to be associated with decreased utility outcomes in congenital mobility impairment, particularly in terms of gross motor function in cerebral palsy and lesion level in spina bifida. However, preference-based measures exhibit limited correlation with various other condition-specific and clinically relevant outcome measures.
Conclusion
Preference-based measures exhibit important issues and discrepancies relating to validity and responsiveness in the context of congenital mobility impairment; care must therefore be taken when utilising these measures in conditions associated with congenital mobility impairments.


Mobility impairment and assistive mobility technology
Mobility impairment is the leading cause of disability in the UK, accounting for 52% of reported disabilities [1]. Mobility impairments arise from a vast array of different disabilities, conditions, injuries and illnesses. However, they can be classified broadly as either congenital (i.e. from birth) or acquired (i.e. occurring later in life). Whether a disability is present from birth or acquired later on in life significantly influences individual adaptation. For instance, individuals with congenital disabilities exhibit higher degrees of life satisfaction, self-identity and self-efficacy (related to their disability) than individuals who have had to adapt to acquired disability [2]. Adaptation to disability is influenced by self-concept and disability identity, which in turn are related to the onset of disability [2].
Common congenital conditions which can impact mobility include cerebral palsy (CP) and spina bifida (SB). CP refers to a number of conditions caused by damage to the parts of the brain which control movement, balance and posture, and can be caused either by abnormal brain development or trauma. CP is characterised by varying degrees of permanent movement disorder, including poor coordination, muscle stiffness/weakness and involuntary movements. SB affects the development of the spine and spinal cord before birth, and can result in leg weakness and paralysis. There are three types of SB: myelomeningocele, meningocele and SB occulta.
Both congenital and acquired mobility impairments may necessitate the use of assistive technology to alleviate impairments. Assistive technology refers to a wide array of products and services which enhance functioning and participation and promote independence for people who have disabilities. The Medicines and Healthcare Products Regulatory Agency defines assistive technology as any device "intended to compensate for or alleviate an injury, handicap or illness or to replace a physical function" [3]. Assistive technology, such as wheelchairs, is an "essential component for inclusive sustainable development" [4], and can enhance the fundamental freedoms and equality of opportunity for people with mobility impairments and other disabilities. The United Nations (UN) states that access to appropriate and affordable assistive technology is a basic human right [5]; the UN Convention on the Rights of Persons with Disabilities has been ratified by 175 Member States, who are obligated to ensure that affordable assistive technology is available to all individuals in need. However, the World Health Organization (WHO) estimates that only 10% of people who need assistive technology have access to it [6], and there remain persistent challenges in the equitable provision of assistive technology, particularly in developing countries. One of the key issues is in assessing the costs and benefits of different assistive technologies, and developing evidence-based approaches to provision which make best use of limited resources to maximise the outcomes of people with disabilities.
The National Health Service (NHS) in the UK spends almost £200 million per year on wheelchairs alone [7]; there is therefore an imperative to ensure that assistive mobility technologies (AMTs) such as wheelchairs and other mobility-enhancing interventions are provided in an evidence-based manner, utilising evidence of cost-effectiveness to guide service commissioning.

Economic evaluation and quality-adjusted life years
Methods of economic evaluation are now routinely embedded in the evaluation of health technologies, and used to estimate the cost of incremental benefits associated with new and alternative health interventions. Cost-utility analysis, specifically estimation of cost per quality-adjusted life year (QALY), has become the predominant form of economic evaluation for new health technologies in the UK, in part due to the National Institute for Health and Care Excellence's (NICE) advocacy for this approach [8]. The QALY framework has become increasingly influential in health policy as a theoretically universal and generic approach to measuring benefits via a single common outcome.
In order to calculate QALYs, health state utility values are needed. These values are most commonly derived from preference-based measures (PBMs) of health-related quality of life (HRQoL). HRQoL is a subjective and multi-dimensional construct defined as the perceived impact of health status on quality of life, including physical, psychological and social functioning. PBMs of HRQoL are used to assess the social desirability and utility values associated with different states of health.
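To make the role of utility values concrete, the following is a minimal sketch of the conventional QALY calculation, in which time spent in each health state is weighted by the utility of that state and summed. The function name and the example profile are hypothetical and are not taken from any study in this review; discounting is deliberately omitted.

```python
def qalys(states):
    """Sum utility-weighted time across health states.

    `states` is an iterable of (utility, years) pairs, with utility
    anchored at 0 = death and 1 = perfect health.
    No discounting is applied (a simplifying assumption).
    """
    return sum(utility * years for utility, years in states)


# Hypothetical profile: 2 years at utility 0.5, then 3 years at utility 0.8.
profile = [(0.5, 2.0), (0.8, 3.0)]
total = qalys(profile)  # 0.5 * 2 + 0.8 * 3
```

Because the utility value multiplies every year lived in a state, any systematic under- or over-estimation by the PBM carries through directly to the QALY estimate.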
As the descriptive systems and value sets of generic PBMs are usually derived from adult samples of the general population, a common criticism is that their genericity limits relevance and sensitivity in certain conditions [9]. Moreover, in health states where quality of life takes precedence over quantity of life (e.g. chronic illness, life-limiting conditions and disability), QALYs derived from generic PBMs can devalue the effectiveness of an intervention [10].

Use of preference-based measures in the context of mobility impairment
The accuracy of a QALY estimate is subject to the sensitivity and applicability of the measurement tool used to generate the utility data. PBMs have been found to be inconsistent in both congenital and acquired mobility impairments [11][12][13]; furthermore, different PBMs produce significantly different results for AMT users [14,15].
Previous research shows that patients with congenital mobility impairments do not necessarily consider mobility to have a major impact on their HRQoL when suitable adaptations (such as AMT) are available [16,17]. However, general population PBM value sets heavily impact estimation of HRQoL when ability to walk is affected. As an example, using the NICE-approved UK value set for the EuroQoL five-dimension (three-level version) (EQ-5D-3L), the lowest possible mobility level ('confined to bed') has a disutility of −0.664, meaning that an individual who is unable to walk but is otherwise mobile using AMT can achieve a maximum utility value of 0.336 (0 = death; 1 = perfect health), even if they have no other HRQoL impacts. This raises the question as to whether existing PBMs are a valid source of utility values in mobility-impaired populations, particularly AMT users.
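The ceiling described above follows from simple arithmetic, sketched below. Only the disutility of −0.664 and the 0 to 1 anchoring are taken from the text; the constant and function names are illustrative, and the sketch does not reproduce the full EQ-5D-3L scoring algorithm.

```python
FULL_HEALTH = 1.0                    # utility anchor for perfect health
MOBILITY_LEVEL3_DISUTILITY = -0.664  # figure quoted in the text


def max_utility(disutility, full_health=FULL_HEALTH):
    """Highest attainable utility when only this disutility applies."""
    return full_health + disutility


# Rounded to three decimals to match the precision reported in the text.
ceiling = round(max_utility(MOBILITY_LEVEL3_DISUTILITY), 3)
```

Under these assumptions, a respondent reporting the lowest mobility level cannot score above this ceiling, however well they function using AMT.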
The validity of PBMs can be tested by comparing results across groups of patients and by comparing generic PBMs with condition-specific measures. For instance, the EQ-5D-3L and the Health Assessment Questionnaire (HAQ; an outcome measure for rheumatoid arthritis) both measure health status in significantly different ways [18], suggesting that the EQ-5D-3L is lacking consideration of important health impacts associated with rheumatoid arthritis. These issues are partly due to the insensitivity of the EQ-5D-3L 'mobility' dimension to accurately assess the varied impacts of rheumatoid arthritis on mobility [19]. Similarly, the limited level choices on the EQ-5D-3L have been found to cause some individuals with mobility impairments to choose levels which are more or less severe than their actual state, such as using 'I am confined to bed' to substitute being confined 'to an electric wheelchair' [20]. The updated five-level version of the EQ-5D (EQ-5D-5L) is unlikely to address this issue as the five level choices still focus on walking and do not take account of alternative methods of mobility.
Even simple generic measures of health status, such as the single question self-reported health (SRH) scale (i.e. "in general, would you say your health is excellent, very good, good, fair, or poor?"), exhibit only limited correlation with PBMs such as the Health Utilities Index (HUI) 3 and Assessment of Quality of Life (AQoL) in the context of SB [21] and CP [22]. Likewise, for individuals with spinal cord injuries, the wording of the 36-Item Short Form Health Survey (SF-36) (from which the Short-Form Six-Dimension (SF-6D) PBM is calculated) must be modified in order to maintain relevance [23].
Considering the potential issues of using generic PBMs in disability, and congenital mobility impairment specifically, it is apparent that there are a number of important considerations when using PBMs to evaluate AMT interventions and other mobility-enhancing interventions for people with congenital mobility impairments. The objective of this systematic review is therefore to examine the measurement properties of generic utility-based PBMs in various forms of congenital mobility impairment. All evidence reporting (or inferring) the validity, reliability and/or responsiveness of PBMs in conditions associated with congenital mobility impairment was collated and synthesised. Comparable condition-specific reviews of PBM performance have been conducted in mobility impairments such as rheumatoid arthritis [24], CP [25] and multiple sclerosis [26]; however, PBM performance has not been systematically summarised and collated across a range of congenital mobility impairments to date.

Methods
This systematic review followed the University of York Centre for Reviews and Dissemination (CRD) principles for conducting searches and extracting data [27]. Internet reference database searching was the main strategy for gathering evidence. Databases included: Cochrane Collaboration Register and Library, Science Direct, CINAHL, ASSIA, PsycINFO, PubMed and Web of Science. Screening of reference lists, hand-searching and targeted searching via the CRD database (which covers the Database of Abstracts of Reviews of Effects (DARE), the NHS Economic Evaluation Database (NHS EED) and the Health Technology Assessment (HTA) database) were carried out in addition to the primary database searches. Due to limited translation resources, only studies written in or translated into English or Welsh were eligible for inclusion. Search results were managed using the online bibliographic management software RefWorks (for storage of titles from the systematic searches) and Mendeley (for referencing purposes). The systematic review protocol was registered on PROSPERO (CRD42018088932).

Search terms
For the purpose of this review, mobility impairment was defined as any congenital (i.e. present from or shortly after birth) condition, impairment, disability or illness which causes significant restrictions to mobility for 12 months or longer, and which necessitates the use of AMT, surgery or rehabilitation to maintain, facilitate or substitute ambulation, or to reduce complications related to mobility impairment. Acquired mobility impairments (i.e. not present from birth) and short-term injuries, such as sprains or acute muscular injuries, were not included under this definition of mobility impairment.
Search terms included a mixture of MeSH (Medical Subject Heading) and non-MeSH words and phrases, divided into two groups: 'population' and 'outcomes' (see Table 1).
In order to identify studies referring to interventions rather than patient groups (e.g. studies examining 'wheelchair users' more generally), the 'population' search terms also covered relevant AMTs and mobility-enhancing interventions. The 'outcomes' search terms covered relevant PBM keywords, including specific outcome measures (such as the various versions of the EQ-5D and HUI measures). An NHS posture and mobility service manager was consulted to refine the search terms. An example of a search string is shown in Table 2.

Study eligibility
Any study reporting the performance of PBMs of HRQoL in patient groups with congenital mobility impairments was eligible for inclusion. This included studies reporting proxy outcomes. There was no restriction on study type. Studies focussing solely on non-PBMs of HRQoL or acquired mobility impairments were excluded. Studies which included patient groups with varying degrees of disability/disease severity were considered for inclusion if the majority of patients had a congenital mobility impairment or if the data for patients with congenital mobility impairments was reported separately (i.e. sub-group analysis).

Screening
Two researchers undertook each stage of the screening process. For the initial screening process, all identified studies were assessed for relevance based on their title and descriptor terms; the remaining studies were then assessed by their abstract. All studies considered relevant after the initial screening process were then obtained in full. Both reviewers screened each study independently. A third researcher was consulted when there was disagreement about the inclusion of a specific study.

Quality appraisal
All relevant studies which met the initial inclusion criteria were critically appraised for methodological quality by two researchers. Quality appraisal methods were adapted from similar systematic reviews in other clinical areas [28][29][30]. Quality appraisal was not used to exclude studies, but to illustrate the overall quality of research conducted in this topic area. Quality appraisal focussed on six key areas: (1) whether tests of statistical significance were carried out; (2) differences between interventions and/or patient groups (i.e. sub-group analysis); (3) clinical significance and relevance of results; (4) reporting of missing data; (5) response and completion rates; and (6) explicit reporting of inclusion/exclusion criteria.

Data extraction
Data extraction criteria included:
Study characteristics: study type, country, number/composition of study groups, missing data
Demographics: number of participants, age, gender, type/severity of mobility impairment
Measures: generic PBMs used, condition-specific measures used, other clinically relevant measures used
Outcomes: mean utility scores, mean utility scores for relevant sub-groups, statistical significance between groups
Performance: known-group analyses, convergent validity (correlation between outcomes and/or respondent types), responsiveness, reliability, response/completion rates

Analysis of the performance of preference-based measures
PBM performance indicators were adapted from COSMIN measurement property guidance for health-related patient-reported outcomes [31].

Assessment of validity
In this context validity refers to the extent to which a PBM can be considered to measure what it has been designed to measure (i.e. HRQoL), and whether it does so in a systematic manner. By establishing whether a specific PBM is sufficiently valid in a particular patient group, greater confidence can be placed in the generated data. We focussed predominantly on construct validity in this review.
Construct validity was assessed in a number of ways. Firstly, known-group analyses were used to assess whether specific PBMs were able to detect expected differences between different patient groups (i.e. variance due to severity of illness). Secondly, convergent validity was determined by examining correlation between comparable outcomes, for instance between PBMs and condition-specific measures with comparable constructs. Finally, convergent validity was further examined by looking at correlation between respondent types (i.e. self-reported and proxy utility outcomes). In the interest of uniformity, the strength of correlations was defined as absent (r < 0.20), weak (r = 0.20 to 0.35), moderate (r = 0.35 to 0.50) and strong (r ≥ 0.50) [32].
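The correlation bands above can be expressed as a small lookup, sketched here for illustration. Two choices in the sketch are assumptions rather than part of the stated definition: the magnitude of r is used so that negative correlations are classified by strength, and the shared boundary at 0.35 is assigned to the moderate band.

```python
def correlation_strength(r):
    """Classify a correlation coefficient using the bands stated in the text.

    Assumptions: the sign of r is ignored, and r = 0.35 (which appears in
    both the weak and moderate bands) is treated as moderate.
    """
    size = abs(r)
    if size < 0.20:
        return "absent"
    if size < 0.35:
        return "weak"
    if size < 0.50:
        return "moderate"
    return "strong"


# Example: a coefficient of 0.41 falls in the moderate band.
label = correlation_strength(0.41)
```

Applying a single rule of this kind to every extracted coefficient keeps the narrative labels consistent across studies that report correlations in different ways.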

Assessment of reliability
Reliability refers to the replicability of results. Reliability is commonly assessed by examining test-retest results and inter-rater reliability of PBMs in defined unchanging patient groups.

Assessment of responsiveness
Responsiveness refers to the extent to which a measure can identify changes in health status [28]. A responsive measure should be able to detect clinically significant changes in health outcomes over time [29]. Responsiveness was determined by examining the relationship between outcomes derived from PBMs and other relevant measures, before and after an intervention.

Evidence synthesis
Descriptive synthesis of search results was conducted [27] and presented in tabulated form. Narrative synthesis was undertaken to develop a structured narrative of results; extracted results were grouped by type of mobility impairment and category of PBM performance.

Results
See Fig. 1 for search outcomes and the screening process flowchart. Searches were conducted from March to May 2018. In total 1489 study articles were identified: 1332 from the bibliographic searches and 157 articles from other sources (i.e. CRD database, screening of reference lists and hand-searching), of which 410 duplicates were removed. After screening of titles and abstracts, 66 studies were identified as potentially eligible. Following review of full texts, 35 studies were excluded for a variety of reasons (see Fig. 1). In total 31 studies were identified as relevant and eligible for inclusion in the systematic review. Quality appraisal outcomes are presented in Table 3.

Table 2 Example of keyword search string
("Assisted mobility" Or "Assistive mobility" Or "Brain damage*" Or "Brain injur*" Or Buggy Or Caliper Or Cane Or "Cerebral palsy" Or "Club foot" Or Clubfoot Or Crutch* Or Diplegi* Or Dysmelia Or Dystroph* Or "Electric chair" Or "Electric powered indoor outdoor chair" Or "Electric powered indoor/outdoor chair" Or "Electric scooter" Or "Electrically powered indoor outdoor chair" Or "Electrically powered indoor/outdoor chair" Or "Electronically powered indoor outdoor chair" Or "Electronically powered indoor/outdoor chair" Or encephal* Or EPIOC Or "Functional disab*" Or Handicap* Or Hemiplegi* Or Hydrocephalus Or "Knee scooter" Or "Knee walker" Or "Mobility aid" Or "Mobility device" Or "Mobility dis*" Or "Mobility equipment" Or "Mobility impair*" Or "Mobility scooter" Or "Mobility technolog*" Or "Motor dis*" Or "Motorised scooter" Or Neurodisability Or "Neurological dis*" Or "Neuromotor dis*" Or "Neuromuscular dis*" Or Orthoti* Or "Osteogenesis imperfect" Or Paraly* Or Paraplegi* Or "Physical disab*" Or "Physical impair*" Or "Physically disab*" Or "Physically impaired" Or "Power chair" Or "Powered chair" Or Pushchair Or Quadriplegi* Or Rollator Or Scooter Or "Spina bifida" Or "Spinal muscular atrophy" Or Talipes Or Tetraplegi* Or "Walk aid" Or "Walk-aid" Or Walker Or "Walking aid" Or "Walking frame" Or "Walking stick" Or "Walking-aid" Or Wheelchair) AND (15D OR AQoL OR "Assessment of Quality of Life" OR "Child health utilities" OR "Child health utility" OR CHU9D OR "CHU-9D" OR EQ. 5D OR "EQ-5D" OR EuroQoL OR "Health utilities" OR "Health-utilities" OR HUI OR HUI2 OR HUI3 OR "Preference-based" OR "Preference based" OR QALY OR "Quality adjusted life year" OR "Quality of well-being scale" OR "Quality-adjusted life year" OR "QWB-SA" OR "Short Form Six Dimension" OR "Short From 6 Dimension" OR SF6D OR "SF-6D")
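As a quick consistency check on the study counts reported in this Results section, the screening flow can be tallied as below. The input figures are those stated in the text; the variable names are illustrative, and the number of records remaining after de-duplication is derived here rather than reported in the source.

```python
# Counts stated in the Results text.
identified_database = 1332   # bibliographic searches
identified_other = 157       # CRD database, reference lists, hand-searching
duplicates = 410
potentially_eligible = 66    # after title/abstract screening
excluded_full_text = 35

total_identified = identified_database + identified_other
# Derived figure (not stated in the source): records left after de-duplication.
after_deduplication = total_identified - duplicates
included = potentially_eligible - excluded_full_text
```

The totals agree with the 1489 identified articles and 31 included studies reported in the text.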

Narrative synthesis
Utility outcomes are presented in Table 5 and PBM performance outcomes are presented in Table 6. The narrative synthesis is categorised by type of mobility impairment and PBM performance indicator. No studies reported PBM reliability outcomes, therefore reliability results have not been presented.

Cerebral palsy
Known-group analyses
Five studies reported known-group analyses in CP [22,33,51,53,58]. Petrou and Kupek [51] estimated that the adjusted HUI3 disutility of childhood CP from perfect health was −0.72 (95% confidence interval [CI] −0.61 to −0.85), and −0.65 (95% CI −0.54 to −0.78) from childhood norms. Two studies found that as CP severity (i.e. gross motor function) increased, average utility scores (measured using HUI3 or AQoL) decreased in adolescents and young adults with CP [22,53]; Rosenbaum et al. [53] found statistically significant differences in mean utility scores between most Gross Motor Function Classification System (GMFCS) levels (p < 0.01). One study demonstrated that the vision, pain and cognition dimensions of the HUI3 steadily declined as GMFCS level increased, however statistical significance was not reported [33]. Vitale et al. [58] found that adolescents with CP had significantly higher average EQ-5D utility scores (0.92) compared to adolescents with scoliosis (with comorbidities) (0.73; p > 0.05), although selection of these patient sub-groups was not explicitly justified and the version of the EQ-5D was not reported.

Convergent validity: comparing measures
Two studies reported that GMFCS level was correlated with worsening utility scores [22,53]. Young et al. [22] found that GMFCS level in childhood was responsible for between 45% (AQoL: β = −0.148; p < 0.001) and 53% (HUI3: β = −0.205; p < 0.001) of variance in utility scores. Rosenbaum et al. [53] found a strong negative correlation between the HUI3 utility scores of adolescents with CP and their GMFCS level (r = −0.81). In terms of individual PBM dimensions, results from four studies were varied [33,35,39,49]; Kennes et al. [39] found that the dimension most associated with GMFCS level in children was ambulation (tau-b = 0.82; p < 0.01). Bartlett et al. [33] found that the HUI3 vision, pain and cognition dimensions generally worsened as GMFCS level increased in adolescents; however, there was no indication that these dimensions were determinants of motor capacity decline. Two studies [35,49] reported a significant negative association between the HUI3 pain dimension and GMFCS level in children with CP; however, both of these studies only examined the HUI3 pain dimension and not the full HUI3 system.
Three studies reported various levels of correlation between PBMs and other outcome measures in CP [22,46,53]. Young et al. [22] found moderate correlation between utility score and SRH (r = 0.41; p < 0.001 for both the HUI3 and AQoL). Two studies compared the Quality of Life Instrument for People with Developmental Disabilities (QOL Instrument) and the HUI3 [46,53]; Rosenbaum et al. [53] reported that adolescents' HUI3 utility scores explained between 3% (belonging) and 14% (being) of variance in QOL Instrument dimension scores, while Livingston and Rosenbaum [46] reported that the HUI3 and QOL Instrument shared up to 23% variance, thus the relationship between the measures was considered to be moderate at best.
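To relate the shared-variance figure above to the correlation bands used in this review, the following sketch converts shared variance into an implied correlation coefficient. It assumes that the reported shared variance corresponds to the square of a simple bivariate correlation, which the source does not state explicitly; the variable names are illustrative.

```python
import math

# Maximum shared variance between the HUI3 and the QOL Instrument, as
# reported in the text.
shared_variance = 0.23

# Assumption: shared variance equals r squared for a bivariate correlation,
# so the implied magnitude of r is the square root.
implied_r = math.sqrt(shared_variance)
```

Under that assumption the implied coefficient lies inside the moderate band (0.35 to 0.50) defined in the Methods, which is consistent with the relationship being described as moderate at best.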
Two studies reported on the relationship between different PBMs in CP [22,57]. Although utility scores derived from the HUI3 and AQoL were strongly correlated (r = 0.87; p < 0.001) [22], HUI3 derived utility scores tended to be lower than AQoL derived utility scores [22,57].
Perez Sousa et al. [50] reported a high level of disagreement between parents and children on the EQ-5D youth version (EQ-5D-Y); parents reported a lower frequency of problems on all EQ-5D-Y proxy dimensions, particularly fathers.

Responsiveness
Five studies allowed analysis of PBM responsiveness in CP [33,35,46,55,57]. Adolescents with a GMFCS level of V exhibited the largest decreases in HUI3 dimension levels over time, compared to adolescents with GMFCS levels of III and IV [33]. Christensen et al. [35] reported a significant association between physicians' primary pain aetiology and change in HUI3 pain status (p = 0.001). Conversely, in two studies utility outcomes did not change significantly over time; HUI3 utility outcomes were found to be stable over a 1-year period for adolescents with CP (G = 0.91) [46], and likewise utility outcomes (derived from HUI3 and AQoL) did not significantly change over an 8-year follow-up period [57]. Slaman et al. [55] utilised the SF-6D in a randomised controlled trial, but did not find a significant difference between the control and intervention groups at the end of the trial (p = 0.42).

Spina bifida
Known-group analyses
Three studies reported known-group analyses in SB [51,52,56], two of which found that clinical factors had a significant impact on utility scores. Petrou and Kupek [51] estimated that the adjusted HUI3 disutility of childhood SB from perfect health was −0.55 (95% CI −0.40 to −0.70), and −0.48 (95% CI −0.33 to −0.63) from childhood norms. A statistically significant effect of SB diagnosis on HUI3 utility score (p < 0.001) was reported [52]; children diagnosed with myelomeningocele (mean utility score = 0.51) tended to have lower utility scores compared to children with closed dysraphism (mean utility score = 0.77). Tilford et al. [56] reported that lesion location in childhood SB had a significant impact on overall utility (p < 0.01); individuals with sacral lesions had the highest overall mean utility (0.61; ±0.26). Two studies reported correlation between clinical factors and utility scores in SB [21,38]. Anatomical myelomeningocele level was found to have a significant effect on the HUI utility scores of children with myelomeningocele and shunted hydrocephalus (mean HUI score = 0.03; 95% CI 0.01 to 0.05; p = 0.01) [38], with lower myelomeningocele level showing association with higher utility scores. A similar trend was reported by Young et al. [21], who found that the most important single factor contributing to utility outcomes in SB was surgical lesion level, which was responsible for between 18% (AQoL) and 40% (HUI3) of variance in utility scores.
Young et al. [21] compared results from different PBMs in SB, and found that mean utility scores on the AQoL were lower than mean utility scores on the HUI3 for all sub-groups, despite strong correlation between these measures (r = 0.73; p < 0.001).

Convergent validity: comparing respondents
Sims-Williams et al. [54] reported that proxy and self-reported HUI3 utility scores were highly correlated for children with SB (r = 0.85; significance not reported). Young et al. [21] found that mean self-reported utility scores were slightly higher than equivalent proxy scores (HUI3 mean +0.04; AQoL mean +0.03); however, respondent type was not influential in their regression analysis.
PBM responsiveness outcomes in SB were not found in the literature.

Mixed patient groups and mobility impairments
This section includes results from studies which did not focus on specific conditions, and where sub-group data could not be examined separately. Relevant conditions/mobility impairments included muscular dystrophy, spinal muscular atrophy, CP, SB, orthopaedic lower limb deformities, arthrogryposis multiplex congenita, achondroplasia and hemiplegia.

Known-group analyses
Two studies reported known-group analyses in studies of mixed patient groups [11,51]. Petrou and Kupek [51] found that children with muscular dystrophy or spinal muscular atrophy had a mean utility score of 0.39, equating to an adjusted disutility from perfect health of −0.62 (95% CI −0.471 to −0.761), and an adjusted disutility from childhood norms of −0.54 (95% CI −0.400 to −0.690). Burstrom et al. [11] reported that children with a functional disability (64% congenital; see Table 4 for patient characteristics) had significantly lower EQ-5D-Y dimension scores than the general population (p < 0.001).
Large variance was observed between utility scores derived from different PBMs for young wheelchair users; mean utility scores ranged from 0.24 (EQ-5D-Y) to 0.53 (HUI2) for self-reporting children, and from 0.01 (EQ-5D-Y) to 0.49 (HUI2) for parent proxies [15].
PBM responsiveness outcomes were not found in the literature for this patient group.

Childhood hydrocephalus
Known-group analyses
Lindquist et al. [45] found that adults with a history of hydrocephalus (with or without neuro-impairment) had significantly lower 15D dimension scores, compared to a control group, in the dimensions of vision (p = 0.001), eating (p < 0.001), usual activities (p = 0.004) and mental function (p < 0.001).

Convergent validity: comparing measures
Four studies examined the relationship between the Hydrocephalus Outcome Questionnaire (HOQ) and PBMs [40][41][42][43]; however, only three of these studies performed relevant statistical analyses [40,41,43]. Kulkarni [41] reported strong correlation (0.81) and a strong linear relationship between HUI2 utility score and HOQ outcomes for children with hydrocephalus. Simple and complex linear regression models both accounted for a large proportion of HUI2 variability (adjusted R² = 0.66 and 0.80 respectively). Similarly, a strong correlation was found between HUI2 utility score and the HOQ scores for overall health (r = 0.81), physical health (r = 0.88), social-emotional (r = 0.56) and cognitive (r = 0.57) [40]. Furthermore, a significant positive correlation was exhibited between self-reported scores on the child-completed version of the HOQ (cHOQ) and proxy-reported utility scores on the HUI3 (r = 0.60; p < 0.001) [43].
PBM responsiveness outcomes in childhood hydrocephalus were not found in the literature.

Other conditions and mobility impairments
Only known-group analyses were reported for the following conditions associated with mobility impairment:

Muscular dystrophy
Landfeldt et al. [44] reported that ambulatory status and age were significantly associated with HUI utility scores in muscular dystrophy (HUI version not stated; p < 0.001); young ambulators (5-7 years old) had the highest utility scores on average (0.75), whilst older non-ambulators (≥16 years old) had the lowest utility scores on average (0.15).

Spinal muscular atrophy
Lopez-Bastida et al. [47] reported that children with Type II spinal muscular atrophy tended to have lower mean proxy utility scores (−0.01; ±0.35) than the combined average for all forms of spinal muscular atrophy (0.16; ±0.44); however, statistical analysis was not undertaken.

Morquio A syndrome
One study found that, for both adults and children with Morquio A syndrome, wheelchair use was significantly associated with lower utility scores [37]. Significant differences were reported in the adult group between non-wheelchair users and occasional wheelchair users (p = 0.0115) and between occasional wheelchair users and full-time wheelchair users (p = 0.0007). In the child group, significant differences were reported between non-wheelchair users and full-time wheelchair users (p = 0.0018) and between occasional wheelchair users and full-time wheelchair users (p = 0.0007).

Congenital clubfoot
Wallander et al. [59] found that male adult patients with congenital clubfoot (CCF) had significantly better overall utility scores than the male norm group (p = 0.027). The female CCF group had a lower average utility score than the norm group, but not significantly (significance level not reported).

Discussion
The results from this systematic review demonstrate that PBMs have been used in a relatively small number of studies relating to congenital mobility impairments. In conditions such as CP and SB, increased clinical severity appears to be associated with decreased utility. This is particularly evident using the HUI3, which was also the most commonly used PBM found in this review. In particular, there appears to be a relationship between utility outcomes and GMFCS level in CP, and clinical factors such as lesion level in SB. In case-control studies, utility outcomes tended to be significantly lower in the case groups, although it is worth noting that in one study the male case-group had significantly higher utility outcomes compared to the control [59].
To demonstrate sufficient applicability and sensitivity in a specific disease or disability, association between PBMs and validated clinical or condition-specific outcomes is of key importance. In this respect, existing PBMs show weakness in various conditions associated with congenital mobility impairments, exhibiting generally limited correlation with measures such as the QOL Instrument, VAS, SRH, KIDSCREEN-27, KIDSCREEN-10 and LSL. Only the GMFCS and HOQ/cHOQ appeared to be well correlated with PBMs across a number of studies, although it is important to note that GMFCS is a classification system of gross motor function and not an outcome measure.
The results from this systematic review highlight important considerations for the use of PBMs in health states associated with congenital mobility impairment. It is first of note that the use of PBMs has been dominated by studies in CP, followed by SB. This is somewhat unsurprising given that these are two of the most prevalent congenital disabilities which affect mobility. Secondly, it is important to acknowledge that only two studies focussed on adults alone, and that both of these studies were measuring utility outcomes in adults following conditions experienced in childhood [45,59]. This focus on children and adolescents is again unsurprising, as many congenital conditions can be life-limiting, thus examining health outcomes at a young age becomes particularly important. However, this also shows a lack of focus on the health outcomes of adults who have life-long experience of mobility impairment, and the potential ways in which their health outcomes could change over time or be improved.
There is some evidence to demonstrate that different PBMs vary in their estimation of utility outcomes in states of impaired mobility. For instance, Bray et al. [15] found large variation between the EQ-5D-Y and HUI2/3 utility scores for wheelchair users. Likewise, Usuba et al. [57] and Young et al. [21,22] found that utility outcomes were generally higher when derived from the AQoL than the HUI3 for individuals with CP, but vice versa for individuals with SB. This is despite the strong correlation between the AQoL and HUI3 in these populations [21,22]. Unfortunately, statistical differences between these measures were not reported, but it is still important to consider the implications that these differences could have on subsequent QALY outcomes. For instance, if one PBM were to produce significantly higher utility scores than another, this would mean significantly different estimates of cost per QALY and thus of cost-effectiveness. At present there is limited evidence to enable health economists and researchers to choose between these different PBMs for use in congenital mobility impairments.
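The sensitivity of cost-effectiveness estimates to the choice of PBM can be illustrated with a minimal sketch. All inputs below (the incremental cost, the utility gains attributed to two unnamed PBMs, the time horizon and the function names) are hypothetical assumptions chosen purely for illustration; they are not drawn from any study in this review, and discounting is ignored for simplicity.

```python
def qaly_gain(utility_gain: float, years: float) -> float:
    """Undiscounted QALYs gained: change in utility multiplied by time in that state."""
    return utility_gain * years


def icer(incremental_cost: float, incremental_qalys: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if incremental_qalys <= 0:
        raise ValueError("this sketch only handles a positive QALY gain")
    return incremental_cost / incremental_qalys


# Hypothetical inputs (assumptions, not data from the reviewed studies).
INCREMENTAL_COST = 5000.0   # extra cost of the intervention versus comparator
YEARS = 10.0                # time horizon over which the utility gain is sustained
UTILITY_GAIN_BY_PBM = {     # same patients, same intervention, two different PBMs
    "PBM A": 0.05,
    "PBM B": 0.10,
}

# Cost per QALY implied by each PBM's measured utility gain.
icers = {
    name: icer(INCREMENTAL_COST, qaly_gain(gain, YEARS))
    for name, gain in UTILITY_GAIN_BY_PBM.items()
}

for name, value in icers.items():
    print(f"{name}: cost per QALY = {value:,.0f}")
```

Under these assumed inputs, the PBM that registers half the utility gain yields double the cost per QALY, so the same intervention could fall on either side of a funding threshold depending solely on the measure used.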
Despite documented limitations [61], QALYs have become increasingly influential in health policy as a means to determine the cost-effectiveness of new treatments and services, and therefore guide healthcare funding and prioritisation decisions. It is therefore imperative that the PBMs used to develop QALYs are accurate. Otherwise, the cost-effectiveness of certain interventions in certain patient groups could be underestimated. This in turn could impact funding and prioritisation decisions. In the context of congenital mobility impairment, this could impact the provision of AMT and other mobility-enhancing interventions, and subsequently impact patient outcomes. This is particularly important considering the ongoing issues of unmet need in assistive technology provision. The WHO Priority Assistive Products List (APL) was launched in 2016 to help tackle unmet need [62]. The APL contains 50 priority assistive technology products, and was produced through consultation with users, experts and other key stakeholders. Appropriate PBM data, and subsequent estimates of cost-effectiveness, could help to inform the APL and ensure that the most effective and cost-effective AMTs are prioritised.
Considering that the majority of evidence found in this review related to children, the relationship between self-reported and proxy utility outcomes is particularly significant. The results from various studies in this review demonstrate that proxy-reporting of utility is consistently different to that of self-reporters, including significant respondent-type effects and limited agreement between respondents. Interestingly, proxy respondents were found to both underestimate and overestimate utility outcomes compared to self-reporters. It is therefore important to prioritise outcome measurement from the patient, although this can be challenging in populations who may lack capacity, such as young children and individuals with cognitive impairments.

Methodological implications
Due to the underlying trade-off between quantity and quality of life in the calculation of utility values and subsequent QALYs, there is a tendency for lower value to be placed on extending the length of life of people with long-term disabilities [10,63], as their quality of life is routinely considered to be worse than that of an able-bodied person. Thus, when using the QALY framework to assess the outcomes of individuals with disabilities, it is difficult to achieve substantially higher quality of life when compared to individuals without disabilities, raising concerns about bias [10]. To some extent these issues could be a result of using generic PBMs to value disabled health states.
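This mechanism can be written out using a simplified, undiscounted QALY calculation. The notation is introduced here for illustration only and does not appear in the cited sources: $u_d$ and $u_n$ denote the constant utility values assigned to a person with and without a long-term disability respectively, and $T$ is the number of life-years gained from a life-extending intervention.

```latex
\[
  \text{QALYs gained} = u \times T
\]
\[
  \Delta\text{QALY}_{d} = u_d \, T \;<\; u_n \, T = \Delta\text{QALY}_{n}
  \quad \text{whenever } u_d < u_n \text{ and } T > 0
\]
```

For the same extension of life $T$, the intervention therefore generates fewer QALYs for the person whose baseline utility is valued lower, and any underestimation of $u_d$ by a generic PBM widens this gap proportionally.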
One of the underlying issues of using PBMs in disability is that the definition of HRQoL differs profoundly between people with disabilities and the general public [64]. When asked to define HRQoL, young wheelchair users focus on a number of concepts not explicitly measured using generic PBMs, such as ability to adapt, achievement and independence [17]. The experience of disability also affects HRQoL perceptions. Mechanisms of adaptation, coping and adjustment can help individuals with disabilities to experience diminishing effects on their HRQoL over time. These processes are also influenced by the onset of disability, as individuals with congenital disabilities demonstrate better adaptation than individuals with acquired disabilities [2]. The evaluation of states of disability by non-disabled individuals may therefore cause such states to have an exaggerated perceived impact on HRQoL and health status [65], particularly with regard to congenital disability.
When assessing the desirability of hypothetical health states, individuals focus on the transition from their own health state to the hypothetical health state; general public beliefs about the impact of disability therefore do not always reflect the lived experience [66,67]. This focus on personal transition means that processes such as adaptation are not accounted for, causing a discrepancy in how states of disability are judged to impact HRQoL [9].
An alternative approach is to use more sensitive condition-specific measures to model utility values on generic PBMs. Although this approach is advocated by NICE [68], there are serious concerns about the validity of modelled utility values [69], as it cannot be assumed that modelled utility values are representative of directly measured utility values [70]. Using modelled utility data to guide funding and resource allocation decisions is therefore controversial. For instance, Sidovar et al. [71] mapped the 12-Item Multiple Sclerosis Walking Scale (MSWS-12) onto the EQ-5D-3L. While prediction estimates were relatively precise for patients with moderately impaired mobility, they were significantly less accurate for individuals with severe impairments.
Neilson et al. [72] suggest supplementing PBMs with additional questions relating to functional activities which have a large impact on overall quality of life, such as 'sitting' for AMT users. For instance, Persson et al. [73] added complementary mobility and social relationship items to the EQ-5D-3L to increase sensitivity in disabled populations. However, this approach assumes that supplemental questions can be mapped on to the health state preference values of existing measures without impacting accuracy.

Limitations and challenges
Defining the term congenital mobility impairment was one of the key challenges of designing this systematic review. An NHS posture and mobility service manager was consulted to help construct the search terms, and a number of preliminary searches were carried out to test search terms for both sensitivity and scope. A multitude of conditions and disabilities can affect mobility from birth or early infancy, thus we attempted to cover a wide variety of these in our search terms, but accept that there are likely to be conditions and disabilities which were missed. Given the search results, we are confident that we have captured the vast majority of relevant studies relating to at least the most common forms of congenital mobility impairment. One key issue was considering whether to include studies relating to hydrocephalus in this review. Although hydrocephalus is not always associated with mobility impairment, it can cause movement issues and is commonly related to relevant conditions such as CP and SB. We therefore chose to include studies related to childhood hydrocephalus.
We chose specifically to focus on congenital mobility impairment due to the significant differences in life satisfaction, self-identity and self-efficacy experienced by people with congenital disabilities compared to people with acquired disabilities [2]. Further research could compare PBM performance in congenital and acquired mobility impairments. Previous reviews have reported mixed results regarding the validity and responsiveness of existing PBMs in the assessment of health states associated with acquired mobility impairments, such as rheumatoid arthritis [24] and multiple sclerosis [26].

Conclusion
To our knowledge, this is the first systematic review of the use of PBMs in congenital mobility impairment. Evidence suggests that existing generic PBMs exhibit important issues relating to validity and responsiveness, and thus care must be taken when selecting a PBM as an outcome measure in this context. Condition- or disability-specific approaches to utility measurement, such as the mobility and quality of life (MobQoL) outcome measure [74], could improve the sensitivity and applicability of utility measurement in this context.