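Prediction models | Sensitivity<sup>a</sup> (30% high-cost users prevalence) | F1<sup>d</sup> (30% high-cost users prevalence) | Sensitivity<sup>a</sup> (20% prevalence, the base case) | F1<sup>d</sup> (20% prevalence, the base case) | Sensitivity<sup>a</sup> (10% prevalence) | F1<sup>d</sup> (10% prevalence) | Sensitivity<sup>a</sup> (5% prevalence) | F1<sup>d</sup> (5% prevalence) |
---|---|---|---|---|---|---|---|---|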
Traditional regression models | ||||||||
 All conventional variables (TRM1)<sup>e</sup> | 17.9% | 26.4% | 4.9% | 9.1% | * | * | * | * |
 As per TRM1 but no ethnicity variables (TRM2) | 16.5% | 25.8% | 4.9% | 9.0% | * | * | * | * |
 As per TRM2 but no smoking variables (TRM3) | 16.3% | 25.6% | 4.6% | 8.6% | * | * | * | * |
Machine learning models<sup>f</sup> | ||||||||
 Random forest | 45.2% | 49.3% | 37.8% | 41.2% | 29.9% | 32.6% | 25.6% | 28.5% |
 KNN | 45.7% | 46.5% | 38.0% | 39.0% | 29.2% | 30.1% | 25.2% | 26.0% |
 L1-regularised logistic regression | 75.2% | 50.9% | 78.9% | 34.5% | 72.5% | 21.0% | 76.2% | 25.0% |
 Classification trees | 46.1% | 55.3% | 19.5% | 30.6% | 11.4% | 19.8% | 10.9% | 19.5% |
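
The table reports sensitivity and F1 but not precision (positive predictive value). Assuming the F1 column is the standard harmonic mean of precision and sensitivity (the footnote defining it is not reproduced here), the precision implied by any row can be recovered by solving F1 = 2PR / (P + R) for P. The sketch below is illustrative only; the function names are hypothetical and the inputs are rounded values copied from the 20% base-case columns.

```python
def f1_from(precision: float, sensitivity: float) -> float:
    """Standard F1: harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def implied_precision(sensitivity: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given R (sensitivity) and F1.

    Assumes F1 is the standard harmonic mean; this is an assumption
    about the table, not something the table itself states.
    """
    denom = 2 * sensitivity - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * sensitivity")
    return f1 * sensitivity / denom


# Rounded values from the 20% prevalence (base case) columns.
rows = {
    "Random forest": (0.378, 0.412),
    "L1-regularised logistic regression": (0.789, 0.345),
}

for name, (sens, f1) in rows.items():
    p = implied_precision(sens, f1)
    print(f"{name}: sensitivity={sens:.3f}, F1={f1:.3f}, implied precision={p:.3f}")
```

Under that assumption, the L1-regularised logistic regression row implies a precision of roughly 0.22 despite its high sensitivity, while the random forest row implies roughly 0.45. Because the inputs are rounded to one decimal place, the implied values are approximate.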