Hospital length of stay prediction tools for all hospital admissions and general medicine populations: systematic review and meta-analysis

Gokhale, Swapna; Taylor, David; Gill, Jaskirath; Hu, Yanan; Zeps, Nikolajs; Lequertier, Vincent; Prado, Luis; Teede, Helena; Enticott, Joanne

doi:10.3389/fmed.2023.1192969

SYSTEMATIC REVIEW article

Front. Med., 16 August 2023
Sec. Regulatory Science
Volume 10 - 2023 | https://doi.org/10.3389/fmed.2023.1192969

Swapna Gokhale¹,²*

David Taylor³

Jaskirath Gill¹,⁴

Yanan Hu¹

Nikolajs Zeps⁵,⁶

Vincent Lequertier⁷,⁸

Luis Prado⁹

Helena Teede¹,⁵

Joanne Enticott¹,⁵*

¹Monash Centre for Health Research and Implementation, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, VIC, Australia
²Eastern Health, Box Hill, VIC, Australia
³Office of Research and Ethics, Eastern Health, Box Hill, VIC, Australia
⁴Alfred Health, Melbourne, VIC, Australia
⁵Monash Partners Academic Health Sciences Centre, Clayton, VIC, Australia
⁶Eastern Health Clinical School, Monash University Faculty of Medicine, Nursing and Health Sciences, Clayton, VIC, Australia
⁷Univ. Lyon, INSA Lyon, Univ Lyon 2, Université Claude Bernard Lyon 1, Lyon, France
⁸Research on Healthcare Performance (RESHAPE), INSERM U1290, Université Claude Bernard Lyon 1, Lyon, France
⁹Epworth Healthcare, Academic and Medical Services, Melbourne, VIC, Australia

Background: Unwarranted extended length of stay (LOS) increases the risk of hospital-acquired complications, morbidity, and all-cause mortality and needs to be recognized and addressed proactively.

Objective: This systematic review aimed to identify validated prediction variables and methods used in tools that predict the risk of prolonged LOS in all hospital admissions and specifically General Medicine (GenMed) admissions.

Method: LOS prediction tools published since 2010 were identified in five major research databases. The main outcomes were model performance metrics, prediction variables, and level of validation. Meta-analysis was completed for validated models. The risk of bias was assessed using the PROBAST checklist.

Results: Overall, 25 all admission studies and 14 GenMed studies were identified. Statistical and machine learning methods were used almost equally in both groups. Calibration metrics were reported infrequently, with only 2 of 39 studies performing external validation. Meta-analysis of all admissions validation studies revealed a 95% prediction interval for theta of 0.596 to 0.798 for the area under the curve. Important predictor categories were co-morbidity diagnoses and illness severity risk scores, demographics, and admission characteristics. Overall study quality was deemed low due to poor data processing and analysis reporting.

Conclusion: To the best of our knowledge, this is the first systematic review assessing the quality of risk prediction models for hospital LOS in GenMed and all admissions groups. Notably, both machine learning and statistical modeling demonstrated good predictive performance, but models were infrequently externally validated and had poor overall study quality. Moving forward, a focus on quality methods by the adoption of existing guidelines and external validation is needed before clinical application.

Systematic review registration: https://www.crd.york.ac.uk/PROSPERO/, identifier: CRD42021272198.

Background and significance

Hospital inpatient and outpatient services make up the bulk of health spending in all Organization for Economic Co-operation and Development (OECD) countries (1). Australian health expenditure has increased by an average of 2.7% per year in the last 18–20 years, and the cost of hospital care accounted for 40% of the total, of which 61.7% was spent on acute admitted care (2, 3). In 2020–2021, the cost of acute admitted care was AUD33.8 billion, with the average cost per admitted acute care separation being $5,315 (4). Length of stay (LOS) in an acute hospital is a significant influencer of the cost of delivering hospital-based care and is a key measure of hospital performance according to the Australian Health Performance Framework (5). Extended LOS increases the risk of hospital-acquired complications (HACs) and impacts patient access and flow (6). A recent report showed up to a 3- to 4-fold variation in the average LOS in Australian hospitals (3), often due to a complex interaction of multiple factors, including some unrelated to the patient's condition. HACs such as delirium can prolong hospital LOS by 6–7 days and increase mortality (7, 8). Reducing unwanted variation in LOS is essential in Australia and globally to ensure the sustainability of economically viable health services for the future.

To utilize healthcare resources efficiently, studies have been undertaken globally utilizing existing data and applying statistical techniques such as machine learning (ML), to develop and validate predictive models identifying patients at risk of extended LOS (9–13). Prior studies have investigated LOS prediction in disease-specific groups such as heart failure (14), cardiac surgery (15), thermal burns (16), or population-specific groups such as intensive care unit (ICU) and neonatal care (17, 18). Other recent reviews have looked at this outcome from a risk adjustment perspective (19) or a broad epidemiological perspective (20).

Prediction of the risk of extended LOS in heterogeneous populations such as all hospital admissions and General Medicine is common but lacks impact (20, 21). Accurate and timely risk prediction can enable targeted interventions to streamline care, reduce unwarranted extended LOS, and potentially impact system-level management of patient flow issues by providing high-level visibility of impending access issues and enabling proactive decision-making (2, 22). A review of the literature published in 2019 examined the methodologies applied to create LOS predictions. The authors found that approximately half of the included studies (36 of 74) did not restrict the studied population by diagnosis groups, and only a third had calculated the prediction at the time of admission or earlier (20). We aimed to extend this review by broadening the search, evaluating the risk of bias (ROB) (23) of the included studies, and adding data from the most recent 2 years to capture the emerging artificial intelligence (AI)/ML approaches. This review aims to identify validated prediction variables and methods used in tools that predict the risk of extended LOS in all hospital admissions and specifically General Medicine admissions. This is needed to advance the evidence base required by healthcare administrators and planners on possible future predictive tools supporting efficient resource utilization and patient flow.

Methods

“Prediction tools” or “tools” for this review can include any type of risk assessment tools/flags/factors or risk prediction models that used computerized statistical methods for predicting hospital LOS. This review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (24). The protocol was registered on the International Prospective Register of Systematic Reviews (PROSPERO) (https://www.crd.york.ac.uk/PROSPERO/) (#CRD42021272198).

Search strategy

We searched CINAHL, EMBASE, OVID MEDLINE, OVID EMCARE, and Cochrane systematically on 31 August 2021 and updated the search on 28 June 2023, using a predefined search strategy guided by our library scientist (VD), as shown in Supplementary Table S2. The primary concepts searched were “risk factors”, “statistical/prediction models”, and “Length of stay”. Considering the rapidly advancing field of health data analytics, we narrowed the search to only include English language articles, from OECD comparable countries and published after 2010. Reference lists of included publications were examined to identify any additional potential studies. A gray literature search using key terms was completed in Google and Google Scholar in a time-limited way (20 h over 4 weeks).

Eligibility criteria

As shown in Supplementary Table S3, we included primary studies that reported LOS predictive tools for adults admitted to acute care hospitals that reported prediction metrics (25) to inform what works in LOS prediction methods and in what context. No limits on publication types were applied. We excluded studies looking at day procedures (LOS < 24 h) and those describing or including admissions to nursing homes, or subacute/rehabilitation facilities due to the difference in their operational structure and purpose, compared to the acute hospital setting.

Models for all admissions (mixed medical and surgical admissions) were the focus, based on recent reports suggesting the positive impact of identifying and managing acuity on hospital resource utilization (26). We also studied the prediction tools for General Medicine admissions (2, 3, 5) due to their high LOS variation; these are summarized in a separate section.

Studies that were not primary research, including conference abstracts, unpublished studies, book chapters, and review articles, were excluded. We also excluded reports focusing on condition/procedure-specific LOS tools such as burns, joint replacements, cardiology, cancer, maternity, and pediatric admissions and studies that did not assess LOS as an outcome.

Once studies were selected for inclusion, their reference lists were manually searched for additional studies.

Study screening and data extraction

Screening, full-text review, data extraction, and quality assessment were completed using the web-based data management platforms of Covidence (27) and EndNote X9.3.3 (Clarivate). Title, abstract, and full-text screening was conducted by two reviewers (SG and JG) who were responsible for selecting studies for inclusion. In case of discrepancies, consensus was reached via discussion. SG extracted data based on the CHARMS and TRIPOD checklists (28, 29) into a predefined data extraction table.

Quality assessment

The risk of bias was assessed independently by two reviewers (SG and YH) based on PROBAST recommendations. Disagreement was resolved by consulting a third reviewer (JE). Using the PROBAST tool (30), studies were rated as low/moderate/high concern for bias and applicability in each of the four domains: participants, predictors, outcomes, and analysis (23, 29). We used guidance from the adaptation of the PROBAST tool for ML models (31).

Data synthesis

The data items extracted for each included article are provided in Supplementary Table S4. Data sources were classified as (1) administrative/registry/claims, and (2) medical records and prediction modeling methods as classic statistical methods/ML/both. Model performance measures of discrimination and calibration were extracted and synthesized.

Discrimination measures, where possible, were presented as the area under the receiver operating characteristic curve (AUROC) with a 95% confidence interval (CI) (21). Discrimination is the ability to distinguish patients with and without the risk under test. We applied AUROC thresholds of 0.5 to suggest no discrimination, 0.7–0.8 as acceptable, 0.8–0.9 as excellent, and >0.9 as outstanding discrimination (32). Calibration was assessed using reported calibration plots, where available, or using calibration statistics (32, 33).
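To illustrate the thresholds above, the following minimal Python sketch computes AUROC as the proportion of positive/negative pairs that are ranked correctly and then maps the value to the bands quoted in the text. The function names are hypothetical. The text does not say how boundary values such as exactly 0.8 should be assigned, and it names no band for values between 0.5 and 0.7, so both choices below are assumptions.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def interpret_auroc(value):
    """Map an AUROC to the bands quoted in the text.
    Boundary handling and the 0.5-0.7 label are assumptions."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("AUROC must lie between 0 and 1")
    if value > 0.9:
        return "outstanding"
    if value >= 0.8:
        return "excellent"
    if value >= 0.7:
        return "acceptable"
    if value <= 0.5:
        # Assumption: values below 0.5 are grouped with "no discrimination".
        return "no discrimination"
    return "below acceptable (label assumed)"


# Toy data, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
print(toy_auc, interpret_auroc(toy_auc))
```

The pairwise loop is quadratic in the sample size, which is acceptable for an illustration but not for the sample sizes reported in the included studies.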

Predictor variables in the included LOS models were classified into categories adapted from the recent systematic review by Lequertier et al. (20), as shown in Supplementary Table S5. The level of validation (development with or without internal validation and/or external validation) was based on the PROBAST guideline (30).

Meta-analysis

Meta-analysis of prediction models is challenging, especially when models are specified differently and have heterogeneous predictors and outcome definitions (34). Nevertheless, it is valuable to understand the impact of the underlying variation in case mix and population characteristics on the prediction estimates (35). As such, we have presented a random-effects meta-analysis using restricted maximum likelihood estimation for external validation studies of LOS prediction models. As guided by recent literature on meta-analysis of prediction model studies (36, 37), models having comparable outcome types (binary) and predictors were included, and we reported the 95% prediction interval of theta (21) to provide a range for the estimated performance of the model in a new population. Stata SE 17 was used for statistical analysis and calculation. When the standard error of AUROC was unreported, it was estimated using the method by Hanley and McNeil (38) and Kottas et al. (39). Heterogeneity was reported as I² (40). The number of eligible validation studies was small, and hence further investigation of sources of heterogeneity was not possible.
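The calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis performed in Stata: it uses the Hanley and McNeil (1982) approximation for the standard error of an AUROC, substitutes the simpler DerSimonian-Laird moment estimator for restricted maximum likelihood, pools on the raw AUROC scale (the review does not state the scale used), and expects the caller to supply the critical value `t_crit`, the 97.5th percentile of a t distribution with k-2 degrees of freedom. All function names are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUROC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def random_effects_summary(estimates, std_errors, t_crit):
    """Pooled estimate, tau^2, I^2 (%) and 95% prediction interval.
    Uses DerSimonian-Laird (an assumption; the review used REML)."""
    k = len(estimates)
    if k < 3 or k != len(std_errors):
        raise ValueError("a prediction interval needs at least three studies")
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    theta = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_theta = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    half_width = t_crit * math.sqrt(tau2 + se_theta ** 2)
    return {
        "theta": theta,
        "tau2": tau2,
        "i2": i2,
        "prediction_interval": (theta - half_width, theta + half_width),
    }


# Invented AUROC values and standard errors, for illustration only.
# 12.706 is the 97.5th percentile of t with 1 degree of freedom (k = 3).
print(random_effects_summary([0.60, 0.70, 0.80], [0.02, 0.02, 0.02], 12.706))
```

With very few studies the t critical value is large, so the prediction interval can be much wider than the confidence interval of the pooled estimate.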

Publication bias

Forest plots showing effect sizes and confidence intervals were generated. Egger's regression was used for evaluating funnel plot asymmetry due to small-study effects (33, 41).
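As a rough illustration of Egger's regression, the sketch below regresses the standardized effect (effect divided by its standard error) on precision (the reciprocal of the standard error) by ordinary least squares. An intercept far from zero suggests funnel plot asymmetry. The function name and the example numbers are hypothetical, and the formal test of the intercept against a t distribution is left out.

```python
import math


def egger_regression(effects, std_errors):
    """Return (intercept, slope, se_intercept) from Egger's regression of
    effect/SE on 1/SE, fitted by ordinary least squares."""
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies")
    x = [1.0 / se for se in std_errors]                    # precision
    z = [y / se for y, se in zip(effects, std_errors)]     # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with k - 2 degrees of freedom.
    resid_var = sum(
        (zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z)
    ) / (k - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept


# Invented data in which less precise studies report larger effects.
intercept, slope, se_intercept = egger_regression(
    [0.70, 0.75, 0.85, 0.90], [0.01, 0.02, 0.04, 0.05]
)
print(intercept, slope, se_intercept)
```

With as few studies as were available for this meta-analysis, such a regression has little power, so the result should be read with caution.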

Results

The search yielded 8,103 studies from OVID Medline (4,172), OVID Emcare (260), CINAHL (555), EMBASE (3,076), and Cochrane (40). Records were exported to Covidence, and 319 duplicates were removed. In total, 7,784 records were screened, which yielded 213 potential reports for full-text retrieval. Citation searching identified an additional 17 records, which were assessed for eligibility. A recent update identified a further nine studies for full-text review. Following the full-text review, 39 studies were selected for inclusion based on the eligibility criteria: 14 reporting on GenMed populations and 25 on all admissions. The PRISMA diagram in Figure 1 illustrates the search. Study characteristics are summarized in Supplementary Table S6.
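The record counts in this paragraph can be cross-checked with a few lines of Python. The sketch assumes the EMBASE count is 3,076, which is the only reading consistent with the stated total; variable names are illustrative.

```python
# Records retrieved per database, as reported in the text.
database_hits = {
    "OVID Medline": 4172,
    "OVID Emcare": 260,
    "CINAHL": 555,
    "EMBASE": 3076,  # assumption: read as 3,076 to match the reported total
    "Cochrane": 40,
}
duplicates_removed = 319
included = {"GenMed": 14, "all admissions": 25}

total_records = sum(database_hits.values())
records_screened = total_records - duplicates_removed
total_included = sum(included.values())

print(total_records, records_screened, total_included)
```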

Figure 1. PRISMA flow diagram of the systematic review of the literature for hospital length of stay prediction tools. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; **based on exclusion criteria provided in Supplementary Table S3; OECD, Organization for Economic Co-operation and Development.

All admissions prediction models

Of the 25 studies, the majority were published in the last 5 years. Eleven were from the United States, six from the European Union, two from Australia, and one each from the United Kingdom, Canada, Japan, South Korea, Algeria, and Singapore. All studies were observational: two prospective, 22 retrospective, and one cross-sectional. The median study duration was 3.75 years (range 0.6–12) with a median sample size of 53,211 (range 332–42,896,026).

Data sources

There was greater use of medical records data (60%) compared to administrative data (40%). Most studies (84%) collected data at and during admission; the remainder used data collected post-discharge in addition to admission data. LOS was predicted categorically in 64% of studies, continuously in 28%, and both categorically and continuously in 8%. The cut-off for defining prolonged LOS ranged from 5 to 14 days, and two studies used a predefined diagnosis-specific increase of LOS tertile as their cut-off.

Predictive modeling methods

The level of validation was low, with only 2 of the 25 studies being validation studies (four models). Of the 45 models reported in the 25 studies, classical statistical approaches accounted for just under half (44%); ML methods such as ridge regression, random forest, gradient boosting machine algorithms, and generalized linear models were used in 32%; and deep learning approaches (24%) included stacked recurrent neural networks, channel-wise long short-term memory (LSTM), multi-modal deep learning, and ensemble-based neural networks. The greater prevalence of ML and deep learning approaches in this group is likely to reflect the number and complexity of the variables and the large sample sizes used in these studies.

Analytical pipeline

The median number of predictors used was 18 (range 2–714). Most studies (96%) included all candidate predictors in multivariable modeling without pre-selection of variables; pre-selection was done in a single study (42). Feature/predictor selection methods during multivariable modeling were poorly reported in 76% of studies. When reported, AIC (43–45), recursive feature elimination (46), and the full model approach (47, 48) were used for feature/predictor selection. Missing data were handled by imputation using various methods in 16% of studies, and their handling was under-reported in the remaining 84%. Methods to manage over-fitting and optimism were used in 80% of studies and were not reported in the remaining 20%. They included combinations of random split, k-fold cross-validation, bootstrapping, hyper-parameter tuning and selection, and stochastic gradient descent techniques. The more recent studies reported various hyperparameter optimisation methods such as Bayesian (49) and Gaussian (50)-based selection and tuning processes, gradient descent methods (51), and 10-fold cross-validation (52).
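For readers unfamiliar with k-fold cross-validation, one of the techniques listed above for limiting over-fitting, a minimal Python sketch of the data-splitting step follows. It is a generic illustration, not the procedure of any included study; the function name, the fixed seed, and the sample size are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.
    Every sample appears in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical data set of 25 admissions split into 10 folds.
for train_idx, test_idx in k_fold_indices(25, k=10):
    # A model would be fitted on train_idx and evaluated on test_idx here.
    print(len(train_idx), len(test_idx))
```

Averaging the performance metric over the held-out folds gives an estimate that is less optimistic than performance measured on the data used for fitting.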

Table 1 and Supplementary Table S8 show the key information for all admission LOS prediction models included in the systematic review.

Table 1. All admission LOS prediction models included in the systematic review (n = 45).

Reported performance metrics and interpretation

The frequency of the various reported model performance measures is summarized in Figure 2 and Supplementary Table S7.

Figure 2. Frequency of LOS prediction model performance metrics reported in all admissions LOS prediction models (n = 45). AIC, Akaike information criterion. The following performance metrics were used fewer than three times and are not represented in the figure: Pred/z-score/MMRE (mean magnitude of relative error), model adequacy/model fit R²/adjusted R², Cohen's kappa, explained variance/Nagelkerke's R², Brier score, and median AE (absolute error).

Discrimination

AUROC was the most frequently reported metric of discrimination (42% of models), as outlined in Figure 2. The median AUROC was 0.7365 (range 0.63–0.832), indicating fair-to-good discriminative ability for the majority of the models (67). Other metrics reported were accuracy (20%), C-statistic (13%), and mean absolute error (MAE) (11%).

Calibration

Calibration metrics (likelihood ratio index, HL goodness of fit, and calibration plots) were reported in only 20% of models. All the reported models appeared to be sufficiently calibrated.
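To make the notion of calibration concrete, the sketch below computes two generic quantities for a binary prolonged-LOS outcome: the Brier score and the grouped comparison of mean predicted risk against observed event rate that underlies a calibration plot. It does not reproduce the likelihood ratio index or the HL test used in the included studies; function names and data are hypothetical.

```python
def brier_score(outcomes, predicted):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    if not outcomes or len(outcomes) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(outcomes, predicted)) / len(outcomes)


def calibration_table(outcomes, predicted, n_groups=10):
    """Rows of (mean predicted risk, observed event rate, group size),
    with patients grouped by ascending predicted risk."""
    if not outcomes or len(outcomes) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:  # happens when there are fewer patients than groups
            continue
        mean_pred = sum(p for p, _ in group) / len(group)
        obs_rate = sum(y for _, y in group) / len(group)
        rows.append((mean_pred, obs_rate, len(group)))
    return rows


# Invented outcomes (1 = prolonged LOS) and predicted risks.
y = [0, 0, 1, 1]
p = [0.1, 0.2, 0.8, 0.9]
print(brier_score(y, p))
print(calibration_table(y, p, n_groups=2))
```

In a well-calibrated model the mean predicted risk and the observed rate are close in every group, so the points of a calibration plot lie near the diagonal.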

Two studies reported comprehensive performance measures, including calibration, discrimination, and overall accuracy measures. Both Harutyunyan et al. (LOS>7 days) and Hilton et al. (LOS>5 days) demonstrated excellent discriminative ability with an AUROC of 0.84 (49, 59) and good calibration, using ML/deep learning models (recurrent neural networks, LSTM, and gradient boosting machines) and data from electronic medical records.

Predictors/variables

The most frequently used predictors and predictor categories are outlined in Table 2 and Supplementary Figure 1. Variable/feature importance was reported in half the studies using diverse association metrics such as hazard ratios, incident rate ratios, and estimates/regression coefficients, which makes comparisons based on the strength of association of predictors imprecise.

Table 2. Most frequently used variables in risk prediction of prolonged LOS in all admissions (n = 25).

The top three predictor categories used were risk scores (68%), demographic and anthropometric variables (68%), and admission characteristics (60%). Risk scores included illness severity scores, functional indices, co-morbidity scores, and neurocognitive screening tools. A wide range of demographic variables representing the social determinants of health (SDOH) such as ethnicity, socioeconomic index, anthropometric characteristics, and marital status were used frequently. Admission characteristics, such as admission source, day/month of admission, need for ICU admission, admitting unit, procedure type, time and length of last admission, elapsed LOS, and discharge/transfer destination, were used widely, possibly owing to the predominant use of medical record data sources and ongoing data collection throughout the admission period. Many studies using electronic medical records used information about the number of tests, consults, assessments, medication, and investigations as proxy indicators of extended stay rather than the actual results of these events (47, 51, 58, 66, 68).

Physical examination parameters and diagnostic and administrative variables were included in 40% of studies, while documentation and clinical notes, medications, health professional characteristics, and hospital characteristics were included less frequently. Admission diagnoses such as cancer and mental health conditions were noted as important features having an impact on LOS.