CatBoost Predicted Stroke-Associated Pneumonia After Bridging Therapy

TL;DR: A 2026 medRxiv preprint reported that an interpretable CatBoost machine-learning model predicted stroke-associated pneumonia after bridging therapy for acute ischemic stroke with a test-set AUC of 0.932. Seven-day stroke severity and early inflammatory markers were among the strongest contributors.

Key Findings

135 stroke patients: The retrospective analysis included 135 acute ischemic stroke patients who received thrombolysis followed by mechanical thrombectomy.
51.9% pneumonia rate: Stroke-associated pneumonia developed in 70 patients, or 51.9% of the cohort.
11 selected variables: LASSO regression selected 11 clinical and laboratory predictors for machine-learning modeling.
0.932 test AUC: The CatBoost model achieved an AUC of 0.952 in training and 0.932 in the test set.
Top SHAP factors: SHAP interpretation ranked 7-day NIHSS, 24-hour SIRI, and 24-hour white blood cell count as leading contributors.

Source: medRxiv (2026) | Wang et al.

Stroke-associated pneumonia (SAP) is a common complication after severe ischemic stroke. This preprint focused on patients who received bridging therapy, meaning intravenous thrombolysis followed by mechanical thrombectomy. These patients form a high-risk group with large-vessel stroke and intensive early care needs.

The study’s useful contribution is not just a high model score. It tried to make the prediction explainable by showing which neurological and inflammatory features pushed the CatBoost model toward a pneumonia-risk estimate.

CatBoost Predicted Pneumonia After Bridging Stroke Therapy

The cohort came from Xinxiang Central Hospital between January 2019 and December 2023. Researchers screened 192 patients and included 135 after exclusions.

All included patients had acute ischemic stroke and underwent bridging therapy: intravenous thrombolysis followed by mechanical thrombectomy (MT). Pneumonia was defined using radiographic changes plus supporting clinical evidence such as fever, respiratory findings, or laboratory evidence.

Clinical variables: Stroke severity, dysphagia, vascular subtype, glucose, coagulation markers, and outcome scores were recorded.
Inflammatory markers: The analysis included NLR, PLR, SII, and SIRI, which combine white-cell and platelet measures into immune-response indices.
Model comparison: Ten machine-learning models were built after LASSO feature selection, with CatBoost selected by ROC and decision-curve performance.
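
Decision-curve analysis, mentioned above as one of the selection criteria, compares models by net benefit at a chosen risk threshold. The sketch below uses the standard net-benefit formula (true positives per patient minus false positives per patient weighted by the threshold odds). The function names and the toy labels and probabilities are hypothetical and are not taken from the preprint.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: flag every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Hypothetical toy data (1 = pneumonia, 0 = no pneumonia); not study data.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.55, 0.3, 0.4, 0.2, 0.1]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt}: model={model_nb:.3f}, treat-all={all_nb:.3f}")
```

A model is useful at a given threshold only if its net benefit exceeds both the treat-all reference and zero, which is the net benefit of flagging no one.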

CatBoost is a gradient-boosting method that can handle mixed clinical features and non-linear relationships. The study paired it with SHAP, which assigns feature-level contributions to individual model predictions.
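
To illustrate what "feature-level contributions" means, the sketch below computes exact Shapley values for a tiny hypothetical risk score with three inputs. It is a simplified stand-in: the scoring function, the coefficients, the patient values, and the single reference patient are all assumptions made for illustration, and the SHAP library estimates these quantities differently for tree models such as CatBoost. Only the feature names come from the study.

```python
from itertools import combinations
from math import factorial

FEATURES = ["NIHSS_7d", "SIRI_24h", "WBC_24h"]


def toy_risk_score(x):
    """Hypothetical non-linear score; NOT the study's CatBoost model."""
    score = 0.04 * x["NIHSS_7d"] + 0.05 * x["SIRI_24h"] + 0.01 * x["WBC_24h"]
    # Assumed interaction: severe deficit plus high inflammation adds risk.
    if x["NIHSS_7d"] > 10 and x["SIRI_24h"] > 3:
        score += 0.10
    return score


def shapley_values(model, patient, baseline):
    """Exact Shapley values, replacing 'absent' features with baseline values."""
    n = len(FEATURES)

    def value(subset):
        x = {f: (patient[f] if f in subset else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical patient and reference values; not study data.
patient = {"NIHSS_7d": 14, "SIRI_24h": 4.0, "WBC_24h": 12.0}
baseline = {"NIHSS_7d": 6, "SIRI_24h": 1.5, "WBC_24h": 8.0}

phi = shapley_values(toy_risk_score, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additive property that lets a clinician read a single prediction as a set of per-feature pushes.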

Stroke-Associated Pneumonia Affected 70 of 135 Patients

SAP was common in this selected stroke population. Pneumonia occurred in 70 of 135 patients, while 65 were classified as non-pneumonia cases.
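
As a quick check on the arithmetic, the sketch below reproduces the 51.9% rate from the reported counts and adds a Wilson score interval to show how much uncertainty a sample of 135 carries. The interval is an illustrative calculation and is not a figure reported in the preprint.

```python
from math import sqrt


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


sap, total = 70, 135  # counts reported in the study
rate = sap / total
low, high = wilson_interval(sap, total)

print(f"SAP rate: {rate:.1%}")
print(f"Approximate 95% interval: {low:.1%} to {high:.1%}")
```

With these counts the interval spans roughly 43% to 60%, a reminder that the observed rate is an estimate from one small cohort.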

Patients with SAP had higher stroke-severity scores at multiple time points. They also had worse discharge and 90-day modified Rankin Scale scores, meaning the pneumonia group had poorer functional outcomes.

Neurological severity: NIHSS scores were higher in the pneumonia group before surgery, 24 hours after surgery, and 2-7 days after admission.
Inflammation: Twenty-four-hour neutrophil count, white blood cell count, NLR, SII, and SIRI were higher in the pneumonia group.
Coagulation and glucose: Admission fasting glucose, D-dimer, and fibrin degradation products were also higher in the SAP group.

The pattern fits the clinical problem. Severe stroke can impair swallowing, cough, consciousness, and mobility, while systemic inflammation may mark or amplify infection risk.

Dysphagia was also common in the SAP group, recorded in 54.3% of patients who developed pneumonia. That gives the model a clinical anchor: aspiration risk, neurological impairment, and early immune activation can converge in the first days after reperfusion therapy.

Figure: Bar chart summarizing CatBoost pneumonia prediction performance in stroke bridging therapy. The CatBoost model combined stroke severity and early inflammatory markers to stratify pneumonia risk after bridging therapy.

Inflammatory Biomarkers Were Most Informative at 24-48 Hours

The study compared several immune-inflammatory indices over time. NLR is the neutrophil-to-lymphocyte ratio, PLR the platelet-to-lymphocyte ratio, SII the systemic immune-inflammation index, and SIRI the systemic inflammatory response index.

Twenty-four-hour and 48-hour values carried the clearest separation. The SAP group had significantly higher 24-hour NLR, 24-hour SII, 24-hour SIRI, 48-hour NLR, and 48-hour SIRI.

NLR signal: Higher neutrophil-to-lymphocyte ratio can reflect stronger innate inflammation relative to lymphocyte counts.
SIRI signal: SIRI combines neutrophils, monocytes, and lymphocytes, potentially capturing broader immune activation.
PLR contrast: Platelet-to-lymphocyte ratio did not significantly differ between groups in this cohort.
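
The four indices can be derived from a routine complete blood count. The sketch below uses the commonly cited definitions; this summary does not quote the preprint's exact formulas, so treat them as an assumption. The function name and the example counts are hypothetical.

```python
def inflammatory_indices(neutrophils, lymphocytes, monocytes, platelets):
    """Compute NLR, PLR, SII, and SIRI from absolute counts (10^9 cells/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "SII": platelets * neutrophils / lymphocytes,
        "SIRI": neutrophils * monocytes / lymphocytes,
    }


# Hypothetical 24-hour blood count; not study data.
example = inflammatory_indices(
    neutrophils=8.0, lymphocytes=1.0, monocytes=0.5, platelets=200.0
)
for name, value in example.items():
    print(f"{name}: {value:.1f}")
```

Because every index divides by the lymphocyte count, a falling lymphocyte count raises all four at once, while only SII and PLR respond to platelets and only SIRI responds to monocytes.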

This timing point is clinically relevant. A marker measured too early may miss post-stroke immune shifts, while a later marker may arrive after pneumonia risk is already unfolding.

The PLR null result also keeps the biomarker story specific. The study did not claim that every simple blood-count ratio predicted pneumonia; the clearest differences were concentrated in neutrophil-linked and systemic inflammatory indices.

SHAP Highlighted 7-Day NIHSS and 24-Hour Immune Markers

The CatBoost model reached an AUC of 0.952 in the training set and 0.932 in the test set. Those values indicate strong discrimination in this dataset, though the model has not yet been externally validated.
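
An AUC can be read as the probability that a randomly chosen patient who developed pneumonia receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it directly from that pairwise definition. The labels and scores are hypothetical toy values, not the study's predictions.

```python
def auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = pneumonia, 0 = no pneumonia); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.55, 0.3, 0.2, 0.1]

print(f"AUC = {auc(y_true, y_score):.4f}")
```

In this toy set 15 of the 16 positive/negative pairs are ordered correctly, giving an AUC of 0.9375. The metric describes ranking only and says nothing about whether the predicted probabilities are well calibrated.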

SHAP interpretation ranked NIHSS_7d, SIRI_24h, and WBC_24h as the top contributors. Higher 7-day stroke-severity scores and higher early inflammatory values pushed predictions toward SAP risk.

Interpretability benefit: SHAP helps clinicians see whether a risk estimate came from plausible clinical factors.
Prediction limit: A high AUC in one hospital can fall when tested in different care systems or patient mixes.
Use case: The model is best read as a candidate risk-stratification tool, not a pneumonia diagnosis.

The preprint also notes that early identification could guide prevention. That should mean closer monitoring and individualized clinical review, not automatic antibiotics based only on an algorithm.

Single-Center Preprint Results Need External Validation

The strongest limitation is scope. This was a single-center, retrospective, small-sample study, and the paper was a preprint that had not completed peer review.

Machine-learning models can overfit subtle local patterns, including lab timing, documentation practices, imaging thresholds, and discharge workflows. External validation is essential before clinical use.

Population limit: The cohort was limited to bridging-therapy stroke patients, not all ischemic strokes.
Model limit: Training and test performance came from one institution’s dataset.
Clinical limit: Pneumonia prevention decisions still require bedside assessment of swallowing, consciousness, oxygenation, and infection signs.

The practical takeaway is still useful. In severe stroke after reperfusion therapy, early inflammatory trajectories may add pneumonia-risk information beyond traditional neurological severity alone.

Citation: Wang et al. Inflammatory Biomarkers and Interpretable Machine Learning Model for Stroke-Associated Pneumonia Risk Stratification in Patients Undergoing Bridging Therapy for Acute Ischemic Stroke. medRxiv. 2026. DOI: 10.64898/2026.04.15.26350997.

Study Design: Single-center retrospective observational machine-learning study.

Sample Size: 135 acute ischemic stroke patients treated with thrombolysis followed by mechanical thrombectomy.

Key Statistic: CatBoost achieved a test-set AUC of 0.932 for stroke-associated pneumonia prediction.

Caveat: The model needs peer review and external validation before clinical deployment.