EHR Machine Learning Predicted Clozapine Initiation in Schizophrenia

TL;DR: A 2026 preprint in medRxiv trained machine-learning models on Danish electronic health records and found that an XGBoost model predicted clozapine initiation within 365 days with an AUROC of 0.81 in patients with schizophrenia or schizoaffective disorder.

Key Findings

229,761 prediction times: The main model used routine psychiatric hospital contacts from 5,806 patients after lookback and outcome-window filtering.
929 predictors: Researchers combined 179 structured clinical predictors with 750 text-derived predictors from electronic health record notes.
0.81 AUROC: On the held-out test set, the best XGBoost model outperformed elastic net logistic regression, which reached 0.78.
23% positive predictive value: At a 7.5% predicted-positive rate, just under 1 in 4 positive XGBoost flags was followed by clozapine initiation within 365 days.
127-day median lead time: Among correctly flagged prescriptions, the first true-positive XGBoost flag came a median of 127 days before clozapine was prescribed.
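A lead-time figure like this can be computed per detected prescription as the gap between the earliest flag in its 365-day window and the prescription date, followed by a median over detected prescriptions. The sketch below is a minimal illustration under stated assumptions. The function names are hypothetical. It also assumes that a flag counts as a true positive when the prescription falls 1 to 365 days after it, because the study's exact boundary convention is not given here.

```python
from datetime import date
from statistics import median

def first_flag_lead_days(flag_dates, prescription_date, window_days=365):
    """Days from the earliest true-positive flag to the prescription, or None if undetected.

    Assumption: a flag is a true positive if the prescription is 1..window_days days later.
    """
    in_window = [d for d in flag_dates
                 if 0 < (prescription_date - d).days <= window_days]
    if not in_window:
        return None
    return (prescription_date - min(in_window)).days

def median_lead_time(cases):
    """cases: list of (flag_dates, prescription_date) pairs, one per prescription."""
    leads = [first_flag_lead_days(flags, rx) for flags, rx in cases]
    detected = [lead for lead in leads if lead is not None]
    return median(detected) if detected else None

# Hypothetical toy data: three prescriptions, the second never flagged within its window.
cases = [
    ([date(2020, 1, 1), date(2020, 3, 1)], date(2020, 6, 1)),
    ([date(2019, 1, 1)], date(2020, 6, 1)),
    ([date(2020, 5, 1)], date(2020, 6, 1)),
]
print(median_lead_time(cases))  # → 91.5
```

Only detected prescriptions contribute. This matches the source's framing of the median as being "among correctly flagged prescriptions."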

Source: medRxiv (2026) | Perfalk et al.

Clozapine is generally reserved for treatment-resistant schizophrenia, and delayed initiation in eligible patients is a long-standing clinical problem.

This preprint tested whether routine electronic health record data could produce a dynamic warning system. The model did not predict who would respond to clozapine; it predicted whether a clozapine prescription would be started within the next year.

The Model Rechecked Risk at Each Psychiatric Hospital Contact

The cohort came from the Psychiatric Services of the Central Denmark Region, where public hospital records capture most psychiatric care. Researchers included adults with schizophrenia or schizoaffective disorder who had contact with psychiatric services between 2013 and 2024.

Instead of making a single prediction at diagnosis, the model used a dynamic setup. Each outpatient visit and each start of an inpatient admission became a new prediction time, and the outcome was an incident (first-ever) clozapine prescription within the following 365 days.
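The dynamic setup can be sketched as follows: turn every qualifying contact into a labeled prediction time. This is a simplified illustration, not the authors' pipeline. It assumes contacts arrive as hypothetical (patient_id, date) pairs and that each patient's first clozapine prescription date is known. Contacts on or after that date are dropped so that only incident starts are predicted.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class PredictionTime:
    patient_id: str
    timestamp: date
    label: int  # 1 if the first clozapine prescription falls within the horizon

def label_prediction_times(contacts, first_clozapine, horizon_days=365):
    """contacts: (patient_id, contact_date) for outpatient visits or admission starts.
    first_clozapine: patient_id -> date of first clozapine prescription (missing if never).
    """
    horizon = timedelta(days=horizon_days)
    rows = []
    for patient_id, when in contacts:
        start = first_clozapine.get(patient_id)
        # Incident outcome only: skip contacts on or after clozapine was already started.
        if start is not None and when >= start:
            continue
        label = int(start is not None and start <= when + horizon)
        rows.append(PredictionTime(patient_id, when, label))
    return rows

# Hypothetical example: patient "a" starts clozapine on 2021-01-01, "b" never does.
contacts = [("a", date(2020, 1, 1)), ("a", date(2020, 6, 1)),
            ("a", date(2021, 3, 1)), ("b", date(2020, 1, 1))]
rows = label_prediction_times(contacts, {"a": date(2021, 1, 1)})
print([r.label for r in rows])  # → [0, 1, 0]
```

The first contact is labeled 0 because 2020 is a leap year. As a result, 2021-01-01 falls 366 days later, just outside the 365-day horizon.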

Training split: 85% of patients were assigned to model training and cross-validation.
Held-out test split: 15% of patients were reserved for final evaluation.
Outcome count: 9,400 prediction times were followed by clozapine initiation within 365 days.
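The 85/15 split is described in terms of patients, not prediction times. As a result, all contacts from one patient stay on the same side, and the test set measures performance on unseen people. The sketch below assumes a simple seeded random assignment. The study's exact procedure, such as any stratification, is not described here, and `split_patients` is a hypothetical helper.

```python
import random

def split_patients(patient_ids, test_fraction=0.15, seed=0):
    """Assign whole patients (not individual prediction times) to train or test."""
    ids = sorted(set(patient_ids))  # deduplicate: one entry per patient
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test = set(ids[:n_test])
    train = set(ids[n_test:])
    return train, test

# Hypothetical: 100 patients, each with three prediction times.
train, test = split_patients([f"p{i}" for i in range(100)] * 3)
print(len(train), len(test))  # → 85 15
```

Because patients contribute different numbers of contacts, the share of prediction times in each split need not be exactly 85:15.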

After filtering for adequate lookback and lookahead windows, the main analysis included 194,234 training prediction times and 35,527 test prediction times.
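Window filtering keeps only prediction times with enough history before them and enough observable follow-up after them to assign a label. The following is a minimal sketch. The 365-day lookback and the data-period boundaries are hypothetical values chosen for illustration; only the 365-day outcome window comes from the study description.

```python
from datetime import date, timedelta

def has_adequate_windows(when, data_start, data_end,
                         lookback_days=365, lookahead_days=365):
    """True if the full lookback and outcome windows fall inside the data period."""
    enough_history = when - timedelta(days=lookback_days) >= data_start
    enough_follow_up = when + timedelta(days=lookahead_days) <= data_end
    return enough_history and enough_follow_up

def filter_prediction_times(times, data_start, data_end, **windows):
    return [t for t in times if has_adequate_windows(t, data_start, data_end, **windows)]

# Hypothetical data period and candidate prediction times.
start, end = date(2013, 1, 1), date(2024, 12, 31)
kept = filter_prediction_times([date(2013, 6, 1), date(2018, 6, 1), date(2024, 6, 1)], start, end)
print(kept)  # → [datetime.date(2018, 6, 1)]
```

The earliest contact fails the lookback check, and the latest fails the lookahead check because its outcome window would extend past the end of the data.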

Clinical Notes Added Predictive Value Beyond Structured Data

The strongest model used 929 total predictors. Structured predictors included age, sex, diagnoses, medication history, hospital contacts, coercive measures, suicide-risk scores, laboratory results, and rating-scale data.

Free-text clinical notes were converted into 750 TF-IDF predictors. TF-IDF stands for term frequency-inverse document frequency. This text-mining method gives a word or short phrase more weight when it appears often in a given note and less weight when it is common across the whole collection of notes.
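The weighting can be illustrated with a textbook TF-IDF variant, using tf = term count / note length and idf = log(N / df), where df is the number of notes containing the term. This is a stdlib-only sketch, not the study's configuration. It uses unigrams only, and the rule for choosing the 750-term vocabulary (highest document frequency) is an assumption. Libraries such as scikit-learn use smoothed idf formulas that give different numbers.

```python
import math
from collections import Counter

def tfidf(documents, max_features=750):
    """Textbook TF-IDF with whitespace tokens; vocabulary = most frequent terms by document count."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many notes each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [term for term, _ in ranked[:max_features]]
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1
        rows.append({term: (counts[term] / length) * idf[term] for term in vocab})
    return vocab, rows

notes = ["patient hears voices", "patient hears voices again", "patient stable today"]
vocab, rows = tfidf(notes)
print(rows[0]["patient"])  # → 0.0
```

"patient" appears in every note, so its idf, and therefore its weight, is zero. A rarer term such as "stable" gets a higher weight in the one note that contains it.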

Structured data: Diagnoses, medications, laboratory tests, admissions, outpatient contacts, and coercive-measure records supplied non-text predictors.
Clinical-note text: Terms such as hearing voices, self-harm, deterioration, depot medication, and clinical suspicion markers appeared among high-importance predictors.
Benchmark model: Elastic net logistic regression was trained as a simpler comparator against XGBoost.
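For reference, elastic net logistic regression fits a standard logistic model while penalizing coefficient size with a mix of lasso (L1) and ridge (L2) terms. The objective below uses the common glmnet-style parametrization. The values of λ (overall penalty strength) and α (L1/L2 mix) tuned in this study are not reported here.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log p_i + (1-y_i)\log(1-p_i)\Big]
+ \lambda\Big(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Big),
\qquad p_i = \frac{1}{1 + e^{-(\beta_0 + x_i^\top \beta)}}
```

Here y_i is 1 if the prediction time was followed by clozapine initiation within 365 days, and x_i holds that time's predictors. The L1 term can shrink many of the 929 coefficients to exactly zero.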

XGBoost is a gradient-boosted decision-tree method. In this study, it handled the mixed clinical data better than logistic regression, especially when the full predictor set was available.
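The boosting idea can be written compactly. XGBoost adds up the outputs of K regression trees f_k, and for a binary outcome the sum is a log-odds score. The regularized objective it minimizes, as described in the XGBoost documentation, is shown below. The specific hyperparameters used in this study are not given here.

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad
\mathcal{L} = \sum_{i=1}^{n} \ell\big(y_i, \hat{y}_i\big) + \sum_{k=1}^{K} \Omega(f_k), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert_2^2
```

Here ℓ is the logistic loss on p_i = 1/(1 + e^{-ŷ_i}), T is a tree's number of leaves, and w its leaf weights. γ and λ are XGBoost's tree-complexity and leaf-weight penalties. Each new tree is fit to reduce the loss left by the trees before it, which lets the model capture interactions, such as between note terms and medication history, that a linear model cannot.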

XGBoost Reached 0.81 AUROC on the Held-Out Test Set

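AUROC, or area under the receiver operating characteristic curve, measures how well a model ranks future clozapine starts above non-starts across all thresholds. It equals the probability that a randomly chosen prediction time followed by clozapine initiation receives a higher risk score than a randomly chosen one that is not. A value of 0.5 is chance and 1.0 is perfect ranking, so 0.81 indicates good discrimination, though not clinical certainty.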
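That probabilistic definition can be computed directly by comparing every positive with every negative and counting ties as half. This is the Mann-Whitney formulation of AUROC. The quadratic loop below is for illustration only. Libraries compute the same quantity efficiently, for example by sorting scores.

```python
def auroc(scores, labels):
    """AUROC as P(score of random positive > score of random negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores: one of the four positive/negative pairs is ranked the wrong way.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # → 0.75
```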

At a 7.5% predicted-positive rate, the XGBoost model had a positive predictive value of 23% and, according to the results section, a sensitivity of 42%. At the same threshold, the model flagged 52% of unique clozapine prescriptions at least once before the prescription date.
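A predicted-positive rate fixes the threshold by ranking rather than by a probability cutoff: the top 7.5% of prediction times by score are flagged, and PPV and sensitivity are then counted at the prediction-time level. The sketch below assumes simple rounding of the flag count and index-order tie-breaking. Both choices are hypothetical. The prescription-level "detected at least once" metric would additionally group flags by prescription and is not computed here.

```python
def metrics_at_positive_rate(scores, labels, positive_rate=0.075):
    """Flag the top `positive_rate` share of prediction times, then return (PPV, sensitivity)."""
    n_flag = max(1, round(len(scores) * positive_rate))
    # Sort indices by descending score; Python's stable sort breaks ties by index order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = order[:n_flag]
    tp = sum(1 for i in flagged if labels[i] == 1)
    total_pos = sum(labels)
    ppv = tp / n_flag
    sensitivity = tp / total_pos if total_pos else float("nan")
    return ppv, sensitivity

# Hypothetical: 40 prediction times, 4 positives; 7.5% of 40 = 3 flags.
scores = list(range(40))
labels = [1 if i in (39, 37, 10, 5) else 0 for i in range(40)]
print(metrics_at_positive_rate(scores, labels))
```

Two of the three flags are positives, and two of the four positives are flagged.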

Table: XGBoost versus logistic regression performance for predicting clozapine initiation. XGBoost showed stronger held-out performance than logistic regression across the main test-set metrics reported at the 7.5% predicted-positive threshold.

The preprint abstract lists sensitivity as 32% at this threshold, while the results section and table text report 42% for XGBoost and 32% for logistic regression. That mismatch should be resolved before clinical interpretation.

The Clinical Use Case Is an Alert, Not an Automatic Decision

A clozapine-start prediction is not the same as a treatment-resistance diagnosis. It is a flag that a patient’s recent record resembles patterns seen before clozapine was later started.

Potential use: A silent validation phase could test whether the model performs prospectively before clinicians see alerts.
Practical target: The model could prompt review of delayed clozapine eligibility, monitoring barriers, or previous nonresponse.
Not a prescribing rule: Laboratory monitoring, adherence, contraindications, patient preference, and clinician judgment still determine whether clozapine is appropriate.

The model’s 23% positive predictive value also means that most flagged prediction times, roughly 3 in 4, would not be followed by clozapine initiation within a year. Its usefulness would therefore depend on how costly and acceptable the resulting follow-up review is.
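One way to frame that review burden is the number of flags reviewed per flag followed by clozapine, which is simply 1 / PPV. The sketch below applies it to the reported 23%. The helper names are hypothetical. The result counts prediction times, not patients, so one patient may generate several flags.

```python
def flags_per_true_start(ppv):
    """Expected flagged prediction times reviewed per flag followed by clozapine (1 / PPV)."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1 / ppv

def false_flags_per_100(ppv):
    """Flags per 100 that are not followed by clozapine initiation within the horizon."""
    return round(100 * (1 - ppv))

print(round(flags_per_true_start(0.23), 2))  # → 4.35
print(false_flags_per_100(0.23))            # → 77
```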

Preprint and Proxy-Outcome Limits Matter

The paper is a preprint, so it has not yet been certified by peer review. The outcome was clozapine initiation, not proven treatment resistance or clinical benefit from clozapine.

That distinction is important. Some patients who meet clinical criteria may not start clozapine because of blood-monitoring burden, adherence barriers, side-effect concerns, or local practice patterns.

Proxy outcome: Prescription start captures clinical action, not the full population who might benefit from treatment.
Regional data: The model was trained in one Danish region and would need retuning or retraining elsewhere.
Text leakage risk: Clinical notes may include early clinician discussion of clozapine, though removing clozapine and Leponex terms still left test AUROC near 0.79.
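The leakage check described above amounts to removing outcome-revealing drug names from notes before text features are built, then retraining and re-evaluating. The following is a minimal redaction sketch. The term list mirrors the two names mentioned above and is otherwise hypothetical. Real notes may use other spellings or inflected forms that a whole-word list would miss.

```python
import re

# Hypothetical blocklist: drug names whose mention could reveal the outcome.
LEAKY_TERMS = ("clozapine", "leponex")

def redact_leaky_terms(note, terms=LEAKY_TERMS):
    """Remove whole-word, case-insensitive matches of `terms` and tidy the whitespace."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    redacted = pattern.sub(" ", note)
    return " ".join(redacted.split())

print(redact_leaky_terms("Discussed Clozapine start; Leponex info given."))
# → Discussed start; info given.
```

If performance drops only modestly after redaction, as the reported test AUROC near 0.79 suggests, the model is not relying mainly on explicit mentions of the drug.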

The responsible interpretation is narrow: routine EHR data predicted future clozapine starts with useful discrimination in this dataset. Whether that improves care depends on prospective testing and careful clinical implementation.

Citation: Perfalk et al. Predicting clozapine initiation among patients with schizophrenia via machine learning trained on electronic health record data. medRxiv. 2026. DOI: 10.64898/2026.04.17.26351083.

Study Design: Retrospective dynamic prediction-model study using routine electronic health record data from a Danish regional psychiatric system.

Sample Size: 229,761 prediction times from 5,806 patients after main-model filtering.

Key Statistic: The best XGBoost model achieved AUROC 0.81 on the held-out test set, with 23% positive predictive value at a 7.5% predicted-positive threshold.

Caveat: The outcome was clozapine prescription initiation, not treatment response, and the preprint needs peer review and prospective validation.