NLP Stroke Subtyping Linked Lobar ICH to Dementia and Cortical Stroke to MI

TL;DR: A 2026 preprint in medRxiv used natural language processing (NLP) on Scottish CT and MRI reports to subtype stroke at scale, linking lobar intracerebral hemorrhage to higher later dementia risk and cortical ischemic stroke to higher early myocardial infarction risk.

Key Findings

  1. 785,331 people scanned: Researchers applied NLP to CT and MRI head-scan reports for 785,331 people scanned in Scotland from 2010 to 2018.
  2. 64,219 stroke phenotypes: The linked dataset identified 64,219 people with clinical stroke phenotypes, with mean age 73.4 years.
  3. 26,719 subtyped ischemic strokes: NLP classified 12,616 deep ischemic strokes and 14,103 cortical ischemic strokes.
  4. Lobar ICH dementia risk: Lobar intracerebral hemorrhage was linked to higher dementia risk beyond 6 months versus controls (adjusted hazard ratio 3.5; 95% CI, 2.3-5.3).
  5. Cortical ischemic MI risk: Cortical ischemic stroke was linked to higher myocardial infarction risk within 6 months (adjusted hazard ratio 4.6; 95% CI, 3.4-6.3).

Source: medRxiv (2026) | Hosking et al.

Natural language processing (NLP) can extract structured information from free-text clinical reports. In this study, researchers used NLP to read brain imaging reports and classify stroke by type and location, details that routine hospital codes often miss.

The clinical reason is straightforward. A deep ischemic stroke, cortical ischemic stroke, deep intracerebral hemorrhage, and lobar intracerebral hemorrhage can have different long-term risks, but coded health data often flatten them into broad stroke categories.

NLP Subtyped Stroke From Scottish CT and MRI Reports

The analysis began with 785,331 people who had a CT or MRI head scan in Scotland. Researchers linked imaging-report NLP with hospital readmissions, prescriptions, cancer registry data, and death records.

The final clinical stroke group included 64,219 people, and the subtype analysis classified deep and cortical ischemic stroke, deep and lobar intracerebral hemorrhage, subarachnoid hemorrhage, and subdural hemorrhage.

  • Deep ischemic stroke: 12,616 cases were classified from report text.
  • Cortical ischemic stroke: 14,103 cases were classified from report text.
  • Intracerebral hemorrhage (ICH): 1,814 deep and 1,456 lobar ICH cases were subtyped.
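The article says only that a rules-based NLP system turned report text into subtype labels; it does not describe the rules. As a rough illustration of how that kind of rule can work, here is a minimal Python sketch. The keyword lists, the negation check, and the `classify_report` function are all hypothetical and much cruder than a validated radiology NLP system. Each sentence is scanned for a non-negated finding, and location words then assign the subtype.

```python
import re

# Hypothetical keyword lists for illustration only; they are NOT the
# study's validated rules, which the article does not describe.
HAEMORRHAGE_TERMS = ("haemorrhage", "hemorrhage", "haematoma", "hematoma", "bleed")
ISCHAEMIC_TERMS = ("infarct", "ischaemi", "ischemi")
DEEP_TERMS = ("basal ganglia", "thalam", "internal capsule", "lentiform", "pontine", "subcortical", "lacun")
CORTICAL_OR_LOBAR_TERMS = ("cortex", "cortical", "frontal", "parietal", "temporal", "occipital", "lobar")
NEGATION_CUE = re.compile(r"\b(no|not|without)\b")


def split_sentences(report: str) -> list[str]:
    """Lower-case the report and split it on full stops, semicolons, and newlines."""
    return [s.strip() for s in re.split(r"[.;\n]", report.lower()) if s.strip()]


def affirmed(sentence: str, terms: tuple[str, ...]) -> bool:
    """True if some term appears with no negation cue earlier in the same sentence."""
    for term in terms:
        pos = sentence.find(term)
        if pos != -1 and not NEGATION_CUE.search(sentence[:pos]):
            return True
    return False


def mentions(sentence: str, terms: tuple[str, ...]) -> bool:
    return any(term in sentence for term in terms)


def classify_report(report: str) -> str:
    """Coarse subtype label. Haemorrhage beats infarct; deep terms are checked
    before cortical ones so that 'subcortical' is not read as 'cortical'."""
    sentences = split_sentences(report)
    for s in sentences:
        if affirmed(s, HAEMORRHAGE_TERMS):
            if mentions(s, DEEP_TERMS):
                return "deep ICH"
            if mentions(s, CORTICAL_OR_LOBAR_TERMS):
                return "lobar ICH"
    for s in sentences:
        if affirmed(s, ISCHAEMIC_TERMS):
            if mentions(s, DEEP_TERMS):
                return "deep ischemic"
            if mentions(s, CORTICAL_OR_LOBAR_TERMS):
                return "cortical ischemic"
    return "unclassified"


print(classify_report("No intracranial haemorrhage. Established infarct in the left parietal cortex."))
# prints: cortical ischemic
```

The negated first sentence is skipped, so the report is labeled from the infarct sentence. Real report wording is far more varied than this, which is one reason prior validation of the NLP system matters.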

The study then matched each stroke case to four controls without stroke, matched on age and sex. Outcomes included recurrent stroke readmission, myocardial infarction, cancer, dementia, epilepsy, and death.
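The preprint's exact matching rules are not given here. Open questions include age bands versus single years, whether controls could be reused, and how index dates were assigned. The minimal Python sketch below shows one common form of 1:4 matching under stated assumptions: exact match on whole-year age and sex, random draw without replacement, cases processed in order. The `Person` type and `match_controls` function are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str   # hypothetical pseudonymised identifier
    age: int   # age in whole years at the index date (assumption)
    sex: str


def match_controls(cases, pool, k=4, seed=0):
    """Draw up to k controls per case with identical age and sex, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    available = list(pool)
    matches = {}
    for case in cases:
        eligible = [p for p in available if p.age == case.age and p.sex == case.sex]
        chosen = rng.sample(eligible, min(k, len(eligible)))
        for p in chosen:
            available.remove(p)  # each control is used at most once
        matches[case.pid] = [p.pid for p in chosen]
    return matches


cases = [Person("A", 70, "F"), Person("B", 70, "F")]
pool = [Person(f"c{i}", 70, "F") for i in range(5)] + [Person("m1", 70, "M")]
print(match_controls(cases, pool))
```

With only five eligible controls, the second case receives one. This shows why large population pools make 1:4 matching feasible.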

Lobar Intracerebral Hemorrhage Had Higher Later Dementia Risk

The dementia association was strongest for lobar intracerebral hemorrhage (ICH). Beyond 6 months after stroke, lobar ICH was associated with higher dementia risk compared with controls, with an adjusted hazard ratio of 3.5.

That estimate means the modeled dementia hazard was 3.5 times that of matched controls after adjustment for factors such as age, sex, diabetes, hypertension, atrial fibrillation, medication history, and prior healthcare use.
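An adjusted hazard ratio is usually reported as HR = exp(beta), where beta is the model coefficient, typically from a Cox-type proportional hazards model. The 95% CI is usually exp(beta ± 1.96 × SE). Assuming the preprint followed that convention (the article does not say), this minimal Python sketch recovers beta and its standard error from the published figures. It also checks that each interval is roughly symmetric around the point estimate on the log scale.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def log_hr_and_se(hr: float, lower: float, upper: float) -> tuple[float, float]:
    """Recover log(HR) and its standard error from a reported 95% CI,
    assuming the CI was built on the log scale as exp(beta +/- Z_95 * SE)."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


def geometric_midpoint(lower: float, upper: float) -> float:
    """A log-scale Wald CI is symmetric, so this should sit close to the point estimate."""
    return math.sqrt(lower * upper)


for label, hr, lo, hi in [("lobar ICH -> dementia", 3.5, 2.3, 5.3),
                          ("cortical ischemic -> MI", 4.6, 3.4, 6.3)]:
    beta, se = log_hr_and_se(hr, lo, hi)
    print(f"{label}: log HR {beta:.3f}, SE {se:.3f}, CI midpoint {geometric_midpoint(lo, hi):.2f}")
```

Both midpoints land near the reported estimates (about 3.49 and 4.63). This is consistent with log-scale intervals, though it does not verify the underlying model.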

  1. Outcome window: Dementia risk was reported beyond 6 months after the stroke event.
  2. Subtype specificity: The highlighted dementia result was tied to lobar ICH rather than all strokes as one group.
  3. Brain-health relevance: Lobar hemorrhage location may capture vascular and neurodegenerative vulnerability better than a broad ICH code.

This does not prove lobar ICH directly caused every later dementia diagnosis. It shows that finer location information can reveal outcome patterns that broad coding would hide.

Figure: NLP-derived stroke subtype counts with key dementia and myocardial infarction risk results. Free-text imaging reports allowed researchers to separate stroke subtypes and estimate clinically different outcome risks.

Cortical Ischemic Stroke Carried Higher Early MI Risk

The strongest early cardiovascular result involved cortical ischemic stroke. Within 6 months, cortical ischemic stroke was associated with higher myocardial infarction risk, with an adjusted hazard ratio of 4.6.


The analysis split follow-up into the first 6 months and the period after 6 months because event rates were much higher soon after stroke. That split is important for interpreting early cardiovascular risk.
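A common way to estimate separate early and late effects is "episode splitting." Each person's follow-up is cut at the boundary, so a person contributes time to the late period only after surviving the early window event-free. The minimal Python sketch below illustrates this. The 182.5-day cut-point, the `Episode` record, and `split_follow_up` are assumptions for illustration, not the preprint's code.

```python
from typing import NamedTuple

SIX_MONTHS_DAYS = 182.5  # assumption: the preprint's exact cut-point in days is not stated here


class Episode(NamedTuple):
    start: float  # days since the index date
    stop: float
    event: int    # 1 if the outcome occurred at `stop`, else 0
    period: str   # "early" (first 6 months) or "late" (after 6 months)


def split_follow_up(time: float, event: int, cut: float = SIX_MONTHS_DAYS) -> list[Episode]:
    """Split one person's follow-up at `cut` so each period can get its own hazard ratio."""
    if time <= cut:
        # Follow-up ended (event or censoring) inside the early window.
        return [Episode(0.0, time, event, "early")]
    # Event-free through the early window: censor there, then continue in the late window.
    return [Episode(0.0, cut, 0, "early"), Episode(cut, time, event, "late")]


for row in split_follow_up(400, 1):
    print(row)
```

The resulting start/stop rows can be fitted with a period-specific exposure term. This lets the early hazard ratio differ from the late one instead of forcing a single average across very different risk periods.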

  • Early period: The first 6 months captured the highest-risk window for acute post-stroke complications.
  • Cortical subtype: Cortical ischemic stroke showed the highlighted MI association.
  • Matched comparison: Controls were matched by age and sex, and models adjusted for several vascular and healthcare-use factors.

The result suggests that closer cardiovascular surveillance may be warranted after cortical ischemic stroke, especially in the early post-stroke period. It also shows why stroke location can matter for follow-up planning.

Lobar ICH Had More 1-Year Stroke Readmission Than Deep ICH

Stroke readmission also differed by hemorrhage location. The absolute 1-year readmission rate was higher after lobar ICH than after deep ICH: 4.9% versus 3.4%.

The confidence intervals were 3.9% to 6.1% for lobar ICH and 2.6% to 4.3% for deep ICH. These are absolute rates, not relative hazard ratios, so they cannot be compared directly with the dementia and MI estimates. They are still clinically useful because readmission is a concrete health-system outcome.
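Absolute and relative comparisons answer different questions. A small Python calculation using only the reported point estimates shows both forms for the 1-year readmission rates. The `compare_absolute_rates` helper is hypothetical. The ratio it returns is a crude, unadjusted 1-year risk ratio, not an adjusted hazard ratio, and it ignores the confidence intervals.

```python
def compare_absolute_rates(rate_a: float, rate_b: float) -> tuple[float, float]:
    """Given two absolute risks in percent, return (difference in percentage points, crude ratio)."""
    return rate_a - rate_b, rate_a / rate_b


difference, ratio = compare_absolute_rates(4.9, 3.4)  # lobar vs deep ICH, 1-year stroke readmission
print(f"{difference:.1f} percentage points; crude ratio {ratio:.2f}")
# prints: 1.5 percentage points; crude ratio 1.44
```

A 1.5-point absolute gap is the figure most relevant to service planning. The crude ratio of about 1.44 belongs to a different scale from the adjusted hazard ratios of 3.5 and 4.6 reported elsewhere.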

  • Location detail: Lobar and deep hemorrhages are not interchangeable in prognosis.
  • Operational value: Report text can supply location information even when billing or diagnosis codes do not.
  • Planning value: Subtype-specific risk estimates can guide audit, follow-up, and epidemiology.

For health systems, the important advance is not only the specific percentage. It is the ability to produce subtype-specific outcome estimates across a national dataset.

Free-Text Imaging Reports Can Improve Stroke Epidemiology

The broader method result is that free-text clinical reports can be transformed into usable research variables. Existing stroke codes often say ischemic or hemorrhagic stroke without giving enough location detail for outcome modeling.

This study used a rules-based NLP system previously validated for radiology reports. By linking that text-derived information to routine health data, researchers could estimate outcomes in an unselected population rather than only a consented cohort.
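The article does not describe the linkage mechanics. The core idea is still a join: a text-derived subtype for each person is combined with dated outcome records from routine data, keeping the first qualifying event after the index scan. This minimal Python sketch assumes a shared pseudonymised `person_id` key and simple tuple formats. Both the key and the `link_first_outcome` function are hypothetical.

```python
from collections import defaultdict
from datetime import date


def link_first_outcome(nlp_subtypes, outcome_records, outcome):
    """Attach each person's first `outcome` date after their index scan.

    nlp_subtypes:    {person_id: (subtype, index_date)}, derived from report text
    outcome_records: iterable of (person_id, outcome_name, event_date) from routine data
    Both formats and the person_id key are hypothetical.
    """
    events = defaultdict(list)
    for person_id, name, when in outcome_records:
        if name == outcome:
            events[person_id].append(when)

    linked = {}
    for person_id, (subtype, index_date) in nlp_subtypes.items():
        # Records on or before the index date are ignored here; a real analysis
        # needs an explicit rule for people who had the outcome before the stroke.
        later = [d for d in events.get(person_id, []) if d > index_date]
        linked[person_id] = (subtype, min(later) if later else None)
    return linked


subtypes = {"p1": ("lobar ICH", date(2012, 3, 1)), "p2": ("deep ICH", date(2013, 5, 1))}
records = [("p1", "dementia", date(2011, 1, 1)), ("p1", "dementia", date(2014, 6, 1)),
           ("p2", "MI", date(2013, 6, 1))]
print(link_first_outcome(subtypes, records, "dementia"))
```

Because the join runs over routine records rather than study visits, every scanned person can be included. This is what allows estimates in an unselected population.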

  • Strength: The dataset covered national health-system records and large numbers of real-world stroke patients.
  • Limitation: NLP depends on report wording and prior validation, and the source is a preprint.
  • Clinical boundary: The estimates describe population risk, not the prognosis of a single patient.

The practical conclusion is that stroke subtype and location should not be treated as optional detail. When imaging reports contain that information, NLP can make it usable for outcome surveillance and research.

Citation: DOI: 10.64898/2026.04.17.26351150. Hosking et al. Prognosis of stroke subtypes in whole population health systems data: a matched cohort study. medRxiv. 2026.

Study Design: Matched cohort study using NLP-derived stroke subtypes linked to Scottish routine health-system outcomes.

Sample Size: 64,219 people with clinical stroke phenotypes identified from 785,331 people with head scans.

Key Statistic: Lobar ICH was associated with later dementia risk beyond 6 months (adjusted hazard ratio 3.5), and cortical ischemic stroke with early MI risk (adjusted hazard ratio 4.6).

Caveat: The source was a preprint and the estimates depend on report text, coding linkage, and observational adjustment.

Brain ASAP