Mental Health Brain Biomarker Studies Were Mostly Small and Cross-Sectional

TL;DR: A 2026 review in BMC Psychiatry mapped 441 primary studies of mental-health brain biomarkers, largely based on MRI or EEG (electroencephalography, a scalp recording of brain electrical activity). Most were small, cross-sectional, and concentrated in depression, which makes routine clinical use premature.

Key Findings

58,824 records screened: Researchers searched MEDLINE and Embase from 2010 to September 2023, then mapped 441 eligible primary studies and 27 systematic reviews.
Depression dominated the map: Depressive disorders accounted for 320 primary studies and 17 systematic reviews, while bipolar disorder, PTSD, OCD, anxiety disorders, and substance use disorder had much smaller evidence bases.
MRI and EEG were the main tools: About three-quarters of primary studies used MRI-based techniques, and about one-fifth used EEG.
Diagnosis was the main target: Roughly two-thirds of primary studies focused on diagnosis, and nearly all diagnostic studies were cross-sectional rather than longitudinal.
Small samples were common: The evidence map reported 263 studies with fewer than 100 participants and only 9 studies with more than 1,000 participants.
Clinical use is premature: The review concluded that larger longitudinal studies across broader mental-health conditions are still needed before these tests can guide routine care.

Source: BMC Psychiatry (2026) | Sowerby et al.

Neuroimaging biomarkers are often discussed as the path toward precision psychiatry: a brain scan or electrical recording that helps diagnose depression, predict relapse, or match a patient to treatment.

This evidence map gives that promise a more measured shape. It shows a large research literature, but one that is still uneven, narrow, and often too small to support routine clinical decisions.

441 Mental-Health Biomarker Studies Were Mapped Across Six Disorder Groups

The review focused on neuroimaging and neurophysiologic tests used for diagnosis, prognosis, or treatment-response prediction in depressive disorders, bipolar disorder, anxiety disorders, obsessive-compulsive disorder, posttraumatic stress disorder, and substance use disorder.

The search was broad. From 58,824 unique records, researchers identified 441 primary studies and 27 systematic reviews.

The distribution was not balanced across disorders:

Depressive disorders: 320 primary studies and 17 systematic reviews.
Bipolar disorder: 61 primary studies and 3 systematic reviews.
PTSD: 39 primary studies and 2 systematic reviews.
OCD and anxiety disorders: 26 and 22 primary studies, respectively.
Substance use disorder: 25 primary studies and no systematic reviews in the mapped set.

That imbalance limits clinical generalization because a biomarker cannot be assumed to work across diagnoses. Depression research may not transfer cleanly to PTSD, OCD, or substance use disorder.

It also affects how readers should interpret positive findings. A classifier trained mostly on depression samples may be detecting features of depressive illness, medication exposure, scanner site, or control selection rather than a broad psychiatric biomarker.

For precision psychiatry, disorder coverage is not a side detail. A test meant to help real clinics must work in patients with overlapping symptoms, comorbid diagnoses, and diagnostic uncertainty.
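The imbalance can be made concrete with a few lines of arithmetic. The sketch below uses only the primary-study counts quoted above; the variable names are illustrative. Note that the per-disorder counts add up to more than 441, which suggests that some studies were counted under more than one disorder. That reading is an inference from the numbers, not something stated in this summary.

```python
# Primary-study counts per disorder group, as quoted in this article.
PRIMARY_TOTAL = 441
primary_by_disorder = {
    "Depressive disorders": 320,
    "Bipolar disorder": 61,
    "PTSD": 39,
    "OCD": 26,
    "Anxiety disorders": 22,
    "Substance use disorder": 25,
}

# Share of the 441 mapped primary studies that addressed each disorder.
shares = {name: count / PRIMARY_TOTAL for name, count in primary_by_disorder.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%}")

# The per-disorder counts exceed the total, so the shares need not sum to 100%.
count_sum = sum(primary_by_disorder.values())
print(f"Sum of per-disorder counts: {count_sum} (mapped total: {PRIMARY_TOTAL})")
```

Depression alone covers roughly 73% of the mapped primary studies, while no other disorder group reaches 15%.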

MRI and EEG Carried Most of the Precision Psychiatry Evidence

The techniques were also concentrated. MRI-based methods made up about three-quarters of primary studies, while EEG accounted for about one-fifth.

MRI can measure brain structure, functional activation, blood-flow proxies, and white-matter pathways. EEG captures electrical activity at the scalp and is cheaper and easier to deploy, but it has different spatial limits.

The map suggests that precision psychiatry research has leaned heavily on those two families of tools. Other imaging and neurophysiologic techniques appeared much less often.

Researchers also recorded whether studies used machine learning and whether models were validated. Many studies used algorithmic approaches, but validation remained a central problem.

Machine learning raises the stakes for sample size and external testing. A model can perform well inside one dataset because it learned the quirks of that dataset, not because it found a durable brain signature.

External validation poses a stricter test: can the same model classify or predict outcomes in a new group collected by different researchers? The evidence map indicates that this standard is still applied unevenly across the field.
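A toy simulation can show why in-sample performance is a weak guide when samples are small. The sketch below is not drawn from the review and uses no real data: it generates pure-noise "brain features" for 40 hypothetical participants, picks the single feature that best separates patients from controls in that sample, and then applies the same rule to 1,000 new simulated participants. All names, sizes, and the simple threshold rule are assumptions chosen for illustration.

```python
import random

def make_noise_dataset(n_participants, n_features, rng):
    """Return (features, labels) where the features carry no real signal."""
    labels = [i % 2 for i in range(n_participants)]  # balanced patient/control labels
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                for _ in range(n_participants)]
    return features, labels

def accuracy(features, labels, index, sign):
    """Accuracy of the rule: predict 'patient' (1) when sign * feature > 0."""
    hits = sum(int(sign * row[index] > 0) == label
               for row, label in zip(features, labels))
    return hits / len(labels)

def pick_best_rule(features, labels):
    """Search every feature and both signs; keep the best in-sample rule."""
    n_features = len(features[0])
    candidates = [(accuracy(features, labels, j, s), j, s)
                  for j in range(n_features) for s in (1, -1)]
    return max(candidates)

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_noise_dataset(40, 500, rng)        # small "discovery" sample
external_x, external_y = make_noise_dataset(1000, 500, rng)  # new participants

in_sample_acc, best_index, best_sign = pick_best_rule(train_x, train_y)
external_acc = accuracy(external_x, external_y, best_index, best_sign)

print(f"Best in-sample accuracy: {in_sample_acc:.0%}")
print(f"Same rule on new data:   {external_acc:.0%}")
```

Because the features are noise, any apparent accuracy in the small sample comes from searching many candidates. The same rule applied to new participants should land near chance, which is the gap external validation is designed to expose.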

Figure (Brain ASAP summary visual): mental-health biomarker evidence-map counts and caveats. The evidence map found many MRI and EEG mental-health biomarker studies, but the study base was concentrated in depression and often used small, cross-sectional samples.

Most Studies Tested Diagnosis, Not Long-Term Prognosis

About two-thirds of the primary studies focused on diagnostic separation. In practice, that often meant comparing patients with a mental-health diagnosis against healthy controls at one time point.

That design can reveal group differences, but it is a weak test of clinical utility. A clinician rarely needs to distinguish a carefully screened research group from a healthy control volunteer.

Real clinics usually present a harder mix: partial treatment response, overlapping anxiety and mood symptoms, medication effects, substance use, trauma history, sleep disruption, and medical comorbidity. A biomarker that only separates idealized groups may fail in that setting.

The clinical tasks that matter are harder:

Prediction: Which patient will relapse, recover, or remain impaired?
Treatment matching: Who is more likely to respond to medication, psychotherapy, neuromodulation, or combined care?
Differential diagnosis: Which symptom pattern reflects bipolar disorder, unipolar depression, PTSD, substance-related illness, or overlapping conditions?
Durability: Does the biomarker still work across hospitals, scanners, ages, medication states, and illness stages?

Longitudinal designs are better suited to those tasks because they can test whether a brain measure predicts what happens next, not only whether a group differs at baseline. The evidence map found fewer longitudinal designs and fewer studies built around those harder clinical tasks.

Small Samples Made Clinical Translation Harder

Sample size was one of the clearest limits. The review reported 263 primary studies with fewer than 100 participants, 82 studies with 100 to 200 participants, and only 9 studies with more than 1,000 participants.

Small studies can support early discovery, especially when brain imaging is expensive. They are less reliable for clinical prediction, where models must survive variation across sites, scanners, populations, medications, and symptom profiles.
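To put numbers on the sample-size point, the sketch below computes the share of studies in each size bracket quoted above and the approximate uncertainty around a classification accuracy at different sample sizes. The 80% accuracy is hypothetical, chosen only for illustration, and the normal-approximation interval is a standard textbook formula rather than anything the review applied.

```python
import math

PRIMARY_TOTAL = 441
# Sample-size brackets as quoted in this article.
size_brackets = {
    "fewer than 100": 263,
    "100 to 200": 82,
    "more than 1,000": 9,
}

for label, count in size_brackets.items():
    print(f"{label}: {count} studies ({count / PRIMARY_TOTAL:.1%})")

# Studies not covered by the brackets quoted in this article.
not_itemized = PRIMARY_TOTAL - sum(size_brackets.values())
print(f"not itemized here: {not_itemized} studies")

def accuracy_margin(accuracy, n, z=1.96):
    """Half-width of an approximate 95% interval (normal approximation)."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n)

HYPOTHETICAL_ACCURACY = 0.80  # illustrative value, not a result from the review
for n in (50, 100, 1000):
    margin = accuracy_margin(HYPOTHETICAL_ACCURACY, n)
    print(f"n={n}: 80% accuracy, plus or minus {margin:.1%}")
```

Under these assumptions, an 80% accuracy measured on 50 participants carries a margin of about 11 percentage points, compared with about 2.5 points at 1,000 participants. The quoted brackets also leave 87 of the 441 studies unitemized in this summary.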

The size issue is especially important for treatment response. Predicting who will improve after antidepressants, psychotherapy, neuromodulation, or combined care requires enough nonresponders and responders to test the model honestly.

The age distribution was also narrow. Only a small number of studies focused on older adults, even though late-life depression, dementia overlap, medication burden, vascular injury, and frailty can change both brain measures and psychiatric presentation.

The Main Result Is a Research-Readiness Warning

The review does not say MRI or EEG biomarkers have failed. It says the field has not yet produced enough validated, disorder-diverse, longitudinal evidence to justify broad clinical implementation.

That distinction is important. Brain-based markers may eventually help psychiatry move beyond symptom checklists, but a scan difference in a small cross-sectional study is not the same as a clinically deployable test.

The evidence map therefore gives researchers a practical agenda. Depression has enough preliminary work to support validation studies, while PTSD, OCD, anxiety disorders, bipolar disorder, and substance use disorder need broader and more balanced evidence.

For now, the best reading is cautious: the literature is large enough to guide better studies, but not strong enough to replace careful clinical assessment.

Larger Longitudinal Studies Are the Next Test

The practical next step is not simply more scans. It is better-designed studies that ask clinically relevant questions and validate models outside the original dataset.

Stronger studies would need larger samples, multi-site designs, longitudinal follow-up, medication and comorbidity detail, and broader coverage of PTSD, OCD, anxiety disorders, bipolar disorder, and substance use disorder.

Until then, mental-health brain biomarkers should be treated as promising research tools rather than stand-alone diagnostic instruments.

Citation: DOI: 10.1186/s12888-025-07429-4; Sowerby et al., neuroimaging and neurophysiologic biomarkers for diagnosis and prognosis across mental-health disorders, BMC Psychiatry 2026;26:375.

Study Design: Evidence map and scoping review of neuroimaging and neurophysiologic mental-health biomarker studies.

Sample Size: 441 primary studies and 27 systematic reviews from 58,824 unique records.

Key Statistic: 263 primary studies had fewer than 100 participants, and depression accounted for 320 of the 441 primary studies.

Caveat: Evidence maps describe the literature but do not formally rate individual-study risk of bias or prove clinical utility.