TL;DR: A 2026 study in Nature Human Behaviour found that many brain-connectivity feature sets predicted cognition and psychiatric traits nearly as well as top-ranked edges, making biomarker interpretation pipeline-dependent.
Key Findings
- The fifth decile beat the first: In HCPD executive function, mid-ranked features predicted scores at r = 0.16 vs. r = 0.14 for the canonical top decile. The “best” model was not the one built from the strongest individual edges.
- Discarded features still generalized externally: A ninth-decile PNC executive-function model predicted HCPD scores at r = 0.13 — nearly matching the first-decile model at r = 0.14, despite using non-overlapping features.
- PNC executive function held signal through decile 6: Decile 1 reached r = 0.33, decile 2 r = 0.32, and significance persisted past the midpoint. The “weak” edges were not noise.
- 84.64% of second-decile edges were new: Later deciles were not weak echoes of the same network. They represented largely distinct connectivity patterns that still predicted outcomes.
- Same accuracy, different brain map: For PNC executive function, decile 1 highlighted visual-frontoparietal connectivity; later deciles barely touched it. Equivalent prediction, divergent neurobiology.
- Tested across HBN, ABCD, HCPD, and PNC: 13 cognitive, developmental, age, sex, and psychiatric outcomes — not a single cherry-picked benchmark.
Source: Nature Human Behaviour (2026) | Adkinson et al.
Neuroimaging machine-learning papers often move from prediction to interpretation: if a model predicts behavior from selected brain connections, those connections are treated as candidate biomarkers.
The study tested that interpretive step directly. Different sets of discarded connectivity features could perform almost as well as the top-ranked set while pointing to different brain networks.
Accurate AI Prediction Did Not Identify One Stable Brain Circuit
Connectome-based prediction often compresses thousands of brain connections into a smaller selected feature set, then treats that set as the candidate biomarker network for a trait.
Prediction and biomarker interpretation are separate claims: a model can predict from one selected edge set while other connectivity subsets carry similar information.
Feature selection makes connectomes easier to interpret by ranking edges, keeping the strongest, and discarding the rest. The risk is that researchers may treat one selected feature set as the privileged biology rather than as one defensible subset among many.
The researchers tested whether discarded edges were truly noise, or whether they still held enough information to build similarly accurate models with different anatomical implications.
Slicing the Connectome Into Ten Layers
To find out, the team split each training set’s connectivity edges into ten non-overlapping deciles, ranked by how strongly each edge related to the target phenotype.
The top decile held the strongest individual associations. The bottom deciles held the edges a standard pipeline would have thrown out before interpretation ever started.
Then they ran connectome-based predictive modeling on each decile separately. If the standard pipeline’s premise is correct, only the top deciles should predict well. The lower ones should fall off into noise.
That is not what happened. In PNC executive function:
- Decile 1: prediction reached r = 0.33.
- Decile 2: prediction stayed nearly identical at r = 0.32.
- Through decile 6: prediction remained statistically significant.
In HCPD executive function, the fifth decile outperformed the first: r = 0.16 versus r = 0.14. The gap is small, but it cuts against the “highest-ranked = privileged” assumption.
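As a rough illustration of the decile procedure described above, the following Python sketch ranks edges by absolute Pearson correlation with the phenotype, splits them into ten non-overlapping bins, and fits a simplified connectome-based predictive model on each bin. The function names, the sign-weighted sum score, and the synthetic data are assumptions made for illustration. This is not the study's code, and the paper's actual ranking statistic and cross-validation scheme may differ.

```python
import numpy as np

def edge_correlations(X, y):
    """Pearson r between each edge (column of X) and the phenotype y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

def split_into_deciles(r, n_bins=10):
    """Rank edges by |r| (strongest first) and cut into non-overlapping bins."""
    order = np.argsort(-np.abs(r))
    return np.array_split(order, n_bins)

def fit_cpm(X, y, edges, r):
    """Simplified CPM: sign-weighted edge sum, then a one-predictor linear fit."""
    signs = np.sign(r[edges])
    score = X[:, edges] @ signs
    slope, intercept = np.polyfit(score, y, 1)
    return signs, slope, intercept

def predict_cpm(X, edges, model):
    signs, slope, intercept = model
    return slope * (X[:, edges] @ signs) + intercept

# Synthetic data only (hypothetical, not study data): one latent trait
# loads weakly on many edges, so information is spread across the connectome.
rng = np.random.default_rng(0)
n_train, n_test, n_edges = 200, 100, 500
latent = rng.normal(size=n_train + n_test)
loadings = rng.normal(scale=0.3, size=n_edges)
X = np.outer(latent, loadings) + rng.normal(size=(n_train + n_test, n_edges))
y = latent + rng.normal(size=n_train + n_test)
X_tr, X_te = X[:n_train], X[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]

# Rank and split using training subjects only, then score each decile
# on held-out subjects.
r = edge_correlations(X_tr, y_tr)
deciles = split_into_deciles(r)
decile_r = []
for edges in deciles:
    model = fit_cpm(X_tr, y_tr, edges, r)
    pred = predict_cpm(X_te, edges, model)
    decile_r.append(np.corrcoef(pred, y_te)[0, 1])

for d, value in enumerate(decile_r, start=1):
    print(f"decile {d}: held-out r = {value:.2f}")
```

On real data, the ranking would be recomputed inside each cross-validation fold so that held-out subjects never influence which edges land in which decile.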

External Validation Changed Which Brain Connectivity Biomarkers Looked Best
External validation is supposed to be where shaky neuroimaging interpretations get caught. A model trained on one dataset is tested on a fully independent one, and the interpretation attached to the original brain map either holds up or falls apart.
Here, the discarded features survived. A PNC executive-function model built from the ninth decile generalized to HCPD almost as well as the first-decile model:
- Ninth-decile PNC model: generalized to HCPD at r = 0.13.
- First-decile model: generalized at r = 0.14.
- Interpretation: external validation did not rescue one privileged feature set.
The two models did not share their edges. Both retained enough predictive information to generalize across cohorts.
If a model built from features the field would have discarded performs almost as well in a held-out cohort, the case for declaring any one decile the biomarker network gets considerably weaker.
Similar Prediction Accuracy Came From Different Connectivity Features
The harder finding is not that overlooked features predict well. It is that similar-performing deciles imply different brain maps.
In PNC executive function, connectivity between visual-association and frontoparietal networks dominated decile 1 — and largely faded out of later deciles. The networks weren’t just smaller copies of the original; they were structurally different.
The overlap numbers make this concrete. Compared with the first decile:
- Decile 2: 84.64% of edges were new, leaving only about 15% overlap.
- Decile 3: 38.82% of edges were new.
- Decile 5: 22.19% of edges were new.
These are not weak variants of one stable biomarker. They are different anatomical accounts of the same trait, generated by changing one ranking choice that researchers usually treat as cosmetic.
A significant model surfaces only the subset of brain-wide information that rises under one ranking scheme. A non-overlapping subset can imply a different mechanism without much loss of accuracy.
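The article does not describe how the overlap percentages were computed, so the sketch below shows only one simple way to compare two selected edge sets: the share of edges in one set that are absent from a reference set, plus a count of edges per network pair as a crude "brain map". The function names, the toy node-to-network assignment, and the edge lists are all hypothetical.

```python
from collections import Counter

def percent_new(reference_edges, other_edges):
    """Percent of edges in other_edges that do not appear in reference_edges."""
    ref = set(reference_edges)
    other = set(other_edges)
    if not other:
        raise ValueError("other_edges is empty")
    return 100.0 * len(other - ref) / len(other)

def network_pair_counts(edges, node_to_network):
    """Count edges per unordered pair of networks."""
    counts = Counter()
    for i, j in edges:
        pair = tuple(sorted((node_to_network[i], node_to_network[j])))
        counts[pair] += 1
    return counts

# Hypothetical toy parcellation: six nodes in three made-up networks.
node_to_network = {
    0: "visual", 1: "visual",
    2: "frontoparietal", 3: "frontoparietal",
    4: "default", 5: "default",
}
# Edges are (i, j) node pairs with i < j; both sets are invented.
set_a = [(0, 2), (1, 3), (0, 3), (1, 2)]
set_b = [(0, 2), (2, 4), (3, 5), (4, 5)]

new_pct = percent_new(set_a, set_b)
map_a = network_pair_counts(set_a, node_to_network)
map_b = network_pair_counts(set_b, node_to_network)

print(new_pct)  # 75.0: three of the four edges in set_b are not in set_a
print(map_a)    # every edge in set_a links visual and frontoparietal nodes
print(map_b)    # set_b is spread across other network pairs
```

In this toy case the two edge sets are the same size but imply different network-level stories, which is the kind of divergence the overlap figures above describe.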
Feature Selection Changed Brain Biomarker Results Across AI Models
Feature selection was not useless. In ridge regression, selection generally improved performance.
The narrower argument is that once selected features get treated as the unique neurobiology of a phenotype, the conclusion is doing more interpretive work than the data support.
Two reasons lower-ranked edges may keep predicting well:
- Redundancy: brain connectomes are highly autocorrelated, so the strongest edges overlap with lower-ranked ones in the information they carry.
- Alternative feature pools: lower-ranked but less-correlated edges can recapture similar predictive information from a different anatomical angle.
That is the optimistic reading.
The cautionary reading is that lower-ranked features may also be picking up confounds, demographic stereotypes, or other information that the field would not endorse as biology if it had a clearer view of the model’s reasoning.
The researchers treat that ambiguity as a warning about interpretation rather than as a license to throw every feature into a model.
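To make the redundancy explanation concrete, here is a toy calculation under a strong simplifying assumption that is not taken from the study: each edge equals a loading times one latent trait, plus independent unit-variance noise. Under that assumption, the correlation between the trait and the sum of k equally loaded edges is loading * sqrt(k) / sqrt(loading^2 * k + 1), so many individually weak edges can jointly track the trait well. The function name and all parameter values are hypothetical.

```python
import numpy as np

def sum_score_corr(loading, k):
    """Closed-form correlation between a latent trait z and the sum of k edges,
    each modeled as loading * z + independent unit-variance noise."""
    return loading * np.sqrt(k) / np.sqrt(loading ** 2 * k + 1)

# Simulation check of the closed form (synthetic data only).
rng = np.random.default_rng(1)
n, k = 5000, 100
z = rng.normal(size=n)
strong = 0.3 * z[:, None] + rng.normal(size=(n, k))  # "top-ranked" style edges
weak = 0.1 * z[:, None] + rng.normal(size=(n, k))    # "discarded" style edges

emp_strong = np.corrcoef(strong.sum(axis=1), z)[0, 1]
emp_weak = np.corrcoef(weak.sum(axis=1), z)[0, 1]

# A single weak edge correlates about 0.10 with z, yet the sum of 100 such
# edges has a theoretical correlation of about 0.71 with z.
print(f"strong: theory {sum_score_corr(0.3, k):.2f}, simulated {emp_strong:.2f}")
print(f"weak:   theory {sum_score_corr(0.1, k):.2f}, simulated {emp_weak:.2f}")
```

Real connectome edges have correlated noise and unequal loadings, so this toy model overstates how cleanly weak edges aggregate. It only shows why a low rank for individual edges does not by itself imply an uninformative feature set.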
Psychiatric Biomarkers Need Stability Across Feature-Selection Pipelines
This lands hardest in psychiatry, where biomarker claims already outpace reliability. If one decile gives a “depression network” and another non-overlapping decile gives a similarly predictive but different map, the colorful network diagrams in those papers are saying less than they appear to.
A second possibility is subtype structure. Multiple deciles may work because different subgroups of people are best captured by different feature sets — meaning some of today’s interpretive instability may be a disguised subtype problem, not a pure modeling bug.
Before any neuroimaging biomarker is used for diagnosis, prognosis, or treatment selection, the bar should be higher than “predicts above chance.” The biological interpretation also needs to stay stable across alternative feature sets; otherwise the reported brain map may reflect a pipeline choice more than a reproducible trait network.
Citation: DOI: 10.1038/s41562-026-02447-y; Adkinson et al.; Feature selection leads to divergent neurobiological interpretations of brain-based machine learning biomarkers; Nature Human Behaviour; 2026.
Study Design: Cross-dataset connectome-based predictive modeling with decile-level feature stratification.
Sample Size: 12,200 participants across HBN, ABCD, HCPD, and PNC, spanning 13 cognitive, developmental, demographic, and psychiatric outcomes.
Key Statistic: In the reported examples, lower-ranked connectome deciles predicted at r values within about 0.02 of the top-ranked decile (e.g., HCPD executive function: decile 5 r = 0.16 vs. decile 1 r = 0.14; PNC-to-HCPD external validation: decile 9 r = 0.13 vs. decile 1 r = 0.14), with up to 84.64% non-overlapping edges.






