Abstract / Summary

Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia and confers a four- to fivefold increase in ischemic stroke risk, accounting for approximately 15% to 20% of all stroke events globally. Despite this burden, the predominant risk stratification tool, the CHA2DS2-VASc score, achieves only modest discrimination, constrained by a static, additive architecture that cannot capture the nonlinear, high-dimensional interactions inherent in real-world electronic health record (EHR) data. This limitation creates a dual clinical hazard: under-anticoagulation of high-risk patients and unnecessary bleeding exposure for those whose risk is overestimated. This study aimed to systematically evaluate the predictive performance, methodological rigor, and clinical readiness of machine learning (ML) models derived from EHR data for predicting ischemic stroke in patients with AF. Following PRISMA 2020 guidelines, PubMed, Embase, Scopus, and Web of Science were systematically searched from inception through September 2025. Studies were eligible if they developed or validated ML models for ischemic stroke prediction using EHR data in adults with AF and reported at least one quantitative performance metric. Methodological quality was assessed using the PROBAST and TRIPOD-AI frameworks. Eight studies (2017 to 2024) encompassing 809,523 patients across seven countries were included. Supervised ensemble methods consistently outperformed CHA2DS2-VASc, with AUROCs ranging from 0.66 to 0.91 versus 0.54 to 0.68 for the traditional score. However, performance varied substantially: several models achieved only marginal gains (AUROC 0.63 to 0.69), and the wide AUROC range reflects pronounced heterogeneity rather than uniform superiority.
Critical barriers persist: only one study performed external validation; fewer than half applied explainable AI techniques; class imbalance was rarely addressed; and 88% of studies (7 of 8) were rated at high risk of bias in the PROBAST analysis domain, a finding that substantially limits confidence in the reported performance estimates. Given these pervasive methodological limitations, including high analytic risk of bias, near-absence of external validation, and limited model interpretability, claims of ML superiority over CHA2DS2-VASc must be interpreted with caution. Although ML models show potential gains in discrimination, current evidence is insufficient to support clinical adoption. Translating algorithmic promise into bedside impact requires dynamic longitudinal modeling, rigorous multisite external validation, transparent risk attribution, and prospective evaluation within real-world EHR workflows.
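
To make the contrast between an additive score and the AUROC comparison concrete, a minimal Python sketch follows. It encodes the standard CHA2DS2-VASc point assignments as a fixed sum with no interaction terms, and computes AUROC as the Mann-Whitney probability that a randomly chosen patient who had a stroke receives a higher score than one who did not (ties counted as one half). The `Patient` fields and the function names `cha2ds2_vasc` and `auroc` are hypothetical illustrations, not taken from any of the reviewed studies, and the sketch is not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record holding the inputs CHA2DS2-VASc uses."""
    age: int
    female: bool
    chf: bool            # congestive heart failure / LV dysfunction
    hypertension: bool
    diabetes: bool
    stroke_tia: bool     # prior stroke, TIA, or thromboembolism
    vascular: bool       # prior MI, peripheral artery disease, or aortic plaque


def cha2ds2_vasc(p: Patient) -> int:
    """Fixed additive point score (range 0 to 9); no interaction terms."""
    score = 0
    score += 1 if p.chf else 0            # C: heart failure
    score += 1 if p.hypertension else 0   # H: hypertension
    if p.age >= 75:                       # A2: age >= 75 scores 2
        score += 2
    elif p.age >= 65:                     # A: age 65 to 74 scores 1
        score += 1
    score += 1 if p.diabetes else 0       # D: diabetes
    score += 2 if p.stroke_tia else 0     # S2: prior stroke/TIA/thromboembolism
    score += 1 if p.vascular else 0       # V: vascular disease
    score += 1 if p.female else 0         # Sc: sex category (female)
    return score


def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as the probability that a random positive (stroke) case
    outranks a random negative case; tied scores contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because the score takes only ten integer values, many patients with and without a subsequent stroke share the same score, and each such tie contributes only one half to the AUROC; this is one plausible contributor to the modest discrimination of coarse additive scores. A model producing a continuous risk estimate could be passed to the same `auroc` function, so the comparison reflects ranking quality only, not calibration.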

Primary Source

International Journal of Medical Informatics

Advancing stroke prevention in atrial fibrillation: a systematic review of machine learning-based risk prediction models.
