CardioLens: Automated Segmentation and Ejection Fraction Estimation

Abstract

Manual analysis of echocardiograms, a cornerstone of cardiac diagnostics, is time-consuming and prone to inter-observer variability. This project, CardioLens, introduces a deep learning framework that automatically segments the left ventricle and estimates the Left Ventricular Ejection Fraction (LVEF) from echocardiogram videos. The dual-task pipeline uses a DeepLabV3 model with a ResNet-101 backbone for semantic segmentation and an R(2+1)D video model for LVEF estimation. A key contribution is the optimization of the workflow with the Intel® AI software stack: the Intel® Extension for PyTorch (IPEX) accelerates training, and the Intel® Distribution of OpenVINO™ Toolkit accelerates inference, cutting inference time by nearly 50% on standard Intel® CPUs. The final segmentation model achieves a Dice score of 0.92, offering an accurate, efficient, and accessible tool to support clinical decision-making in cardiology.

Project Outcomes

Panels: Input Raw Echo Video; Predicted Mask of Left Ventricle; Algorithmic ECG from Mask

Introduction

Cardiovascular Diseases (CVDs) remain the leading cause of mortality worldwide, making accurate and timely diagnosis a critical global health priority. Echocardiography stands as the most widely used non-invasive imaging modality for assessing cardiac function. A key biomarker derived from these scans is the Left Ventricular Ejection Fraction (LVEF) — the percentage of blood pumped out of the left ventricle with each contraction. LVEF is a fundamental indicator of cardiac health and is crucial for diagnosing and managing conditions like heart failure.

Fig. 1 — Anatomical diagram of the human heart, emphasizing the left ventricle's role in systemic circulation.

Despite its importance, the clinical workflow for LVEF assessment faces significant challenges: manual tracing by a sonographer is time-consuming and subject to high inter-observer variability. CardioLens is an end-to-end automated framework designed to address these limitations. Its dual-task AI pipeline performs both semantic segmentation of the left ventricle and direct LVEF estimation from raw echocardiogram videos, and it is optimized with Intel® AI tooling for CPU deployment.

Methodology

Our methodology is built around a comprehensive pipeline that automates the entire process from video input to diagnostic output, using the EchoNet-Dynamic Dataset — a large, publicly available collection of echocardiogram videos with associated LVEF labels and left-ventricle tracings.

Fig. 2 — End-to-end architecture of CardioLens, detailing the data flow from input video to final diagnostic outputs.

Dataset

The EchoNet-Dynamic dataset from Stanford University was partitioned into three subsets: 74.4% for training, 12.8% for validation (hyperparameter tuning), and 12.7% for testing (final evaluation).

The distribution of Ejection Fraction (EF) values across patient videos is concentrated between 50% and 70%, with a longer tail toward lower EF values (a left-skewed shape), representative of a real-world clinical population.

Fig. 3 — Distribution of Ejection Fraction values; the scatter plots explore the relationship between EF and volumetric measurements (EDV / ESV).

Models & Training

Three segmentation architectures were evaluated: the transformer-based Intel DPT-Large model and two DeepLabV3 variants (with MobileNetV3-Large and ResNet-101 backbones). DeepLabV3 with ResNet-101 was selected because it achieved the highest overall Dice score (0.9209, versus 0.8836 for MobileNetV3 and 0.5632 for DPT). Following segmentation, LVEF is estimated using an 18-layer R(2+1)D video model that learns spatio-temporal features to compute End-Diastolic Volume (EDV) and End-Systolic Volume (ESV), from which the ejection fraction follows:

LVEF Formula

EF (%) = (EDV − ESV) / EDV × 100
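As a worked example of the formula, the sketch below computes the ejection fraction as a percentage, matching the definition given in the Introduction. The function name `ejection_fraction` and the sample volumes are illustrative assumptions, not values from this project.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return the ejection fraction in percent from EDV and ESV (same units)."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("ESV must lie between 0 and EDV")
    # Fraction of the end-diastolic volume ejected per beat, as a percentage.
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Illustrative volumes only (hypothetical, in millilitres).
print(round(ejection_fraction(120.0, 50.0), 1))  # 58.3
```

The input checks reject volume pairs that cannot occur physically, which is a cheap guard when the volumes come from a model's predictions.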

Intel Technologies

Training was accelerated with the Intel® Extension for PyTorch (IPEX). For deployment, models were converted to OpenVINO™ Intermediate Representation (IR) format, applying graph pruning, quantization, and kernel fusion — enabling real-time analysis on standard CPUs without requiring specialized hardware.

Fig. 4 — Intel® OpenVINO™ Toolkit for optimizing and deploying deep learning models at the edge.
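The two optimization steps described above could look roughly like the sketch below. It assumes the `intel_extension_for_pytorch` package and OpenVINO 2023.1 or later are installed; the function names and the default file name `cardiolens_model.xml` are hypothetical, and the project's actual conversion settings (including any quantization) are not shown. Imports are deferred so the snippet can be loaded without either package.

```python
def optimize_for_training(model, optimizer):
    """Apply IPEX operator and memory-layout optimizations before training."""
    import intel_extension_for_pytorch as ipex  # assumed to be installed

    model.train()
    # With an optimizer supplied, ipex.optimize returns (model, optimizer).
    return ipex.optimize(model, optimizer=optimizer)


def export_to_openvino(model, example_input, xml_path="cardiolens_model.xml"):
    """Convert a PyTorch model to OpenVINO IR and compile it for the CPU."""
    import openvino as ov  # assumed to be installed (2023.1 or later)

    model.eval()
    # Trace the PyTorch model with a sample tensor to build the OpenVINO graph.
    ov_model = ov.convert_model(model, example_input=example_input)
    # Writes the .xml topology file and a matching .bin weights file.
    ov.save_model(ov_model, xml_path)
    return ov.Core().compile_model(ov_model, "CPU")
```

The compiled model returned by `export_to_openvino` can be called directly on a NumPy array shaped like `example_input`.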

Experiments & Results

All experiments were conducted on a 12th-Gen Intel® Core™ i7-12650H CPU using PyTorch, TorchVision, Hugging Face Transformers, OpenCV, and the Intel® AI Analytics Toolkit.
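Both selected architectures ship with TorchVision. The sketch below shows one way they could be instantiated; the single-channel mask output, the size of the regression head, and the function names are assumptions for illustration, since the project's exact configuration is not stated here. Imports are deferred so the snippet can be loaded without TorchVision installed.

```python
def build_segmentation_model(num_classes=1):
    """DeepLabV3 with a ResNet-101 backbone, randomly initialised."""
    from torchvision.models.segmentation import deeplabv3_resnet101

    # weights=None and weights_backbone=None avoid downloading pretrained weights.
    return deeplabv3_resnet101(
        weights=None, weights_backbone=None, num_classes=num_classes
    )


def build_video_regressor(num_outputs=1):
    """18-layer R(2+1)D video network with a regression head."""
    from torch import nn
    from torchvision.models.video import r2plus1d_18

    model = r2plus1d_18(weights=None)
    # Replace the 400-way Kinetics classifier with a small regression layer.
    model.fc = nn.Linear(model.fc.in_features, num_outputs)
    return model
```

The segmentation model returns a dictionary whose `"out"` entry holds the per-pixel logits, and the video model expects clips shaped `(batch, channels, frames, height, width)`.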

Model Comparison

Metric	Intel DPT	DeepLabV3 ResNet-101	DeepLabV3 MobileNetV3
Loss	0.1419	0.0441	0.053
Overall Dice Score	0.5632	0.9209	0.8836
Diastolic Dice	0.5707	0.9058	0.8595
Systolic Dice	0.5584	0.9304	0.8994
Time / Epoch (s)	223.6	7.1	6.6
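The Dice scores in the table measure the overlap between a predicted mask and the reference tracing: twice the intersection divided by the sum of the two mask areas. A minimal pure-Python sketch follows; the function name and the toy masks are illustrative, and the project's own evaluation code may differ, for example in how empty masks are handled.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient between two binary masks given as 2D lists."""
    pred = [bool(p) for row in pred_mask for p in row]
    true = [bool(t) for row in true_mask for t in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * intersection / total


# Toy 2x3 masks: 3 predicted pixels, 3 reference pixels, 2 shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
true = [[0, 1, 0],
        [0, 1, 1]]
print(round(dice_score(pred, true), 3))  # 0.667
```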

Training Curves

The plots below show Dice score and loss progression over 40 epochs for the selected model. The Dice scores on both end-diastolic and end-systolic frames improve rapidly before stabilising, confirming that the model learns the cardiac structure boundaries.

Panels: (a) EDV Dice Score vs Epoch; (b) Training Loss vs Epoch; (d) Overall Dice Score vs Epoch

Fig. 5 — Training metrics over 40 epochs showing progressive improvement across all Dice measures and loss convergence.

Inference Speedup with OpenVINO™

Fig. 6 — Inference time for the segmentation task. OpenVINO™ consistently outperforms native PyTorch and IPEX-optimized inference.

Fig. 7 — Inference time for the LVEF estimation task. OpenVINO™ reduces processing time by nearly half.
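Latency comparisons like those in Figs. 6 and 7 can be reproduced with a small timing harness. The sketch below is one possible approach, not the project's benchmark script: the function names, warm-up count, and run count are assumptions. It also shows how a percentage reduction in time relates to a speedup factor.

```python
import statistics
import time


def median_latency_ms(infer, n_warmup=3, n_runs=20):
    """Median wall-clock time of a zero-argument callable, in milliseconds."""
    for _ in range(n_warmup):
        infer()  # warm-up runs are not timed
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def percent_reduction(baseline_ms, optimized_ms):
    """Percentage of the baseline time saved by the optimized backend."""
    return 100.0 * (baseline_ms - optimized_ms) / baseline_ms


def speedup_factor(baseline_ms, optimized_ms):
    """How many times faster the optimized backend is."""
    return baseline_ms / optimized_ms


# Halving the latency is a 50% reduction, i.e. a 2x speedup.
print(percent_reduction(200.0, 100.0), speedup_factor(200.0, 100.0))  # 50.0 2.0
```

Using the median rather than the mean keeps occasional slow runs, such as those caused by background processes, from distorting the comparison.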

Demo Outputs

Panels: Input Raw Echo Video; Predicted LV Mask; Generated ECG Visualization

Fig. 8 — End-to-end demo: raw input → segmentation overlay → ECG waveform.
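How the waveform is derived from the masks is not specified here. One plausible reading, sketched below purely as an assumption, is a per-frame trace of the segmented left-ventricle area, whose maximum and minimum approximate the end-diastolic and end-systolic frames. The function names and the toy masks are hypothetical.

```python
def area_trace(masks):
    """Number of left-ventricle pixels in each frame's binary mask."""
    return [sum(1 for row in mask for px in row if px) for mask in masks]


def ed_es_frames(trace):
    """Indices of the largest-area (ED) and smallest-area (ES) frames."""
    frames = range(len(trace))
    ed = max(frames, key=trace.__getitem__)
    es = min(frames, key=trace.__getitem__)
    return ed, es


# Three toy 2x2 masks with areas 4, 2 and 3 pixels.
masks = [
    [[1, 1], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
]
trace = area_trace(masks)
print(trace, ed_es_frames(trace))  # [4, 2, 3] (0, 1)
```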

Discussion

Our experiments confirm that CNN architectures such as DeepLabV3 with a ResNet-101 backbone can segment the left ventricle with high accuracy (overall Dice 0.92). More importantly, the ~50% reduction in inference time with OpenVINO™ shows that complex AI analysis can run efficiently on ubiquitous CPU hardware, lowering the barrier to clinical adoption. CardioLens could serve as an automated "second opinion" that helps reduce diagnostic errors and improve inter-operator consistency, and it could be integrated into existing PACS deployments.

A significant area for future work is extending and validating the framework on pediatric and infant cardiac data, which present unique challenges because of different heart rates and heart sizes. We also plan to extend the framework to detect other cardiac abnormalities beyond LVEF.

Conclusion

This project developed and validated CardioLens, an AI-powered framework for automated echocardiogram analysis. By selecting an appropriate deep learning architecture and applying hardware-aware optimization with the Intel® Distribution of OpenVINO™ Toolkit, we built a system that is both accurate (Dice 0.92) and computationally efficient (~50% lower inference time, roughly a 2× speedup). This is a significant step towards practical AI integration in routine cardiology workflows.