Empowering cardiovascular diagnostics with SET-MobileNet: A lightweight and accurate deep learning based classification approach

2026-02-24

Zunair Safdar, Jinfang Sheng, Muhammad Usman Saeed, Muhammad Ramzan, A. Al-Zubaidi,
Empowering cardiovascular diagnostics with SET-MobileNet: A lightweight and accurate deep learning based classification approach,
Image and Vision Computing,
Volume 162,
2025,
105684,
ISSN 0262-8856,
https://doi.org/10.1016/j.imavis.2025.105684.
(https://www.sciencedirect.com/science/article/pii/S0262885625002720)
Abstract: Cardiovascular diseases (CVDs) remain the leading cause of mortality worldwide, necessitating early detection and accurate diagnosis for improved patient outcomes. This study introduces SET-MobileNet, a lightweight deep learning model designed for automated heart sound classification, integrating transformers to capture long-range dependencies and squeeze-and-excitation (SE) blocks to emphasize relevant acoustic features while suppressing noise artifacts. Unlike traditional methods that rely on handcrafted features, SET-MobileNet employs a multimodal feature extraction approach, incorporating log-mel spectrograms, Mel-Frequency Cepstral Coefficients (MFCCs), chroma features, and zero-crossing rates to enhance classification robustness. The model is evaluated across multiple publicly available heart sound datasets, including CirCor, HSS, GitHub, and Heartbeat Sounds, achieving a state-of-the-art accuracy of 99.95% for 2.0-second heart sound segments in the CirCor dataset. Extensive experiments demonstrate that multimodal feature representations significantly improve classification performance by capturing both time-frequency and spectral characteristics of heart sounds. SET-MobileNet is computationally efficient, with a model size of 8.61 MB and single-sample inference times under 6.5 ms, making it suitable for real-time deployment on mobile and embedded devices. Ablation studies confirm the contributions of transformers and SE blocks, showing incremental improvements in accuracy and noise suppression.
Keywords: Heart sound classification; Cardiovascular; Deep learning; Transformers
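
Illustrative note: the abstract credits squeeze-and-excitation (SE) blocks with emphasizing relevant acoustic features while suppressing noise. The sketch below shows the generic SE mechanism (squeeze by global average pooling, excite through a small bottleneck, then rescale each channel), not the authors' implementation. The function name squeeze_excite, the channels-by-time list layout, the omission of biases, and the hand-picked weights are all assumptions made for illustration; the abstract does not give the reduction ratio or where the blocks sit in the network. It uses only the Python standard library.

```python
import math


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Apply a squeeze-and-excitation gate to a feature map.

    feature_map: list of C channels, each a list of activations.
    w_reduce:    R x C weight matrix for the bottleneck layer.
    w_expand:    C x R weight matrix for the expansion layer.
    Biases are omitted to keep the sketch short (an assumption).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_map]

    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]

    # Excitation, step 2: expand back to C channels, sigmoid gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]

    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, feature_map)]
    return scaled, gates


# Toy input: 2 channels, 4 time steps; weights are hand-picked, not learned.
features = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
w_reduce = [[1.0, -1.0]]    # R = 1, C = 2
w_expand = [[1.0], [-1.0]]  # C = 2, R = 1
scaled, gates = squeeze_excite(features, w_reduce, w_expand)
```

With these toy weights the first channel receives the larger gate, so its activations are mostly preserved while the second channel is attenuated. In a trained network the two weight matrices are learned, which is how the gate can come to favor informative channels over noisy ones.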