Implementation of Deep Learning for Measurement of Penile Curvature on Real 2D Intraoperative Images

2026-02-13

Sriman Bidhan Baray, Saidanvar Agzamkhodjaev, Mansour Ali, Muhammad E.H. Chowdhury, Tariq O. Abbas,
Implementation of Deep Learning for Measurement of Penile Curvature on Real 2D Intraoperative Images,
Journal of Pediatric Urology,
2025,
105703,
ISSN 1477-5131,
https://doi.org/10.1016/j.jpurol.2025.105703.
(https://www.sciencedirect.com/science/article/pii/S1477513125006941)
Abstract:
Background
Penile curvature (PC) may occur in up to 10% of male births worldwide and is typically associated with the birth defect hypospadias. While the extent of PC impacts surgical management and patient outcomes, curvature evaluation is inconsistent between surgeons due to a lack of reliable assessment techniques. Our goal was to create a dependable, automated deep-learning solution to precisely assess PC from real-time intraoperative 2D images.
Materials and Methods
A dataset of 421 images was assembled and annotated by four human experts. The annotations were used to calculate PC angles and determine the ground-truth curvature in degrees for each case. All images and ground-truth angles were used to train three different deep-learning models. First, a YOLOv8 model was trained to localize and crop the penile region. Next, a deep-learning model was employed to segment the shafts and generate binary mask images. In the final stage, a modified HRNet model that integrates angle error was used to predict four key points denoting the mid-axes of the proximal and distal shaft, and these landmarks were then used to calculate curvature automatically.
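The final step, turning four key points into a curvature angle, can be illustrated with a minimal sketch. The abstract does not specify the key-point ordering or the exact formula, so the following assumes that two points define the proximal mid-axis, two define the distal mid-axis, and curvature is the angle between the two direction vectors. The function name `curvature_angle` and its parameters are hypothetical.

```python
import math


def curvature_angle(prox_a, prox_b, dist_a, dist_b):
    """Angle in degrees between two axes given as (x, y) key points.

    Assumption (not from the paper): prox_a -> prox_b is the proximal
    mid-axis and dist_a -> dist_b is the distal mid-axis, both pointing
    from base toward tip. A straight shaft then yields 0 degrees.
    """
    ux, uy = prox_b[0] - prox_a[0], prox_b[1] - prox_a[1]
    vx, vy = dist_b[0] - dist_a[0], dist_b[1] - dist_a[1]
    if (ux == 0 and uy == 0) or (vx == 0 and vy == 0):
        raise ValueError("each axis needs two distinct key points")

    # atan2 of |cross| and dot is numerically stable near 0 and 180
    # degrees, unlike acos of a normalized dot product.
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical key points in pixel coordinates.
angle = curvature_angle((0, 0), (0, 10), (0, 10), (10, 20))
print(f"{angle:.1f} degrees")
```

Taking the absolute value of the cross product discards the direction of the bend; a signed variant would be needed if ventral and dorsal curvature had to be distinguished.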
Results
The proposed system localized penile areas reliably, with a mean average precision of 99.4%. The pipeline also performed strongly in the segmentation task, achieving an Intersection over Union of 83.56% and a Dice Similarity Coefficient of 91.02%. For angle prediction, the system achieved a mean absolute error of 7.9°. By comparison, variability among human raters ranged from 6.5° to 12.0° (median ≈ 8.9°), consistent with previously reported manual errors of 3.5–13.6°. Thus, the AI system matched or outperformed human raters, providing more consistent and reliable curvature estimation. The model achieved a median error of 7.8° across 421 images, with 82% of predictions within ±10° of ground truth. Only 6% of cases crossed the 30° surgical threshold, confirming the tool’s reliability for clinical decision-making.
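For readers unfamiliar with the reported metrics, the sketch below shows their standard definitions on toy data. It is not the authors' evaluation code: the helper names, the nested-list mask format, and the sample values are all assumptions made for illustration.

```python
import statistics


def iou_and_dice(pred, truth):
    """Intersection over Union and Dice for two equal-sized binary masks.

    Masks are given as 2D lists of 0/1 (an assumed format).
    """
    inter = union = pred_sum = truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            inter += p and t
            union += p or t
            pred_sum += p
            truth_sum += t
    if union == 0:
        # Both masks empty: treated here as perfect agreement.
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_sum + truth_sum)


def angle_error_summary(predicted, reference, tolerance=10.0):
    """Mean absolute error, median absolute error, and the fraction of
    predictions within +/- tolerance degrees of the reference angle."""
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    if not errors:
        raise ValueError("no angle pairs supplied")
    mae = sum(errors) / len(errors)
    within = sum(e <= tolerance for e in errors) / len(errors)
    return mae, statistics.median(errors), within


# Toy data, not taken from the study.
iou, dice = iou_and_dice([[1, 1], [0, 0]], [[1, 0], [0, 0]])
mae, median_err, within = angle_error_summary([10, 25, 40], [12, 20, 55])
print(iou, dice, mae, median_err, within)
```

For a single pair of masks the two overlap scores are linked by Dice = 2·IoU / (1 + IoU), which is why Dice is always at least as large as IoU. Averages taken over many images need not satisfy the identity exactly.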
Discussion
This study demonstrates the successful implementation of deep-learning, keypoint-based measurement of PC, which could significantly improve patient assessment by surgeons and hypospadiology researchers.
Keywords: Penile curvature; artificial intelligence; machine learning; hypospadias; chordee