Multimodal AI for Predicting IVF Pregnancy Outcomes

Original Title: Multimodal intelligent prediction model for in vitro fertilization

Journal: npj Digital Medicine

Overview

This study introduces VaTEP, a multimodal deep learning framework that integrates videos from time-lapse imaging systems of developing embryos with tabular clinical data. Developed and validated on data from 9,786 participants across three medical centers, VaTEP predicts three clinical outcomes: presence of a fetal heartbeat, singleton versus multiple pregnancy, and live birth versus miscarriage. A multi-task learning approach optimizes the three predictions jointly. The model achieved areas under the receiver operating characteristic curve (AUC) of 0.8000 for fetal heartbeat, 0.8823 for singleton versus multiple pregnancy, and 0.9258 for live birth versus miscarriage, exceeding the performance of senior embryologists. Analysis of the model identified maternal age, anti-Müllerian hormone (AMH) level, and endometrial thickness as key variables informing its decisions. The framework therefore offers a quantitative tool for embryo selection that accounts for both embryonic development and maternal physiology.
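To make the multi-task setup and the reported metric concrete, the sketch below sums a binary cross-entropy term for each of the three endpoints and computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. It is a minimal illustration under stated assumptions, not VaTEP's training code: the task names, the equal default weights, and the masking of endpoints whose label is undefined for a given transfer (for example, no live-birth label when no pregnancy occurred) are all hypothetical choices made here for clarity.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical identifiers for the three endpoints described in the study.
TASKS = ("fetal_heartbeat", "singleton_vs_multiple", "live_birth_vs_miscarriage")


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability (clipped for stability)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(
    probs: Dict[str, float],
    labels: Dict[str, Optional[int]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task BCE losses.

    Assumptions (not from the paper): equal weights by default, and a task
    whose label is None is skipped because that outcome is undefined for
    this sample.
    """
    weights = weights or {t: 1.0 for t in TASKS}
    total = 0.0
    for task in TASKS:
        label = labels.get(task)
        if label is None:
            continue  # masked endpoint contributes nothing to the loss
        total += weights[task] * bce(probs[task], label)
    return total


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise ranking: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because all three heads contribute to a single loss, gradients from each endpoint update the shared encoder, which is the mechanism by which multi-task learning lets related outcomes share features. The pairwise AUC is quadratic in the number of samples; for large evaluation sets a rank-based formula is usually preferred, but the pairwise form makes the definition explicit.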

Novelty

The novelty lies in an integrated end-to-end architecture and in pre-training tasks designed to strengthen the video representation. Unlike models that process video and clinical data separately, VaTEP uses a cross-attention mechanism so that the two modalities interact directly. A key technical contribution is the use of two pre-training tasks, video reconstruction and embryo developmental phase prediction, which let the encoder learn spatiotemporal patterns and biological milestones before it is fine-tuned on pregnancy outcomes. The model also uses a multiple-frame sampling strategy to capture information from the entire developmental sequence efficiently. Extending the prediction targets to include multiple-pregnancy risk and live birth makes the assessment more comprehensive, and the multi-task framework shares features across these related clinical endpoints, improving generalization compared with single-task systems.
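The two mechanisms named above, frame sampling across the whole video and cross-attention between modalities, can be sketched in a few lines. This is a simplified illustration, not VaTEP's architecture: evenly spaced sampling stands in for the paper's multiple-frame strategy (whose exact scheme is not described here), the choice of clinical tokens as queries and video tokens as keys/values is an assumption, and learned projection matrices, multiple heads, and the pre-training tasks are omitted. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Evenly spaced frame indices spanning the whole time-lapse video.

    Assumption: uniform sampling that always includes the first and last frame.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = (num_frames - 1) / (num_samples - 1)  # step > 1, so indices are distinct
    return [round(i * step) for i in range(num_samples)]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention without learned projections.

    Here each query row is assumed to be a clinical-data token and each
    key/value row a feature vector from one sampled video frame.
    """
    d = len(keys[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return out
```

A possible usage, with made-up feature vectors: pick `idx = sample_frame_indices(len(frames), 8)`, gather `video_tokens = [frames[i] for i in idx]`, then compute `fused = cross_attention(clinical_tokens, video_tokens, video_tokens)`. Each fused row is a weighted mix of frame features, with weights set by how strongly a clinical token matches each frame, which is what allows clinical context to influence which developmental moments the model emphasizes.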

Potential Clinical / Research Applications

Clinically, the technology could serve as a standardized decision-support tool for reducing multiple pregnancies: by identifying the embryos with the highest live-birth potential, it could give clinicians the confidence to recommend single embryo transfer, minimizing risks such as preterm birth. In research, the model's identification of influential variables, such as hormone levels, helps clarify how embryo quality interacts with uterine receptivity. The framework could also be adapted to other medical tasks involving temporal data, such as monitoring fetal development or analyzing endoscopic videos. Because the model relies on readily available clinical data and standard imaging, it could be deployed in resource-limited settings where costly genetic testing is unavailable, helping to standardize quality of care across regions.
