Trained on 18 million embryo images, this powerful new AI tool could replace invasive IVF testing, offering faster, cheaper, and more accurate embryo selection.
Study: A foundational model for in vitro fertilization trained on 18 million time-lapse images. Image credit: Krakenimages.com/Shutterstock.com
A study published in Nature Communications introduces the Foundational IVF Model for Imaging (FEMI), a new approach to embryo assessment in in vitro fertilization (IVF). The study evaluated FEMI's performance on ploidy prediction, blastulation time prediction, embryo component segmentation, blastocyst quality scoring, embryo witnessing, and stage prediction.
What determines a successful IVF?
A successful IVF cycle depends on the accurate assessment and selection of viable embryos. Traditional embryo assessment methods have many limitations, including a lack of standardization, high costs, and varying regulations concerning preimplantation genetic testing for aneuploidy (PGT-A) across countries.
Variation in scoring systems and diagnostic tools significantly hampers consistent embryo selection, which may adversely affect IVF success rates and patient outcomes. A more efficient, affordable, and non-invasive method of assessing embryos is therefore urgently needed to improve IVF success rates and reduce the emotional and financial strain on patients.
The role of artificial intelligence (AI) in IVF
Artificial intelligence (AI) has been employed to predict embryo morphology and ploidy status, both of which are crucial to a successful IVF procedure. Although deep learning models such as STORK and ERICA have shown considerable potential in analyzing embryo morphology from images, these models depend on image-based data and embryologist input.
Researchers have continued to address these shortcomings by developing new models with higher efficacy or by improving the predictive accuracy of existing ones. For example, the Blastocyst Evaluation Learning Algorithm (BELA) can predict ploidy status using a multitask learning approach without any embryologist assistance. However, this model is limited to predicting embryo quality scores and ploidy status.
Vision Transformers (ViTs) are a foundation-model architecture that applies transformer attention to images, allowing the model to capture complex patterns and to process data at large scale. Although this architecture has been used to develop IVFormer, its application has been constrained by insufficient diversity in the training dataset.
Development of the FEMI model
The newly developed FEMI model is built on a Vision Transformer masked autoencoder (ViT MAE), which supports self-supervised learning (SSL) by reconstructing the original image from a partially masked input. The ViT MAE's encoder-decoder structure allows the model to learn the essential visual characteristics of the dataset without requiring labels.
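To make the masked-autoencoder idea concrete, the sketch below masks most of an image's patches and trains a small encoder-decoder to reconstruct the missing pixels. It is a minimal illustration only: the image size, patch size, masking ratio, and layer counts are assumptions, not FEMI's actual architecture or hyperparameters.

```python
# Minimal MAE-style self-supervised pretraining sketch (illustrative sizes only).
import torch
import torch.nn as nn

IMG, PATCH, MASK_RATIO, DIM = 224, 16, 0.75, 256
N_PATCHES = (IMG // PATCH) ** 2          # 14 x 14 = 196 patches per frame
PATCH_DIM = PATCH * PATCH                # grayscale: 256 pixel values per patch

def patchify(x):
    """(B, 1, H, W) -> (B, N_PATCHES, PATCH_DIM)."""
    B = x.shape[0]
    x = x.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)
    return x.reshape(B, N_PATCHES, PATCH_DIM)

class TinyMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(PATCH_DIM, DIM)
        self.pos = nn.Parameter(torch.zeros(1, N_PATCHES, DIM))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True), num_layers=4)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True), num_layers=2)
        self.head = nn.Linear(DIM, PATCH_DIM)   # reconstruct raw pixel values

    def forward(self, imgs):
        patches = patchify(imgs)                 # (B, N, PATCH_DIM)
        tokens = self.embed(patches) + self.pos  # (B, N, DIM)
        B, N, _ = tokens.shape
        n_keep = int(N * (1 - MASK_RATIO))

        # Random masking: keep a different subset of visible patches per image.
        noise = torch.rand(B, N, device=imgs.device)
        ids_keep = noise.argsort(dim=1)[:, :n_keep]
        gather_idx = ids_keep.unsqueeze(-1).repeat(1, 1, DIM)
        visible = torch.gather(tokens, 1, gather_idx)

        # Encode only visible patches, then decode with mask tokens re-inserted.
        encoded = self.encoder(visible)
        full = self.mask_token.repeat(B, N, 1).scatter(1, gather_idx, encoded)
        recon = self.head(self.decoder(full + self.pos))

        # Reconstruction loss is computed only on the masked patches.
        mask = torch.ones(B, N, device=imgs.device).scatter(1, ids_keep, 0.0)
        return (((recon - patches) ** 2).mean(-1) * mask).sum() / mask.sum()

# One illustrative pretraining step on a batch of stand-in grayscale frames.
model = TinyMAE()
frames = torch.rand(8, 1, IMG, IMG)
loss = model(frames)
loss.backward()
```

Because the loss covers only the hidden patches, the network must infer the missing regions of the embryo image from the visible ones; the pretrained encoder can then be reused for downstream tasks.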
The FEMI model was trained on a diverse dataset of approximately 18 million time-lapse images from multiple clinics. For the training dataset, time-lapse images taken after 85 hours post-insemination (hpi) across multiple z-axis (focal) depths were selected. To strengthen feature learning, images were tightly cropped around the embryo. The time-lapse image dataset was divided into an 80% training and 20% validation split, treating each image as an independent sample.
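A minimal sketch of this per-image split is shown below; the folder name and file pattern are hypothetical placeholders, not the study's actual data layout.

```python
# Illustrative 80/20 split treating every time-lapse frame as an independent sample.
import random
from pathlib import Path

frames = sorted(Path("timelapse_frames").glob("*.png"))   # hypothetical folder
random.seed(0)
random.shuffle(frames)

# 80% of frames for self-supervised training, 20% for validation.
split = int(0.8 * len(frames))
train_frames, val_frames = frames[:split], frames[split:]
print(f"train: {len(train_frames)}  validation: {len(val_frames)}")
```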
The current study evaluated FEMI's predictive accuracy on multiple clinical tasks: blastocyst quality scoring, ploidy prediction, blastulation time prediction, embryo component segmentation, embryo witnessing, and stage prediction. For most tasks, the model took single time-lapse images of an embryo as input; for blastocyst quality scoring and ploidy prediction, it used video input. Maternal age was incorporated into the ploidy prediction task because of its known impact on chromosomal abnormalities.
The model was also designed to be fine-tuned to clinic-specific scoring systems, enhancing real-world adaptability.
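As a rough illustration of this kind of task-specific adaptation, the sketch below attaches a small ploidy-prediction head to a pretrained image encoder and fuses maternal age with the image features. The stand-in encoder, feature size, and fusion strategy are assumptions for illustration, not the heads described in the paper.

```python
# Sketch: fine-tuning a pretrained encoder for ploidy prediction with maternal age.
import torch
import torch.nn as nn

class PloidyHead(nn.Module):
    """Binary euploid/aneuploid classifier on image features plus maternal age."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 128),   # +1 input for maternal age
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, features: torch.Tensor, age_years: torch.Tensor) -> torch.Tensor:
        x = torch.cat([features, age_years.unsqueeze(-1)], dim=-1)
        return self.mlp(x)                  # logits; apply sigmoid for probabilities

# Stand-in for the pretrained encoder (decoder discarded after pretraining);
# a real run would load the pretrained ViT MAE encoder weights here.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 256))
for p in encoder.parameters():
    p.requires_grad = False                 # freeze, or leave trainable to fine-tune

head = PloidyHead()
images = torch.rand(4, 1, 224, 224)                    # stand-in embryo frames
ages = torch.tensor([31.0, 35.0, 38.0, 42.0])          # maternal age in years
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])            # illustrative ploidy labels

logits = head(encoder(images), ages).squeeze(-1)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```

The same pattern (frozen or lightly fine-tuned encoder plus a small task head) is how a foundation model can be adapted to clinic-specific scoring systems with relatively little labeled data.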
FEMI shows potential to improve IVF success rates
FEMI's performance on ploidy prediction was compared against several benchmark image- and video-based models, including MoViNet, VGG16, EfficientNet V2, ResNet101-RS, ConvNext, and CoAtNet. The study found that FEMI significantly outperformed all comparison models and demonstrated superior accuracy in predicting ploidy even for low-quality embryos.
The current study highlighted that FEMI significantly outperformed the other reference models on overall blastocyst score (BS) and inner cell mass score prediction across multiple datasets. FEMI also outperformed all models on the expansion and trophectoderm scores with both image and video inputs.
Segmentation of blastocyst components, such as the zona pellucida (ZP), trophectoderm, and inner cell mass, is crucial for visualization and downstream analysis. However, FEMI did not significantly outperform the other models on these tasks; it showed only a non-significant increase in Dice score, suggesting comparable performance.
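For readers unfamiliar with the metric, the Dice score measures the overlap between a predicted segmentation mask and the ground-truth mask, ranging from 0 (no overlap) to 1 (perfect overlap). The short sketch below computes it on two synthetic masks; the masks are illustrative, not study data.

```python
# Dice coefficient between a predicted and a ground-truth binary mask:
# Dice = 2|A ∩ B| / (|A| + |B|).
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

# Synthetic example: two partially overlapping masks on a 64x64 frame.
truth = np.zeros((64, 64), dtype=bool)
truth[20:40, 20:40] = True
prediction = np.zeros((64, 64), dtype=bool)
prediction[25:45, 25:45] = True
print(f"Dice = {dice_score(prediction, truth):.3f}")   # ~0.56 for these masks
```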
FEMI outperformed all comparison models for embryo witnessing in all datasets except the Weill ES dataset. Accurately predicting blastulation time helps embryologists assess embryo quality and plan subsequent visualization processes. FEMI could accurately predict the hours post-insemination at which an embryo begins to form a blastocyst.
Accurately predicting the embryo stage is important for monitoring developmental progression and optimizing outcomes in IVF procedures. For FEMI and the benchmark models, stage prediction was formulated as a regression task rather than a traditional classification problem, allowing finer prediction granularity. On this task, FEMI achieved a top-1 accuracy of 60.31%, comparable to Embryovision's 60.58%, and outperformed the remaining models. These findings highlight the advantage of using SSL on large-scale, unlabeled data to capture complex developmental features.
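To illustrate the regression formulation, the sketch below trains a regression head to output a continuous stage value and then rounds that value back to the nearest integer stage to compute top-1 accuracy. The number of stages, the feature dimension, and the tiny head are assumptions for illustration, not the study's setup.

```python
# Discrete embryo stages treated as a regression target (illustrative only).
import torch
import torch.nn as nn

N_STAGES = 16                                   # number of discrete stages (assumed)

head = nn.Linear(256, 1)                        # regression head on 256-d features
features = torch.rand(8, 256)                   # stand-in encoder features
true_stage = torch.randint(0, N_STAGES, (8,))   # integer stage labels

pred = head(features).squeeze(-1)               # continuous stage estimate
loss = nn.functional.mse_loss(pred, true_stage.float())
loss.backward()

# Evaluation: round the continuous prediction back to the nearest valid stage.
pred_stage = pred.round().clamp(0, N_STAGES - 1).long()
top1 = (pred_stage == true_stage).float().mean().item()
print(f"top-1 accuracy: {top1:.2%}")
```

Regressing a continuous value preserves the ordering of stages, so a prediction that is off by one stage is penalized less than one that is off by five, which a plain classification loss would not capture.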
While FEMI consistently showed strong results, the degree of improvement varied across datasets and tasks.
Clinical implications
While FEMI demonstrated high performance across various tasks, the authors note several significant limitations. The segmentation and stage prediction tasks were trained and tested on the same datasets because of limited labeled data, potentially affecting generalizability. Ploidy prediction excluded mosaic embryos and only used data up to 112 hpi, even though some viable embryos develop later.
Many datasets were from high-resource clinics, which may limit FEMI’s immediate applicability in lower-resource or highly variable clinical environments.
Despite these limitations, FEMI’s design as a foundation model enables future fine-tuning and adaptation with broader datasets. The authors suggest using it as a backbone for other clinical prediction tasks, such as implantation or live birth, pending access to relevant labels.
The study presents FEMI as a promising tool to standardize and improve embryo assessment in IVF. Using self-supervised learning on a large, diverse dataset allows it to generalize well across tasks and outperform traditional models. The authors acknowledge its limitations, including segmentation performance and dataset scope, and these should be considered when evaluating its clinical use. With further validation and clinical trials, FEMI could serve as a powerful decision-support system in reproductive medicine.
Journal reference:
- Rajendran, S., et al. (2025). A foundational model for in vitro fertilization trained on 18 million time-lapse images. Nature Communications, 16(1), 1-15. https://doi.org/10.1038/s41467-025-61116-2