Predicting Cumulative Live Birth Rate for Patients Undergoing IVF/ICSI: A Machine Learning Breakthrough with XGBoost

Infertility affects 1 in 6 couples worldwide, and more are turning to in vitro fertilization (IVF) or intracytoplasmic sperm injection (ICSI) to start families. But these treatments are expensive, emotionally draining, and carry risks like ovarian hyperstimulation syndrome. For patients and doctors alike, knowing who is most likely to have a live birth could mean better, more personalized care—something traditional prediction models have struggled to deliver.

Researchers from Peking Union Medical College Hospital in China set out to change that. In a 2022 study published in the Chinese Medical Journal, they used a machine learning algorithm called XGBoost to predict cumulative live birth rates for IVF/ICSI patients with tubal or male infertility. The goal? To see if this advanced model could outperform the conventional logistic regression models doctors often rely on.

Why Predicting IVF/ICSI Success Matters

Traditional models use statistical methods like logistic regression to estimate live birth odds. But these models often fall short: their “discriminatory power”—how well they distinguish between patients who will have a live birth and those who won’t—is low. For patients, this means uncertainty. For doctors, it means less ability to tailor treatments (like adjusting medication doses or suggesting alternative protocols) to individual needs.
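Discriminatory power is usually summarized as the area under the ROC curve (AUC). One way to read it: the probability that a randomly chosen patient who had a live birth received a higher predicted score than a randomly chosen patient who did not. The sketch below computes that directly on made-up numbers. The function name and the toy data are illustrative assumptions, not taken from the study.

```python
def auc(labels, scores):
    """Pairwise AUC: chance that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = live birth, 0 = no live birth.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

toy_auc = auc(labels, scores)
print(round(toy_auc, 3))  # 8 of the 9 positive/negative pairs are ranked correctly
```

An AUC of 0.5 means the model ranks patients no better than a coin flip, and 1.0 means every patient with a live birth was scored above every patient without one.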

How the Study Worked

The team analyzed data from 3012 patients who underwent IVF or ICSI at their hospital between July 2014 and March 2018. They focused on cumulative live birth rates, which count every embryo transfer that follows a single ovarian stimulation: the first fresh embryo transfer and any subsequent freeze-thaw cycles. This reflects a patient's real chances better than counting a single transfer, because many patients use frozen embryos later.
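To make the difference between a cumulative rate and a fresh-transfer-only rate concrete, here is a minimal sketch on invented records. The record layout (stimulation ID, transfer type, outcome) and both function names are assumptions for illustration and do not describe the hospital's actual data.

```python
from collections import defaultdict

# Hypothetical records: (stimulation_id, transfer_type, live_birth)
transfers = [
    ("A", "fresh", False),
    ("A", "frozen", True),   # success came from a later frozen transfer
    ("B", "fresh", False),
    ("B", "frozen", False),
    ("C", "fresh", True),
    ("D", "frozen", False),  # freeze-all cycle with no fresh transfer
]


def cumulative_live_birth_rate(records):
    """Share of stimulations with at least one live birth from any transfer."""
    had_birth = defaultdict(bool)
    for stim_id, _transfer_type, live_birth in records:
        had_birth[stim_id] = had_birth[stim_id] or live_birth
    if not had_birth:
        raise ValueError("no records given")
    return sum(had_birth.values()) / len(had_birth)


def fresh_only_live_birth_rate(records):
    """Share of stimulations with a live birth from the fresh transfer alone."""
    stim_ids = {stim_id for stim_id, _t, _lb in records}
    if not stim_ids:
        raise ValueError("no records given")
    fresh_births = {
        stim_id
        for stim_id, transfer_type, live_birth in records
        if transfer_type == "fresh" and live_birth
    }
    return len(fresh_births) / len(stim_ids)


cumulative = cumulative_live_birth_rate(transfers)
fresh_only = fresh_only_live_birth_rate(transfers)
print(cumulative, fresh_only)
```

In this toy example, patient A only counts as a success under the cumulative definition, which is why the two rates differ.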

To build their model, they used:

  • Clinical details: Age, body mass index (BMI), infertility type (tubal vs. male), how long the couple had been infertile, and the type of controlled ovarian hyperstimulation (COH) protocol (e.g., long vs. short GnRH agonist).
  • Hormone levels: Serum levels of follicle-stimulating hormone (FSH), estradiol (E2), luteinizing hormone (LH), prolactin (PRL), and testosterone (T), measured at two key points: basal (before treatment) and two days after the trigger shot (which induces final egg maturation and ovulation).

They excluded patients using donor eggs or sperm, those with endometriosis or endocrine conditions (like diabetes or thyroid disease), and anyone with missing data. The study was approved by the hospital’s ethics board.

What the XGBoost Model Found

The results were striking:

  • XGBoost outperformed traditional logistic regression by a wide margin. The model's area under the curve (AUC), which measures how well it distinguishes between patients who will have a live birth and those who won't, was 0.901 (values above 0.8 are generally considered strong, and 0.5 is no better than chance). The logistic regression model scored 0.724 (moderate).
  • Both models were “well-calibrated”: Their predicted odds of live birth matched actual outcomes closely.
  • XGBoost had better clinical utility: A “decision curve analysis (DCA)” showed that the model helped doctors make better decisions than either guessing or using the traditional model. In short: XGBoost wasn’t just more accurate—it was more useful for real-world care.
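Decision curve analysis compares strategies by their "net benefit" at a chosen probability threshold: true positives are credited, and false positives are penalized by the odds of the threshold. The sketch below uses the standard net-benefit formula on invented predictions. It is not the authors' code, and the data, function names, and the 0.5 threshold are assumptions for illustration.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit = TP/n - FP/n * (threshold / (1 - threshold)).

    A patient is flagged as a likely live birth when the predicted
    probability is at or above the threshold.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    if n == 0:
        raise ValueError("no patients given")
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_flag_all(labels, threshold):
    """Reference strategy: treat every patient as a predicted live birth."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Hypothetical toy data: 1 = live birth, 0 = no live birth.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1]

model_nb = net_benefit(labels, probs, threshold=0.5)
flag_all_nb = net_benefit_flag_all(labels, threshold=0.5)
flag_none_nb = 0.0  # flagging nobody yields no true or false positives

print(model_nb, flag_all_nb, flag_none_nb)
```

A full decision curve repeats this calculation across a range of thresholds and plots one line per strategy. A model is considered clinically useful at the thresholds where its line sits above both reference strategies.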

The Top Factors Predicting Live Birth

The XGBoost model identified eight key features that most influenced outcomes. The top three were:

  1. Age: Older patients were less likely to have a live birth (a well-known factor in fertility).
  2. Estradiol (E2) levels two days after the trigger shot (E21): Higher levels correlated with better odds.
  3. Prolactin levels two days after the trigger shot (PRL1): Lower levels were linked to live birth.

Other important factors included basal LH levels (before treatment), post-trigger LH levels, and total FSH medication used. These make sense: Age affects egg quality, while hormone levels reflect how well the ovaries respond to stimulation.

Why This Is Better Than Traditional Models

Traditional models like the McLernon model (used in the UK) have AUC scores around 0.72–0.73, well below the 0.901 reported for XGBoost here. Other machine learning studies did not match this performance either: Qiu et al. used XGBoost for IVF prediction and reported an AUC of 0.73, and Amini et al. found random forests worked best, with an AUC of 0.81. Most importantly, none of these studies directly compared their models to traditional logistic regression. This team did, and XGBoost came out ahead on the same patients.

Limitations to Consider

The study isn’t perfect:

  • Retrospective design: Researchers looked back at old data, which can introduce bias (e.g., missing variables or unrecorded factors).
  • Single-center data: All patients were from one hospital, so results might not apply to diverse populations (e.g., patients with different medical histories or treatment protocols).
  • No external validation: The model hasn’t been tested on data from other hospitals, which is needed before it’s used widely.

What This Means for Patients and Doctors

For patients, this model could mean personalized care: A doctor could use XGBoost to estimate their odds of live birth before treatment, helping them decide whether to proceed, adjust their protocol, or consider other options (like donor eggs). For doctors, it’s a tool to make more informed decisions—reducing guesswork and improving patient outcomes.

This study was published in the Chinese Medical Journal in 2022 by Zhiyan Chen, Duoduo Zhang, Jingran Zhen, Zhengyi Sun, and Qi Yu from the Department of Obstetrics and Gynecology at Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College.

doi:10.1097/CM9.0000000000001874
