IDENTIFICATION OF PREGNANCY LOSS RISK FACTORS USING MACHINE LEARNING ALGORITHMS
Keywords:
Pregnancy loss, Risk factors, Machine learning, KNN, SVM, Decision Tree, Extra Trees Classifier
Abstract
Pregnancy loss, or spontaneous abortion, is defined as the loss of a fetus before the 20th week of gestation. According to the American College of Obstetricians and Gynecologists (ACOG), approximately 15–20% of clinically confirmed pregnancies end in pregnancy loss. This study used cross-sectional data from the Bureau of Statistics Punjab (BSP) to investigate the risk factors associated with pregnancy loss. Ten machine learning algorithms were applied to classify pregnancy loss and to compare their predictive performance: Logistic Regression (LR), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Naïve Bayes (NB), Regularized Naïve Classifier (RNC), Classification and Regression Trees (CART), Bernoulli Naïve Bayes (BNB), Passive Aggressive Classifier, and Extra Trees Classifier (ETC). KNN achieved the highest accuracy (91%), and every other algorithm exceeded 80% accuracy. Feature selection and importance analysis using LR, CART, and ETC identified the number of children ever born and the place of delivery as the factors with the greatest influence on pregnancy loss risk.
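To illustrate how a KNN classifier assigns a pregnancy-loss label and how an accuracy figure such as the 91% above is computed, here is a minimal pure-Python sketch. It rests on several assumptions that do not come from the study. The records are invented toy data, not BSP data. Each record has two hypothetical numerically encoded features: the number of children ever born, and a binary place-of-delivery flag (1 = health facility, 0 = elsewhere). The sketch also assumes Euclidean distance and k = 3.

```python
import math
from collections import Counter

# Hypothetical toy records: ((children_ever_born, delivered_at_facility), label),
# where label 1 = pregnancy loss and 0 = no loss. Invented for illustration only;
# these are NOT the BSP data used in the study.
TRAIN = [
    ((1, 1), 0), ((2, 1), 0), ((2, 0), 0), ((3, 1), 0),
    ((5, 0), 1), ((6, 0), 1), ((7, 0), 1), ((6, 1), 1),
]
TEST = [((1, 0), 0), ((3, 0), 0), ((6, 0), 1), ((7, 1), 1)]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training records."""
    neighbours = sorted(train, key=lambda rec: euclidean(rec[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test records whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"KNN accuracy (k=3): {accuracy(TRAIN, TEST):.2f}")
```

Because KNN relies on distances, a real analysis would normally standardize the features first, so that a count variable does not dominate a binary one. It would also estimate accuracy on a held-out split or through cross-validation. The abstract does not state which of these choices the authors made.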