Qing Lin, Zhuan-Ji Fang
Abstract
BACKGROUND Gestational diabetes mellitus (GDM) is a condition characterized by high blood sugar levels during pregnancy. The prevalence of GDM is rising globally, and the trend is particularly evident in China, where GDM has become a significant issue affecting the well-being of expectant mothers and their fetuses. Identifying and addressing GDM in a timely manner is crucial for maintaining the health of both. Therefore, this study aims to establish a risk prediction model for GDM and explore the effects of serum ferritin, blood glucose, and body mass index (BMI) on the occurrence of GDM.
AIM To develop a risk prediction model to analyze the factors leading to GDM, and to evaluate its efficiency for early prevention.
METHODS The clinical data of 406 pregnant women who underwent routine prenatal examination in Fujian Maternity and Child Health Hospital from April 2020 to December 2022 were retrospectively analyzed. The women were divided into two groups according to whether GDM occurred, and the factors related to GDM were analyzed. Then, according to the weight of the relevant risk factors, the data were divided into a training set and a validation set at a ratio of 7:3. Subsequently, risk prediction models were established using logistic regression and random forest methods, and the models were evaluated and verified.
RESULTS Pre-pregnancy BMI, previous history of GDM or macrosomia, hypertension, hemoglobin (Hb) level, triglyceride level, family history of diabetes, serum ferritin, and fasting blood glucose levels during early pregnancy were found to have a significant impact on the development of GDM (P < 0.05). For the nomogram model's prediction of GDM in pregnancy, the area under the curve (AUC) was 0.883 [95% confidence interval (CI): 0.846-0.921], and the sensitivity and specificity were 74.1% and 87.6%, respectively. The top five variables in the random forest model for predicting the occurrence of GDM were serum ferritin, fasting blood glucose in early pregnancy, pre-pregnancy BMI, Hb level, and triglyceride level. The random forest model achieved an AUC of 0.950 (95%CI: 0.927-0.973), a sensitivity of 84.8%, and a specificity of 91.4%. The DeLong test showed that the AUC of the random forest model was higher than that of the nomogram model (P < 0.05).
CONCLUSION The random forest model is superior to the nomogram model in predicting the risk of GDM. This method is helpful for early diagnosis and appropriate intervention of GDM.
Key Words: Gestational diabetes mellitus; Prediction model; Model evaluation; Random forest model; Nomograms; Risk factor
Gestational diabetes mellitus (GDM) is a metabolic disease that occurs or is first discovered during pregnancy[1,2] and is a risk factor for many adverse pregnancy outcomes. International data show that by 2021, the proportion of pregnant women with GDM worldwide had reached 16.7% and was continuing to grow[3]. Preventing GDM has become an important challenge for global health. Numerous studies have been conducted worldwide to predict the likelihood of GDM[4,5], but these models were developed in populations outside China, and their applicability to the Chinese population is not ideal. There are relatively few studies on the risk prediction of GDM in China, and this work needs to be strengthened. Therefore, the objective of this investigation is to establish a predictive model for GDM risk. By comparing the predictive efficacy of the nomogram model and the random forest model, this study aims to provide clinicians with a more scientific and accurate risk prediction tool for GDM, promote early diagnosis and intervention, and support corresponding intervention measures and health education for pregnant women.
A retrospective analysis was conducted of 406 pregnant women aged 22-43 years, with an average age of (31.17 ± 4.02) years, who underwent a routine prenatal examination in our hospital from April 2020 to December 2022. According to whether GDM occurred, they were divided into two groups: the GDM group (n = 197) and the non-GDM group (n = 209).
Inclusion criteria were: (1) Normal pregnant women; and (2) natural pregnancy. Exclusion criteria were: (1) Patients with diabetes who had been diagnosed or were receiving treatment before pregnancy; (2) women who could not participate in the survey and follow-up; (3) adolescent pregnant women (< 18 years old); (4) those suffering from other chronic diseases, such as cardiovascular disease, liver disease, renal dysfunction, or malignant neoplasms; and (5) pregnant women who used hormones or immunosuppressants.
The clinical data of early pregnancy (6-13 wk) were collected, including height, weight, pre-pregnancy body mass index (BMI), family history, hemoglobin (Hb) level, fasting blood glucose, and other indicators. Two persons were responsible for data entry and verification.
Pregnant women underwent an oral glucose tolerance test at 24 to 28 wk of gestation. After an 8-h fast, each woman consumed glucose water (75 g), and blood glucose was measured three times within 2 h. GDM was diagnosed if the blood glucose level was ≥ 5.1 mmol/L at fasting, ≥ 10.0 mmol/L at 1 h, or ≥ 8.5 mmol/L at 2 h[6].
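The diagnostic rule above can be written as a short check. The following is a minimal sketch, assuming that a single reading at or above its cutoff is sufficient for diagnosis, as the text states; the names `OGTTResult` and `has_gdm` are hypothetical and do not come from the study.

```python
from typing import NamedTuple

# Cutoffs (mmol/L) for the 75 g oral glucose tolerance test, as quoted in the text.
FASTING_CUTOFF = 5.1
ONE_HOUR_CUTOFF = 10.0
TWO_HOUR_CUTOFF = 8.5


class OGTTResult(NamedTuple):
    """Hypothetical container for one woman's three glucose readings (mmol/L)."""
    fasting: float
    one_hour: float
    two_hour: float


def has_gdm(result: OGTTResult) -> bool:
    """Return True if any single reading meets or exceeds its cutoff."""
    return (
        result.fasting >= FASTING_CUTOFF
        or result.one_hour >= ONE_HOUR_CUTOFF
        or result.two_hour >= TWO_HOUR_CUTOFF
    )
```

A woman with readings of 4.8, 9.9, and 8.4 mmol/L would not meet the criteria under this rule, whereas a fasting value of 5.1 mmol/L alone would.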
Statistical software SPSS 21.0 was used for data analysis. Measurement data were expressed as the mean and standard deviation, and group comparisons were conducted using the t-test. Enumeration data were expressed as number (percentage), and comparisons between groups were conducted using the χ² test or Fisher's exact test. Multivariate logistic regression analysis was performed, and statistical significance was set at P < 0.05. Based on machine learning methods, the nomogram prediction model was established in the R language, and the random forest model was established using the Random Forest package. Model performance was assessed using sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC AUC). AUCs were compared using the DeLong test.
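For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and the ROC AUC can be computed from labels and predictions. It is an illustration in plain Python rather than the SPSS or R procedures used in the study; the function names are hypothetical, labels are assumed to be coded 1 (GDM) and 0 (non-GDM), and the AUC is computed as the probability that a randomly chosen GDM case receives a higher score than a randomly chosen non-GDM case.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Sensitivity and specificity depend on the probability threshold used to turn scores into predictions, whereas the AUC summarizes discrimination across all thresholds.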
Comparison of the general data showed significant differences between the two groups in BMI, family history of diabetes, GDM history, macrosomia, hypertension, Hb level, triglyceride level, serum ferritin, and fasting blood glucose in the first trimester of pregnancy (P < 0.05). These results are shown in Table 1.
The occurrence of GDM was used as the dependent variable, and the variables that were statistically significant in the univariate analysis were entered into the multivariate logistic regression analysis as independent variables. The assignment criteria for each variable are shown in Table 2. The multivariate results showed that preconception BMI, family history of diabetes, GDM history, macrosomia, hypertension, Hb level, triglyceride level, serum ferritin, and fasting blood glucose in early pregnancy were influencing factors of GDM (P < 0.05), as shown in Table 3.

Table 2 Variable assignment

Table 3 Multivariate logistic regression analysis results of gestational diabetes mellitus
Nomogram model construction: The results of the multivariate logistic regression analysis were plotted as a nomogram model using the R language, as shown in Figure 1. A score was assigned to each risk factor in the nomogram, the scores were summed to give a total score, and the probability of GDM was read from the probability value corresponding to the total score.

Figure 1 Risk prediction nomogram model of gestational diabetes mellitus. BMI: Body mass index; DM: Diabetes mellitus; GDM: Gestational diabetes mellitus; Hb: Hemoglobin.
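Because the nomogram is drawn from the multivariate logistic regression, reading a probability from the total score corresponds to evaluating the fitted logistic function. A sketch of that relationship is given below. The symbols are notation assumed here for illustration and do not appear in the original analysis: $\beta_0$ for the intercept, $\beta_i$ for the regression coefficient of the $i$-th risk factor, and $x_i$ for its coded value under the assignment scheme in Table 2.

```latex
\[
P(\mathrm{GDM} = 1 \mid x_1, \dots, x_9)
  = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_{i=1}^{9} \beta_i x_i\right)\right]}
\]
```

In a standard logistic nomogram, each axis rescales one term $\beta_i x_i$ to a points value, so the total score is a linear function of the sum inside the exponential.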
Random forest prediction model construction: The nine statistically significant indicators from the univariate analysis were included in the random forest model, and their assigned values are shown in Table 2. With the number of trees fixed, the false positive rate of the model was smallest when mtry = 10. With mtry = 10, the model error stabilized at ntree = 500. Therefore, using the parameters mtry = 10 and ntree = 500, the top 5 variables for predicting the occurrence of GDM in the random forest model were serum ferritin, fasting blood glucose in the first trimester, BMI before pregnancy, Hb level, and triglyceride level, as shown in Figure 2.

Figure 2 Variable importance analysis of random forest model. A: The values of each variable were replaced with random numbers, and the resulting reduction in the accuracy of the random forest was measured; B: The importance of each variable was compared by calculating its effect on the heterogeneity of the observations at each node of the classification trees. The larger the value, the greater the importance of the variable. BMI: Body mass index; DM: Diabetes mellitus; GDM: Gestational diabetes mellitus; Hb: Hemoglobin.
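The accuracy-based importance in panel A can be illustrated with a simplified permutation procedure. The sketch below is not the routine used by the random forest software in the study, which works tree by tree on out-of-bag samples; it assumes a generic fitted classifier exposed as a hypothetical `predict` function and an evaluation set `X`, `y`, and measures how much accuracy drops when one column is shuffled.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for row, label in zip(X, y) if predict(row) == label) / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: Sequence[Row],
    y: Sequence[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean decrease in accuracy after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, which breaks its link with the outcome.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [
                list(row[:j]) + [value] + list(row[j + 1:])
                for row, value in zip(X, column)
            ]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A variable that the model does not rely on yields a drop close to zero, while a variable such as serum ferritin, ranked first in Figure 2, would be expected to yield a large drop.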
Comparison of the performance of the two predictive models: The discriminative ability of the models was assessed by the ROC AUC (Table 4 and Figure 3). The AUC of the random forest model was higher than that of the nomogram model (Z = -6.104, P < 0.001).

Table 4 Prediction performance evaluation results of the nomogram model and random forest model (%)
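The Z statistic reported for the AUC comparison comes from the DeLong test for two correlated ROC curves. The following is a minimal sketch of that test in plain Python, assuming both models are scored on the same women with labels coded 1 (GDM) and 0 (non-GDM); the function names are hypothetical, and the study itself would have relied on a statistical package rather than code like this.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def _psi(pos_score: float, neg_score: float) -> float:
    """1 if the positive case is ranked higher, 0.5 for a tie, otherwise 0."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _placements(
    y_true: Sequence[int], scores: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Per-case components of the AUC for positives (v10) and negatives (v01)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    v10 = [mean(_psi(p, n) for n in neg) for p in pos]
    v01 = [mean(_psi(p, n) for p in pos) for n in neg]
    return v10, v01


def delong_test(
    y_true: Sequence[int], scores_a: Sequence[float], scores_b: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (AUC of A, AUC of B, z statistic, two-sided P value).

    Raises ZeroDivisionError if the two models rank every case identically.
    """
    v10_a, v01_a = _placements(y_true, scores_a)
    v10_b, v01_b = _placements(y_true, scores_b)
    auc_a, auc_b = mean(v10_a), mean(v10_b)
    # Variance of the AUC difference from the paired per-case differences.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var_diff = variance(d10) / len(d10) + variance(d01) / len(d01)
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return auc_a, auc_b, z, p_value
```

The sign of z depends only on which model is passed first, so a negative value such as the one reported here is compatible with the second model having the larger AUC.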
GDM is a condition that affects glucose metabolism during pregnancy. Typically, it occurs after the 27th week of gestation, although some women may already have diabetes before conception. The pathogenesis of GDM is complex, and its etiology is undefined[7]. In this study, after the basic characteristics of pregnant women with and without GDM were compared, the factors affecting the occurrence of GDM were identified by multivariate logistic regression analysis. They included preconception BMI, family history of diabetes, GDM history, macrosomia, hypertension, Hb and triglyceride levels, serum ferritin, and fasting blood glucose in the first trimester. These findings are essentially consistent with those of Li et al[8] and Tong et al[9].
This study revealed that pregnant women with a positive family history of diabetes were more likely to develop GDM than those without such a history. Diabetes has a genetic component and can be passed on to the next generation, so pregnant women with a familial history of diabetes may carry a genetic predisposition that raises the likelihood of GDM. A positive family history of diabetes mellitus has been established as a risk factor for GDM in various national and international studies[10-12]. A woman who was diagnosed with GDM in a previous pregnancy is also more likely to have GDM in subsequent pregnancies, as confirmed by previous studies[13]. Therefore, doctors are advised to pay close attention during pregnancy to women with a family history of diabetes or a history of GDM.
This study found that hypertension plays an essential role in the development of GDM, and previous studies have confirmed that hypertension is an independent risk factor for GDM[14]. Hypertension may lead to the onset and progression of GDM by affecting placental blood flow and insulin sensitivity, causing islet cytopenia and dysfunction. In addition, this study found that excess preconception BMI is an independent risk factor for GDM. Overweight and obesity affect insulin metabolism and production and increase the body's need for insulin, thereby increasing the risk of GDM[15]. Therefore, controlling weight before pregnancy and maintaining a normal BMI can reduce the risk of GDM. For patients with hypertension during pregnancy, surveillance and intervention should be strengthened to reduce the risk of GDM.
Sissala et al[16] found that Hb level is a risk factor for GDM, which is consistent with the results of the present study. A possible reason is that the Hb level may affect the diastolic blood pressure of pregnant women, thereby increasing maternal peripheral vascular resistance. This condition may reduce the stiffness of the large arteries and lead to the formation of insulin resistance, thereby increasing the risk of GDM[17,18]. Serum ferritin is a major form of intracellular iron storage, and the body's iron stores are positively correlated with Hb levels. Research has indicated that pregnant women diagnosed with GDM have higher serum ferritin levels than their non-GDM counterparts. Regular measurement of Hb and serum ferritin levels during pregnancy can therefore help detect problems in a timely manner so that corresponding treatment measures can be taken. Studies have also demonstrated that lipid and lipoprotein abnormalities, including elevated triglycerides, are associated with insulin resistance and type 2 diabetes, and that triglyceride levels are significantly higher in patients with GDM than in those without[19,20]. Therefore, monitoring blood lipid levels during pregnancy is of great clinical significance for predicting the onset of GDM.
In this investigation, the nomogram model and the random forest model were established using preconception BMI, family history of diabetes, GDM history, macrosomia, hypertension, Hb and triglyceride levels, serum ferritin, and fasting blood glucose levels in the first trimester, and their predictive performance was compared. The random forest model achieved an AUC of 0.950 (95% confidence interval: 0.927-0.973), with a sensitivity of 84.8% and a specificity of 91.4%. Compared with the nomogram model, it had better calibration and prediction accuracy. A possible reason is that, compared with the logistic regression model, the random forest model is less prone to overfitting, has advantages in processing high-dimensional data, and does not require feature selection.
In summary, nine indicators, including preconception BMI, family history of diabetes, GDM history, macrosomia, hypertension, Hb and triglyceride levels, serum ferritin, and fasting blood glucose level in early pregnancy, effectively predicted the incidence of GDM. In this study, the predictive models for GDM risk assessment built on the results of multivariate analysis had a good predictive effect, and the random forest model was more efficient in predicting the risk of GDM and can effectively anticipate the likelihood of developing diabetes in pregnant women. This has important guiding significance for the prevention and treatment of GDM. However, this study collected data from only one hospital and the sample size was small, which are limitations. Larger samples are needed for large-scale model verification in the future to provide a reference for clinical prediction of the incidence of GDM.
Gestational diabetes mellitus (GDM) is a common metabolic disease during pregnancy, which has adverse effects on maternal and child health. The establishment and evaluation of risk prediction models can help to identify high-risk groups early and take corresponding intervention measures to reduce the risk in pregnant women and newborns. At present, research in this field mainly focuses on the screening of predictors and the construction of models and explores their reliability and practicability. These studies provide a theoretical basis and method support for the prevention and management of gestational diabetes.
The purpose of this study is to establish a reliable risk prediction model for gestational diabetes to help doctors detect and treat patients with GDM. The key issues to be solved in this study include determining the best predictors and establishing effective models. Solving these problems is of great significance for improving the diagnostic rate of early diabetes and reducing the risk of complications in pregnant women and fetuses. It will also have a positive effect on future research in this field.
The main objective of this study is to establish a reliable risk prediction model for GDM. The achieved goals include obtaining the risk factors of GDM, establishing a risk factor prediction model, and evaluating the model. The random forest model has a good prediction effect, which can effectively predict the risk of diabetes in pregnant women and indicate the direction for future research in this field.
In this study, a retrospective case analysis method was adopted, and the study subjects were stratified into two groups: Those with GDM and those without GDM. The general data of the two groups of pregnant women were investigated and analyzed, and risk prediction models for GDM during early pregnancy were established using both the logistic regression and random forest methods. The two models were then evaluated and validated. The novelty of the research methods lies in the adoption of machine learning methods, which improved the accuracy and reliability of the model.
This study successfully established risk prediction models for early gestational diabetes in pregnant women (a random forest model and a nomogram model). After a number of clinical factors were analyzed and screened, the random forest model showed high prediction accuracy and discriminative ability. This study provides strong support for early prevention and intervention of gestational diabetes in pregnant women and provides a reference for further research in this field. In the future, it is necessary to further expand the sample size, refine the factors considered, and verify the stability and applicability of the model.
This study proposed a model for predicting the likelihood of developing gestational diabetes during the early stages of pregnancy and compared the predictive effects of the random forest and nomogram models. The results suggested that the random forest model can more accurately predict the risk of gestational diabetes during early pregnancy.
Future research should focus on improving the risk prediction model of gestational diabetes in pregnant women and on improving the accuracy and stability of the model to meet clinical needs. Researchers should also explore new predictors, investigate pathological mechanisms, and identify intervention strategies to reduce the risk of diabetes and its complications in pregnant women and improve maternal health.
Author contributions: Lin Q designed and performed the research and wrote the paper; Fang ZJ designed the research and supervised the report.
Institutional review board statement: This study was reviewed and approved by the Ethics Committee of the Fujian Maternity and Child Health Hospital.
Informed consent statement: As the study used anonymous and pre-existing data, the requirement for informed consent from patients was waived.
Conflict-of-interest statement: We have no financial relationships to disclose.
Data sharing statement: No additional data are available.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Country/Territory of origin: China
ORCID number: Qing Lin 0009-0003-4682-7249; Zhuan-Ji Fang 0000-0001-6637-4556.
S-Editor: Qu XL
L-Editor: A
P-Editor: Chen YX
World Journal of Diabetes, 2023, Issue 10