A machine learning model for predicting severe mycoplasma pneumoniae pneumonia in school-aged children

Ye, Yingying; Gao, Zhenpeng; Zhang, Zhiling; Chen, Jianlong; Chu, Chu; Zhou, Weifang

doi:10.1186/s12879-025-10958-8

Research
Open access
Published: 21 April 2025

BMC Infectious Diseases volume 25, Article number: 570 (2025)

Abstract

Objective

To develop an interpretable machine learning (ML) model for predicting severe Mycoplasma pneumoniae pneumonia (SMPP) in order to provide reliable factors for predicting the clinical type of the disease.

Methods

We collected clinical data from 483 school-aged children with M. pneumoniae pneumonia (MPP) who were hospitalized at the Children's Hospital of Soochow University between September 2021 and June 2024. Difference analysis and univariate logistic regression were employed to identify predictors for training features in ML. Eight ML algorithms were used to build models based on the selected features, and their effectiveness was validated. The area under the curve (AUC), accuracy, five-fold cross-validation, and decision curve analysis (DCA) were utilized to evaluate model performance. Finally, the best-performing ML model was selected, and the Shapley Additive Explanations (SHAP) method was applied to rank the importance of clinical features and interpret the final model.

Results

After feature selection, 30 variables remained. We constructed eight ML models and assessed their effectiveness, finding that the CatBoost model exhibited the best predictive performance, with an AUC of 0.934 and an accuracy of 0.9175. DCA was used to compare the clinical benefits of the models, revealing that the CatBoost model provided greater net benefits than the other ML models within the threshold probability range of 34% to 75%. Additionally, we applied the SHAP method to interpret the CatBoost model, and the SHAP diagram was used to visually show the influence of predictor variables on the outcome. The results identified the top six risk factors as the number of days with fever, D-dimer, platelet count (PLT), C-reactive protein (CRP), lactate dehydrogenase (LDH), and the neutrophil-to-lymphocyte ratio (NLR).

Conclusions

The interpretable CatBoost model can help physicians accurately identify school-aged children with SMPP. This early identification facilitates better treatment options and timely prevention of complications. Furthermore, the SHAP algorithm enhances the model's transparency and increases its trustworthiness in practical applications.

Background

Mycoplasma pneumoniae is the most common pathogen causing community-acquired pneumonia (CAP) in hospitalized children aged five years and older. Since September 2023, there has been a significant increase in the number of M. pneumoniae cases among children in different regions of China [1]. In respiratory wards, more than 50% of hospitalized children have been diagnosed with MPP, particularly in the 6–11 age group, where M. pneumoniae is the primary cause [2, 3]. Although most children with M. pneumoniae infection experience mild symptoms and have a good prognosis, there has been a notable rise in macrolide resistance in recent years. It is estimated that over 80% of M. pneumoniae pneumonia cases in China involve strains that are resistant to macrolide antibiotics, contributing to an increase in the number of children diagnosed with SMPP [4, 5]. SMPP can lead to more serious inflammatory responses and complications, including necrotizing pneumonia, pulmonary embolism, and obstructive bronchiolitis. These complications pose a significant risk to children's health [6]. Therefore, using existing clinical data to predict the occurrence of SMPP in school-aged children is crucial for clinical practice.

As the incidence of SMPP grows, more studies are focusing on its early prediction. Current predictive models mainly rely on traditional logistic regression methods, which often struggle with data imbalance. Furthermore, these algorithms may not meet the precision needed for modern diagnostics and treatment. Currently, there is no comprehensive model that systematically integrates clinical features and effectively quantifies the early predictive capability for SMPP in school-aged children. With advancements in artificial intelligence, ML has been widely applied in various medical fields due to its ability to handle large datasets and its strong predictive power [7, 8]. Additionally, the SHAP algorithm can be used to address the "black box" problem associated with machine learning algorithms. This approach clarifies the predictions made by the optimal model, ranks the importance of the predictors, and provides interpretable outputs [9, 10]. This study aims to integrate multiple clinical and laboratory datasets to develop an ML model that assists frontline clinicians in the early prediction of SMPP in school-aged children, thereby facilitating early diagnosis and targeted treatment strategies.

Materials and methods

Ethics approval

This study was approved by the Ethics Committee of Children's Hospital of Soochow University (2024 CS098). Written informed consent was obtained from the guardians of all participants.

Study population and data collection

A total of 562 school-aged children with MPP who were hospitalized at the Children's Hospital of Soochow University from September 2021 to June 2024 were enrolled in this study. After applying the inclusion and exclusion criteria and excluding extreme or abnormal values, 483 samples were ultimately included. Clinical information was collected from the pediatric records within the electronic medical record system. The recorded data comprised the patient's age, sex, clinical symptoms, pulmonary complications, fever days, length of hospitalization, and the number and results of fiberoptic bronchoscopy procedures. Clinical symptoms were recorded within two hours of admission. Allergy history, including eczema, allergic rhinitis, allergic sinusitis, and urticaria, was also recorded. Laboratory test results obtained within 24 h of admission were included, such as complete blood count, CRP, comprehensive biochemical panel, humoral immunology, cardiac troponins, coagulation profile, and lymphocyte subpopulation analysis. Finally, the results of the initial chest imaging conducted after admission were documented.

SMPP diagnoses and inclusion and exclusion criteria

Diagnostic criteria

Diagnosis was based on the "Guidelines for the Diagnosis and Treatment of Mycoplasma Pneumonia in Children (2023 Edition)" [11]. MPP is defined by the presence of one or both of the following: (1) a single serum Mycoplasma pneumoniae (MP) antibody titer of ≥ 1:160 (PA method) or a fourfold or greater increase in paired serum MP antibody titers during the illness; (2) positive MP DNA or RNA. SMPP refers to MPP accompanied by one of the following conditions: poor general condition, significantly increased respiratory rate (RR > 70 or > 50 breaths/min), cyanosis, respiratory distress, involvement of ≥ 2/3 of a single lung lobe or multiple lung lobes, pleural effusion, pulse oxygen saturation ≤ 0.93, or extrapulmonary complications. NSMPP refers to children with MPP who do not meet the diagnostic criteria for SMPP.

Inclusion criteria

1) Age ≥ 6 years and ≤ 12 years; 2) Presence of respiratory symptoms and/or fever (> 37.3℃); 3) Chest imaging confirming pneumonia; 4) Positive MP-DNA/RNA in nasopharyngeal secretions or BALF, or a fourfold or greater increase in paired serum MP antibody titers.

Exclusion criteria

1) Age < 6 years or > 12 years; 2) Incomplete clinical data; 3) Presence of other respiratory diseases, such as bronchial asthma, bronchopulmonary dysplasia, tuberculosis, primary ciliary dyskinesia, cystic fibrosis, bronchial foreign body, lung tumors, and non-infectious interstitial lung disease; 4) Presence of immunodeficiency, congenital heart disease, or hereditary neurological disorders.

Predictors screening

By comparing the demographic characteristics, clinical features, laboratory tests, bronchoscopy findings, and chest imaging results of the SMPP group and the NSMPP group, we selected variables with statistically significant differences (P < 0.05). Subsequently, we applied univariate logistic regression to these variables and retained those with P < 0.1.

Model development

This study randomly divided all enrolled children into a training cohort (N = 387) and a testing cohort (N = 96) in an 80:20 ratio. Based on the type of problem and the characteristics of the data, we selected eight machine learning models suitable for classification and prediction during model training. These models include logistic regression (LR), support vector machine (SVM), classification decision tree (CDT), random forest (RF), gradient boosting decision tree (GBDT), light gradient boosting machine (LightGBM), CatBoost, and extreme gradient boosting (XGBoost). Figure 1 illustrates the workflow of this research.
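The 80:20 split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `random_split`, the seed, and the choice to round the test-set size down are assumptions, chosen because rounding down reproduces the reported cohort sizes (387 and 96). The source does not state whether the split was stratified by outcome.

```python
import random

def random_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Assumption: the test-set size is rounded down (483 * 0.2 = 96.6 -> 96).
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = random_split(483)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the partition reproducible, which matters when several models are compared on the same testing cohort.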

Model performance evaluation

The performance of the models was evaluated using AUC, along with additional metrics such as accuracy, sensitivity, specificity, F1 score, and recall. AUC is a performance measure used to evaluate the quality of classifiers, with a higher AUC value indicating better classification performance. Accuracy represents the proportion of correctly identified samples among all samples and is one of the most common metrics for model evaluation. Sensitivity measures the percentage of true positive patients correctly identified as positive; a higher sensitivity indicates a lower rate of missed diagnoses. Specificity is the percentage of true negative patients correctly identified as negative, with higher specificity reflecting a lower misdiagnosis rate. Recall, which is mathematically identical to sensitivity, measures a model's ability to identify all true positive examples in classification tasks; a higher value indicates that the model is more effective at finding all true positives. The F1 score is the harmonic mean of precision and recall, combining these two metrics into a single value. Each model underwent five-fold cross-validation to assess average performance. The ML model with the best overall performance was then selected as the final model, and its applicability was evaluated using DCA.
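The metrics defined above can be computed directly from confusion-matrix counts and predicted scores. The sketch below is illustrative only: the function names and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

def auc_from_scores(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts for a 96-patient test set (not the paper's results).
metrics = classification_metrics(tp=15, fp=4, tn=74, fn=3)
toy_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(metrics, toy_auc)
```

Unlike the other metrics, AUC does not depend on a single decision threshold, which is one reason it is often used as the primary criterion when comparing models.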

Model explanation

In addition to evaluating the model, this study uses the SHAP algorithm [12] to rank the importance of predictive factors in the optimal model, providing interpretable outputs. The SHAP algorithm explains predictive models by quantifying the influence of each feature on the model's predictions, highlighting both positive and negative contributions of features. This research uses the SHAP algorithm to clarify the significance of features in the predictive model and to calculate SHAP values, indicating whether the input features are positively or negatively associated with the output. This method addresses the "black box" problem, helping clinicians understand the rationale behind model decisions, which enhances the model's trustworthiness and usability in practical applications.
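To make the idea of a signed, per-feature contribution concrete, the sketch below computes exact Shapley values for a small toy model by enumerating feature subsets. It is a simplified illustration, not the procedure used in the study: the toy risk score, the feature values, and the single reference patient used as the baseline are all hypothetical, and practical SHAP implementations for tree ensembles use faster, specialised algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`; absent features take `baseline` values."""
    n = len(x)

    def value(subset):
        # Features in `subset` use the patient's values, the rest use the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical risk score with one interaction term (illustration only).
def toy_model(features):
    fever_days, d_dimer, plt = features
    return 0.05 * fever_days + 0.2 * d_dimer - 0.001 * plt + 0.01 * fever_days * d_dimer

patient = [10.0, 2.0, 180.0]     # hypothetical patient
reference = [5.0, 0.5, 300.0]    # hypothetical baseline patient
phi = shapley_values(toy_model, patient, reference)
print(phi, sum(phi))
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a single prediction be decomposed feature by feature. In this toy example the lower platelet count receives a positive contribution because the score decreases as PLT rises.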

Statistical analysis

Statistical analyses were performed with SPSS 27 and Python 3.10.0; the TensorFlow framework was used for programming, and the Pandas and scikit-learn libraries were used for data preprocessing and model training. The included characteristics comprised both numerical and categorical variables. For numerical variables that followed a normal distribution, independent samples t-tests were conducted to compare differences between groups, and results were reported as mean ± standard deviation (\(\bar{x} \pm s\)). For data that did not conform to a normal distribution, a non-parametric test (Mann–Whitney U test) was employed, with results presented as median and interquartile range (M [P25, P75]). Categorical variables were expressed as frequency and percentage (n [%]), and comparisons were made using chi-square tests. All statistical results were considered significant at P < 0.05.

Results

Patient characteristics

Among the 483 children diagnosed with MPP, 236 were male (48.86%) and 247 were female (51.14%), with an average age of 8.30 ± 1.47 years. Of these, 91 children were classified into the SMPP group, while 392 were placed in the NSMPP group. There were no statistically significant differences in sex or age between the two groups (P > 0.10), as shown in Table 1.

Table 1 Baseline characteristics of the patients

The differential analysis of clinical manifestations and laboratory test results between the SMPP and NSMPP patient groups is presented in Tables S1 and S2. Compared to the NSMPP group, children with SMPP experienced a longer duration of fever and hospitalization (P < 0.001). Additionally, the incidence of pleural effusion, atelectasis, and lung collapse was significantly higher in SMPP patients (P < 0.001). Furthermore, a greater proportion of SMPP patients underwent electronic bronchoscopy compared to NSMPP patients (P < 0.001). Regarding laboratory test results, significant differences (P < 0.05) were observed between SMPP and NSMPP in most parameters, with the exception of white blood cells (WBC), globulin (GLB), urea, creatinine (CREA), total cholesterol (TCHOL), cholic acid (CG), immunoglobulin A (IgA), immunoglobulin G (IgG), immunoglobulin M (IgM), fibrinogen (Fib), cytotoxic T cells as a percentage of T lymphocytes (CD3+CD8+ %), natural killer cells (CD3−CD(16+56)+ %), and high-sensitivity cardiac troponin T (Hs-cTnT).

Subsequently, the characteristics that showed significant differences were further analyzed using univariate logistic regression, with the results presented in Table S3. This table includes the odds ratios (OR), confidence intervals, and associated p-values. In summary, the analysis of clinical features and observations demonstrated significant differences in clinical characteristics and laboratory test results between SMPP and NSMPP. Finally, the study identified 30 variables as potential predictive factors for inclusion in the machine learning model.
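For a single binary predictor, the odds ratio estimated by univariate logistic regression equals the cross-product ratio of the 2 × 2 table, and a Wald confidence interval can be built on the log scale. The sketch below illustrates that calculation. The function name and the counts are hypothetical; only the group sizes (91 SMPP, 392 NSMPP) come from the study.

```python
from math import exp, log, sqrt

def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and Wald 95% confidence interval from a 2 x 2 table."""
    a, c = exposed_cases, unexposed_cases
    b, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: a finding present in 40 of 91 SMPP and 30 of 392 NSMPP children.
or_, ci_low, ci_high = odds_ratio_with_ci(40, 51, 30, 362)
print(round(or_, 2), round(ci_low, 2), round(ci_high, 2))
```

A confidence interval that excludes 1 corresponds to a statistically significant association at the chosen level. Continuous predictors require fitting the regression itself rather than this closed-form shortcut.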

Model development and performance comparison

All enrolled patients were randomly divided into a training cohort (N = 387) and a testing cohort (N = 96) in an 80:20 ratio. The selected 30 variables were used to train and evaluate eight different ML models. The study used five-fold cross-validation to determine the hyperparameters for each model. First, the original dataset was divided into five mutually exclusive subsets. Next, a hyperparameter space was created for grid search, and each hyperparameter combination was evaluated by cross-validation. Finally, the combination with the best average validation performance was selected to determine the hyperparameters for each model. The discriminative performance of each model is presented in Table 2 and Fig. 2. The CatBoost model (AUC = 0.934, ACC = 0.9175) demonstrated the highest predictive performance among the eight models, followed by the RF model (AUC = 0.934, ACC = 0.8792) and the XGBoost model (AUC = 0.927, ACC = 0.8351). To reduce the risk of selection bias in the dataset, five-fold cross-validation was conducted for all models in the test set to obtain the average performance across five predictions. The results, shown in Table S4, indicate that the CatBoost model outperforms the other ML models in terms of average predictive performance.
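The tuning procedure described above (five mutually exclusive folds, a grid of hyperparameter combinations, selection by mean validation score) can be sketched in plain Python. This is an illustration of the mechanics only: the helper names, the synthetic one-dimensional data, and the toy k-nearest-neighbour classifier are assumptions standing in for the study's actual models and search grids, which are not given in the text.

```python
import random
from itertools import product

def k_fold_indices(n_samples, k=5, seed=0):
    """Split shuffled indices into k mutually exclusive folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def grid_search_cv(fit_and_score, param_grid, n_samples, k=5):
    """Return the hyperparameter combination with the best mean validation score."""
    folds = k_fold_indices(n_samples, k)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(params, train_idx, val_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Synthetic 1-D data standing in for a training cohort (not patient data).
rng = random.Random(1)
X = [rng.gauss(0, 1) for _ in range(150)] + [rng.gauss(2, 1) for _ in range(50)]
y = [0] * 150 + [1] * 50

def knn_accuracy(params, train_idx, val_idx):
    """Toy k-nearest-neighbour classifier scored by validation accuracy."""
    k = params["n_neighbors"]
    correct = 0
    for v in val_idx:
        nearest = sorted(train_idx, key=lambda t: abs(X[t] - X[v]))[:k]
        vote = sum(y[t] for t in nearest) * 2 > k   # majority vote
        correct += int(vote) == y[v]
    return correct / len(val_idx)

best_params, best_score = grid_search_cv(knn_accuracy, {"n_neighbors": [1, 5, 15]}, len(X))
print(best_params, round(best_score, 3))
```

Because every combination is scored on data held out from fitting, the selected hyperparameters are less likely to reflect noise in a single split. The testing cohort should stay outside this loop so that its performance estimate remains unbiased.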

Table 2 Performance of each model for prediction

Additionally, DCA was performed on the five machine learning models with an AUC greater than 0.9 in the testing dataset to compare their clinical net benefits. DCA assesses the clinical net benefit of predictive models by comparing the intervention strategies suggested by the models against the default strategies of intervening in all patients or not intervening at all. Net benefit is calculated at each threshold probability, which is defined as the minimum probability of disease at which further intervention is justified [13]. Figure 3 illustrates the net benefits at various threshold probabilities. The blue line represents the scenario in which all patients receive the intervention, while the yellow line indicates the scenario where no patients receive the intervention. The strategies provided by any of the five machine learning models outperform the default strategies of either intervening with all patients or not intervening at all. Within the threshold probability range of 34% to 75%, the CatBoost model demonstrates superior net benefits compared to the other machine learning models.
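The net benefit plotted in a decision curve follows the standard definition: true positives per patient minus false positives per patient, weighted by the odds of the threshold probability. The sketch below applies it to the two default strategies and to one hypothetical model. The function names and the model's counts are illustrative assumptions; the prevalence (91 of 483) is the cohort's reported SMPP rate.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of intervening in patients whose predicted risk is >= threshold."""
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(n_events, n, threshold):
    """Intervening in everyone: all events are true positives, all others false positives."""
    return net_benefit(n_events, n - n_events, n, threshold)

# "Intervene in none" always has a net benefit of 0.
treat_all_at_prevalence = net_benefit_treat_all(91, 483, 91 / 483)
treat_all_at_half = net_benefit_treat_all(91, 483, 0.5)

# Hypothetical model on a 96-patient test set at a 50% threshold (not the paper's counts).
model_at_half = net_benefit(tp=15, fp=4, n=96, threshold=0.5)
print(treat_all_at_prevalence, treat_all_at_half, model_at_half)
```

The treat-all curve crosses zero where the threshold equals the prevalence and becomes negative above it, so a model is clinically useful over the range of thresholds where its net benefit exceeds both default strategies.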

Model interpretation using the SHAP algorithm

The SHAP algorithm can be utilized to elucidate the individual predictive variables of the final model, allowing us to ascertain the significance and polarity of each predictor's contribution to the model's predictions. The ranking of variable importance is illustrated in Fig. 4. The number of days with fever exhibits the strongest predictive value across all ranges, followed by D-dimer levels. Furthermore, the horizontal axis (SHAP values) indicates the magnitude and direction of each feature's impact on the prediction outcome. The further a point is from the central line, the greater the influence of that feature on the model's output; positive SHAP values denote a positive impact, while negative SHAP values indicate a negative impact. The color coding reflects whether the variable is high (red) or low (blue) in that particular row of the dataset. It is evident that increases in the number of fever days, D-dimer, CRP, LDH, and NLR have a positive influence, propelling the prediction towards SMPP, whereas increases in PLT and the albumin-to-globulin ratio (A:G) exert a negative influence, steering the prediction towards NSMPP.

Using SHAP values, Fig. 5 illustrates the explanations for two different predictions. Features that increase the predicted probability of SMPP (compared to the average prediction, referred to as the baseline) are displayed in red, while features that decrease the prediction are shown in blue. In Fig. 5a, most parameters for the affected child fall outside the normal range (with moderate to severe pleural effusion and elevated levels of NLR, D-dimer, CRP, natural killer (NK) cells, B cells, and LDH), leading the model to predict an increased risk of SMPP for this child. Conversely, Fig. 5b depicts a child without moderate to severe pleural effusion, with most parameters within the normal range (including NLR, D-dimer, NK, B cells, PLT, creatine kinase (CK), prealbumin (PA), alanine aminotransferase (ALT), etc.), resulting in the model predicting a lower risk of SMPP for the second child.

Discussion

We developed a machine learning model for the early diagnosis of SMPP, based on children with MP infection admitted to the Children's Hospital of Soochow University. Additionally, we employed the SHAP algorithm to interpret the optimal machine learning model, illustrating the impact of predictive features on the predicted outcomes.

In previous studies predicting pediatric SMPP, researchers have constructed models using classical logistic regression. One nomogram model achieved an AUC value of 0.777 for predicting pediatric SMPP [14]. The predictive features included immunoglobulin M, eosinophil percentage, eosinophil count, hemoglobin, erythrocyte sedimentation rate, and prealbumin. Similarly, another study developed a nomogram model for predicting pediatric SMPP, reporting AUC values of 0.867 and 0.840 for the training and validation sets, respectively [15]. With the advancement of machine learning, CatBoost has emerged as a strong method for constructing medical predictive models. CatBoost [16, 17] is a gradient boosting framework based on decision trees. Compared to traditional decision tree models, CatBoost excels in handling categorical variables. It calculates target statistics for categorical features to reduce distribution bias in training and testing data, thereby improving model accuracy and generalization performance. Furthermore, CatBoost provides excellent predictive results without the need for extensive parameter tuning, which reduces the risk of overfitting and facilitates the development of more robust models. Researchers have employed six machine learning models to predict the severity of acute pancreatitis, finding that the CatBoost model demonstrated optimal performance, with an average AUC score of 0.81 ± 0.033 and an accuracy of 89.1% [18]. In another study [19], six machine learning models were used to predict survival outcomes in patients with recurrent cervical cancer, and the results similarly indicated that the CatBoost model outperformed all other models, achieving an AUC of 0.99 and an accuracy of 95.6%. This study also utilized eight machine learning models to predict SMPP in school-aged children, with the CatBoost model achieving an AUC value of 0.991 and an accuracy of 0.964.

Compared to other machine learning models, the CatBoost model exhibited superior accuracy. In clinical applications, the feasibility of the CatBoost model is greater than that of traditional logistic regression, as distinguishing between SMPP and NSMPP based on clinical data requires high accuracy, sensitivity, and specificity.

The strengths of this study are as follows: First, the clinical features included were multidimensional, encompassing both clinical presentations and laboratory tests. These data are commonly available in most care settings, enhancing the generalizability of the prediction models. Second, we utilized machine learning algorithms to train the models, which are better suited for handling complex data and nonlinear relationships compared to traditional logistic regression. In clinical practice, the occurrence of disease typically results from a combination of factors, making machine learning algorithms more appropriate for the early prediction of severe Mycoplasma pneumoniae pneumonia (SMPP) and capable of providing accurate predictions. Furthermore, transparency and interpretability of predictive models are crucial in real-world clinical settings. However, machine learning models often lack explanations for their outputs. In this study, we employed explainable artificial intelligence tools (SHAP) to address the issue of interpretability. Consequently, the model we constructed provides interpretable outputs while effectively capturing complex data, offering valuable support for the early clinical identification of SMPP.

In this study, we provided a comprehensive interpretation of the model using SHAP summary plots. We identified several clinical factors that may be associated with the development of SMPP in school-aged children. These factors include the number of days with fever, D-dimer levels, PLT, CRP, LDH, and the NLR. Numerous studies have confirmed that a longer duration of fever, along with elevated levels of D-dimer, CRP, LDH, and NLR, is associated with an increased likelihood of developing SMPP [20,21,22,23]. However, research on PLT in relation to SMPP remains limited. The findings of this study indicate that lower PLT levels are associated with a higher risk of developing SMPP. A previous study reported that PLT levels in the pediatric SMPP group were significantly lower than those in the mild illness group and the healthy control group [24], which aligns with the results of this study and supports the importance of PLT in the early prediction of SMPP. PLT refers to the number of platelets in a unit volume of peripheral blood, reflecting the dynamics of platelet production and apoptosis in the bloodstream. In recent years, a growing body of evidence has shown that platelets are not only essential components in mediating hemostasis and thrombosis but also play an active role in the immune response to microorganisms and foreign substances [25,26,27]. When the body is infected by pathogens, the interaction between platelets and neutrophils triggers an immune response that facilitates the progression of inflammation. During bacterial infections, platelets actively mediate the host response through interactions with circulating leukocytes. Additionally, platelets can interact with various viruses, and thrombocytopenia often affects the severity and prognosis of viral infections. Research has shown that the antiplatelet agent aspirin is beneficial in the treatment of influenza [28,29,30].

However, there is currently a lack of studies examining the interactions between platelets and mycoplasma infections. This study has demonstrated the clinical value of platelets in predicting SMPP, highlighting the need for further basic research to explore the role of platelets following mycoplasma infection. Such research could inform clinical strategies for the prevention and treatment of SMPP from multiple perspectives.

This research has several limitations. First, the model was developed retrospectively using data from a single center, which introduces inherent biases in the data collection process. Second, the model validation was conducted solely through internal validation methods. To enhance the model's generalizability, additional external datasets are needed for validation. Third, the proposed model lacks prospective validation, which will be a focus of future research efforts.

Conclusions

The CatBoost model shows promise as a tool for the early identification of SMPP in school-age children. To further validate the findings of this study, multi-center validation and large-scale prospective studies are recommended.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

ML: Machine learning
SMPP: Severe Mycoplasma pneumoniae pneumonia
MPP: Mycoplasma pneumoniae pneumonia
MP: Mycoplasma pneumoniae
AUC: Area under the curve
DCA: Decision curve analysis
SHAP: Shapley additive explanations
PLT: Platelet
CRP: C-reactive protein
LDH: Lactate dehydrogenase
NLR: Neutrophil-to-lymphocyte ratio
CAP: Community-acquired pneumonia
NSMPP: Non-severe Mycoplasma pneumoniae pneumonia
DNA: Deoxyribonucleic acid
RNA: Ribonucleic acid
RR: Respiratory rate
BALF: Bronchoalveolar lavage fluid
LR: Logistic regression
SVM: Support vector machine
CDT: Classification decision tree
RF: Random forest
GBDT: Gradient boosting decision tree
LightGBM: Light gradient boosting machine
CatBoost: Categorical boosting
XGBoost: Extreme gradient boosting
WBC: White blood cells
GLB: Globulin
CREA: Creatinine
TCHOL: Total cholesterol
CG: Cholic acid
IgA: Immunoglobulin A
IgG: Immunoglobulin G
IgM: Immunoglobulin M
Fib: Fibrinogen
Hs-cTnT: High-sensitivity cardiac troponin T
OR: Odds ratio
A:G: Albumin-to-globulin ratio
NK: Natural killer
CK: Creatine kinase
PA: Prealbumin
ALT: Alanine aminotransferase

References

Kutty PK, Jain S, Taylor TH, Bramley AM, Diaz MH, et al. Mycoplasma pneumoniae Among Children Hospitalized With Community-acquired Pneumonia. Clin Infect Dis. 2019;68(1):5–12.
Article CAS PubMed Google Scholar
Gao Y, Feng X, Yuan T, Li M, Wei M, Li S. Post-pandemic trends: Epidemiological and etiological insights into acute respiratory infections in southern China. Diagn Microbiol Infect Dis. 2024;109(3): 116293.
Article PubMed Google Scholar
Yan C, Xue GH, Zhao HQ, Feng YL, Cui JH, Yuan J. Current status of Mycoplasma pneumoniae infection in China. World J Pediatr. 2024;20(1):1–4.
Article CAS PubMed PubMed Central Google Scholar
Meyer Sauteur PM, Beeton ML, European Society of Clinical Microbiology and Infectious Diseases (ESCMID) Study Group for Mycoplasma and Chlamydia Infections (ESGMAC), and the ESGMAC Mycoplasma pneumoniae Surveillance (MAPS) study group. Pneumonia outbreaks due to re-emergence of Mycoplasma pneumoniae. Lancet Microbe. 2024;5(6):e514.
Article PubMed Google Scholar
Chen YC, Hsu WY, Chang TH. Macrolide-Resistant Mycoplasma pneumoniae Infections in Pediatric Community-Acquired Pneumonia. Emerg Infect Dis. 2020;26(7):1382–91.
Article CAS PubMed PubMed Central Google Scholar
Yang S, Lu S, Guo Y, Luan W, Liu J, Wang L. A comparative study of general and severe mycoplasma pneumoniae pneumonia in children. BMC Infect Dis. 2024;24(1):449.
Article CAS PubMed PubMed Central Google Scholar
Theodosiou AA, Read RC. Artificial intelligence, machine learning and deep learning: Potential resources for the infection clinician. J Infect. 2023;87(4):287–94.
Article PubMed Google Scholar
Haug CJ, Drazen JM. Artificial Intelligence and Machine Learning in Clinical Medicine, 2023. N Engl J Med. 2023;388(13):1201–8.
Article CAS PubMed Google Scholar
Rai T, Shen Y, He J, Mahmud M, Brown DJ, Kaur J, et al. Understanding feature importance of prediction models based on lung cancer primary care data. In 2024 International Joint Conference on Neural Networks (ijcnn). Yokohama: IEEE; 2024. p. 1–8.
Duan J, Li H, Ma X, Zhang H, Lasky R, et al. Predicting SARS-CoV-2 infection among hemodialysis patients using multimodal data. Front Nephrol. 2023;2(3):1179342.
Article Google Scholar
Zhao SY, Qian SY, Chen ZM, Gao HM, Liu HM, Zhang HL, Liu JR. Guidelines for the Diagnosis and Treatment of Mycoplasma Pneumonia in Children (2023 Edition). Electron J Emerg Infect Dis. 2024;9(1):73–9.
Google Scholar
Lundberg S, Lee SI. A unified approach to interpreting model predictions. arXiv. 2017: arXiv:1705.07874.
Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res. 2019;3(2):18.
Article PubMed PubMed Central Google Scholar
Chang Q, Chen HL, Wu NS, Gao YM, Yu R, Zhu WM. Prediction Model for Severe Mycoplasma pneumoniae Pneumonia in Pediatric Patients by Admission Laboratory Indicators. J Trop Pediatr. 2022;68(4):fmac059.
Article PubMed Google Scholar
Zhang X, Sun R, Jia W, Li P, Song C. A new dynamic nomogram for predicting the risk of severe Mycoplasma pneumoniae pneumonia in children. Sci Rep. 2024;14(1):8260.
Article CAS PubMed PubMed Central Google Scholar
Hancock JT, Khoshgoftaar TM. CatBoost for big data: an interdisciplinary review. J Big Data. 2020;7(1):94.
Article PubMed PubMed Central Google Scholar
Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. arXiv. 2017;arXiv:1706.09516.
Google Scholar
Kui B, Pintér J, Molontay R, Nagy M, Farkas N, Gede N, et al. EASY-APP: An artificial intelligence model and application for early and easy prediction of severity in acute pancreatitis. Clin Transl Med. 2022;12(6):e842.
Article CAS PubMed Google Scholar
Geeitha S, Ravishankar K, Cho J, Easwaramoorthy SV. Integrating cat boost algorithm with triangulating feature importance to predict survival outcome in recurrent cervical cancer. Sci Rep. 2024;14(1):19828.
Article CAS PubMed PubMed Central Google Scholar
Li L, Guo R, Zou Y, Wang X, Wang Y, et al. Construction and Validation of a Nomogram Model to Predict the Severity of Mycoplasma pneumoniae Pneumonia in Children. J Inflamm Res. 2024Feb;22(17):1183–91.
Article Google Scholar
Li YT, Zhang J, Wang MZ, Ma YM, Zhi K, Dai FL, Li SJ. Changes in coagulation markers in children with Mycoplasma pneumoniae pneumonia and their predictive value for Mycoplasma severity. Ital J Pediatr. 2023;49(1):143.
Article CAS PubMed PubMed Central Google Scholar
Qiu J, Ge J, Cao L. D-dimer: The Risk Factor of Children’s Severe Mycoplasma PneumoniaePneumonia. Front Pediatr. 2022;10: 828437.
Article PubMed PubMed Central Google Scholar
Li D, Gu H, Chen L, Wu R, Jiang Y, Huang X, Zhao D, Liu F. Neutrophil-to-lymphocyte ratio as a predictor of poor outcomes of Mycoplasma pneumoniae pneumonia. Front Immunol. 2023;14:1302702.
Article CAS PubMed PubMed Central Google Scholar
Li M, Gao J. Correlation and Clinical Significance of Changes in Serum Soluble P-selectin, D- dimer and Platelet Levels with the Severity of Mycoplasma Pneumoniae Infection in Children. Altern Ther Health Med. 2024:AT9935.
Koupenova M, Livada AC, Morrell CN. Platelet and Megakaryocyte Roles in Innate and Adaptive Immunity. Circ Res. 2022;130(2):288–308.
Mandel J, Casari M, Stepanyan M, Martyanov A, Deppermann C. Beyond Hemostasis: Platelet Innate Immune Interactions and Thromboinflammation. Int J Mol Sci. 2022;23(7):3868.
van der Meijden PEJ, Heemskerk JWM. Platelet biology and functions: new concepts and clinical perspectives. Nat Rev Cardiol. 2019;16(3):166–79.
Koupenova M, Clancy L, Corkrey HA, Freedman JE. Circulating Platelets as Mediators of Immunity, Inflammation, and Thrombosis. Circ Res. 2018;122(2):337–51.
Rohlfing AK, Rath D, Geisler T, Gawaz M. Platelets and COVID-19. Hamostaseologie. 2021;41(5):379–85.
Henry BM, de Oliveira MHS, Benoit S, Plebani M, Lippi G. Hematologic, biochemical and immune biomarker abnormalities associated with severe illness and mortality in coronavirus disease 2019 (COVID-19): a meta-analysis. Clin Chem Lab Med. 2020;58(7):1021–8.

Acknowledgements

This is a clinical research project approved by Children’s Hospital of Soochow University. We greatly appreciate all participants in this study.

Clinical trial number

Not applicable.

Funding

No funding was secured for this study.

Author information

Authors and Affiliations

Department of Infectious Diseases, Children’s Hospital of Soochow University, No. 303, Jingde Road, Suzhou, China
Yingying Ye, Zhenpeng Gao, Zhiling Zhang, Jianlong Chen, Chu Chu & Weifang Zhou

Contributions

Y.Y. collected the cases and experimental data and wrote the main manuscript text. Z.G. and Z.Z. analyzed and interpreted the experimental data. J.C. analyzed the data. W.Z. and C.C. designed the research and revised the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Chu Chu or Weifang Zhou.

Ethics declarations

Ethics approval and consent to participate

This study was reviewed and approved by the Ethics Committee of Children's Hospital of Soochow University (2024CS098). Written informed consent was obtained from the guardians of all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Supplementary Material 3.

Supplementary Material 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ye, Y., Gao, Z., Zhang, Z. et al. A machine learning model for predicting severe mycoplasma pneumoniae pneumonia in school-aged children. BMC Infect Dis 25, 570 (2025). https://doi.org/10.1186/s12879-025-10958-8

Received: 20 November 2024
Accepted: 10 April 2025
Published: 21 April 2025
DOI: https://doi.org/10.1186/s12879-025-10958-8
