Zhang Jingyue, Yang Chong, Lan Gaoshuang, Sun Yinjuan, Zhang Linlin, Yuan Hengjie
Objective To provide a basis for the selection of antiemetic regimens by establishing an artificial intelligence model for predicting chemotherapy-induced nausea and vomiting (CINV) in cancer patients receiving platinum-based chemotherapy with high emetic risk. Methods Clinical information was collected on cancer patients who received cisplatin or carboplatin with area under the blood concentration-time curve (AUC) ≥4 and were registered in the Department of Oncology, Tianjin Medical University General Hospital from January 2018 to December 2022. The information included gender, age, history of alcohol consumption, history of vomiting in pregnancy, chemotherapy cycle, patient expectation of CINV, chemotherapeutic agents, antiemetic regimen, out-of-hospital antiemetic treatment, sleep of less than 7 hours on the night before chemotherapy, occurrence of CINV in the previous cycle, and creatinine clearance (Ccr). After preprocessing, the data were randomly divided into a training set and a test set. The training set was used to construct the prediction models, and the test set was used to evaluate their predictive performance. Three algorithms, gradient boosting decision tree (GBDT), random forest (RF), and logistic regression (LR), were each used to build a prediction model, and the performance of each model was evaluated. The evaluation metrics included accuracy, sensitivity, recall, F1 value (the harmonic mean of sensitivity and recall), and area under the receiver operating characteristic curve (AUROC). Finally, Shapley Additive exPlanation (SHAP) was applied to interpret the clinical features with predictive significance. Results A total of 698 patients (439 males, 62.9%) with a median age of 64 (21, 84) years were included in this study and received a total of 1 654 cycles of chemotherapy.
The chemotherapy regimen contained cisplatin in 364 cases with 864 cycles of chemotherapy, and carboplatin with AUC ≥4 in 361 cases with 790 cycles of chemotherapy. The antiemetic regimen was neurokinin-1 receptor antagonist (NK-1 RA), 5-hydroxytryptamine-3 receptor antagonist (5-HT3 RA), and dexamethasone in 1 347 treatment cycles, and 5-HT3 RA plus dexamethasone in 307 treatment cycles. Spearman's correlation analysis showed no strong correlation among the feature variables, so all of them could be used for model building. The optimal hyperparameters were n_estimators=500 and max_depth=9 for GBDT, max_depth=5 for RF, and penalty=L2 for LR. The three prediction models, GBDT, RF, and LR, were then trained on the training set with these optimal hyperparameters. The GBDT model had an accuracy of 0.903, a sensitivity of 0.882, a recall of 0.903, an F1 value of 0.883, and an AUROC of 0.778±0.036 (95%CI: 0.739-0.814); the RF model had an accuracy of 0.885, a sensitivity of 0.861, a recall of 0.885, an F1 value of 0.870, and an AUROC of 0.679±0.041 (95%CI: 0.636-0.720); the LR model had an accuracy of 0.817, a sensitivity of 0.851, a recall of 0.817, an F1 value of 0.832, and an AUROC of 0.682±0.042 (95%CI: 0.639-0.723). Ccr, age, chemotherapy cycle, history of alcohol consumption, and patient expectation of CINV were the features that contributed most to the model predictions. The risk of CINV was negatively associated with Ccr, age, and chemotherapy cycle. The risk of CINV was also lower in patients with no history of alcohol consumption and patient expectation of CINV. Conclusion The GBDT, RF, and LR models could all predict the risk of CINV in patients receiving platinum-based chemotherapy with high emetic risk, with the GBDT model showing the best predictive performance.
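
For each of the three models the reported recall equals the reported accuracy, which is what support-weighted averaging of per-class recall would produce. The following is a minimal sketch in plain Python of such weighted metrics. It assumes, without confirmation from the text, that the metrics were averaged this way and that the value reported as sensitivity corresponds to precision. The function name and the labels are hypothetical and are not study data.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over all classes."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0   # per-class precision
        r_c = tp / count                             # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n                           # class share of the test set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (1 = CINV, 0 = no CINV), chosen only for illustration.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
m = weighted_metrics(y_true, y_pred)
print(m)
```

Under this averaging, weighted recall always coincides with accuracy, and the weighted F1 is the average of the per-class F1 values rather than the harmonic mean of the two averaged quantities, so it need not match a value recomputed from the reported summary figures.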