The current criteria for assigning the risk of choledocholithiasis (CL) have limited diagnostic accuracy. The aim of our work was to develop a logistic regression model for predicting CL diagnosis in patients classified as having either intermediate or high risk for CL, according to the criteria of the American Society for Gastrointestinal Endoscopy (ASGE).
Material and methods
We conducted an analytic, observational, cross-sectional study for evaluating the diagnostic yield of a logistic regression model in adults with intermediate or high risk for CL. A receiver operating characteristic (ROC) curve analysis was done to determine the best cutoff point for predicting the diagnosis of CL. Endoscopic retrograde cholangiopancreatography (ERCP) was utilized as the gold standard for diagnosing CL.
Results
A total of 148 patients suspected of presenting with CL were studied. In our cohort, 71 had intermediate risk and 77 had high risk. CL diagnosis was confirmed in 102 patients (69%). In the overall cohort, our model showed an area under the curve (AUC) of 0.68. In patients with an intermediate risk for CL, the AUC value was 0.72 and the positive predictive value (PPV) was 70%. In patients with a high risk for CL, the AUC value was 0.78 and the PPV was 89%.
Conclusion
Our model appears to better predict the diagnosis of CL than the ASGE criteria for patients with an intermediate or high risk for the disease. Our model can guide clinical decisions in patients with suspected CL.
Choledocholithiasis (CL) is a frequent cause of extrahepatic bile duct obstruction that can be diagnosed in up to 15% of patients with cholecystolithiasis.1,2 Given that CL can lead to complications (acute cholangitis, acute pancreatitis), its early diagnosis and treatment are vitally important. Diagnosis is currently based on clinical, radiologic, and laboratory parameters. Considering body mass index, age, sex, the presence of bile duct dilation, and liver function tests, the American Society for Gastrointestinal Endoscopy (ASGE) and the European Society for Gastrointestinal Endoscopy (ESGE) developed predictive scales for CL.2,3
The ASGE criteria classify patients into low risk, intermediate risk, and high risk for CL, with a probability of <10%, 10–50%, and >50%, respectively. However, accuracy varies from 40% to 85% for high risk and from 30% to 40% for intermediate risk.2 Due to the wide variability in the diagnostic yield of the abovementioned criteria in clinical practice, other methods for predicting CL diagnosis, such as logistic regression and symbolic regression, have been implemented.4
The gold standard for diagnosing CL is endoscopic retrograde cholangiopancreatography (ERCP), with 94% sensitivity and 100% specificity, but this method is not exempt from serious complications and should be performed mainly in the context of therapeutic indications.5
Therefore, we evaluated the diagnostic accuracy of our application, which is based on a logistic regression model, in patients with an intermediate or high risk for CL.
Material and methods
An analytic, observational, cross-sectional study was conducted at a single center between February 1, 2022, and February 1, 2023. As background, a logistic regression model for predicting the diagnosis of CL was obtained in 2021 through artificial intelligence. The model was developed by one of the authors (LM T-T), who holds a doctorate in artificial intelligence. It was initially validated in a retrospective cohort of patients with intermediate or high risk for CL who underwent ERCP during 2020 at the gastroenterology service of our hospital. Based on that precedent, the model was prospectively applied to hospitalized patients with intermediate or high risk for CL, to evaluate its diagnostic yield. Only patients in whom a stone was found during ERCP were considered to have a definitive diagnosis of CL. A cutoff point of 0.6 was established: a model output ≥0.6 was considered a positive prediction (stone present at ERCP) and an output <0.6 a negative prediction (no stone at ERCP). The App model was applied at the bedside of 148 patients admitted to our hospital.
Inclusion criteria were age ≥18 years, clinical suspicion of CL, and intermediate or high risk of CL according to laboratory or imaging tests. Exclusion criteria were low risk for CL by the ASGE classification (given that those patients did not require invasive studies before cholecystectomy), previous cholecystectomy, previous ERCP or biliary surgery, pregnancy, cirrhosis of the liver, clinical suspicion of cholangitis, ASA III, and incomplete follow-up at our hospital.
Laboratory tests were carried out at hospital admission and at 24 and 48 h after hospitalization. The tests performed upon hospital admission were used to classify patients according to the ASGE criteria.
Logistic regression model
Our logistic regression model included age, sex, time from hospital admission to ERCP, AST, ALT, alkaline phosphatase, total bilirubin, and the diameter of the common bile duct measured by abdominal ultrasound (Fig. 1). The logistic regression model was used to evaluate the contribution of each variable to the prediction of CL while controlling for confounding factors.
Figure 1. Logistic regression model used to evaluate the clinical, laboratory, and imaging characteristics of 148 patients with intermediate or high risk of choledocholithiasis and to predict the diagnosis of choledocholithiasis.
ALP: alkaline phosphatase; ALT: alanine aminotransferase; AST: aspartate aminotransferase; CBD: common bile duct; F: female; M: male.
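The way a logistic regression model of this kind turns the listed variables into a prediction can be sketched as follows. This is only an illustrative sketch: the article does not publish the fitted coefficients, so the intercept and weights below are hypothetical placeholders, and the variable names and the binary coding of sex and ERCP timing are assumptions. Only the list of predictors and the 0.6 cutoff come from the text.

```python
import math

# HYPOTHETICAL placeholder coefficients: the article does not report the
# fitted weights, so these values are for illustration only.
INTERCEPT = -2.0
WEIGHTS = {
    "age_years": 0.01,
    "sex_male": 0.10,              # assumed coding: 1 = male, 0 = female
    "ercp_delay_over_48h": -0.20,  # assumed coding: 1 = ERCP >48 h after admission
    "ast_u_l": 0.001,
    "alt_u_l": 0.001,
    "alp_u_l": 0.002,
    "total_bilirubin_mg_dl": 0.08,
    "cbd_diameter_mm": 0.12,
}
CUTOFF = 0.6  # cutoff point stated in the article


def predict_probability(patient):
    """Return the logistic-model probability of a stone for one patient."""
    linear = INTERCEPT + sum(WEIGHTS[name] * patient[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-linear))


def predicts_stone(patient, cutoff=CUTOFF):
    """True when the predicted probability is at or above the cutoff."""
    return predict_probability(patient) >= cutoff
```

A patient record is passed as a dictionary with one entry per predictor. With real fitted coefficients in place of the placeholders, `predicts_stone` would reproduce the positive or negative call described in the methods.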
Three experienced endoscopists carried out all the ERCPs in our study. The results of the model were not taken into consideration for making medical decisions.
Statistical analysis
Frequencies (%), medians (q25–q75), or means ± standard deviation were reported in the descriptive analysis. A receiver operating characteristic (ROC) curve analysis was done to establish the best cutoff point for predicting the diagnosis of CL. Model sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) were reported. Python was used for the statistical analysis.
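The reported metrics can be computed from paired ERCP results and model outputs as in the minimal sketch below. It uses only the Python standard library and is not the authors' analysis code. The article does not state how the "best" cutoff was chosen, so selecting it with the Youden index (sensitivity + specificity − 1) is an assumption, and all function names are illustrative.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def best_cutoff_youden(y_true, scores):
    """Cutoff maximizing the Youden index (an assumed selection rule)."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        m = diagnostic_metrics(y_true, [s >= cutoff for s in scores])
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff
```

Here `y_true` holds the ERCP result (1 = stone found, 0 = no stone) and `scores` holds the model probabilities. Applying a fixed threshold such as `[s >= 0.6 for s in scores]` before calling `diagnostic_metrics` mirrors the cutoff used in the study.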
Ethical considerations
The protocol was reviewed and approved by the ethics committee of our institution (GA22-00005). Written statements of informed consent were obtained before participation in the study.
Results
A total of 148 patients with either intermediate or high risk for CL were recruited. Median age was 43 years (range: 16–85) and 110 (74%) of the patients were women. Mean values were as follows: AST 214.37 ± 184.25 U/l, ALT 288.13 ± 228.17 U/l, alkaline phosphatase 290.49 ± 174.57 U/l, total bilirubin 6.87 ± 5.13 mg/dl, and diameter of the common bile duct 9.2 ± 5.4 mm. In our cohort, 71 (48%) patients were classified as having an intermediate risk for CL and 77 (52%) as having a high risk for CL. Table 1 shows all laboratory test results. ERCP was performed on 125 (85%) patients ≥48 h from hospital admission. CL diagnosis through the identification of a stone by ERCP was made in 102 (69%) patients. ERCP demonstrated the presence of a stone in the common bile duct in 26 (36.6%) patients at intermediate risk for CL and in 41 (53.2%) patients at high risk for CL.
Table 1. Clinical and laboratory characteristics of 148 patients with clinical suspicion of choledocholithiasis upon their admission to the Hospital Universitario “Dr. José Eleuterio González”.
Number of patients | 148 |
---|---|
Age (years) | 43 (16−85) |
Sex (n, %) | 148 |
Male | 38 (26) |
Female | 110 (74) |
Time of ERCP performance with respect to hospital admission | |
<48h | 23 (16) |
>48h | 125 (85) |
AST (IU/l) | 214.37±184.25 |
ALT (IU/l) | 288.13±228.17 |
Total bilirubin (mg/dl) | 6.87±5.13 |
Direct bilirubin (mg/dl) | 4.43±3.51 |
Alkaline phosphatase (IU/l) | 290.49±174.57 |
Common bile duct (mm) | 9.28±5.4 |
Probability according to ASGE criteria | |
High risk | 77 (52) |
Intermediate risk | 71 (48) |
Choledocholithiasis present in the prediction model (n, %) | 148 |
Yes | 81 (55) |
No | 67 (45) |
ERCP-confirmed choledocholithiasis (n, %) | 148 |
Yes | 102 (69) |
No | 46 (31) |
ALT: alanine aminotransferase; ASGE: American Society for Gastrointestinal Endoscopy; AST: aspartate aminotransferase; ERCP: endoscopic retrograde cholangiopancreatography.
The analysis of our model in the cohort revealed an area under the ROC curve (AUC) value of 0.68, indicating moderate predictive capacity. The AUC value for patients with intermediate risk was 0.72, with 65% sensitivity, 65% specificity, 70% PPV, 59% NPV, and 71% accuracy. In high-risk patients, the AUC value was 0.78, with 66% sensitivity, 67% specificity, 89% PPV, 32% NPV, and 89% accuracy (Fig. 2).
Discussion
The development of noninvasive, low-cost tools (neural networks, machine learning) for predicting the presence of CL is vitally important, given that current diagnostic methods (magnetic resonance imaging, ERCP) are costly, not exempt from risks for the patient, and not always widely available. Unfortunately, the development of CL prediction tools continues to produce heterogeneous results.
Currently, the ASGE criteria define patients as having low, intermediate, and high risks for CL, but when applied to different populations, these definitions vary greatly. Matt Ridley expressed this concept in his book, The Agile Gene.6 He wrote that persons are similar because they are different and different because they are similar. In such a context, the applicability of any score or criterion becomes quite difficult.
In 2017, Narváez et al. applied the ASGE criteria to patients at the Hospital Universitario “Dr. José Eleuterio González”. Those authors reported a diagnostic accuracy for CL of 59% in the high-risk patients, with 85% sensitivity and 24% specificity. In the intermediate-risk patients, accuracy was 41%, with 14% sensitivity and 75% specificity, indicating unnecessary ERCP in almost half the patients.7 With those data in mind, we developed a logistic regression model to be prospectively applied as an App at the bedside of the patient suspected of presenting with CL. In our model, the PPV was 89% and accuracy was 89% for high-risk patients. Our results were similar to those reported by Dalai et al., who analyzed 270 patients at high risk for CL, utilizing artificial intelligence. Those authors described 91% sensitivity, 25% specificity, 87% PPV, 33% NPV, and 81% accuracy.8 Additionally, a prospective study published in 2014 by Jovanovic et al. utilized an artificial neural network to determine the indication for ERCP in patients with suspected CL. Its AUC value was 0.88 (95% CI 83–93%), significantly higher than that of our model, but its PPV in patients at high risk of CL was similar (92%). That model correctly classified 92% of patients who needed an ERCP.9
A study by Steinway et al., utilizing a machine learning-based method for predicting CL in 1,378 patients, compared the diagnostic accuracy of the gradient boosting machine-learning method versus the 2019 ASGE and ESGE criteria, finding accuracy of 71%, 62%, and 62%, respectively, results not significantly different from ours.4
In a study on 1,171 patients, He et al. reported that the specificity of the ASGE criteria for CL in high-risk patients was 74% (95% CI 72–77%) and the PPV was 64% (95% CI 61–67%). Even though the high-risk criteria demonstrated a probability above 50% of presenting with CL, more than one-third of the patients underwent diagnostic ERCP.10 In a retrospective article by Jagtap et al. that compared the ASGE and ESGE criteria for CL in high-risk patients, a higher PPV was obtained with the ESGE criteria.11
Table 2 shows the results of a group of studies on patients at high risk of CL, describing sensitivity from 61% to 91%. In addition, specificity varied from 24% to 97%. All studies were conducted on populations with a different prevalence of choledocholithiasis.7,8,10–13
Table 2. Comparison of the diagnostic yield of our logistic regression model and the different statistical models presented in the literature, in patients at high risk for choledocholithiasis.
Study (method) | Sensitivity | Specificity | PPV | NPV |
---|---|---|---|---|
Our predictive model (logistic regression) | 66% | 67% | 89% | 32% |
Narváez et al.7 (ASGE criteria) | 86% | 24% | 60% | 56% |
Dalai et al.8 (machine learning) | 91% | 25% | 87% | 33% |
He et al.10 (ASGE criteria) | 70% | 74% | 64% | 79% |
Jagtap et al.11 (ASGE criteria) | 75% | 97% | 90% | 91% |
Herrera et al.12 (symbolic regression) | 61% | 85% | 87% | 57% |
Ovalle et al.13 (ASGE criteria) | 69% | 52% | 79% | 38% |
NPV: negative predictive value; PPV: positive predictive value.
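Because the studies in Table 2 were conducted in populations with different prevalences of choledocholithiasis, their predictive values are not directly comparable: by Bayes' theorem, PPV and NPV depend on prevalence even when sensitivity and specificity stay fixed. The short sketch below illustrates that relationship. The function name is illustrative, and the prevalence values used to call it are free inputs, not figures taken from any of the cited studies.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv
```

For example, calling `predictive_values(0.66, 0.67, p)` for increasing values of `p` shows the PPV rising and the NPV falling as stones become more common in the tested population, which is one reason a model should be re-evaluated before being used in a different setting.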
Our predictive model applied to the patients with intermediate risk for CL had 65% sensitivity, 65% specificity, 70% PPV, 59% NPV, and 71% accuracy for CL diagnosis. Our results are similar to those reported in the literature (Table 3).7,11,12
Table 3. Comparison of the diagnostic yield of our logistic regression model and the different statistical models presented in the literature, in patients at intermediate risk for choledocholithiasis.
Study (method) | Sensitivity | Specificity | PPV | NPV |
---|---|---|---|---|
Our predictive model (logistic regression) | 65% | 65% | 70% | 59% |
Narváez et al.7 (ASGE criteria) | 14% | 76% | 44% | 40% |
Jagtap et al.11 (ASGE criteria) | 24% | 20% | 10% | 42% |
Herrera et al.12 (symbolic regression) | 73% | 77% | 55% | 88% |
NPV: negative predictive value; PPV: positive predictive value.
Most of the prediction models reviewed in the literature have good diagnostic yield in patients at high risk for CL. These data can help confirm the indication for therapeutic ERCP or the necessity for bile duct examination (Table 4).
Table 4. Classification criteria for the risk of choledocholithiasis, according to the American Society for Gastrointestinal Endoscopy (ASGE) and its suggested treatment.1
Risk classification | Clinical criteria | Treatment |
---|---|---|
High risk | Choledocholithiasis present in a noninvasive imaging study or cholangitis or total bilirubin >4 mg/dl and dilated common bile duct | ERCP |
Intermediate risk | Altered liver function tests or age >55 years or dilated common bile duct (>6 mm with gallbladder in situ) | Magnetic resonance cholangiopancreatography or endoscopic ultrasound |
Low risk | None of the above (symptomatic cholecystolithiasis with none of the abovementioned factors) | Cholecystectomy |
ERCP: endoscopic retrograde cholangiopancreatography.
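The classification in Table 4 is a simple decision rule, which can be written out as in the sketch below. It encodes the table only as it is printed here and is not a validated clinical tool. The function and parameter names are illustrative, and reading the bilirubin threshold as mg/dl (the unit used for bilirubin elsewhere in the article) is an assumption.

```python
# Suggested management per risk class, as listed in Table 4.
SUGGESTED_MANAGEMENT = {
    "high": "ERCP",
    "intermediate": "Magnetic resonance cholangiopancreatography "
                    "or endoscopic ultrasound",
    "low": "Cholecystectomy",
}


def asge_risk(stone_on_imaging, cholangitis, total_bilirubin_mg_dl,
              cbd_dilated, abnormal_liver_tests, age_years):
    """Classify risk of choledocholithiasis following the rows of Table 4.

    cbd_dilated means a common bile duct >6 mm with the gallbladder in situ.
    The bilirubin unit (mg/dl) is assumed.
    """
    if (stone_on_imaging or cholangitis
            or (total_bilirubin_mg_dl > 4 and cbd_dilated)):
        return "high"
    if abnormal_liver_tests or age_years > 55 or cbd_dilated:
        return "intermediate"
    return "low"
```

The high-risk row is checked first, so a patient who meets both high-risk and intermediate-risk criteria is classified as high risk. `SUGGESTED_MANAGEMENT[asge_risk(...)]` then returns the corresponding entry from the table's treatment column.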
Nevertheless, clinical decision-making in the intermediate-risk group remains a topic of interest, because the PPV of our model in that group was 70%, a marginal value similar to that of several other studies. Therefore, an additional imaging study, such as magnetic resonance imaging or endoscopic ultrasound, is still necessary for confirming the diagnosis of CL.14
It should be highlighted that our logistic regression model showed improvement in diagnostic accuracy, with respect to the ASGE criteria. However, its performance needs to be evaluated in a different population because the prevalence of bile duct stones varies widely.
The limitations of our study include the fact that it was conducted at a single center, ERCP was performed ≥ 48h from hospital admission, we did not have laboratory test data from the same day as the ERCP, the majority of our patients were young women with a mean age under 40 years, and our sample size was based on consecutive sampling of patients with intermediate or high risk of CL during one year. Most likely, the diagnostic yield of our model would be more statistically robust if the 3 CL risk groups had been evaluated. Nevertheless, we only selected patients at intermediate or high risk for CL, because in low-risk patients ERCP is not needed and cholecystectomy is indicated.
The strong point of our article is that the model was developed utilizing adequate methodology and it was prospectively applied for diagnosing CL. All ERCPs were performed by expert endoscopists, and the model was applied at the patient’s bedside.
Conclusion
Our logistic regression model showed an improvement in diagnostic accuracy, with respect to the ASGE criteria. We believe our findings can be useful for guiding physicians in their clinical decision-making in patients with suspected choledocholithiasis.
Financial disclosure
No financial support was received in relation to this study/article.