Ethical aspects
The research ethics committee of University Hospitals Leuven granted ethical approval for this investigation before its initiation (protocol number: S67798). The study adhered to the ICH-GCP principles and the World Medical Association Declaration of Helsinki on medical research. Because all patient data were anonymized before any analysis, informed consent was not required.
Dataset
Retrospectively, a set of 111 CBCT scans was retrieved from the database of the Dentomaxillofacial Imaging Center of UZ Leuven University Hospital in Leuven, Belgium. These scans had been acquired for purposes unrelated to the current investigation, such as endodontic treatment planning, implant planning, and oral and maxillofacial surgery procedures. The dataset was acquired using two CBCT devices: 3D Accuitomo 170 (J Morita, Kyoto, Japan) and NewTom VGi evo (Cefla, Imola, Italy). It is important to highlight that heterogeneous acquisition parameters were employed: for the 3D Accuitomo 170, 90 kilovoltage peak (kVp), 5 milliamperes (mA), field-of-view (FOV) sizes of 8 × 8, 10 × 10, 14 × 10, and 17 × 12 cm, and a voxel size ranging from 0.125 to 0.250 mm; for the NewTom VGi evo, 110 kVp, 3–20 mA, FOV sizes of 8 × 8, 10 × 10, 12 × 8, 16 × 16, and 24 × 19 cm, and a voxel size ranging from 0.125 to 0.300 mm. Figure 1 summarizes the conceptual framework of the current study design.

Figure 1. Conceptual framework of the study design for developing and validating an AI-driven tool for the automated segmentation of maxillary premolars.
For the selection of the imaging dataset, inclusion criteria encompassed CBCT scans from patients with a complete permanent dentition and satisfactory image quality, characterized by medium levels of sharpness and contrast, and low noise levels. This approach ensured accurate delineation of pulp chambers and root canals in maxillary premolars. CBCT scans with FOV covering either the maxilla alone or both the maxilla and mandible were included. Although some scans captured both arches, the analysis focused exclusively on maxillary premolars. Including scans with different FOVs aimed to enhance the generalizability of the study findings. Scans with poor image quality, such as those affected by significant artifacts from beam hardening or movement, were excluded.
The selected CBCT scans were exported in Digital Imaging and Communication in Medicine (DICOM) format and randomly distributed into the three steps of the CNN model:
i) CNN Training (n = 55, 96 teeth): Training the AI model using manual segmentation carried out by operators as the ground truth.
ii) CNN Validation (n = 14, 16 teeth): Conducting internal validation of the AI model by optimizing parameters until the establishment of an ideal architecture.
iii) CNN Testing (n = 42, 70 teeth): Conducting the performance assessment of the AI model through the comparison of 3D models generated entirely by AI versus those obtained from refined automated segmentation (R-AI) performed by an expert.
The DICOM files were imported into the cloud-based online platform “Virtual Patient Creator” (Relu, Leuven, Belgium). This interactive platform offers a set of editing tools (e.g., brush, contour, and interpolation tools) for the manual segmentation of dentomaxillofacial anatomical structures on CBCT scans. These tools enabled the precise delineation of the limits of the pulp chambers and root canals of maxillary premolars displayed in the multiplanar reconstructions of the CBCT scans.
Each pulp chamber and root canal of the maxillary premolar teeth in the training and validation datasets of the CNN model underwent manual segmentation by two operators (F.S.N. and S.A.). Prior to performing manual segmentation on the ground-truth sample, both operators underwent training and calibration using CBCT scans not included in the study dataset. This calibration involved manual segmentation of 10 maxillary premolars on separate CBCT scans by the two operators at two different time points. The segmentation results were compared to assess intra-operator agreement (same operator at different times) and inter-operator agreement (between the two operators) using two metrics: intersection over union (IoU) and 95% Hausdorff distance (HD). Operators were considered calibrated if the IoU was at least 80% and the 95% HD was no greater than 0.20 mm for both intra- and inter-operator agreement. An oral radiologist with 8 years of experience (R.C.F.) reviewed all manual segmentations before CNN model development and made adjustments when deemed necessary. The final segmentation maps were exported in Standard Triangle Language (STL) format and later used as input for training the CNN model.
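For illustration, the two agreement metrics can be computed from binary voxel masks as in the following sketch (NumPy/SciPy; the function names, isotropic voxel spacing, and use of all foreground voxels rather than surface voxels are assumptions, not the platform's actual implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean voxel masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union)

def hd95(a: np.ndarray, b: np.ndarray, spacing: float = 0.2) -> float:
    """Symmetric 95th-percentile Hausdorff distance (mm) between two masks,
    assuming an isotropic voxel spacing given in mm."""
    pa = np.argwhere(a) * spacing          # voxel indices -> physical coordinates
    pb = np.argwhere(b) * spacing
    d_ab, _ = cKDTree(pb).query(pa)        # each a-voxel to its nearest b-voxel
    d_ba, _ = cKDTree(pa).query(pb)        # and vice versa
    return float(max(np.percentile(d_ab, 95), np.percentile(d_ba, 95)))
```

With this convention, two identical masks give IoU = 1 and 95% HD = 0, so the calibration criterion (IoU ≥ 80%, 95% HD ≤ 0.20 mm) rewards near-perfect overlap.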
CNN network architecture
The CNN model developed in this study consisted of two sequential neural networks based on the 3D U-Net architecture. Each network comprised four contracting encoder blocks and three expansive decoder blocks. These blocks included two convolutions with a standard kernel size (3 × 3 × 3), followed by rectified linear unit (ReLU) activation and group normalization with eight feature maps. The decision to adopt a two-step approach stemmed from the challenges encountered when applying CNNs to CBCT scans with a large FOV10,11.
The first neural network detected approximate pulp chambers and root canals, generating an initial segmentation model. Subsequently, the second neural network refined the initial segmentation, enabling automatic segmentation of the structures of interest at full resolution. The CNN models were implemented in PyTorch, and increased robustness of the AI algorithm was achieved using data augmentation strategies within the training dataset. These strategies included elastic deformation, rotation, scaling, cropping, and mirroring.
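As a rough illustration, one such block could be sketched in PyTorch as follows; the layer ordering, channel counts, and the reading of "eight feature maps" as eight normalization groups are assumptions based on the description above, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One U-Net encoder/decoder block: two 3x3x3 convolutions, each
    followed by ReLU activation and group normalization (8 groups here
    is a hypothetical reading of the paper's description)."""
    def __init__(self, in_ch: int, out_ch: int, groups: int = 8):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.GroupNorm(groups, out_ch),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.GroupNorm(groups, out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
```

In the full architecture such blocks would be chained with downsampling on the encoder path and upsampling plus skip connections on the decoder path, as in the standard 3D U-Net.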
Moreover, the CNN model underwent optimization using the ADAM optimization algorithm. This process included reducing the learning rate and implementing early stopping based on the validation set to prevent overfitting and ensure the effective performance of the CNN model. Subsequently, the finalized CNN model was implemented and made accessible on the online cloud-based AI platform called “Virtual Patient Creator”.
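A schematic PyTorch training loop combining these elements (Adam, learning-rate reduction on plateau, and early stopping on the validation loss) might look like the following; all hyperparameter values are illustrative, not those used in the study:

```python
import torch

def train(model, train_loader, val_loader, loss_fn, epochs=200, patience=10):
    """Adam optimization with learning-rate reduction and early stopping
    driven by the validation loss (illustrative hyperparameters)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=5)
    best, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in val_loader) / len(val_loader)
        sched.step(val)                    # reduce the learning rate on plateau
        if val < best - 1e-6:
            best, stale = val, 0
        else:
            stale += 1
            if stale >= patience:          # early stopping: validation loss stalled
                break
    return model
```

The early-stopping counter is the mechanism that guards against overfitting: training halts once the validation loss stops improving, rather than running all epochs.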
CNN model testing
Automated segmentation of pulp chambers and root canals of maxillary premolars was conducted using the aforementioned online platform. Each CBCT scan in DICOM format was uploaded to the platform, which then automatically segmented the pulp cavity structures for each maxillary premolar, generating individual 3D models in STL format. Additionally, the platform automatically recorded the time taken to generate the segmentation map in seconds.
An oral radiologist with 8 years of experience (R.C.F.) evaluated the automated segmentations of the test set to detect and correct any errors, including oversegmentation or undersegmentation, in the AI-generated 3D models. After evaluating each automated segmentation, the operator determined that all segmentation maps required some form of minor correction.
For this assessment, the resliceable axes tool within the “Virtual Patient Creator” platform was employed. By activating this tool, all CBCT multiplanar reconstructions (axial, sagittal, and coronal) were aligned parallel to the long axis of each root canal. The brush tool was used to add or remove voxels in the segmentation maps, using the anatomical contour of the pulp cavity structures displayed on the CBCT reconstructions as a reference. Finally, a new R-AI segmentation map of the pulp cavity structures of each maxillary premolar was obtained in STL format. A digital stopwatch was used to record the time spent on manual refinements.
Validation metrics
A voxel-level confusion matrix was applied to evaluate the performance of the developed AI tool. The AI and R-AI 3D models were compared, and four variables were derived:
- (a) False positive (FP): voxels initially identified as part of the pulp cavity structures by the CNN model but subsequently removed by the operator during the refinement of the AI segmentation.
- (b) False negative (FN): voxels not initially recognized as part of the pulp cavity structures by the CNN model but later included by the operator during the refinement of the AI segmentation.
- (c) True positive (TP): voxels representing the actual pulp cavity structures that were accurately segmented during the automated segmentation.
- (d) True negative (TN): voxels not associated with the pulp cavity structures and correctly excluded from the automated segmentation.
The performance of the developed CNN model was assessed using the following accuracy metrics based on the aforementioned variable values: IoU, Dice similarity coefficient (DSC), Recall, Precision, Accuracy, and 95% HD (Table 1).
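Apart from the 95% HD, which requires surface-distance computations, these metrics follow directly from the four voxel counts. A NumPy sketch (mask names and the function are placeholders; the R-AI mask serves as the reference):

```python
import numpy as np

def voxel_metrics(ai: np.ndarray, rai: np.ndarray) -> dict:
    """Voxel-level confusion-matrix metrics comparing the AI mask against
    the refined (R-AI) mask, treated as the reference standard."""
    tp = np.sum(ai & rai)     # kept by the operator during refinement
    fp = np.sum(ai & ~rai)    # removed during refinement
    fn = np.sum(~ai & rai)    # added during refinement
    tn = np.sum(~ai & ~rai)   # correctly excluded background
    return {
        "IoU": tp / (tp + fp + fn),
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "Recall": tp / (tp + fn),
        "Precision": tp / (tp + fp),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

Note that in high-resolution CBCT volumes the background (TN) dominates, which is why overlap measures such as IoU and DSC are more informative than raw Accuracy.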
Comparison between human and AI-driven segmentations
The performance of the AI-based automated segmentation was also compared against manual segmentation performed by a human operator. Twenty-one teeth, constituting 30% of the test sample and including both maxillary first and second premolars, were randomly selected. An endodontist experienced in CBCT image analysis (A.O.S.J.) manually segmented the pulp cavity structures using the aforementioned AI platform. The operator used the contour tool to outline the pulp chamber and root canal boundaries of each tooth on the axial reconstructions of the CBCT scans.
Subsequently, the resliceable axes tool was employed to align each CBCT scan parallel to the long axis of each root canal. This allowed the operator to add or remove voxels while navigating the sagittal and coronal reconstructions of CBCT scans, facilitating the establishment of an ideal 3D model for each tooth. This task was performed twice, with a 30-day interval, to evaluate the accuracy of manual segmentation. The STL files obtained from the initial and subsequent segmentation sessions for each case were compared to calculate the accuracy metrics previously described. Finally, these manual segmentation results were compared with those obtained from automated segmentation by the CNN model for each accuracy metric. A digital stopwatch was used to record the time taken to manually segment the pulp chamber and root canal of each maxillary premolar tooth.
Time-efficiency analysis
The time needed to segment the pulp cavity structures of maxillary premolars was compared among the three methods investigated: manual, AI, and R-AI. This analysis utilized the same sample (n = 21) employed for assessing the accuracy of manual and AI-driven segmentation:
i) Manual segmentation: The time required for the operator to perform the manual segmentation of pulp cavity structures encompassed the duration from importing the DICOM data into the AI platform until obtaining the segmentation map.
ii) AI segmentation: The online platform recorded the time spent for the automated segmentation of pulp cavity structures until obtaining the 3D model.
iii) R-AI segmentation: The duration of manual refinements performed by the operator was recorded and combined with the time taken by the AI method.
Statistical analysis
The analysis of data was conducted using SPSS statistical software (version 24.0, IBM Corp., Armonk, NY). Descriptive data analysis involved summarizing the results with mean and standard deviation (SD) values for accuracy and time-efficiency assessment.
The normal distribution of the data was verified with the Shapiro–Wilk test. The independent t-test was used to compare mean accuracy metric values between the maxillary first and second premolars, and the paired t-test to compare the performance of the AI-driven and manual approaches. Lastly, one-way analysis of variance (ANOVA) with the Tukey post hoc test was conducted to compare the time needed for segmentation of the pulp cavity structures among the methods investigated. A significance level of 5% was adopted for all analyses.
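The same pipeline can be sketched with scipy.stats; the input arrays are synthetic placeholders, and `scipy.stats.tukey_hsd` stands in for the Tukey post hoc test:

```python
import numpy as np
from scipy import stats

def run_statistics(first_pm, second_pm, manual, ai, times, alpha=0.05):
    """Illustrative reproduction of the statistical pipeline; argument
    names are placeholders, not the study's actual data."""
    # Shapiro-Wilk: check normality before applying parametric tests
    normal = all(stats.shapiro(g).pvalue > alpha
                 for g in (first_pm, second_pm, manual, ai))
    # Independent t-test: first vs. second premolars on an accuracy metric
    p_teeth = stats.ttest_ind(first_pm, second_pm).pvalue
    # Paired t-test: AI-driven vs. manual segmentation of the same teeth
    p_method = stats.ttest_rel(ai, manual).pvalue
    # One-way ANOVA with Tukey HSD: segmentation time across the methods
    p_time = stats.f_oneway(*times).pvalue
    tukey = stats.tukey_hsd(*times)
    return normal, p_teeth, p_method, p_time, tukey
```

The paired test is appropriate for the AI-versus-manual comparison because both measurements come from the same 21 teeth, whereas the first- and second-premolar samples are independent groups.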
Using GPower statistical software (version 3.1.9.2, GPower, Düsseldorf, Germany), post hoc power analyses were conducted for all statistical tests in the study as follows: for the independent t-test, the analysis considered the difference between group means, the SD, and the sample size of each group; for the paired t-test, the mean difference between paired observations, its SD, and the sample size for each accuracy metric; and for ANOVA, the minimum difference between groups, the within-group SD, and the number of observations per group. Based on these parameters, the statistical power achieved ranged from 70 to 99%.
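For the independent t-test, post hoc power of the kind GPower reports can be computed from the noncentral t distribution; a sketch (the paired-test and ANOVA analogues follow the same pattern with their own noncentrality parameters and reference distributions):

```python
import numpy as np
from scipy import stats

def posthoc_power_ttest_ind(mean_diff, sd, n1, n2, alpha=0.05):
    """Post hoc power of a two-sided independent t-test with a common SD,
    via the noncentral t distribution."""
    nc = mean_diff / (sd * np.sqrt(1 / n1 + 1 / n2))  # noncentrality parameter
    df = n1 + n2 - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)           # two-sided critical value
    # power = probability the test statistic falls in the rejection region
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)
```

When the true mean difference is zero the noncentrality parameter vanishes and the "power" reduces to the alpha level, which is a useful sanity check on the implementation.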