This study was approved by the ethics committee of our university (No. 496) and was performed in accordance with the Declaration of Helsinki.
Panoramic images of 491 patients (214 females and 277 males; mean age, 8.8 years) with unilateral or bilateral CA were selected from the image database at Aichi-Gakuin University Dental Hospital. The images were collected between August 2004 and July 2020, and images obtained just before bone graft surgery for CA were used for the analysis. Among the 491 patients, 299 had CA accompanied by CP and were assigned to the “CP present group”; the remaining 192 patients, who had CA only, were assigned to the “CP absent group”. In the CP present group, 209 and 90 patients had unilateral and bilateral CA, respectively, whereas in the CP absent group, 174 and 18 patients had unilateral and bilateral CA, respectively. The presence of CP was confirmed from medical records and examination of computed tomography images. When the cleft was limited to the region anterior to the incisive foramen on the most inferior axial computed tomography slice in which the foramen was visible, the case was assigned to the CP absent group; when the cleft extended posteriorly beyond the incisive foramen, it was assigned to the CP present group. The panoramic images were obtained using a Veraviewepocs unit (J. Morita Mfg. Corp., Kyoto, Japan; tube voltage, 75 kV; tube current, 8 mA; exposure time, 16.2 s) or an AUTO III NTR unit (Asahi Roentgen Industry, Kyoto, Japan; tube voltage, 75 kV; tube current, 12 mA; exposure time, 12 s).
We created two models (Models A and B) in the present study. Model A was created using DetectNet, which has both object detection and classification functions. This network has five main parts: (1) data input and data augmentation; (2) a fully convolutional network, which extracts features and predicts object classes and bounding boxes per grid square; (3) loss function measurement; (4) bounding box clustering; and (5) mean average precision calculation. The adaptive moment estimation (Adam) solver was used with a base learning rate of 0.0001. Model B was created using VGG-16, which has only the classification function. Both models were developed on a system running Ubuntu OS version 16.04.2 with an 11 GB graphics processing unit (NVIDIA GeForce GTX 1080 Ti; NVIDIA, Santa Clara, CA, USA). The VGG-16 and customized DetectNet were taken from the DIGITS library version 5.0 (NVIDIA; https://developer.nvidia.com/digits) and used in the Caffe framework.
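For reference, the optimization settings stated above could be expressed in a Caffe solver configuration along the following lines; only the solver type (Adam) and the base learning rate (0.0001) are taken from the text, and the remaining fields are illustrative placeholders rather than the configuration actually used:

```
# Illustrative Caffe solver fragment; only `type` and `base_lr`
# are stated in the text, the rest are placeholders.
type: "Adam"
base_lr: 0.0001
lr_policy: "fixed"   # placeholder
max_iter: 30000      # placeholder
solver_mode: GPU     # an NVIDIA GTX 1080 Ti was used
```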
Development and assessment of Model A
The panoramic images, including the whole area of the maxilla and mandible, were downloaded in JPEG format and were 900 × 900 pixels in size (Fig. 1a). The datasets used in the learning and inference processes are shown in Table 1. Thirty images were randomly assigned to the test dataset and included both CP present and CP absent group images. In the CP absent group, only five bilateral CA images were assigned because of the small number of cases. The remaining images were used as training and validation data for creating the model, split at a ratio of approximately 80:20. Model A was created to first detect the upper incisor area regardless of whether CP was present, and then classify the detected areas into two classes, namely CP present or CP absent areas. The upper incisor area, where the CP actually existed or would arise, was defined as a rectangular region of interest (ROI). The bilateral superior distal ends of the ROI were set at the most distal parts of the lateral walls of the nasal cavities; when the vertical position differed between the left and right sides, the higher position was chosen as the superior distal end. The inferior margin was set at the alveolar ridge between the central incisors. The coordinates of the upper left (x1, y1) and lower right (x2, y2) corners of these ROIs were recorded using ImageJ software (National Institutes of Health, Bethesda, MD, USA) and converted to text form together with their classifications (CP present or absent; Fig. 1).
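As an illustration of this annotation step, the ROI corner coordinates and class label can be written out in the KITTI-style text format that DIGITS expects for object-detection datasets. This is a hypothetical sketch: the function and class names are ours, and only the class and bounding-box fields carry information.

```python
def roi_to_label(cls, x1, y1, x2, y2):
    """Build one KITTI-format label line (15 fields) for a ROI.

    Hypothetical helper: fields other than the class name and the
    bounding-box corners are zero-filled, as they are unused here.
    """
    return (f"{cls} 0.0 0 0.0 {x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
            "0.0 0.0 0.0 0.0 0.0 0.0 0.0")
```

One such line would be written per ROI, e.g. `roi_to_label("cp_present", 310, 120, 590, 430)`.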
When the test data were given to the DL-based model, it predicted a rectangular box showing the incisor area. When the model classified the area as CP present, the box was colored blue, whereas it was red for CP absent areas (Figs. 2,3,4). A box was considered correctly detected when it sufficiently included the location where CAs actually existed or would arise and was limited to the upper incisor area, meaning that the lateral ends did not extend beyond the canine, the superior end did not extend beyond the orbital floor, and the inferior end did not extend beyond the tip of the central incisor.
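The acceptance criteria above can be expressed as a simple check. This is a hypothetical sketch: the landmark names and the coordinate values are ours, and image coordinates are assumed (y increasing downward, so the orbital floor has a smaller y than the incisor tip).

```python
def box_is_correct(box, cleft_xy, limits):
    """Check a predicted box against the acceptance criteria.

    box:      (x1, y1, x2, y2) of the predicted rectangle
    cleft_xy: location where the CA exists or would arise
    limits:   dict of anatomical bounds (hypothetical key names)
    """
    x1, y1, x2, y2 = box
    cx, cy = cleft_xy
    # The box must include the cleft location...
    contains_cleft = x1 <= cx <= x2 and y1 <= cy <= y2
    # ...and stay within the canines, orbital floor, and incisor tip.
    within_area = (x1 >= limits["left_canine_x"]
                   and x2 <= limits["right_canine_x"]
                   and y1 >= limits["orbital_floor_y"]
                   and y2 <= limits["incisor_tip_y"])
    return contains_cleft and within_area
```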
The detection performance of the incisor area was evaluated using recall, precision, and F-measure, which are defined as follows:
Recall = number of correctly detected upper incisor areas/number of all upper incisor areas.
Precision = number of correctly detected upper incisor areas/(number of correctly detected upper incisor areas + number of falsely detected areas).
F-measure = 2 (recall × precision)/(recall + precision).
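These three definitions translate directly into code; a minimal sketch (the function and argument names are ours):

```python
def detection_metrics(n_correct, n_total, n_false):
    """Recall, precision, and F-measure from detection counts.

    n_correct: correctly detected upper incisor areas
    n_total:   all upper incisor areas
    n_false:   falsely detected areas
    """
    recall = n_correct / n_total
    precision = n_correct / (n_correct + n_false)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure
```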
The classification performance for correctly detected upper incisor areas was evaluated by calculating the sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC) with the CP present areas considered to be the positive class.
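For the threshold-based metrics, the calculation from a 2 × 2 confusion table (CP present as the positive class) can be sketched as follows; the AUC additionally requires the model's continuous output scores rather than binary decisions, so it is not computed in this sketch:

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from a confusion table.

    tp/fn: CP present areas classified correctly/incorrectly
    tn/fp: CP absent areas classified correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```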
Development and assessment of Model B
Using the same data used to develop Model A (Table 1), Model B was created for directly classifying the panoramic images into two categories, namely CP present or absent images.
The training data were augmented to 2600 images by adjusting image sharpness, brightness, and contrast using IrfanView software (Irfan Škiljan, Austria; https://www.irfanview.com/). The learning process was performed over 100 epochs. Thereafter, the test images were input to the developed model, which classified them as CP present or CP absent images. The classification performance was assessed by calculating the sensitivity, specificity, accuracy, and AUC, with CP present images considered to be the positive class.
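This kind of augmentation could equally be reproduced programmatically; a minimal Pillow-based sketch, where the enhancement factors are illustrative placeholders and not the adjustments actually applied in IrfanView:

```python
from PIL import Image, ImageEnhance

def augment(img):
    """Generate variants of one image by adjusting sharpness,
    brightness, and contrast (two illustrative factors each)."""
    variants = []
    for factor in (0.8, 1.2):
        variants.append(ImageEnhance.Sharpness(img).enhance(factor))
        variants.append(ImageEnhance.Brightness(img).enhance(factor))
        variants.append(ImageEnhance.Contrast(img).enhance(factor))
    return variants
```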
Comparison of DL-based model and human-observer classification performance
To compare the classification performance of the models with that of human observers, two radiologists with 5 and 6 years of experience, respectively, evaluated the same test data used in the assessment of the DL-based models. They were asked to classify each image into one of two categories (CP present or absent).
The differences between the AUC values of the two models and the human observers were statistically assessed using the χ2 test. The significance level was set at p < 0.05.