Failures in periapical radiography with the bisecting and paralleling techniques have been attributed to several causes, including unsuitable horizontal and vertical projection angles, inappropriate receptor positioning, and cone cutting [1, 12, 13, 14, 15, 16]. Accordingly, we evaluated these aspects comprehensively and classified the canine images into two categories to establish their ground truth quality. The exposure conditions were not taken into account, because small inadequacies can be remedied by image processing in digital systems.
As for tooth segmentation on periapical radiography, Ronneberger et al. reported relatively low recall, precision, and F measure values for upper and lower molar segmentation using a U-net architecture (0.747, 0.453, and 0.564, respectively). In contrast, the present results showed good segmentation performance (recall, precision, and F measure values of 0.937, 0.961, and 0.949, respectively). This discrepancy can be partially attributed to the difference in root configuration between the teeth, that is, the difference in the number of roots per tooth. Other reports, however, have segmented all types of teeth, including the maxillary canines, on panoramic radiographs [6, 7, 18]. Leite et al. reported good performance in segmenting the maxillary canines (recall, precision, and F measure of 0.969, 0.964, and 0.973, respectively). Lee et al. also reported a high segmentation accuracy of 0.889 for the maxillary canines. Despite the difference in the modalities used, the present results support those of the other reports on maxillary canine segmentation on radiographs.
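As a reading aid for the values quoted above, the following minimal sketch shows how an F measure can be derived from precision and recall. It assumes the F measure is the balanced F1 score (the harmonic mean of precision and recall), which the text does not state explicitly; the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure (F1): harmonic mean of precision and recall.

    Assumption: the F measure quoted in the text is the balanced F1 score.
    """
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the text for the two periapical studies.
print(round(f_measure(precision=0.453, recall=0.747), 3))  # 0.564
print(round(f_measure(precision=0.961, recall=0.937), 3))  # 0.949
```

Under this assumption, both results agree with the F measure values reported for the periapical studies.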
In our previous studies evaluating classification performance on panoramic radiographs, relatively small areas, such as those of the maxillary incisor and maxillary sinus [3, 19], were cropped from entire panoramic radiographs, and good performance was verified. The learning models in those studies were created without segmentation. Therefore, model 1 was created without the segmentation process so that its performance could be compared with that of model 2, which was created with the segmentation process. As a result, classification performance (measured as AUC) was significantly improved by including the segmentation step before classification. This suggests that segmentation should be performed before classification when classification performance would otherwise be insufficient.
Some of the classification failures observed in our dataset might have been caused by segmentation failures, as the technical quality was generally assigned as bad when a tooth other than the target canine was labeled as a canine. When the root apex of the maxillary canine could not be sufficiently segmented, the DL model might have classified the image as bad quality because it interpreted the result as shortening of the root. Therefore, the model’s classification performance could be improved if the segmentation performance were improved.
The present study has several limitations. First, the causes of failure could not be definitively identified, because the radiographs were classified on the basis of overall suitability. For self-assessment by students and residents, it is desirable to build a system that can separately clarify the causes of failure. Second, for the evaluation of large numbers of images in the field of education, false classification of truly good-quality images into the bad category should be avoided. Although the classification was performed with only two categories in the present study, three categories (i.e., good, undecided, and bad quality) might be better if the undecided images can be reevaluated by the instructors. Third, phantom images were used in addition to patient images, because there were not enough images of poor quality in the database. Although the actual effect was unclear, mixing patient and phantom images might have affected the quality evaluation. Fourth, the dataset was too small to generalize the results, and only the canines were evaluated. In future investigations, a system that can evaluate the quality of all teeth should be developed with larger datasets including various pathologies, such as deep caries, periapical lesions, and root fractures.
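The three-category scheme suggested above (good, undecided, and bad quality) could be sketched as a simple thresholding step applied to the classifier's output. This is only an illustration under stated assumptions: the function name `triage`, the input `p_good` (a predicted probability that the image is of good quality), and the threshold values 0.3 and 0.7 are all hypothetical and were not part of the present study.

```python
def triage(p_good: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a predicted probability of good quality to one of three categories.

    Assumptions (not from the study): p_good is the model's probability that
    the radiograph is of good quality; low and high are illustrative
    thresholds that would need to be tuned on validation data.
    """
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("p_good must lie in [0, 1]")
    if low > high:
        raise ValueError("low must not exceed high")
    if p_good >= high:
        return "good"
    if p_good <= low:
        return "bad"
    # Images in the uncertain band are deferred to an instructor.
    return "undecided"


for p in (0.92, 0.55, 0.10):
    print(p, triage(p))
```

Lowering `low` narrows the bad category, so fewer truly good-quality images are rejected outright, at the cost of more images being referred to the instructors.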
In conclusion, we confirmed a potential application of DL systems in the evaluation of the technical positioning quality of intra-oral radiographs using segmentation and classification techniques.