
Optimizing dental implant identification using deep learning leveraging artificial data



Study design

The central aim of this study was to assess the effect of supplementing the training data of a convolutional neural network (CNN)-based deep learning classification model with artificially generated implant images. The study was rooted in a deep learning framework utilizing CNNs, with supervised learning serving as the primary learning approach.

Ethics statement

The Institutional Review Board (IRB) of the Kagawa Prefectural Central Hospital Ethics Committee granted approval for this study (approval number: 894), which was designed as a non-interventional, retrospective investigation. Because the research employed fully anonymized data, the need for informed consent was waived by the ethics committee of Kagawa Prefectural Central Hospital. It was established that neither patient identities nor their image information would be disclosed or shared with other research institutions unless specifically requested by the research subjects or their representatives. The study adhered to the guidelines of the Declaration of Helsinki, and all protocols were approved by the IRB.

Data set and preprocessing

Implant data embedded in the patient

This investigation utilized two types of panoramic radiography equipment, namely, the AZ3000CMR and Hyper-G CMF (ASAHIROENTGEN IND. Co. Ltd., Kyoto, Japan), to acquire digital dental panoramic radiographs. All clinical digital image data were output in tagged image file format (TIFF) (with dimensions of 2964 × 1464, 2804 × 1450, 2776 × 1450, or 2694 × 1450 pixels) using the Kagawa Prefectural Central Hospital Image Storage and Communication System (HOPE Dr ABLE-GX, FUJITSU Co., Tokyo, Japan). Dental implant brands were identified and labeled based on electronic medical records and dental implant usage ledgers. Panoramic tomographic images were created using an imaging technique specialized for diagnostic imaging in the dental field. This technique is unique in that it moves an X-ray source and a detector as a pair along a trajectory that follows the shape of the subject's jaw. Because the X-rays enter the detector as a fan beam, almost no blurring occurs in the image of objects near the detector; therefore, by following an appropriate trajectory while rotating around the jaw, the object can be clearly imaged on the detector side. Based on this principle, an X-ray transmission image was created, and the values were digitized to create a panoramic X-ray image. The tube voltage and current used to generate the X-rays were automatically adjusted for each patient within the ranges of 60–86 kV and 2–12 mA, respectively. The detected signal was digitized with a sampling interval of 0.1 mm and a grayscale (density) resolution of 10 bits and converted to the required digital image size.

For the deep learning analysis, digital panoramic X-ray images from the picture archiving and communication system (PACS) were imported into Photoshop Elements (Adobe Systems, Inc., San Jose, CA, USA). All dental implants within the images were manually aligned by oral and maxillofacial surgeons in a manner similar to that reported in previous papers6,7. These surgeons, who were also responsible for the preprocessing and cropping of the images, were not informed in advance of the specific brand of the implanted devices. The clinical dataset included images from all stages of treatment, such as implant fixtures, fixtures with healing abutments, provisional prostheses, and final prostheses. The cropped images were saved in portable network graphics (PNG) format.

Dental implant images

Image of the implant region of interest

This study focused on ten prevalent dental implant fixtures amenable to three-dimensional (3D) scanning, extensively used at the Kagawa Prefectural Central Hospital (Tables 1 and 2). The scanning was performed using a KaVo LS 3 desktop scanner (Bochum, Germany).

Table 1 Description of 10 types of dental implant systems.
Table 2 Distribution of implant brands used in the study.

Creation of artificial X-ray implant images

3D data of implant

To create artificial X-ray images for training the classification model, we used 3D data acquired by surface scanning of pre-implantation implant fixtures with a dental 3D scanner. Only the aforementioned 10 types of fixtures were included in these 3D data, which were saved in stereolithography (STL) format.

Jawbone area image

After acquisition of the dental panoramic radiographs, an image, termed here the jawbone peripheral area image, was created by cropping the region showing the tissue surrounding the jawbone where an implant might be embedded. Forty maxillary and twenty mandibular jawbone-area images, obtained from 40 panoramic dental radiographs, were used while maintaining a 1:1 aspect ratio.

Creation of artificial X-ray images

All artificial image creation steps described in the following subsections were performed in Python (version 3.7.10). An artificial X-ray image was generated from the 3D implant data based on X-ray imaging principles. This process simulated a condition in which the implant interior was filled with a homogeneous metal and parallel pseudo-X-rays were incident on it (Fig. 1). The artificial X-ray images were created using the steps described below.

Fig. 1

Schematic of the artificial X-ray imaging process: (a) 3D implant data; (b) resultant artificial X-ray image; (c) simulation of pseudo-X-rays with parallel lighting. We reproduced X-ray photography, based on its underlying principles, on a computer and created artificial X-ray images from the 3D data of the implant. X-rays were simulated as parallel light pseudo-X-rays.

Calculation of the thickness of 3D data

To create an artificial image from 3D data, the thickness of the 3D data had to be calculated. Initially, as shown in Fig. 2, the 3D data were placed in a coordinate space such that the center of gravity was set as the origin. Next, the xz plane was segmented into grids, and straight lines extending from the grid points in the positive y-axis direction were considered pseudo-X-rays. The pseudo X-rays were emitted within the ranges − 80 ≤ x ≤ 80 and − 80 ≤ z ≤ 80, and their intersection with the 3D data surface was determined using Tomas Möller’s intersection detection algorithm8. The thickness of the 3D data was derived from these intersection points, facilitating the creation of a 160 × 160-pixel image.
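As an illustration of this step, the following is a minimal sketch of the thickness calculation using the trimesh library. The file name, the placement of the ray origins in front of the object, and the use of trimesh's built-in ray–triangle intersection (a Möller–Trumbore-style test) in place of a hand-written implementation are assumptions, not the authors' exact code.

```python
import numpy as np
import trimesh  # assumed library for mesh loading and ray casting

# Load the surface scan of an implant fixture (hypothetical file name) and move
# its centre to the origin, approximating "centre of gravity at the origin".
mesh = trimesh.load("implant_fixture.stl")
mesh.apply_translation(-mesh.centroid)

# 160 x 160 grid of ray origins over -80 <= x, z <= 80; rays travel in the
# positive y direction. Origins are placed well below the mesh so each ray
# passes through the whole object.
xs = np.linspace(-80, 80, 160)
zs = np.linspace(-80, 80, 160)
gx, gz = np.meshgrid(xs, zs, indexing="ij")
origins = np.column_stack([gx.ravel(), np.full(gx.size, -1000.0), gz.ravel()])
directions = np.tile([0.0, 1.0, 0.0], (origins.shape[0], 1))

# Intersections between the pseudo-X-rays and the 3D data surface.
locations, index_ray, _ = mesh.ray.intersects_location(
    ray_origins=origins, ray_directions=directions)

# Thickness along each ray = distance between its first and last intersection.
thickness = np.zeros(origins.shape[0])
for ray_id in np.unique(index_ray):
    ys = locations[index_ray == ray_id][:, 1]
    thickness[ray_id] = ys.max() - ys.min()

thickness_map = thickness.reshape(160, 160)  # 160 x 160 thickness image
```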

Fig. 2

Computation of 3D data thickness: (a) positioning in coordinate space; (b) generation of pseudo-X-rays; (c) calculation of the intersection points between pseudo-X-rays and the 3D data surface. After arranging the 3D data on the coordinate space such that the center of gravity is the origin, the xz plane was divided into a grid, and the straight lines extending from the grid points in the positive direction of the y axis were defined as pseudo-X-rays. The intersection between the pseudo-X-ray and 3D data surface was determined. The thickness of the 3D data was calculated from the intersection points, and an implant image was created.

Determination of pixel values considering the attenuation of X-rays

Artificial X-ray images were generated using the 3D data thickness calculated in the previous subsection (Fig. 3). The X-ray intensity I after passing through a material of thickness x was determined using Eq. (1). This calculation assumed an initial intensity I_0 of 1 and an X-ray attenuation coefficient µ of 0.04 for the material. Equation (2), derived from Eq. (1), was used to calculate the pixel value p in the artificial X-ray image. The X-ray attenuation coefficient was set by visual confirmation during imaging.

Fig. 3

Determination of pixel values considering attenuation of X-rays.

$$I = I_0 \exp(-\mu x)$$

(1)

$$p = \left(1 - I_0 \exp(-\mu x)\right) \times \left(p_{\max} - p_{x_{\min}}\right) + p_{x_{\min}}$$

(2)

p_xmin = 125: Pixel value of the artificial X-ray image when the thickness x of the 3D data is minimal (average of the pixel values of the four corners of 7946 implant images).

p_max = 255: Maximum pixel value of the artificial X-ray image.
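A minimal sketch of this pixel-value mapping, assuming the thickness map from the previous sketch and the constants stated above; the function name is illustrative and not from the paper.

```python
import numpy as np

MU = 0.04      # X-ray attenuation coefficient of the material (set visually, per the text)
I0 = 1.0       # initial X-ray intensity
P_MAX = 255    # maximum pixel value of the artificial X-ray image
P_XMIN = 125   # pixel value at minimal thickness

def thickness_to_pixels(thickness_map):
    """Map a thickness image (e.g. the 160 x 160 array above) to pixel values."""
    transmitted = I0 * np.exp(-MU * thickness_map)            # Eq. (1)
    pixels = (1.0 - transmitted) * (P_MAX - P_XMIN) + P_XMIN  # Eq. (2)
    return pixels.astype(np.uint8)
```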

Creation of images with various angles

To account for potential variations in implant placement, the 3D data were rotated along all three axes, generating 125 images per product across a range of angles. The rotation angles about the x-, y-, and z-axes are denoted θx, θy, and θz, respectively, and were bounded as −20° ≤ θx ≤ 20°, 0° ≤ θy ≤ 45°, and 0° ≤ θz ≤ 288°. Each original image was then flipped both vertically and horizontally, resulting in 500 unique images (Fig. 4). The x- and y-axis rotation ranges were based on panoramic radiographs acquired in clinical practice, with the confirmed image showing the largest tilt used as the reference; within each range, the rotation angles were set in five equal increments. For the z-axis, images from all directions were created at five levels: 0, 72, 144, 216, and 288°.
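A brief sketch of how the angle combinations and flips could be enumerated. The assumption that the 500 images arise from the 125 renderings times four flip variants (original, vertical flip, horizontal flip, and both) is ours, as the text does not spell this out.

```python
import itertools
import numpy as np

# Five equally spaced angles per axis within the stated ranges.
theta_x = np.linspace(-20, 20, 5)            # x-axis rotation, degrees
theta_y = np.linspace(0, 45, 5)              # y-axis rotation, degrees
theta_z = np.array([0, 72, 144, 216, 288])   # z-axis rotation, degrees

# 5 x 5 x 5 = 125 rotation settings per product, each applied to the 3D data
# before rendering an artificial X-ray image.
rotations = list(itertools.product(theta_x, theta_y, theta_z))

def flip_variants(image):
    """Return the rendered image and its vertical, horizontal, and combined flips."""
    return [image, np.flipud(image), np.fliplr(image), np.flipud(np.fliplr(image))]

# 125 renderings x 4 variants = 500 images per product (assumed interpretation).
```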

Fig. 4

Generation of images at various embedding angles.

Creating a background for artificial X-ray images

To enhance the realism of the artificial X-ray images, the background was constructed from imagery of the area surrounding the jawbone. A preprocessing step was performed before background synthesis: an image in which the background pixel values of the artificial X-ray image were set to 0 was created. This image was then binarized, and a separate mask image was produced by edge extraction. The mask image was later used in post-processing to blend the implant body with the background.

Subsequently, the created jawbone peripheral region image was resized to 160 × 160 pixels, matching the dimensions of the artificial X-ray image. The two were then synthesized based on Eq. (3). The coupling section of the implant body with the superstructure in the artificial X-ray image was synthesized using an image of the area surrounding the jawbone.

$$p_{i_s} = \begin{cases} 0.7\,p_{i_a} + 0.3\,p_{i_j}, & p_{i_a} \ne 0 \\ 0.3\,p_{i_a} + 0.7\,p_{i_j}, & p_{i_a} = 0 \end{cases}$$

(3)

p_is: pixel value of the composite image of the peripheral jawbone area and the artificial radiograph.

p_ia: pixel value of the artificial X-ray image.

p_ij: pixel value of the jawbone surrounding-area image.

Upon the synthesis of the background, the artificial X-ray image underwent post-processing to seamlessly blend the implant body with the background. Utilizing the mask image, the pixel value at the boundary between the implant body and the background in the artificial X-ray image was calculated as the average of the neighboring four pixels, as outlined in Eq. (4).

$$i_{(x,y)} = \mathrm{average}\left(i_{(x-1,y)},\ i_{(x+1,y)},\ i_{(x,y-1)},\ i_{(x,y+1)}\right)$$

(4)
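A minimal sketch of the background synthesis (Eq. 3) and boundary smoothing (Eq. 4), assuming NumPy arrays for the images and a boolean edge mask; function and variable names are illustrative.

```python
import numpy as np

def synthesize_background(art_img, jaw_img, edge_mask):
    """Blend an artificial X-ray image (background pixels = 0) with a resized
    jawbone peripheral-area image (Eq. 3), then smooth the implant/background
    boundary marked by edge_mask (Eq. 4)."""
    art = art_img.astype(np.float64)
    jaw = jaw_img.astype(np.float64)

    # Eq. (3): implant pixels lean towards the implant image,
    # empty background pixels lean towards the jawbone image.
    composite = np.where(art != 0, 0.7 * art + 0.3 * jaw, 0.3 * art + 0.7 * jaw)

    # Eq. (4): replace each boundary pixel with the mean of its four neighbours.
    smoothed = composite.copy()
    for y, x in zip(*np.nonzero(edge_mask)):
        if 0 < y < art.shape[0] - 1 and 0 < x < art.shape[1] - 1:
            smoothed[y, x] = composite[[y - 1, y + 1, y, y],
                                       [x, x, x - 1, x + 1]].mean()
    return smoothed.astype(np.uint8)
```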

The comprehensive process for creating the background for artificial X-ray images is depicted in Fig. 5.

Fig. 5

Creating a background for artificial X-ray images. To increase the realism of the artificial X-ray images, the background was constructed from images of the surrounding jawbone. As preprocessing for background synthesis, an image in which the background pixel values of the artificial X-ray image were set to 0 was created. This image was then binarized, and a mask image was created by edge extraction. The implant in the artificial X-ray image was combined with images of the area around the jawbone. After background synthesis, the artificial X-ray images were post-processed to seamlessly fuse the implant body and the background.

Processing for implant image reproduction

Within the cropped images, the reference dimensions of the rectangular area were determined based on the y-axis rotation angle of the 3D data. The image was then randomly cropped such that the width varied by ±10 pixels and the height by ±5 pixels from the reference size. The details of the cropping parameters are presented in Table 3.

Table 3 Reference sizes for image cropping.

To introduce horizontal blur into the images, we leveraged the Fourier transform. We created a filter in which pixel values of 255 were randomly assigned within a 1–4 pixel range from the upper-left and upper-right corners, while all other pixel values were set to 0. Both the artificial X-ray image and the filter underwent Fourier transformation, and their product was computed in the spatial frequency domain. The result was then subjected to an inverse Fourier transformation to yield an image with added horizontal blur.
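A minimal sketch of this frequency-domain blurring, assuming a 2D NumPy image; normalizing the kernel so that overall brightness is preserved is our assumption and is not stated in the text.

```python
import numpy as np

def add_horizontal_blur(image, rng=None):
    """Convolve the image with a small horizontal kernel by multiplying spectra
    in the spatial frequency domain and transforming back."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    width = rng.integers(1, 5)          # 1-4 pixels from each upper corner
    kernel = np.zeros((h, w))
    kernel[0, :width] = 255.0
    kernel[0, w - width:] = 255.0
    kernel /= kernel.sum()              # normalization (assumption)

    # Convolution theorem: product of the two spectra, then inverse FFT.
    blurred = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)).real
    return np.clip(blurred, 0, 255).astype(np.uint8)
```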

Contrast adjustment was performed using gamma correction. The pixel values were manipulated in accordance with Eq. (5), with γ randomly determined within the range 0.2 ≤ γ ≤ 1.5. The steps involved in adding horizontal blur and adjusting contrast are shown in Fig. 6. Through these processes, with and without horizontal blurring and contrast adjustment, we augmented the total number of artificial X-ray images from 500 to 2000 per product.

Fig. 6

Application of horizontal blur and adjustment of contrast levels. Fourier transform was used to generate horizontal blur in the image. The Fourier transforms of the artificial X-ray image and the filter were performed, and their product was calculated in the spatial frequency domain. By performing the inverse Fourier transform of the result, an image with horizontal blur was obtained. Gamma correction was used to adjust the contrast of the image.

$$I' = I_{\max} \times \left(\frac{I}{I_{\max}}\right)^{\frac{1}{\gamma}}$$

(5)

I: Pixel value before conversion.

I_max = 255: Maximum pixel value.

I’: Pixel value after conversion.
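A short sketch of the gamma correction in Eq. (5), assuming an 8-bit NumPy image; the function name is illustrative.

```python
import numpy as np

def gamma_correct(image, rng=None):
    """Apply Eq. (5) with gamma drawn uniformly from [0.2, 1.5]."""
    rng = rng or np.random.default_rng()
    gamma = rng.uniform(0.2, 1.5)
    i_max = 255.0
    corrected = i_max * (image / i_max) ** (1.0 / gamma)
    return corrected.astype(np.uint8)
```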

CNN model architecture

ResNet, a classification CNN model developed by He et al. in 2015, was selected for this study9. ResNet consists of several stacked residual blocks. Within a residual block, the input can bypass the convolutional processing and reach the next layer directly via a shortcut connection. This mitigates the vanishing-gradient problem and yields a CNN model suitable for learning even in deep networks.

Although ResNet50 is a basic CNN model, it is useful as a classifier. This study aimed to verify the accuracy of the classifier when artificial images were added6; ResNet50 was selected because it provided stable classification results in previous implant classification models.

To build a more effective deep learning model, we employed fine-tuning, in which some weights of an already trained model are retrained. The CNN model used in this study was pre-trained on the ImageNet database of natural images and then fine-tuned. The deep learning classification task was implemented using Keras (version 2.2.4), TensorFlow (version 1.15.2), and Python (version 3.7.10).
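A minimal sketch of how such a fine-tuned ResNet50 classifier could be assembled with the stated Keras/TensorFlow versions. The 224 × 224 input size, the single dense softmax head, and retraining all base layers are our assumptions rather than details given in the paper.

```python
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD

# ResNet50 backbone pre-trained on ImageNet, without its original classification head.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# New head for the 10 implant brands; the Global Average Pooling output also
# provides the 2048-dimensional features later visualized with t-SNE.
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(10, activation="softmax")(x)
model = Model(inputs=base.input, outputs=outputs)

# Optimizer settings from the text: SGD with momentum 0.9 and learning rate 0.001.
model.compile(optimizer=SGD(lr=0.001, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training would then call model.fit(...) with a mini-batch size of 32
# for up to 400 epochs, as described later in this section.
```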

A schematic of this study is shown in Fig. 7.

Fig. 7

A schematic of this study.

Model training and procedure

In this study, we utilized three distinct datasets:

Dataset A

This dataset comprises the implant-only images cropped from radiographs of implants placed in human jawbones.

Dataset B

This dataset is an augmentation of Dataset A, supplemented with artificial X-ray images that did not undergo background processing. For each product, 207 artificial X-ray images were randomly added to the training data of Dataset A.

Dataset C

This dataset was formulated by enhancing Dataset A with artificial X-ray images that had undergone various background processing procedures. Analogously to Dataset B, 207 artificial X-ray images per product were randomly inserted into the training data from Dataset A.

We employed k-fold cross-validation (k = 10) to generalize model training, ensuring that all data except the test data were used in the training algorithm10. The dataset was randomly partitioned into 11 sections with an even distribution of each product; nine sections were allocated as training data, one as validation data, and one as test data (Fig. 8). An under-sampling method was used to align the number of training and validation images for each product with that of the smallest product (Straumann Bone Level: 207 training images, 23 validation images). Each validation fold was independent of the training folds and was used to assess the training state. After completing one cycle of model training, we conducted ten similar validations using the test data.
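A simplified sketch of how the 11-way partitioning into train/validation/test splits could be implemented with scikit-learn. This is our interpretation under stated assumptions (stratified splitting, validation fold chosen as the fold following the test fold) and not the authors' exact protocol, which also includes per-product under-sampling.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def make_splits(labels, n_parts=11, n_runs=10, seed=0):
    """Partition sample indices into 11 stratified sections and build ten
    (train, validation, test) splits: one section as test, one as validation,
    and the remaining nine as training data."""
    skf = StratifiedKFold(n_splits=n_parts, shuffle=True, random_state=seed)
    sections = [idx for _, idx in skf.split(np.zeros(len(labels)), labels)]
    splits = []
    for k in range(n_runs):
        test_idx = sections[k]
        val_idx = sections[(k + 1) % n_parts]
        train_idx = np.concatenate([sections[i] for i in range(n_parts)
                                    if i not in (k, (k + 1) % n_parts)])
        splits.append((train_idx, val_idx, test_idx))
    return splits
```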

Fig. 8

Representative breakdown of experimental data across each dataset.

All the deep learning classification models were executed on a 64-bit Ubuntu 18.04.5 LTS operating system (Canonical Ltd., London, UK) with an NVIDIA GeForce RTX 2060 SUPER 8 GB graphics processing unit (NVIDIA, Santa Clara, CA, USA). The optimizer, weight decay, and momentum parameters were kept consistent across all models. Stochastic gradient descent was used as the optimizer, with momentum set at 0.9 and a fixed learning rate of 0.001. All models were trained for a maximum of 400 epochs with a mini-batch size of 32. This process was repeated ten times for each dataset using distinct random seeds.

Performance metrics

The performance of the deep learning classification for each dataset was evaluated based on accuracy, precision, recall, specificity, and F1 score. Each performance metric for the complete model was calculated as the average of the 10-fold cross-validation results using Eqs. (6)–(10). Additionally, we used the receiver operating characteristic (ROC) curve and the area under the curve (AUC) for performance evaluation.

In these equations, TP, FN, TN, and FP represent true positives, false negatives, true negatives, and false positives, respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

(6)

$$\text{Precision} = \frac{TP}{TP + FP}$$

(7)

$$\text{Recall} = \frac{TP}{TP + FN}$$

(8)

$$\text{Specificity} = \frac{TN}{TN + FP}$$

(9)

$$\text{F1 score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

(10)
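A short sketch of these per-class metrics computed from confusion-matrix counts; the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics of Eqs. (6)-(10) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}
```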

Visualization of each dataset feature by dimensionality reduction algorithm

To understand the factors contributing to the discrepancies in accuracy among the datasets, we visualized the features of the images input into the model trained on each dataset. The 2048-dimensional features extracted from the Global Average Pooling layer of the implant classification model were reduced to two dimensions using the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm11, a dimensionality reduction method that maps high-dimensional data to two or three dimensions.
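A minimal sketch of the dimensionality reduction step with scikit-learn's t-SNE; the random features stand in for the 2048-dimensional Global Average Pooling outputs of the trained classifier, and the sample count is illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the 2048-dimensional features taken from the Global Average
# Pooling layer of the trained classifier (one row per test image).
features = np.random.rand(230, 2048).astype(np.float32)

# Reduce to two dimensions for a scatter-plot visualization of each dataset.
embedding = TSNE(n_components=2, random_state=0).fit_transform(features)
print(embedding.shape)  # (230, 2)
```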

Statistical analysis

We statistically evaluated the classification performance of each dataset. All data were examined from nine replicates utilizing the JMP statistical software package version 16.1.0 for Macintosh (SAS Institute Inc., Cary, NC, USA). Statistical significance was designated at p < 0.05 for all analyses. We performed nonparametric tests based on the results of the Shapiro–Wilk test. We used the Wilcoxon test to calculate differences in the classification performance between the respective datasets for each performance metric. When testing more than three groups, we corrected the significance level using the Bonferroni correction to minimize type I errors. We calculated effect sizes as Hedges’ g (unbiased Cohen’s d) using the following equations:

$$\text{Hedges' } g = \frac{|M_1 - M_2|}{s}$$

(11)

$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

(12)

Here, M1 and M2 represent the average performance metrics for the respective datasets, s1 and s2 denote the standard deviations of the datasets, and n1 and n2 indicate the number of analyses for each dataset. The determination of the effect size was based on criteria proposed by Cohen and extended by Sawilowsky12: an effect size of 2.0 or more is enormous, 1.0–2.0 is a very large effect, 0.8–1.0 is large, 0.5–0.8 is moderate, 0.2–0.5 is small, and 0.2 or less is very small.
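A short sketch computing the effect size exactly as defined in Eqs. (11) and (12); the function name is illustrative.

```python
import math

def hedges_g(m1, m2, s1, s2, n1, n2):
    """Absolute mean difference divided by the pooled standard deviation."""
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return abs(m1 - m2) / s_pooled
```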


