Deep negative volume segmentation | Scientific Reports

Our study began from the following simple question while we were performing a very tedious manual annotation of a compound three-dimensional (3D) structure. Q: Instead of finding the exact contours that circumscribe the 3D object, can we segment the air that fills the gaps within its parts? What deep neural network architecture would accomplish that, given the gaps are the absolute complements to the annotation labels? To find answers, we geared up with the most complex 3D object we could find.

Some of the most structurally complex objects in the human body are indisputably the joints in general and the temporomandibular joint (TMJ) in particular. The TMJ is a bilateral joint formed by the mandibular and the temporal bones of the skull, and it differs from the other joints both anatomically and functionally^1,2. TMJs enable functions like chewing and speaking. Several medical research groups still actively debate how to explain the kinetic function of the TMJ, its multiple degrees of freedom, and even its relation to a plethora of known illnesses (maxillofacial ones and beyond^2,3). Accurate interpretation of TMJ images has become essential in a variety of clinical practices, ranging from the basic assessment of wear and tear (e.g., osteoarthritis) to intricate surgical interventions (e.g., arthroplasty). The lack of trustworthy automation of the basic diagnosis-assisting routines (such as tendon segmentation or a measurement of the cartilage wear) stems from the fact that such compound joints have extremely intricate 3D anatomy and a variety of surrounding tissues with complicated morphologies and textures⁴. We show a number of 3D examples of the TMJ’s complex geometries in the supplementary material.

Millions of people suffer from temporomandibular disorders (TMDs), with symptoms such as a limited or deviated range of jaw motion, TMJ sounds, associated headaches, and pain in the joints themselves. Orthodontic, maxillofacial, and plastic surgery patients form another large related cohort. Although these symptoms are common, diagnosing them remains very challenging⁵, and the current clinical practice entails very rudimentary linear or 2D measurements of the joint’s tissues. Such measurements have obvious shortcomings: they are subjective, time-consuming, and not accurate enough because they are estimated in-plane. In fact, significant outcome differences were reported when the TMJ is measured in 2D vs. in 3D⁶. True 3D characterization of the TMJ in medical images is essential for improving various clinical practices, including dentistry, orthodontics, and maxillofacial and plastic surgeries.

Manual 3D annotation of the TMJ is usually undertaken only by the top hospitals. It requires the expertise of maxillofacial doctors and of a 3D modelling technician, as well as a long collaborative effort to draw a fitting 3D model of the jaw and of the other head parts involved⁷. In fact, there is currently no standardized annotation workflow for contouring the TMJ structures, even manually. This manuscript proposes a new protocol for such an annotation and a method for its end-to-end automation in clinical use.

Medical background

Joint health assessment

Joint health assessment is essential in many clinical practices, ranging from basic orthopedics to complex maxillofacial and plastic surgeries^8,9,10. While different metrics of the health of the inter-articular space have been proposed, the exact definition of the joint space boundaries is still a matter of debate (see, e.g., wrist¹¹, knee¹², or hip¹³). Conventionally, diagnosticians resort to basic in-plane measurements of the linear dimensions between anatomic reference points in the radiological scans to assess the health of the joint⁶. Several recently proposed automation techniques^14,15,16 demonstrated the robustness and reproducibility required for expanding the assessment to 3D, while still confirming the disagreement in the definition of the joint space volume of interest. This disagreement could be attributed to the vague borders between the soft and the connecting tissues as well as to their intricate texture and anatomic structure¹⁷. The current practices indicate the need for a robust and repeatable joint space assessment method that would operate both volumetrically and automatically.

TMJ space specifics

For the TMJ space, this demand is especially well-articulated, because a proper joint space is required for the normal free movement of the jaw (or the mandibular condyle) and the movement of the articular disc within the joint. The widening or narrowing of the joint space may point to some type of TMJ pathology, whereas the difference between the left- and the right-side joint spaces is the main cause of facial asymmetry, even if the bones themselves remain symmetrical⁵. Moreover, the development of the TMJ space is highly individualized, making a comparison between patients difficult¹⁸. Another unanswered question in the TMJ community is the definition of the “ideal” mandibular condyle position, which fuels debate between gnathologists and orthodontists and affects the development of a single joint health assessment standard¹⁹. Thus, the high variability across different patient cohorts⁴, the lack of agreement on the joint’s ‘home’ position, and the lack of a proper joint space assessment standard hinder the application of modern data-dependent deep learning tools to address the challenge.

Current clinical TMJ space assessment standards and metrics

Because of the complexity of the TMJ, 2D slice-by-slice visualization is insufficient for finding the cause of a given symptom; a true 3D reconstruction is required to describe its anatomy. Yet, many doctors have to resort to rudimentary linear measurements of the objects in the 2D scans. Among the currently used metrics for TMJ examinations are the horizontal condylar angle (HCA), sagittal ramus angle (SRA), medial joint space (MJS), lateral joint space (LJS), superior joint space (SJS), anterior joint space (AJS), and the width/depth of the mandibular fossa (FW, FD)²⁰. Because they are selected by eye and based on imprecise reference points, these metrics capture only a 2D representation of the 3D pattern. In our work, we suggest considering comprehensive volumetric measures instead, such as the volume and the surface area of the joint space, proposing the most complete morphological and topological description of the TMJ.
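
As an illustration of the volumetric measures mentioned above, the following minimal sketch computes the volume and the surface area of a segmented region from a binary voxel mask. It is an assumption-laden example rather than the measurement code used in this work: the function name `volume_and_surface_area`, the `spacing` parameter, and the (z, y, x) axis order are hypothetical, and the surface area is approximated by counting exposed voxel faces, which overestimates the area of a smooth surface.

```python
import numpy as np

def volume_and_surface_area(mask, spacing=(1.0, 1.0, 1.0)):
    """Return (volume, surface_area) of a binary 3D mask in physical units.

    `spacing` is the voxel size along the three array axes (assumed z, y, x).
    """
    mask = np.asarray(mask, dtype=bool)
    sz, sy, sx = spacing

    # Volume: number of foreground voxels times the volume of one voxel.
    volume = mask.sum() * sz * sy * sx

    # Pad with background so faces on the array border are counted as well.
    padded = np.pad(mask, 1, constant_values=False).astype(np.int8)

    # Area of a voxel face perpendicular to axis 0, 1, and 2 respectively.
    face_areas = (sy * sx, sz * sx, sz * sy)

    area = 0.0
    for axis, face_area in enumerate(face_areas):
        # Every foreground/background transition along this axis is one exposed face.
        exposed_faces = np.count_nonzero(np.diff(padded, axis=axis))
        area += exposed_faces * face_area

    return float(volume), float(area)
```

For smoother area estimates, a triangulated surface (for example, one extracted with a marching-cubes routine) would be a natural replacement for the face-counting step.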

Technical background

Object localization on medical scans

Automatic localization of objects of interest is a prerequisite for many medical imaging tasks, as it can narrow down the field of view to the important structures. As of today, there are several approaches for detecting specific areas of various shapes and sizes such as body parts, bone tissues, organs, nodules, and tumors in 3D MRI and CT images^21,22,23,24,25,26. Completely autonomous cropping in medical images has been reported²¹. It is a common practice to use a cascaded approach consisting of several steps: object localization followed by object segmentation or another required action. The first step is to localize the object within the entire 3D scan and then to provide a reliable bounding box for the more refined steps²⁷ (e.g., Mask R-CNN²⁸, 3D RoI-aware U-Net²³, segmentation-by-detection¹³, etc.).
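
The cascaded idea can be sketched in a few lines: a coarse localization result is turned into a bounding box, and only the cropped sub-volume is passed on to the refined step. This is a minimal sketch under stated assumptions, not the localization model used in the paper; `coarse_mask` stands for the output of some hypothetical localizer, and `bounding_box`, `crop`, and `margin` are illustrative names.

```python
import numpy as np

def bounding_box(coarse_mask, margin=2):
    """Return a tuple of slices enclosing the foreground of `coarse_mask`.

    The box is grown by `margin` voxels on every side and clipped to the array.
    """
    coords = np.argwhere(coarse_mask)
    if coords.size == 0:
        raise ValueError("coarse mask is empty; nothing to localize")
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, coarse_mask.shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))

def crop(volume, box):
    """Crop the full scan to the localized region of interest."""
    return volume[box]
```

In such a cascade, the refined segmentation network would then operate on `crop(scan, bounding_box(coarse_mask))` instead of the entire scan, which reduces memory use and removes distracting anatomy.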

Medical image segmentation

With the advent of artificial intelligence in medical image computing, a wide range of image segmentation challenges were successfully tackled by deep learning methods (see Refs.^29,30,31,32 for review). In particular, significant advances were made by the architectures based on the Convolutional Neural Networks (U-Net^33,34, V-Net³⁵, U-Net++³⁶, MD U-Net³⁷, Stack U-Net³⁸, etc.). Among the many anatomical objects that have become the focus of segmentation challenges, the human bones have remained the subject of active research^39,40. Modern high-resolution imaging⁴¹ and segmentation approaches have enabled thorough quantitative studies, which nowadays help assess changes in the bone structure⁴² and porosity⁴³.

Of specific value to our task are the 3D U-Net³⁴ and the attention-gated 3D U-Net⁴⁴ architectures, which take advantage of efficient GPU computing, the ability to achieve high precision with fewer training samples, and the capability of “learning where to look” with class-specific pooling⁴⁵. To automate the negative volume segmentation task, we first needed to segment the major bones (mandibular and temporal bones), which eventually drew us to select the V-Net architecture³⁵. V-Net is similar to 3D U-Net but converges more readily because it learns a residual function along the way. The architecture selection is summarized in the “Mandibular condyle and temporal bone segmentation” section. Once the bone segmentation was automated, we proceeded with the segmentation of the space between the bones. For that, we introduced a new inflation procedure that gradually fills the space between the inner structures of the joint until the entire negative volume is occupied. The inflation procedure and the full segmentation pipeline are described in the “Automatic pipeline: segmentation of negative volume” section.
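
To give an intuition for "gradually filling the space between structures", the sketch below grows a seed region inside a gap, one voxel layer at a time, until it can no longer expand without entering a barrier. This is only a voxel-space analogue written under our own assumptions; it is not the surface mesh inflation procedure of the paper. The names `dilate6`, `inflate`, `seed`, and `barrier` are hypothetical, and `barrier` is assumed to contain the segmented bones plus anything else that should stop the growth (for example, the outside of a region of interest).

```python
import numpy as np

def dilate6(mask):
    """Grow a binary 3D mask by one voxel using 6-connectivity."""
    # Padding with background keeps np.roll from wrapping foreground around.
    p = np.pad(mask, 1, constant_values=False)
    out = p.copy()
    for axis in range(3):
        out |= np.roll(p, 1, axis=axis) | np.roll(p, -1, axis=axis)
    return out[1:-1, 1:-1, 1:-1]

def inflate(seed, barrier, max_steps=1000):
    """Expand `seed` layer by layer through voxels that are not in `barrier`.

    Stops when the region no longer changes or after `max_steps` iterations.
    """
    free = ~np.asarray(barrier, dtype=bool)
    region = np.asarray(seed, dtype=bool) & free
    for _ in range(max_steps):
        grown = dilate6(region) & free
        if np.array_equal(grown, region):
            break  # the reachable gap is fully occupied
        region = grown
    return region
```

Because a real joint space is open to the surrounding tissue, an unconstrained growth of this kind would leak out of the joint; in this sketch the caller is expected to close the region by adding the exterior of the region of interest to `barrier`.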

Mesh inflation

Deformation, inflation, or deflation is commonly employed in complex 3D reconstruction problems to boost the model quality by detailing the meshes. Modern physics-based mesh deformation and generation methods combine robust constraint optimization and efficient re-meshing⁴⁶. They have proved useful in medical imaging^47,48 but still require additional evaluation of the nesting feasibility criteria, often viewed as constraint optimization problems for meshes⁴⁹.

Contributions

The key contributions of our paper are the following:

New paradigm for segmentation of the ‘air gaps’ within complex 3D objects (the concept of “Negative Volume”) using a deep neural network.
New manual annotation workflow for negative volume segmentation in the human joints. It is multiple orders of magnitude more descriptive than the current clinical standard.
First automatic end-to-end pipeline for extraction of negative volumes within a human’s joint, incorporating deep learning-based localization, segmentation, and surface mesh inflation.
New volumetric measure of a joint’s health based on its symmetry properties via the state-of-the-art topological cloud-to-cloud metrics.
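
The symmetry-based measure in the last item can be illustrated with a simple cloud-to-cloud comparison of the left and the right joint spaces. This section does not specify which metric is used, so the symmetric Chamfer distance below is a stand-in chosen for illustration only; `chamfer_distance`, `asymmetry_score`, and the choice of mirroring axis are assumptions, and the alignment is limited to mirroring and centering (no rotation or registration).

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds of shape (N, 3) and (M, 3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def asymmetry_score(left, right, mirror_axis=0):
    """Compare the left cloud with the mirrored right cloud after centering both.

    A score near zero means the two joint spaces are close to mirror images.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float).copy()
    right[:, mirror_axis] *= -1.0          # reflect across the assumed sagittal plane
    left_c = left - left.mean(axis=0)      # remove translation
    right_c = right - right.mean(axis=0)
    return chamfer_distance(left_c, right_c)
```

The pairwise distance matrix costs O(N·M) memory, so for dense surface clouds a spatial index (such as a k-d tree) would be a more practical choice than this brute-force version.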

In this work, we propose a new workflow that shifts the focus from the segmentation of the hard-to-contour anatomical structures within the joint to the segmentation of the spaces between these structures (the gaps). We call the method “negative volume” reconstruction and present a new way of manually annotating such a volume in the “Manual annotation pipeline: negative volume concept” section. We also present an end-to-end pipeline for extracting deep negative volumes from CT scans, which automates and improves on the manual one. Our fully automatic 3D deep negative volume segmentation/reconstruction approach is described in the “Automatic pipeline: segmentation of negative volume” section.
