This is a retrospective rater-based study of the agreement of measurements of the maxilla and mandible in CBCT images obtained from patients before orthodontic treatment. It was conducted, analysed, and reported in accordance with the Guidelines for Reporting Reliability and Agreement Studies (GRRAS). An initial study protocol covering data collection, raters and statistical analyses was prepared, and it was discussed and accepted by the raters.
Pre-treatment CBCT scans were available from 450 patients (females over 15 years and males over 16 years) referred for treatment during 2008–2013 to a private clinic for orthodontics and oral surgery in Scandinavia. Scans from individuals with missing permanent teeth (other than third molars), periodontal disease visually detected on the radiographs, major asymmetries of the jaws or previous orthodontic treatment were excluded.
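The age limits and exclusion criteria above can be expressed as a simple eligibility filter. The following is a minimal Python sketch, not the procedure used in the study (where scans were screened visually); the `Patient` record and its field names are hypothetical, and reading "over 15/16 years" as strictly greater than is an assumption.

```python
from dataclasses import dataclass

# Minimum ages from the text; "over" is read as strictly greater (assumption).
MIN_AGE = {"F": 15, "M": 16}


@dataclass
class Patient:
    """Hypothetical screening record; field names are illustrative only."""
    sex: str  # "F" or "M"
    age_years: float
    missing_permanent_teeth: bool  # third molars not counted
    periodontal_disease_on_radiograph: bool
    major_jaw_asymmetry: bool
    previous_orthodontic_treatment: bool


def is_eligible(p: Patient) -> bool:
    """True if the patient exceeds the age limit and meets no exclusion criterion."""
    if p.age_years <= MIN_AGE[p.sex]:
        return False
    return not (
        p.missing_permanent_teeth
        or p.periodontal_disease_on_radiograph
        or p.major_jaw_asymmetry
        or p.previous_orthodontic_treatment
    )
```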
Radiography and categorization of subjects
CBCT examinations were performed using an i-CAT CBCT 17–19 (Imaging Sciences International, LLC, 1910 N Penn Road, Hatfield, PA 19440, USA). The patients were seated upright during scanning. With the aid of laser markers, the midsagittal and occlusal planes were adjusted perpendicular to each other. The field of view (FOV) was set to 16 cm × 13 cm with a voxel size of 0.3 mm. Exposure was set at 120 kVp and 18.54 mAs, with a scanning time of 17.8 s. The machine was calibrated twice a year according to the manufacturer's requirements.
Lateral cephalometric images were generated from the CBCT scans using the i-CAT software program. Cephalometric analysis of the lateral images was performed with the Total Interactive Orthodontic Planning System (TIOPS, www.tiops.com). Landmarks were identified by mouse click, and subjects were classified into three groups according to craniofacial height, using the angle between the lower mandibular border (mandibular line, ML) and the cranial base (nasion–sella line, NSL). Based on the NSL/ML angle, subjects were categorized as low-angle (< 27°), average/normal-angle (27–37°) or high-angle (> 37°). After 60 individuals had been identified in the low-angle group, the same number was set as the limit for the normal- and high-angle groups to allow equal comparisons, giving a total of 180 subjects, as described previously (16).
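The grouping step can be sketched as follows, assuming two-dimensional landmark coordinates (for example S and N for the NSL, and two points defining the ML) exported from the cephalometric software. The function names, the coordinate input and the assignment of exactly 27° and 37° to the normal-angle group are assumptions. `balanced_groups` caps each group at 60 scans in the order they are encountered, which mirrors the limit described above but not necessarily how the scans were actually chosen.

```python
import math
from collections import defaultdict


def line_angle_deg(p1, p2, q1, q2):
    """Acute angle in degrees between line p1-p2 and line q1-q2 (2D points)."""
    v = (p2[0] - p1[0], p2[1] - p1[1])
    w = (q2[0] - q1[0], q2[1] - q1[1])
    cos_a = (v[0] * w[0] + v[1] * w[1]) / (math.hypot(*v) * math.hypot(*w))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    # Lines have no direction, so report the acute angle between them.
    return min(angle, 180.0 - angle)


def classify_angle(nsl_ml):
    """Map an NSL/ML angle to the craniofacial height groups used in the text."""
    if nsl_ml < 27.0:
        return "low"
    if nsl_ml <= 37.0:  # boundary handling is an assumption
        return "normal"
    return "high"


def balanced_groups(angles, limit=60):
    """angles: iterable of (patient_id, nsl_ml_angle); keep at most `limit` per group."""
    groups = defaultdict(list)
    for patient_id, nsl_ml in angles:
        group = classify_angle(nsl_ml)
        if len(groups[group]) < limit:
            groups[group].append(patient_id)
    return dict(groups)
```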
Using i-CAT Vision software (Imaging Sciences International, Hatfield, Pennsylvania, USA), a fully reconstructed three-dimensional image with sagittal, coronal, and axial slices was generated.
Raters and rating (measurements)
Five raters performed measurements on the CBCT images: a specialist in oral and maxillofacial radiology (29 years of experience), a postdoctoral researcher in oral and maxillofacial radiology (5 years of experience), a specialist in oral and maxillofacial surgery (16 years of experience), a resident in the same department and a general dental practitioner. All raters were familiar with handling CBCT images, were aware of the purpose of the study and performed the same measurements independently of each other. Before the measurements, all raters took part in an information session and a calibration exercise, and the assessment instructions were given both verbally and in writing.
All measurement sessions took place in the same room, using an 18.1-inch greyscale liquid crystal display (LCD) monitor (MFGD 1318; BARCO, Kortrijk, Belgium) with a luminance of 400 cd/m² and a resolution of 1280 × 1024 pixels. The observation room was dimly lit, with the ambient illuminance kept constant below 50 lx, as recommended by the American Association of Physicists in Medicine (AAPM) Task Group 18. The viewing distance was approximately 50 cm, and there was no restriction on observation time. The raters were allowed to use the zoom tool. All raters were blinded to clinical features such as craniofacial height and sex.
Before measuring, raters could correct small deviations in the patient's head position during exposure by realigning the skull in the sagittal, coronal and axial planes. The nasion line of the subject was oriented horizontally before measurements in the maxilla, and the mandibular base line was set horizontally before measurements in the mandible. For each patient group (low, normal and high angle), three sites (molar, premolar and midline regions) in the maxilla and in the mandible were measured by each rater in rotation (Fig. 1a). Sites were chosen within the three patient groups to obtain an even distribution among the molar, premolar and midline regions. At each selected cross-sectional site, one height and two width measurements were made between the teeth (Fig. 1b), using the measurement tools in i-CAT Vision. To assess intrarater agreement, 10% of the sites were randomly selected using IBM SPSS software (version 22.0; IBM Corp., Armonk, NY, USA) and measured again by all raters in a second session approximately 2 months later.
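The random 10% retest draw was made in SPSS; an equivalent draw can be sketched in Python with the standard `random` module. The site identifiers and the seed value are hypothetical, and a fixed seed is used only so that the selection can be reproduced.

```python
import random


def select_retest_sites(site_ids, fraction=0.10, seed=12345):
    """Randomly draw a fraction of sites for the second (intrarater) session.

    The seed is arbitrary and only makes the draw reproducible.
    """
    sites = list(site_ids)
    k = max(1, round(len(sites) * fraction))  # at least one site
    rng = random.Random(seed)
    return sorted(rng.sample(sites, k))  # sampling without replacement
```

For example, `select_retest_sites(range(100))` returns 10 distinct site indices, and repeated calls with the same seed return the same selection.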
The measurements were recorded manually by the responsible researcher in an Excel file (Microsoft Office Excel® 2010; Microsoft Corporation, Redmond, WA, USA) as they were made.
All statistical analyses were performed using IBM SPSS software (version 22.0; IBM Corp., Armonk, NY, USA). For all variables, the three groups (low, normal and high angle) were compared using one-way analysis of variance (ANOVA) with Tukey's post hoc test. A significance level of 5% was used in all comparisons.
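To make the group comparison concrete outside SPSS, the following stdlib-only sketch computes the one-way ANOVA F statistic and the Tukey–Kramer studentized range statistic q for one pair of groups. It stops short of p-values: these require the F and studentized range distributions (available, for instance, in SciPy as `scipy.stats.f` and `scipy.stats.studentized_range`), so the F and q values would still need to be compared with the appropriate critical values.

```python
import math
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def tukey_kramer_q(group_a, group_b, ms_within):
    """Studentized range statistic q for one pairwise comparison (unequal n allowed)."""
    se = math.sqrt(ms_within / 2 * (1 / len(group_a) + 1 / len(group_b)))
    return abs(fmean(group_a) - fmean(group_b)) / se
```

For example, `one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` gives F = 27.0 with 2 and 6 degrees of freedom.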
Inter- and intrarater agreement of the measurements at the selected cross-sectional sites was calculated as intraclass correlation coefficients (ICC(2,1)) with 95% confidence intervals (CIs). Only measurements from each rater's first session were used to calculate the interrater ICC. The level of agreement was interpreted according to the guideline proposed by Koo and Li: < 0.50, poor; 0.50–0.75, fair; 0.75–0.90, good; > 0.90, excellent agreement.
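ICC(2,1) corresponds to the two-way random-effects, absolute-agreement, single-rater model of Shrout and Fleiss. A minimal stdlib sketch of the point estimate follows, assuming a complete table of n sites × k raters with no missing values; the 95% CI is not computed here because it requires the F distribution. For intrarater agreement, the same function would be applied to a table whose two columns are one rater's first and second sessions. The band boundaries in the hypothetical helper `koo_li_label` follow the text, with the handling of exactly 0.75 and 0.90 being an assumption.

```python
from statistics import fmean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per measured site, each row holding one
    value per rater (n sites x k raters, no missing values).
    """
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: sites (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def koo_li_label(icc):
    """Map an ICC to the agreement bands used in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "fair"
    if icc <= 0.90:
        return "good"
    return "excellent"
```

Because the model measures absolute agreement, a constant offset between raters lowers the ICC: `icc_2_1([[1, 2], [2, 3], [3, 4]])` gives about 0.67 even though the two raters rank the sites identically.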