Limited intrarater and interrater reliability of acute ligamentous ankle injuries on 3 T MRI

Objectives To determine the diagnostic reliability of (1) the Schneck grading system for acute ligamentous injuries of the three major ligamentous ankle complexes, (2) the Schneck grading system for acute injuries of the individual ankle ligaments and (3) the Sikka classification for syndesmosis injury.
Methods All acute ankle injuries in adult athletes (≥18 years) presenting to the outpatient department of a specialised Orthopaedic and Sports Medicine Hospital within 7 days postinjury were screened for inclusion. Ankle injuries were excluded if imaging demonstrated a frank ankle fracture or if the 3 T MRI study could not be acquired within 10 days postinjury. Two radiologists graded the three major ligamentous complexes (lateral ankle complex, deltoid complex and syndesmosis complex) and their constituent individual ligaments according to the four-grade Schneck grading system. Syndesmotic injuries were classified according to the four-grade Sikka classification for consequent injury of the individual syndesmosis ligaments and the deltoid complex. Agreement and kappa (K) statistics were calculated to determine intrarater and interrater reliability.
Results Between September 2016 and September 2018, a total of 92 MR scans were obtained (87 patients). Interrater and intrarater reliability of the Schneck grading system was moderate to substantial for the lateral ankle complex (K=0.47–0.76), fair to almost perfect for the syndesmosis complex (K=0.37–0.89) and slight to moderate for the deltoid complex (K=0.14–0.51). For the individual ligaments, kappa values ranged from moderate to substantial for the anterior talofibular ligament (ATFL) (K=0.55–0.73), fair to substantial for the calcaneofibular ligament (K=0.31–0.62) and fair to almost perfect for the anteroinferior tibiofibular ligament (AITFL) (K=0.36–0.89). Diagnostic reliability of the Sikka classification ranged from moderate to almost perfect (K=0.51–0.95).
Conclusions Grading of the three major ligamentous complexes and of the individual ankle ligaments according to the Schneck grading system resulted in limited diagnostic reliability. When dichotomised for the presence of complete discontinuity, the interrater reliability of the Schneck grading system improved to substantial and almost perfect for the ATFL and AITFL, respectively. Classification of syndesmosis injury according to the Sikka classification resulted in moderate interrater reliability.


INTRODUCTION
Acute ankle sprains are among the most common sport-related injuries. 1 The lateral ankle ligaments are most frequently injured (0.93/1000 athlete exposures), followed by the syndesmosis (0.38/1000 athlete exposures) and deltoid ligaments (0.06/1000 athlete exposures). 2 In athletes, MRI is increasingly used for initial diagnosis and prognosis of ligamentous ankle injuries. [3][4][5] To categorise and translate these MR findings into clinical practice, standardised grading systems with high diagnostic reliability are warranted.
Diagnostic reliability of standardised grading systems for acute ligamentous ankle injuries has been described in various studies. [6][7][8] However, only the grading system used by Roemer et al included grading of injury in multiple ligamentous complexes (lateral ankle complex, deltoid complex and syndesmosis complex). 9 In that study, two radiologists determined intrarater and interrater reliability of a five-grade system for acute and chronic ligamentous ankle injury, based on 30 MR scans (1.5 T). 9 The main limitation of that study was that it reported diagnostic reliability per ligamentous complex (eg, lateral ligaments) and not per individual ligament (eg, anterior talofibular ligament (ATFL)), leaving the diagnostic reliability of scoring acute injury of individual ankle ligaments on 3 T MRI unknown.
Reliability of prognostic scoring for syndesmosis injury has been evaluated in two previous studies. 7 10 In a retrospective cohort study by Howard et al, prognostic scoring of syndesmosis injury in 16 NFL players resulted in fair to almost perfect interobserver reliability. However, except for syndesmotic joint width, no association between prognostic scoring and time to return to play was established. Sikka et al evaluated a prognostic syndesmosis injury classification in a retrospective cohort study on 36 NFL players with MRI-confirmed (1.5 T) syndesmosis injury. The main limitation of that study was that it lacked evaluation of the classification's interrater reliability. Given these two limitations, a diagnostic reliability study on grading of the individual ankle ligaments, the ligamentous complexes and classification of syndesmotic injury severity is warranted. Therefore, the aim of this study was to determine the diagnostic reliability of (1) the Schneck grading system for acute ligamentous injuries of the three major ligamentous ankle complexes, (2) the Schneck grading system for acute injuries of the individual ankle ligaments and (3) the Sikka classification for syndesmosis injury. 10 11

What are the new findings
► MR grading of the ligamentous complexes according to the Schneck grading system and classification of syndesmosis injury according to the Sikka classification resulted in slight to almost perfect reliability.
► MR grading of the individual ankle ligaments according to the Schneck grading system resulted in limited reliability.
► When dichotomised for the presence of complete discontinuity, the interrater reliability of the Schneck grading system improved to substantial and almost perfect for the anterior talofibular ligament and anteroinferior tibiofibular ligament, respectively.

Original research

METHODS
Patient selection
Patients presenting to the outpatient department of a specialised Orthopaedic and Sports Medicine Hospital within 7 days after an acute ankle injury were asked to participate in this study. Inclusion criteria were an acute ankle injury in an adult athlete (≥18 years) participating in sports at a professional or recreational level, and presentation within 7 days of injury. Ankle injuries were excluded if imaging studies demonstrated a frank ankle fracture or if the 3 T MRI study could not be acquired within 10 days postinjury. MR images were obtained after a Sports Medicine Physician or Orthopaedic Surgeon had taken the clinical history and performed the physical examination. Written informed consent was obtained from all patients at the time of inclusion.

Standardised MRI grading
The MR scans were scored by two radiologists specialised in musculoskeletal radiology (JA and MA) with 11 and 3 years of experience in MSK imaging, respectively. The two radiologists, hereafter referred to as R1 and R2, scored the lesions using a standardised scoring form. Prior to assessing the MR scans, both radiologists participated in an individual familiarisation session, followed by a joint calibration session. During a 2-hour familiarisation session, the use of the standardised scoring form was practised by assessing 10 ankle MR scans that were not included in this dataset. To ensure accurate interpretation of the scoring form, consensus was reached during the calibration session on the scoring of another 10 ankle MR scans not included in this dataset. To ensure blinding of the radiologists to the clinical findings, the MR scans were scored in the presence of a postgraduate medical researcher. To determine intrarater reliability, one radiologist (R1) repeated the scoring process after a period of 28 days, an interval chosen to minimise recall bias.

Grading system for ligamentous complexes and individual ligaments
The ligamentous complexes were graded as normal (grade 0) or in accordance with the highest-graded acute lesion (grade 1-3) in any of their constituent individual ligaments. All individual ligaments were graded according to the four-grade Schneck grading system 11 (table 1).
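The complex-level rule described above (a complex takes the highest grade found among its individual ligaments) can be sketched in a few lines. This is only an illustration: the function name `complex_grade` and the example grades are hypothetical and do not come from the study's scoring form.

```python
from typing import Mapping


def complex_grade(ligament_grades: Mapping[str, int]) -> int:
    """Return the complex grade: the highest Schneck grade (0-3) among its ligaments."""
    if not ligament_grades:
        raise ValueError("at least one ligament grade is required")
    for name, grade in ligament_grades.items():
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"{name}: grade must be 0-3, got {grade}")
    # Grade 0 (normal) results only when every ligament in the complex is grade 0.
    return max(ligament_grades.values())


# Hypothetical lateral ankle complex: one complete tear dominates the complex grade.
lateral = {"ATFL": 3, "CFL": 1, "PTFL": 0}
print(complex_grade(lateral))  # 3
```

A consequence of this rule is that disagreement on a single ligament can change the grade of the whole complex, which is worth keeping in mind when comparing complex-level and ligament-level reliability.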

Classification of syndesmotic injury
In patients with an observed syndesmotic injury, the severity of the injury was classified in accordance with the classification proposed by Sikka et al 10 (table 1).

Grading of ligamentous complexes
The three major ligamentous complexes (lateral ankle complex, deltoid complex (subdivided into deep deltoid and superficial deltoid) and syndesmosis complex) were graded according to the four-grade Schneck grading system.

Grading of individual ligaments
The following individual ankle ligaments were graded according to the four grade Schneck grading system:

Presence versus absence of acute ligamentous lesions
To assess the intrarater and interrater reliability for the presence or absence of acute ligamentous lesions in the ligamentous complexes and individual ligaments, the MRI grading system was dichotomised as follows:
► Grade 0 was considered absence of an acute lesion.
► Grade 1-3 was considered presence of an acute lesion.

Presence versus absence of complete discontinuity
To assess the intrarater and interrater reliability for the presence or absence of complete discontinuity in the ligamentous complexes and individual ligaments, the MRI grading system was dichotomised as follows:
► Grade 0-2 was considered absence of complete discontinuity.
► Grade 3 was considered presence of complete discontinuity.
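The two dichotomisation strategies amount to applying different cut-offs to the same four-grade score. A minimal sketch, with hypothetical function names chosen here for illustration:

```python
def _validated(grade: int) -> int:
    """Reject anything that is not a Schneck grade 0-3."""
    if grade not in (0, 1, 2, 3):
        raise ValueError(f"grade must be 0-3, got {grade}")
    return grade


def has_acute_lesion(grade: int) -> bool:
    """Grade 0 = absence of an acute lesion; grade 1-3 = presence."""
    return _validated(grade) >= 1


def has_complete_discontinuity(grade: int) -> bool:
    """Grade 0-2 = absence of complete discontinuity; grade 3 = presence."""
    return _validated(grade) == 3


# The same hypothetical grades give different binary outcomes under each cut-off.
grades = [0, 1, 2, 3]
print([has_acute_lesion(g) for g in grades])            # [False, True, True, True]
print([has_complete_discontinuity(g) for g in grades])  # [False, False, False, True]
```

Under the first cut-off, raters who disagree between grade 0 and grade 1 are counted as disagreeing; under the second, that same pair counts as agreement.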

Statistical analysis
Descriptive statistics were used to present patient demographics (age, time to MRI, sports) and the number and distribution of lesions graded by the individual observers. Continuous variables were presented as mean with SD for data with a normal distribution and as median with IQR in case of non-normal distribution. Categorical data were presented as frequencies and proportions. Intrarater and interrater reliability of the Schneck grading system 11 (ligamentous lesions; grade 0-3) and Sikka classification system 10 (syndesmosis injury; grade I-IV) were determined using linear weighted kappa statistics on an ordinal scale (K). Intrarater and interrater reliability for dichotomised data were determined using unweighted kappa statistics.
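For readers unfamiliar with the statistic, the following sketch shows how a linear weighted kappa differs from the unweighted version: disagreements are penalised in proportion to their distance on the ordinal scale. It is a plain-Python illustration under stated assumptions, not the SPSS or Stata procedure used in the study; the function name `cohen_kappa` and the example scores are hypothetical.

```python
from typing import Sequence


def cohen_kappa(r1: Sequence[int], r2: Sequence[int],
                categories: Sequence[int], linear_weights: bool = False) -> float:
    """Cohen's kappa for two raters; optionally with linear agreement weights."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("both raters must score the same, non-empty set of scans")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Cross tabulation of rater 1 (rows) against rater 2 (columns).
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        counts[index[a]][index[b]] += 1
    row = [sum(counts[i]) for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        if linear_weights:
            # Full credit on the diagonal, decreasing linearly with distance.
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    # Observed and chance-expected (weighted) agreement.
    po = sum(weight(i, j) * counts[i][j] for i in range(k) for j in range(k)) / n
    pe = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)) / n ** 2
    if pe == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (po - pe) / (1.0 - pe)


# Hypothetical grades (0-3) from two raters on eight scans.
rater_a = [0, 0, 1, 1, 2, 2, 3, 3]
rater_b = [0, 1, 1, 2, 2, 3, 3, 3]
print(round(cohen_kappa(rater_a, rater_b, [0, 1, 2, 3], linear_weights=True), 2))  # 0.7
print(round(cohen_kappa(rater_a, rater_b, [0, 1, 2, 3]), 2))                       # 0.5
```

In this made-up example all disagreements are one grade apart, so the weighted kappa (0.7) is higher than the unweighted kappa (0.5).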
Overall agreement was calculated for dichotomous observations and weighted agreement was calculated for ordinal variables. We calculated prevalence (P) and bias index from cross tabulations for the dichotomous variables. Prevalence was defined as the percentage (%) of included ankle injuries with positive findings. Bias index was defined as the extent to which the radiologists disagreed on the proportion of positive (or negative) findings. 12 Reliability was considered poor if K<0, slight if 0-0.20, fair if 0.21-0.40, moderate if 0.41-0.60, substantial if 0.61-0.80 and almost perfect if 0.81-1.00. 13 Statistical analysis was performed using Statistical Package for Social Sciences (SPSS V.21.0, Chicago, Illinois, USA). Weighted agreement was calculated using Stata Statistical Software, Release 11 (Stata, College Station, Texas, USA).
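The interpretation bands and the cross-tabulation summaries described above can be sketched as follows. The bias index is computed here as the absolute difference between the two raters' proportions of positive findings, |b - c|/n, which is a common definition and an assumption on our part; the exact formula used in the study is not stated in this text. Function names and cell counts are hypothetical.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the reliability bands listed in the text."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def two_by_two_summary(a: int, b: int, c: int, d: int) -> dict:
    """Summarise a 2x2 cross tabulation of two raters.

    a = both positive, b = rater 1 positive only,
    c = rater 2 positive only, d = both negative.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty table")
    return {
        "overall_agreement": (a + d) / n,
        "prevalence_rater1": (a + b) / n,  # proportion positive according to rater 1
        "prevalence_rater2": (a + c) / n,  # proportion positive according to rater 2
        "bias_index": abs(b - c) / n,      # assumed definition: |b - c| / n
    }


print(interpret_kappa(0.51))  # moderate
print(two_by_two_summary(a=50, b=10, c=0, d=40))
```

A bias index near zero means both raters call a similar share of scans positive, even if they disagree on which ones.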

RESULTS
Baseline characteristics
Between September 2016 and September 2018, a total of 115 acute ankle injuries (110 athletes) were assessed for eligibility (figure 4). Ninety-two ankles were included. Of these 92 imaged acute ankle injuries, 4 were subsequent contralateral ankle injuries and 1 was a reinjury (>1 year). The median age at time of injury was 23 years (IQR 20-27), with a range from 18 to 42 years. The median time from injury to MRI was 3 days (IQR 1-5). Of the 87 included athletes, 47% played football, 14% volleyball, 14% basketball, 11% futsal, 5% athletics and 7% participated in other sports.

Classification of syndesmotic injury
The distribution of the Sikka classification for both radiologists is reported in the online supplemental appendix (table 2). Use of this classification system in patients with syndesmosis injury resulted in almost perfect intrarater reliability (K=0.95) and moderate interrater reliability (K=0.51).

Grading of individual ligaments
The distribution of acute ligamentous lesions (Schneck grades 1-3) per individual ligament, as graded by both radiologists, is detailed in the online supplemental appendix (table 2). Grading of the individual lateral ankle ligaments resulted in substantial intrarater reliability (K=0.62-0.73) and slight to moderate interrater reliability (K=0.14-0.55). For the individual syndesmosis ligaments, intrarater reliability ranged from substantial to almost perfect (K=0.63-0.94) and interrater reliability ranged from poor to moderate (K=-0.02 to 0.56). Intrarater reliability for the deltoid ligaments ranged from fair to substantial (K=0.27-0.69) and interrater reliability ranged from slight to fair (K=0.01-0.24).

DISCUSSION
In this study, we reported the reliability of the Schneck grading system and the Sikka classification for acute ligamentous ankle injuries on 3 T MRI. Grading of the ligamentous complexes according to the Schneck grading system and classification of syndesmosis injury according to the Sikka classification resulted in slight to almost perfect reliability. Grading of the individual ankle ligaments according to the Schneck grading system resulted in limited reliability. When dichotomised for the presence or absence of complete discontinuity, the interrater reliability for the ATFL and AITFL improved to substantial and almost perfect, respectively.

Grading of ligamentous complexes: comparison with previous literature
For grading of acute ligamentous complex injuries, only two previous studies have reported on the diagnostic reliability of a standardised grading approach. 6 9 In a prospective study, Gaebler et al presented the diagnostic reliability of a grading approach to acute injury of the lateral ligamentous complex. Applied to 0.5 T and 1.0 T MRI, grading resulted in good intrarater reliability (κ=0.65) and fair interrater reliability (κ=0.40). 6

Grading of individual ligaments: comparison with previous literature
For grading of all the individual ankle ligaments, no comparable study on acute ankle injuries has been published. For the individual lateral ankle ligaments, one study investigated the reliability of 3 T MRI for acute injury of the ATFL. 14 That study determined the diagnostic accuracy and diagnostic reliability of adding the 'bright-rim sign' to the standard diagnostic criteria for ATFL injury. Interrater reliability for acute injury of the ATFL varied widely, depending on the applied definition (κ=0.48-0.93).
In chronic lateral ankle ligament injuries, two studies have investigated the reliability of scoring injury to the ATFL. 15 16 Kim et al reported excellent interrater reliability (Intraclass Correlation Coefficient =0.915) for detection of presence or absence of injury to the ATFL. 15 In contrast to our study, no grading of injury severity was applied. In another study, grading of chronic injury was reported in a cohort of patients with chronic lateral ankle instability. 16 Grading on a four-grade scale for chronic injury resulted in substantial intrarater reliability (K=0.68-0.75) and moderate to almost perfect interrater reliability (K=0.55-0.87).
For acute deltoid injury, the interrater reliability on 3 T MRI has been reported in one study. 8 In that study, the diagnostic reliability of MRI was investigated in a cohort of patients with lateral malleolar fractures secondary to a Supination-External Rotation trauma. The interrater reliability for partial and complete discontinuity of the deep deltoid ligaments ranged from fair to moderate (K=0.46; K=0.22), which is better than that observed in our cohort. In addition, the increased prevalence of deltoid injury in that cohort could potentially have decreased the reported kappa values. 12
For syndesmotic injuries, three previous studies have reported on the diagnostic reliability of MRI. 7 17 18 The main difference with our population is that these studies only included patients with MRI-confirmed syndesmosis injury or with acute ankle fractures, and thus had an increased prevalence of injury. As increased prevalence comes with high chance agreement, the reported kappa values for syndesmotic injury might be lower than the true kappa value in a non-selected population. 12 In the study by Hermans et al, patients with an acute ankle fracture underwent 1.5 T MRI. 17 Grading of acute syndesmosis injury demonstrated substantial and almost perfect interobserver reliability for the AITFL (K=0.61) and PITFL (K=0.83), respectively. Addition of a 45° oblique MRI plane improved the interrater reliability for the AITFL to almost perfect (K=0.92). As our multiplane MRI sequence lacked such a plane, its addition could hypothetically further improve the diagnostic reliability for low-grade injuries of the AITFL.
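The prevalence argument made above can be illustrated with two hypothetical 2x2 tables that share the same observed agreement (90%) but differ in how common positive findings are. The counts are invented for illustration and are not data from any of the cited studies.

```python
def kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Unweighted Cohen's kappa from a 2x2 table.

    a = both raters positive, b = rater 1 positive only,
    c = rater 2 positive only, d = both raters negative.
    """
    n = a + b + c + d
    po = (a + d) / n                                     # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
    return (po - pe) / (1 - pe)


# Balanced prevalence: half of the scans are positive for each rater.
balanced = kappa_2x2(a=45, b=5, c=5, d=45)
# Skewed prevalence: 90% of the scans are positive for each rater.
skewed = kappa_2x2(a=85, b=5, c=5, d=5)

print(round(balanced, 2))  # 0.8
print(round(skewed, 2))    # 0.44
```

With identical observed agreement, the skewed table yields a much lower kappa because chance agreement rises from 0.50 to 0.82. The same effect applies when prevalence is very low rather than very high.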

Presence versus absence of complete discontinuity
In daily clinical practice, the presence of periligamentous oedema or partial discontinuity is less consequential, as the decision for surgical intervention is based on the presence of complete ligamentous discontinuity. 19 20 Simplified scoring for the presence of complete discontinuity or acute lesions might therefore be more clinically relevant in this setting. For this reason, the four-grade Schneck grading was dichotomised for presence of acute lesions and presence of complete discontinuity. This resulted in improved interrater reliability for complete discontinuity of the ATFL (substantial) and AITFL (almost perfect). As complete discontinuity of these two ligaments has major ramifications in selected patients (eg, athletes), the improved reliability of dichotomised grading might make it preferable in the clinical setting.

Strength and limitations
To our knowledge, this study is the largest prospective cohort study on diagnostic reliability of grading acute injuries of all three ligamentous complexes. Its strength lies in its prospective design, broad inclusion criteria (all acute ankle injuries) and use of 3 T MRI. Nevertheless, the study has some limitations. First, the reported reliability for the deltoid ligaments and posterior syndesmosis ligaments (PITFL and TTFL) should be interpreted with caution, as the low prevalence of injury potentially influenced the K values and corresponding CIs. 12 An even larger cohort of athletes might improve the obtained K values and narrow the CIs further. Second, although discrepancies in MRI grading are inherent, considerable bias was observed for the dichotomisation strategy in which ligaments were graded either normal (grade 0) or as having an acute lesion (grade 1-3). The bias indices normalised for the dichotomisation strategy in which ligaments were either not completely discontinuous (grade 0-2) or completely discontinuous (grade 3). This suggests a bias of the second radiologist towards scoring low-grade injuries. Potentially, increased interrater reliability could have been achieved with a more elaborate calibration session. However, the limited calibration of both radiologists should be considered a strength of this study, as it represents daily clinical practice.
Future research should aim to correlate grading of injury severity with return to play prognosis after acute ligamentous ankle injuries. Application of the currently available grading systems on 3 T MRI is insufficiently reliable for this purpose. Dichotomised scoring for complete discontinuity of ankle ligaments and additional (angulated) MR planes could potentially improve interrater reliability; however, additional research is required to substantiate these claims. Dichotomised grading (absence or presence of complete discontinuity) of the ATFL and AITFL resulted in substantial to almost perfect interrater reliability. Therefore, when interpreting MRI results of an acute ligamentous ankle injury, reported presence of complete discontinuity of the ATFL and AITFL can be considered reliable. Dichotomised grading (absence or presence of complete discontinuity) of the CFL resulted in fair interrater reliability. Thus, reported discontinuity of the CFL should be interpreted with caution. In clinical practice, this means that MRI can guide treatment of ligamentous ankle injuries based on the presence of complete discontinuity of the ATFL and AITFL only. This implies that MR imaging can reliably differentiate acute ankle injuries with and without syndesmotic involvement. Since the AITFL is the first syndesmotic ligament to be injured, a prerequisite for syndesmotic instability is complete discontinuity of the AITFL. 21 MR imaging could therefore aid the identification of patients with an increased probability of syndesmotic instability (eg, torn AITFL) for further diagnostic work-up (eg, arthroscopy). 22 The use of MR imaging in the diagnosis of acute ligamentous ankle injuries can therefore be considered most useful in those patients with increased probability of syndesmosis injury.

CONCLUSION
In athletes, grading of the three major ligamentous complexes and the individual ankle ligaments according to the Schneck grading system using 3 T MRI resulted in limited reliability. When dichotomised for the presence of complete discontinuity, the interrater reliability of the Schneck grading system improved to substantial and almost perfect for the ATFL and AITFL, respectively. Classification of syndesmosis injury according to the Sikka classification resulted in moderate interrater reliability.
Supplementary appendix
Intrarater and interrater reliability of (1) grading ligamentous complexes and individual ligaments according to the four-grade Schneck grading system, (2) classification of syndesmosis injury according to the four-grade Sikka classification system, and dichotomisation of the Schneck grading system for (3) presence vs absence of acute lesions and (4) presence vs absence of complete discontinuity. The total valid lesions for both radiologists (R1a, R1b, R2) out of an overall total of 92 MR scans are presented (N). Reliability for grading (Schneck) and classification (Sikka) are presented as weighted kappa (K) and weighted agreement. Reliability for grading (Schneck)