This study aimed to evaluate the inter- and intra-rater reliability of a manual method for measuring the AbdH muscle volume on MRI images of feet with HV deformity, for research and clinical purposes. Before interpreting results, it is necessary to establish the reliability of the methods used to measure the characteristics of muscles involved in the formation of HV deformities. The ICCs for inter-rater and intra-rater reliability indicated excellent reliability. The SEM% for inter- and intra-rater agreement was estimated at 6.2% and 2.1%, respectively, which is comparable to the results of previous studies. In a study by Franettovich Smith et al., the SEM% of inter- and intra-rater agreement was 4% and 6%, respectively, for the CSA measurement of the AbdH muscle by ultrasound (31).
Moreover, Jung et al. reported an SEM% of 3.8% (32); it should be noted, however, that both of these studies used US imaging. Generally, the SEM value represents the measurement error (33). An error may occur while identifying the exact location and borders of the muscle among the other intrinsic muscles (33). The manual tracing of borders can also influence the measurements. In addition, the resolution of MRI images is an important factor that may affect how precisely muscle borders can be delineated. The two raters were trained over several sessions, during which reference images, such as anatomical atlases of foot muscles, were used to determine the exact path and borders in different cuts; the raters' prior experience with such measurements may explain the high ICC and low SEM values (28).
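As an illustration of how an SEM and SEM% are commonly derived from an ICC, the sketch below uses the standard relation SEM = SD × √(1 − ICC) and then expresses the result relative to the mean. It is an assumption that this relation matches the calculation used in the present study, and the input values (`sd_cm3`, `icc`, `mean_cm3`) are hypothetical, not study data.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement, in the same units as sd.

    Assumes the common relation SEM = SD * sqrt(1 - ICC).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def sem_percent(sd: float, icc: float, mean: float) -> float:
    """SEM expressed as a percentage of the mean measurement."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return 100.0 * sem(sd, icc) / mean


# Hypothetical inputs for illustration only (not taken from the study).
sd_cm3 = 2.0     # between-subject standard deviation of muscle volume
icc = 0.98       # reliability coefficient
mean_cm3 = 15.0  # mean muscle volume

print(f"SEM  = {sem(sd_cm3, icc):.3f} cm^3")
print(f"SEM% = {sem_percent(sd_cm3, icc, mean_cm3):.2f} %")
```

With these made-up inputs, a high ICC yields a small SEM relative to the mean, which is the pattern described above.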
Additionally, the inter- and intra-rater MDC95 values were estimated at 17.2% and 5.9%, respectively, in the present study; the MDC value represents the smallest change that exceeds the measurement error and can therefore be detected in research or clinical applications. Consequently, when a single rater applies this volume measurement technique, changes in muscle volume smaller than 5.9% cannot be distinguished from measurement error. This finding is in line with the results of a study by Jung et al., which showed significant changes in the AbdH muscle CSA on ultrasound images after two types of interventions (32). Moreover, Hing et al. evaluated the reliability of two ultrasound machines and found that a change greater than 21.25% is needed to be 95% confident that a real change has occurred in the AbdH muscle CSA (34).
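The link between the SEM% and MDC95 values can be sketched with the commonly used relation MDC95 = 1.96 × √2 × SEM. It is an assumption that the present study used exactly this formula, although it reproduces the reported values to within rounding. The helper names `mdc95` and `exceeds_measurement_error` are hypothetical and serve only as illustration.

```python
import math


def mdc95(sem_value: float) -> float:
    """Minimal detectable change at 95% confidence.

    Assumes MDC95 = 1.96 * sqrt(2) * SEM; the result has the same
    units as the SEM passed in (here, percent).
    """
    return 1.96 * math.sqrt(2.0) * sem_value


def exceeds_measurement_error(change_percent: float, mdc_percent: float) -> bool:
    """True if an observed change is larger than the MDC threshold."""
    return abs(change_percent) > mdc_percent


inter_mdc = mdc95(6.2)  # inter-rater SEM% from the text
intra_mdc = mdc95(2.1)  # intra-rater SEM% from the text

print(f"inter-rater MDC95 ~ {inter_mdc:.1f} %")
print(f"intra-rater MDC95 ~ {intra_mdc:.1f} %")

# A hypothetical 4% change measured by one rater stays below the threshold.
print(exceeds_measurement_error(4.0, intra_mdc))
```

The small gap between the computed intra-rater value and the reported 5.9% is consistent with the SEM% having been rounded to one decimal place before publication.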
Similarly, Lund et al. examined inter- and intra-rater differences in using a manual method to measure the volume of the dorsal ankle muscles (tibialis anterior, extensor digitorum longus, and extensor hallucis longus) on MRI images. Their study aimed to determine the number of slices needed for calculations and reported excellent inter- and intra-rater reliability (0.98 - 1.0) (16). These muscles are larger than the deep foot muscles, which may make it easier to identify them and follow their path. In a validity and reliability study of a semi-automatic method for discriminating adipose tissue, subcutaneous fat, and intrinsic muscles of the foot, the ICC was mostly above 0.95, which indicated a high level of agreement among therapists (23). Also, Pons et al. examined the validity and reliability of automatic, semi-automatic, and manual techniques used for measuring muscle volume on MRI images in a healthy population. For cases of muscle pathology, more data on the metrological quality of techniques are required. In addition, techniques that simplified the segmentation introduced errors in volume and shape estimation (20). Previous research has investigated the reliability of slice-by-slice measurements. The intra-rater reliability was good to excellent in four studies (0.7 - 1.0) (21, 33, 35, 36), and inter-rater reliability was moderate to good in eight studies (0.5 - 0.89) (10, 21, 33, 35-39). Seven studies used manual methods to calculate the total muscle volume by summing the measured CSAs in all slices, similar to the method used in the current study (33, 35, 36, 38, 40-42). However, to the best of our knowledge, no study has yet evaluated this manual method for measuring intrinsic foot muscles, especially the AbdH muscle. After muscle segmentation, seven methods were used to calculate the muscle volume. There was no measurement error in the volume calculations; the error arose at the muscle segmentation stage (20).
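The slice-by-slice summation described above can be sketched as follows: each traced CSA is multiplied by the distance between consecutive slices, and the products are added. The slice thickness, the inter-slice gap, and the CSA values below are hypothetical placeholders, since the acquisition parameters are not given in this section.

```python
from typing import Sequence


def muscle_volume_cm3(
    csa_cm2: Sequence[float],
    slice_thickness_cm: float,
    gap_cm: float = 0.0,
) -> float:
    """Approximate muscle volume by summing CSA x slice spacing.

    csa_cm2: traced cross-sectional area of the muscle on each slice.
    slice_thickness_cm, gap_cm: assumed acquisition parameters.
    """
    if slice_thickness_cm <= 0 or gap_cm < 0:
        raise ValueError("thickness must be positive and gap non-negative")
    if any(area < 0 for area in csa_cm2):
        raise ValueError("cross-sectional areas must be non-negative")
    spacing = slice_thickness_cm + gap_cm
    return sum(csa_cm2) * spacing


# Hypothetical tracings for one muscle across five contiguous slices.
areas = [0.8, 1.5, 2.1, 1.9, 1.0]
volume = muscle_volume_cm3(areas, slice_thickness_cm=0.3)
print(f"volume ~ {volume:.2f} cm^3")
```

Because the summation itself is deterministic, any disagreement between raters in such a scheme comes from the traced areas, which fits the observation that the error lies in segmentation rather than in the volume calculation.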
In previous studies, the IF muscles, which are located deep within several layers, were commonly grouped together due to their small size and irregular shape (11, 23, 24). Separating a particular muscle from the adjacent intrinsic muscles is difficult. Greater accuracy is needed to find the beginning and end of a muscle, where measurement error is more likely. This is less of an issue in the middle slices, where the muscle borders are easily distinguished. Measuring the IF muscle volume is challenging because these muscles are arranged in a four-layer complex, which makes it very difficult to differentiate them from one another (43).
In individuals with HV deformities, the path of the AbdH muscle may be displaced below the head of the first metatarsal bone, depending on the severity of the deformity (14, 15). Following these changes in muscle anatomy and biomechanics, an imbalance develops between the abductor and adductor muscles of the hallux (15). Using ultrasound data, Stewart et al. found significant differences in the mediolateral width, dorsoplantar thickness, and CSA of the AbdH muscle between feet with and without HV; however, no significant differences were observed across degrees of deformity (44). The reliability analysis of the AbdH muscle volume measurement in HV patients provides an important opportunity to gain further insight into the effects of interventions and strategies that focus on improving the strength and function of this small muscle by monitoring any related changes.
This study has some limitations. Because image segmentation is time-consuming, each rater performed the measurement only once; therefore, absolute agreement was investigated and average reliability was not reported. The lack of a comparison between the manual technique and automatic techniques is another limitation.
In conclusion, the inter- and intra-rater reliability of the AbdH muscle volume measurement based on slice-by-slice examination of MRI images was found to be excellent. Therefore, it can be used as a reproducible method to measure changes in the AbdH muscle volume in various treatments or research applications. Given the excellent intra-rater reliability and the lower SEM% of intra-rater measurements, it is preferable for a single rater to perform the measurements in comparative studies. Further research with a larger sample size is recommended.