The present study was designed to evaluate the level of agreement between commonly used methods of Mizaj assessment in PM, including clinical judgment by a PM physician (CMA) and two standardized self-report questionnaires (MMQ and SMQ). By determining the extent to which these approaches yield comparable temperamental classifications, the findings help clarify their methodological alignment and clinical applicability.
According to the interpretation criteria proposed by Landis and Koch (1977) (13), kappa values between 0.41 and 0.60 represent moderate agreement, values between 0.21 and 0.40 represent fair agreement, and values between 0.00 and 0.20 represent slight agreement. In this study, most coefficients fell within the slight to moderate ranges, reflecting heterogeneous alignment between assessment approaches across different temperament dimensions. This indicates that, although certain dimensions show partial overlap, none of the approaches are fully interchangeable. This pattern suggests that clinical and self-report assessments capture complementary but distinct aspects of the temperament construct.
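The interpretation bands can be expressed as a small lookup, shown here as a minimal sketch rather than as part of the study's analysis. The function name `landis_koch_label` is hypothetical; the bands above 0.60 and below 0.00 follow the standard Landis and Koch scale, and the kappa values are the ones reported in this section.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa coefficient to its Landis and Koch (1977) descriptor."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa coefficients quoted in this section (labels are for illustration only).
reported = {
    "CMA vs MMQ, Coldness": 0.512,
    "CMA vs MMQ, Balance in Dry and Wet": 0.241,
    "CMA vs SMQ, Hotness": 0.512,
    "CMA vs SMQ, Balance in Dry and Wet": 0.081,
    "MMQ vs SMQ, Dryness": 0.366,
    "MMQ vs SMQ, Balance in Hot and Cold": 0.132,
}

for comparison, kappa in reported.items():
    print(f"{comparison}: kappa = {kappa:.3f} ({landis_koch_label(kappa)})")
```

Applied to the reported coefficients, the lookup reproduces the slight, fair, and moderate labels used throughout this discussion.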
When comparing CMA with the MMQ, the highest agreement was observed for the Coldness dimension (kappa = 0.512; 95% CI, 0.368 to 0.656), representing moderate agreement. This correspondence between MMQ and CMA classifications of Cold temperament may arise because both approaches emphasize relatively observable physiological and somatic indicators. By contrast, the lowest agreement in this pair occurred for Balance in Dry and Wet (kappa = 0.241), which falls within the fair agreement range. Although this level of agreement was somewhat stronger than that observed between CMA and SMQ for the same dimension, it remained limited. This relatively weak correspondence may be due to restricted item coverage in the MMQ, which primarily evaluates wetness and dryness through indices such as obesity or leanness and skin texture (10), and may therefore fail to capture subtler clinical signs recognized by physicians during face-to-face assessments.
When comparing CMA with the SMQ, the highest concordance was observed for the Hotness dimension (kappa = 0.512; 95% CI, 0.422 to 0.602), indicating moderate agreement. This finding suggests that both methods identify individuals with Hot temperament features in a relatively similar manner. One possible explanation is that manifestations of hotness, such as thermal sensation, behavioral tendencies, and other perceptible features, may be more readily recognized by both respondents and clinicians. In contrast, the weakest alignment in this pair was found for Balance in Dry and Wet (kappa = 0.081; 95% CI, -0.023 to 0.185), corresponding to slight agreement. Because this confidence interval includes zero, the observed agreement cannot be reliably distinguished from chance. This weak correspondence suggests that questionnaire-based self-perceptions are limited in capturing the clinical cues physicians use to judge moisture balance. Possible contributors include self-report bias, uncertainty in respondents' self-perception, and a mismatch between qualitative clinical markers and questionnaire item content.
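The reasoning about confidence intervals can be made concrete with a short check, offered as an illustrative sketch only. The helper names `ci_includes_zero` and `implied_standard_error` are hypothetical, and the standard-error calculation assumes a symmetric normal-approximation (Wald-type) interval; the section does not state how the intervals were constructed.

```python
def ci_includes_zero(lower: float, upper: float) -> bool:
    """Return True when the interval [lower, upper] contains zero."""
    return lower <= 0.0 <= upper


def implied_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Back out a standard error from a 95% CI.

    Assumption: the interval is symmetric and of the form estimate +/- z * SE.
    """
    return (upper - lower) / (2.0 * z)


# (kappa, lower bound, upper bound) as reported in this section.
intervals = {
    "CMA vs MMQ, Coldness": (0.512, 0.368, 0.656),
    "CMA vs SMQ, Hotness": (0.512, 0.422, 0.602),
    "CMA vs SMQ, Balance in Dry and Wet": (0.081, -0.023, 0.185),
}

for comparison, (kappa, lower, upper) in intervals.items():
    status = "includes zero" if ci_includes_zero(lower, upper) else "excludes zero"
    se = implied_standard_error(lower, upper)
    print(f"{comparison}: kappa = {kappa:.3f}, CI {status}, implied SE = {se:.3f}")
```

Of the three intervals reported with bounds, only the one for Balance in Dry and Wet (CMA vs SMQ) spans zero.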
Agreement between the two questionnaires (MMQ and SMQ) was generally the lowest among the pairwise comparisons, ranging from kappa = 0.132 to 0.366, corresponding to slight to fair agreement. The strongest alignment occurred for Dryness (kappa = 0.366), whereas the weakest was found for Balance in Hot and Cold (kappa = 0.132). Although both tools are grounded in similar theoretical concepts derived from PM, they appear to differ in item weighting, response scaling, and interpretive thresholds. This finding indicates that standardized questionnaires developed for Mizaj assessment should not be considered interchangeable, as they may diverge considerably in their classification outcomes. The consistently modest kappa values point to both methodological and conceptual differences among the tools. Clinical evaluation incorporates multisensory and interpretive cues, including visual observation, palpation, and narrative context (14-16). In contrast, questionnaires operationalize temperament through structured self-report domains, which may improve reproducibility but can also oversimplify complex temperamental nuances.
As a result, reliance on questionnaire data alone may lead to misclassification, particularly in borderline or mixed cases. These observations support a complementary rather than competitive role for clinical and questionnaire-based assessments. In practice, questionnaires may serve as efficient screening instruments, whereas comprehensive clinical evaluation remains essential for confirmation and therapeutic decision-making. The broader implications are consistent with evidence from psychometric studies showing that even widely validated temperament-related tools, such as the Persian TCI 140, exhibit differential factor structures and internal consistencies across cultural contexts, underscoring the inherently multidimensional nature of temperament measurement (17).
Although assessment agreement was the primary focus of the present study, the findings related to Mizaj distribution and demographic patterns also offer useful contextual insight. The most frequent categories in both CMA and MMQ were Balance in active and receptive qualities, whereas the SMQ more often identified Hot and Wet temperament. Similar distributions have been reported among patients with metabolic syndrome in Qazvin, although differences in sampling strategy and health status may explain part of the divergence. Another Iranian study by Parvizi et al. (18) in Shiraz identified Wet as the dominant temperament, a difference that may be influenced by climate and dietary habits. Variations from the findings reported by Aziz et al. in India (19) may likewise reflect ecological conditions as well as differences in the studied populations (20).
With respect to age, classical PM doctrines associate youth with Hot and Wet temperament and aging with Cold and Dry temperament. Although some age-related shifts were observed in the present study, the trends were less distinct than traditional expectations. This discrepancy may be related to modern lifestyle and psychosocial factors, such as sedentary behavior, dietary modification, sleep disturbance, and mental stress (20-23). These secondary observations further illustrate the complex interaction between inherited constitution and environmental modulation in contemporary populations.
5.1. Limitations
This study has several limitations. The clinical evaluation was conducted by a single PM specialist, reflecting routine clinical practice; this precludes estimation of interrater reliability, a gap that future multi-rater studies should address. Because no independent external gold standard exists for Mizaj assessment, the findings should be interpreted as reflecting agreement between methods rather than diagnostic validity. Although incorporation bias is a concern in agreement studies, it was mitigated here because clinical assessment was completed before, and independently of, questionnaire administration. The study population consisted predominantly of male healthcare workers, which may limit generalizability. Finally, as with all kappa-based analyses, the observed coefficients may have been influenced by the prevalence and distribution of temperament categories.
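The sensitivity of kappa to category prevalence can be illustrated with a brief sketch. The two 2x2 tables below are hypothetical counts chosen for illustration, not data from this study, and `cohen_kappa` is a plain implementation of the unweighted Cohen's kappa formula, kappa = (p_o - p_e) / (1 - p_e).

```python
def cohen_kappa(table: list[list[int]]) -> float:
    """Unweighted Cohen's kappa for a square contingency table.

    Rows are one method's categories, columns the other's.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance-expected agreement from the row and column marginals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical tables: both have 80% observed agreement (80 of 100 cases).
balanced = [[40, 10],
            [10, 40]]   # categories equally prevalent
skewed = [[75, 10],
          [10, 5]]      # one category dominates

print(f"balanced prevalence: kappa = {cohen_kappa(balanced):.3f}")
print(f"skewed prevalence:   kappa = {cohen_kappa(skewed):.3f}")
```

Both hypothetical tables show the same raw agreement, yet kappa is about 0.60 for the balanced table and about 0.22 for the skewed one, because chance-expected agreement rises when one category dominates. This is why dimensions with unevenly distributed temperament categories may yield low kappa values even when raw agreement is high.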
5.2. Conclusions
The findings of the present study show that inter-method agreement in Mizaj assessment is limited and generally remains within the slight to moderate range. The physician's clinical judgment showed closer alignment with the Salmannezhad questionnaire for Hotness and with the Mojahedi questionnaire for Coldness, while very weak or near-zero agreement in some domains, particularly Balance in Dry and Wet, indicates that certain aspects of Mizaj remain difficult to standardize across instruments. Further refinement of self-report content, larger validation studies, and cross-cultural calibration of diagnostic criteria are needed to improve methodological coherence. Integrating multiple sources of data, including clinical observation, physiological indicators, and psychometric profiles, may provide a more faithful operationalization of the multifaceted concept of Mizaj in both research and clinical practice.