1. Background
Maintaining proper body posture is essential for overall well-being. Research suggests that an optimal upright posture reflects a healthy musculoskeletal system and serves as a crucial marker of the body’s functional health (1). Upper crossed syndrome (UCS) in the upper body and dynamic knee valgus (DKV) in the lower limbs are two prevalent postural deformities commonly observed in both clinical and athletic populations. The UCS is a postural imbalance characterized by a distinctive pattern of muscle tightness and weakness (2). In individuals with UCS, muscles such as the suboccipitals, sternocleidomastoid, levator scapulae, pectoralis major and minor, scalenes, and upper trapezius become tight (3), whereas muscles of the neck and posterior upper back, such as the deep neck flexors, serratus anterior, rhomboids, middle trapezius, and lower trapezius, are weakened (4). Postural deformities associated with UCS include forward head posture (FHP), cervical lordosis, and thoracic hyperkyphosis (5). The reported prevalence of UCS ranges from 11% to 60% across populations and age groups (6). Moreover, research has shown that UCS can trigger a cascade of biomechanical disturbances that extend to more distal regions of the body, including the lower extremities (7). Consequently, implementing targeted corrective exercises for UCS is essential not only to restore postural balance but also to prevent secondary musculoskeletal complications. One study showed that corrective exercises and corrective games can reduce the forward head, kyphosis, and forward shoulder angles in individuals with UCS (8). Another study reported that an eight-week NASM corrective exercise program may decrease the angles of forward head, forward shoulder, and thoracic kyphosis (9). Moreover, a selected corrective exercise program was shown to affect upper extremity function and cervical proprioception in individuals with UCS (10).
Additionally, DKV refers to an altered movement pattern of the lower extremity, typically involving a combination of femoral adduction and internal rotation, knee abduction, forward translation and external rotation of the tibia, along with ankle eversion (11). This malalignment is characterized by noticeable medial displacement of the knee joint, moving inward past the foot-to-thigh alignment, which signifies a valgus collapse at the knee (12). The DKV is recognized as a key risk factor for both acute and overuse injuries, including non-contact anterior cruciate ligament (ACL) tears and the development of patellofemoral pain (PFP) (13, 14). Correcting faulty movement mechanics can play a crucial role in preventing ACL injuries and other lower limb pathologies, many of which are influenced by modifiable risk factors (15). Those exhibiting poor movement quality are particularly responsive to targeted exercise interventions. In a review, it was found that exercise interventions appear to be an effective method to enhance dynamic balance and functional performance in individuals with DKV (16). Another study showed that participation in corrective exercise programs may lead to significant enhancement in strength and performance of individuals with DKV (17).
Although exercise programs designed by professionals have proven effective, they are increasingly being replaced by artificial intelligence (AI)-driven approaches in modern practice. The AI is transforming the field of sports medicine and can aid mass personalization and improve the outcomes of personalized rehabilitation protocols and injury prevention strategies (18). The AI-driven exercise prescription, using neural networks and logistic regression, tailors training programs to user needs and is expanding in the fitness domain (19). Furthermore, findings from previous studies indicate that AI has been effective in promoting physical activity among various populations, including children, adolescents, adults, the elderly, and individuals with disabilities (20, 21). For example, a study reported positive effects of an AI-generated core stability program on balance and flatfoot in blind individuals (22). Findings also indicate that AI technology, particularly GPT-4, can generate safe exercise routines; however, further validation in real-world settings is essential (23).
Based on the current literature, there has yet to be a comprehensive investigation validating the effectiveness of AI-generated exercise programs in improving posture. Prior studies have not explicitly addressed the extent to which these AI-designed programs are valid and effective in achieving these outcomes, nor have they evaluated whether AI can generate evidence-based, high-quality training plans tailored to such health-related variables.
2. Objectives
The current study aims to fill this gap by examining the validity of prescribed AI-generated exercise interventions in improving UCS and DKV.
3. Methods
3.1. Study Design and Setting
This study involved developing an AI-generated exercise program aimed at correcting DKV and UCS and validating it with the Delphi method. The Delphi process consists of administering a questionnaire within a specific domain, through which a panel of experts assesses the program’s suitability. The research team was composed of physiotherapists with a minimum of 5 years of professional experience, university faculty members specializing in rehabilitation and corrective exercise with a record of academic publications, experts in exercise physiology, and certified coaches in the field of sports science. Additionally, a statistician and research methodology expert with extensive experience in applied studies participated in the project. A steering committee, consisting of several specialists, was responsible for designing, reviewing, and analyzing the responses and expert feedback.
To assess validity, three distinct measures were applied: the Content Validity Ratio (CVR), the Content Validity Index (CVI), and the Impact Score (IS). A panel of ten university-level experts specializing in corrective exercises and sports-related injuries participated in the evaluation process. To calculate the CVR, each expert reviewed every item and selected one of three possible judgments: (A) necessary, (B) helpful but not necessary, or (C) not necessary. According to the Lawshe table (24), with ten experts an item whose CVR exceeds 0.62 is considered essential and should be retained in the tool at an acceptable level of significance. To determine the CVI, the same panel rated each item for clarity, simplicity, relevance, and ambiguity using a 4-point Likert scale with the following levels: "No relation", "somewhat related", "good relation", and "very high relation". The CVI of an item was calculated as the proportion of experts who gave it an agreeable rating (rank 3 or 4) out of all experts who rated it. The CVI score required for item acceptance was higher than 0.79 (24). Moreover, the IS was employed to gauge the perceived significance and relevance of each item according to expert judgment. Experts assigned ratings on a 5-point Likert scale, ranging from 1 (not important) to 5 (very important). The IS for each item was calculated using the following formula: IS = Frequency (%) × Importance (mean score). An IS ≥ 1.5 was considered acceptable and indicative of satisfactory face validity, as per established psychometric validation guidelines (25). This approach ensured that only exercises deemed both clinically relevant and contextually appropriate by the expert panel were retained in the final protocol.
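The three indices described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis (which was run in SPSS and Excel). The CVR uses Lawshe's standard formula, CVR = (n_e − N/2) / (N/2), where n_e is the number of experts who chose "necessary" and N is the panel size. For the IS, the source does not define "Frequency"; the code assumes the common convention that it is the proportion of experts rating an item 4 or 5. All function names and ratings below are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_essential - half) / half


def content_validity_index(ratings: list[int]) -> float:
    """Proportion of experts who rated the item 3 or 4 on the 4-point scale."""
    agreeable = sum(1 for r in ratings if r >= 3)
    return agreeable / len(ratings)


def impact_score(ratings: list[int]) -> float:
    """IS = frequency x importance on the 5-point scale.

    ASSUMPTION: frequency is the proportion of experts rating 4 or 5;
    importance is the mean rating. The source does not spell this out.
    """
    frequency = sum(1 for r in ratings if r >= 4) / len(ratings)
    importance = sum(ratings) / len(ratings)
    return frequency * importance


def item_passes(cvr: float, cvi: float, impact: float) -> dict[str, bool]:
    """Apply the acceptance thresholds stated in the text."""
    return {"CVR": cvr > 0.62, "CVI": cvi > 0.79, "IS": impact >= 1.5}


# Hypothetical ratings from a ten-expert panel for a single exercise.
cvr = content_validity_ratio(n_essential=8, n_experts=10)
cvi = content_validity_index([4, 4, 3, 3, 3, 4, 2, 4, 3, 4])
score = impact_score([5, 4, 4, 3, 5, 4, 2, 5, 4, 4])
print(item_passes(cvr, cvi, score))
```

With these made-up ratings the item clears the CVI and IS thresholds but not the CVR threshold, because eight "necessary" votes out of ten give a CVR of 0.6, just under 0.62.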
Additionally, the Fleiss Kappa coefficient (κ) was calculated to assess the degree of agreement (reliability) between the experts’ responses. The interpretation of this coefficient was based on the following criteria (26): (A) κ ≤ 0.4: Weak or poor reliability, (B) 0.4 < κ ≤ 0.6: Moderate reliability, (C) 0.6 < κ ≤ 0.8: Good reliability, and (D) κ > 0.8: Excellent reliability.
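For readers who want to see how a multi-rater agreement coefficient of this kind is obtained, the following is a minimal sketch of the textbook Fleiss' kappa formula together with the interpretation bands listed above. It is not the authors' computation, and the rating counts are hypothetical. Each row of `counts` is one exercise, and each column holds the number of experts who chose a given category.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a table of items x categories.

    Every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: for each item, the share of rater pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa: float) -> str:
    """Reliability bands as stated in the text."""
    if kappa <= 0.4:
        return "weak or poor"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "good"
    return "excellent"


# Hypothetical panel of ten experts rating three exercises as
# (necessary, helpful but not necessary, not necessary).
example = [[6, 3, 1], [5, 4, 1], [7, 2, 1]]
kappa = fleiss_kappa(example)
print(round(kappa, 3), interpret_kappa(kappa))
```

A negative value arises whenever observed agreement falls below the agreement expected by chance, which is how a coefficient such as -0.16 should be read.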
3.2. Participants and Sampling
Two Iranian male participants were assessed in this study, one diagnosed with UCS (age: 19 years, weight: 70 kg, and height: 175 cm) and the other with DKV (age: 19 years, weight: 72 kg, and height: 176 cm). Inclusion criteria were males aged 18 - 25 years, diagnosed with UCS or DKV, with no musculoskeletal injuries in the past 6 months. Exclusion criteria included neurological conditions, recent acute musculoskeletal discomfort, a history of lower limb or spinal surgery, or any other condition that would make it unsafe to participate in functional tests. All assessments were conducted by a qualified specialist with relevant professional experience in musculoskeletal evaluation and rehabilitation, to support the reliability of the evaluation process.
3.3. Tools/Instruments
3.3.1. Assessment of Dynamic Knee Valgus
The DKV was evaluated using the Single-Leg Squat (SLS) test, a clinically validated method for assessing frontal plane knee alignment during functional tasks (27). To ensure precise measurement and objective analysis, the performance was recorded using a high-definition video camera (Canon, model: PowerShot A630) positioned in the frontal plane. The recorded footage was subsequently analyzed using Kinovea software (28), which allowed for frame-by-frame evaluation of joint angles. Specific anatomical landmarks were identified and tracked to calculate the knee valgus angle during the deepest point of the squat. The participant demonstrated a mean knee valgus angle of 21.40°, exceeding the commonly cited threshold of abnormal valgus (> 15°) (29), which has been associated with impaired neuromuscular control and increased risk of lower limb injuries such as PFP and ACL rupture.
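The angle calculation performed on the video frame can be illustrated with a short sketch. The source does not state which landmarks were tracked in Kinovea, so the code assumes three frontal-plane points (hip, knee centre, and ankle centre, given as pixel coordinates) and reports how far the thigh and shank segments deviate from a straight line. The function name, the landmark choice, and the coordinates are all hypothetical.

```python
import math

Point = tuple[float, float]


def frontal_plane_deviation(hip: Point, knee: Point, ankle: Point) -> float:
    """Deviation (degrees) of the thigh-shank alignment from a straight line.

    ASSUMPTION: hip, knee and ankle are 2D frontal-plane landmarks.
    Returns 0 for a perfectly straight limb. The sign of the deviation
    (valgus vs. varus) is not distinguished here.
    """
    thigh = (hip[0] - knee[0], hip[1] - knee[1])
    shank = (ankle[0] - knee[0], ankle[1] - knee[1])
    cross = thigh[0] * shank[1] - thigh[1] * shank[0]
    dot = thigh[0] * shank[0] + thigh[1] * shank[1]
    # Included angle at the knee between the two segments, 0..180 degrees.
    included = math.degrees(math.atan2(abs(cross), dot))
    return 180.0 - included


# Hypothetical landmarks: the ankle is offset so the deviation is 20 degrees.
hip = (0.0, 2.0)
knee = (0.0, 1.0)
ankle = (math.tan(math.radians(20.0)), 0.0)

angle = frontal_plane_deviation(hip, knee, ankle)
exceeds_threshold = angle > 15.0  # threshold for abnormal valgus cited in the text
print(round(angle, 2), exceeds_threshold)
```

Using `atan2` on the cross and dot products avoids the rounding problems that `acos` can show when the segments are almost collinear.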
3.3.2. Assessment of Upper Crossed Syndrome
In the present study, posture was quantitatively evaluated through standardized lateral-view photogrammetry, a validated, non-invasive method for measuring postural angles with high reliability (30). The participant was instructed to stand in a natural, relaxed posture while a lateral-view photograph was taken under consistent lighting and positioning conditions. The images were analyzed using Kinovea software to obtain precise angular measurements of postural alignment. The results revealed a mean thoracic kyphosis angle of 56.14°, exceeding the typical clinical threshold for hyperkyphosis (> 40°), indicating excessive curvature in the thoracic spine. The craniovertebral angle (CVA) averaged 58.7°, which is below the normative value of ≥ 60°, suggesting a FHP. Additionally, the shoulder angle was measured at 61.5°, consistent with anterior shoulder translation, a hallmark of scapular protraction and muscle imbalance associated with UCS (31).
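The comparison of the measured angles with the cut-offs quoted above can be written as a small check. This sketch uses only the two thresholds the text states (thoracic kyphosis > 40° and CVA < 60°); no cut-off is given for the shoulder angle, so it is left out. The function name is hypothetical.

```python
def flag_posture(kyphosis_deg: float, cva_deg: float) -> dict[str, bool]:
    """Flag postural deviations using the thresholds quoted in the text."""
    return {
        "hyperkyphosis": kyphosis_deg > 40.0,      # thoracic kyphosis above 40 degrees
        "forward_head_posture": cva_deg < 60.0,    # craniovertebral angle below 60 degrees
    }


# Values reported for the UCS participant.
print(flag_posture(kyphosis_deg=56.14, cva_deg=58.7))
```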
3.4. Intervention/Procedures
Following the initial assessment and identification of the participants presenting with UCS and DKV, an 8-week corrective exercise protocol was designed using ChatGPT-4o. Specific, evidence-based prompts, grounded in biomechanical principles, current rehabilitation guidelines, and posture correction strategies, were used to generate a personalized training regimen. Moreover, to ensure the clarity, relevance, and consistency of the AI-generated content used in this study, we used the Originality.AI Prompt Generator to craft each prompt. This tool was employed to systematically generate the prompts used for analysis, content creation, and communication within the study framework.
3.4.1. The Prompt for Upper Crossed Syndrome
Write an 8-week corrective exercise program for a person who is 19 years old, weighs 70 kg, and is 175 cm tall, with a hyperkyphosis angle of 56.14°, a CVA of 58.7°, and a shoulder angle of 61.5°, based on the frequency, intensity, time, and type (FITT) principles for optimal results. Please ensure that the program includes specific exercises targeting the identified postural issues and adheres to the FITT principles. Additionally, provide figures or diagrams for a better understanding of each exercise, emphasizing proper form and technique (Appendix 1 in Supplementary File).
3.4.2. The Prompt for Dynamic Knee Valgus
Write an 8-week corrective exercise program based on the FITT principles for optimal results for a person who is 19 years old, weighs 72 kg, and is 176 cm tall, with a DKV angle of 21.40°. Please ensure that the program includes specific exercises targeting the identified postural issues and adheres to the FITT principles. Additionally, provide figures or diagrams for a better understanding of each exercise, emphasizing proper form and technique (Appendix 1 in Supplementary File).
3.5. Data Analysis
To calculate the level of agreement among experts, Cohen’s κ was used. In addition, to assess the validity of the exercises, three key indices were employed: The CVR, CVI, and IS. Data analysis was performed using SPSS version 27 and Microsoft Excel version 2024.
4. Results
According to Table 1, the IS indicates that all exercises possess the required level of validity. However, based on the CVI, exercise 1 (Hip Flexor Stretch) did not meet the necessary validity criteria. Furthermore, according to the CVR Index, only exercises 9, 15, and 18 demonstrated acceptable content validity.
Abbreviations: DKV, dynamic knee valgus; IS, Impact Score; CVI, Content Validity Index; CVR, Content Validity Ratio.
a IS > 1.5.
b CVI > 0.79.
c CVR > 0.62.
According to Table 2, the IS indicates that all exercises possess the required level of validity. However, based on the CVI and CVR, exercises 3 (Scapular Retraction) and 9 (Cat-Cow Stretch) did not meet the necessary validity criteria.
Abbreviations: UCS, upper crossed syndrome; IS, Impact Score; CVI, Content Validity Index; CVR, Content Validity Ratio.
a IS > 1.5.
b CVI > 0.79.
c CVR > 0.62.
The Cohen’s κ for expert agreement on the entire set of exercises was -0.16, with a 95% confidence interval ranging from -0.52 to 0.19 (Table 3). This negative kappa value suggests poor agreement among the experts, indicating that their evaluations may not be consistent beyond a chance level. The confidence interval also crosses zero, which further supports the conclusion that there is no statistically significant agreement between the raters.
| Cohen’s Kappa | 95% Confidence Interval, Upper Bound | 95% Confidence Interval, Lower Bound |
|---|---|---|
| -0.16 | 0.19 | -0.52 |

a κ ≤ 0.4: Weak or poor reliability; 0.4 < κ ≤ 0.6: Moderate reliability; 0.6 < κ ≤ 0.8: Good reliability; and κ > 0.8: Excellent reliability.
5. Discussion
Leveraging AI to design exercise programs represents an emerging and innovative method, appreciated for its ability to scale and adapt to individual needs. In this research, ChatGPT-4o was utilized to design personalized, 8-week corrective exercise programs addressing UCS and DKV. These protocols were structured around the FITT approach to ensure systematic program development. Nevertheless, evaluations by domain specialists using content validity measures (CVR, CVI, and IS) indicated that while a majority of the exercises showed acceptable content relevance, some did not meet the required content validity benchmarks. For example, within the DKV group, only 3 out of the 20 exercises achieved the minimum CVR score of 0.62. This discrepancy suggests that while AI can generate relevant exercise suggestions, expert oversight remains essential to refine and validate the clinical applicability of these prescriptions. Consistent with these results, a study investigating the effect of a 5-week AI-generated calisthenics training program on health-related physical fitness components showed that AI can be used for fitness training, but professionally designed programs are superior in some areas (19). Another study by Ebrahimi et al. reported that AI-generated core stability training, delivered under expert observation, may be effective for flatfoot and balance in blind individuals (22). Taken together, these findings indicate that although AI, including ChatGPT-4o, can produce biomechanically and theoretically sound content, human oversight is required to ensure safety, contextual appropriateness, and individualization (23).
Additionally, the AI-generated programs adhered to the FITT principles and included commonly recommended exercises for each deformity. For UCS, these likely included strengthening of the deep neck flexors and scapular stabilizers and stretching of tight anterior musculature (2, 3). For DKV, the protocol presumably focused on strengthening the gluteus medius, improving hip stability, and neuromuscular control of knee alignment, as supported by existing literature (11, 15). A study demonstrated that ChatGPT-4o showed great potential to become a smart, interdisciplinary, yet independent assistant to provide accurate and individualized exercise prescriptions for both general and professional use (32). In addition, Lo et al. indicated that using an AI-embedded mobile app to provide a personalized therapeutic exercise program may be beneficial for chronic neck and back pain (33).
One of the most concerning outcomes of this study was the poor reliability among experts, as reflected in the Cohen’s κ of -0.16. This negative value implies that the agreement among reviewers was worse than random chance. This could result from multiple factors: Variability in expert backgrounds (e.g., physiotherapy, biomechanics, sports coaching), ambiguities or inconsistencies in the AI-generated content, and differences in interpretative criteria for what constitutes a "valid" corrective exercise. Thus, enhancing expert panel calibration or refining evaluation rubrics could improve future reliability scores. The use of AI platforms like ChatGPT-4o may accelerate the development of first-draft rehabilitation protocols, reduce planning time for clinicians, and allow for mass customization (34). However, validation mechanisms such as the CVR and CVI, as employed here, are essential before any AI-generated program can be adopted clinically.
5.1. Conclusions
This study provides a foundational step toward validating AI-generated exercise programs for postural correction. The findings suggest that while platforms like ChatGPT-4o can generate generally appropriate material, some exercises fall short of established validity benchmarks and expert consensus remains weak. Therefore, AI may support rehabilitation only as an adjunct under professional supervision, rather than as an independent tool.
5.2. Limitations and Suggestions
Although the AI output was based on prompts grounded in clinical standards, the study shows that not all exercises met expert-defined thresholds for relevance and clarity, emphasizing the need for cautious integration of AI into therapeutic planning. Moreover, AI lacks the lived experience and contextual awareness necessary to tailor interventions to complex variables such as specific pathology, comorbid conditions, stages of recovery, or the individual’s readiness and psychosocial context, all of which are critical to the practical success of rehabilitation strategies. Also, AI should not be seen as a replacement for human expertise but rather as an augmentative tool that can assist in the initial generation of exercise plans, subject to further refinement by clinicians. This hybrid approach allows healthcare professionals to benefit from the efficiency and scalability of AI while maintaining the necessary clinical oversight to ensure patient safety and individualized care.
The study's main limitations are the small sample size (n = 2 participants) and the lack of follow-up data on actual postural improvements. Both participants were male, which further limits the generalizability of the findings. The small expert panel (n = 10) also contributes to the variability in validation indices. Furthermore, although the study used multiple validation metrics (CVR, CVI, IS), poor inter-rater agreement reduces confidence in these outcomes.