The main goal of this narrative review was to identify the factors that influence the number and duration of OSCE stations. Our results suggest that there are four elements to consider in the initial stage of OSCE design: skills, reliability, validity, and cost. Determining the appropriate number and duration of stations matters because an assessment method must be able to distinguish competent medical students from incompetent ones. Barman stated that designing a reliable, valid, objective, and practicable OSCE requires great care in planning and administration, such as using a test matrix that allocates learning objectives, organizing a sufficient number of stations, training academic staff, and using proper checklists (18). Mavis designed an OSCE to examine the self-efficacy of second-year medical students with 11 stations lasting 3.5 hours, but it lacked reliability and validity (23). Numerous studies have reported different station numbers and durations for assessing different skills in OSCE design (3, 12, 23, 24, 26). Pierre et al. designed an OSCE to evaluate child health medical students' perceptions at the end of their clerkship with 13 stations, each lasting 7 minutes, except for the history-taking station, which lasted 14 minutes. Different skills therefore require different station durations in OSCE design (12). It also seems necessary to consider factors other than skills when determining the number of stations and test duration in OSCE development.
There is controversy regarding the standard length of an OSCE station. Some researchers have obtained good reliability (≥ 0.8) with stations lasting 5 to 10 minutes each, while others recommend a 6 to 8 hour OSCE to achieve reliable results (26). Although higher reliability in OSCEs is generally achieved with more stations (18), some studies have found otherwise. Brannick et al. investigated OSCEs and found that tests with fewer than 10 stations may reach reliability greater than 0.80. Conversely, other studies showed that even with more than 25 stations, reliability of less than 0.80 was obtained (8).
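The general expectation that more stations raise reliability can be illustrated with the Spearman-Brown prediction formula from classical test theory. This is a minimal sketch only: the formula is not taken from the studies reviewed here, it assumes that any added stations are comparable in quality to the existing ones, and the function names and example figures are hypothetical.

```python
import math


def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor.

    Assumes the added stations behave like the existing ones
    (an idealization; real stations differ in quality).
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def stations_needed(current_stations, current_reliability, target_reliability):
    """Number of comparable stations predicted to reach target_reliability."""
    factor = (target_reliability * (1 - current_reliability)) / (
        current_reliability * (1 - target_reliability)
    )
    return math.ceil(current_stations * factor)


# Hypothetical example: a 10-station OSCE with reliability 0.60
print(stations_needed(10, 0.60, 0.80))
```

Under this assumption, the number of stations required depends strongly on how reliable the existing stations already are, which is one possible reading of why short OSCEs sometimes exceed 0.80 while long ones fall short.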
Therefore, it appears that factors other than reliability affect the number and duration of stations in OSCE implementation. Auewarakul et al. analyzed an OSCE to estimate its reliability, with the aim of removing poorly performing stations, and found that reliability was the basis of construct validity (30). In a 1990 study, Cohen et al. implemented an OSCE for surgical residents to evaluate the construct validity of the test. They found that 17 of 19 stations were required to obtain construct validity, but they also considered three other elements: reliability, skills, and cost (31). Khan et al. state that OSCE stations usually last 5 to 10 minutes each, but that reliability, validity, and task allocation are essential for determining station numbers and duration (27).
Barman defines the types of validity and notes that for an OSCE to have a high level of validity, it is important to consider a blueprint of the subject matter that students are expected to master according to the learning objectives (18). However, he does not address validity in terms of OSCE station number and duration. Direct and indirect costs are reported in only a limited number of studies. OSCE costs vary because station designs differ across OSCEs. In their 2015 paper, Brown et al. note that reported costs are influenced by the time period of the study (19). In particular, the main costs are affected by inflation over that period.
Consequently, outdated budget estimates are not an adequate resource for stakeholders. As Pell et al. state, reducing the number of stations decreases marginal costs, such as those for clinical resources, simulations, support staff, and assessors, which leads to an overall cost reduction (17). According to Reznick et al., large-scale, high-stakes, multisite licensing examinations incur higher OSCE implementation costs as the number of stations increases (34). Cookson et al. claim that if the number of stations is halved in a sequential OSCE, the cost will be reduced by one-third. However, the amount of cost reduction cannot be predicted definitively, because OSCE costs are influenced by the required skills, the number of assessors, standardized patients, and essential equipment (35).
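One way to see why the saving from halving the stations need not be one-third is a simple linear cost model. The model is an assumption made here for illustration and is not taken from Cookson et al.: $F$ denotes fixed costs that do not depend on the number of stations, $v$ the cost per station, and $n$ the number of stations.

```latex
C(n) = F + v n, \qquad
\frac{C(n) - C(n/2)}{C(n)} = \frac{v n / 2}{F + v n}
```

Under this assumed model, the saving equals one-third only when $F = vn/2$, that is, when fixed costs are half of the station-dependent costs. With no fixed costs the saving would be one-half, and it shrinks as fixed costs grow, so the proportion saved depends on the cost structure of the particular OSCE.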
Additionally, the cost of a station depends on whether real patients or professional standardized patients (SPs) are used, and on whether the OSCE is a new implementation or a repeated one. As mentioned previously, the focus of this study was to consider important factors related to OSCE station numbers and duration. An extensive search of databases such as PubMed, ScienceDirect, Scopus, ERIC, Ovid, and Google Scholar for OSCE studies revealed four key factors.
Papers from various disciplines, including undergraduate and postgraduate medicine (e.g., internal medicine, surgery, child health, and nursing), were reviewed. It was found that factors such as skills, validity, reliability, and cost play significant roles in the initial stage of OSCE design and have a substantial influence on OSCE station numbers and duration. Similar levels of influence were observed in OSCEs across different fields like internal medicine, surgery, child health, and nursing in various universities of medical sciences. Our findings revealed that these four factors simultaneously affect OSCE station numbers and duration.
Differences in which factors appear in particular OSCEs are due to slight differences in subject matter from one discipline to another. The reviewed studies pointed to four key factors, and the studies were then categorized into four groups: skills, reliability, validity, and cost. We selected studies based on station numbers and duration in OSCEs developed for undergraduate and postgraduate medical sciences in order to scrutinize these key factors. The studies discussed in our review provide examples of factors that may be useful for developing OSCEs in the future.
Most studies developed OSCEs to assess student competency and measure reliability and validity. The exception was Lind et al. in 1998, who examined the effect of the length, timing, and content of the third-year surgery rotation on several clerkship and post-clerkship performance metrics. However, these studies did not focus on the important factors influencing OSCE station numbers and duration (37). The opinions of OSCE designers offer important perspectives that can contribute to the development of more accurate, reliable, and valid tests. As mentioned in the introduction, no prior review appears to have determined the factors that influence OSCE station numbers and duration.
4.1. Strengths and Limitations
Investigating one of the basic needs in OSCE design, namely determining the appropriate number and duration of stations in the initial stages of test design, is one of the strengths of this study.
We are aware that our review has two limitations. First, we could not assess all the evidence pertaining to this field. Although this review makes some progress, the factors identified provide only a partial answer to the question of OSCE development. Second, we only included studies in English, which may have limited our findings. These limitations highlight the difficulty of collecting comprehensive data for this review.
4.2. Conclusions
The findings underscore the importance of determining the appropriate number and duration of stations in the early stages of OSCE design. The data indicate that the four variables of skills, validity, reliability, and cost have a considerable impact on OSCE design. This research suggests that investigating these factors can be instrumental in designing effective OSCEs and can help predict the utility of the designed assessments. Ultimately, the insights from this study can aid in developing widely used and impactful OSCEs and pave a clearer path for future OSCE design and evaluation in the field of medical sciences.
4.3. Highlights
Station numbers and duration are two important components that should be considered in the early stages of designing the OSCE structure. Reliability, validity, clinical skills, and budget are crucial factors in determining the number and duration of stations. Having sufficient insight into the factors influencing the utility of the OSCE is essential for exam designers.
4.4. Lay Summary
In objective structured clinical examinations, students' clinical skills are assessed in a simulated environment. The number and duration of stations are two important elements in the design of this test, and the decision about them depends on various factors. The results of this study showed that the validity of the test, the skills considered for evaluation, obtaining consistent results from repeated assessments, and the cost of the test all affect the number and duration of stations in this evaluation method.