Our results showed that most of the orthodontic exams were held in the form of MCQs (86.55%), and only a small part (13.5%) was in the form of written questions. Interestingly, this distribution did not hold in all exams: some exams used only MCQs, which indicates that most professors rely on objective questions to measure students' academic achievement and use subjective, written exams less often. Objective exams are those in which both the question and the candidate answers are given to the exam takers, who choose among the given answers. These exams usually measure the ability to recognize, while written exams generally measure the ability to recall. Although recall and recognition are psychologically closely related and are both aspects of memory, they differ in the practical abilities they reflect. Therefore, it can be said that most orthodontic exams measure students' knowledge only to the extent of recognition. It is better to use written and objective exams (multiple-choice questions) together to identify more accurately what students have learned.
According to the results of data analysis, the average rate of observing structural rules in orthodontic exams was 93.61%, which is not significantly different from the average for exams in all departments in 2000 - 2001 reported by Baharvand et al. (18) (96.2%); in general, this shows the high quality of orthodontic exams in terms of structure. The structural rules most often violated concerned the use of negative words and verbs and the use of ‘all of the options’ or ‘none of the options’ as answer choices.
The content validity, expressed here as content relevance, was evaluated against the course plan prepared by the department.
The content validity of the studied exams was 94.8%, which shows that questions unrelated to the taught chapters are rarely included in the exams and that the educational objectives are well covered. This result is also consistent with the studies of Baharvand (18) and Kazemi (19). They evaluated 1013 questions gathered from 18 exams, and their results showed a content validity of 92.38%.
Validity refers to the objective that the exam is designed to achieve. A valid exam is one that is suitable for measuring what it is intended to measure.
Many sets of questions can be prepared on a subject. Content validity indicates the extent to which the sample of questions used in an exam represents this comprehensive set. For an exam to be valid, its questions should be a representative sample of the objectives and content of the course. One of the factors that affect the validity of an exam is the quality of its questions. A question that is prepared according to the rules of question design adds to the validity of the exam (15).
The average difficulty index of the exams in this study was 68.38%. The orthodontic exam of the second semester of 2014 - 2015 had the highest difficulty index, which may be attributed to the small number of MCQs, the question type on which this study focused. Orthodontic exam 3 had the lowest difficulty index.
It can be said that the difficulty index of the exams was appropriate, and the exams were neither very difficult nor very easy. These results are consistent with the results of Abbasi et al. (20), in which only 27.6% of questions had an appropriate range of difficulty coefficient. However, they are inconsistent with the results of Imam Jome et al. (21), in which 62% of the questions were considered easy.
One of our expectations of a standard exam is that the exam takers' scores are distributed over a range. The wider the distribution, the better the exam. In other words, the higher the variance of the scores obtained from a standard exam, the better the exam. Considering the difficulty index obtained in this study, it can be said that almost all the studied exams were appropriate in terms of the distribution of scores and were suitable and standard in this respect (variance).
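As an illustration of the two quantities discussed above, the following minimal sketch computes a per-item difficulty index and the variance of total scores. It assumes the common definition of the difficulty index as the percentage of exam takers who answered an item correctly; the text does not state the exact formula used in this study, and the function names and the small response matrix are hypothetical.

```python
from statistics import pvariance


def difficulty_index(responses):
    """Percentage of exam takers answering each item correctly.

    Assumed definition (not stated in the paper). `responses` holds one
    row per student, with 1 for a correct answer and 0 otherwise.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        100.0 * sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]


def score_variance(responses):
    """Population variance of the students' total scores."""
    totals = [sum(row) for row in responses]
    return pvariance(totals)


# Hypothetical data: 4 students, 4 MCQ items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]

item_difficulty = difficulty_index(responses)
exam_difficulty = sum(item_difficulty) / len(item_difficulty)
total_variance = score_variance(responses)
```

Under this definition, a higher percentage means an easier item, so an exam average near the middle of the range is neither very difficult nor very easy.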
A larger coefficient of discrimination indicates greater distinctive power of the question, and the closer this number is to one, the more appropriate the question is. The coefficient of discrimination of the studied exams was low (with an average of 0.4116), which shows that the studied questions do not have adequate power to distinguish between strong and weak students.
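One common way to obtain such a coefficient is the upper-lower group method, sketched below. This is an assumption for illustration only: the text does not specify how the coefficient was computed in this study, and the 27% group fraction, the function name, and the sample data are hypothetical.

```python
def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower group discrimination index for one item.

    Assumed method (not stated in the paper): rank students by total
    score, take the top and bottom `fraction` of students, and return
    (correct in upper group - correct in lower group) / group size.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(row[item] for row in upper)
    lower_correct = sum(row[item] for row in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical data: 6 students, 4 MCQ items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

values = [discrimination_index(responses, i) for i in range(4)]
```

In this sketch, an item that every student answers correctly yields 0 (no discrimination), while an item answered correctly only by the strongest students approaches 1.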
The students of Shahid Beheshti Dental School are considered among the best and at a similar level of knowledge based on the entrance exam. This could be assumed to be the reason for the low coefficient of discrimination, but the distribution of students' scores in the exams and the reports obtained from teachers rejected this claim. So, it can be concluded that, in terms of distinctive power, the question design is not appropriate. These results are consistent with the results of Abbasi et al. (20), in which 87.6% of the questions had a coefficient of discrimination of less than 0.5. Also, in a study by Baharvand et al. (18) conducted on the written exams of theoretical courses, the coefficient of discrimination was 22.3% in all departments, which shows that the orthodontic department exams have much higher distinctive power than the theoretical exams of other departments.
The questions were non-uniformly distributed across taxonomy levels. Approximately 90% of the questions were in Taxonomy I and II. In the orthodontic exam of the second half of 2013 - 2014, no question was placed in the Taxonomy III group. Also, no significant relationship was observed between the percentage of questions at the Taxonomy III cognitive level and the difficulty index of the exam. This might be due to the sample used in our study: we used only MCQ exams for calculating the difficulty index, while most questions at the Taxonomy III level were in the form of written questions.
In general, the results show that most of the questions designed for the orthodontic exams of Shahid Beheshti Dental School measure students' memorized knowledge, and the number of questions that require the student to analyze is small. This finding is consistent with the study by Shakournia (3), in which 90% of MCQ exams were in Taxonomy I and II. It is also consistent with the study by Baharvand et al. (18), in which the percentage of questions with this taxonomy was 84.6%. Also, as students' educational level increases and the taught material expands, professors have more opportunity to design questions at higher cognitive levels (Taxonomy III), and more questions were observed at this level in the theoretical orthodontic exam 3 in both semesters of 2013 - 2014 and 2014 - 2015. Taxonomy I and II generally investigate memory and recall, while Taxonomy III involves analyzing and combining the information of the exam subject and applying it in practice. From the point of view of Anderson and Krathwohl (2001), Taxonomy I measures knowledge with little involvement of understanding; at this level, only memorization is investigated. At the next cognitive level, Taxonomy II requires an understanding that goes beyond superficial awareness of information in memory. Understanding the content helps prevent it from being forgotten. At the highest cognitive level, Taxonomy III, learning is more in-depth than at the previous two levels, and previous information and lessons learned need to be combined.
Also, when the indices of similar exams of 2013 - 2014 and 2014 - 2015 were compared, the only index that differed significantly was the structural validity index in the theoretical orthodontic exam (P = 0.01) and theoretical orthodontics 2 (P = 0.06). In both cases, the structural validity increased in the 2014 - 2015 exams.
4.1. Conclusion
According to our results, the MCQ and written exams held in the orthodontics department of Shahid Beheshti School of Dentistry in the academic years 2013 - 2014 and 2014 - 2015 were at a relatively good level in terms of the indices used to measure exam quality. While the content validity, observance of structural rules, and difficulty index of the exams were appropriate and acceptable, the discrimination coefficient and cognitive level of the questions (taxonomy) were at a low level, which requires more attention.