A total of 329 studies were identified through database searches. After duplicates were removed, 199 articles remained. In the next step, 191 studies that did not meet the inclusion criteria based on their abstracts were excluded. The full text of one further study was not available. The eligibility of another five studies was not confirmed on full-text review. Three studies were excluded after full-text review because they concerned the development of tests of word association, rapid automatic naming, and word finding. Two studies were designed for adults and therefore did not meet the inclusion criteria. Reference lists of relevant documents were manually searched to maximize the identification of eligible studies, and five articles were identified. In the second and third stages of the search, no new studies were identified. Finally, eight studies that developed a tool to assess vocabulary in Iranian Persian-speaking children were identified as eligible. A summary of information about each tool is presented in Table 1.
These studies concerned the development or adaptation of vocabulary tests and the determination of their psychometric properties. Five studies adapted an original version: Peabody picture vocabulary test (PPVT) (44), test of language development- primary: 3 (TOLD-P: 3) (45), British Picture Vocabulary Scale (BPVS-II) (46), MacArthur-Bates communication development inventory: Persian version (CDI-I: P) (47), and short form of Persian picture vocabulary scale (PPVS) (48). The three other studies developed a new vocabulary test in the Persian language: receptive picture vocabulary test (RPVT-I) (49), receptive picture vocabulary test (RPVT-II) (50), and picture verb test (PVT) (51). All of these tests assess the breadth of vocabulary knowledge. Five tests (RPVT-I, RPVT-II, PPVT, BPVS-II, and PPVS) assess only receptive vocabulary; the PVT assesses expressive vocabulary; and two others (TOLD-P: 3 and CDI-I: P) target both receptive and expressive vocabulary knowledge. The TOLD-P: 3 and CDI-I: P are not uni-modal, but only their vocabulary sections are examined in the present study. In other words, references to the CDI-I: P mean its word section, and references to the TOLD-P: 3 mean its oral vocabulary and picture vocabulary subtests.
| Test | Study | Subtests | Number of Items | Age Range | Sample Size | Picture Plate Description |
|---|---|---|---|---|---|---|
| Peabody picture vocabulary test (PPVT) | Razavieh and Shahim (44) | Receptive vocabulary | 97 | 3 - 11 years | 1010 | Full-color drawings; One picture per plate |
| Test of language development- primary: 3 (TOLD-P: 3) | Hasanzadeh and Minaei (45) | Picture vocabulary and oral vocabulary subtests | 30; 28 | 4 - 8 years | 1235 | Color drawings; One picture per plate for the picture vocabulary subtest; Four pictures per plate for the oral vocabulary subtest |
| MacArthur-Bates Communication Development Inventory: Persian version (CDI-I: P) | Kazemi et al. (47) | Words and Gestures | 680 | 8 - 16 months | 30 | Checklists |
| Picture verb test (PVT) | Soltaninejad et al. (51) | Expressive verb vocabulary | 55 | 36 - 54 months | 106 | One picture per plate |
| British picture vocabulary scale (BPVS-II) | Kazemi et al. (46) | Receptive vocabulary | 168 | 5 - 11 years | 180 | Four black-and-white pictures per plate |
| Receptive picture vocabulary test (RPVT-I) | Hassanpour et al. (49) | Receptive vocabulary | 240 | 30 - 71 months | 91 | Color drawings; Four pictures per plate |
| Receptive picture vocabulary test (RPVT-II) | Salehi Zahabi et al. (50) | Receptive vocabulary | 240 | 6 - 13 years | 118 | Color drawings; Four pictures per plate |
| Short form of Persian Picture Vocabulary Scale (PPVS) | Pouretemad et al. (48) | Receptive vocabulary | 38 | 5 - 6 years | Pilot study: 100; Original study: 410 | Black-and-white line drawings |
Regarding administration and response elicitation, participants indicate their responses by selecting the target picture from an array of four in the receptive tests or by naming the picture in the expressive tests. The CDI-I: P is the only tool that uses a parent-report checklist. The RPVT-II and PPVS are not timed. The CDI-I: P takes between 20 and 40 minutes to administer, depending on the mother's level of education. The administration time for the other tests is not specified. Except for the PVT and CDI-I: P, the vocabulary tests assess lexical noun categories. All reviewed studies were reported in journal articles, except for the TOLD-P: 3, which was also published as an accessible vocabulary assessment tool. As a result, information about the tests was obtained from the corresponding articles. Most of the tests were developed as master's thesis projects and could be accessed only with the permission of the test developers. Some of them (TOLD-P: 3 and PPVT) have provided norm scores based on large groups, and researchers have claimed that they can be used as a norm reference for identifying clinical samples. Three tools (TOLD-P: 3, PPVT, and CDI-I: P) included items from the noun, verb, and adjective categories, whereas two others (RPVT-I and RPVT-II) had no verb items. The grammatical category of the items is not mentioned for the PPVS and BPVS-II. The PVT was designed exclusively to assess the expression of verbs. A familiarization process and practice items were explicitly mentioned for the PVT and the TOLD-P: 3.
6.1. Criterion 1 (Standardization Sample)
Our investigation of this criterion focused primarily on the TOLD-P: 3, the PPVT, and the BPVS-II, all of which claimed to have attempted to standardize a vocabulary test. However, the quality of the sample was also examined in the other studies. The TOLD-P: 3 standardization sample reflected the population proportions of the different geographic regions of Tehran and the distribution of socioeconomic status across those regions, based on census data. The normative sample of this test included both typical and atypical children (e.g., children with intellectual disabilities and learning disorders). No consensus exists on whether atypical individuals should be included in a normative sample. Proponents of their inclusion argue that such a sample more accurately represents the full range of language abilities (34, 52).
For the PPVT, the distribution of socioeconomic status in the standardization sample was close to its proportion in the target province's population based on census data. However, no explanation has been provided for the geographical residence, normalcy violation, and the PPVT's approach to dealing with it. The BPVS-II study failed to meet any of the properties of criterion 1. In this study, only the sampling method (simple random) is mentioned without considering geographical residence and population proportion. Similarly, in other studies, only the sampling method has been mentioned.
Another consideration is the recency of the standardization sample. Because the population can change over time, children should not be measured against an out-of-date sample. The vocabulary area of language development, especially in children, is sensitive to changes over time. Another issue to consider is the appearance of depicted objects and the visual attractiveness of test pictures. The TOLD-P: 3 test (i.e., the only test with a published manual) was developed in 2002 and has not been revised since then. Furthermore, there is no mention of the year in which the sample data were collected.
6.2. Criterion 2 (Sample Size)
Some studies (e.g., the PPVT) grouped subjects by age at six-month intervals, while others (e.g., the TOLD-P: 3) grouped subjects by year. The TOLD-P: 3 provided an adequate sample size in each of its five age subgroups, ranging from four to eight years. In the PPVT, subjects were grouped into six-month intervals from 36 to 72 months and into yearly intervals from 6 through 11 years; the sample size was adequate only in each of the six yearly subgroups from 6 to 11 years. In the BPVS-II, none of the age subgroups met the sample size criterion: ages five (n = 35), seven (n = 46), nine (n = 50), and 11 (n = 49). The PPVS provided an adequate sample size in each age group. The remaining four tests (PVT, RPVT-I, RPVT-II, and CDI-I: P) failed to meet this criterion. The CDI-I: P, like these other tools, did not provide normative data because of its small sample size and was at a preliminary stage of adaptation. However, email contact with the corresponding author indicated that the psychometric characteristics of the newest edition of the CDI have been determined and that standardization is scheduled to take place soon.
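
The subgroup check applied under this criterion can be illustrated with a minimal Python sketch. The BPVS-II subgroup sizes are those reported in the text; the threshold `MIN_PER_SUBGROUP` is an assumption chosen for illustration only, because the cut-off used in this review is not restated in this section.

```python
# Subgroup sizes for the BPVS-II as reported in the text (age in years -> n).
bpvs_subgroups = {5: 35, 7: 46, 9: 50, 11: 49}

# ASSUMPTION: illustrative minimum per subgroup; not taken from this section.
MIN_PER_SUBGROUP = 100


def subgroups_below(sizes, minimum):
    """Return the subgroups whose sample size is below the minimum."""
    return {age: n for age, n in sizes.items() if n < minimum}


total = sum(bpvs_subgroups.values())
too_small = subgroups_below(bpvs_subgroups, MIN_PER_SUBGROUP)

print(f"Total sample: {total}")
print(f"Subgroups below {MIN_PER_SUBGROUP}: {sorted(too_small)}")
```

The four subgroup sizes sum to 180, the total sample reported for the BPVS-II in Table 1, and under the assumed threshold every subgroup falls short.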
6.3. Criterion 3 (Content Validity)
In the PPVS, TOLD-P: 3, and PVT, the classical approach to item analysis (i.e., estimating the difficulty level and discriminative power of items) was used. Content validity was estimated using the content validity index (CVI) in the PVT, RPVT-I, and RPVT-II and the content validity ratio (CVR) in the RPVT-I and PVT. The BPVS-II did not use any method to evaluate content validity.
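
For readers unfamiliar with these two indices, the following Python sketch shows how they are commonly computed. It assumes Lawshe's formulation of the CVR and an item-level CVI based on a 4-point relevance scale; the reviewed studies may have used variants, and the expert panel shown is hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to 1."""
    half = n_experts / 2
    return (n_essential - half) / half


def item_cvi(relevance_ratings, relevant_from=3):
    """Item-level CVI: share of experts rating the item as relevant
    (3 or 4 on a 4-point relevance scale)."""
    relevant = sum(1 for r in relevance_ratings if r >= relevant_from)
    return relevant / len(relevance_ratings)


# HYPOTHETICAL panel of 10 experts judging a single test item.
cvr = content_validity_ratio(n_essential=9, n_experts=10)
cvi = item_cvi([4, 4, 3, 4, 2, 4, 3, 4, 4, 3])

print(f"CVR = {cvr:.2f}, I-CVI = {cvi:.2f}")
```

Both indices are computed per item, so a full analysis would repeat these calls for every item in the test.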
6.4. Criterion 4 (Mean and Standard Deviation)
Central tendency indices were reported for age subgroups in the PPVT, BPVS-II, PVT, TOLD-P: 3, RPVT-I, and RPVT-II. Three of these studies (PPVT, RPVT-I, and RPVT-II) also reported these indices for gender subgroups. These indices were not reported for the CDI-I: P.
6.5. Criterion 5 (Concurrent Validity)
The PPVT showed a weak correlation between test scores and academic achievement in one age group. For the picture vocabulary and oral vocabulary subtests of the TOLD-P: 3, evidence for this criterion was provided by examining their correlations with the similarities and vocabulary subtests of the Wechsler test. In the BPVS-II, concurrent validity was investigated by examining correlations between vocabulary scores and the verbal, practical, and general intelligence scores of the Wechsler test. Five reviewed tests did not provide evidence of concurrent validity (47-51).
6.6. Criterion 6 (Predictive Validity)
None of the reviewed tools provided evidence of predictive validity.
6.7. Criterion 7 (Construct Validity)
Of the eight reviewed tests, only the TOLD-P: 3 used exploratory factor analysis to assess construct validity. Developmental trends, that is, increases in mean raw scores across one-year age ranges, were considered the second source of evidence for construct validity. Mean raw scores increased across the age subgroups of the BPVS-II, TOLD-P: 3, and PPVS. Mean scores also increased across age subgroups in the PVT (three age groups: 36 to 42 months (M = 37.35), 42 to 48 months (M = 39.06), and 48 to 54 months (M = 42.97)) and the RPVT-I (means presented in six-month intervals). In the RPVT-II, mean scores increased with age, except for 8 to 9 and 9 to 10 years, where mean scores remained constant. The CDI-I: P paper provided no information about the subjects' mean scores. The mean scores of the PPVT revealed developmental trends across all age groups.
Only the TOLD-P: 3 provided evidence from group comparisons. Children with learning disorders, speech and language development delay, intellectual disability, and attention deficit hyperactivity disorder were assessed. The mean differences between the groups with disorders and the normative or control groups exceeded 2 SD in the picture vocabulary and oral vocabulary subtests. Gender differences in test scores were examined in two reviewed tests (PPVS and RPVT-I), but the differences were not statistically significant.
6.8. Criterion 8 (Internal Consistency)
One method of estimating the reliability of a test or scale is to calculate the correlation among its items (Cronbach's alpha coefficient). The RPVT-II (0.83), PVT (0.71), TOLD-P: 3 (0.89 and 0.76 for the oral and picture vocabulary subtests, respectively), PPVS (0.84), CDI-I: P, and BPVS-II (0.84) met the 0.70-0.90 criterion. The vocabulary production (0.87) and vocabulary comprehension (0.98) subscales of the CDI-I: P showed the highest values. However, such coefficients, especially that of the second subscale, which exceeds 0.90, probably indicate the presence of highly related and redundant items. The internal consistency of two tests (RPVT-I and PPVS) was assessed by measuring split-half reliability.
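
As an illustration of how this coefficient is obtained, the sketch below computes Cronbach's alpha from an item-score matrix using the standard variance-based formula and checks a value against the 0.70-0.90 band used in this criterion. The score matrix is hypothetical, and the helper names are introduced here for illustration only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per child):
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def in_accepted_band(alpha, low=0.70, high=0.90):
    """True when alpha lies within the 0.70-0.90 criterion."""
    return low <= alpha <= high


# HYPOTHETICAL data: six children, four dichotomously scored items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}, within band: {in_accepted_band(alpha)}")
```

Applied to the coefficients reported above, `in_accepted_band(0.84)` holds, whereas `in_accepted_band(0.98)` does not, which mirrors the concern about redundant items in the vocabulary comprehension subscale.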
6.9. Criterion 9 (Test-Retest Reliability)
Four tests (TOLD-P: 3, PPVS, CDI-I: P, and BPVS-II) did not meet this criterion because they did not assess the test-retest correlation. In four reviewed tests (RPVT-I, RPVT-II, PVT, and PPVT) with reported test-retest reliability, the values were above 0.70.
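
A test-retest coefficient of this kind is typically a correlation between scores from two administrations of the same test. The sketch below assumes a Pearson correlation, which the reviewed studies may or may not have used, and the two score lists are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# HYPOTHETICAL raw scores of the same six children on two occasions.
first_session = [12, 18, 25, 31, 40, 22]
second_session = [14, 17, 27, 30, 41, 25]

r = pearson_r(first_session, second_session)
print(f"test-retest r = {r:.2f}, above 0.70: {r > 0.70}")
```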
6.10. Criterion 10 (Inter-rater Reliability)
None of the reviewed tools provided evidence of inter-rater reliability.
6.11. Criterion 11 (Administration and Scoring)
Four tests (PVT, TOLD-P: 3, RPVT-I, and RPVT-II) provided brief descriptions of the administration and scoring procedures, whereas two tests (PPVS and CDI-I: P) described only the administration procedure. The TOLD-P: 3 was the only reviewed test that provided standard scores and percentile cut-off points.
6.12. Criterion 12 (Qualification)
None of the reviewed tools described the required qualifications for the administration and scoring.
The results of the review of the psychometric characteristics of the vocabulary assessment tools are presented in Table 2.
| Psychometric Criteria | PPVT | TOLD-P: 3 | CDI-I: P | PVT | BPVS-II | RPVT-I | RPVT-II | PPVS |
|---|---|---|---|---|---|---|---|---|
| Standardization sample | | | | | | | | |
| a | - | √ | - | - | - | - | - | - |
| b | √ | √ | - | - | - | - | - | - |
| c | - | - | - | - | - | - | - | - |
| d | - | - | - | - | - | - | - | - |
| Adequate sample size | √ | √ | - | - | - | - | - | √ |
| Content validity | - | √ | - | √ | - | √ | √ | √ |
| Mean and standard deviation | √ | √ | √ | √ | √ | √ | √ | √ |
| Concurrent validity | √ | √ | - | - | √ | - | - | - |
| Predictive validity | - | - | - | - | - | - | - | - |
| Construct validity | | | | | | | | |
| a | - | √ | - | - | - | - | - | - |
| b | √ | √ | - | √ | √ | - | - | √ |
| c | - | √ | - | - | - | √ | - | √ |
| Internal consistency reliability | √ | √ | √ | √ | √ | √ | - | √ |
| Test-retest reliability | √ | - | - | - | - | √ | √ | - |
| Inter-rater reliability | - | - | - | - | - | - | - | - |
| Test administration and scoring procedures | - | √ | √ | √ | - | √ | √ | √ |
| Qualifications for the test administrator or scorer | - | - | - | - | - | - | - | - |
Abbreviations: PPVT, Peabody picture vocabulary test; TOLD-P: 3, test of language development- primary: 3; BPVS-II, British Picture Vocabulary Scale; CDI-I: P, MacArthur-Bates Communication Development Inventory: Persian version; PPVS, short form of Persian picture vocabulary scale; RPVT-I, receptive picture vocabulary test; RPVT-II, receptive picture vocabulary test; PVT, picture verb test.