Distinguishing precocious puberty from premature thelarche sometimes poses a diagnostic dilemma. Pubertal signs that are inconsistent with laboratory findings, together with the multifactorial nature of pubertal onset, can confuse clinicians during decision-making. No single diagnostic tool allows a definitive diagnosis on its own. Although the GnRH test is considered the gold standard for diagnosis, it has considerable drawbacks (22, 23). It is a time-consuming, painful and invasive procedure that causes injection anxiety in children. Moreover, its variable sensitivity, specificity and cut-off values limit its diagnostic value (24).
Scoring models offer a simple and useful means of estimation when the diagnosis is equivocal. They are also used to predict prognosis in many diseases (25) and can thus effectively guide the therapeutic process. To date, no scoring model has been reported in the literature for the differential diagnosis of precocious puberty.
We aimed to establish a new scoring model that distinguishes PP from PT, as a complementary or alternative diagnostic approach to the GnRH test. In this study, we found that the newly developed scoring system was a reliable method for the differential diagnosis of PP and PT without the GnRH test.
In our study, we retrospectively enrolled 164 (61.5%) girls diagnosed with PP and 103 (38.5%) girls diagnosed with PT according to conventional diagnostic procedures, including the GnRH test. Age at presentation of pubertal signs is very important in distinguishing benign early pubertal conditions from true PP (26). Since age at onset of pubertal signs had high sensitivity in our study, we included it in the scoring model. The mean age at onset in the PP and PT groups was 7.21 ± 1.36 and 5.09 ± 2.64 years, respectively (Table 1). One hundred and seventy-four (66.3%) girls were in the grey zone. The mean age of our cases was slightly higher than that in similar reports (26, 27). Most of our cases were in the grey zone, between 7 and 8 years of age. We later interpreted this as reflecting that these girls presented soon after their pubertal signs appeared. Therefore, we could not follow the growth rate of most cases.
Increased uterine length, increased ovarian volume and advanced bone age usually reflect exposure to estrogen, due either to activation of the hypothalamic-pituitary-gonadal axis or to excessive peripheral estrogen production. These findings are reliable evidence of pubertal development. Many reports have shown that both increased uterine length and increased ovarian volume can be used to distinguish PP from PT and its variants (28). Although some studies also measured other parameters, such as the shape, thickness and volume of the uterus, we used uterine length as the diagnostic parameter in the present study because it is easier to measure (29). We found a cut-off value of 32 mm for uterine length (sensitivity 80.5%, specificity 83.5%). For ovarian volume, we used the mean volume of the two ovaries and calculated a cut-off value of 1.09 cm³ (sensitivity 76.8%, specificity 73.8%). Different values for uterine length and ovarian volume as pubertal signs have been reported in the literature (30, 31). These differences may result from differences in age at onset, duration and stage of puberty.
Advanced bone age helps in making the diagnosis and predicting prognosis in precocious puberty. It also plays a role in treatment decisions. Moreover, one study demonstrated that advanced bone age is the most effective predictor of the GnRH test result (32). This suggests that advanced bone age can be used as an alternative diagnostic tool to the GnRH test. We found a cut-off value of 1.1 for the ratio of bone age to chronological age (sensitivity 70.7%, specificity 86.4%). This value is consistent with similar studies (33).
In the present study, the mean estradiol level was higher in PP (17.4 ± 5.54 pg/ml) than in PT (5.99 ± 3.6 pg/ml) (P = 0.001). A cut-off value of 12 pg/ml was used in the scoring model. In our scoring model, estradiol carried the highest score (3.5 points). This result suggests that the level of estrogen exposure is important and plays a role in the development of pubertal changes (34).
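As a rough illustration of how the individual cut-off values reported above might be applied to one patient's measurements, the following Python sketch flags which findings reach their thresholds. The field names, the `flag_findings` helper, the example patient and the inclusive (>=) comparison are assumptions made for illustration. The sketch does not reproduce the scoring model itself, whose point values are not fully listed in this section.

```python
from dataclasses import dataclass

# Cut-off values as reported in this section of the paper.
CUTOFFS = {
    "uterine_length_mm": 32.0,
    "ovarian_volume_cm3": 1.09,   # mean volume of both ovaries
    "ba_ca_ratio": 1.1,           # bone age / chronological age
    "estradiol_pg_ml": 12.0,
}


@dataclass
class Findings:
    """Hypothetical container for one patient's measurements."""
    uterine_length_mm: float
    ovarian_volume_cm3: float
    ba_ca_ratio: float
    estradiol_pg_ml: float


def flag_findings(findings: Findings) -> dict:
    """Return, per variable, whether the value reaches its cut-off.

    Assumption: a value equal to the cut-off counts as reaching it.
    """
    return {
        name: getattr(findings, name) >= cutoff
        for name, cutoff in CUTOFFS.items()
    }


# Invented example values, not patient data from the study.
example = Findings(
    uterine_length_mm=34.0,
    ovarian_volume_cm3=0.9,
    ba_ca_ratio=1.15,
    estradiol_pg_ml=10.0,
)
flags = flag_findings(example)
print(flags)
```

In this invented example, uterine length and the BA/CA ratio reach their cut-offs, while ovarian volume and estradiol do not.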
Our scoring model is the first reported for the differential diagnosis of precocious puberty. Therefore, we could not compare it with similar studies. Instead, we compared its diagnostic value with that of other diagnostic or prognostic scoring models. There are many clinical scoring models (35, 36). Compared with previous scoring models, such as a scoring system that distinguishes uncomplicated from complicated acute appendicitis, our model has similar diagnostic value (37). Scoring models can be created by combining many variables (25). The following variables were statistically significant and were used in the scoring model: age at diagnosis (years), BA/CA ratio, estradiol (pg/ml), uterine length (mm) and ovarian volume (cm³) (P = 0.001 for each variable). We chose diagnostic variables that were both statistically significant and non-invasive; all are non-invasive measurements except the estradiol assay. The sensitivity and specificity of our scoring model were 89.6% and 87.4%, respectively, and its accuracy was 89.8%. According to previous research, the sensitivity and specificity of the GnRH test were 74 - 100% using a cut-off pLH level of 5 IU/L (4). In our cases, the cut-off value for pLH was 4.37 IU/L, and the sensitivity and specificity of pLH were 79.6% and 74%, respectively (Table 1). In this study, the sensitivity and specificity of the model were higher than those of the GnRH test. Thus, this new scoring system, which does not rely on the GnRH test, had high sensitivity, specificity and accuracy. We believe that this system could be a complementary diagnostic tool or an alternative to the GnRH test in diagnostically challenging cases. Although the new scoring system also requires a blood test, it does not share the disadvantages of the GnRH test, which is time-consuming, expensive and uncomfortable.
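For readers less familiar with these measures, the following Python sketch shows how sensitivity, specificity and accuracy are conventionally derived from a 2 × 2 table of test results against the reference diagnosis. The function name and the counts are hypothetical and chosen only to keep the arithmetic simple; they are not the counts from this study.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard diagnostic accuracy measures from a 2 x 2 table.

    tp: diseased and test-positive    fn: diseased and test-negative
    tn: healthy and test-negative     fp: healthy and test-positive
    """
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only (not study data).
metrics = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because accuracy is a weighted average of sensitivity and specificity, weighted by the proportion of diseased and non-diseased subjects, it always lies between the two.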
Although the specificity of growth velocity was high (90%), its sensitivity for PP was low (37%) (2). In our study, most of the girls referred with early pubertal signs were in the grey zone, the period in which diagnostic challenges are most common. Moreover, we did not have enough time to follow the patients' growth velocities because of health insurance payment regulations. As the scoring system does not include growth velocity, it can also be applied as a diagnostic tool in girls for whom growth history data are unavailable. In addition, it can be a useful alternative in patients in whom the GnRH test cannot be performed in practice. Because the scoring system is based mostly on clinical findings, it provides a faster, largely non-invasive and more cost-effective diagnostic approach than the GnRH test.
The first part of our study was retrospective, and we selected conventional diagnostic variables. The accuracy of the scoring system could be increased by including other significant findings, such as the results of pituitary gland MRI. We suggest that country-specific scoring systems need to be developed. Our study is the first to develop a scoring system for PP. The findings could not be compared with those in the literature due to the absence of similar studies. However, our results were compatible with findings reported in studies of scoring systems for other diseases (12, 13).
We applied the constructed model to a second cohort, which consisted of girls who were referred with early pubertal signs. The sensitivity and specificity of M in this cohort were 90% and 89.4%, respectively, and its PPV was 53%. In the cohort, PPV was not as high as in the study group. We attributed this finding to the small size of the cohort (PP n = 10, PT n = 7). The GnRH test was performed in all the girls in the cohort.
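As general background on why PPV can differ between groups even when sensitivity and specificity are similar, the following Python sketch computes PPV from sensitivity, specificity and the proportion of true cases in the tested group, using Bayes' theorem. The function name and the input values are hypothetical and are not taken from this study.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """PPV via Bayes' theorem.

    prevalence is the proportion of the tested group that truly
    has the condition (here it would be the proportion with PP).
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical inputs: the same test applied to two groups that
# differ only in the proportion of true cases.
ppv_balanced = positive_predictive_value(0.9, 0.9, prevalence=0.5)
ppv_rare = positive_predictive_value(0.9, 0.9, prevalence=0.1)
print(f"PPV at 50% prevalence: {ppv_balanced:.2f}")
print(f"PPV at 10% prevalence: {ppv_rare:.2f}")
```

With these illustrative inputs, the same test yields a PPV of about 0.90 when half of the group has the condition and about 0.50 when one in ten does.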
One limitation of the first part of this study is that the data were collected retrospectively. Cases whose records could not be retrieved may have affected the results. A second limitation is that this was a single-center study. Therefore, this first scoring model must be validated in multicenter trials. Another limitation concerns borderline scores. Using this system, patients with borderline scores (a total score of 5 points in M) are considered to have PT. This may pose a diagnostic challenge. In such cases, we recommend taking advanced bone age into account.
5.1. Conclusions
The proposed diagnostic scoring system based on clinical and laboratory findings offers a standard, cost-effective and simple approach to the differential diagnosis of PP, PT and its variants. It also eliminates some disadvantages of the GnRH test and may serve as an alternative or complementary tool for use in the differential diagnosis of PP.