1. Background
The differential diagnosis of precocious puberty (PP) from premature thelarche (PT) and its variants sometimes poses a diagnostic dilemma (1). The clinical findings and tests have various drawbacks, and none of the tests alone can identify early pubertal disorders (2). The gonadotropin releasing hormone (GnRH) stimulation test is considered the gold standard. However, it is expensive, time consuming, painful and invasive, resulting in patient discomfort. Furthermore, studies have reported different specificity, sensitivity and cut-off values for the GnRH test (3, 4). Many factors, including the specificities of follicle stimulating hormone (FSH) and luteinizing hormone (LH) assays, influence GnRH test results (5). Estradiol measurements are less reliable indicators of the stage of puberty (2). In addition, estradiol levels sometimes fluctuate because of the pulsatile secretion of gonadotropin, which stimulates ovarian cysts (6). Since the pubertal process may stall or regress and be reactivated at a later point, breast development is not always a reliable indicator of pubertal status (7, 8). The differential diagnosis is particularly challenging in patients who fall into the grey zone category (9).
Pelvic ultrasound has been shown to be an efficient tool in distinguishing PP from PT, particularly when the results of the GnRH stimulation test are equivocal. A number of studies concluded that pelvic ultrasound may provide an earlier clinical indication of PP than pubertal GnRH test results (10, 11).
Scoring systems are commonly used to diagnose diseases or predict their progress, and they provide a simple and systematic approach for inexperienced junior clinicians (12, 13). A diagnostic scoring system could help to distinguish equivocal cases of PP and PT and provide a simpler alternative to current methods for clinicians who have to make decisions on cases of early puberty. Thus far, no scoring system for use in cases of PP and PT or its variants has been developed.
2. Objectives
The aim of the present study was to establish a new simple, fast, useful and cost-effective scoring system based on clinical and laboratory findings to distinguish PP from PT and its variants as an alternative or complementary approach to the GnRH test.
3. Methods
The first part of this study was a retrospective case-control study to build the scoring model, and the second part was a prospective cohort study to validate it. The study included 267 girls previously diagnosed with PP or PT according to the conventional diagnostic protocol for early puberty (including the GnRH test) in 2010 - 2014. All girls with PP were diagnosed as idiopathic; girls diagnosed with secondary PP were not included in the study. The medical records of all girls were reviewed retrospectively to construct the new scoring model. The inclusion criteria comprised all girls aged < 8 years who had breast development classified as at least Tanner stage B2. This study was approved by Gaziantep University Clinical Research Ethic Committee (No. 26.5.2014/186). Informed consent was provided by all the children’s parents.
All girls had previously been diagnosed according to the following conventional criteria:
Breast development was defined by the diameter of palpable glandular tissue (diameter of at least 1 cm) when holding the breast. Growth percentile charts for Turkish children were used to calculate height and weight standard deviation scores (SDSs) (14, 15). An increase of at least 6 cm in height in the last year was accepted as an accelerated growth rate. Growth velocity could not be calculated in some girls (n = 52) due to the absence of growth records within the previous six months. In these cases, height at or above the 90th percentile was used instead of growth velocity (16). LH levels < 0.3 IU/L were considered prepubertal, as described by Neely et al., and levels > 0.3 IU/L were considered pubertal (17).
The following criteria were previously considered to denote conventional pubertal findings in practice: a) growth velocity of at least 6 cm in the last year or height at or above the 90th percentile (if there were no growth records); b) at least B2 on the Tanner stage of breast development; c) bone age/chronological age (BA/CA) ratio ≥ 1; d) mean ovarian volume of at least 1 cm3; e) uterine length of at least 35 mm; and f) baseline LH (bLH) > 0.3 IU/L, peak LH (pLH) > 5 IU/L, stimulated LH/FSH (sLH/FSH) ratio > 0.3 or estradiol > 12 pg/ml (17-20). The patients were classified into two groups: PP and PT. PT referred to the isolated appearance of breast budding, without any pubertal finding, before the age of 8 years. PP referred to breast development in girls younger than 8 years, together with the presence of any pubertal finding, including gonadotropin activation and bLH > 0.3 IU/L, pLH > 5 IU/L or sLH/FSH ratio > 0.3.
Breast development stage was assessed according to the Tanner stage. Height was measured using a wall-mounted stadiometer (Harpenden, Holtain, UK). The BMI SDSs were calculated based on available data on Turkish children (21). Bone age (BA) was determined radiologically according to the Greulich and Pyle method.
The GnRH stimulation test was performed for diagnosis in only 210 girls (PP, n = 137; PT, n = 73). LH and FSH assays were performed at baseline and 30 and 60 min after intravenous administration of a standard dose (100 μg/m2) of GnRH (LH-RH Ferring, Ferring, Switzerland). This test was not performed in cases of bLH > 1 IU/L (n = 37), because this level is accepted as diagnostic of PP. The GnRH test was performed in all cases in whom any pubertal finding was present. The GnRH test was performed for differential diagnosis in all 73 girls in the PT group. In the cohort group (n = 86), the test was performed in just 17 girls.
All blood samples were drawn at 09:00. LH, FSH and estradiol levels were measured in all the girls. LH and FSH levels were determined using an electrochemiluminescence immunoassay, with a Cobas® 6000 (Roche Diagnostics, Mannheim, Germany) analyzer. The sensitivity of the analyzer was 0.1 IU/L for both LH and FSH.
Pelvic ultrasound scans were performed in all cases using a linear VF13-5 (13.5 MHz) transducer and a Siemens Sonoline Antares ultrasound machine (Siemens Medical Solutions USA Inc., Malvern, PA). The same radiologist (A.O.) performed all the scans. The following parameters were measured: a) uterine length, transverse diameter (width) and fundal anteroposterior diameter and b) ovarian height, width and length. The uterine and ovarian volumes were calculated using the formula for ellipsoid bodies (V = longitudinal diameter × transverse diameter × anteroposterior diameter × 0.5233).
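The ellipsoid formula above can be written as a short helper. This is a minimal sketch, not the authors' software: the function names are hypothetical, and it assumes the three diameters are entered in millimetres and the volume is wanted in cm3 (the text reports uterine length in mm and ovarian volume in cm3). The constant 0.5233 is taken from the text as written; it is close to π/6.

```python
ELLIPSOID_FACTOR = 0.5233  # constant as reported in the text (close to pi / 6)


def ellipsoid_volume_cm3(longitudinal_mm, transverse_mm, anteroposterior_mm):
    """Ellipsoid volume in cm3 from three orthogonal diameters given in mm."""
    volume_mm3 = longitudinal_mm * transverse_mm * anteroposterior_mm * ELLIPSOID_FACTOR
    return volume_mm3 / 1000.0  # 1 cm3 = 1000 mm3


def mean_ovarian_volume_cm3(right_diameters_mm, left_diameters_mm):
    """Mean of the two ovarian volumes; each argument is a (length, width, height) tuple in mm."""
    right = ellipsoid_volume_cm3(*right_diameters_mm)
    left = ellipsoid_volume_cm3(*left_diameters_mm)
    return (right + left) / 2.0
```

For example, diameters of 20, 10 and 10 mm give 20 × 10 × 10 × 0.5233 / 1000 = 1.0466 cm3.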
All the girls diagnosed with PP underwent cranial MRI. Any cranial pathology (e.g. hamartomas, adenomas or hydrocephalus) that could be a potential cause of PP was recorded. The shape and any heterogeneous opacification of the pituitary gland were noted.
The predictive variables used in the differential diagnosis were as follows: age at diagnosis, BA, growth velocity or height, weight, body mass index (BMI), bLH level, stimulated LH/FSH (sLH/FSH) ratio, peak LH (pLH) level, estradiol level, DHEA-S, additional disease, cranial/pituitary pathology, uterine length and ovarian volume.
3.1. Statistical Analysis
The normality of the distribution of continuous variables was tested with the Shapiro-Wilk test. The Student’s t-test (for normally distributed data) and the Mann-Whitney U test (for non-normally distributed data) were used to compare the two groups. The following candidate predictive variables were tested in univariate analyses: age at diagnosis, BA/CA ratio, estradiol level, uterine length, ovarian volume and pLH level. Those found to be statistically significant were entered into the multivariate analysis. Variables with a sensitivity or specificity higher than 70% were selected for inclusion in the scoring model. Basal LH was not included in the scoring system because it was generally at a prepubertal level. Cut-off values for all the predictive variables were determined by receiver operating characteristic (ROC) curve analysis. A logistic regression model was used to calculate beta coefficients for each variable included in the scoring model. Multicollinearity was checked by calculating variance inflation factors. Finally, the cut-off value for the total score was determined by ROC curve analysis. The maximum total score was 12 points. A score of 5 points or above was accepted as a diagnosis of PP. This model was prospectively applied to a second cohort group in 2014 - 2015.
All univariate analyses were performed in SPSS for Windows, version 22 (IBM). A two-sided P value < 0.05 was defined as statistically significant.
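The text does not state which criterion was used to pick a cut-off from each ROC curve. As an illustration only, the sketch below assumes the commonly used Youden index (sensitivity + specificity − 1) and classifies a value as positive when it exceeds the threshold. The function name and the sample values are hypothetical toy data, not study measurements.

```python
def youden_cutoff(positive_values, negative_values):
    """Return (threshold, sensitivity, specificity) maximising the Youden index.

    A measurement is called positive when it is strictly greater than the threshold.
    Assumption: the Youden index is the selection criterion; the source does not say.
    """
    best = None
    for threshold in sorted(set(positive_values) | set(negative_values)):
        sensitivity = sum(v > threshold for v in positive_values) / len(positive_values)
        specificity = sum(v <= threshold for v in negative_values) / len(negative_values)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, threshold, sensitivity, specificity)
    return best[1], best[2], best[3]


# Toy values for illustration only (not data from the study).
toy_positive = [45, 43, 46, 40, 48, 44]
toy_negative = [38, 41, 39, 37, 36, 42]
toy_threshold, toy_sensitivity, toy_specificity = youden_cutoff(toy_positive, toy_negative)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, may give a different cut-off for the same data.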
4. Results
In this study, 164 (61.5%) girls were diagnosed with PP and 103 (38.5%) girls were diagnosed with PT according to the conventional diagnostic procedure, retrospectively. The mean age at onset in the PP and PT groups was 7.21 ± 1.36 and 5.09 ± 2.64 years, respectively (Table 1). One hundred and seventy-four (66.3%) girls were in the grey zone. The mean follow-up time before diagnosis was 7.8 ± 0.07 months. According to Tanner stage, 134 girls in the PP group had stage B2, 25 had B3 and 5 had B4. Of the PT girls, 92 had B2 and 11 had B3. Seventy-one girls (PP, n = 66; PT, n = 5) showed changes in the pituitary gland on MRI that could not be well defined. The following variables were statistically significant and were used in the scoring model: age at diagnosis (years), BA/CA ratio, estradiol (pg/ml), uterine length (mm) and ovarian volume (cm3) (P = 0.001 for each variable). The cut-off values of the variables were determined as follows: onset age, 6.5 years; BA/CA, 1.1; estradiol, 12 pg/ml; uterine length, 32 mm; and ovarian volume, 1.09 cm3 (Table 2). Beta coefficients for each variable included in the scoring model are given in Table 3. The maximum total score obtained from the model was 12 points. The points assigned to each parameter were as follows: onset age, 3; BA/CA, 2; estradiol, 3.5; uterine length, 2; and ovarian volume, 1.5. Compared with the conventional approach (including the GnRH test), the sensitivity and specificity of the model (M) were 89.6% and 87.4%, respectively (Figure 1). The accuracy rate of M was 89.8%. The model was applied to a cohort group of 86 girls with early pubertal signs. Table 4 shows the diagnostic test results and accuracy rates of M in the study and cohort groups.
Mean Values of the Diagnostic Variables in the PP and PT Groups
Variables | Cut-off | Sensitivity, % | Specificity, % |
---|---|---|---|
Age at diagnosis, year | > 6.5 | 86.6 | 57.3 |
BA/CA, year | > 1.1 | 70.7 | 86.4 |
Estradiol, pg/ml | > 12 | 53 | 94.2 |
Uterine length, mm | > 32 | 80.5 | 83.5 |
Ovarian volume, cm3 | > 1.09 | 76.8 | 73.8 |
Cut Off Values, Specificity and Sensitivity of the Different Variables Included in the Scoring System for PP
Model: Nagelkerke’s R2 = 0.77 (sensitivity, 89.6%; specificity, 87.4%)
Variables | β | Rounded Score | Adjusted OR [95% CI] | P |
---|---|---|---|---|
Age at diagnosis | 3.02 | 3 | 20.4 [5.87 - 70.90] | 0.001 |
BA/CA | 2.15 | 2 | 8.61 [3.34 - 22.22] | 0.001 |
Estradiol | 3.63 | 3.5 | 37.64 [9.33 - 151.9] | 0.001 |
Uterine length | 1.89 | 2 | 6.65 [2.75 - 16.12] | 0.001 |
Ovarian volume | 1.54 | 1.5 | 4.64 [1.89 - 11.35] | 0.001 |
Total | | 12 | | |
Rounded Scores for the Variables Used in the Scoring Model
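The scoring rule can be sketched in a few lines using the cut-offs from Table 2 and the rounded scores from Table 3. This is an illustrative sketch rather than the authors' implementation; the dictionary keys and function names are hypothetical. The text gives the total-score threshold both as "5 or above" and as "> 5", so the comparison is exposed as a parameter; the default follows the Methods wording and is an assumption.

```python
# variable name: (cut-off from Table 2, points from Table 3)
SCORING_RULES = {
    "age_years": (6.5, 3.0),
    "ba_ca_ratio": (1.1, 2.0),
    "estradiol_pg_ml": (12.0, 3.5),
    "uterine_length_mm": (32.0, 2.0),
    "ovarian_volume_cm3": (1.09, 1.5),
}


def total_score(findings):
    """Sum the points of every variable whose value exceeds its cut-off."""
    return sum(
        points
        for name, (cutoff, points) in SCORING_RULES.items()
        if findings[name] > cutoff
    )


def classify(score, cutoff=5.0, inclusive=True):
    """Return "PP" or "PT"; inclusive=True treats a score equal to the cut-off as PP (assumption)."""
    is_pp = score >= cutoff if inclusive else score > cutoff
    return "PP" if is_pp else "PT"


# Hypothetical patient, for illustration only.
example = {
    "age_years": 7.2,
    "ba_ca_ratio": 1.15,
    "estradiol_pg_ml": 8.0,
    "uterine_length_mm": 30.0,
    "ovarian_volume_cm3": 1.0,
}
example_score = total_score(example)  # age (3) + BA/CA (2) = 5.0
```

In this example only age and BA/CA exceed their cut-offs, so the total is exactly 5 points and the classification depends on how the borderline is handled.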
Group | Study Group (n = 267) (PP, n = 164; PT, n = 103) | Cohort Group (n = 86) (PP, n = 10; PT, n = 76) |
---|---|---|
Model cut-off | M Total score >5 (%) | M Total score >5 (%) |
Sensitivity | 89.6 | 90 |
Specificity | 87.4 | 89.4 |
PPV | 91.8 | 53 |
NPV | 84.1 | 98.5 |
Accuracy rate | 89.8 | 90.5 |
Sensitivity, Specificity and Accuracy Rates for PP Diagnosis in the Study and Cohort Groups
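The PPV and NPV in Table 4 depend on the proportion of PP cases in each group as well as on sensitivity and specificity. The sketch below applies the standard Bayes relation to the reported sensitivity, specificity and group sizes; the function name is hypothetical, and the inputs are the rounded percentages from the table, so the results agree with the table only up to rounding.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a group with the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Study group: 164 PP among 267 girls; cohort group: 10 PP among 86 girls (Table 4).
study_ppv, study_npv = predictive_values(0.896, 0.874, 164 / 267)
cohort_ppv, cohort_npv = predictive_values(0.90, 0.894, 10 / 86)
```

With nearly identical sensitivity and specificity, the PPV comes out at about 0.92 in the study group and about 0.53 in the cohort group, where PP cases make up roughly 12% of the girls.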
5. Discussion
Distinguishing precocious puberty from premature thelarche sometimes poses a diagnostic dilemma. Pubertal signs that are inconsistent with laboratory findings and the multifactorial nature of pubertal onset can confuse clinicians during the decision-making process. No single diagnostic tool allows a definitive diagnosis on its own. Although the GnRH test is considered the gold standard for diagnosis, it has considerable drawbacks (22, 23). It is a time-consuming, painful and invasive procedure and causes injection anxiety in children. Moreover, its variable sensitivity, specificity and cut-off values limit its diagnostic value (24).
Scoring models provide a simple and useful approach in diagnostically equivocal conditions. They are also used to predict prognosis in many diseases (25) and can thus guide the therapeutic process effectively. To date, no scoring model has been reported in the literature for the differential diagnosis of precocious puberty.
We aimed to establish a new scoring model that distinguishes PP from PT as a complementary or alternative diagnostic approach to the GnRH test. In this study, we found that the newly developed scoring system was a reliable method for the differential diagnosis of PP and PT without the GnRH test.
In our study, we retrospectively enrolled 164 (61.5%) girls who were diagnosed with PP and 103 (38.5%) girls who were diagnosed with PT according to conventional diagnostic procedures, including the GnRH test. Age at presentation of pubertal signs is very important in distinguishing between benign early pubertal conditions and true PP (26). Since age at onset of pubertal signs had high sensitivity in our study, we included it in the scoring model. The mean age at onset in the PP and PT groups was 7.21 ± 1.36 and 5.09 ± 2.64 years, respectively (Table 1). One hundred and seventy-four (66.3%) girls were in the grey zone. The mean age of our cases was slightly higher than that in similar reports (26, 27). Most of our cases were in the grey zone, which includes ages of 7 - 8 years. We interpreted this to mean that these girls presented late after the onset of early pubertal signs; therefore, we could not follow the growth rate of most cases.
Increased uterine length, increased ovarian volume and advanced bone age usually reflect exposure to estrogenic effects due to activation of the hypothalamo-hypophyseal-gonadal axis or excessive peripheral estrogen production. These findings are reliable evidence of puberty. Many reports have shown that both increased uterine length and increased ovarian volume can be used to distinguish PP from PT and its variants (28). Although some studies also measured other parameters, such as the shape, thickness and volume of the uterus, uterine length was used as the diagnostic parameter in the present study because it is easier to measure (29). We found a cut-off value of 32 mm for uterine length (sensitivity, 80.5%; specificity, 83.5%). We measured ovarian volume and used the mean volume of the two ovaries. The cut-off value for ovarian volume was 1.09 cm3 (sensitivity, 76.8%; specificity, 73.8%). Different values for uterine length and ovarian volume as pubertal signs have been reported in the literature (30, 31). These differences may result from differences in onset age, duration and stage of puberty.
Advanced bone age helps to make the diagnosis and predict the prognosis in precocious puberty. It also plays a role in treatment decisions. Moreover, one study demonstrated that advanced bone age is the most effective predictor of the GnRH test result (32). This indicates that advanced bone age can be used as an alternative diagnostic tool to the GnRH test. We found a cut-off value of 1.1 for the ratio of bone age to chronological age (sensitivity, 70.7%; specificity, 86.4%). This value is consistent with similar studies (33).
In the present study, the mean estradiol level (pg/ml) in PP (17.4 ± 5.54) was higher than in PT (5.99 ± 3.6) (P = 0.001). The cut-off value used for the scoring model was 12 pg/ml. In our scoring model, estradiol had the highest score (3.5 points). This result suggests that the level of estrogenic exposure is important and plays a role in the development of pubertal changes (34).
Our scoring model is the first reported for the differential diagnosis of precocious puberty. Therefore, we could not compare it with similar studies. Instead, we compared its diagnostic value with that of other diagnostic or prognostic scoring models. There are many clinical scoring models (35, 36). Compared with previous scoring models, such as the scoring system to distinguish uncomplicated from complicated acute appendicitis, our model has a similar diagnostic value (37). Scoring models can be created by combining many variables (25). The following variables were statistically significant and were used in the scoring model: age at diagnosis (years), BA/CA ratio, estradiol (pg/ml), uterine length (mm) and ovarian volume (cm3) (P = 0.001 for each variable). We chose diagnostic variables that were both statistically significant and, with the exception of the estradiol assay, non-invasive. The sensitivity and specificity of our scoring model were 89.6% and 87.4%, and its accuracy rate was 89.8%. According to previous research, the sensitivity and specificity of the GnRH test were 74 - 100% using a cut-off pLH level of 5 IU/L (4). In our cases, the cut-off value for pLH was 4.37 IU/L, and the sensitivity and specificity of pLH were 79.6% and 74%, respectively (Table 1). In this study, the sensitivity and specificity of the model were higher than those of the GnRH test. Thus, this new scoring system, which does not rely on the GnRH test, had high sensitivity, specificity and accuracy rates. We believe that this system could be a complementary diagnostic tool or an alternative to the GnRH test in cases of diagnostic difficulty. Although the new scoring system also requires a blood test, it does not have the disadvantages of the GnRH test, which is time consuming, expensive and uncomfortable.
Although the specificity of growth velocity was high (90%), its sensitivity was low (37%) for PP (2). In our study, most of the girls referred with early pubertal signs were in the grey zone. Diagnostic challenges are most common in this period. Moreover, we did not have enough time to follow the patients’ growth velocities because of health insurance payment regulations. As the scoring system does not include growth velocity, it can also be applied as a diagnostic tool in girls for whom growth history data are unavailable. In addition, it can be a useful alternative in patients in whom the GnRH test cannot be performed in practice. Because the scoring system is based mostly on clinical findings, it provides a faster, non-invasive and more cost-effective approach to diagnosis than the GnRH test.
The first part of our study was retrospective, and we selected conventional diagnostic variables. The accuracy of the scoring system could be increased by including other significant findings, such as the results of pituitary gland MRI. We suggest that country-specific scoring systems need to be developed. Our study is the first to develop a scoring system for PP. The findings could not be compared with those in the literature due to the absence of similar studies. However, our results were compatible with findings reported in studies of scoring systems for other diseases (12, 13).
We applied the constructed model to a second cohort group, which consisted of girls who were referred with early pubertal signs. The sensitivity and specificity of M in this cohort group were 90% and 89.4%, respectively, and its PPV was 53%. In the cohort group, PPV was not as high as in the study group. We attributed this finding to the small size of the study population (PP n = 10, PT n = 76). The GnRH test was performed in all the girls in the cohort group.
One limitation of this study is that the data for the first part were collected retrospectively. Cases whose records could not be retrieved may have affected the results. A second limitation is that it was a single-center study. Therefore, this first scoring model must be validated in multicenter trials. Another limitation is borderline scores. Using this system, patients with borderline scores (total score of 5 points in M) are considered to have PT. This may pose a diagnostic challenge. In such cases, we recommend taking advanced bone age into account.
5.1. Conclusions
The proposed diagnostic scoring system based on clinical and laboratory findings offers a standard, cost-effective and simple approach to the differential diagnosis of PP, PT and its variants. It also eliminates some disadvantages of the GnRH test and may serve as an alternative or complementary tool for use in the differential diagnosis of PP.