In this study, we used a novel and appropriate method to estimate the sensitivity and specificity of noninvasive fibrosis tests. When a gold standard is absent, latent class models are often used, in which the unknown true disease status is treated as a latent variable (19). The latent class model with random effects (LCM-R) proved well suited to this diagnostic evaluation. Because the reference standard for liver fibrosis staging is imperfect and the limitations of liver biopsy restrict its use, a model that estimates accuracy without a reference standard was appropriate, and it fitted the distribution of results of the four noninvasive fibrosis tests. To our knowledge, this is the first work to evaluate the diagnostic value of these four common noninvasive fibrosis tests for staging HBV-related liver fibrosis without a gold standard.
Many studies on predicting significant fibrosis and cirrhosis among CHB patients have been published in the past few years (20). Noninvasive tests fall into two main types: biochemical marker models and physical measurement methods. Salkic et al. suggested that an algorithm based on routine laboratory tests was a usable, applicable and accurate tool for diagnosing CHB-related fibrosis and cirrhosis, and was suitable for resource-limited settings where more expensive modalities are unavailable (21). Noninvasive fibrosis tests are considered an appropriate substitute that overcomes the limitations of liver biopsy in assessing liver fibrosis. All noninvasive fibrosis tests were first validated against liver biopsy as the reference standard. However, sampling error and subjective bias in pathological diagnosis mean that liver biopsy is not a perfect gold standard against which to assess other tests. A biopsy specimen represents only a small part of the whole liver, so it may not reflect the severity of hepatic fibrosis and, through sampling error, may lead to underdiagnosis of cirrhosis (22). Although taking a longer biopsy specimen and using a 16-gauge needle to ensure an adequate caliber can reduce the risk of sampling error, sampling variability cannot be completely avoided (22, 23). Liver biopsy therefore cannot serve as the gold standard unless strict conditions are met. The latent class model with random effects explicitly accounts for random variability among tests. All the tests (APRI, FIB-4, GP and LSM) were initially validated using biopsy, so it was rational to use a method that considers this non-independence among tests. Moreover, the latent class model without random effects did not fit the observed distribution of test results, which further supports the use of LCM-R.
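For orientation, one common way to write a two-class latent class model and its random-effects extension is sketched below. The notation is assumed for illustration only ($\pi$ for prevalence, $p_{jd}$ for the probability that test $j$ is positive given latent status $d$, $a_{jd}$ and $b_{jd}$ for probit intercepts and loadings, $t$ for a subject-level Gaussian random effect); it is not taken from the model specification used in this study, which may be parameterized differently.

```latex
% Conditional-independence LCM for J binary tests, latent status D in {0, 1}
P(Y_1 = y_1, \dots, Y_J = y_J)
  = \sum_{d=0}^{1} P(D = d) \prod_{j=1}^{J} p_{jd}^{\,y_j} (1 - p_{jd})^{1 - y_j},
  \qquad P(D = 1) = \pi, \quad p_{jd} = P(Y_j = 1 \mid D = d)

% Random-effects extension: tests are independent only given D and t
P(Y_j = 1 \mid D = d,\, t) = \Phi(a_{jd} + b_{jd}\, t), \qquad t \sim N(0, 1)

% Marginal sensitivity and specificity after integrating out t
\mathrm{Se}_j = \Phi\!\left( \frac{a_{j1}}{\sqrt{1 + b_{j1}^{2}}} \right),
\qquad
\mathrm{Sp}_j = 1 - \Phi\!\left( \frac{a_{j0}}{\sqrt{1 + b_{j0}^{2}}} \right)
```

Setting every $b_{jd} = 0$ recovers the model without random effects, which is why a poor fit of that simpler model points to residual dependence among the tests.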
In the LCM-R estimates for the noninvasive fibrosis tests, FIB-4 had high specificity and sensitivity (> 90%) for the diagnosis of significant fibrosis. Overall performance was assessed by the Youden index, with higher values indicating better performance; by this measure, FIB-4 was the best test for diagnosing significant fibrosis. Although FIB-4 was initially applied to predict significant fibrosis in patients with HIV/HCV coinfection, its use has been extended to CHB patients (24). In previous studies, FIB-4 was recommended only for the diagnosis of mild liver fibrosis, and its cut-off for cirrhosis remained controversial (25, 26). In this study, we found that FIB-4 had excellent diagnostic value for significant fibrosis in CHB patients, whereas its ability to detect cirrhosis was poor. Based on previous studies and a meta-analysis, we set the cut-off for cirrhosis at 3.6; the corresponding specificity was nearly 90%, but the sensitivity was less than 20%. In addition, hepatic fibrosis may be a risk factor for HCC. A Korean study found that a high FIB-4 was highly predictive of HCC incidence in CHB carriers (27), which is consistent with its high value for assessing hepatic fibrosis in our study.
APRI and GP both performed well for the diagnosis of significant fibrosis (sensitivity > 90%, specificity > 70%). GP is a new biochemical marker model for HBV-related liver fibrosis; in the first report, its sensitivity and specificity were 72.4% and 69.6% for minimal fibrosis and 72.7% and 84.5% for cirrhosis (15). However, no further rigorous clinical studies of GP were available to confirm these results. We included GP both to re-evaluate this new method and to serve as a comparator for the other tests. Like FIB-4, APRI is widely used and is calculated from routinely measured parameters. The accuracy of the two tests for significant fibrosis in CHB patients has been compared in a meta-analysis: the sensitivity and specificity of FIB-4 were 65.4% and 73.6%, while those of APRI were 70.0% and 60.0%, respectively (28). Although our estimates differed somewhat from those of that meta-analysis, both FIB-4 and APRI were recommended there for the diagnosis of significant fibrosis in CHB patients, with moderate accuracy.
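As an illustration of how little input these two scores need, the sketch below computes APRI and FIB-4 from their widely published formulas. The function names, argument names and the example patient are hypothetical; the FIB-4 cirrhosis cut-off of 3.6 is the value quoted in this discussion, and the other cut-offs used in the study are not reproduced here.

```python
import math


def apri(ast_u_l, ast_upper_limit_u_l, platelets_1e9_l):
    """AST-to-platelet ratio index: (AST / upper limit of normal) / platelets * 100."""
    return (ast_u_l / ast_upper_limit_u_l) / platelets_1e9_l * 100.0


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4: (age * AST) / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def is_positive(score, cutoff):
    """Dichotomize a score at a chosen cut-off (strictly above counts as positive)."""
    return score > cutoff


# Hypothetical patient, for illustration only (not study data).
example_apri = apri(ast_u_l=80, ast_upper_limit_u_l=40, platelets_1e9_l=100)
example_fib4 = fib4(age_years=50, ast_u_l=80, alt_u_l=64, platelets_1e9_l=100)

FIB4_CIRRHOSIS_CUTOFF = 3.6  # cut-off for cirrhosis quoted in the text
print(example_apri, example_fib4, is_positive(example_fib4, FIB4_CIRRHOSIS_CUTOFF))
```

Both scores depend on AST and platelet count, which is one reason their results may not be independent of each other.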
Compared with the tests above, LSM performed unsatisfactorily for the diagnosis of significant fibrosis, with lower specificity (75.11%) and sensitivity (66.01%). For the diagnosis of cirrhosis, the performance of all tests declined, especially sensitivity (< 40%), whereas specificity remained relatively high (> 75%). The Youden index of every test was too small to indicate adequate value for diagnosing cirrhosis. Even so, LSM gave the most balanced performance for cirrhosis, with the highest sensitivity (37.03%), good specificity (78.64%) and the largest Youden index. LSM was also first proposed for predicting hepatic fibrosis in patients with HCV (29). Subsequent studies confirmed that it is reliable for detecting significant fibrosis or cirrhosis in HBV patients, with cut-off values only slightly different from those observed in HCV patients (30, 31). Consistent with existing research, our estimates also suggested that the diagnostic accuracy of LSM was relatively high for cirrhosis but relatively poor for significant fibrosis (32).
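The Youden index used to rank the tests is simply sensitivity plus specificity minus one. A minimal sketch follows, applied to the LSM estimates quoted in this discussion; the function name is hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1, with inputs as fractions in [0, 1]."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must be fractions between 0 and 1")
    return sensitivity + specificity - 1.0


# LSM estimates quoted in the text, converted from percentages to fractions.
lsm_significant_fibrosis = youden_index(0.6601, 0.7511)
lsm_cirrhosis = youden_index(0.3703, 0.7864)

print(round(lsm_significant_fibrosis, 4), round(lsm_cirrhosis, 4))
```

A value near 0 means the test is little better than chance, and a value near 1 means it is close to perfect, so the much smaller value for cirrhosis matches the weak performance described above.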
Compared with their performance in previous studies and meta-analyses that used biopsy as the gold standard, APRI, FIB-4, GP and LSM performed better for the diagnosis of significant fibrosis, but worse for the diagnosis of cirrhosis, when estimated by the LCM-R (latent class model with a random factor) (14, 15, 20, 24, 26, 33). The LCM without random effects did not fit the observed distribution for either significant fibrosis or cirrhosis (P value of L² less than 0.05; Table 5), which suggested a random effect due to dependency among the tests (as expected, given that APRI, FIB-4, GP and LSM were all previously validated against biopsy). The relative performance of APRI, FIB-4, GP and LSM as assessed by the LCM-R model should be helpful in the absence of a gold standard.
5.1. Impaired Sources of Major Variability Among Tests
To assess how well the LCM performed, we considered the random effect arising from the initial dependency among the noninvasive fibrosis tests and examined their pairwise residuals. As noted above, the bivariate residuals of LSM-APRI and LSM-FIB-4 were the main sources of impaired fit in the model for significant fibrosis. A plausible explanation is that necrosis and inflammation increase LSM independently of fibrosis stage (34), and that ALT increases LSM linearly in chronic hepatitis B patients at any fibrosis stage (35). Although patients with markedly elevated aminotransferases were excluded from this study, occult necrosis, inflammation and steatosis in the liver still need to be considered. The LSM-GP pair had the largest residual in the model for cirrhosis. PLT is usually markedly affected only in late cirrhosis with hypersplenism, which would make GP unable to identify early cirrhosis.
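The idea behind a bivariate residual can be sketched as follows: for one pair of tests, compare the observed 2 x 2 table of joint results with the table expected under a two-class model that assumes the tests are independent given the latent status. This is a simplified Pearson-type illustration, not the exact statistic reported by the software used in this study; all function names and numbers below are hypothetical.

```python
def expected_pair_table(prevalence, se1, sp1, se2, sp2, n):
    """Expected joint counts for two tests under conditional independence.

    Keys are (result_test1, result_test2), with 1 = positive and 0 = negative.
    """
    table = {}
    for r1 in (1, 0):
        for r2 in (1, 0):
            # Probability of this result pair among diseased subjects.
            p_diseased = (se1 if r1 else 1 - se1) * (se2 if r2 else 1 - se2)
            # Probability of this result pair among non-diseased subjects.
            p_healthy = ((1 - sp1) if r1 else sp1) * ((1 - sp2) if r2 else sp2)
            table[(r1, r2)] = n * (prevalence * p_diseased + (1 - prevalence) * p_healthy)
    return table


def pearson_pair_residual(observed, expected):
    """Pearson chi-square discrepancy between observed and expected pair counts."""
    return sum((observed[cell] - expected[cell]) ** 2 / expected[cell] for cell in expected)


# Hypothetical values for illustration only (not study data).
expected = expected_pair_table(prevalence=0.5, se1=0.8, sp1=0.8, se2=0.8, sp2=0.8, n=100)
observed = {(1, 1): 40, (1, 0): 10, (0, 1): 10, (0, 0): 40}

print(round(pearson_pair_residual(observed, expected), 2))
```

A large value for a pair means the two tests agree or disagree more often than the fitted model predicts, which is the kind of residual dependence a random effect is meant to absorb.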
5.2. Limitations
As a diagnostic evaluation, this study could not report the AUROC of each test, because LCM-R provides only an estimate of test performance; this was the main limitation of the method. No confidence intervals were available for the LCM-derived sensitivity and specificity estimates, because these estimates were calculated from combinations of conditional probabilities, each of which had its own maximum-likelihood estimate and estimated standard error. During the analysis, the bivariate residuals of LSM-APRI, LSM-FIB-4 and LSM-GP impaired the fit of the models; more studies are therefore needed to identify the causes of the high discordance rates between these pairs, including their intra- and inter-observer variability. The cut-offs of all the tests have been controversial and still differ among studies. Many more clinical studies are needed to establish accurate cut-offs and improve diagnostic efficacy. In this study, we used the WHO HBV guideline and a rigorous meta-analysis to define the cut-offs for the tests. Liver biopsy was not performed in this study, so we could not compare the conventional analysis, with liver biopsy as the gold standard, against the model-based estimates from LCM-R. Because this study estimated test performance from a model, the reported values cannot be taken as the true performance; nevertheless, we believe they approximate it.
5.3. Conclusions
In this model without a gold standard, the high specificity and sensitivity (> 90%) of FIB-4 for the diagnosis of significant fibrosis were confirmed. APRI also had good, though lower, diagnostic accuracy (sensitivity > 90%, specificity > 70%) for significant fibrosis. LSM showed the best diagnostic value for cirrhosis, with the highest sensitivity (37.03%) and good specificity (78.64%).
By estimating the performance of these four noninvasive fibrosis tests with the LCM-R model, we obtained the diagnostic performance and relative strengths of each test. Depending on these diagnostic values, the best single test or a combination of tests can be selected to achieve the highest accuracy in clinical application.