According to this study, inter-observer agreement on breast density type is good (substantial agreement), though not perfect. Previous studies that assessed inter-observer variability showed either moderate agreement, such as the studies by Berg et al. (Kappa=0.43) (21) and Ciatto et al. (Kappa=0.54) (22), or higher agreement, such as the study by Ooms et al. (Kappa=0.77) (23). The agreement in our study (Kappa=0.74) is comparable with that of the last-mentioned study. This improvement could be due to more education, as was also mentioned by Ooms et al. (
23). Furthermore, D’Orsi et al. recommended modifying the percentage ranges that define some density types; for example, almost entirely fatty tissue would be up to 10% density instead of 0-25%, and scattered fibroglandular densities would then range from 11-50% instead of 25-50% (18), so the tissue type discrepancy in this study could be justifiable.

Double reading of 642 mammograms in this study resulted in only one additional detected malignancy, a 0.2% increase in the cancer detection rate (CDR), which is significantly lower than the improvement in the cancer detection rate reported in previous studies (
7, 12, 24-27). In addition, the readers’ agreement in this study on the final report (positive or negative) is lower than in the comparative study by Duijm (20). This may be due to the limited number of cases and readers; a larger number of cases and readers might yield different results. On the other hand, Beam et al. argued that the expected gain in the true-positive rate (TPR) in double reading studies depends on the experience of the radiologists (
28). Greater improvement in TPR may be achieved when the second reading is performed by more experienced radiologists, so another reason for the lack of significant improvement in CDR in this study may be the similarity of the readers’ experience.

The most common types of significant discrepancy in BI-RADS categories were category 0 versus 1 (64%) and 0 versus 2 (23%), which were mostly related to focal asymmetric densities mentioned by only one of the readers. It is important to note that, on follow-up, all of these focal asymmetries proved to be nonspecific or benign findings, and none was related to significant or malignant pathology. The recall rate of both readers in this study was significantly higher than the suitable or target recall rate (
19), and this could be another reason for the smaller improvement in CDR by double reading in this study. A higher recall rate is associated with more false-positive results, more anxiety and higher cost. Although the higher recall rate in this study could be partly due to the mix of diagnostic and screening examinations rather than pure screening, it is still higher than the optimal recall rate, and one of the most important aspects of this study could be the idea of lowering the recall rate by double reading in our practice. Based on this idea, a patient would be recalled only when both readers agree, and we may therefore expect a lower recall rate and less related anxiety and cost for patients. The additional cancer detected by double reading in this study was related to architectural distortion, which was detected by only one of the radiologists; this is similar to a previous study conducted by Cornford et al. (
25).

Finally, further studies with more readers and more cases, restricted to pure screening mammograms, are recommended. In addition, further studies are necessary to evaluate the recall rate in Iran; if the recall rate is higher than optimal (as expected), lowering it might be a more important consequence of double reading in our practice. This study shows no significant improvement in the cancer detection rate by double reading; however, a lower recall rate could be a more helpful consequence.
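
The quantities discussed above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study. The function names and the small reader arrays are hypothetical, chosen only to show how Cohen's kappa, the CDR gain from one additional cancer in 642 examinations, and the recall rate under an "either reader" versus a "both readers agree" policy might be computed.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers rating the same cases (nominal categories)."""
    n = len(reader_a)
    # Observed proportion of cases on which the two readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's category frequencies.
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


def recall_rates(recall_a, recall_b):
    """Recall rate if either reader recalls vs. only if both readers recall."""
    n = len(recall_a)
    either = sum(a or b for a, b in zip(recall_a, recall_b)) / n
    both = sum(a and b for a, b in zip(recall_a, recall_b)) / n
    return either, both


# CDR gain reported in the text: one additional cancer in 642 double-read mammograms.
cdr_gain_pct = 100 * 1 / 642

# Hypothetical density ratings (illustrative data only, not from the study).
density_a = ["a", "b", "b", "c", "d", "c"]
density_b = ["a", "b", "c", "c", "d", "c"]
kappa = cohen_kappa(density_a, density_b)

# Hypothetical recall decisions for ten cases (illustrative data only).
recall_a = [True, False, True, False, False, True, False, False, False, False]
recall_b = [True, False, False, False, True, True, False, False, False, False]
either_rate, both_rate = recall_rates(recall_a, recall_b)
```

Under the consensus policy the recall rate can never exceed the "either reader" rate, which is the effect proposed in the text. The sketch does not model the accompanying risk that a lesion noted by only one reader, such as the architectural distortion case above, would not be recalled.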