Introduction: Cancer registries are an important component of health information systems in developing countries. Continuous monitoring of data quality can play a crucial role in cancer control. This study aimed to assess the quality of cancer registry data in terms of completeness of coverage and validity.
Methods: Data were collected from three main sources, namely the pathology registry, hospital registries, and the national death registry, in five provinces of Iran from March 2008 to March 2011. We used the two-source capture-recapture method to estimate cancer registry coverage. The measures of validity were the percentage of death certificate only cases (DCO%), the percentage of histologically verified cases (MV%), childhood cancer incidence by sex and age group, the percentage of cancers in the elderly (80 years or above), and the mortality-to-incidence ratio (M:I). We compared these measures with international standards.
Results: Overall completeness was estimated at 54.2%, and under-reporting of stomach cancer was 32.4% over the three-year period (2008 - 2010). MV% and the percentage of tumors with unknown primary site were 68.7% and 5%, respectively. The mortality-to-incidence ratio for men and women was 37.6% and 28.2%, respectively, and the percentage of cancers in the elderly was 10.9% in 2010. The age-specific rates in girls aged 5 - 9 years and boys aged 10 - 14 years were lower than the minimum of the recommended international standards.
Conclusions: The results of this study showed that the data quality of the cancer registry is relatively low in terms of completeness and validity. Cancer registries should pay close attention to the quality of their data. In addition to technical measures in data processing, continuous evaluation of data quality is essential to achieving the set goals.
Cancer is a non-communicable disease that imposes a significant burden on society (1). Given population growth and ageing, the global burden is expected to grow to 21.7 million cancer cases and 13 million cancer deaths by 2030 (2).
According to IARC estimates for 2012, about 53,000 people in Iran died of cancer (3). Stomach cancer was the leading cause of cancer death in both sexes (4). Against this background, prevention and control of cancer through national cancer registry programs is a health priority. A cancer registry encompasses the regular collection and analysis of data on cancer patients and the estimation of cancer incidence in different geographical regions (5). With accurate and inclusive data, a cancer registry could help reduce the burden of this disease (6). In the national registration system of Iran, the Center of Disease Control is responsible for the collection and analysis of cancer data. This center receives data on cancer patients from urban health centers in different cities and provinces. Classification and coding follow the International Classification of Diseases for Oncology, third edition (ICD-O-3) (7). Although cancer registration was started in 1999 by the Iranian Ministry of Health and Medical Education at the Center of Disease Control, it failed to cover all cancer data from pathology laboratories and other departments. As a result, the initial reports could not provide an accurate estimate of cancer incidence, since coverage was reported to be 18% (8). Nevertheless, pathology-based cancer registration continued, and a coverage rate of 86% was reported for the pathology registry in 2009. Cancer pathology reports are published based on data collected from a wide range of health centers. To improve the quantity and quality of cancer registry data, a population-based cancer registry program has been run in 20 universities as an adjunct to histopathology since 2008 (9).
Independent data on cancer patients are also collected by scientific institutions, such as population-based research centers (8, 10). Cancer registry data are assessed using qualitative or quantitative approaches. Qualitative or semi-quantitative methods indicate a lack of completeness relative to other registries or over time, but they do not quantify the number of missing cases. Even when case ascertainment is complete, the value of registry data declines if the recorded information is deficient. Verification of completeness is therefore essential for estimating cancer incidence and for the related planning and policymaking. Completeness of registration and accuracy of details are the two main factors that determine data quality in a cancer registry system. Completeness of coverage is the proportion of cancer cases in a population that is included in the registry (11, 12). Validation ensures the high quality of the available data, and accuracy is the agreement between the recorded value and the true value, which is assessed quantitatively using various methods. These methods include reabstracting and recoding, use of diagnostic criteria (e.g. histological verification and death certificate only [DCO] cases), analysis of missing information, and checks of internal consistency (13).
The mentioned methods are essential to data reporting and analysis, and failure to register any of these variables increases error and reduces data accuracy. A national cancer registry is integral to monitoring cancer incidence and mortality and to developing strategies to control this disease. Since the Ministry of Health and Medical Education is the only center for the national registration of cancer incidence data in Iran, and given the importance of this issue, this study aimed to assess the quality of cancer registry data in Iran. Proper use of accurate data by health managers supports efficient care services and health promotion in the society.
2.1. Data Sources
In this cross-sectional study, reported data were collected from three main sources, namely the national cancer registry (pathology registry), hospital registries, and the national death registry, in the five provinces of Isfahan, Golestan, Semnan, Bushehr and Kermanshah in Iran from March 2008 to March 2011. Population figures by age group and sex were those reported by the Statistical Center of Iran (SCI).
2.2. Statistical Analysis
We used the two-source capture-recapture method, which is a quantitative approach to estimate the extent of cancer registry coverage (14). Over the past few decades, capture-recapture methods have been widely applied in epidemiological health promotion programs. These methods are used in various medical fields to determine hidden populations, completeness of registry coverage, and incidence or prevalence of certain diseases. Capture-recapture methods are based on four main assumptions: closed population, ability to determine the shared features of two (or more) lists, independent sources, and the homogeneity of the population (15).
In cancer registry, underestimation rate is determined via Lincoln-Petersen and Chapman methods (5), sample coverage method (16), and log-linear model. In this study, we used the Petersen-Chapman model due to the limited number of registered cancer cases in hospitals, which were merged with the data of pathology registry.
In the Petersen-Chapman model, a sample from the target population is captured, marked, released, and recaptured at a later time, and the number of cases in each sample and the number of cases common to both samples are counted. A larger number of shared cases implies a smaller estimated reference population, whereas a smaller number of shared cases implies a larger one. In this study, each data source was treated as a sample, and patients’ names were used as the unique identifier of each case. The total population was estimated based on the proportionality argument in the Petersen-Chapman model (17), where n1 is the number of cases reported in source one (pathology and hospital registries), n2 is the number of cases reported in source two (cancer death registry), and m2 is the number of cases common to both sources. To remove the bias in the Lincoln-Petersen model, the following formula was proposed by Chapman in 1951 (18) (Equation 1):
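In their standard textbook form, and using the symbols defined above, the Lincoln-Petersen estimator and Chapman's bias-corrected version can be written as follows. The symbol \hat{N} for the estimated total number of cases is notation assumed here, not taken from the text.

```latex
% Lincoln-Petersen estimator (proportionality argument)
\hat{N}_{LP} = \frac{n_1 \, n_2}{m_2}

% Chapman's bias-corrected estimator (Equation 1)
\hat{N} = \frac{(n_1 + 1)(n_2 + 1)}{m_2 + 1} - 1
```

The correction matters mainly when m_2 is small. It also keeps the estimate finite when the two sources share no cases.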
To calculate the completeness of coverage of cancer registry, the number of the cases registered was divided by the number of the cases estimated using the following formula (Equation 2):
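Written out from the verbal definition above, the completeness calculation takes the form below. The symbols n_{\text{registered}} (cases registered) and \hat{N} (cases estimated by capture-recapture) are assumed names introduced for illustration.

```latex
% Completeness of coverage (Equation 2)
\text{Completeness}\,(\%) = \frac{n_{\text{registered}}}{\hat{N}} \times 100
```

With the overall figures reported in the Results, 35,643 registered cases against 65,797 estimated cases, this gives roughly 54.2%.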
Underestimation rate was obtained by subtracting the percentage of coverage completeness from 100. Moreover, coverage completeness was determined based on the sex and age of cancer cases in three age groups of less than 40 years, 40 - 59 years, and 60 years or above.
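The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the source counts passed to the Chapman estimator are invented toy values, and only the registered and estimated totals are taken from the Results and Table 1.

```python
def chapman_estimate(n1: int, n2: int, m2: int) -> float:
    """Chapman's bias-corrected two-source estimate of the total number of cases."""
    if m2 < 0 or m2 > min(n1, n2):
        raise ValueError("m2 must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


def completeness_pct(registered: int, estimated: float) -> float:
    """Registered cases as a percentage of the estimated total."""
    return 100.0 * registered / estimated


def underestimation_pct(registered: int, estimated: float) -> float:
    """Underestimation rate: 100 minus the completeness percentage."""
    return 100.0 - completeness_pct(registered, estimated)


# Hypothetical source counts, chosen only to keep the arithmetic easy to follow.
toy_total = chapman_estimate(n1=99, n2=49, m2=24)

# Overall totals reported in the Results: 35,643 registered, 65,797 estimated.
overall_completeness = completeness_pct(35643, 65797)
overall_underestimation = underestimation_pct(35643, 65797)

print(toy_total)                          # 199.0
print(round(overall_completeness, 1))     # 54.2
print(round(overall_underestimation, 1))  # 45.8
```

The same function applied to the Table 1 row for men (19,716 reported, 35,234 estimated) gives about 56%, in line with the tabulated value.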
Measures of validity were as follows: 1) percentage of death certificate only cases (DCO%), i.e. cases for which no information other than a death certificate mentioning cancer could be obtained; 2) percentage of morphologically verified cases (MV%), i.e. cases with a cancer diagnosis based on histological examination and pathological analysis by a pathologist; 3) percentage of cancers in the childhood and the elderly (80 years or above) populations; and 4) mortality-to-incidence ratio (M:I). All these parameters are numerical indices of data validity (14).
Percentage of DCO is calculated in the following steps:
Step one: Linkage of cancer death records in a specific year to all the records available in the cancer registry database in order to identify the records that do not match.
Step two: Elimination of non-reportable cases (deaths not caused by cancer but coded as cancer death, out-of-jurisdiction residents, and cancers diagnosed before the central cancer registry reference date).
Step three: Resolving potential DCOs, where the remaining unmatched cases must be cleared according to the death clearance protocol of the central cancer registry, and cases that are not resolved at the time the DCO rate is calculated are considered as true DCO cases.
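The three steps above can be sketched as a simple filter pipeline. This is an illustrative sketch under stated assumptions: the record fields, the `person_id` linkage key, and the toy data are all hypothetical, and the denominator (all registered incident cases, DCO cases included) is the usual convention rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeathRecord:
    person_id: str                          # hypothetical linkage key
    death_caused_by_cancer: bool            # False if coded as a cancer death in error
    resident_of_registry_area: bool         # False for out-of-jurisdiction residents
    diagnosed_before_reference_date: bool   # True if diagnosed before the registry reference date
    resolved_by_death_clearance: bool       # True if follow-back found other information


def true_dco_cases(deaths, registry_ids):
    # Step one: keep death records that do not match any registry record.
    unmatched = [d for d in deaths if d.person_id not in registry_ids]
    # Step two: eliminate non-reportable cases.
    reportable = [
        d for d in unmatched
        if d.death_caused_by_cancer
        and d.resident_of_registry_area
        and not d.diagnosed_before_reference_date
    ]
    # Step three: cases still unresolved after death clearance are true DCOs.
    return [d for d in reportable if not d.resolved_by_death_clearance]


def dco_pct(n_dco: int, n_incident_cases: int) -> float:
    """DCO cases as a percentage of all registered incident cases (assumed denominator)."""
    return 100.0 * n_dco / n_incident_cases


# Toy data: one record per outcome of the three steps.
deaths = [
    DeathRecord("p1", True, True, False, False),   # already in the registry
    DeathRecord("p2", False, True, False, False),  # not actually a cancer death
    DeathRecord("p3", True, False, False, False),  # out-of-jurisdiction resident
    DeathRecord("p4", True, True, True, False),    # diagnosed before reference date
    DeathRecord("p5", True, True, False, True),    # resolved by death clearance
    DeathRecord("p6", True, True, False, False),   # remains a true DCO
]
dco = true_dco_cases(deaths, registry_ids={"p1"})
print([d.person_id for d in dco], dco_pct(len(dco), 50))  # ['p6'] 2.0
```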
Another index of data quality in this study was the proportion of cancer cases with incomplete data. To monitor missing data, the percentage of tumors with primary site unknown (PSU) was determined, and a high PSU percentage was interpreted as a deficiency of the cancer registry system. Childhood cancers were defined as those occurring at 0 - 14 years of age, and cases were categorized into three age groups of 0 - 4, 5 - 9 and 10 - 14 years. The incidence of new cases in the same period was expressed as the age-specific rate and compared with the international standards (20).
Cancer incidence and mortality rates were defined as the numbers of new cancer cases and cancer deaths within one year relative to the population at risk in the same year. To calculate these rates, data by age group and sex for the same year were obtained from the SCI database.
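As a rough illustration of these definitions, the rates and the mortality-to-incidence ratio reduce to simple ratios. The function names, the per-100,000 scaling, and the counts below are assumptions made for the example; the text does not state the scaling factor it used.

```python
def rate_per_100k(events: int, population_at_risk: int) -> float:
    """Events (new cases or deaths) in one year per 100,000 population at risk."""
    return 100_000 * events / population_at_risk


def mortality_to_incidence_pct(deaths: int, new_cases: int) -> float:
    """Mortality-to-incidence ratio (M:I), expressed as a percentage."""
    return 100.0 * deaths / new_cases


# Hypothetical counts for a single age-sex stratum in one year.
new_cases, deaths, population = 120, 30, 200_000

print(rate_per_100k(new_cases, population))           # 60.0
print(rate_per_100k(deaths, population))              # 15.0
print(mortality_to_incidence_pct(deaths, new_cases))  # 25.0
```

Because both rates share the same denominator, M:I can be computed directly from the counts of deaths and new cases for the same population and year.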
To identify possible duplicates, we controlled all the registered cancer cases using the available information, including name and surname, father’s name, date of diagnosis, and ICD-O diagnostic codes in the Excel software. In all calculations and analyses, duplicate patient records and tumors were excluded from the study. In addition, each tumor was examined only once, and data on patients and tumors were integrated. Data confidentiality was ensured during all the procedures of data extraction and analysis, and the study protocol was approved by the ethics committee of Shahid Beheshti University of Medical Sciences.
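The authors performed this check in Excel. The same idea can be sketched in Python as exact matching on a normalized key built from the listed fields. The field names, the normalization rule, and the sample records are hypothetical, and real registries often add fuzzy matching on names, which this sketch does not attempt.

```python
KEY_FIELDS = ("name", "surname", "father_name", "diagnosis_date", "icdo_code")


def normalize(value) -> str:
    """Collapse whitespace and ignore case so trivial variants compare equal."""
    return " ".join(str(value).split()).casefold()


def record_key(record: dict) -> tuple:
    return tuple(normalize(record[field]) for field in KEY_FIELDS)


def drop_duplicates(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented example records; the second differs from the first only in case and spacing.
records = [
    {"name": "Ali", "surname": "Rezaei", "father_name": "Hasan",
     "diagnosis_date": "2009-05-01", "icdo_code": "C16.9"},
    {"name": " ali ", "surname": "REZAEI", "father_name": "Hasan",
     "diagnosis_date": "2009-05-01", "icdo_code": "c16.9"},
    {"name": "Sara", "surname": "Karimi", "father_name": "Reza",
     "diagnosis_date": "2010-02-11", "icdo_code": "C34.9"},
]
unique = drop_duplicates(records)
print(len(unique))  # 2
```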
In this study, preliminary data were obtained from three sources in the selected provinces and comprised 53,398 cases (24,941 pathology reports, 20,468 death certificates, and 7,989 hospital registrations). After data linkage and the exclusion of duplicate records, 35,643 cases remained (44.7% females and 55.3% males).
According to our findings, the majority of cancer cases were aged 60 years or above (62.4%), and the male-to-female ratio was 1.2 (year 2010). It is noteworthy that 20 cases were excluded from further evaluation due to lack of data on age. Final percentage of the shared data is depicted in Figure 1.
According to the two-source capture-recapture method, an estimated 65,797 new cases (95% CI: 65,441 - 66,153) occurred in the selected provinces over the three years, which means that 30,154 cases were not registered. The registry data indicated that the mean age of patients with stomach cancer in the selected regions was 65.8 ± 16 years in men and 61.6 ± 18 years in women; 60.4% of the stomach cancer patients were male and 39.6% were female. We determined the overall completeness of cancer registry coverage by age group, sex and year; the findings are presented in Table 1.
|Year||Sex||Cancer Site||Reported New Cases||Xa||Estimated New Cases||95% CI for Estimated, Lower||95% CI for Estimated, Upper||Completeness of Registration, %|
|2008 - 2010||M||All cancer||19716||15518||35234||35009||35459||56|
According to the results of this study, the overall completeness of cancer registration was 54.2%, rising from 44.2% in 2008 to 59.7% in 2010. The highest completeness of coverage for all-site and stomach cancer incidence was observed in patients aged 60 years or above (55.7% and 72.3%, respectively), while the lowest was observed in patients aged less than 40 years (53.1% and 53.5% for all-site and stomach cancer, respectively). Overall, in the three-year period (2008 - 2010), the underestimation rate for stomach cancer was 32.4%. The validity of cancer registry data in five provinces of Iran is presented in Table 2.
One index of cancer registry data quality is the incidence of childhood cancers in different age groups by sex, compared with the international standards. According to the literature, childhood cancers (age group 0 - 14) account for 1% of all reported cases, while in our study this proportion was 0.5 percentage points higher than the recommended standard. Childhood cancers are presented in Table 3.
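|Age Group, y||Girls, Observed Rate||Girls, Lower Limit||Girls, Upper Limit||Boys, Observed Rate||Boys, Lower Limit||Boys, Upper Limit|
|00 - 04||13.8||< 11.3||> 23.2||15.9||< 13.7||> 25.6|
|05 - 09||6.5||< 7.0||> 12.7||9.8||< 8.9||> 16.5|
|10 - 14||9.1||< 7.9||> 14.9||7.6||< 9.2||> 16.3|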
The age-specific rate in girls (5 - 9 years) and boys (10 - 14 years) was lower compared to the standard recommendations according to the Surveillance, Epidemiology, and End Results (SEER), while it was within the standard range in other age groups.
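The comparison against the reference limits can be reproduced with a short check over the Table 3 values. This sketch assumes that the first block of columns in Table 3 refers to girls and the second to boys, which is how the text above reads; the dictionary layout and function name are hypothetical.

```python
# (observed rate, lower limit, upper limit) as listed in Table 3.
# Assumption: first column block = girls, second column block = boys.
TABLE_3 = {
    ("girls", "00-04"): (13.8, 11.3, 23.2),
    ("girls", "05-09"): (6.5, 7.0, 12.7),
    ("girls", "10-14"): (9.1, 7.9, 14.9),
    ("boys", "00-04"): (15.9, 13.7, 25.6),
    ("boys", "05-09"): (9.8, 8.9, 16.5),
    ("boys", "10-14"): (7.6, 9.2, 16.3),
}


def classify(rate: float, lower: float, upper: float) -> str:
    """Flag a rate as below, within, or above the reference range."""
    if rate < lower:
        return "below"
    if rate > upper:
        return "above"
    return "within"


flags = {group: classify(*values) for group, values in TABLE_3.items()}
below = sorted(group for group, flag in flags.items() if flag == "below")
print(below)  # [('boys', '10-14'), ('girls', '05-09')]
```

Under that column assignment, only girls aged 5 - 9 and boys aged 10 - 14 fall below the lower limit, which matches the statement above.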
Finally, study of the age-specific rates of all cancers indicated that the risk of cancer incidence increases until the age of 80 - 84 years, while it declines after this age.
In the present study, estimation of the completeness of cancer registry data using the two-source capture-recapture method and the Petersen-Chapman model indicated an underestimation rate of 45.8% over the three-year period. According to the literature, stomach cancer is the most life-threatening type of cancer in Iran, and the underestimation rate for this cancer was 32.4%. The sensitivity of the cancer registry system was lower for women than for men, which is consistent with other research (21). Despite the low coverage, cancer registry coverage for all cancers increased by 15.5 percentage points from 2008 to 2010; however, coverage has varied between regions of the country in previous studies. For instance, in a previous study, cancer registry coverage based on the pathology registry during 2000 - 2007, and on population-based and pathology registries during 2007 - 2009, varied from 22.68% to 118.7% (6). In another study conducted in northwestern Iran using the capture-recapture method, the under-ascertainment rate for all cancers during 2008 - 2010 was 16.1%, and coverage of the Iranian population-based registry was 52%, while it was 93.1% according to both data sources. In the mentioned study, the underestimation rate was 6.9% (22). Furthermore, the overall estimate of cancer registry coverage using the three-source capture-recapture method during 2008 - 2010 was 51%, and it ranged from 46.8% to 85.3% for stomach cancer (23).
In the current study, cancer registry coverage was observed to increase with the age of patients, with the highest and lowest coverage reported within the age groups of 60 years or above and less than 40 years, respectively. This finding is consistent with the results of the studies conducted in Iran and Japan in this regard (23, 24).
During 1990 - 2009, cancer registry coverage based on pathology registry, clinical records and cancer deaths in Gambia was reported to be 50.4% using the capture-recapture method (25). Although cancer registry coverage may vary depending on the cancer type, coverage improvement in some regions of Iran could be attributed to the effective communication with reporting centers, such as laboratories and clinical or pathological centers.
Compared with the indicators of cancer registry data quality in European countries, where coverage completeness is 96% - 100% (26), cancer registry coverage in our country is relatively low. One indicator of cancer registry quality is the incidence of childhood cancers, which is stable in this patient population (19). In 2010, childhood cancers and cancers in persons more than 80 years of age accounted for 1.5% and 11% of the total number of cases, respectively, which roughly corresponded with the international standards. However, the age-specific incidence rates of cancer in girls aged 5 - 9 years and boys aged 10 - 14 years were lower than the minimum of the recommended international standards, while these values were above the standard range in developed countries, such as Norway (26). The low age-specific rates in these groups could be due to factors such as under-registration, lack of reporting from pediatric hospitals, and insufficient attention from healthcare authorities to this issue. The incidence-to-mortality ratio of cancer was 67%, while it is 80% based on the international standards (19). As such, it could be concluded that the cancer mortality rate in our country is 13 percentage points higher than the international standards (the mortality-to-incidence ratio of cancer in our study was 33.7% for men and 28.2% for women).
Evaluation of cancer registry data in Golestan province in 2007 indicated that the mortality-to-incidence ratio for male and female patients was 47.6% and 35.8%, respectively (20). In Japan, this ratio was 47% and 44% for men and women, respectively, and in northern Portugal it was 48.4% and 55.9% for men and women, respectively (24). In China in 2010, the overall mortality-to-incidence ratio of cancer was 61%, which is consistent with the results of the present study (27). Differences in this index between countries could be due to the difficulty of collecting data on cancer mortality, a high case fatality rate, a high rate of death registration, late diagnosis of cancer, and ineffective care of cancer patients. However, the exact causes of these differences cannot be distinguished. For instance, in countries such as Finland, where the quality of data registration is relatively high, differences in the aforementioned ratios are insignificant, which could be attributed to factors such as the high survival rate among cancer patients, effective screening programs, and accurate diagnosis and treatment of cancer. It is also noteworthy that these ratios may vary by as much as 20% depending on the geographical region, age group of patients, and type of cancer (19). Use of this index requires high-quality death registration data and accurate recording of the cause of death (28). In the current study, the percentage of DCO cases in the cancer registry data was higher than the international standards. In the population-based cancer registry of the Ministry of Health and Medical Education in 2005, this index was reported to be 2% and 37.3% in Isfahan and Lorestan provinces, respectively. In another study, the pathology registry during 1998 - 2001 indicated the percentage of DCO cases to be 24% (29), while this value was 9.9% for men and 7.3% for women in Golestan province (20).
Rate of cancer registries based on DCO was reported to be respectively 2% and 1.7% for men and women in Antalya, while it was 4.4% and 3.9% in Izmir, 1.5% and 1.3% in Singapore, 2.9% and 3.2% in New Zealand, and 0.2% in Iceland, all of which were lower than the international standards. On the other hand, in countries such as Zimbabwe, this rate was reported to be respectively 13.3% and 9.7% for men and women, and 13.1% and 13.3% in Osaka (Japan) (27). In a study performed in this regard in Norway in 2009, DCO for all cancers was 0.9% (26). In addition, in a study of 25 population-based cancer registries in Japan in 2008, DCO was estimated at 13.2% and 14.1% for men and women, respectively (24).
High number of registries based on DCO is suggestive of deficient coverage; as such, these cases should be interpreted in regional terms. In some developing countries, quality of death certificates may be very low or death certificates might be issued erroneously for other cases as cancer. In such cases, tracking hospital records by registries to prove or disprove death certificates in hospitals could be problematic. Data linkage methods in cancer registries should be applied to successfully detect the death certificates that already exist or are missing in the database. Except for DCOs where cancer is listed inaccurately as the cause of death, DCO represents the deficient identification of incident cancer cases. High percentage of DCO in Iran may be due to the underestimation by other sources, incomplete follow-up of patients or both these factors.
In the current study, we investigated cancer data for 2008 - 2010 and found that the high percentage of DCO could be due to the fact that the death files obtained were not linked to cancer registry data for the period before the study. Typical sites for DCO are the lung, liver and pancreas (19), which are mostly of the metastatic type; in this study, lung and liver cancers accounted for the largest share of DCO cases in both sexes (20.4% in women and 26.8% in men). To evaluate the percentage of morphological verification, this index was compared with the data of 17 countries from Cancer Incidence in Five Continents, Volume X; in this comparison our country ranked 15th, which is indicative of the low level of morphological verification in Iran (Figure 2). However, according to the cancer registry in Golestan province, the rate of morphologically verified cancer was estimated at 69.5% in men and 71.2% in women (28). Moreover, the reported rates in Japan were 76% and 74.9% among men and women, respectively; these findings were consistent with the results of the present study (24).
Low percentage of morphological verification in the current study could be due to the underreporting of some pathology laboratories and centers, as well as the fact that a significant number of cancers are diagnosed at the time of death.
Out of 11,873 cases reported in 2010, the primary site was unknown in 593 cases (5%). This could be due to metastases, to the primary site not having been determined, or to reports that lacked adequate information to verify the primary site of the tumor. As expected, the incidence of tumors of unknown primary site was higher in older patients, which corresponds with the international standards in this regard. According to international standards, the percentage of morphologically diagnosed cases is estimated at 80%, while DCO% ranges between 1% and 5%. Also, in 5% of cases, the site of the tumor remains unspecified; other reported cases are found to be clinical (19). In the current study, the highest age-specific rate was observed in cancer patients aged 80 - 84 years, and this rate dropped in older age groups. This reduction could be due to the lack of patient referral to cancer diagnosis centers, lack of access to diagnostic facilities, and failure of physicians to perform diagnostic tests in this age group.
This study had some limitations. Cancer registry data were unavailable for the period before 2008 in the chosen provinces because some hospitals lacked a health information system, so we could not match the DCOs of 2008 against prior years. Furthermore, in some cases cancer was recorded as the cause of death although the patient may have died of another cause, and it was not possible to trace back and verify the cause of death.
According to the results of this study, the quality of cancer registry data in our country is relatively low in terms of completeness and validity. To increase coverage, cancer registry programs should be implemented in the national healthcare system, based on information obtained from various sources, including laboratories, physicians’ offices, medical records, hospital information systems, death registries, and other similar sources. In this regard, the quality of data should be assessed systematically in order to achieve the gold standards. Therefore, it is recommended that healthcare authorities implement the required interventions so as to prevent unnecessary costs associated with the collection of low-quality cancer registry data.
Akbari ME, Mohammadi G. Iranian female cancer report. 2014.
Global Cancer Facts & Figures 3rd Edition. 2014.
GLOBOCAN 2012: Estimated cancer incidence, mortality and prevalence worldwide in 2012. 2014.
Mosavi M, Ramezani H. National report on registered cancer cases in 2005. 2005.
Jensen OM. Cancer registration: principles and methods. 1991.
Lankarani KB, Khosravizadegan Z, Rezaianzadeh A, Honarvar B, Moghadami M, Faramarzi H, et al. Data coverage of a cancer registry in southern Iran before and after implementation of a population-based reporting system: a 10-year trend study. BMC Health Serv Res. 2013;13:169.
Guideline: National Cancer Registry. 2012.
Bouvier AM, Dancourt V, Faivre J. [The role of cancer registries in the surveillance, epidemiologic research and disease prevention]. Bull Cancer. 2003;90(10):865-71.
Ministry of Health and Medical Education, Deputy of Health and Treatment, Center for Disease Control and Prevention, Cancer Office. Iranian Annual of National Cancer Registration Report 2008. 2008.
Fujimoto I, Hanai A, Tsukuma H, Hiyama T. [Role of population-based cancer registry in cancer epidemiology--epidemiological studies in the Cancer Registration Scheme in Osaka, Japan]. Nihon Eiseigaku Zasshi. 1994;49(2):543-58.
Aghaei A, Ahmadi-Jouibari T, Baiki O, Mosavi-Jarrahi A. Estimation of the gastric cancer incidence in Tehran by two-source capture-recapture. Asian Pac J Cancer Prev. 2013;14(2):673-7.
Parkin DM, Bray F, Ferlay J, Pisani P. Estimating the world cancer burden: Globocan 2000. Int J Cancer. 2001;94(2):153-6.
Hook EB, Regal RR. Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev. 1995;17(2):243-64.
Ashworth TG. Inadequacy of death certification: proposal for change. J Clin Pathol. 1991;44(4):265-8.
Buckland ST, Goudie IB, Borchers DL. Wildlife population assessment: past developments and future directions. Biometrics. 2000;56(1):1-12.
Wittes JT, Colton T, Sidel VW. Capture-recapture methods for assessing the completeness of case ascertainment when using multiple information sources. J Chronic Dis. 1974;27(1):25-36.
Hofferkamp J. Standards for cancer registries volume III: standards for completeness, quality, analysis, management, security and confidentiality of data. 2008.
Bray F, Ferlay J, Laversanne M, Brewster DH, Gombe Mbalawa C, Kohler B, et al. Cancer Incidence in Five Continents: Inclusion criteria, highlights from Volume X and the global status of cancer registration. Int J Cancer. 2015;137(9):2060-71.
Forman D, Bray F, Brewster DH, Gombe Mbalawa C, Kohler B, Pineros M. Cancer incidence in five continents, volume X. 2014.
Ghojazadeh M, Mohammadi M, Azami-Aghdash S, Sadighi A, Piri R, Naghavi-Behzad M. Estimation of cancer cases using capture-recapture method in Northwest Iran. Asian Pac J Cancer Prev. 2013;14(5):3237-41.
Mohammadi G, Akbari ME, Mehrabi Y, Ghanbari Motlagh A, Partovi Pour E, Roshandel G, et al. Estimating Completeness of Cancer Registration in Iran with Capture-Recapture Methods. Asian Pac J Cancer Prev. 2016;17 Spec No.:93-9.
Matsuda A, Matsuda T, Shibata A, Katanoda K, Sobue T, Nishimoto H, et al. Cancer incidence and incidence rates in Japan in 2008: a study of 25 population-based cancer registries for the Monitoring of Cancer Incidence in Japan (MCIJ) project. Jpn J Clin Oncol. 2014;44(4):388-96.
Larsen IK, Smastuen M, Johannesen TB, Langmark F, Parkin DM, Bray F, et al. Data quality at the Cancer Registry of Norway: an overview of comparability, completeness, validity and timeliness. Eur J Cancer. 2009;45(7):1218-31.
Bray F, Znaor A, Cueva P, Korir A, Swaminathan R, Ullrich A, et al. Planning and developing population-based cancer registration in low- and middle-income settings. IARC Technic Publicat. 2014;(43).
Mohagheghi MA, Mosavi-Jarrahi A, Malekzadeh R, Parkin M. Cancer incidence in Tehran metropolis: the first report from the Tehran Population-based Cancer Registry, 1998-2001. Arch Iran Med. 2009;12(1):15-23.