Which Type of Univariate Forecasting Methods Is Appropriate for Prediction of Tuberculosis Cases in Razavi Khorasan Province? A Need for Surveillance and Biosurveillance Systems

authors:

Nayereh Esmaeilzadeh 1, Alireza Bahonar 1,*, Abbas Rahimi Foroushani 2, Mahshid Nasehi 3, Mohammad Taghi Shakeri 4

1 Department of Food Hygiene and Quality Control, Faculty of Veterinary Medicine, Tehran University, Tehran, Iran
2 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
3 Department of Epidemiology, Faculty of Health, Iran University of Medical Sciences, Tehran, Iran
4 Department of Epidemiology and Biostatistics, Faculty of Public Health, Mashhad University of Medical Sciences, Mashhad, Iran

how to cite: Esmaeilzadeh N, Bahonar A, Rahimi Foroushani A, Nasehi M, Shakeri MT. Which Type of Univariate Forecasting Methods Is Appropriate for Prediction of Tuberculosis Cases in Razavi Khorasan Province? A Need for Surveillance and Biosurveillance Systems. J Arch Mil Med. 2019;7(3):e96229. https://doi.org/10.5812/jamm.96229.

Abstract

Background:

TB surveillance and prevention of further spread require full knowledge of the biological characteristics influencing TB and the detection of mathematical patterns that explain the mechanism of TB spread. Such models can explain the dynamics of diseases and can be used to forecast future values. Estimating the likely number of patients in the time ahead is vital for decision making in public health. However, it is essential to determine the accuracy of forecasts using genuine forecasts. Thus, we obtained the TB cases from April 2007 until March 2018 in Razavi Khorasan province to fit a model and forecast the number of TB cases for the next 24 months.

Methods:

We considered a time series of monthly incidence counts of TB in Razavi Khorasan province from April 2007 until March 2018. The data included total TB, pulmonary TB, new pulmonary TB, retreatment TB, and extrapulmonary TB cases. For model selection and forecasting, we used about 20% of the data (24 monthly observations) for testing and the rest for training the model. The parameters were optimized automatically in STATA according to the smallest root mean squared error (RMSE). The models were exponentially weighted moving average (EWMA) models (single exponential and double exponential smoothers), and we compared the quality of their forecasts through a stand-alone accuracy measure (RMSE).

Results:

The raw series of total TB, pulmonary TB, and new pulmonary TB showed almost the same pattern: a slow downward trend with oscillation around the trend, which is a property of a cyclic trend. Retreatment TB and extrapulmonary TB cases declined over time, although with no clear pattern. The smoothing constants of all series were near zero, which indicates very smooth series with slowly changing counts. Total TB, pulmonary TB, and new pulmonary TB series had double exponential patterns with noise and a long-standing trend, and they might increase in the 24 months ahead. Retreatment TB and extrapulmonary TB series had simple exponential patterns with noise and no secular trend; they might show no change in the 24 months ahead.

Conclusions:

According to the end TB strategy, MDG 6, target 8 was to halt and begin to reverse the incidence of TB by 2015, and we joined this strategy in January 2006. However, TB control remains one of the main public health concerns. In recent years, while moving toward tuberculosis elimination, our country has faced immigration from neighboring countries, sanctions, and/or attacks with category C biological agents. Our implementation requires changes in strategies and activities that should evolve over time. The findings of this study are helpful in achieving this goal.

1. Background

Tuberculosis (TB) is a necrotizing chronic or acute disease that usually involves the lungs although it can involve different body organs and tissues such as lymph nodes, pleura, pericardium, kidney, and bones. TB can occur as a result of either a new infection with Mycobacterium tuberculosis or reactivation of a latent TB infection. Most cases in endemic countries, such as Iran, occur due to a new infection (1).

One of the Millennium Development Goals is to end TB by decreasing the TB mortality rate by 50% compared to 1990, halting or decreasing TB incidence and prevalence by 2015, and decreasing TB incidence to less than one case per million population by 2050 (2). Universal health coverage (UHC) has a vital role in this regard (2).

Generally, the disease burden of TB is decreasing, but not fast enough to achieve the first milestones of the end TB strategy by 2020. The TB incidence rate should decrease by 4% - 5% per year, whereas it is decreasing by 2% per year (2). TB is reappearing in many countries as a public health crisis; it is not an emerging disease but an important reemerging disease. Reemerging TB might be caused by multidrug-resistant M. tuberculosis (category C of biological agents), the emergence of the HIV epidemic, and a large number of immigrants from countries where TB is common (3, 4).

Over two decades, the TB incidence rate in Iran declined from 36 per 100000 in 1990 to 17 per 100000 in 2010, but this is not enough to reach the STB goal; hence, TB control is a prime public health concern among policymakers. One important cause of this situation is the shared border with three countries in which TB is a public health problem, i.e., Pakistan, Iraq, and Afghanistan (5, 6).

One way to develop control programs and allocate resources is to review temporal changes and forecast future values. This method can have a major role in identifying future health problems (7).

2. Objectives

The aim of this investigation was to compare single exponential and double exponential smoothers to determine which model is more accurate for forecasting TB cases in Razavi Khorasan province and utilize this approach for surveillance and biosurveillance systems.

3. Methods

3.1. Study Setting

Razavi Khorasan province is located in the northeastern part of Iran in the vicinity of Afghanistan. It is the fifth-largest province in Iran, with an area of 118854 km². Its population is about 6.5 million people, which makes it the second most populous province in Iran. The population growth rate is 1.4%, which is higher than the national growth rate, and the TB incidence rate is about 14 per 100000, which is also higher than the national rate (about 11 per 100000).

3.2. Data Collection

We obtained the data from the Bureau of Tuberculosis, the Center for Disease Control, the Ministry of Health and Medical Education of Iran.

The following definitions were used according to the national guideline of TB control:

A “new case” is a patient who has never received treatment for TB or who has taken anti-TB drugs for less than four weeks.

A “retreatment case” is a patient who has taken anti-TB drugs for at least four weeks.

A pulmonary TB case is a patient who has smear-positive or smear-negative TB, and extrapulmonary TB is an infection of parts of the body other than the lungs (1).

We presented a time series of monthly incidence counts of TB cases in Razavi Khorasan province between April 2007 and March 2018.

3.3. Model Fitting

To find the best fitting model, following previous studies (8, 9), we used two univariate time-series smoothing techniques (10, 11): Simple Exponential and Double Exponential smoothing. We aggregated the daily cases into the number of TB cases per month; thus, 132 time points (months) were obtained.

3.3.1. Simple Exponential (SE) Smoothing

This method is applied for forecasting a time series when there is no trend or seasonal pattern, but the mean of the time series varies gradually with time. The model needs one parameter (α) to create the fitted and forecasted values. The SE method is frequently applied to forecast the next value, given the present and previous values.

$$S_t = \alpha x_t + (1 - \alpha) S_{t-1}$$

In this equation, $S_t$ is the smoothed value used as the forecasted number of TB cases, $x_t$ denotes the actual value in period $t$, $S_{t-1}$ is the prior forecast, and $\alpha$ is the smoothing parameter.
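The recursion above can be sketched in a few lines of Python. This is only an illustration, not the STATA routine used in the study: the function name, the choice to start the recursion at the first observation, and the value α = 0.5 are assumptions made for readability (STATA's `tssmooth` may initialize differently). The sample counts are the first six actual total TB values listed in Table 4.

```python
def simple_exponential_smoothing(x, alpha, s0=None):
    """Return the smoothed series S_t = alpha * x_t + (1 - alpha) * S_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Assumption: start the recursion at the first observation unless s0 is given.
    prev = x[0] if s0 is None else s0
    smoothed = []
    for value in x:
        prev = alpha * value + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


# Illustrative input: six monthly counts; alpha = 0.5 is chosen only for the demo.
counts = [48, 102, 111, 97, 92, 72]
smoothed = simple_exponential_smoothing(counts, alpha=0.5)

# With no trend term, every future month receives the same (flat) forecast.
forecast_next = smoothed[-1]
```

Because the method carries no trend term, the last smoothed value serves as the forecast for every horizon, which is why the SE forecasts in this setting are flat.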

3.3.2. Holt’s Trend (HT) Corrected Exponential Smoothing

It is also called the Double-Exponential method and is obtained by smoothing the smoothed series, as follows:

A. The exponentially smoothed series value

$$S_t = \alpha x_t + (1 - \alpha) S_{t-1}$$

in which $x_t$ is the raw series, and $\alpha$ and $S_t$ denote the smoothing parameter and the forecasted $x_t$, respectively.

B. The Double-Exponential smoother

$$S_t^{[2]} = \alpha S_t + (1 - \alpha) S_{t-1}^{[2]}$$

C. The trend estimate is built from the difference between the two exponentially smoothed series at the last observation $T$. The constant term:

$$a_T = 2 S_T - S_T^{[2]}$$

D. The linear term:

$$b_T = \frac{\alpha}{1 - \alpha} \left( S_T - S_T^{[2]} \right)$$

E. The $\tau$-step-ahead out-of-sample prediction is given as follows:

$$\hat{x}_{T+\tau} = a_T + \tau b_T$$
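Steps A to E can be traced with a short Python sketch. It is a minimal illustration under stated assumptions, not the STATA implementation: the function name is hypothetical, and both recursions are started at the first observation, which may differ from the initial values STATA chooses.

```python
def double_exponential_forecast(x, alpha, horizon):
    """Forecast `horizon` steps ahead with the double exponential smoother."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Assumption: both smoothed series start at the first observation.
    s1 = s2 = x[0]
    for value in x:
        s1 = alpha * value + (1.0 - alpha) * s1   # step A: smoothed series
        s2 = alpha * s1 + (1.0 - alpha) * s2      # step B: smoothed twice
    a = 2.0 * s1 - s2                             # step C: constant term
    b = alpha / (1.0 - alpha) * (s1 - s2)         # step D: linear term
    # Step E: tau-step-ahead predictions for tau = 1 .. horizon.
    return [a + tau * b for tau in range(1, horizon + 1)]


flat = double_exponential_forecast([10, 10, 10, 10, 10], alpha=0.5, horizon=3)
rising = double_exponential_forecast([0, 2], alpha=0.5, horizon=2)
```

A constant series yields a zero linear term and therefore flat forecasts, whereas a rising series yields a positive linear term and forecasts that grow with the horizon.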

To carry out the above-mentioned procedure, we used STATA V. 14.0 and Excel. The filters, which can produce forecasts, were run through the tssmooth command. Each of the two methods operates differently and is suitable for a specific type of forecasting task.

To compare the forecasts provided by these methods, we constructed several forecasted counts of TB, pulmonary TB, extrapulmonary TB, new TB, and retreatment TB.

For model selection, we used about 20% of the data (24 monthly observations) for testing the model and 80% of the data for training the model. We used the testing data to measure how well the model forecasts the latest data (12).
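The split described here keeps the time order: the last 24 months are held out and the first 108 months are used for fitting. A minimal Python sketch follows; the helper name and the placeholder series are assumptions, since the monthly counts themselves are not reproduced in full in this article.

```python
def split_train_test(series, n_test):
    """Hold out the last n_test observations; never shuffle a time series."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


# Placeholder for the 132 monthly counts (April 2007 to March 2018).
monthly = list(range(132))
train, test = split_train_test(monthly, n_test=24)
test_share = len(test) / len(monthly)  # 24 / 132, i.e. "about 20%"
```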

The parameters were optimized automatically in STATA according to the smallest root mean squared error (RMSE) for these time-series analysis techniques. The smoothing constant (α) lies between 0 and 1 and controls the amount of inertia in the local mean. Values of α near 0 produce a very smooth series with a slowly changing mean, and values near 1 produce a more volatile series with a rapidly changing mean (10). Forecast accuracy was calculated for the 24-month-ahead forecasts using a stand-alone accuracy measure, the RMSE (12).
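To make the idea of choosing α by the smallest RMSE concrete, the sketch below scores a few candidate values on one-step-ahead forecasts from the simple exponential smoother. The grid search is an assumed stand-in for STATA's internal optimizer, and all function names and the toy series are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def one_step_forecasts(x, alpha):
    """Forecast each x[t] from the smoothed value available at t - 1."""
    prev = x[0]  # assumption: the first forecast equals the first observation
    forecasts = []
    for value in x:
        forecasts.append(prev)
        prev = alpha * value + (1.0 - alpha) * prev
    return forecasts


def best_alpha(x, grid):
    """Return the candidate alpha with the smallest in-sample RMSE."""
    return min(grid, key=lambda a: rmse(x, one_step_forecasts(x, a)))


noisy_flat = [10, 12] * 10          # noise around a constant level
steady_rise = list(range(1, 21))    # a clean upward trend
alpha_noisy = best_alpha(noisy_flat, [0.1, 0.5, 0.9])
alpha_rise = best_alpha(steady_rise, [0.1, 0.5, 0.9])
```

On the toy data, a small α wins for the noisy but level series and a large α wins for the steadily rising one, which matches the interpretation of α given above.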

4. Results

During the 132 months from April 2007 to March 2018, there were 12406 TB cases in Razavi Khorasan province, including 9273 (74.7%) cases of pulmonary TB and 3133 (25.3%) cases of extrapulmonary TB. All of the extrapulmonary TB cases were new cases, whereas 8797 of the pulmonary TB cases were new cases and 476 were retreatment cases. The annual data are given in Table 1.

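Table 1.

Annual Data of TB Cases Reported in Razavi Khorasan Province

| Year ending | Total TB Cases | Pulmonary TB | New Pulmonary TB | Retreatment Pulmonary TB | Extrapulmonary TB |
|---|---|---|---|---|---|
| March 2008 | 1192 | 852 | 803 | 49 | 340 |
| March 2009 | 1258 | 926 | 876 | 50 | 332 |
| March 2010 | 1229 | 928 | 868 | 60 | 301 |
| March 2011 | 1230 | 906 | 857 | 49 | 324 |
| March 2012 | 1227 | 904 | 854 | 50 | 323 |
| March 2013 | 1057 | 806 | 761 | 45 | 251 |
| March 2014 | 1159 | 871 | 831 | 40 | 288 |
| March 2015 | 1098 | 839 | 802 | 37 | 259 |
| March 2016 | 1063 | 797 | 764 | 33 | 266 |
| March 2017 | 1004 | 757 | 722 | 35 | 247 |
| March 2018 | 889 | 687 | 659 | 28 | 202 |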
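As a quick consistency check, the annual rows of Table 1 can be summed and compared with the totals and percentages quoted in the text. The Python sketch below only re-adds the published figures; the variable names are illustrative.

```python
# Rows of Table 1: (total, pulmonary, new pulmonary, retreatment, extrapulmonary)
table1 = [
    (1192, 852, 803, 49, 340),
    (1258, 926, 876, 50, 332),
    (1229, 928, 868, 60, 301),
    (1230, 906, 857, 49, 324),
    (1227, 904, 854, 50, 323),
    (1057, 806, 761, 45, 251),
    (1159, 871, 831, 40, 288),
    (1098, 839, 802, 37, 259),
    (1063, 797, 764, 33, 266),
    (1004, 757, 722, 35, 247),
    (889, 687, 659, 28, 202),
]

# Column sums over the 11 reporting years.
total, pulmonary, new_pulm, retreatment, extrapulm = (
    sum(column) for column in zip(*table1)
)

pulmonary_pct = round(100 * pulmonary / total, 1)
extrapulm_pct = round(100 * extrapulm / total, 1)

# Each row should satisfy total = pulmonary + extrapulmonary
# and pulmonary = new + retreatment.
rows_consistent = all(t == p + e and p == n + r for t, p, n, r, e in table1)
```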

Figure 1. Time series plot for TB cases in Razavi Khorasan province from April 2007 to March 2018

Figure 1 shows the time series plot of raw monthly counts of TB (by type and site of the body) during the 11-year period from April 2007 to March 2018. According to the raw series, the patterns of total cases of TB, total pulmonary TB, and new cases of pulmonary TB were almost the same. They illustrated a slow downward trend with oscillation around the trend, which is a property of a cyclic trend. For retreatment pulmonary TB cases and extrapulmonary TB cases, reductions occurred over time but with no clear pattern.

In the next stage, to isolate the systematic component, we used two smoother techniques to remove the noise component of each series. By using STATA on the training data (April 2007 to March 2016), we obtained the optimized parameters of the two smoother techniques for the five series automatically. Table 2 gives the optimized parameters of the simple exponential and double exponential methods for the total TB, pulmonary TB, new pulmonary TB, retreatment TB, and extrapulmonary TB series. The patterns are noisy, with or without a secular trend. The smoothing constants of all series are near zero, which indicates very smooth series with slowly changing counts.

Table 2.

The Parameters of TB Series Data for Two Smoother Techniques

| Smoother Technique | Parameter | Total TB | Pulmonary TB | New Pulmonary TB | Retreatment Pulmonary TB | Extrapulmonary TB |
|---|---|---|---|---|---|---|
| Simple exponential | α | 0.0351 | 0.0304 | 0.0213 | 0.0353 | 0.0306 |
| Double exponential | α | 0.0266 | 0.0319 | 0.0313 | 0.0247 | 0.0001 |
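Why a smoothing constant near zero gives a very smooth series can be seen from the weights the simple exponential smoother places on past observations: unrolling the recursion gives a weight of α(1 − α)^k to the observation k months back. The Python sketch below computes these weights and the number of months after which the weight halves; the function names are illustrative, and the value 0.0351 is the simple exponential α for total TB from Table 2.

```python
import math


def observation_weight(alpha, lag):
    """Weight on the observation `lag` periods back: alpha * (1 - alpha) ** lag."""
    return alpha * (1.0 - alpha) ** lag


def weight_half_life(alpha):
    """Number of periods after which an observation's weight is halved."""
    return math.log(0.5) / math.log(1.0 - alpha)


alpha_total_tb = 0.0351
latest_weight = observation_weight(alpha_total_tb, lag=0)
half_life_months = weight_half_life(alpha_total_tb)
```

With α = 0.0351, the newest month contributes only about 3.5% of the smoothed value and the weights take more than a year and a half to halve, so a single unusual month barely moves the smoothed series.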

To rely on genuine forecasts, we needed to examine their accuracy, so we calculated the forecast accuracy measure (RMSE) of the two models for each series. Table 3 shows the forecast accuracy measures of the two smoother techniques based on the testing data of each series, and Table 4 gives the actual and forecast values according to the fitted models for the testing period (April 2016 to March 2018). The double exponential smoother had the lower RMSE for total TB, pulmonary TB, and new pulmonary TB, whereas the simple exponential smoother had the lower RMSE for retreatment TB and extrapulmonary TB. Thus, the total TB, pulmonary TB, and new pulmonary TB series had double exponential patterns with noise and a secular trend, and they might increase in the 24 months ahead. The two other series, retreatment TB and extrapulmonary TB, had simple exponential patterns with noise and no secular trend, and they might show no change in the 24 months ahead.

Figure 2 shows the time-series plot of actual, fitted, and forecasted values for the monthly models. The models were estimated on the training data (April 2007 to March 2016), and forecasts for the next 24 months (April 2016 to March 2018) were produced according to the fitted model for each series.

Table 3.

Forecast Accuracy Measures (RMSE) for Two Smoother Techniques, Estimated on the Testing Data of Each Series

| Smoother Technique | Total TB | Pulmonary TB | New Pulmonary TB | Retreatment TB | Extrapulmonary TB |
|---|---|---|---|---|---|
| Simple exponential | 27.58 | 24.33 | 24.85 | 1.98 | 6.18 |
| Double exponential | 26.19 | 21.67 | 20.95 | 2.01 | 6.53 |

Table 4.

The Actual and Forecast Values According to Fitted Models for the Period from April 2016 to March 2018

| Month-Year | Total TB (Actual) | Total TB (Forecast) | Pulmonary TB (Actual) | Pulmonary TB (Forecast) | New Pulmonary TB (Actual) | New Pulmonary TB (Forecast) | Retreatment Pulmonary TB (Actual) | Retreatment Pulmonary TB (Forecast) | Extrapulmonary TB (Actual) | Extrapulmonary TB (Forecast) |
|---|---|---|---|---|---|---|---|---|---|---|
| Apr - 16 | 48 | 48 | 33 | 33 | 31 | 31 | 2 | 0 | 15 | 15 |
| May - 16 | 102 | 48.0266 | 85 | 33.0319 | 77 | 31.0313 | 8 | 2.0247 | 17 | 15.0002 |
| Jun - 16 | 111 | 48.0266 | 83 | 33.0319 | 83 | 31.0313 | 0 | 2.0247 | 28 | 15.0003 |
| Jul - 16 | 97 | 50.824 | 78 | 36.24536 | 75 | 33.82218 | 3 | 2.313869 | 19 | 15.0008 |
| Aug - 16 | 92 | 53.97702 | 68 | 39.18312 | 66 | 36.84744 | 2 | 2.206855 | 24 | 15.0035 |
| Sep - 16 | 72 | 56.27964 | 51 | 41.67207 | 49 | 39.24616 | 2 | 2.248315 | 21 | 15.0044 |
| Oct - 16 | 77 | 58.23121 | 56 | 43.42379 | 55 | 40.98609 | 1 | 2.240032 | 21 | 15.0063 |
| Nov - 16 | 61 | 59.06817 | 48 | 44.04002 | 46 | 41.61193 | 2 | 2.232021 | 13 | 15.0076 |
| Dec - 16 | 79 | 60.12939 | 61 | 44.93366 | 57 | 42.5705 | 4 | 2.176079 | 18 | 15.00889 |
| Jan - 17 | 80 | 60.31799 | 61 | 45.28812 | 55 | 42.93617 | 6 | 2.17036 | 19 | 15.00859 |
| Feb - 17 | 92 | 61.42967 | 66 | 46.42648 | 62 | 43.94972 | 4 | 2.26122 | 26 | 15.00929 |
| Mar - 17 | 93 | 62.54724 | 72 | 47.50823 | 66 | 44.79229 | 6 | 2.445113 | 21 | 15.01019 |
| Apr - 17 | 44 | 64.24016 | 41 | 48.84481 | 40 | 46.01767 | 1 | 2.525828 | 3 | 15.01248 |
| May - 17 | 73 | 65.91568 | 48 | 50.48575 | 47 | 47.42592 | 1 | 2.699909 | 25 | 15.01378 |
| Jun - 17 | 82 | 64.98405 | 67 | 50.12989 | 64 | 47.18816 | 3 | 2.626551 | 15 | 15.01148 |
| Jul - 17 | 68 | 65.5888 | 55 | 50.21968 | 52 | 47.38275 | 3 | 2.555794 | 13 | 15.01358 |
| Aug - 17 | 88 | 66.63331 | 69 | 51.47576 | 67 | 48.5964 | 2 | 2.583946 | 19 | 15.01367 |
| Sep - 17 | 73 | 66.90862 | 57 | 51.92725 | 55 | 49.02234 | 2 | 2.610992 | 16 | 15.01337 |
| Oct - 17 | 66 | 68.20635 | 54 | 53.21871 | 53 | 50.33505 | 1 | 2.588777 | 12 | 15.01427 |
| Nov - 17 | 60 | 68.67292 | 37 | 53.70361 | 35 | 50.85518 | 2 | 2.567302 | 23 | 15.01457 |
| Dec - 17 | 66 | 68.75546 | 51 | 53.97606 | 47 | 51.226 | 4 | 2.498348 | 15 | 15.01406 |
| Jan - 18 | 81 | 68.52091 | 66 | 53.18092 | 62 | 50.48365 | 4 | 2.480054 | 15 | 15.01576 |
| Feb - 18 | 88 | 68.60349 | 69 | 53.28442 | 68 | 50.49997 | 1 | 2.558765 | 19 | 15.01586 |
| Mar - 18 | 100 | 69.45696 | 73 | 54.30609 | 69 | 51.42195 | 4 | 2.63453 | 27 | 15.01596 |

Figure 2. Actual, fitted, and forecasted values for monthly models according to the best fitting model for each series

5. Discussion

Kilicman and Atiqah Mohd Roslan (2009) showed that double exponential smoothing was the best forecasting model (8), and the results of the application of univariate forecasting models for TB cases in Kelantan (updated in 2014) indicated that the smallest MSE was related to Holt’s exponential smoothing method (9). Therefore, we used EWMA models to determine which forecasting model forecasts TB cases more accurately in Razavi Khorasan.

We considered a time series of monthly incidence of TB in Razavi Khorasan province from April 2007 to March 2018. The data included total TB, pulmonary TB, new pulmonary TB, retreatment TB, and extrapulmonary TB cases. The models were EWMA models and the forecast accuracy measure was RMSE (10-12).

According to RMSE, total TB, pulmonary TB, and new pulmonary TB series had double exponential patterns and retreatment TB and extrapulmonary TB series showed simple exponential patterns.
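The selection rule behind this statement is simply "for each series, keep the smoother with the smaller test RMSE". The Python sketch below applies that rule to the RMSE values reported in Table 3; the dictionary layout and key names are illustrative.

```python
# Test-set RMSE values as reported in Table 3.
rmse_table = {
    "Total TB": {"simple": 27.58, "double": 26.19},
    "Pulmonary TB": {"simple": 24.33, "double": 21.67},
    "New pulmonary TB": {"simple": 24.85, "double": 20.95},
    "Retreatment TB": {"simple": 1.98, "double": 2.01},
    "Extrapulmonary TB": {"simple": 6.18, "double": 6.53},
}

# For each series, pick the smoother with the smallest RMSE.
best_model = {
    series: min(scores, key=scores.get) for series, scores in rmse_table.items()
}

for series, model in best_model.items():
    print(f"{series}: {model} exponential")
```

The margins differ considerably between series: for retreatment TB the two RMSE values are 1.98 and 2.01, so the preference for the simple smoother there rests on a very small difference.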

This study indicated that total TB, pulmonary TB, and new pulmonary TB cases had slowly increasing trends with noisy patterns, while retreatment TB and extrapulmonary TB had somewhat unchanging trends with noisy patterns. These findings indicated that TB is an infection with low virulence and sputum smear-positive (SS+) patients are more important for the transmission of disease (13). We can also conclude that the number of persons becoming infected over the time ahead depends on the number of infectious cases at present.

In our study, we considered all types of pulmonary TB. If we had separated these patients into sputum smear-positive and sputum smear-negative patients, we might have obtained clearer patterns. A study that forecasted the incidence of smear-positive TB in Iran found that it had a seasonal pattern (14).

There are different factors affecting the incidence of TB in various areas, such as weather, epidemiological transition, drug resistance, HIV, migration, and poverty. These factors might increase the incidence of TB (5, 6, 15).

Another goal of this study was to do forecasting. We found that the number of total TB, pulmonary TB, and new TB cases might increase in the 24 months ahead. We forecasted no change in retreatment TB and extrapulmonary TB cases in the 24 months ahead.

A weakness of the forecasting method is that the trend of forecasting is influenced by the end value of the past data. If the last data level is higher than the earlier data, the forecasting section will have a growing trend and vice versa (11, 12).
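This sensitivity to the most recent values can be illustrated with the linear term of the double exponential smoother. In the Python sketch below, two made-up series share the same 20 stable months and differ only in their last two values; the function name, the toy data, and α = 0.3 are assumptions chosen to make the effect visible.

```python
def trend_slope(x, alpha):
    """Linear term b_T of the double exponential smoother for series x."""
    s1 = s2 = x[0]  # assumption: both recursions start at the first value
    for value in x:
        s1 = alpha * value + (1.0 - alpha) * s1
        s2 = alpha * s1 + (1.0 - alpha) * s2
    return alpha / (1.0 - alpha) * (s1 - s2)


stable = [50] * 20
rising_end = stable + [60, 70]    # the last two months are higher
falling_end = stable + [40, 30]   # the last two months are lower

slope_up = trend_slope(rising_end, alpha=0.3)
slope_down = trend_slope(falling_end, alpha=0.3)
```

Two months at the end of the record are enough to flip the sign of the projected trend, even though the preceding 20 months are identical in both series.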

The findings of this study and other studies from Iran and other countries indicate that the number of TB cases might increase (7, 14). In recent decades, our country has faced immigration from neighboring countries, sanctions, and/or attacks with category C biological agents. Although TB as a biological agent is not a present public health threat, it can be a growing hazard in the future, and the predicted growth of TB might be alarming. Bioterrorist attacks are difficult to predict, but they can impose heavy demands on the public health care system (16, 17). Finally, according to the end TB strategy, MDG 6, target 8 was to halt and begin to reverse the incidence of TB by 2015, and we joined the end TB strategy in January 2006. However, TB control remains one of the main public health concerns. Although the goals and functions of TB control programs are constant, moving toward TB elimination requires changes in strategies and activities, and our implementation should evolve over time. Healthcare delivery systems have recently been changing, with a trend toward increased privatization of health care for the delivery of services; these changes can also create opportunities. One way to develop control programs and allocate resources is to review temporal changes and forecast future values.

References

1. Nasehi M, Mirhaghani L. National guidelines for TB control. Iran Ministry of Health; 2009. 1 p.

2. Global tuberculosis report 2018. 2018. Available from: http://www.who.int/gho/tb/en/index.html.

3. Pinto VN. Bioterrorism: Health sector alertness. J Nat Sci Biol Med. 2013;4(1):24-8. [PubMed ID: 23633831]. [PubMed Central ID: PMC3633289]. https://doi.org/10.4103/0976-9668.107256.

4. Navin TR, McNabb SJ, Crawford JT. The continued threat of tuberculosis. Emerg Infect Dis. 2002;8(11):1187. [PubMed ID: 12453340]. [PubMed Central ID: PMC2738542]. https://doi.org/10.3201/eid0811.020468.

5. Hassan Zadeh J, Nasehi M, Rezaianzadeh A, Tabatabaee H, Rajaeifard A, Ghaderi E. Pattern of reported tuberculosis cases in Iran 2009-2010. Iran J Public Health. 2013;42(1):72-8. [PubMed ID: 23514975]. [PubMed Central ID: PMC3595631].

6. Jimma W, Ghazisaeedi M, Shahmoradi L, Abdurahman AA, Kalhori SRN, Nasehi M, et al. Prevalence of and risk factors for multidrug-resistant tuberculosis in Iran and its neighboring countries: Systematic review and meta-analysis. Rev Soc Bras Med Trop. 2017;50(3):287-95. [PubMed ID: 28700044]. https://doi.org/10.1590/0037-8682-0002-2017.

7. Moosazadeh M, Nasehi M, Bahrampour A, Khanjani N, Sharafi S, Ahmadi S. Forecasting tuberculosis incidence in Iran using box-jenkins models. Iran Red Crescent Med J. 2014;16(5):e11779. [PubMed ID: 25031852]. [PubMed Central ID: PMC4082512]. https://doi.org/10.5812/ircmj.11779.

8. Kilicman A, Atiqah Mohd Roslan U. Tuberculosis in the Terengganu region: Forecast and data analysis. ScienceAsia. 2009;35(4):392. https://doi.org/10.2306/scienceasia1513-1874.2009.35.392.

9. Abdullah S, Sapii N, Dir S, Jalal TMT. Application of univariate forecasting models of tuberculosis cases in Kelantan. International Conference on Statistics in Science Business and Engineering (ICSSBE). 2012. p. 1-7.

10. StataCorp. Stata timeseries reference manual release 13. Stata Press Publication. 13th ed. 2013.

11. Becketti S. Introduction to time series using Stata. Stata Press College Station, TX; 2013.

12. Hyndman RJ, Athanasopoulos G. Measuring forecast accuracy. Business forecasting: Practical problems and solutions. John Wiley & Sons; 2015. p. 177-84.

13. Hertzberg G. The infectiousness of human tuberculosis. An epidemiological investigation. Acta Tuberculosea Scandinavica Supplementum. 1957;2(38).

14. Moosazadeh M, Khanjani N, Nasehi M, Bahrampour A. Predicting the incidence of smear positive tuberculosis cases in Iran using time series analysis. Iran J Public Health. 2015;44(11):1526-34. [PubMed ID: 26744711]. [PubMed Central ID: PMC4703233].

15. Moosazadeh M, Khanjani N, Bahrampour A, Nasehi M. Does tuberculosis have a seasonal pattern among migrant population entering Iran? Int J Health Policy Manag. 2014;2(4):181-5. [PubMed ID: 24847484]. [PubMed Central ID: PMC4025095]. https://doi.org/10.15171/ijhpm.2014.43.

16. Geiter L. Tuberculosis elimination and the changing role of tuberculosis control programs. Ending neglect: The elimination of tuberculosis in the United States. National Academies Press (US); 2000.

17. Das S, Kataria VK. Bioterrorism: A public health perspective. Med J Armed Forces India. 2010;66(3):255-60. [PubMed ID: 27408313]. [PubMed Central ID: PMC4921253]. https://doi.org/10.1016/S0377-1237(10)80051-6.