Case Study: Synthesizing Data Sources to Improve Disease Mapping
This post summarizes findings from our study developing new methods for TB mapping, conducted in partnership with the Uganda National TB and Leprosy Control Program (NTLP). You can read the full open-access article here.
Identifying spatial variation in tuberculosis (TB) burden can help national tuberculosis programs allocate resources effectively to reach and treat all people with TB. However, data limitations make subnational TB burden estimation challenging. Many countries have two nationwide data sources on TB burden:
National TB prevalence surveys: These surveys use a well-defined screening process to identify TB in a population. Although they offer the best epidemiological evidence, prevalence surveys are conducted infrequently and do not typically have a large enough sample size to estimate TB prevalence in small geographic areas.
Case notifications: TB diagnoses recorded in a nationwide health information system can be aggregated to estimate TB incidence in particular years and districts. However, notified diagnoses do not fully capture true TB incidence because of limited access to health care and differing diagnostic standards across facilities.
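As a minimal sketch of the aggregation step described above, the Python below turns individual notification records into a case notification rate per 100,000 population for each district-year. The record layout (one `(district, year)` tuple per notified case), the population table, and the function name `notification_rates` are hypothetical illustrations, not the schema of Uganda's health management information system.

```python
from collections import Counter


def notification_rates(diagnoses, population):
    """Return notified TB cases per 100,000 population for each district-year.

    Hypothetical inputs (illustrative only):
      diagnoses:  iterable of (district, year) tuples, one per notified case
      population: dict mapping (district, year) -> population size
    District-years with no recorded diagnoses get a rate of 0.0.
    """
    counts = Counter(diagnoses)  # number of notified cases per (district, year)
    return {
        key: 100_000 * counts.get(key, 0) / pop
        for key, pop in population.items()
    }


# Hypothetical example: one district observed in two years.
cases = [("District A", 2016)] * 30 + [("District A", 2019)] * 45
pop = {("District A", 2016): 20_000, ("District A", 2019): 25_000}
print(notification_rates(cases, pop))
# {('District A', 2016): 150.0, ('District A', 2019): 180.0}
```

A rate computed this way measures notifications, not incidence; the gap between the two is the notification incompleteness that the model is designed to estimate.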
The first key data source: TB prevalence point estimates from the 2014-2015 National Tuberculosis Prevalence Survey, aggregated by Ugandan district. Notice that not all districts were sampled in the prevalence survey.
The second key data source: the TB case notification rate by district in Uganda from 2016 through 2019, collected by the national health management information system. In most districts, the TB case notification rate rose from 2016 to 2019, reflecting greater data completeness.
Methods
Henry Spatial Analysis worked with the Uganda National Tuberculosis and Leprosy Control Program (NTLP) to develop a new spatial modeling framework that synthesizes these two data sources, improving estimates of TB burden and notification completeness across Ugandan districts.
We link the two data sources using rigorously tested assumptions about the relationship between case notification completeness and the duration of active TB, which in turn links TB prevalence to incidence. The TB incidence surface is also stabilized using a small-area model with five geospatial covariates: household crowding, nighttime lights (a proxy for economic activity), HIV prevalence, refugee centers, and cattle per capita. We fully specify this model and describe its underlying assumptions in the journal article.
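One way to see how completeness and duration connect prevalence to incidence is a steady-state sketch for a single district, where prevalence is roughly incidence times the average duration of active TB. The Python below assumes (our illustrative assumption, not the model specified in the journal article) that notified cases stay active for a shorter time than unnotified cases, so the average duration is a completeness-weighted mix of the two. Given a prevalence estimate and a notification rate, the two equations can then be solved for incidence and completeness. The function name and the durations in the example are hypothetical placeholders, and the full model pools information across districts with a small-area model and covariates rather than solving each district in isolation.

```python
def incidence_and_completeness(prevalence, notification_rate,
                               duration_notified, duration_unnotified):
    """Solve a two-equation steady-state system for one district (hypothetical).

    Assumed relations, with rates per 100,000 and durations in years:
      notification_rate = incidence * completeness
      prevalence = incidence * (completeness * duration_notified
                                + (1 - completeness) * duration_unnotified)
    Substituting incidence = notification_rate / completeness and solving gives
      completeness = N * D_u / (N * D_u + P - N * D_n).
    """
    n, p = notification_rate, prevalence
    if n <= 0 or duration_unnotified <= 0:
        raise ValueError("notification rate and unnotified duration must be positive")
    if p < n * duration_notified:
        # Completeness would exceed 1: prevalence is too low for these notifications.
        raise ValueError("inputs imply completeness above 100%")
    completeness = n * duration_unnotified / (
        n * duration_unnotified + p - n * duration_notified
    )
    incidence = n / completeness
    return incidence, completeness


# Hypothetical district: 400 prevalent cases and 200 notifications per 100,000,
# assuming notified cases stay active ~0.5 years and unnotified cases ~2 years.
inc, comp = incidence_and_completeness(400, 200, 0.5, 2.0)
print(f"incidence ≈ {inc:.0f} per 100,000 per year, completeness ≈ {comp:.0%}")
# incidence ≈ 350 per 100,000 per year, completeness ≈ 57%
```

The key design point is that notifications alone cannot separate a high-incidence, low-completeness district from a low-incidence, high-completeness one; the prevalence estimate, through the duration assumption, is what breaks that tie.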
Flow chart for our custom TB mapping model, which combines case notifications with TB prevalence survey data to estimate incidence and notification completeness over time. Dark orange boxes indicate the two key outcomes, TB incidence and case notification completeness, estimated by the model. Light orange boxes indicate intermediate outcomes that are used to compare estimated outcomes to data. TB incidence is estimated by district, and case notification completeness is estimated by district and year.
Findings
TB incidence varied more than 10-fold across the districts of Uganda in 2019, ranging from 94 cases per 100,000 in Bukedea District, Eastern Region to 1,313 cases per 100,000 in Kalangala District, Central Region.
District clusters with below-average TB incidence were apparent in the southwest and southeast of Uganda, while districts with above-average TB incidence were concentrated in the center and north of the country.
Estimated incidence of pulmonary TB per 100,000 population by district in Uganda, 2019.
Between 2016 and 2019, the estimated case detection rate (the share of incident TB cases that are notified, i.e., case notification completeness) increased in 109 out of 136 Ugandan districts.
In 2016, fewer than 1 in 10 districts had a case detection rate greater than 70%, while 4 in 10 districts had a case detection rate below 50%. By 2019, over 3 in 10 districts had case detection rates greater than 70%, and fewer than 1 in 10 districts had case detection rates below 50%. This is consistent with NTLP records, which show a 33% increase in case notifications from 2016 to 2019.
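The district-level summaries above reduce to simple counts over the estimated rates, using the 50% and 70% thresholds from the text. A minimal sketch, assuming the estimates are stored as hypothetical dicts mapping district name to case detection rate (as a fraction) for each year; the function name and inputs below are made-up illustrations, not study estimates.

```python
def detection_summary(rates_start, rates_end):
    """Summarize change in case detection rates between two years (hypothetical).

    rates_start, rates_end: dicts mapping district -> case detection rate (0 to 1).
    Returns the number of districts whose rate increased, the number compared,
    and (share below 50%, share above 70%) for each year.
    """
    districts = rates_start.keys() & rates_end.keys()
    n = len(districts)
    if n == 0:
        raise ValueError("no districts in common")
    increased = sum(1 for d in districts if rates_end[d] > rates_start[d])

    def shares(rates):
        below_50 = sum(1 for d in districts if rates[d] < 0.50) / n
        above_70 = sum(1 for d in districts if rates[d] > 0.70) / n
        return below_50, above_70

    return {"increased": increased, "total": n,
            "start": shares(rates_start), "end": shares(rates_end)}


# Made-up rates for four hypothetical districts.
r2016 = {"A": 0.45, "B": 0.60, "C": 0.40, "D": 0.75}
r2019 = {"A": 0.72, "B": 0.68, "C": 0.55, "D": 0.70}
print(detection_summary(r2016, r2019))
# {'increased': 3, 'total': 4, 'start': (0.5, 0.25), 'end': (0.0, 0.25)}
```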
Estimated TB case detection rate by district in Uganda, for the years 2016 (left) and 2019 (right).
We also found that our joint model outperformed standard spatial models for TB burden estimation.
We developed a comparison model using only TB prevalence survey data: this model strongly smoothed towards the national average in unobserved districts and produced much more uncertain estimates. By incorporating case notification data, our model enables greater confidence in the identification of low-burden and high-burden districts across the country. Our model also outperformed the comparison model in out-of-sample validation testing.
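The post does not describe the validation procedure in detail, but the general idea of comparing two models on held-out data can be sketched as follows: hold out some districts' survey estimates, predict them with each model, and compare error and interval coverage. The metric choices (root-mean-square error and interval coverage), the function names, and all numbers below are illustrative assumptions, not the study's validation protocol or results.

```python
import math


def holdout_rmse(predicted, observed):
    """Root-mean-square error over held-out districts present in both dicts."""
    keys = predicted.keys() & observed.keys()
    if not keys:
        raise ValueError("no overlapping held-out districts")
    return math.sqrt(sum((predicted[k] - observed[k]) ** 2 for k in keys) / len(keys))


def interval_coverage(intervals, observed):
    """Share of held-out observations falling inside (lower, upper) intervals."""
    keys = intervals.keys() & observed.keys()
    if not keys:
        raise ValueError("no overlapping held-out districts")
    hits = sum(1 for k in keys if intervals[k][0] <= observed[k] <= intervals[k][1])
    return hits / len(keys)


# Made-up held-out prevalence values (per 100,000) and predictions from two models.
observed = {"A": 310, "B": 490}
joint = {"A": 300, "B": 500}
survey_only = {"A": 420, "B": 420}  # smoothed toward a common average
print(holdout_rmse(joint, observed), holdout_rmse(survey_only, observed))
print(interval_coverage({"A": (250, 350), "B": (420, 580)}, observed))
```

Reporting interval width alongside coverage guards against a model that covers held-out values only because its intervals are very wide.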
Results of a performance comparison between the statistical model presented in this manuscript (left) and an alternative small-area model that does not incorporate data from TB case notifications (right). Districts with model-estimated prevalence over 600 per 100,000
Takeaways
Our study shows that a joint modeling approach, integrating multiple data sources and geospatial covariates, can improve TB incidence estimation for small geographic areas. The approach also produces estimates of case notification completeness, a useful metric for targeting improvements to TB care. It could be applied in many other countries that have one or two TB prevalence surveys and a time series of case notifications.
While this study offered an avenue for improving TB burden estimates using existing data, more firsthand evidence is needed as countries progress towards the End TB Goals:
As the underlying risk factors for TB shift over time, we will need more population-based surveys to ground future TB burden estimates.
Model-based estimates of TB burden should always be interpreted in the context of local epidemiological evidence and expertise.
To learn more about this project, read the full study or get in touch.