Methods

Detailed information about how we collect, process, and analyze dengue case data to provide near real-time estimates of the current global dengue situation.

Data Collection

Real-Time Data Collection

We collect near real-time dengue case data for each country from several WHO global and regional dashboards: the WHO Global Dengue Surveillance Dashboard, the WHO South-East Asia Region Dengue Dashboard, and the Pan American Health Organization PLISA database. Automated data scraping tools visit each dashboard daily to check for new updates since the last download and store the data in a structured format within the relevant data repository. The scraping tools for each dashboard, along with all downloaded data, are publicly available here:

WHO Global Dengue Surveillance Dashboard:
https://github.com/DengueGlobalObservatory/WHOGlobal-crawler

WHO South-East Asia Region Dengue Dashboard:
https://github.com/DengueGlobalObservatory/SEARO-crawler

PAHO PLISA Pan American National Dengue Data:
https://github.com/DengueGlobalObservatory/PAHO-crawler

Data from the regional (PAHO, SEARO) and global (WHO) databases are combined to create a single data frame for the current year. For countries included in both the regional and global databases, the regional database is prioritised.
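The prioritisation rule above can be sketched in a few lines of Python. This is an illustration only: the dictionary layout, the function name `combine_sources`, and the example figures are hypothetical, and the sketch assumes priority is applied per country, so every global row for a country covered regionally is dropped.

```python
def combine_sources(regional_rows, global_rows):
    """Merge two sources, letting the regional source win for shared countries.

    Both arguments map (country_code, month) -> case count. This layout is
    an assumption made for illustration, not the project's actual schema.
    """
    regional_countries = {country for country, _ in regional_rows}
    combined = dict(regional_rows)
    for (country, month), cases in global_rows.items():
        # Skip every global row for a country the regional source covers.
        if country not in regional_countries:
            combined[(country, month)] = cases
    return combined


# Made-up example values.
regional_rows = {("BRA", 1): 120, ("BRA", 2): 150}
global_rows = {("BRA", 1): 100, ("BRA", 3): 90, ("KEN", 1): 40}
combined = combine_sources(regional_rows, global_rows)
```

Here Brazil is taken entirely from the regional source, while Kenya, which appears only in the global source, is kept from there.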

Historical Data

Our predictions are based on the OpenDengue project, which compiles dengue case data reported by national health authorities, international organisations such as the World Health Organization, and the literature. This global database provides a long-term view of dengue patterns across many regions and years. To ensure full coverage and consistency, we use a gap-filled version of the OpenDengue data (https://github.com/ahyoung-lim/OD_gap_filling_public). In this version, statistical models based on typical seasonal patterns fill in missing months and adjust gaps in reported data, giving estimated monthly dengue cases for 143 countries from 1990 to 2024. These harmonised data form the foundation of our real-time nowcasting system and of our evaluation of the current season against historical years.

Backfilling

Surveillance systems, such as mandatory passive reporting systems used to track dengue, are extensive networks that involve point-of-care testing, laboratory facilities, local and regional public health agencies, national public health authorities, and international cooperation. This complexity means that identifying cases and incorporating them into a multinational data system involves multiple people across various agencies, and the process can take anywhere from days to months. Because of this, to obtain an accurate estimate of current and recent cases, we need to account for cases that have not yet been reported. This process is known as backfilling. Currently, we limit our backfilling to the nations included in the Americas (PAHO), due to data availability. We are working to provide similar backfilling for other regions.

Estimating the reporting factor of American Nations

For American nations, reporting factors were estimated empirically from the PAHO DENV cases dashboard on the PLISA Health Information Platform for the Americas. The dashboard was downloaded weekly from June 2022 to April 2024, and DENV case data for the same period were downloaded again in July 2025. From these data, the ratio of under- or over-reporting of cases (the reporting factor) for each country at each reporting delay was estimated using the following equation:

\[\mathbf{f}_{c,d} = \frac{1}{T} \sum_{t=1}^{T} \frac{N_{t,c,d}}{V_{t,c}}\]

where N is the number of DENV cases reported for a given country (c) and epiweek (t) at a particular delay (d), V is the validated count of DENV cases for that country (c) and epiweek (t), and f is the reporting factor for a given country (c) and delay (d), averaged over epiweeks. T is the total number of observations recorded.
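As a rough illustration of the averaging described above, the sketch below computes the mean ratio of reported to validated counts for each country and delay. The tuple layout, the function name `reporting_factors`, and the numbers are hypothetical. Skipping epiweeks with a validated count of zero is an assumption, since the text does not say how such weeks are handled.

```python
from collections import defaultdict


def reporting_factors(observations):
    """Average N / V per (country, delay).

    observations: iterable of (country, delay_weeks, reported, validated),
    one tuple per epiweek snapshot. The layout is illustrative only.
    """
    ratios = defaultdict(list)
    for country, delay, reported, validated in observations:
        # Assumption: weeks with no validated cases are skipped to avoid
        # dividing by zero; the source does not describe this case.
        if validated > 0:
            ratios[(country, delay)].append(reported / validated)
    return {key: sum(vals) / len(vals) for key, vals in ratios.items()}


# Made-up snapshots: two epiweeks at a 1-week delay, one at a 2-week delay.
observations = [
    ("BRA", 1, 50, 100),
    ("BRA", 1, 75, 100),
    ("BRA", 2, 90, 100),
]
factors = reporting_factors(observations)
```

With these inputs the factor at a 1-week delay is the mean of 0.50 and 0.75, and the factor at a 2-week delay is 0.90.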

Correcting DENV cases

For PAHO nations, the reported case count was multiplied by the average reporting factor for the corresponding country and delay. Four nations (Belize, Dominica, Barbados, and Paraguay) did not undergo this correction due to the complex nature of reporting factors in those countries. These corrections resulted in small differences in the monthly and overall case counts.

This process will soon be available and applied in other regions.

Defining the Dengue Season

Because dengue is seasonal and strongly affected by temperature and precipitation, countries in different regions experience their peak dengue season at different times. This affects critical dynamics of the disease when considering nowcasting or forecasting. In our work, we share the data and our results in calendar years but rely on an understanding of the dengue season for nowcasting.

For each country, the dengue season starts in the month with the lowest average case counts and runs for 12 months. For example, if April has the lowest average case load, it marks the beginning of the season, and March of the following year marks the end. Using this alternative window helps evaluate the average dengue season when peak months span the new year. In such cases, analysing by calendar year means that small shifts in the timing of peak months can move a large share of cases from one year to the next, which may distort the results. Aligning data with the dengue season instead of the calendar year mitigates this by defining a time frame in which peak months do not straddle consecutive seasons.
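A minimal sketch of the season definition follows, assuming monthly counts are held in a dictionary keyed by (year, month). The function names and numbers are hypothetical. Ties between months are broken in favour of the earliest calendar month, which is an assumption rather than something the text specifies.

```python
def season_start_month(monthly_cases):
    """Return the calendar month (1-12) with the lowest average case count.

    monthly_cases maps (year, month) -> cases; this layout is illustrative.
    """
    by_month = {m: [] for m in range(1, 13)}
    for (_, month), cases in monthly_cases.items():
        by_month[month].append(cases)
    means = {m: sum(v) / len(v) for m, v in by_month.items() if v}
    # min() returns the first minimum, so ties go to the earliest month.
    return min(means, key=means.get)


def season_label(year, month, start_month):
    """Year in which the season containing (year, month) began."""
    return year if month >= start_month else year - 1


def season_month_index(month, start_month):
    """1-based position of a calendar month within the season."""
    return (month - start_month) % 12 + 1


# Made-up history in which April has the lowest counts in both years.
base = [80, 60, 30, 10, 20, 40, 70, 90, 120, 150, 130, 100]
history = {}
for year, scale in ((2022, 1), (2023, 2)):
    for month, cases in enumerate(base, start=1):
        history[(year, month)] = cases * scale

start = season_start_month(history)
```

With April as the start month, March 2023 is the twelfth month of the season that began in 2022, and April 2023 opens the next season.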

Nowcasting

Using the season-aligned data, we define an average seasonal profile. First, monthly data are normalised by dividing the number of cases observed in each month by the total across the entire season. After normalisation, each monthly value lies between 0 and 1, and the values for a season sum to 1.

\[\text{Monthly proportion of cases} = \frac{\text{Monthly cases}}{\text{Total season cases}}\]

Once normalised, the typical season is characterised by calculating the average proportion of cases observed in each month. This typical season helps to fill data gaps. The total number of cases expected for that season can be estimated by combining the cases observed so far with the expected proportion.

\[\text{Expected total seasonal cases} = \frac{\text{Cases observed to date}}{\text{Expected proportion to date}}\]

Once the total number of cases expected for that season has been calculated, the monthly proportions of the average seasonal profile can be used to estimate expected cases for months without data so far.

\[\text{Predicted monthly cases} = \text{Expected total seasonal cases} \times \text{Monthly proportion}\]

This method is used here for nowcasting, the process of filling data gaps up to the present date.
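The three formulas above can be chained as in the following sketch. It assumes each season is a list of 12 monthly counts already aligned to the dengue season. The function names and example numbers are hypothetical and are not taken from the project code.

```python
def seasonal_profile(seasons):
    """Mean proportion of the seasonal total falling in each season month."""
    proportions = []
    for season in seasons:
        total = sum(season)
        if total > 0:  # a season with no cases has no defined proportions
            proportions.append([cases / total for cases in season])
    n = len(proportions)
    return [sum(p[m] for p in proportions) / n for m in range(12)]


def nowcast(observed_to_date, profile):
    """Estimate the seasonal total and the cases in months not yet observed.

    observed_to_date holds counts for the first k months of the season.
    """
    k = len(observed_to_date)
    expected_share = sum(profile[:k])
    if expected_share <= 0:
        raise ValueError("expected proportion to date must be positive")
    expected_total = sum(observed_to_date) / expected_share
    predicted = [expected_total * p for p in profile[k:]]
    return expected_total, predicted


# Two made-up historical seasons with all cases in the first four months.
seasons = [
    [5, 15, 30, 50] + [0] * 8,   # proportions 0.05, 0.15, 0.30, 0.50
    [30, 50, 60, 60] + [0] * 8,  # proportions 0.15, 0.25, 0.30, 0.30
]
profile = seasonal_profile(seasons)  # mean: 0.10, 0.20, 0.30, 0.40, 0, ...
expected_total, predicted = nowcast([20, 40], profile)
```

In this example 60 cases have been observed in months that typically hold 30% of the season, so the expected seasonal total is 200, and the next two months are estimated at 60 and 80 cases.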

Uncertainty around nowcasted months

We quantified uncertainty for the proportion-based nowcast using empirically calibrated prediction intervals derived from a retrospective leave-one-season-out validation. For each country with at least three complete seasons, we iteratively withheld one season, estimated the mean monthly and cumulative seasonal proportions from the remaining seasons, and applied the operational nowcasting algorithm at each information cutoff k (last observed season month, k = 1, …, 11). For each withheld season and cutoff we then compared predicted and observed monthly case counts for all subsequent months.
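The validation loop described above might look roughly like the sketch below. It assumes each season is a list of 12 season-aligned monthly counts, and it reimplements the proportion-based nowcast inline so that the block is self-contained. The function names and return layout are hypothetical.

```python
def mean_profile(seasons):
    """Mean monthly proportion of the seasonal total across seasons."""
    props = [[c / sum(s) for c in s] for s in seasons if sum(s) > 0]
    return [sum(p[m] for p in props) / len(props) for m in range(12)]


def leave_one_season_out(seasons):
    """Return (held_out_index, cutoff_k, season_month, predicted, observed)."""
    if len(seasons) < 3:  # the text requires at least three complete seasons
        return []
    rows = []
    for i, held_out in enumerate(seasons):
        training = seasons[:i] + seasons[i + 1:]
        profile = mean_profile(training)
        for k in range(1, 12):  # cutoff: last observed season month
            share = sum(profile[:k])
            if share <= 0:
                continue  # assumption: skip cutoffs with no expected cases
            expected_total = sum(held_out[:k]) / share
            for m in range(k, 12):  # all subsequent season months
                rows.append(
                    (i, k, m + 1, expected_total * profile[m], held_out[m])
                )
    return rows


# Made-up seasons sharing one shape at different magnitudes.
base = list(range(1, 13))
seasons = [base, [2 * c for c in base], [3 * c for c in base]]
rows = leave_one_season_out(seasons)
```

Because the three example seasons have identical shapes, every prediction matches the withheld observation. Real seasons differ in shape, and those differences are what the residuals capture.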

Forecast errors were summarised as signed relative residuals, defined as the difference between the predicted and observed count divided by the observed count, and computed only when the observed count was greater than zero. We estimated the 2.5th, 25th, 75th, and 97.5th percentiles of these relative residuals separately for each country, cutoff month, and prediction month. Strata with fewer than five non-missing residuals were excluded from the operational lookup table to avoid unstable quantile estimates. Region- and global-level residual quantiles were computed in the same way for diagnostic use.
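The residual summary can be illustrated as follows. The quantile function uses linear interpolation between order statistics, which is an assumption, since the text does not name the quantile estimator. All function names and example values are hypothetical.

```python
def relative_residual(predicted, observed):
    """Signed relative residual, or None when the observed count is not > 0."""
    if observed is None or observed <= 0:
        return None
    return (predicted - observed) / observed


def quantile(sorted_values, q):
    """Quantile by linear interpolation (assumed method, not from the source)."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def residual_quantiles(residuals, min_n=5):
    """Return the four stored percentiles, or None for a sparse stratum."""
    values = sorted(r for r in residuals if r is not None)
    if len(values) < min_n:
        return None  # fewer than five residuals: excluded from the lookup
    return {q: quantile(values, q) for q in (0.025, 0.25, 0.75, 0.975)}


# Made-up residuals for one country, cutoff month, and prediction month.
stratum = [-0.5, -0.25, 0.0, 0.25, 0.5]
quantiles = residual_quantiles(stratum)
```

A stratum with four or fewer usable residuals returns `None`, mirroring the exclusion rule for unstable quantile estimates.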

At deployment, intervals around a point forecast Ĉ are constructed multiplicatively from the stored quantiles: the lower limit is max(0, Ĉ(1 + q_α)) and the upper limit is max(0, Ĉ(1 + q_1−α)), yielding 50% intervals from the 25th and 75th residual percentiles and 95% intervals from the 2.5th and 97.5th percentiles. This approach anchors uncertainty to historical forecast error for the same country and timing within the season rather than assuming a parametric error model.
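The interval construction is a direct transcription of the two expressions above. In this sketch the function name and the example quantiles are hypothetical.

```python
def prediction_interval(point_forecast, q_low, q_high):
    """Multiplicative interval around a point forecast, floored at zero.

    q_low and q_high are stored residual quantiles, for example the 2.5th
    and 97.5th percentiles for a 95% interval.
    """
    lower = max(0.0, point_forecast * (1 + q_low))
    upper = max(0.0, point_forecast * (1 + q_high))
    return lower, upper


# Made-up example: a forecast of 200 cases with quantiles -0.4 and +0.6.
lower, upper = prediction_interval(200, -0.4, 0.6)

# A quantile below -1 would give a negative limit, so it is floored at zero.
floored_lower, _ = prediction_interval(200, -1.5, 0.6)
```

In the first call the limits come out at about 120 and 320 cases. In the second call the lower limit is clipped to 0.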

On country pages we display the 95% interval as whiskers on estimated months of the current-year time series. When a country–cutoff–prediction-month cell has fewer than five validation residuals, the dashboard falls back to the corresponding region-level quantiles, and then to global quantiles, so that all estimated months still carry an interval. Coverage statistics reported in the validation summary are computed from the country-only lookup.
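The fallback order can be sketched as a chain of dictionary lookups. The table layouts, key shapes, function name, and quantile values below are all hypothetical. The sketch assumes a sparse cell is simply absent from its table.

```python
def lookup_quantiles(country_table, region_table, global_table,
                     country, region, cutoff, target_month):
    """Return (level, quantiles) from the most specific table with an entry.

    Country and region tables are keyed by (name, cutoff, target_month) and
    the global table by (cutoff, target_month). Layouts are illustrative.
    """
    candidates = (
        ("country", country_table, (country, cutoff, target_month)),
        ("region", region_table, (region, cutoff, target_month)),
        ("global", global_table, (cutoff, target_month)),
    )
    for level, table, key in candidates:
        entry = table.get(key)
        if entry is not None:
            return level, entry
    return None, None


# Made-up lookup tables holding only the 95% interval quantiles.
country_table = {("BRA", 3, 5): {0.025: -0.4, 0.975: 0.6}}
region_table = {("Americas", 3, 6): {0.025: -0.5, 0.975: 0.9}}
global_table = {(3, 7): {0.025: -0.7, 0.975: 1.5}}

tables = (country_table, region_table, global_table)
level_a, _ = lookup_quantiles(*tables, "BRA", "Americas", 3, 5)
level_b, _ = lookup_quantiles(*tables, "BRA", "Americas", 3, 6)
level_c, q_c = lookup_quantiles(*tables, "BRA", "Americas", 3, 7)
```

Returning the level alongside the quantiles makes it easy to record which tier supplied each displayed interval.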