
Table 2 Issues with data quality and availability

From: The role of models in the covid-19 pandemic

Issue

Description

Resolution

Often data were available only at broad, summary scales, whereas some models required more detailed, individual-level information. For example, the daily number of infections was a standard measure for tracking the status of the epidemic, but for a long time it was available (i) only at the country level and (ii) with no age breakdown. Hence it carried only limited information for modeling the effects of social contact.

Integration

Data came from many different sources, including the Ministry of Health, hospitals, the health funds, the Central Bureau of Statistics, and Ben Gurion Airport. Integrating the data was challenging, especially in the early stages of the epidemic. Legal restrictions also limit the ability to combine these data for use in modeling.

Uniformity

The use of data from diverse sources also highlighted the need for uniformity in recording and reporting. For example, hospitals and the Ministry of Health were not always synchronized with regard to defining which patients should be counted as “severely ill” COVID-19 cases.

Quality

Are the data accurate and reliable? When they are combined across different sources or time periods, are they uniform? For example, a change in the definition of what constitutes a “severely ill” COVID-19 patient can have a dramatic impact if no adjustment is made in the models. Similarly, reported data on infection rates and the fraction of positive tests are affected by the false positive and false negative rates of the testing protocols.

Completeness

What essential features are missing from the data? For example, the day of exposure to the virus is relevant for some of the models, but was generally not provided.

Temporal Relevance

Many models focused on “nowcasting”. Effective use of these models requires rapid data availability. The necessary data were not always immediately obtainable.

Chronology

Time-course data were important for many models. For these models, it is essential to know and to correctly model the typical time lags, for example from exposure to infection, from infection to recovery, or from infection to hospitalization, severe illness, and death.

Cohort Relevance

Some models involved data borrowed from other settings, for example, social contact data from the pre-COVID-19 period, or data on infection rates or disease severity from other countries. Assessments were needed to determine whether these data could safely be used to drive models for Israel.
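
The Quality entry in the table notes that the reported fraction of positive tests is affected by the false positive and false negative rates of the testing protocol. One standard adjustment for this is the Rogan-Gladen estimator, sketched below. It is an illustration only and is not taken from the source: the function name and the sensitivity and specificity values are assumptions.

```python
def rogan_gladen(apparent_positive_fraction: float,
                 sensitivity: float,
                 specificity: float) -> float:
    """Estimate the true positive fraction from the observed one.

    sensitivity = 1 - false negative rate
    specificity = 1 - false positive rate
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        # The test is no better than chance, so no correction is possible.
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_positive_fraction + specificity - 1.0) / denom
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical values: 10% of tests positive, 80% sensitivity, 98% specificity.
corrected = rogan_gladen(0.10, sensitivity=0.80, specificity=0.98)
print(f"corrected positive fraction: {corrected:.4f}")
```

If the definition of a case or the testing protocol changes over time, the sensitivity and specificity would need to be updated for each period before the series is passed to a model.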