The models covered a range of approaches, from regression analysis to stochastic epidemic models. The former are largely empirical models, whereas the latter are “first principle” science-based models. This is a natural basis for distinguishing between models.
There are several advantages to the science-based models.
- They are derived from first principles that mirror scientific understanding of how epidemics spread in a population. One important consequence is that their assumptions are transparent and open to criticism; another is that it is not difficult to introduce modifications that may make them more realistic.
- They have been effectively used in modeling many infectious diseases.
- They have parameters that dictate the dynamics of an epidemic.
- Straightforward modifications allow stratification of the models, say by age groups, although this leads to a substantial increase in the number of parameters.
The major advantage of the empirical models is their simplicity and their focus on the primary task of providing accurate forecasts. The assumptions that drive the science-based models can turn from a strength to an Achilles heel if they prove to be inaccurate. One of the lessons repeatedly learned throughout the COVID-19 pandemic has been the ability of the virus to call into question commonly accepted truths regarding infectious disease spread.
Since the 18th century, when Swiss mathematician and physicist Daniel Bernoulli developed mathematical models to study how variolation could be used to diminish the spread of smallpox, researchers have sought to develop models that can examine and explore the dynamics of infectious disease transmission. In today’s world, and especially during the SARS-CoV-2 pandemic of the last two years, models are the only means of predicting disease spread and thus are essential for national and international decision-making. For further discussion about the need for mathematical models in epidemiology, see [2, 3].
The models can be roughly divided into three categories:
1. Fully empirical models, i.e., regression, machine learning and deep learning. These statistical models are very powerful tools which use known data to predict the future, and can also accommodate large amounts of data. The main drawback to statistical models in COVID-19 is their inability to predict new and future confirmed cases in the presence of changing conditions, for example a scenario of a new variant or the immunization of the population.
However, for predicting the clinical outcome of a COVID-19 patient following infection, statistical models are important, since disease progress depends heavily on the patient’s health status and medical condition.
2. Mathematical models for population disease spread. These models use a set of coupled differential equations to predict the spread of disease. The SIR methodology [4] and its refinements, such as the SEIR model [5], are the outstanding examples of this class. They have been the dominant approach in the scientific literature for studying infectious diseases and were applied by several of the modelists. SEIR stands for “Susceptible, Exposed, Infected, Removed” which serve to decompose the population. The model describes an epidemic via movement of the population from one compartment to another. Individuals who are susceptible become exposed, then become infected, and finally are removed from the population. Removal can be either by cure or by death. Transitions from one state to another are governed by rate equations and resulting sojourn time distributions. The former describe the rate at which individuals move from one compartment to another, the latter the length of time they remain in a state before moving.
The single most important parameter in these models is the reproduction number, R, which is the average number of susceptible individuals that will be infected by a newly infected individual, and which can be estimated from data on the population counts in each compartment. The reproduction number became a mainstay of monitoring and reporting throughout the epidemic. From the onset of the pandemic, all Israelis became familiar with the idea that hearing R > 1 on the evening news was a sign that things were getting worse.
See [6, 7] for more detailed information and a review of applications of the SIR family to modeling and forecasting annual influenza outbreaks.
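The compartment structure described above can be illustrated with a minimal sketch. It assumes the textbook SEIR equations with constant rates and a closed population, integrated with simple Euler steps. The function name `simulate_seir`, the step size, and all parameter values are illustrative assumptions and are not taken from any of the models discussed in this paper.

```python
def simulate_seir(beta, sigma, gamma, initial, days, dt=0.1):
    """Integrate the basic SEIR equations with forward Euler steps.

    dS/dt = -beta*S*I/N          dE/dt = beta*S*I/N - sigma*E
    dI/dt = sigma*E - gamma*I    dR/dt = gamma*I
    `initial` is the tuple (S, E, I, R) on day 0.
    """
    s, e, i, r = (float(v) for v in initial)
    n = s + e + i + r
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infected = sigma * e * dt         # E -> I
            new_removed = gamma * i * dt          # I -> R
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Illustrative values only (assumed, not fitted to any data).
beta, sigma, gamma = 0.4, 1 / 4, 1 / 7
basic_reproduction_number = beta / gamma  # valid for this simple model only
history = simulate_seir(beta, sigma, gamma, initial=(9_990, 0, 10, 0), days=200)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
```

In this simplified setting the reproduction number at the start of the outbreak is `beta / gamma`, and the mean sojourn times in the exposed and infected compartments are `1 / sigma` and `1 / gamma` days. Age-stratified versions replace each compartment with one per age group, which is where the number of parameters grows.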
The modelists who remained at the empirical end of the spectrum emphasized regression techniques. These models link an outcome \({Y_t}\) on day \(t\) to \(k\) predictor variables \({X_{1,t}},{X_{2,t}}, \ldots ,{X_{k,t}}\) that might be relevant for predicting the value of \({Y_t}\). Often the predictors were the outcome itself, or other related variables, recorded on earlier days.
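As a minimal sketch of this regression setup, consider the simplest case of a single predictor, the outcome on the previous day, fitted by ordinary least squares. The function names `fit_lag1` and `forecast` and the synthetic data are assumptions made for illustration; the regressions actually used by the modelists involved more predictors and are not reproduced here.

```python
def fit_lag1(series):
    """Ordinary least squares fit of Y_t = a + b * Y_{t-1}."""
    x = series[:-1]   # predictor: outcome on the previous day
    y = series[1:]    # response: outcome on the current day
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast(series, a, b, horizon):
    """Iterate the fitted relation forward from the last observed value."""
    predictions = []
    last = series[-1]
    for _ in range(horizon):
        last = a + b * last
        predictions.append(last)
    return predictions


# Synthetic daily counts growing 10% per day (made-up data for illustration).
counts = [100 * 1.1 ** t for t in range(14)]
a, b = fit_lag1(counts)
next_week = forecast(counts, a, b, horizon=7)
```

Because the forecast simply extends the fitted relation, it inherits whatever growth pattern was present in the training window, which is one way to see why such models struggle when conditions change.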
3. Agent-based (or particle) models. These are mathematical models that operate at the level of the individual person rather than the population as a whole. One of the disadvantages of population-level mathematical models is their inability to model system dynamics, in particular when various population subgroups are characterized by different dynamics. In principle, this can be reflected by modeling the population as a sum of the sub-populations, with each one characterized by features unique to it, such as age, dynamics, etc. Since non-pharmaceutical interventions (NPIs) play an essential role in controlling COVID-19, a geography-based model is necessary because these NPIs vary across countries. Also, a successful model should divide the population into several age groups, matching their varying patterns of social interactions. Making the model realistic leads to a large number of groups, though, and would require writing a different equation for each, which complicates the model greatly.
One effective way to overcome these complications is to use instead an agent-based model which represents every single person by a unique “particle”. This leads to a very granular model, but with easy-to-understand rules governing social interaction for each individual, based on the subgroup of the population to which the individual belongs. Hence, microsimulation modeling comes into play in which we have a high degree of heterogeneity, with multiple individuals, each behaving differently. The modeling and simulation proceed by allowing all the individuals to behave and interact according to these rules. Then the resulting macroscopic impact on society is observed.
Other modeling approaches were also used. Some modelists took a translational approach, using science-based models developed with related settings in mind and demonstrating that they could be effectively applied to COVID-19 data. Others were at more of a middle ground on the empirical-mechanistic axis, using process analysis to decompose the route from predictors to outcomes into more detailed steps and then applying empirical analysis to these building blocks.
In this paper, we show how all three types of models have been used in Israel by different research groups to model the spread of the coronavirus under various constraints (NPIs, effective vaccines, etc.) and the clinical course of COVID-19, e.g., predicting a patient’s health status during the period after infection or hospitalization.
The models in practice.
The SIR/SEIR family was used directly both by Gazit and his partners at the Hebrew University and by Huppert and the team at the Gertner Institute. Both groups used the age-stratified refinement of the model. Gazit’s group also used a model developed by De-Leon and Pederiva [8], which is described below. They used the models to produce accurate predictions of infections, severe illness, and mortality, both for the short term and for periods extending to 5–6 weeks ahead. The team also developed a method for estimating R using only very recent data, adding valuable temporal relevance to the estimates. An important contribution of the group to the Israeli cabinet deliberations was their use of the models to assess the effects of policy interventions. In December 2020, taking account of international data relating quarantines and lockdowns to reduced infection and severe illness, they quantified the effect of such restrictions on near-term impact for Israel and compared alternative times for their implementation. Similar analyses were used to predict the effects of the vaccination campaign [9]. These changes highlight the role these measures had in tempering the impact from widespread infection. The omicron variant, which began to dominate infections in December 2021, was both more infectious and less severe than the previous variants. Both of these properties are essential ingredients of good predictions and thus posed new challenges. The delayed onset of the omicron wave in Israel, due to limiting entry to Israel at Ben Gurion Airport, made it possible to adjust the models by incorporating data on omicron from other countries with earlier initiation times. The resulting model-based predictions played a role in the decision to avoid implementing another lockdown in January 2022.
Huppert and the group from Gertner also found that the SIR/SEIR models produced accurate short-term forecasts. Their age-stratified model required as input both stratified infection counts and social contact data for each pair of age groups. The former came from the Ministry of Health, but there was no official source for the latter. The team used Google mobility indices to fill in the gap. After Israel commenced its vaccination campaign in late December 2020, vaccination status was added as a further stratifying variable. The models further assumed that infection counts would follow Poisson distributions about their expected values. This assumption was borne out in the data and gave good fits. The models adapted well to the onset of the omicron variant. By the end of the first week of January 2022, early in the omicron wave, the model provided accurate forecasts of how the infection counts would increase and when they would peak.
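To illustrate how a Poisson assumption of this kind can enter model fitting, the following sketch computes the log-likelihood of observed daily counts given the expected counts produced by a model. The function name and the numbers are hypothetical, and the sketch does not reproduce the fitting procedure used by the Gertner group.

```python
import math


def poisson_log_likelihood(observed, expected):
    """Sum of log Poisson probabilities of observed counts given model means."""
    total = 0.0
    for y, mu in zip(observed, expected):
        # log P(Y = y) = y*log(mu) - mu - log(y!)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


observed = [12, 18, 25, 31]             # made-up daily counts
good_model = [12.0, 18.0, 25.0, 31.0]   # expected values equal to the data
poor_model = [20.0, 20.0, 20.0, 20.0]   # flat expected values

ll_good = poisson_log_likelihood(observed, good_model)
ll_poor = poisson_log_likelihood(observed, poor_model)
```

Parameter values that give a higher log-likelihood are better supported by the data under the Poisson assumption, so a quantity of this kind can serve as the objective when calibrating model parameters.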
De-Leon and her colleagues at the University of Trento derived an innovative model inspired by physics. This novel approach uses basic principles of statistical physics, in the spirit of Monte Carlo algorithms, to define an “agent-based” model, in which each individual in the population is explicitly represented. This class of models has a rich history; see for example [10, 11]. De-Leon’s model treats individuals as “particles” in a physical system, with the probability of disease transition a function of the distance between the associated particles. Social mobility is reflected in the model by a parameter that governs motion of the particles within the domain of the system. For details, see [8, 12]. Although agent-based models are developed at the “micro” level of individuals, it is common to assess their value by their ability to mimic macro-level behavior. Here this meant comparisons of the model predictions to observed infection patterns. De-Leon reported close tracking to observed Israeli and UK data through all the early stages of the epidemic. The model also adapted well to the effects of the vaccination campaign and the waning effect of the vaccine after about 5 months [12, 13]. The particles can be divided into many sub-populations, making stratification easy to include [14].
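The general idea of a particle-style agent-based model can be conveyed by a toy sketch. It is not the model of [8, 12]: the exponential distance kernel, the random-walk motion, the square domain, the absence of recovery, and every parameter value are assumptions chosen only to keep the example short.

```python
import math
import random


def step(positions, infected, mobility, radius, base_prob, size, rng):
    """Advance one time step: move every particle, then transmit by distance."""
    # Random-walk motion; `mobility` is the largest displacement per axis.
    moved = []
    for x, y in positions:
        nx = min(max(x + rng.uniform(-mobility, mobility), 0.0), size)
        ny = min(max(y + rng.uniform(-mobility, mobility), 0.0), size)
        moved.append((nx, ny))
    # Transmission probability decays with distance (assumed exponential kernel).
    newly_infected = set()
    for j, (xj, yj) in enumerate(moved):
        if j in infected:
            continue
        for i in infected:
            xi, yi = moved[i]
            distance = math.hypot(xi - xj, yi - yj)
            if rng.random() < base_prob * math.exp(-distance / radius):
                newly_infected.add(j)
                break
    return moved, infected | newly_infected


rng = random.Random(0)  # fixed seed so that repeated runs give the same path
size, n_particles = 10.0, 200
positions = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_particles)]
infected = {0}          # a single initial case
counts = [len(infected)]
for _ in range(30):
    positions, infected = step(positions, infected, 0.5, 0.3, 0.5, size, rng)
    counts.append(len(infected))
```

The list `counts` is the macro-level output that would be compared with observed infection data. Lowering `mobility` in such a sketch is one simple way to mimic restrictions on movement, and giving each particle its own group label would allow group-specific rules.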
Rossman, Gazit and Sprecher all described efforts to predict the number of COVID-19 patients requiring hospitalization and treatment in intensive care. This was a major concern early in the epidemic, when it appeared that the extent of available respirators would fall well short of the number of patients in need of them. Rossman emphasized, as well, the importance of the models as a basis for comparison and policy evaluation. His group looked at questions like whether the length of hospital stays for COVID-19 patients was decreasing over time and what the effect of the vaccination campaign was at a population level [14]. The model that he and his colleagues developed is a compartment model similar in nature to the SIR model, but focusing on the stages that arise following infection. Does hospitalization occur? If so, how long is the patient in the hospital? How long does it take for patients to move from standard care to intensive care? Initial efforts to work with simple count data were limited in their ability to answer important questions, as they failed to account for the full time course of patients who were currently hospitalized but without final outcomes. Higher resolution data were needed that traced these individual flows. Once those data were obtained, the team resolved the analysis problems using techniques from statistical survival analysis to account for the censoring of patients still hospitalized, Cox proportional hazards regression models to assess factors affecting sojourn times, and competing risk analysis to account for the different outcomes that might lie ahead [15]. Among the interesting findings was that in-hospital death rates for COVID-19 patients were higher during times of heavy load [16].
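The role of censoring can be illustrated with a small Kaplan-Meier estimate of the probability that a patient is still in hospital after a given number of days. This is a generic textbook estimator applied to made-up data, not the analysis of [15]; it treats every completed stay as the same kind of event, so the Cox regression and competing-risk components mentioned above are not shown.

```python
def kaplan_meier(durations, observed):
    """Return [(day, probability of still being hospitalized)] at event times.

    durations: days in hospital so far for each patient.
    observed:  True if the stay has ended, False if the patient is still
               hospitalized at the data cutoff (a censored observation).
    """
    event_times = sorted({t for t, done in zip(durations, observed) if done})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients whose stay lasted at least t days are still "at risk".
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, done in zip(durations, observed) if done and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Made-up stays in days; False marks patients still in hospital at cutoff.
durations = [3, 5, 5, 8, 10, 12]
observed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, observed)
```

Censored patients contribute to the at-risk counts up to their last observed day but never count as events. Dropping them, or treating their current stay as complete, would bias the estimated lengths of stay downward, which is the problem that arises with simple count data.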
Gazit’s team reported similar methods and results to those presented by Rossman for the Weizmann group. Gazit also reported on the effective use of sojourn time models and infection counts to forecast the number of patients who would enter the hospital.
Sprecher reported on the modeling efforts at TASMC. The goals were to facilitate planning and preparation in the hospital. In addition to the challenges of managing COVID-19 patients, TASMC was concerned about care for non-COVID patients, due to the resources that had to be diverted from regular services. With the help of an international advisory panel, the TASMC task force produced a dashboard for continual and up-to-date monitoring of patient loads. For planning ahead, the team focused on short-term forecasting (1–2 weeks ahead) of the number of patients, including a breakdown by status with forecasts of severely ill patients and patients in need of ventilation. The forecasts were computed from regressions that use as input recent infection data, smoothed to remove weekly trends. As more data were accumulated, the model also incorporated information on the typical sojourn time from infection to severe disease. Sprecher noted that one weakness of the model is a tendency to over-predict peak loads. Like Gazit’s group, the TASMC team used data from foreign sources to revise the model for accurate predictions when the omicron variant became dominant. Sprecher also noted that the usefulness of the model was assessed in terms of whether the forecasting errors were small enough to permit the hospital to function successfully, a goal that was consistently achieved.
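One common way to remove a weekly pattern from daily counts before using them as regression inputs is a seven-day moving average. The sketch below assumes this choice of smoother for illustration; the source does not state which smoothing method the TASMC team used, and the data are made up.

```python
def weekly_moving_average(counts, window=7):
    """Trailing mean over `window` days; each value covers one full week."""
    smoothed = []
    for end in range(window, len(counts) + 1):
        smoothed.append(sum(counts[end - window:end]) / window)
    return smoothed


# Made-up counts with a weekend dip that repeats every 7 days.
week = [100, 110, 105, 108, 102, 60, 50]
counts = week * 3
smoothed = weekly_moving_average(counts)
```

Because every window contains each day of the week exactly once, the day-of-week effect cancels and the smoothed series here is flat. With real data the smoothed series would retain the slower changes in infection levels that the forecasts depend on.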