Statistics for Epidemics

Statistical considerations for epidemics

Epidemics considerably change the statistical reasoning we usually apply to medical (laboratory) diagnosis because they suddenly change the pre-test probability that appears in Bayes' formula. An epidemic is a transient condition, which appears, lasts for some time, and then vanishes (if the disease is constantly present in the population we call it endemic or sub-endemic, rather than epidemic). Thus the pre-test probability depends on the expected or measured number of new cases registered in the period in which the laboratory test is run.
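The dependence of the test result on the pre-test probability can be made concrete with a small calculation. The sketch below applies Bayes' formula to a positive result; the function name post_test_probability and the sensitivity and specificity values are hypothetical, chosen only for illustration and not taken from any real test.

```python
def post_test_probability(pre_test, sensitivity, specificity):
    """Probability of disease given a positive test (Bayes' formula)."""
    true_positives = sensitivity * pre_test
    false_positives = (1.0 - specificity) * (1.0 - pre_test)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: 95% sensitivity, 98% specificity (illustrative values).
# The pre-test probability rises as the epidemic approaches its peak.
for pre_test in (0.001, 0.01, 0.10, 0.30):
    post_test = post_test_probability(pre_test, 0.95, 0.98)
    print(f"pre-test {pre_test:.3f} -> post-test {post_test:.3f}")
```

The same positive result is much more convincing at the peak of the epidemic, when the pre-test probability is high, than before or after it.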

Two of the best-studied examples of diseases causing epidemics are as follows:
Measles: an RNA virus transmitted by airborne spread (droplets), it has a very high transmission rate (R₀ > 12) and the whole population is susceptible (i.e. there is essentially no genetic resistance). Lethality in developed countries is around 0.03%. The virus does not mutate easily and thus immunity lasts for life. In the rare instances where measles was introduced into a naive community it caused attack rates higher than 90%. The vast majority of cases are clinically evident. Before the vaccine was introduced, in most countries over 90% of the population above 15 years of age presented antibodies against measles, indicative of previous disease and immunity. Measles epidemics typically occur every 5-7 years; this periodicity is explained by the fact that after each epidemic the immune fraction of the population is raised above 90%, a level at which herd immunity sets in. Thus the next epidemic has to wait for new births to restore a sufficient fraction of susceptible people in the population. Between epidemics the disease is maintained in the population under a sub-endemic condition. No animal reservoir exists. In conclusion, measles is an excellent example of an epidemic which is limited only by the availability of non-immune members of the population, and a demonstration of herd immunity.
Influenza (flu): an RNA virus transmitted by airborne spread (droplets), it has a high transmission rate (estimated R₀ ~ 4). There is probably some genetic resistance, because the disease preferentially affects some groups over others (e.g. individuals of blood group B are affected to a higher extent). The virus causes an epidemic every winter; however, since it mutates frequently, the antigenic serotype changes, and having been affected in the preceding year may not confer immunity to the strain of the next year. Each epidemic has an attack rate of about 10% of the population; lethality is 0.1%, mostly observed in elderly people (note: measles spares elderly people because of its lifelong immunity; thus when comparing the lethality of measles and flu one should refer to specific age cohorts, rather than to the overall lethality of the two diseases). It is estimated that in winter epidemics about one third of the population may be susceptible, the remainder being at least partially immune because of some past epidemic. Flu presents a case where multiple factors may limit the attack rate; moreover, since a fraction of cases may be clinically mild, the attack rate may often be underestimated. In some cases severe pandemics were caused by the influenza virus: e.g. the Spanish flu of 1918 had a worldwide attack rate of approx. 30% (500 million cases, corresponding to one third of the world population at the time), with a lethality of 5-10% (between 20 and 50 million deaths). Notice that the R value determined experimentally at the beginning of an epidemic (R_eff) equals R₀ times the fraction of susceptibles; thus for most flu epidemics:
R_eff = R₀ * F_susceptibles ~ 4 * 0.33 ~ 1.3.
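
The relation between R₀, the susceptible fraction, and R_eff also gives the immune fraction at which herd immunity sets in: R_eff drops below 1 when the susceptible fraction is below 1/R₀, i.e. when the immune fraction exceeds 1 - 1/R₀. A minimal sketch follows; the function names are illustrative, and R₀ = 12 is used for measles only because it is the lower bound quoted above.

```python
def effective_r(r0, fraction_susceptible):
    """R_eff = R0 * fraction of the population that is susceptible."""
    return r0 * fraction_susceptible


def herd_immunity_threshold(r0):
    """Immune fraction above which R_eff falls below 1."""
    return 1.0 - 1.0 / r0


# Values quoted in the text: flu R0 ~ 4 with about one third susceptible,
# measles R0 > 12 (12 is used here as the lower bound).
print(f"flu R_eff: {effective_r(4, 0.33):.2f}")
print(f"flu herd immunity threshold: {herd_immunity_threshold(4):.0%}")
print(f"measles herd immunity threshold: {herd_immunity_threshold(12):.0%}")
```

With R₀ = 12 the threshold is about 92%, in agreement with the level above 90% quoted for measles.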

The simplest statistical model of epidemics was developed in the early 1920s, for teaching purposes, by Lowell Reed and Wade Hampton Frost, then working at Johns Hopkins Medical School. To explore this model, on which the following discussion will be based, the student may visit this interactive program that simulates the time course of an epidemic.
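
The recursion at the heart of the Reed-Frost model can be sketched in a few lines: in each generation a susceptible escapes infection only if none of the current cases infects him or her, so that the new cases are S × (1 - (1 - p)^C). The version below is deterministic (it tracks expected numbers instead of drawing random ones) and assumes p = K/N, where K is the average number of potentially infectious encounters per case in one generation; this parametrization and the names reed_frost, n, and k are assumptions made for illustration and need not match the interactive program.

```python
def reed_frost(n, k, initial_cases=1, initially_immune=0, max_generations=200):
    """Deterministic Reed-Frost sketch.

    Returns a list of (susceptible, cases, immune) tuples, one per generation.
    Assumption: p = k / n is the probability that a given case infects a
    given susceptible within one generation.
    """
    p = k / n
    s = float(n - initial_cases - initially_immune)
    c = float(initial_cases)
    i = float(initially_immune)
    history = [(s, c, i)]
    for _ in range(max_generations):
        if c < 0.5:          # fewer than "half a case" left: epidemic is over
            break
        new_cases = s * (1.0 - (1.0 - p) ** c)
        i += c               # current cases recover and become immune
        s -= new_cases       # newly infected leave the susceptible pool
        c = new_cases
        history.append((s, c, i))
    return history


history = reed_frost(n=1000, k=2.0)
for generation, (s, c, i) in enumerate(history):
    print(f"generation {generation:2d}: S={s:7.1f}  C={c:6.1f}  I={i:7.1f}")
```

Time is measured in generations, so the printed table can be converted to days by multiplying the generation index by the serial generation time of the disease.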

It will be appreciated that the model includes several simplifications:
(i) The main reason why the epidemic ends its course is the exhaustion of susceptible individuals in the population. This implies that a vast majority of the population is affected (indeed it may be demonstrated that the model is not compatible with epidemics that affect less than 50% of the susceptible individuals in the population). This is consistent with some infectious diseases (e.g. measles or smallpox) but surely not with the majority of them.
(ii) The model is incompatible with diseases that require more states than S(usceptible), A(ffected), and I(mmune). For example, it is incompatible with typhoid fever, for which the state of the healthy carrier exists, or with diseases having an animal reservoir, e.g. plague by Yersinia pestis, which affects many species of rodents.
(iii) The model cannot describe epidemics due to diseases in which the A state is prolonged and overlaps successive generation times, e.g. AIDS.
More complex models are available to describe the above cases, but they lack the immediate simplicity of the Reed-Frost model and are difficult to grasp for physicians not specializing in medical statistics.

The Reed-Frost model captures some essential characteristics of epidemics, which we may summarize as follows:
(i) An epidemic has a beginning and a peak phase, and then vanishes. It may either disappear or be maintained as a sub-endemic condition. The duration of the epidemic is expressed as a multiple of its characteristic serial generation time.
(ii) Containment (e.g. quarantine) operates by reducing the number of potentially infectious encounters K; you can explore the effect of quarantine using this interactive program. This effectively reduces the average number of transmissions per infected individual (R₀). Vaccination operates by directly converting S(usceptibles) to I(mmunes), bypassing the A(ffected) state. You can explore the effect of vaccination using this interactive program.
(iii) The time evolution of the epidemics dictates the pre-test probability of disease.
(iv) The actual duration and extension of the epidemic in a country may be envisaged as the sum of several interrelated Reed-Frost episodes. This is because the model assumes an equal probability of encounter among the members of the population and is suitable to describe an epidemic affecting a village or a small city. Over larger expanses of space the epidemic has to be carried from one village to another. Each village or city independently follows the Reed-Frost model, but each starts at a different time. In some cases it is possible to trace an actual path of diffusion of the epidemic, which sequentially affects villages and cities along a communication pathway. Obviously, in this case the duration of the epidemic is longer than one would expect on the basis of its serial generation time. The student may explore this type of progression using the interactive program for the "two villages" case.
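
The effects of containment and vaccination described in point (ii) can be compared with a short calculation. The sketch below uses a deterministic Reed-Frost recursion with the assumption p = K/N; quarantine is represented by a smaller K and vaccination by moving a fraction of the population to the immune state before the first case appears. The function name attack_rate and all the numerical values are hypothetical and serve only as an illustration.

```python
def attack_rate(n, k, vaccinated_fraction=0.0, initial_cases=1):
    """Fraction of the whole population infected by the end of the epidemic.

    Deterministic Reed-Frost sketch; assumes p = k / n per generation.
    """
    p = k / n
    immune = round(n * vaccinated_fraction)
    s = float(n - immune - initial_cases)
    c = float(initial_cases)
    total_cases = c
    while c >= 0.5:                      # stop when less than "half a case"
        new_cases = s * (1.0 - (1.0 - p) ** c)
        s -= new_cases
        c = new_cases
        total_cases += new_cases
    return total_cases / n


baseline = attack_rate(n=1000, k=3.0)
quarantine = attack_rate(n=1000, k=1.5)                 # encounters halved
vaccination = attack_rate(n=1000, k=3.0, vaccinated_fraction=0.5)
well_vaccinated = attack_rate(n=1000, k=3.0, vaccinated_fraction=0.8)

print(f"no intervention:      {baseline:.1%}")
print(f"quarantine (K = 1.5): {quarantine:.1%}")
print(f"50% vaccinated:       {vaccination:.1%}")
print(f"80% vaccinated:       {well_vaccinated:.1%}")
```

In the last scenario the susceptible fraction is 0.2, so that R_eff = 3 × 0.2 = 0.6 and the epidemic cannot take off.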

The fraction of the members of the population that contracted the disease is called the attack rate of the epidemic. A feature of the Reed-Frost model is that an epidemic ends because the Susceptible population decreases below the threshold required for disease transmission. This threshold depends on the parameter K, but is never lower than half of N, i.e. an epidemic that obeys this mechanism cannot affect less than 50% of the population. This is plausible for some diseases, e.g. measles and smallpox occurring in a previously naive population, but many epidemics stop long before reaching this threshold. Several factors may contribute to lowering the attack rate; an incomplete list is as follows:
1) The probability of disease transmission depends on variable external factors (e.g. climate, in which case the epidemic follows a seasonal course).
2) Contagion affects only or preferentially a fraction of the population because of some risk factor (e.g. old age, risky lifestyle, professional exposure, living under poor hygienic conditions, etc.).
3) Some members of the population, though not immune, differ in their proneness to develop the disease (a paradigmatic case is that of the different sensitivity to HIV due to genetic polymorphism of the CCR5 receptor). A very minor modification of the model can be developed to take this possibility into account, as the student may verify using the interactive program for the "two sub-populations" case.
4) A fraction of the population is immune because of vaccination or because of a previous epidemic caused by the same or a related germ. For example, in the pre-vaccine era measles usually caused epidemics in populations in which 80% or more of the members were immune because of previous epidemics.
5) A significant fraction of the cases of disease may run an asymptomatic course and not be diagnosed. In this case the attack rate may be quite high, but it appears low because many cases are not correctly diagnosed. Measuring the fraction of the population presenting specific antibodies at the end of the epidemic may provide an estimate of the effective attack rate. A very interesting case occurs if the mild cases are also poorly contagious, and behave like a sort of vaccination experiment occurring in parallel with the disease, as demonstrated in this interactive program.
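
The last case in the list above can be sketched by a small modification of the Reed-Frost recursion. The code below assumes, as a deliberate simplification, that a fixed fraction of all infections is mild, that mild cases go undiagnosed and do not transmit the disease at all, and that p = K/N; the function name and the parameter values are hypothetical.

```python
def epidemic_with_mild_cases(n, k, mild_fraction, initial_cases=1):
    """Return (apparent, effective) attack rates.

    Assumptions: mild infections are not diagnosed and are not contagious;
    they simply become immune. Only clinical cases transmit, with p = k / n.
    """
    p = k / n
    s = float(n - initial_cases)
    c = float(initial_cases)        # clinically evident, contagious cases
    clinical_total = c
    mild_total = 0.0
    while c >= 0.5:                 # stop when less than "half a case"
        infected = s * (1.0 - (1.0 - p) ** c)
        s -= infected
        mild_total += mild_fraction * infected      # immune, never counted
        c = (1.0 - mild_fraction) * infected        # next contagious generation
        clinical_total += c
    apparent = clinical_total / n                   # what diagnosis records
    effective = (clinical_total + mild_total) / n   # what serology would show
    return apparent, effective


apparent, effective = epidemic_with_mild_cases(n=1000, k=4.0, mild_fraction=0.5)
print(f"apparent attack rate (diagnosed cases): {apparent:.1%}")
print(f"effective attack rate (all infections): {effective:.1%}")
```

Under these assumptions the mild cases lower the number of transmissions per infection from K to K × (1 - mild fraction), while the gap between the two printed values is the part of the epidemic that only an antibody survey would reveal.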
