Introduction

The control of communicable diseases is an endeavor that has witnessed remarkable successes over the past century; diseases that previously caused large scale mortality have been eradicated (Hinman, 1999; Roeder, Mariner, & Kock, 2013), locally eliminated (Papania et al., 2014), or have been markedly reduced in incidence globally as a result of vaccination, antimicrobial therapy, water and sewage treatment, and advances in food safety (Armstrong, Conn, & Pinner, 1999; Liu et al., 2015; Murray et al., 2014). Nonetheless, the threat of communicable diseases persists; emerging infectious diseases continue to be identified, often in association with changes in human and animal mobility, agricultural practices, environmental degradation, and misuse of antimicrobial therapy (Jones et al., 2008; Keesing et al., 2010; Kuehn, 2010). Recent outbreaks or epidemics associated with MERS coronavirus (Azhar et al., 2014), influenza A (H7N9) (Cowling et al., 2013), and the West African emergence of the Zaire strain of Ebola virus (Baize et al., 2014), have challenged epidemiologists as the natural history, modes of transmission, and/or means of control of these diseases have not been well understood during initial periods of emergence.

When novel infectious diseases emerge or familiar diseases resurge, mathematical models can serve as useful tools for the synthesis of available data, management of uncertainty, and projection of likely epidemic trajectories (Fisman, 2009). While it may be challenging to parameterize detailed mechanistic mathematical models when there is little information on mechanisms of transmission, baseline immunity in a community, or the nature of the infecting pathogen, a number of descriptive approaches exist which may permit fitting, and forecasting, of an epidemic curve. One single equation approach that has been applied to emerging infections is the Richards model, which treats cumulative infections as a logistic growth process (Hsieh & Chen, 2009; Wang, Wu, & Yang, 2012). However, the concept of modeling an epidemic curve as a simple function, without reference to mechanisms of transmission, is in fact much older, and may originate in the work of the English polymath Dr. William Farr (1807–1883), who rose from humble beginnings to become a physician, mathematician, hygienist and protege of Lancet founder Dr. Thomas Wakley (Brownlee, 1915a; Fine, 1979; Greenwood, 1933). Dr. Farr spent almost 40 years at the General Register Office of the United Kingdom, and the esteem in which he was held is apparent in the “letters” he published annually as appendices to the reports of the Registrar General, in which he supplemented the dry statistical reports with thoughtful and creative musings on topics as wide-ranging as the relationships between occupation and disease, suicide and mortality in the mentally ill, population density and mortality, and as above the “laws” governing epidemics (Farr, 1840).

William Farr's analysis is a classic in the epidemiology literature. Farr examined the course of mortality attributable to smallpox between mid-1837 (when death registration was introduced into England and Wales) and 1839, and noted that the numbers peaked in the spring quarter of 1838 and then declined until summer 1839 (Fig. 1) (Farr, 1840). He noted that the pattern of decline was very close to what would be predicted if the ratios of cases in successive quarters declined at a constant rate. He provided numbers demonstrating this in his annual report to the Registrar General in 1840 (Fig. 1), but did not develop the idea at length. Looking back on this, we may note that this approach is analogous to assuming that the number of transmissions per case (the “reproduction number” in modern terminology) declines at a constant rate during the course of an epidemic. The key difference is that Farr worked before the germ theory, and analysed data in terms of successive calendar time periods rather than successive generations of cases. There is a further irony to the story, in that he did not return to this idea until 1866, at which time there was a major epidemic of rinderpest, which some feared would destroy the British cattle population (Brownlee, 1915a). Farr applied a similar analysis, but this time based upon assuming that the third ratio of cases per month was a constant (in effect assuming that the reproduction number declined at a constantly accelerating pace). He used this approach to predict that the epidemic would decline rapidly over the subsequent six months, and published this, including predicted monthly incidence numbers, in the Daily News of London in February 1866. His predictions were close to what subsequently happened (Brownlee, 1915a).

It fell to other contemporaries (Evans, 1876) and later epidemiologists (most notably Dr. John Brownlee) to formalize “Farr's law” (Brownlee, 1915c; Fine, 1979; Serfling, 1952). (It should be noted that the term “Farr's law” is ambiguous. Farr himself referred to a “law” in his letter on rinderpest (Brownlee, 1915a), but the term has also been used by others to describe Farr's observations on the relation between population density and death (Brownlee, 1915b), and his description of the relationship between cholera mortality and altitude (Lilienfeld, 2007). In his elaboration of the “law”, Brownlee referred to it as “Farr's theory of epidemics” (Brownlee, 1915a).)

We recently proposed a descriptive approach to the initial estimation of the basic reproduction number (R0) of an emerging or re-emerging pathogen, which also provides information on the rate at which the process is being controlled, as well as reasonable short-term projections of incidence. This two-parameter model, which we have referred to as the “Incidence Decay with Exponential Adjustment” (IDEA) model, offers advantages of simplicity, explicit linkage to theory of epidemic growth, and also acknowledges the fact that epidemics and outbreaks do not peak and end simply due to depletion of susceptibles, but because of a complex constellation of public health actions and behavioral changes that may modify the course of an epidemic and reduce the effective reproduction number Re(t) during an outbreak (Fisman et al., 2013). In our previously published description of this model, we validated model projections by showing that they were identical to those derived from a discrete-time susceptible-infectious-removed (SIR) compartmental model, provided the SIR model had a low basic reproduction number (R0) and exponential improvement in control over the course of the epidemic (Fisman et al., 2013).

One of us (PF) had previously written about Farr's law and its importance in the development of epidemic theory (Fine, 1979), and noted the conceptual similarity between IDEA and Farr's law. Upon exploration of these two approaches we realized that they, notwithstanding having been formulated some 160 years apart, and being based on very different theoretical constructs, are fully consistent with one another. Here we demonstrate the equivalence of these methods for epidemic modeling and forecasting, explore the potential advantages of using these methods to complement other approaches, and identify the circumstances under which these methods closely approximate the commonly-used SIR modeling approach. We explore numerical examples derived from recent emerging infectious diseases, including the 2014–2015 West African Ebola virus outbreak. We show that these methods work fairly well in a number of cases, and discuss what this fact implies about difficult-to-measure components of transmission and control parameters.

Farr's law

The empirical relationship between observed cases of infected individuals, in sequential time intervals, during an epidemic outbreak according to Farr's law is given by

$$\frac{I(t+3)/I(t+2)}{I(t+1)/I(t)} = K \tag{1}$$

where I(t) represents the number of observed newly infected individuals at time t, and K is a constant. For values of K<1, the rate of change in the observed cases (acceleration) decreases as time evolves, and the family of curves I(t) that satisfy equation (1) correspond to familiar bell-shaped epidemic curves.
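As an illustration of equation (1), the ratio K can be computed for every run of four consecutive incidence values. The following is a minimal sketch; the function name `farr_k` and the example counts are hypothetical and are not taken from the analyses reported here. Treating tetrads that contain a zero count as undefined is an assumption of the sketch.

```python
from typing import List


def farr_k(incidence: List[float]) -> List[float]:
    """Return Farr's ratio K for each run of four consecutive values.

    K(t) = (I(t+3) / I(t+2)) / (I(t+1) / I(t)), as in equation (1).
    """
    ks = []
    for t in range(len(incidence) - 3):
        i0, i1, i2, i3 = incidence[t:t + 4]
        if min(i0, i1, i2, i3) <= 0:
            # Assumption: the ratio is undefined when any count is zero.
            ks.append(float("nan"))
            continue
        ks.append((i3 / i2) / (i1 / i0))
    return ks


# Hypothetical counts, for illustration only (not data from this study).
example_counts = [10, 30, 60, 80, 70, 40]
print(farr_k(example_counts))
```

A series shorter than four values yields an empty list, since no tetrad can be formed.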

The law was investigated and elaborated upon by the epidemiologist John Brownlee, who noted that I(t) under this law would correspond to the function (Fine, 1979)

$$\exp\left(-At^2 + Bt + C\right) \tag{2}$$

where A, B, and C are constants. Of note, Brownlee's formulation identifies a process by which cases increase as a first order process, but decrease as a second order process, as is the case with the IDEA model.

IDEA model

In the basic form of the IDEA model, the time evolution of infected cases, I(t), satisfies equation (3),

$$I(t) = \left(\frac{R_0}{(1+d)^t}\right)^t \tag{3}$$

where t is an integer “generation” of an outbreak (thus t ∈ {1,2,3,…} ⊂ ℕ). In other words, in the IDEA model, the unit of time is the generation interval (the interval between the time at which an individual is infected and the time at which that individual's infector was infected). In Farr's model, t is the time at which successive observations are made.

The parameter R0 is the basic reproduction number as usually defined; that is, the number of secondary cases created by a primary case in a totally susceptible population and in the absence of intervention. The parameter d>0 which we have referred to as a “control parameter” defines the rate at which transmission declines over the course of an epidemic. The empirical underpinnings of d are not yet well defined, but based on current understanding of disease dynamics could represent public health interventions, population adaptation or behavior change, improved availability of personal protective items or effect of drugs to treat infection, or reductions in population susceptibility as a result of immunity or vaccination. As noted above, by fitting the model to data we have previously obtained reasonable estimates of R0 early in the course of epidemics, and have also been able to produce plausible near-term projections of future case counts.
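To make the roles of R0 and d concrete, the sketch below evaluates equation (3) over successive generations. The function names and the parameter values are illustrative assumptions only; they are not fitted estimates from any outbreak.

```python
from typing import List, Sequence


def idea_incidence(r0: float, d: float, generations: int) -> List[float]:
    """Incidence I(t) = (R0 / (1 + d)**t)**t for t = 1..generations."""
    return [(r0 / (1.0 + d) ** t) ** t for t in range(1, generations + 1)]


def peak_generation(series: Sequence[float]) -> int:
    """Generation (1-based) at which the series reaches its maximum."""
    return max(range(len(series)), key=lambda k: series[k]) + 1


# Illustrative values only: R0 = 2 with a modest control parameter d = 0.05.
curve = idea_incidence(r0=2.0, d=0.05, generations=20)
print(peak_generation(curve), curve[-1] < curve[0])
```

With d = 0 the expression reduces to pure exponential growth, R0 to the power t; any d > 0 eventually bends the curve downward.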

Difference equation susceptible-infectious-removed model

In an earlier publication we commented on the almost identical projections generated by IDEA and a compartmental difference equation (Susceptible-Infectious-Removed, or SIR) (Fisman et al., 2013), in the asymptotic limit when R0 is small and when there is exponential reduction in risk over time. Below we generalize this model equivalence, to any situation where the depletion of susceptibles due to infection is small relative to the total population size (and not only when R0 is small). We used a “damped” version of the standard SIR model whose formulation in generation interval time scale is given by:

$$S_{t+1} = S_t - R_e(t)\,I_t \tag{4}$$

$$I_{t+1} = R_e(t)\,I_t \tag{5}$$

with $S_t$ the number of susceptible individuals at time t and $R_e(t)$ the effective reproduction number at time t defined by:

$$R_e(t) = R_0\,\frac{S_t}{N}\,\rho^t \tag{6}$$

The “damping” parameter ρ represents the relative risk of infection in each generation of the epidemic, compared to the risk seen in the preceding generation (i.e., ρ = 1 if there were no improvement in control in a given generation compared to the preceding one). If an outbreak is small relative to the size of the total population (as would be true if R0 were modest and control achieved relatively quickly), $S_t/N$ will be approximately 1 throughout the outbreak and the expression can be rewritten as:

$$I(t) = I(0)\prod_{s=0}^{t-1} R_0\,\rho^{s} \tag{7}$$

Numerical simulations were implemented in R (The R Project for Statistical Computing, 2015), and code is available in the electronic supplement files (Supporting Information: Supplement 1).
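For readers who wish to experiment with the damped SIR model of equations (4)–(6), the following Python sketch iterates the difference equations directly. It is not the R code from the supplement. The function name `damped_sir`, the starting condition S(0) = N − I(0), and the guard that caps new infections at the number of remaining susceptibles are all assumptions of this sketch.

```python
from typing import List, Tuple


def damped_sir(r0: float, rho: float, n_pop: float, i0: float,
               generations: int) -> Tuple[List[float], List[float]]:
    """Iterate S_{t+1} = S_t - Re(t) I_t and I_{t+1} = Re(t) I_t,
    with Re(t) = R0 * (S_t / N) * rho**t, in generation time."""
    s = n_pop - i0  # assumption: everyone except the initial cases is susceptible
    i = i0
    susceptible, incidence = [s], [i]
    for t in range(generations):
        re_t = r0 * (s / n_pop) * rho ** t
        # Guard (assumption): new infections cannot exceed remaining susceptibles.
        new_cases = min(re_t * i, s)
        s -= new_cases
        i = new_cases
        susceptible.append(s)
        incidence.append(i)
    return susceptible, incidence


# Illustrative values only.
s_series, i_series = damped_sir(r0=2.0, rho=0.9, n_pop=1e6, i0=1.0, generations=25)
print(round(sum(i_series)), round(s_series[-1]))
```

Setting rho = 1 removes the damping, so that any slowing of growth comes only from depletion of susceptibles.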

Numerical examples using recent outbreak data

We have previously used IDEA to explore the nature of epidemic growth during the recent West African Ebola epidemic (Fisman & Tuite, 2014; Fisman, Khoo, & Tuite, 2014) and the more recent Chikungunya virus invasion in the Western hemisphere (Nasserie, Fisman, & Tuite, 2015). As publicly available data have taken the form of cumulative incidence curves (with absent dates of onset) we fit IDEA to cumulative curves, but it is possible to estimate incidence by taking the interval-to-interval difference in cumulative incidence over time.

Here we fit IDEA to the incidence time series and calculated Farr's K for sequential generation tetrads, and converted K values to the d parameter in IDEA using the relation $K = 1/(1+d)^4$ described below. Model fits were performed in a Microsoft Excel spreadsheet using the Solver add-in. Data sources used for these analyses are available at http://figshare.com/authors/Tahmina_Nasserie/686527 (Chikungunya) and https://github.com/cmrivers/ebola (Ebola). The original Microsoft Excel files have been included as Supplemental Information.

Equivalence of IDEA and Farr's law

We need to show both (A) that the IDEA model satisfies Farr's Law, and (B) that Farr's Law can be derived from the IDEA model. This equivalence requires the intuitive assumption that the values of t in Farr's Law (the successive times at which observations are made) are integers and coincide with the serial (or generation) intervals of the IDEA model.

(A) An incidence curve described by IDEA naturally satisfies Farr's law. Indeed, substituting (3) into (1) gives (see Appendix 1):

$$K = \frac{1}{(1+d)^4} \tag{9}$$

It is interesting to note that the value of R0 in the IDEA model is irrelevant in the proof presented in Appendix 1.
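The algebra behind this result can be sketched in a few lines by taking logarithms of equation (3). This is offered as a brief worked check and may differ in presentation from Appendix 1.

```latex
\begin{align*}
\log I(t) &= t\log R_0 - t^2\log(1+d) \\
\log\frac{I(t+1)}{I(t)} &= \log R_0 - (2t+1)\log(1+d) \\
\log\frac{I(t+3)}{I(t+2)} &= \log R_0 - (2t+5)\log(1+d) \\
\log K &= \log\frac{I(t+3)}{I(t+2)} - \log\frac{I(t+1)}{I(t)} = -4\log(1+d)
\end{align*}
```

The term in log R0 appears in both ratios and cancels in the final subtraction, which is why K depends on d alone and does not vary with t.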

(B) On the other hand, by expressing the IDEA model as

$$I(t) = \left(\frac{R_0}{(1+d)^t}\right)^t = \exp\left(t\log\frac{R_0}{(1+d)^t}\right) = \exp\left(-t^2\log(1+d) + t\log R_0\right) \tag{10}$$

we see that we recover John Brownlee's Gaussian curve (2) with A = log(1+d), B = log R0 and C = 0, which recapitulates Farr's Law as stated in Equation (1).

Relation of SIR model to Farr's K and IDEA

When the depletion of susceptibles is negligible compared to the total population size (that is, a small outbreak), we can actually express IDEA's basic reproduction number $R_{0,\mathrm{IDEA}}$ and its control factor d as a function of the basic reproduction number $R_{0,\mathrm{SIR}}$ and the control factor ρ of the damped SIR model described in (4–6). The relationship between the parameters of these two models is:

$$R_{0,\mathrm{IDEA}} \simeq \frac{R_{0,\mathrm{SIR}}}{\sqrt{\rho}} \tag{11}$$

$$d \simeq \frac{1}{\sqrt{\rho}} - 1 \tag{12}$$

Note that both equations (11), (12) are only sufficient conditions for the respective parameters of both models to produce the same incidence when the size of the epidemic is negligible compared to the total size of the population (see Appendix 2).

Substituting (12) in equation (9) we can link Farr's law with the damped SIR model:

$$K \simeq \rho^2 \tag{13}$$
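A short worked sketch shows where these relations come from. It assumes that $S_t/N \approx 1$ throughout the outbreak and that a single initial case, I(0) = 1, seeds the damped SIR model so that it starts from the same point as IDEA; the full argument is given in Appendix 2 and may be organized differently.

```latex
\begin{align*}
I_{\mathrm{SIR}}(t) &= \prod_{s=0}^{t-1} R_{0,\mathrm{SIR}}\,\rho^{s}
  = R_{0,\mathrm{SIR}}^{\,t}\,\rho^{t(t-1)/2}
  = \left(\frac{R_{0,\mathrm{SIR}}}{\sqrt{\rho}}\right)^{t}\left(\sqrt{\rho}\right)^{t^{2}} \\
I_{\mathrm{IDEA}}(t) &= R_{0,\mathrm{IDEA}}^{\,t}\,(1+d)^{-t^{2}} \\
\text{matching terms:}\quad
R_{0,\mathrm{IDEA}} &= \frac{R_{0,\mathrm{SIR}}}{\sqrt{\rho}}, \qquad
1+d = \frac{1}{\sqrt{\rho}}, \qquad
K = (1+d)^{-4} = \rho^{2}
\end{align*}
```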

The relationship between Farr's law, the IDEA model, and the damped SIR model is illustrated graphically in Fig. 2.

Numerical simulations

In this section, we aim to test numerically the validity of approximations (11), (12). In particular, given a damped SIR model, we explore the parameter space $R_{0,\mathrm{SIR}}$ and ρ for which these approximations hold. Note that the link between IDEA and Farr's law given by equation (9) is not an approximation, but a genuine equality (subject to the time step used in Farr's law being equivalent to one generation), so there is no need to test it numerically. To measure the performance of the approximation, we consider the distance between the simulated incidence time series. Let N be the number of generations simulated and $I_{\mathrm{SIR}}(k)$ (resp. $I_{\mathrm{IDEA}}(k)$) the incidence from the SIR (resp. IDEA) model at the kth generation. We define their distance by

$$\delta = \sum_{k=1}^{N}\left(I_{\mathrm{SIR}}(k) - I_{\mathrm{IDEA}}(k)\right)^2 \tag{14}$$

Fig. 3 shows the values of the distance δ for different values of $R_{0,\mathrm{SIR}}$ and ρ. We see that for combinations of $R_{0,\mathrm{SIR}}$ and ρ such that the depletion of susceptibles is not too large, the approximation is very good. But when the values of $R_{0,\mathrm{SIR}}$ and ρ generate a depletion of susceptible individuals that is no longer negligible (white area in Fig. 3), the incidence curve from the IDEA model diverges from that generated by the damped SIR model. This is demonstrated in Fig. 4.
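The behaviour described above can be reproduced in outline with the sketch below, which compares a damped SIR series with the IDEA series obtained from the parameter mapping of equations (11) and (12), read here as R0,IDEA = R0,SIR/√ρ and d = 1/√ρ − 1. The distance is taken as the sum of squared differences, as equation (14) is printed. The function names, the single initial case, the cap on new infections, and the two parameter sets are assumptions chosen for illustration; they are not the values used for Fig. 3 or Fig. 4.

```python
import math
from typing import List


def sir_incidence(r0: float, rho: float, n_pop: float, generations: int) -> List[float]:
    """Damped SIR incidence I(1)..I(generations), seeded with one case."""
    s, i = n_pop - 1.0, 1.0
    out = []
    for t in range(generations):
        i = min(r0 * (s / n_pop) * rho ** t * i, s)  # cap at remaining susceptibles
        s -= i
        out.append(i)
    return out


def mapped_idea_incidence(r0_sir: float, rho: float, generations: int) -> List[float]:
    """IDEA incidence using parameters mapped from the damped SIR model."""
    r0_idea = r0_sir / math.sqrt(rho)
    d = 1.0 / math.sqrt(rho) - 1.0
    return [(r0_idea / (1.0 + d) ** t) ** t for t in range(1, generations + 1)]


def delta(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two incidence series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Small outbreak in a very large population: depletion is negligible.
small = delta(sir_incidence(1.5, 0.9, 1e9, 20), mapped_idea_incidence(1.5, 0.9, 20))
# Large outbreak in a small population: depletion is substantial.
large = delta(sir_incidence(3.0, 0.98, 1e3, 20), mapped_idea_incidence(3.0, 0.98, 20))
print(small, large)
```

In the first case the two series are nearly indistinguishable; in the second the IDEA curve keeps growing after the SIR curve has run out of susceptibles.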

Application: Ebola and Chikungunya

Considering real epidemic data from Ebola and Chikungunya, it can be seen that the interval-to-interval variability in K was substantial, likely reflecting variability in reporting (Fig. 5, Fig. 6). Simple arithmetic means of K over time were also unstable due to skewing by values substantially greater than 1. However, when we estimated the geometric mean of K over time we found that the resultant d estimate approximated that derived through fitting IDEA.
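The conversion from a series of K values to a single d estimate can be sketched as follows, inverting the relation K = 1/(1+d)^4 to give d = K^(−1/4) − 1. The function names are hypothetical, the K values are invented for illustration rather than taken from Fig. 5 or Fig. 6, and silently dropping non-positive or non-finite K values is an assumption of the sketch.

```python
import math
from typing import Iterable


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean of the positive, finite entries of `values`."""
    logs = [math.log(v) for v in values if math.isfinite(v) and v > 0]
    if not logs:
        raise ValueError("no positive finite values to average")
    return math.exp(sum(logs) / len(logs))


def d_from_k(k: float) -> float:
    """Invert K = 1 / (1 + d)**4 to recover the IDEA control parameter d."""
    return k ** -0.25 - 1.0


# Invented K values for illustration; one large value mimics reporting noise.
k_values = [0.7, 0.9, 0.6, 3.5, 0.8]
arithmetic = sum(k_values) / len(k_values)
geometric = geometric_mean(k_values)
print(arithmetic, geometric, d_from_k(geometric))
```

Because the geometric mean averages on the log scale, a single K far above 1 pulls it upward much less than it pulls the arithmetic mean.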

Furthermore, we noted that in the Chikungunya time series there was a large perturbation in best-fit values of d occurring in October 2014, corresponding with an apparent multi-wave epidemic. We have previously noted that this abrupt change in the generation-to-generation best fit value of d corresponds with the occurrence of multiwave epidemics when IDEA is fit to simulated data (Fisman et al., 2013); using Farr's approach, the onset of a possible new Chikungunya wave seems to correspond with an abrupt increase in K to a value far greater than one (Fig. 7). The utility of large values of K as a signal of an incipient epidemic wave warrants further investigation.

Discussion

Although the real-time application of mathematical modeling to understanding and control of outbreaks is often perceived as representing a recent development in infectious disease epidemiology (Heesterbeek et al., 2015), disease modeling has deeper historical roots, including work by Bernoulli on smallpox in the 18th century (Greenwood, 1941), work by Ross on malaria transmission (Smith et al., 2012), and, as mentioned above, Farr's work on the growth and cessation of epidemics (Brownlee, 1915a, 1915c; Fine, 1979). We had published a simple, phenomenological approach to the description and projection of outbreaks and epidemics (Fisman et al., 2013) which we had initially regarded as a novel formulation rooted in the concept of the basic reproduction number R0.

In that work, we demonstrated concordance with projections derived using a 3-compartment difference equation model (damped SIR model). We have subsequently realized, and demonstrate above, that our approach simply represented a restatement of Farr's work, albeit in a manner that is tied to the concept of R0. According to Brownlee, Farr promised to describe the derivation of his model in greater detail in future reports, but never did so (Brownlee, 1915a), and much of the mathematical elaboration of Farr's work was in fact done by Brownlee after Farr's death (Brownlee, 1915c). Nonetheless, Brownlee notes that to Farr, the predictive accuracy of his approach reflected three characteristics of epidemics, according to Farr's (pre-microbial) understanding: (i) diminution in the number of susceptibles over time due to recovery from infection (“immunity”, though to use this term in application to Farr is an anachronism); (ii) diminished population density due to death from infection; and (iii) diminishing pathogenicity of the disease with each passing generation of infection as a result of (to quote Farr) “[loss of] part of the force of infection in every body through which they pass…the matter…diminishes in strength at every transmission by innoculation” (Brownlee, 1915a). The first two characteristics are compatible with the current understanding of epidemic dynamics, whereas the third is not (though it does anticipate more modern ideas around evolution of virulence and disease ecology (Ewald, 2004; Lipsitch & Moxon, 1997)).

However, as this model is phenomenological, rather than mechanistic in nature, the putative epidemiological mechanisms underlying model performance are not of immediate importance. Indeed, while the simplicity of this approach may be regarded as a limitation, the simplicity of the form, and its implicit incorporation of biological, social, medical, and behavioral drivers of control into a single parameter estimated via fitting, may be a strength, especially given that such control factors as behavior change due to fear may be difficult or impossible to measure in real time.

In applying IDEA to present-day outbreaks and epidemics, we have remained agnostic about the factors that cause second order deceleration of epidemic growth. Referring to first principles, the components of a reproduction number are duration of infectiousness, contact rate, and probability of transmission conditional on contact, as well as susceptibility in a population (Vynnycky & White, 2010). We presume that public health interventions, population behavior change (as a result of education or rumours, prudence or fear), and the occurrence of silent infections with immunity could all contribute to deceleration of epidemic spread, even when the effective reproduction number is expected to be greater than 1 due to widespread susceptibility in the population. Furthermore, we note that Farr's original “time step” appears to have been arbitrary (reflecting the form of data available to him: weekly for rinderpest, quarterly for smallpox), whereas we have used generations as time steps in our more recent applications of Farr's “law”, and in IDEA and SIR models. This potential discrepancy between Farr's initial efforts and our more recent efforts can be addressed by a simple rescaling of the time parameter, and is thus immaterial to our conclusions.

Summary

In demonstrating that the IDEA model and Farr's model are mathematically equivalent (and can be virtually identical to an SIR model with abundant susceptibility in the population) we demonstrate that recognizing the underlying mechanism of epidemic control may be unimportant for generating reasonable forecasts of epidemics with control, or recognizing when their fundamental dynamics have changed. Our contribution in the current work is to show that Farr's law, while derived in the pre-microbial era, can be reformulated in terms of the concept of the basic reproduction number, combined with exponential increase in control via an unspecified mechanism. We observe, unexpectedly, that Farr's K can be expressed as a function of the IDEA d parameter alone, independent of R0, implying that epidemic trajectory is (and has historically been) more a function of control efforts and changing behavior than of the fundamental characteristics of a given infectious disease. Whether or not the ratio K can have stand-alone value as a tool to identify unexpected shifts in epidemic trajectory (e.g., the two wave epidemic of Chikungunya referred to in Fig. 7 above) will be the subject of future work.