Introduction

Developing tools for stable parameter estimation and reliable forecasting of emerging and re-emerging infectious disease epidemics represents a key priority for public health officials and government agencies in their work to prevent and mitigate disease threats. At the early stage of an outbreak, when incidence data are limited and subject to reporting delays, it is often premature to characterize the transition rates of individuals between various disease epidemiological compartments using, for instance, SEIR-type systems of differential equations, which may involve a substantial number of unknown parameters. For the early transmission period, phenomenological models of a logistic type, describing the progression of the epidemic in terms of the cumulative number of reported cases, C, provide a simple alternative. In what follows, we employ the generalized Richards model (Chowell et al., 2016, Smirnova et al., 2017, Turner et al., 1976)(1.1)dCdt=rCp[1−(CK)a],CK(τ)=(pa+p)1a,τ∈(0,T),to estimate crucial disease parameters, such as the intrinsic growth rate (r), the deceleration of growth (p), the final size of the epidemic (K), the disease turning point (τ), and the extent of deviation from the S-shaped dynamics of the classical logistic-growth curve (a), see Fig. 1. In (1.1), T is the duration of the outbreak.

For the original Richards model (p=1), the closed-form solution is available(1.2)C(t)=K[1+ae−ar(t−τ)]1/a,which simplifies its theoretical and numerical study. However, the statistical analysis of the fitting accuracy for the original and generalized Richards models indicates that the generalized Richards model outperforms its original version for epidemics characterized by an early sub-exponential growth phase (Chowell et al., 2016). For such outbreaks, the value of parameter p in (1.1) is less than 1 (0 Several mechanisms could give rise to initial sub-exponential growth in case incidence including (Chowell et al., 2015, Viboud et al., 2015): (i) spatially clustered contact structures (e.g., high clustering levels) (Chowell et al., 2015, Szendroi and Csanyi, 2004); (ii) early onset of population behavioral changes and control interventions (Chowell et al., 2015, Szendroi and Csanyi, 2004); and (iii) substantial heterogeneity in susceptibility and infectivity of the host population that introduces high local variability in the local reproduction number that fluctuates around the epidemic threshold at 1.0. The two limiting cases of the generalized Richards model are those in which p=0 (constant incidence growth) or p=1 (exponential growth) (Viboud et al., 2015).

Fig. 2 illustrates representative epidemic profiles, dCdt, that the generalized Richards model supports, as the deceleration of growth parameter, p, is varied. Overall, as parameter p increases, the epidemic profile shows faster growth that converges to exponential rate as p→1 while keeping the epidemic size, K, fixed. On the other hand, parameter a modulates the epidemic turning point (Fig. 2), which occurs earlier as this parameter increases.

Thus, in our investigation we utilize the generalized four-parametric model (1.1), and solve the ODE-constrained least squares problem to estimate r, p, K, τ, and a from early data of the 2014–15 Ebola epidemics in Guinea, Sierra Leone, and Liberia. We then use the reconstructed parameter values to forecast future incidence cases by propagating uncertainty in the system forward in time.

The regularized least squares problem

To present a regularized numerical algorithm for stable parameter estimation, we introduce(2.1)b:=rK1−p,H(t):=C(t)K,and arrive at the normalized model (Cavallini, 1993)(2.2)dHdt=bHp(1−Ha),H(τ)=(pa+p)1a,to be fitted to early (noisy) incidence data, D=[D1,D2,…,Dm], which approximate the vector [dCdt(t1),dCdt(t2),…,dCdt(tm)], and dCdt is defined by (1.1). This yields the following least squares problem (LSP)(2.3)minb,p,a,K,τ12K‖dHdt(b,p,a,τ)−D‖2.

In (2.3), the function KdHdt is evaluated at finitely many points, t1,

t2, …, tm, and ||⋅|| is the Euclidian norm in ℝm. By the first order necessary condition for the minimum, one concludes that K||dHdt||2−(dHdt,D)=0, and therefore(2.4)K=1||dHdt||2(dHdt,D),where (⋅,⋅) is the inner product in ℝm. Substituting (2.4) into (2.3), one gets the LSP with respect to the parameters b, p, a, and τ:(2.5)minqf(q):=minq12||K(q)dHdt(q)−D||2,q:=[b,p,a,τ]T.

Minimization problem (2.5) is ill-posed due to the lack of stability with respect to noise in the input data. To overcome the danger of noise propagation, which is evident from the severely ill-conditioned Jacobian, our next step is(a)to regularize (stabilize) the functional f(q) in (2.5) by incorporating a priori information about the unknown parameters;(b)to develop an efficient problem oriented numerical solver that would further increase our chances of computing a reasonably accurate solution.

Without regularization, the algorithm may diverge due to error accumulation at every step, or the uncertainty could get out of control. The most common a priori information is the one that uses our best guess on the potential values of q. Less common, but very helpful for this particular application, is the information on the expected orders of magnitude for b, p, a, and τ. Specifically, since τ has the meaning of the disease turning point, its value should be about one or two orders of magnitude larger than b, p, and a. This suggests that, as we introduce a priori information, parameters b, p, and a must be appropriately weighted to balance the sensitivity of our model to all four unknowns. Combining all of the above, we add the following term to the cost functional (2.5)(2.6)Ω(q):=||L(q−q˜)||2,L=[ω0000ω0000ω00001],ω>1.

Here q˜ incorporates the expected (reference) value of q, and L is the weighting matrix. Since the information contained in Ω(q) is less reliable than the one contained in the fidelity term, ‖K(q)dHdt(q)−D‖2, we penalize Ω(q) by a small regularization parameter, α, and obtain the following auxiliary minimization problem (Tikhonov et al., 1995, Tikhonov et al., 1998, Yagola, 2010, pp. 17–34)(2.7)minqfα(q):=minq12∥Φ(q)−D∥2+α2∥L(q−q˜)∥2,which is a particular case of the variational (Tikhonov's) regularization aimed at stabilizing the original ill-posed model. In (2.7),(2.8)Φ(q):=K(q)dHdt(q).

Numerical optimization method with Hessian reduction

Since Φ(q) defined in (2.8) is a non-linear operator, the regularization needs to be combined with a robust problem-oriented numerical algorithm capable of producing an accurate approximation to the unknown parameters. If one solves (2.7) by a classical Gauss-Newton method and drives α to 0 as iterations progress, then one gets what is known as iteratively regularized Gauss-Newton (IRGN) scheme (Bakushinsky, 1993, Bakushinsky and Kokurin, 2005)(3.1)[Φ′*(qk)Φ′(qk)+αkL*L]pk=−[Φ′*(qk)(Φ(qk)−D)+αkL*L(qk−q˜)],qk+1=qk+λkpk,λk>0,where Φ′*(q) is the adjoint to Φ′(q). In (3.1), αk and λk execute regularization and line search, respectively. While this method is generally efficient, for our model one is forced to sacrifice too much accuracy to stabilize the process, and even with a relatively large regularization parameter the uncertainty in the computed solution remains high. Hence our goal is to modify IRGN (3.1) by truncating a “non-essential” part of the Hessian approximation in order to bring down its condition number. To that end, consider the gradient of f, ∇f(q)=Φ′*(q)(Φ(q)−D), where(3.2)Φ′(q)=[∂K∂q1dHdt⋯∂K∂q4dHdt]+K[∂∂q1dHdt⋯∂∂q4dHdt]:=Φ1′(q)+Φ2′(q).

One can easily verify that the residual, Φ(q)−D, is in the kernel of the matrix Φ1′*(q). Indeed, from definitions of K(q) in (2.4) and Φ(q) in (2.8), one concludes that for any i=1,2,3,4,(∂K∂qi(q)dHdt(q),K(q)dHdt(q)−D)=∂K∂qi(q)[K(q)||dHdt(q)||2−(dHdt(q),D)]=0.

Therefore Φ1′*(q)(Φ(q)−D)=0 and ∇f(q)=Φ2′*(q)(Φ(q)−D). This implies the Hessian approximation(3.3)G(q)=Φ2′*(q)Φ′(q)=Φ2′*(q)(Φ1′(q)+Φ2′(q)),which we reduce further by dropping Φ1′(q) and setting G(q)≈Φ2′*(q)Φ2′(q). This permits to lower the condition number and to enforce the symmetric non-negative definite structure. Dividing both sides of the modified linear system by K2(qk), which enables us to move all noise from the matrix to the right-hand side, one arrives at what we call Reduced IRGN (RIRGN) (Smirnova et al., 2017)(3.4)[A′*(qk)A′(qk)+α˜kL*L]pk=−[A′*(qk)(A(qk)−D/K(qk))+α˜kL*L(qk−q˜)],where A(q):=dHdt(q), and α˜k:=αk/K2(qk).

Simulation results and discussion

In this section we address the important aspects of practical implementation of RIRGN algorithm (3.4). To that end, we investigate the trajectories of the 2014–2015 Ebola epidemic in Guinea, Liberia and Sierra Leone (World Health Organization, 2016). For illustration, we use early incident reported data to estimate the values of five unknown parameters, r, τ, p, a, and K, by solving the regularized least squares problem with the RIRGN method (3.4). Then we substitute the estimated parameters into (1.1) to find the approximate values of future incident cases using the built-in Matlab ode23s solver. The data for this numerical experiment have been retrieved in 2015 from the World Health Organization (WHO) web site. The weekly series of cases were directly obtained from the WHO patient database.

For successful application of any regularized procedure, the right choice of the stabilizing sequence, {α˜k}, is very important. The value of α˜0 has to be sufficiently large to ensure that we incorporate enough stability into the system to reduce the excessive noise propagation. On the other hand, over-regularization may slow down the convergence of the iterative sequence {qk}, and result in unnecessary loss of precision. In our experiment, we take α˜0=5⋅10−4 for Sierra Leone and Liberia and α˜0=10−3 for Guinea, where the data are more noise contaminated. We set α˜k=α˜0exp(−k/2), since this choice gives us the most aggressive convergence rate for {qk} while maintaining stability for every value of k.

The outer step of the RIRGN scheme (3.4) identifies the direction of the decent. In order to optimize the step size in that direction, we implement a version of the Armijo-Goldstein line search strategy (Nocedal & Wright, 2000), i.e., a backtracking with λk=1/2,1/4,… until‖A(qk+λkqk)−D/K(qk+λkpk)‖2<||A(qk)−D/K(qk)||2+λkβ(A′∗(qk)(A(qk)−D/K(qk)),pk),which is commonly used for Gauss-Newton type algorithms. Iterations are stopped at the moment when the discrepancy for the approximate solution is consistent with the level of noise in our data in order to guarantee convergence and, at the same time, it is not too small to prevent over-fitting that may compromise the accuracy of the estimated parameters, i.e., the stopping index, K=K(δ), is evaluated by a posteriori stopping rule(4.1)||A(qK(δ))−D/K(qK(δ))||2≤ρκδ<||A(qk)−D/K(qk)||2,0≤k≤K(δ),ρ>1.

To balance the sensitivity of the cost functional to all four unknowns in the penalty term, we choose L based on our prior knowledge of the difference in the orders of magnitude between the value of τ and the values of three other parameters. This suggests L in the form (2.6) with ω>1 (ω=10 works well for our code).

The selection of the initial vector, q0, and the reference vector, q˜, in the penalty term is rather difficult. In the beginning of an emerging outbreak, it is not straightforward to make a sophisticated guess about when the disease is going to peak. Hence, in our experiment, we take τ0 in q0 to be slightly greater than tm, the last week for which the data is available, and we take τ in q˜ to be equal to 60, i.e., we assume the incidence curve will turn before 60 weeks of the epidemic have passed. The values of p in q0 and q˜ are taken to be 0.7 and 0.1, respectively, to enforce the sub-exponential growth rate. The values of two remaining parameters, b (used to compute r) and a are taken to be 1 in both vectors for the lack of better information.

Fig. 3 illustrates parameter estimation and the epidemic forecasts based on the generalized Richards model with 20 weeks of incident case data for the Ebola epidemic in Sierra Leone. After the five unknown parameters have been recovered from early epidemic data, 100 additional data bootstrap curves are generated by adding a Poisson error structure to the weekly series of reported cases (see (Chowell, Ammon, Hengartner, & Hyman, 2006)) in order to quantify uncertainty in the recovered parameters. An alternative is to employ analytic results from asymptotic theory to estimate parameter uncertainty (Banks, Davidian, Samuels, & Sutton, 2009). The histograms in the upper row show the recovered values of the parameters, while the collection of curves at the bottom of the figure demonstrates the accuracy of the forecasting.

In a similar manner, we estimated parameters and produced forecasts for the Ebola epidemics in Guinea (Fig. 4) and Liberia (Fig. 5). Not surprisingly the model has most difficulties tracking the epidemic in Guinea due to its multi-mode nature (Fig. 4). Since this outbreak is also the longest, we need more data (30 weeks) to forecast future incidence cases. On the other hand, the Liberia outbreak, which is 45 weeks total, can be projected from as little as 12 weeks of early data. This inverse modeling approach could be adapted to analyze outbreaks of other disease system, e.g., avian influenza, Middle East Respiratory Syndrome (MERS), and Zika. One could envision disease forecasts for an emerging outbreak in real time with updates every few weeks or so (depending on the disease system).

In conclusion, we have illustrated the potential of rigorous mathematical approaches for generating reliable parameter estimates and useful epidemic forecasts in the context of the generalized Richards equation, a simple phenomenological 4-parameter model. Our results here suggest that carefully linking mathematical models with regularization techniques could lead to stable parameter estimation and associated forward projections.