However, different models and model parameterizations may provide different predictions of outcomes. In other fields of research, ensemble modeling has been used to combine multiple projections. We explore the possibility of applying such methods to epidemiology by adapting Bayesian techniques developed for climate forecasting. We exemplify the implementation with single-model ensembles based on different parameterizations of the Warwick model, run for the 2001 United Kingdom foot and mouth disease outbreak, and compare the efficacy of different control actions. This allows us to investigate the effect that discrepancy among projections based on different modeling assumptions has on the ensemble prediction. A sensitivity analysis showed that the choice of prior can have a pronounced effect on the posterior estimates of quantities of interest, in particular for ensembles with large discrepancy among projections. However, we show that this prior sensitivity can be circumvented by using a hierarchical extension of the method. We further extend the method to include a priori beliefs about different modeling assumptions and demonstrate that the consequences of doing so depend on the discrepancy among projections. We propose that the method is a promising analytical tool for ensemble modeling of disease outbreaks.
However, different projections may be made, depending on the choice of models and parameterizations. Ensemble modeling offers the ability to combine multiple projections and has been used successfully in other fields of research. A central issue in ensemble modeling is how to weight the projections when they are combined. For this purpose, we here adapt and extend a weighting method used in climate forecasting so that it can be applied to epidemiological questions. We investigate how the method performs by applying it to ensembles of projections for the 2001 UK foot and mouth disease outbreak. We conclude that the method is a promising analytical tool for ensemble modeling of disease outbreaks.
Faced with such uncertainty, policy makers must still make high-stakes decisions, both in terms of health and economics. For instance, global annual malaria mortality was recently estimated at around 1.1 million [2], and to optimize control efforts, policy makers must make seasonal predictions about spatiotemporal patterns [3]. The prospect of an emergent pandemic influenza outbreak remains a global threat, and emergency preparedness must evaluate the costs and benefits of control measures such as border control, closing of workplaces and/or schools, as well as different vaccination strategies [4]. Livestock diseases are major concerns for both animal welfare and economics. As an example, the 2001 United Kingdom (UK) outbreak of foot and mouth disease (FMD) involved culling of approximately 7 million animals, either in an effort to control the disease or for welfare reasons, and the total cost has been estimated at £8 billion [5]. To minimize the size and duration of future outbreaks, various strategies for culling and vaccination must be compared [6–8]. As a tool to address these challenging tasks, mathematical models offer the possibility to explore different scenarios, thereby informing emergency preparedness and response to epidemics [9–13].
Forecasting aims at estimating what will happen and can be used, for example, to predict seasonal peaks of outbreaks [3,14] or to identify geographical areas of particular concern [15]. Projecting, which is the main focus of this study, instead aims at comparing different scenarios and exploring what would happen under various assumptions about transmission, e.g. comparing the effectiveness of different control actions [7,16–19].
Typically, dynamic models are constructed and outbreaks are simulated repeatedly, thus generating predictive distributions of outcomes [1,17,18,23]. The variability in outcomes caused by the stochasticity of the transmission process constitutes one level of uncertainty, but it still relies on a single set of assumptions about the underlying disease transmission process. However, multiple assumptions can often be justified, leading to further uncertainty in the predictions. For instance, different models may produce different projections because they make different assumptions about transmission or because they incorporate different levels of detail. It may also be informative to explore different projections in terms of different parameterizations of a single model, for example corresponding to worst- or best-case scenarios. Faced with a set of projections, an important issue is how to combine these in a manner such that they can be used to inform policy.
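To illustrate the first level of uncertainty, a toy stochastic outbreak model (entirely hypothetical, far simpler than the Warwick model used later) already yields a predictive distribution of outbreak duration across replicates:

```python
import random
import statistics

def simulate_outbreak(r0=1.5, recovery_days=7, seed_cases=5, max_farms=500, rng=None):
    """One stochastic realization of a simple branching outbreak among farms.
    Returns the outbreak duration in days. Toy model for illustration only."""
    rng = rng or random.Random()
    active, day, total = seed_cases, 0, seed_cases
    while active > 0 and total < max_farms:
        day += 1
        # each active farm infects a new farm or recovers with fixed daily probability
        new = sum(1 for _ in range(active) if rng.random() < r0 / recovery_days)
        recovered = sum(1 for _ in range(active) if rng.random() < 1 / recovery_days)
        active += new - recovered
        total += new
    return day

rng = random.Random(1)
durations = [simulate_outbreak(rng=rng) for _ in range(200)]  # 200 replicates
print(statistics.mean(durations), statistics.stdev(durations))
```

Even with a single, fixed set of assumptions, the replicates spread over a range of durations; the ensemble methods below address the additional uncertainty that arises when the assumptions themselves vary.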
The key concept is that rather than relying on a single set of assumptions, a range of projections is used for predictive purposes. For instance, climate forecasting has employed ensemble techniques to account for uncertainty about initial conditions, parameter values and model structure when predicting climate change [24,25]. Weather forecasting has been improved by combining the results of multiple models [26,27]. Similarly, hydrological model ensembles have been demonstrated to increase the reliability of catchment forecasting [28] and have been used to assess the risk of flooding events [29]. Ensemble methods have also proven to be a powerful decision tool in medical diagnostics [30,31] and in ecological applications, including management [32] and prediction of future species distributions [33].
However, a few implementations exist, commonly by feeding climate or weather ensembles into disease models. Daszak et al. [34] coupled a set of climate projections to an environmental niche model of Nipah virus to predict the future range distribution of the virus under climate change. Similarly, Guis et al. [35] investigated the effect of climate change on Bluetongue emergence in Europe by simulating outbreaks under different climate change scenarios. Focusing on a shorter time scale, Thomson et al. [3] used an ensemble of seasonal forecasts to predict the spatiotemporal pattern of within-season variation in malaria incidence. These studies all used a single disease model projection coupled to an ensemble of climate or weather forecasts, and the use of ensembles of structurally different epidemiological models is, to our knowledge, still rare. However, Smith et al. [36] compared different malaria vaccination strategies by implementing an ensemble approach with different alterations of a base model. Also, in order to estimate global malaria mortality, Murray et al. [2] used weighted averages of different predictive models.
For that purpose, however, there is a need for methods that combine multiple projections. A central issue in ensemble modeling is how to weight different projections, and we envisage four main procedures for this. Firstly, all models can be given equal weights. For instance, the IPCC 2001 report on climate change [37] used a set of climate models and gave the range of probable scenarios by averaging over the different models, with uncertainty given by envelopes that included all scenarios. Gardmark et al. [32] used seven ecological models for cod stock and argued that, in order to prevent underestimation of uncertainty, weighted model averages should not be used and that, when communicating with policy makers, it is preferable to present all included projections as well as their underlying assumptions. A similar approach was also used by Smith et al. [36], who presented the prevalence of malaria under different vaccination strategies using the medians of individual models and the range of the whole ensemble.
To our knowledge, no ensemble study has implemented weights based exclusively on expert opinion, but Bayesian model averaging can incorporate expert opinion as a subjective prior on model probabilities [38]. This approach relies on engaging stakeholders and communicating the underlying assumptions of the projections.
This approach was implemented by Raisanen and Palmer [39], who used cross-validation to weight climate models. As a more informal approach to the use of model consensus, the third IPCC report excluded two models because these predicted much higher global warming than the rest of the ensemble, thus acting as outliers [24].
If all models are fitted to the same data using likelihood-based methods, weights can be given directly by the Akaike or Bayesian Information Criterion (AIC or BIC) [40,41]. In the FMD context, this may be a suitable approach if all included models are data-driven kernel models that estimate parameters from outbreak data, such as those proposed by Jewell et al. [42] or Tildesley et al. [43]. However, such weighting schemes would be unfeasible when including detailed simulation models that rely on a large number of parameters determined by expert opinion or laboratory experiment, such as AusSpread [44], NAADSM [45] and InterSpread Plus [46]. We propose that the future of ensemble modeling for epidemiology will benefit from combining structurally different model types, and methods of weighting need to handle both kernel-type models and detailed simulation models.
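For likelihood-based ensembles, the AIC-based weights mentioned here take the standard Akaike-weight form, w_i ∝ exp(−Δ_i/2), where Δ_i is the AIC difference to the best model. A minimal sketch (the AIC values are illustrative, not from any fitted FMD model):

```python
import math

def akaike_weights(aics):
    """Akaike weights: w_i proportional to exp(-delta_i / 2),
    where delta_i = AIC_i - min(AIC). Weights sum to one."""
    deltas = [a - min(aics) for a in aics]
    raw = [math.exp(-d / 2) for d in deltas]
    total = sum(raw)
    return [r / total for r in raw]

# e.g. three kernel models fitted to the same outbreak data (illustrative AICs)
weights = akaike_weights([1000.0, 1002.0, 1010.0])
print(weights)
```

Note that a difference of a few AIC units already concentrates most of the weight on the best-fitting model, which is one reason such schemes cannot accommodate simulation models that are not fitted to data by likelihood.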
Such methods have been developed within the field of climate forecasting. Giorgi and Mearns [48] introduced a formal framework in which model weights were assessed based on model bias compared to observed data as well as convergence, i.e. agreement with the model consensus. Tebaldi et al. [47] extended the approach to a Bayesian framework. This approach is appealing because it provides probability distributions of quantities of interest; hence uncertainty about the projected outcomes may be communicated to policy makers. As such, it would also be a suitable approach for epidemiological predictions.
Tebaldi et al. [47] point out that lack of data at fine-scale resolution is a limiting factor for climate forecasting. Yet, at coarser resolution, climate researchers have access to long time series of climate variables to assess model bias. Comparable data may be available for endemic diseases, such as malaria [36] or tuberculosis [49], or seasonally recurrent outbreaks, such as influenza [14] or measles [50]. However, for emerging diseases, long time series would rarely be available, making the lack of data an even bigger issue for epidemiology.
[47] for epidemiological projections. The Tebaldi et al. methodology focuses on ensembles where projections are made with different models, and our aim is to provide a corresponding framework for disease outbreak projections. To investigate the potential of the framework for epidemiology, we here use variations of a single model as a proxy for different models, thus allowing us to investigate how the methodology performs under different levels of discrepancy among projections in the ensemble. We exemplify the implementation using the UK 2001 FMD outbreak and projections modelled by different parameterizations of the Warwick model [7,9].
In addition, livestock on farms that were identified to be at high risk of infection were culled as either traditional dangerous contacts (DCs) or contiguous premises (CPs). CPs were defined as "a category of dangerous contact where animals may have been exposed to infection on neighboring infected premises" [5,8]. We start by focusing on ensemble prediction of epidemic duration under the control action employed during the 2001 outbreak compared with an alternative action that excludes CP culling. We investigate sensitivity to priors and explore a hierarchical Bayesian extension of the method to circumvent potential problems with prior sensitivity. We also explore the potential of including subjective a priori trust in the different modeling assumptions, and extend the methodology further to allow the incorporation of multiple epidemic quantities, here exemplified by adding the number of infected and culled farms to the analysis. Through a simulation study, we finally explore the capacity and limitations of the proposed ensemble method, pinpointing some important features of ensemble modeling.
In the ensemble, each action is simulated under different modeling assumptions about the underlying process, expressed as either different models or, as in the example described here, different parameterizations of the same model. We refer to the combination of control action and modeling assumption as a projection. Each projection is further simulated with several replicates, which produce different outcomes merely due to the stochasticity of the simulation models. We are also interested in how discrepancy among projections influences the performance of the weighting method, and refer to sets of different projections as ensembles with small or large discrepancy. A flow chart demonstrating the relationships between the different concepts and weighting schemes is presented in Fig 1.
This model was developed in the early stages of the 2001 FMD outbreak by Keeling and coworkers to determine the potential for disease spread and the impact of intervention strategies [9]. Here, we utilized a modified version of the model used in 2001, and we briefly describe relevant aspects of the Warwick model with regard to ensemble modeling. Full details of the model can be found in [7,9]. The rate at which an infectious farm j infects a susceptible farm i is given by

rate_{ij} = S_i × T_j × K(d_{ij}),

where S_i = Σ_s S_s Z_{s,i}^{p_{s,S}} is the susceptibility of farm i, T_j = Σ_s T_s Z_{s,j}^{p_{s,T}} is the transmissibility of farm j, and K(d_{ij}) is the distance-dependent transmission kernel, estimated from contact tracing [9]. In this model, Z_{s,i} is the number of livestock of species s (sheep or cattle) recorded as being on farm i, S_s and T_s measure the region- and species-specific susceptibility and transmissibility, and d_{ij} is the distance between farms i and j. The parameters p_{s,S}, p_{c,S}, p_{s,T} and p_{c,T} are power-law parameters that account for a nonlinear increase in susceptibility and transmissibility as animal numbers on a farm increase. Previous work has indicated that a model with power laws provides a closer fit to the 2001 data than when these powers are set to unity [43,51,52].
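The pairwise rate above can be sketched directly. All parameter values below are illustrative placeholders, not the fitted values of Tildesley et al. [43], and the kernel is a hypothetical stand-in for the contact-tracing estimate:

```python
def farm_rate(z_sus, z_inf, S, T, pS, pT, kernel, d):
    """Rate at which an infectious farm infects a susceptible one:
    (sum_s S_s * Z_s^pS_s) * (sum_s T_s * Z_s^pT_s) * K(d).
    z_* : dict mapping species -> head count on the farm;
    S, T: species-specific susceptibility / transmissibility;
    pS, pT: power-law exponents; kernel: distance-dependent function."""
    susceptibility = sum(S[s] * z_sus[s] ** pS[s] for s in z_sus)
    transmissibility = sum(T[s] * z_inf[s] ** pT[s] for s in z_inf)
    return susceptibility * transmissibility * kernel(d)

# Illustrative (not fitted) values: two mixed farms 2 km apart
rate = farm_rate(
    z_sus={"sheep": 100, "cattle": 20}, z_inf={"sheep": 50, "cattle": 80},
    S={"sheep": 1.0, "cattle": 5.7}, T={"sheep": 0.8, "cattle": 8.2},
    pS={"sheep": 0.2, "cattle": 0.4}, pT={"sheep": 0.5, "cattle": 0.4},
    kernel=lambda d: 1.0 / (1.0 + (d / 0.4) ** 2),  # hypothetical kernel shape
    d=2.0,
)
print(rate)
```

The power-law exponents below one dampen the effect of herd size, which is the nonlinearity referred to in the text.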
Region-specific transmissibility and susceptibility parameters (and associated power laws) capture specific epidemiological characteristics and policy measures used in the main hot spots of Cumbria, Devon and the Welsh and Scottish borders. The model is therefore fitted to five regions: Cumbria, Devon, Wales, Scotland and the rest of England (excluding Cumbria and Devon). A table listing all the parameter values used in this model is given in Tildesley et al. [43]. In order to obtain multiple modeling assumptions for ensemble modeling, we specified different transmission rates for each modeling assumption i by scaling the transmissibility by a constant k_{1,i} and the spatial kernel by a constant k_{2,i}. Setting k_{1,i} = k_{2,i} = 1 recovers the parameterization of Tildesley et al. [43], and by decreasing or increasing these constants, we obtain parameterizations that correspond to best- or worst-case expectations about the transmissibility and spatial range of transmission. We are interested in how the level of discrepancy among modeling assumptions influences the performance of the ensemble method, and we therefore created two different ensembles with different k_{1,i} and k_{2,i}, as listed in Table 1. We refer to these as the large and small discrepancy ensembles, corresponding to large and small differences, respectively, in the underlying modeling assumptions used for projections.

Table 1. Scaling constants k_{1,i} and k_{2,i} used for each modeling assumption i in the small and large discrepancy ensembles.
CPs were defined as farms that share a common boundary with an infected premises and were determined on an individual basis. The model was seeded with the farms that were predicted to be infected prior to the introduction of movement restrictions on the 23rd of February. For each modeling assumption i and control action, 200 replicates were simulated, and each simulation progressed until the epidemic died out.
[47] approach for epidemiological considerations, we initially focus on outbreak duration. This is often considered the most costly aspect of FMD outbreaks due to its implications for trade [53]. In section 2.7 we extend the methodology to multiple epidemic quantities. However, the outbreak duration example offers a transparent transition from the original climate analysis of Tebaldi et al. [47], which considers the ensemble-estimated difference between current and future mean temperatures. In order to introduce the framework to epidemiology, we consider the difference between the implemented and an alternative control action, attempting to show whether the inclusion of CP culling was an appropriate choice given the conditions at the start of the outbreak. As this is a post-outbreak analysis, we know the final duration of the observed outbreak, but that is just a single realization, and due to the stochastic nature of disease transmission, the exact outcome may be quite variable. We also have no observed outbreak under the alternative control action to compare with the implemented control. Under these conditions, the most appropriate quantities to compare are the mean durations of a large number of outbreaks under the two control actions, something that can only be achieved through epidemic modeling.
Using the Bayesian method of Tebaldi et al. [47], weights as well as statistics of the outbreak, like duration, are considered unknown random variables, and we denote the mean outbreak duration under the implemented and the alternative control action as μ and ν, respectively, corresponding to the mean current and future temperature, respectively, in the climate application. In order to fit the normal assumptions of the method, we consider log-duration in the analysis.
λ_1, λ_2, . . ., λ_n, with λ_i denoting the precision of modeling assumption i. The projection-specific parameter x_i indicates the mean of all replicates under the implemented control action (the analog of current climate) for modeling assumption i. For the UK 2001 outbreak this included culling of IPs, DCs and CPs. The corresponding projection for the alternative control action (the analog of future climate), which included culling of IPs and DCs only, is denoted y_i. The relationship between projections and ensemble parameters is expressed as

x_i ~ Normal(μ, λ_i^(-1))    (5)

y_i ~ Normal(ν + β(x_i − μ), (θλ_i)^(-1))    (6)

with Normal(μ, λ_i^(-1)) denoting the normal distribution with mean μ and variance λ_i^(-1). Parameter θ is included to allow for a difference in the overall precision of the modeling assumptions under the implemented and alternative control actions. However, since projections x_i and y_i are based on simulations, it is fair to assume that a modeling assumption i that has high precision for the observed control action will also do well for the unobserved action. This is incorporated by the λ_i term in both Eqs 5 and 6. For the same reason, we may expect that a projection with a large x_i also corresponds to a large value of y_i, and thus β is included to allow for correlation between corresponding projections for the two control actions; a projection that e.g. over-predicts the duration of the outbreak for the observed control action can be expected to also over-predict it for the alternative control action. The analysis of Tebaldi et al. [47] also assessed the bias of projections by their ability to reproduce observed current climate, incorporating the relationship between the observed current climate x_0, an unobserved true mean climate variable μ and the precision of natural variability τ_0 through

x_0 ~ Normal(μ, τ_0^(-1)).
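The likelihood structure of Eqs 5 and 6 can be sketched as a forward simulation; this is an illustrative toy with all parameter values hypothetical, useful mainly for seeing how λ_i, θ and β shape the joint behavior of the projection pairs:

```python
import random

def simulate_projections(mu, nu, lambdas, theta, beta, rng):
    """Draw ensemble projections under the Tebaldi-style likelihood:
    x_i ~ N(mu, 1/lambda_i);
    y_i ~ N(nu + beta * (x_i - mu), 1/(theta * lambda_i))."""
    xs, ys = [], []
    for lam in lambdas:
        x = rng.gauss(mu, (1.0 / lam) ** 0.5)
        y = rng.gauss(nu + beta * (x - mu), (1.0 / (theta * lam)) ** 0.5)
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(0)
# hypothetical values: three assumptions with differing precision lambda_i
xs, ys = simulate_projections(mu=5.0, nu=4.5, lambdas=[2.0, 8.0, 0.5],
                              theta=1.0, beta=0.3, rng=rng)
print(xs, ys)
```

With β > 0, an assumption that over-predicts x also tends to over-predict y, which is the shared-bias structure the text motivates.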
That would rarely be the case for the corresponding epidemic considerations, at least for emerging diseases. Using a single outbreak to evaluate bias, we clearly have no way of assessing variability in outcomes. We therefore include τ_0 as an unknown random variable that is estimated in the analysis, as described in the following section. To aid the interpretation and the transfer from the climate to the epidemiological setting, we have included Table 2, which lists the variables used in the analysis.
However, it is clear that in addition to the mean duration of the outbreak, the uncertainty about the process also results in some variability in the outcomes that we need to consider. The stochastic simulations used to generate projections provide not only a mean simulated outbreak quantity, but also a range of outcomes that reflects this variability. In the absence of repeated outbreaks to evaluate variability of outcomes, an obvious choice is to use this information to inform the variability τ_0. Defining τ_i as the precision of projections under the implemented action for modeling assumptions i = 1, 2, . . ., n, we include a hierarchical structure in the analysis so that

τ_i ~ Gamma(a_τ, b_τ) for i = 0, 1, 2, . . ., n,

where Gamma(a_τ, b_τ) indicates the gamma distribution with shape parameter a_τ and rate b_τ, both of which are unknown parameters that are estimated in the analysis. Thus, as it would be cumbersome to elicit a fixed prior for τ_0 based on our prior expectations about variability, we instead assume that τ_0 comes from some unknown distribution and make use of τ_1, τ_2, . . ., τ_n to inform what this distribution should be. Similarly, we need to model the variability of projections under the alternative control action, and denoting this φ_i, we specify

φ_i ~ Gamma(a_φ, b_φ) for i = 1, 2, . . ., n.

The parameters a_φ and b_φ are conditionally independent of all other parameters in the analysis and can be modelled separately in the Bayesian analysis. As x_i, y_i, τ_i and φ_i are calculated from a finite number of realizations with each modeling assumption and control action, there is some uncertainty related to this. Tebaldi et al. [47] however point out that while it is certainly possible to construct a Bayesian model that takes this uncertainty into account, the effect is minimal if the number of replicates is large. With the R = 200 replicates performed here, the uncertainty of the mean will in practice have very little effect, and we have included x_i, y_i, τ_i and φ_i as fixed observations.
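As an illustration of how replicate variability informs these precision terms, the following sketch (using hypothetical replicate sets, not Warwick model output) computes τ_i as the inverse sample variance of log-durations for each modeling assumption:

```python
import random
import statistics

def replicate_precision(log_durations):
    """Precision (inverse sample variance) of replicate log-durations for one
    projection; these tau_i inform the hierarchical gamma prior for tau_0."""
    return 1.0 / statistics.variance(log_durations)

rng = random.Random(2)
# hypothetical replicate sets for three modeling assumptions, with increasing
# replicate-to-replicate spread (standard deviations 0.2, 0.4, 0.8)
taus = [replicate_precision([rng.gauss(5.0, sd) for _ in range(200)])
        for sd in (0.2, 0.4, 0.8)]
print(taus)
```

A wider spread among replicates yields a lower precision, so assumptions whose simulations are intrinsically noisier contribute that information to the shared gamma distribution.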
Similarly, the priors for a_φ and b_φ were specified as gamma distributions with shapes A_{a_φ} and A_{b_φ}, respectively, and rates B_{a_φ} and B_{b_φ}, respectively. We explored different parameter choices for the hyperpriors and found that the results were insensitive to the choice of prior for a wide range of values. In the analysis presented, we used A_{a_τ} = A_{b_τ} = B_{a_τ} = B_{b_τ} = A_{a_φ} = A_{b_φ} = B_{a_φ} = B_{b_φ} = 0.001. This corresponds to prior distributions with a mean of one and a variance of 1000, thus allowing for a wide range of plausible values.
We follow Tebaldi et al. [47] with priors specified as uniform on the real line for μ, ν and β, and λ_i ~ Gamma(a_λ, b_λ) for i = 1, 2, . . ., n and θ ~ Gamma(a_θ, b_θ). We also need to specify hyperpriors for a_τ and b_τ, and we
in that we include τ_0 as an unknown variable and use a hierarchical structure for its prior. Using Markov chain Monte Carlo (MCMC) techniques as described in section 2.9, we first performed the analysis with priors as specified by Tebaldi et al. [47] where applicable, i.e. a_λ = b_λ = a_θ = b_θ = 0.001, because they argue that this ensures that the prior contributes little to the posteriors.
In particular, λ_i could be expected to be sensitive to priors because it is essentially only fitted to two data points, x_i and y_i. Yet, based on approximations of conditional distributions, Tebaldi et al. argued that the gamma distribution with a_λ = b_λ = 0.001 is appropriately vague for the analysis. For transparency we here follow their approach and investigate the effect of the prior for the simplified model where β = 0. The mean of the conditional distribution of λ_i is then approximated by

λ̃_i ≈ (a_λ + 1) / (b_λ + [(x_i − μ̃)² + θ(y_i − ν̃)²] / 2),

where μ̃ is the conditional mean of the distribution of μ, given by

μ̃ = (τ_0 x_0 + Σ_i λ_i x_i) / (τ_0 + Σ_i λ_i),    (11)

and ν̃ the corresponding value for ν, given by

ν̃ = (Σ_i λ_i y_i) / (Σ_i λ_i).    (12)

We stress that the λ_i need not sum to one, as might be intuitive for weights. As given by Eqs 11 and 12, the means of μ and ν depend only on the relative values of λ_i, but the absolute values influence the width of the distributions, with the variance of the conditional distributions increasing for lower absolute values of λ_i (Table 3).
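These approximate conditional means can be iterated to a fixed point, giving a quick non-MCMC sketch of the weighting behavior (β = 0; all input values are illustrative). It also exposes the pathology discussed next: the ensemble can collapse onto a single projection whose (x_i, y_i) pair sits close to the weighted means:

```python
def approx_weights(x0, tau0, xs, ys, a_lam=0.001, b_lam=0.001, theta=1.0, iters=200):
    """Fixed-point iteration of the approximate conditional means (beta = 0):
    mu  = (tau0*x0 + sum(lam_i*x_i)) / (tau0 + sum(lam_i))
    nu  = sum(lam_i*y_i) / sum(lam_i)
    lam_i = (a_lam + 1) / (b_lam + ((x_i-mu)**2 + theta*(y_i-nu)**2) / 2)"""
    lam = [1.0] * len(xs)
    for _ in range(iters):
        mu = (tau0 * x0 + sum(l * x for l, x in zip(lam, xs))) / (tau0 + sum(lam))
        nu = sum(l * y for l, y in zip(lam, ys)) / sum(lam)
        lam = [(a_lam + 1) / (b_lam + ((x - mu) ** 2 + theta * (y - nu) ** 2) / 2)
               for x, y in zip(xs, ys)]
    return mu, nu, lam

# illustrative log-duration projections; the third is an outlier
mu, nu, lam = approx_weights(x0=5.1, tau0=4.0,
                             xs=[5.0, 5.2, 6.5], ys=[4.4, 4.6, 5.9])
print(mu, nu, lam)
```

With this vague prior the second projection, lying closest to the consensus, accumulates an extreme weight (bounded only by (a_λ + 1)/b_λ), while the outlier receives the lowest, illustrating why the absolute values of b_λ matter.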
As x_i → μ̃ and y_i → ν̃, the denominator actually approaches b_λ. Hence, to ensure that a low value of b_λ can be considered vague, such that our posterior is informed primarily by x_0, x and y, we must conclude that |x_i − μ̃| or |y_i − ν̃| is clearly separated from zero. However, if λ_i >> λ_j for all j ≠ i and λ_i >> τ_0, then μ̃ ≈ x_i and ν̃ ≈ y_i, and nothing in the model prevents this relationship. In fact, if we consider the gamma prior with shape and rate parameters set to 0.001, the distribution has most of its density near zero, yet with a fat (though exponentially bounded) tail that allows for high values. In the current analysis, this corresponds to the prior belief that the majority of modeling assumptions will have very low precision while a few will have very high precision. Under this prior belief, it is expected that for some model i, λ_i >> λ_j for all j ≠ i. If instead τ_0 >> λ_i for all i, then μ̃ ≈ x_0 and the approximation would hold, but we cannot expect that relationship. As we cannot a priori be sure that the choice of a_λ and b_λ does not influence our posterior as long as they are arbitrarily small, we performed a prior sensitivity analysis and reran the analysis with a_λ = b_λ = 0.01 and a_λ = b_λ = 0.0001. We could expect that the sensitivity to priors depends on the differences among modeling assumptions, and we investigate this by analyzing ensembles with small and large discrepancy among the assumptions in the ensemble, as given by Table 1. We refer to this as the Non-Hierarchical Weighting (NHW) method.

Table 3. Conditional distributions used for updating via Gibbs sampling for the single quantity example.
Using prior beliefs is sometimes desirable, and in section 2.6 we consider the situation where we trust some modeling assumptions more than others. However, it would rarely be the case that we would have substantial expectations that could inform the shape, a_λ, and rate, b_λ, of the prior for λ. A potential solution is to extend the model to include hierarchical priors, such that the prior for λ_i is estimated in the model rather than fixed a priori. We make a slight change to the parameterization of the prior distribution, i.e. specifying the distribution by its mean m_λ and shape a_λ, which are estimated in the model. In that way, we move our uncertainty up a level and express our beliefs about the distributions of m_λ and a_λ rather than λ_i. Using m_λ rather than b_λ facilitates the specification of a prior for the mean precision parameter that corresponds to the priors previously specified on the individual λ_i. This parameterization further aids the use of prior beliefs about weights in section 2.6. While we can never be strictly uninformative in a Bayesian analysis, the hierarchical prior allows for a wide range of plausible m_λ and a_λ, whereas the model presented in section 2.4 requires these to be specified explicitly. This also allows for the concept of "borrowing strength" [54], such that the distribution of each λ_i can be indirectly informed by all other precisions via the hierarchical distribution. This is often beneficial in situations where individual parameters are fitted to a small amount of data [55,56], which clearly is the case for λ_i here. To extend Eq (10) to a hierarchical model, we include hyperpriors such that

a_λ ~ Gamma(A_{a_λ}, B_{a_λ})

and

m_λ ~ Gamma(A_{m_λ}, B_{m_λ}).
We refer to this as hierarchical sensitivity setup one. Secondly, we performed a sensitivity analysis, hierarchical sensitivity setup two, where we fixed the shape parameters A_{a_λ} = A_{m_λ} = 0.001 and only varied B_{a_λ} = B_{m_λ}, again set to either 0.01, 0.001 or 0.0001. We refer to this model as the Standard Hierarchical Weighting (SHW) method.
For instance, a policymaker might have more trust in one model type than another, and rather than excluding the models that are considered less reliable (i.e. giving them zero weight a priori), it could be useful to include them, yet with a smaller contribution to the ensemble.
For the analysis with fixed a_λ and b_λ described in section 2.4, we could merely elicit a different rate parameter b_{λ,i} for each λ_i, such that modeling assumptions with high trust are given a low value. However, with the shape parameter a_λ set to a low ("vague") value, the prior may have little effect on the posterior λ_i. Eliciting a high value of a_λ would instead result in a posterior that merely reflects our prior beliefs, and we have no foundation on which to elicit some intermediate value. In order to combine the hierarchical approach with informative priors, we propose a modification of the analysis presented in section 2.5, where the assumption of exchangeability is relaxed in the hierarchical structure, with

λ_i ~ Gamma(a_λ, b_{λ,i}),

where b_{λ,i} = a_λ/(w_i m_λ) and w_i indicates the a priori trust in modeling assumption i. With w_i = k w_j, the prior distribution of λ_i will have a mean that is k times that of λ_j, and from Eqs (11) and (12) this relationship also implies that, before λ is informed by the data (x_0, x and y), the outputs of modeling assumption i will contribute k times as much to μ and ν as those of assumption j.
Given that the density at the 2.5th and 97.5th percentiles is then 0.15 of that at the mode, we specify w_i = 0.15 for i = 2, 3, 4 and 7, i.e. for modeling assumptions where one of the varied parameters follows the most likely scenario whereas the other is set to either the worst or best case. With the same rationale, we specify w_i = 0.021 for i = 5, 6, 8 and 9, i.e. modeling assumptions where both parameters follow either best- or worst-case expectations.
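These weights can be checked directly: for a normal distribution, the density at the 2.5th and 97.5th percentiles (z ≈ ±1.96) relative to the density at the mode is exp(−1.96²/2) ≈ 0.15, and treating the two scaled parameters as independent gives 0.15² ≈ 0.021 when both sit at their extremes:

```python
import math

# Normal density at z = ±1.96 relative to the mode: exp(-z^2 / 2)
rel = math.exp(-1.96 ** 2 / 2)
print(round(rel, 2))       # one parameter at a best/worst-case extreme
print(round(rel ** 2, 3))  # both parameters at best/worst-case extremes
```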
Consistently, modeling assumption i = 5 predicted the shortest duration for all actions and ensembles. We therefore also performed the analysis with w_5 = 1 and w_1 = 0.021, with all other weights as above. This allows us to investigate the performance of the informative weighting method when an outlier is up-weighted. We refer to this method as the Informative Hierarchical Weighting (IHW) method.
work [47] that focused on temperature. For epidemiology, it may however be useful to consider multiple epidemic quantities. This could be done in different ways, but here we offer a straightforward multi-quantity extension of the Bayesian model for a single epidemic quantity, based on the supposition that the relative weights are equal for all quantities. As such, we implement a single weighting parameter λ, shared among all quantities. For other parameters, we use a similar notation as for the single quantity analysis, but give many of the parameters an additional index q, indicating that the parameter is quantity specific. We expand the Bayesian model by defining x_{i,q} and y_{i,q} as the mean projections of modeling assumption i for epidemic quantity q under the implemented and alternative control action, respectively, with x_{0,q} the corresponding observed value. As for the single epidemic quantity example, μ_q and ν_q are the expected values of quantity q, and because we cannot expect to have the same correlation between control actions for all quantities, β_q is included as unique for each q. Additional parameters scale the precision of the models between actions and quantities, and the parameters of the Bayesian model are made identifiable by fixing the scaling for the first quantity to unity. Similarly, we specify quantity-specific variability parameters τ_{i,q} and φ_{i,q}. The conditional distributions for the multi-quantity extension are provided in Table 4. We denote the total number of quantities in an analysis by Q.
However, a limitation of this approach is that we are confined to investigating the behavior of the ensemble methodology for that particular outbreak. To further investigate the potential and limitations of the proposed methods, we also performed an analysis of simulated outbreak data. With simulated data, we have "true" estimates of μ and ν, and we want to explore the ability of the ensemble to predict these under two different conditions: when the true values lie within the range of x and y predicted by the individual models of the ensemble, and when they do not. For multi-model ensembles, this corresponds to the situations where the true behavior of the outbreak is, or is not, encapsulated within the range of the underlying assumptions of the individual models.

Table 4. Conditional distributions used for updating via Gibbs sampling for Q epidemic quantities.
This simulates outbreaks where the true behavior of the outbreak is encapsulated within the range of underlying assumptions of the individual projections for both ensembles. We also simulate outbreaks with a parameterization where both k1 and k2 are set to 0.9. This produces outbreaks where the true behavior is outside of the assumptions of the projections for the small discrepancy ensemble, yet inside the range of the large discrepancy ensemble.
We therefore apply both the small and large discrepancy ensembles to ten realizations of each of the simulation parameterizations. We implement both the single and multiple epidemic quantity analysis, thus further highlighting the effect of using multiple quantities.
For many parameters, the conditional distribution belongs to a standard parametric family, thus allowing for Gibbs sampling. We list these conditional distributions in Table 3 for the single quantity analysis and Table 4 for multiple quantities. We also rely on Metropolis-Hastings (M-H) updates, and as the computation used for the multi-quantity analysis is a straightforward extension of that used for the single quantity, we start by describing the update scheme for the single quantity analysis. The conditional distribution would allow for Gibbs sampling of bτ, whereas M-H updates need to be implemented for aτ. We however found strong correlation between the marginal posterior estimates of aτ and bτ, and mixing was improved by performing joint M-H updates of these parameters by multivariate Random Walk (RW) proposals. Mixing can be further improved by updating parameters on a transform that resembles a Gaussian distribution, and we therefore performed updates on the log-transform: based on current values of aτ and bτ, we propose candidate parameters [log(aτ*), log(bτ*)] from MVN([log(aτ), log(bτ)], Στ), where MVN indicates the multivariate normal distribution and Στ the covariance matrix. Candidate values are accepted with the probability
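A minimal sketch of this joint log-scale RW update (in Python, with a stand-in conditional density rather than the actual expression used in the analysis; the Jacobian term accounts for the exp transform back to the original scale):

```python
import numpy as np

def mh_joint_log_update(a, b, log_post, cov, rng):
    """One joint random-walk Metropolis-Hastings update of (a, b) on the
    log scale. `log_post(a, b)` returns the log conditional density of the
    original parameters; the Jacobian of the exp transform contributes
    log(a*) + log(b*) - log(a) - log(b) to the log acceptance ratio."""
    cur = np.log([a, b])
    prop = rng.multivariate_normal(cur, cov)
    a_new, b_new = np.exp(prop)
    log_alpha = (log_post(a_new, b_new) - log_post(a, b)
                 + prop.sum() - cur.sum())  # Jacobian correction
    if np.log(rng.uniform()) < log_alpha:
        return a_new, b_new, True   # accepted
    return a, b, False              # rejected

# Example with a stand-in conditional density (not the paper's):
rng = np.random.default_rng(1)
a, b, accepted = mh_joint_log_update(1.0, 2.0,
                                     lambda a, b: -(a + b),
                                     0.1 * np.eye(2), rng)
```

Because the proposal is made in log space, candidate values are guaranteed positive, which suits precision-type parameters such as aτ and bτ.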
However, this is not known prior to the analysis. We therefore implement an optimized version of the Robbins-Monro search process as presented by Garthwaite et al. [58]. This estimates the covariance during the MCMC and finds the scaling parameter ρ such that Στ = ρΦ provides a chosen long-term acceptance rate, here set to 0.234 based on suggestions by Roberts et al. [59]. The method has been demonstrated to be appropriate also for RW updates on transformed scales of the parameters [60]. We also found strong correlation between μ and ν. In order to improve mixing, we repeated the updates of these parameters five times for each iteration of the MCMC. The same update scheme was used for the multi-quantity consideration, yet with separate covariance matrices adaptively estimated for each quantity q. Code is available as supplementary information (S1 File).
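The adaptive scaling step can be sketched as follows (a simplified Robbins-Monro rule in the spirit of Garthwaite et al. [58]; the constant c and the exact 1/iteration decay schedule are illustrative assumptions, not the paper's tuning):

```python
import numpy as np

def adapt_scale(log_rho, accepted, iteration, target=0.234, c=1.0):
    """One Robbins-Monro step for the RW scaling parameter rho.
    log(rho) is nudged up on acceptance and down on rejection, with a
    step size decaying as 1/iteration, so the long-run acceptance rate
    approaches `target`."""
    step = c / max(iteration, 1)
    if accepted:
        return log_rho + step * (1.0 - target)
    return log_rho - step * target

# The proposal covariance is then Sigma_tau = rho * Phi, with Phi the
# covariance estimated empirically from the chain so far.
```

With this rule, a chain that accepts more often than the target inflates ρ (widening proposals), and one that accepts too rarely shrinks it.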
Fig 2, panels A and B show the estimates of outbreak duration for the two control actions for the large discrepancy ensembles using the NHW method and reveal rather large prior sensitivity. Note that we plot marginal posteriors of M = e^μ and N = e^ν, respectively. As such, the posteriors represent the geometric mean outbreak duration. The corresponding arithmetic means can be calculated as e^(μ+1/(2τ0)) and e^(ν+1/(2τ0)), respectively, yet we use the geometric mean as it more clearly shows the relationship with the individual projections, here represented by Xi = e^(xi) and Yi = e^(yi), respectively. For aλ = bλ = 0.0001 (solid gray lines), the distributions are multimodal with peaks at individual model predictions, whereas a smoother shape is obtained for aλ = bλ = 0.01 (dashed black lines), and aλ = bλ = 0.001 (solid black line) yields an intermediate result. With the SHW method, we instead obtain posteriors that are largely insensitive to the choice of hyperprior. Fig 2, panels E and F present the result of sensitivity setup one, showing near identical posterior estimates when hyperparameters Aaλ, Baλ, Abλ and Bbλ are set to 0.01, 0.001 or 0.0001. Sensitivity setup two produced results that were visually indistinguishable from panels E and F and are not presented. This further indicates that the hierarchical method is robust to the choice of hyperpriors.
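The relationship between the geometric and arithmetic means of a lognormally distributed duration, as used above, can be verified numerically (the values of μ and τ0 below are arbitrary illustrations, not estimates from the analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Log-duration assumed Normal(mu, 1/tau0); duration is then lognormal.
mu, tau0 = np.log(200.0), 4.0

geometric_mean = np.exp(mu)                        # M = e^mu
arithmetic_mean = np.exp(mu + 1.0 / (2.0 * tau0))  # e^(mu + 1/(2*tau0))

# Monte Carlo check of the arithmetic mean of the lognormal duration.
draws = np.exp(rng.normal(mu, 1.0 / np.sqrt(tau0), size=200_000))
assert abs(draws.mean() - arithmetic_mean) / arithmetic_mean < 0.02
```

The arithmetic mean always exceeds the geometric mean by the factor e^(1/(2τ0)), which shrinks toward one as the precision τ0 grows.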
We therefore consider the posterior predictive distributions of individual outbreak durations under the two control actions. For the non-hierarchical model (Fig 2, panels C and D), there is an obvious effect of the choice of prior, with higher probability of long outbreaks for lower values of aλ and bλ. For the hierarchical model (Fig 2, panels G and H), there is again little difference among posteriors corresponding to different priors.
In the example presented here, this estimates how much longer the outbreak would have been if culling of CPs had been excluded from the control. As shown in Fig 3, the estimates are again sensitive to the choice of prior with the NHW method, yet insensitive with the SHW method. The range of the posterior under the NHW method is less sensitive to the prior for the low discrepancy ensemble (panel B) than for the large discrepancy ensemble (panel A), where a higher probability of less difference is estimated with aλ = bλ = 0.01 than with aλ = bλ = 0.0001. However, the multimodal behavior of the NHW method with low values of aλ and bλ is obtained also for the low discrepancy ensemble.
Ensemble predicted difference between control actions. Posterior predictive distributions of the difference in outbreak duration between the implemented and alternative control actions for the 2001 UK FMD outbreak using the NHW method (panels A, B) and the SHW method (panels C, D). Results are shown for large (panels A, C) and small (panels B, D) discrepancy among projections in the ensemble. Marginal posterior estimates correspond to different priors, with prior parameters aλ = bλ = 0.01 (light gray), aλ = bλ = 0.001 (black) and aλ = bλ = 0.0001 (dark gray) for the NHW method (panels A, B) and Aaλ = Abλ = Baλ = Bbλ set to 0.01 (light gray), 0.001 (black) and 0.0001 (dark gray) for the SHW method (panels C, D). Due to high similarity of estimates, the plots largely overlap for the hierarchical method.
When using a priori higher weights for the most likely scenarios (modeling assumption one; black dotted lines), the posterior estimates are shifted and become more centered on projections of that particular modeling assumption compared to the case where a priori weights are equal (black solid line). The outcome of up-weighting the outlier (modeling assumption five; solid gray lines) is however different between the two ensembles. For the small discrepancy ensemble (panels A and B), similar results are found as for the up-weighting of the most likely scenarios; posteriors are shifted towards the projection with a priori high weight. For the high discrepancy ensemble (panels C and D), the posterior estimates of outbreak duration instead become wider for both control actions, indicating larger uncertainty about the expected duration of outbreaks.
When using a priori equal weights, there is little difference in the estimates for the small discrepancy ensemble (top left panel), whereas moderate differences are obtained for large discrepancy (bottom left panel). Note that while the error bars are overlapping, the mean estimate of the most likely scenarios (modeling assumption one) is approximately 1.7 times as large as that of the outlier (modeling assumption five), meaning that the former will contribute approximately 1.7 times as much to the posterior means of μ and ν than the latter (Eqs (12) and (13)).
Ensemble prediction of expected outbreak duration with the IHW method. Panels A and C show ensemble estimates of mean duration for the implemented control action of the 2001 UK FMD outbreak under the small and large discrepancy ensemble, respectively. Panels B and D show the corresponding estimates for an alternative control action. Marginal posterior distributions indicate the expected outbreak duration estimated by the ensemble with a priori equal weights (solid black line), up-weighting of the most likely scenarios (i = 1; dashed black line) and up-weighting of the outlier (i = 5; solid gray line). Observed outbreak duration is indicated by X0, and predicted means under the implemented and alternative control action are indicated by X1, X2, ..., X9 and Y1, Y2, ..., Y9, respectively.
For the up-weighting of the outlier (projections corresponding to modeling assumption five), the same is found when there is little discrepancy among projections (lower right panel). This is however not found for the high discrepancy ensemble (top right panel), where the main effect is that, compared to equal a priori weights (top left panel), the error bars are wider; this indicates larger uncertainty about weights and consequently about the contribution of individual projections to the posterior estimates of outbreak durations.
Here we aim to illustrate the effect of using multiple quantities and focus on the SHW scheme. Fig 6 plots the marginal posterior density of mean outbreak duration under the two control actions as estimated for the multiple quantity analysis (solid) together with the corresponding estimates for the single quantity analysis (dashed). The figure illustrates how inclusion of multiple quantities in the analysis leads to tighter distributions, centered on the projections for i = 1. The multi-quantity analysis produces a probability distribution of all considered quantities, and Fig 7 further illustrates how the marginal posterior densities are located above zero for all three considered quantities. To illustrate the performance of the method under different conditions, we also analyzed simulated outbreaks. Fig 8 shows the posteriors of mean duration for outbreaks simulated with k1 = k2 = 1.
Ensemble prediction of outbreak duration with Q = 1 and Q = 3. Posterior estimates of mean outbreak duration for the implemented (Panel A) and alternative (Panel B) control action, with the dashed line indicating the posteriors corresponding to the single quantity analysis and the solid line indicating the corresponding posteriors when number of infected farms and control culls are included. The figure shows the result of the large discrepancy ensemble.
Note that individual realizations, indicated by stars, are expected to frequently be outside of the credibility envelopes. Error bars are inclusive of the true mean outbreak duration (dashed lines) for all ten analyzed realizations for both the implemented and alternative control actions. However, the credibility envelopes are tighter and medians closer to the true value for the multi-quantity analysis. This indicates that the ensemble prediction is improved by including multiple quantities.
As with Fig 8, credibility envelopes are tighter for the multi-quantity analysis. The error bars of the small discrepancy ensemble, which all rely on simulations with parameterizations with higher k1 and k2 than the true value, are not inclusive of the true value, indicating that the small discrepancy ensemble fails to predict the true values of the outbreak.
Within weather forecasting, the approach has given more robust predictions, and we could expect that to be the case for epidemiology as well. However, there is a need for the development of methods describing how to combine several epidemiological projections. The aim of this study was to investigate the possibility of using the Bayesian framework introduced by Tebaldi et al. [47]. We find that it is a promising approach, for primarily three reasons. Firstly, when the methodology is implemented in a hierarchical Bayesian framework, it provides an appealing interpretation of model exchangeability. Essentially, projections and their underlying modeling assumptions are treated as random draws from a population of possible models.
Posterior estimates of difference between controls. The figure shows the marginal posterior predictive estimates of difference in outbreak duration (Panel A), number of infected farms (Panel B) and number of control culls (Panel C) between the implemented and alternative control action. The figure shows the result of the large discrepancy ensemble.
By estimating the hierarchical parameters aλ and bλ jointly with the individual precisions (weights) λi, the characteristics of this hypothetical population are estimated. Smith et al. [61] used a similar approach for climate ensembles and pointed out that this reduces the impact of which models are included or excluded in the ensemble. That is, we should expect to get similar results when using a different set of model assumptions if they are chosen independently. We stress, however, that this interpretation is more valid for multi-model ensembles. Also, the term "random draws" should not be interpreted as arbitrary. Rather, the interpretation is that models should come from a population of well-informed, reasonable models. The analysis treats the outputs of the performed simulations under different assumptions as data (Eq 1, Table 1), and as such they are used to inform the quantities of interest (μ and ν). This may seem counterintuitive, yet it only serves as a formal means to combine the results of multiple projections, and by Eqs (7) and (20), these are combined with available outbreak data. Secondly, the framework can handle several different weighting schemes simultaneously. The original method introduced by Tebaldi et al. [47] used convergence and bias to assess weights. Here, we further extend the framework such that informative priors can be included.
Analysis of synthetic data. Triangles and circles indicate the median posterior estimates of mean outbreak duration after analysis with the small and large discrepancy ensemble, respectively, under the implemented (Panels A and C) and alternative (Panels B and D) control action. Results are shown for ten realizations of synthetic data simulated with k1 = k2 = 1. The error bars indicate the 95% credibility interval, the dashed lines the true values and the star the individual realization (expected to frequently lie outside the predicted mean). Panels A and B show the results of single quantity analysis (only outbreak duration) and Panels C and D the corresponding results of analyses where number of infected farms and control culls are included.
Epidemiological predictions suffer from lack of available data to assess model bias, and we propose that expert opinions will play a larger role than in other fields of research. With the analytical tool proposed here, a policymaker can choose to include a range of projections based on different modeling assumptions, yet give them different weights, rather than including one or a few (given a weight of one) and excluding others (given a weight of zero).
Analysis of synthetic data. Triangles and circles indicate the median posterior estimates of mean outbreak duration after analysis with the small and large discrepancy ensemble, respectively, under the implemented (Panels A and C) and alternative (Panels B and D) control action. Results are shown for ten realizations of synthetic data simulated with k1 = k2 = 0.9. The error bars indicate the 95% credibility interval, the dashed lines the true values and the star the individual realization analyzed (expected to often lie outside the predicted mean). Panels A and B show the results of single quantity analysis (only outbreak duration) and Panels C and D the corresponding results of analyses where number of infected farms and control culls are included.
Importantly, our methods can incorporate these subjective beliefs in the hierarchical framework, requiring only the specification of the a priori relative confidence in the underlying assumptions of the projections. Definition of an individual, fixed prior would undoubtedly be cumbersome to elicit from expert opinion; it would not be feasible to ask policymakers to define an individual gamma prior for each modeling assumption.
Uncertainty about parameters will be an issue for most epidemiological models, and we propose that multi-model ensembles should incorporate projections with different models and different parameterizations. Thus, different mechanistic assumptions as well as parameter uncertainty would be incorporated in the ensemble.
It is important that communication with policymakers includes uncertainties about the prediction rather than just the most likely outcome [16]. In the ensemble context, these uncertainties take into account different assumptions about the transmission process. Gardmark et al. [32] suggested that uncertainty should be communicated to policymakers by presenting the full range of predicted outcomes. However, that would give equal weights to all included projections and would require that the results be communicated with a detailed description of all assumptions made, thus allowing the policymaker to decide how much to trust each modeling assumption. This would be a cumbersome task, particularly for detailed simulation models that rely on a large number of parameters. We therefore argue that it is beneficial to communicate the aggregated and weighted result as easily interpretable probability distributions. With further modifications of the methodology, we propose that the approach could also be used as a forecasting tool during an outbreak, e.g. by letting xi and yi denote current and future numbers of infected farms. In such a situation, there is a great need for rapid and clear communication of model results to aid policy decisions. The visual manner in which uncertainty is presented using probability distributions makes them easy to understand and communicate [62].
However, the impact of the prior is heavily reduced when using the hierarchical framework (Fig 2, panels E-H; Fig 3, panels C, D). Thus, our results demonstrate that the hierarchical approach is preferred for ensemble modeling and that using the non-hierarchical approach can lead to spurious conclusions. We argue that this would also be the case for other fields, such as climate ensembles, but it is likely to be a larger concern for epidemiology, where data to modify the prior are fewer. Considering Eq (11), the prior would have little contribution to the denominator if τ0 >> λi for all i, ensuring that the prior has little contribution to the posterior. For climate considerations, we envisage that the precision of natural variability, τ0, would be large relative to each λi if bias is assessed by comparing model simulations to long time series of climate data. For epidemiological considerations, this would however rarely be the case. In the proposed method, we instead inform τ0 largely by the simulation outputs, letting the projections of the ensemble determine how variable outcomes are.
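The role of τ0 relative to the weights λi can be illustrated with a precision-weighted combination of the general form used in Tebaldi-style ensemble models (a sketch with hypothetical numbers; the exact terms of Eq (11) may differ):

```python
import numpy as np

def posterior_mean_mu(x0, tau0, x, lam):
    """Precision-weighted combination of the observation x0 and the
    projections x_i, of the general form used in Tebaldi-style ensemble
    models (sketch only)."""
    return (tau0 * x0 + np.sum(lam * x)) / (tau0 + np.sum(lam))

x = np.array([200.0, 220.0, 240.0, 400.0])  # hypothetical mean projections
lam = np.array([1.0, 1.0, 1.0, 0.1])        # weights, outlier down-weighted

# If tau0 >> lam_i, the observation dominates and the prior-driven
# weights matter little ...
near_obs = posterior_mean_mu(221.0, tau0=100.0, x=x, lam=lam)
# ... whereas a small tau0 lets the weighted projections dominate.
near_proj = posterior_mean_mu(221.0, tau0=0.01, x=x, lam=lam)
assert abs(near_obs - 221.0) < abs(near_proj - 221.0)
```

This makes the practical point explicit: when τ0 cannot be made large relative to the λi, the choice of prior on the weights carries through to the posterior, which is why the hierarchical treatment matters for epidemiological data.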
Epidemiology is not only concerned with mean projections but also with other quantities, such as the probability of very large or long outbreaks occurring. Fig 2, panels G and H illustrate the probability of a given epidemic duration occurring for a single outbreak under the two control actions with the preferred SHW method. Comparing the posterior predictive distribution to the density obtained by merely lumping the results of all simulations, as illustrated by the colored bars, the posterior predictive distribution of the ensemble method has a lower probability of both very long and very short outbreaks. This is because projections of such outbreaks are down-weighted when their bias is assessed in the analysis; the observed outbreak duration would be unlikely under the modeling assumptions that produce these projections. Thus, ensemble methods that give equal weights to all projections can overestimate the uncertainty about outbreaks, preventing the models from informing appropriate policy decisions.
Compared to climate models, epidemiology often has far less available data to assess model bias. As such, expert opinion will often play a larger role within this field. Fig 4 illustrates the behavior of the ensemble prediction under such informative priors. When up-weighting the projections for i = 1, which are also likely under the observed outbreak duration, the posteriors are shifted towards these projections and produce tighter distributions. This is also found when up-weighting the outlier, i = 5, in the small discrepancy ensemble (Fig 4, panels A and B), in which no projection is particularly unlikely for the observed duration. Projection x5 is however unlikely in the large discrepancy ensemble. As a result, up-weighting the underlying modeling assumptions of this projection primarily makes the distribution wider, resulting from a larger uncertainty about individual weights (Fig 5). This is to be interpreted such that if expert opinions a priori determine that a modeling assumption that is unlikely to predict the observed data is better than other assumptions, the conclusion should be that there is less information in the ensemble as a whole. However, when expert opinions are well informed and do not contradict observed data, they can lead to more precise predictions.
A crucial difference between the original method applied to climate change and the epidemiological consideration presented here is that τ0 is unknown for the latter and therefore needs to be estimated. We argue that in the absence of multiple outbreaks, it is sensible to inform this by the model simulations. Stochastic simulations are often used to estimate the range of outcomes for non-ensemble projections [1,17,18,23], and we propose that when extending the use of models to the ensemble context, they can be used to estimate this feature as well. We have therefore chosen a Bayesian model structure where τ0 is informed largely by the within-projection variability, τ1, τ2, ..., τn, via the Gamma(aτ, bτ) distribution in Eq (8). All projections of the implemented control action contribute equally to this distribution in the method presented here; thus we are giving equal weights to all modeling assumptions in terms of informing τ0. Estimation of different weights in terms of informing τ0 based on a single outbreak, analogous to the estimation of λ, would not be conceivable. However, if policymakers believe that some modeling assumptions are more reliable in terms of capturing the variability of outcomes, we envisage that the Bayesian model structure can be altered to include this. If applied to endemic disease, τ0 could be informed similarly to the natural variability of temperature in the climate application, and the algorithm we supply is set up to handle this situation. Also, data from multiple outbreaks could be used to inform τ0 when available. Yet, data quality will rarely be comparable to climate data, which highlights one of the major challenges for epidemiological modeling.
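As an illustration of how τ0 can be informed by the within-projection precisions, a moment-matched Gamma distribution can serve as a stand-in for the Gamma(aτ, bτ) of Eq (8); in the actual analysis aτ and bτ are estimated within the MCMC, and the precisions below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical within-projection precisions tau_1, ..., tau_n obtained
# from the stochastic simulations under each modeling assumption.
tau_i = np.array([0.8, 1.1, 0.9, 1.3, 1.0])

# Moment-match a Gamma(a_tau, b_tau) to the tau_i so that draws of tau0
# are centred on the within-projection variability.
m, v = tau_i.mean(), tau_i.var(ddof=1)
a_tau = m**2 / v
b_tau = m / v  # rate parameterization: E[tau0] = a_tau / b_tau
tau0_draws = rng.gamma(shape=a_tau, scale=1.0 / b_tau, size=10_000)
```

Each projection contributes equally here, mirroring the equal weighting of modeling assumptions in terms of informing τ0 described above.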
Fig 5 shows that when adding number of infected and culled farms to the analysis, the marginal posteriors of outbreak duration become narrower and centered on x1 and y1, i.e. the projections based on the most likely scenario. This illustrates that predictions can be improved by incorporating multiple quantities when assessing the weights.
However, Fig 7 illustrates the types of conclusions the method can provide. The three quantities we include in the multi-quantity analysis are all of great concern to policymakers when assessing the impact of control actions. The probability distributions represent the ensemble-predicted difference in the outcome of the outbreak if the control action had excluded culling of CPs. The distributions all have most of their density above zero, indicating that excluding culling of CPs would most likely have resulted in a prolonged and larger outbreak. We should however point out that these results are based on a single model. To make more robust predictions, we propose that the same type of analysis be made with projections of different models.
Fig 8 shows the result of analysis of ten simulated outbreaks with the parameterization in the center of the ensemble, i.e. k1 = k2 = 1. As this is in agreement with both the small and large discrepancy ensembles, the true values (dashed lines) consistently lie within the 95% credibility intervals. However, when using the k1 = k2 = 0.9 parameterization, the assumptions of the model used to simulate the outbreaks are encompassed only by the large discrepancy ensemble, and consequently only the large discrepancy ensemble error bars are inclusive of the true values. Noting that we primarily use the different parameterizations as a proxy for different models, this simple simulation example illustrates some obvious but essential points. Ensemble modeling should not be interpreted as a remedy for models based on poor assumptions about the modeled process. It offers the ability to combine multiple assumptions, thus integrating uncertainty with regard to these into the predictions. However, if all models are based on similar but inaccurate assumptions, ensemble modeling will not improve predictions. Intentionally making models similar to each other increases this risk and should be avoided if the models are to be used for ensemble purposes.
Thus, uncertainty with regard to this is incorporated in the predictions, which is important as projections of different models have been reported to deviate [63-65]. The use of multi-model ensembles would rely on collaboration among modeling teams, as well as on overcoming confidentiality constraints in accessing outbreak data and population demographics. FMD modeling is currently seeing encouraging developments in that area. The Quadrilateral Epiteam [19] has compared simulations of several outbreak scenarios in a subset of the UK demographics with five different models: NAADSM [45], Netherlands CVI [66], InterSpreadPlus [46], AusSpread [44] and ExoDis [67].
Combining the results of multiple models however requires means of weighting these. We conclude that the presented framework is a promising approach because it provides easily interpretable probability distributions of quantities of interest. It also offers an appealing interpretation of model exchangeability, while at the same time combining several different weighting schemes, including a priori beliefs when such are available.
The aim of the study has been to introduce the methodological framework to epidemiology and to solve some key issues associated with this transfer, including prior sensitivity, informing weights by expert opinion, using models to inform the variability in the outcome of individual outbreaks, and extension to consider multiple epidemic quantities. We have purposely chosen the simple example because it allows for a straightforward transfer from the original climate implementation, and at the same time lets us demonstrate essential concepts and the potential of the framework. Models are however used to answer a range of different questions in epidemiology, and combining multiple projections has the potential to improve the way models are used to inform policy. We argue that the framework we introduce here has great potential, and foresee that many of the questions addressed in epidemiological modeling would require further developments of the Bayesian model, structured to fit the specific problem. To facilitate this, we have supplied the algorithm (S1 File) and hope that it will aid further development of ensemble methods for epidemiology.
Performed the experiments: TL MIT. Analyzed the data: TL. Contributed reagents/materials/ analysis tools: TL MIT. Wrote the paper: TL MIT CW.