Predicting Epidemic Risk from Past Temporal Contact Data
Eugenio Valdano, Chiara Poletto, Armando Giovannini, Diana Palma, Lara Savini, Vittoria Colizza

Abstract

This can be achieved by identifying the elements at higher risk of infection and implementing targeted surveillance and control measures. One important ingredient to consider is the pattern of disease-transmission contacts among the elements; however, lack of data or delays in providing updated records may hinder its use, especially for time-varying patterns. Here we explore to what extent it is possible to use past temporal data of a system’s pattern of contacts to predict the risk of infection of its elements during an emerging outbreak, in the absence of updated data. We focus on two real-world temporal systems: a livestock displacements trade network among animal holdings, and a network of sexual encounters in high-end prostitution. We define the node’s loyalty as a local measure of its tendency to maintain contacts with the same elements over time, and uncover important nontrivial correlations with the node’s epidemic risk. We show that a risk assessment analysis incorporating this knowledge and based on past structural and temporal pattern properties provides accurate predictions for both systems. Its generalizability is tested by introducing a theoretical model for generating synthetic temporal networks. High accuracy of our predictions is recovered across different settings, while the amount of possible predictions is system-specific. The proposed method can provide crucial information for the setup of targeted intervention strategies.

Author Summary

While the knowledge of the pattern of disease-transmission contacts among hosts would be ideal for this task, the continuously changing nature of such pattern makes its use less practical in real public health emergencies (or otherwise highly resource-demanding when possible). We show that in such situations critical knowledge to assess the real-time risk of infection can be extracted from past temporal contact data. An index expressing the conservation of contacts over time is proposed as an effective tool to prioritize interventions, and its efficiency is tested considering real data on livestock movements and on human sexual encounters.

Introduction

The explicit pattern of potential disease-transmission contacts has been extensively used to this purpose in the framework of theoretical studies of epidemic processes, uncovering the role of the pattern’s properties in the disease propagation and epidemic outcomes [1, 2, 3, 4, 5, 6, 7, 8]. These studies are generally based on the assumption that the entire pattern of contacts can be mapped out or that its main properties are known. Although such knowledge would be a critical requirement to conduct risk assessment analyses in real-time, which need to be based on the updated and accurate description of the contacts relevant to the outbreak under study [9], it can hardly be obtained in reality. Given the lack of such data, analyses generally refer to the most recent available knowledge of contact data, implicitly assuming a non-evolving pattern.

Traditional centrality measures used to identify vulnerable elements or influential spreaders for epidemics circulating on static networks [1, 2, 4, 23, 24, 25, 26, 27, 28, 29, 30] are unable to provide meaningful information for their control, as these quantities strongly fluctuate in time once computed on the evolving networks [19, 31]. An element of the system may thus act as superspreader in a past configuration of the contact network, having the ability to potentially infect a disproportionally larger number of secondary contacts than other elements [32], and then assume a more peripheral role in the current pattern of contact or even become isolated from the rest of the system [19]. If the rules driving the change of these patterns over time are not known, what information can be extracted from past contact data to infer the risk of infection for an epidemic unfolding on the current (unknown) pattern?

They are based on the extension to temporal networks [33, 34] of the so-called acquaintance immunization protocol [4] introduced in the framework of static networks that prescribes to vaccinate a random contact of a randomly chosen element of the system. In the case of contacts relevant for the spread of sexually transmitted infections, Lee et al. showed that the most efficient protocol consists in sampling elements at random and vaccinating their latest contacts [33]. The strategy is based on local information gathered from the observation and analysis of past temporal data, and it outperforms static-network protocols. Similar results are obtained for the study of face-to-face contact networks relevant for the transmission of acute respiratory infections in a confined setting, showing in addition that a finite amount of past network data is in fact needed to devise efficient immunization protocols [34].

For this reason, protocols are tested through numerical simulations and results are averaged over starting seeds and times to compare their performance. Previous work has however shown that epidemic outcomes may strongly depend on the temporal and geographical initial seed of the epidemic [35], under conditions of large dynamical variability of the network and absence of stable structural backbones [19]. Our aim is therefore to focus on a specific epidemiological condition relative to a given emerging outbreak in the population, resembling a realistic situation of public health emergency. We focus on the outbreak initial phase prior to interventions when facing the difficulty that some infected elements in the population are not yet observed. The objective is to assess the risk of infection of nodes to inform targeted surveillance, quarantine and immunization programs, assuming the lack of knowledge of the explicit contact pattern on which the outbreak is unfolding. Knowledge is instead gathered from the analysis of the full topological and temporal pattern of past data (similarly to previous works [33, 34]), coupled, in addition, with epidemic spreading simulations performed on such data under the same epidemiological conditions of the outbreak under study. More specifically, we propose an egocentric view of the system and assess whether and to what extent the node’s tendency of repeating already established contacts is correlated with its probability of being reached by the infection. Findings obtained on past available contact data are then used to predict the infection risk in the current unknown epidemic situation. We apply this risk assessment analysis to two large-scale empirical datasets of temporal contact networks (cattle displacements between premises in Italy [19, 36], and sexual contacts in high-end prostitution [16]) and evaluate its performance through epidemic spreading simulations. We also introduce a model to generate synthetic time-varying networks retaining the basic mechanisms observed in the empirical networks considered, in order to explain the results obtained by the proposed risk assessment strategy within a general theoretical framework.

Results and Discussion

The cattle trade network is extracted from the complete dataset reporting on time-resolved bovine displacements among animal holdings in Italy [19, 36] for the period 2006-2010, and it represents the time-varying contact pattern among the 215,264 premises composing the system. The sexual contact network represents the connectivity pattern of sexual encounters extracted from a Web-based Brazilian community where sex buyers provide timestamped ratings and comments on their experiences with escorts [16].

The probability distributions of several quantities measured on the different yearly networks are considerably stable over time, as shown e.g. by the in-degree distribution reported in Fig. 1A, where the in-degree of a farm measures the number of premises selling cattle to that farm. These features, however, result from highly fluctuating underlying patterns of contacts, never preserving more than 50% of the links from one yearly configuration to another (Fig. 1C), notwithstanding the seasonal annual pattern due to repeating cycles of livestock activities [37, 38] (see S1 Text). Similar findings are also obtained for the sexual contact network (Fig. 1B-D), where the lack of an intrinsic cycle of activity characterizing the system leads to smaller values of the overlap between different configurations (< 10%). In this case we consider semiannual configurations, an arbitrary choice that allows us to extract six network configurations in a timeframe exhibiting an approximately stationary average temporal profile of the system, after discarding an initial transient time period from the data [16]. Different time-aggregating windows are also considered (see the Materials and Methods section and S1 Text for additional details).

Loyalty

This is the outcome of the temporal activity of the elements of the system that reshape up to 50% or 90% of the contacts of the network (in the cattle trade case and in the sexual contact case, respectively), through nodes’ appearance and disappearance, and neighborhood restructuring. By framing the problem in an egocentric perspective, we can explore the behavior of each single node of the system in terms of its tendency to remain active in the system and reestablish connections with the same partners vs. the possibility to change partners or make no contacts. We quantitatively characterize this tendency by introducing the loyalty $\theta$, a quantity that measures the fraction of preserved neighbors of a node for a pair of two consecutive network configurations in time, $c-1$ and $c$. If we define $V_i^{c-1}$ as the set of neighbors of node $i$ in configuration $c-1$, then $\theta_i^{c-1,c}$ is given by the Jaccard index between $V_i^{c-1}$ and $V_i^c$:

$$\theta_i^{c-1,c} = \frac{|V_i^{c-1} \cap V_i^{c}|}{|V_i^{c-1} \cup V_i^{c}|} \qquad (1)$$

Loyalty takes values in the interval [0, 1], with $\theta = 0$ indicating that no neighbors are retained, and $\theta = 1$ that exactly the same set of neighbors is preserved ($V_i^{c-1} = V_i^c$). It is defined for discrete time windows $(c, c+1)$ and in general it depends on the aggregation interval chosen to build network configurations. In case the network is directed, as for example the cattle trade network, $\theta$ can be equivalently computed on the set $V^{\mathrm{in}}$ of incoming contacts or on the set of neighbors of outgoing connections, $V^{\mathrm{out}}$, depending on the system-specific interpretation of the direction and on the interest in one phenomenon or the opposite. This measure originally finds its inspiration in the study of livestock trade networks, where a directed connection from holding A to holding B indicates that B purchased a livestock batch from A, which was then displaced along the link direction.
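Since the loyalty is a Jaccard index on neighbor sets, it can be illustrated in a few lines. The following is a minimal sketch: the function name `loyalty`, the example farm labels, and the convention of returning 0 when a node has no neighbors in either configuration are assumptions made here for illustration.

```python
def loyalty(neighbors_prev, neighbors_curr):
    """Jaccard index between a node's neighbor sets in configurations c-1 and c."""
    prev, curr = set(neighbors_prev), set(neighbors_curr)
    union = prev | curr
    if not union:
        # Assumption: a node with no neighbors in either configuration gets 0.
        return 0.0
    return len(prev & curr) / len(union)


# Hypothetical in-neighbors (sellers) of one farm in two consecutive years.
sellers_prev = {"A", "B", "C"}
sellers_curr = {"B", "C", "D"}
print(loyalty(sellers_prev, sellers_curr))  # 2 preserved out of 4 distinct -> 0.5
```

For a directed network, the same function would be applied to in-neighbor sets or to out-neighbor sets, depending on the point of view adopted.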

If we compute $\theta$ on the incoming contacts of the cattle trade network, we thus quantify the propensity of each farmer to repeat business deals with the same partners when they purchase their cattle. This concept is at the basis of many loyalty or fidelity programs that propose explicit marketing efforts to incentivize the reinforcement of loyal buying behavior between a purchasing client and a selling company [39], and corresponds to a principle of exclusivity in selecting economic and social exchange partners [40, 41]. Analogously, in the case of the sexual contact network we consider the point of view of sex buyers. Formally, our methodology can be carried out with the opposite point of view, by considering out-degrees with loyalties being computed on out-neighbors. Our choice is arbitrary and inspired by the trade mechanism underlying the network evolution.

In S1 Text we compare and discuss alternative choices. For the sake of clarity all symbols and variables used in the article are reported in Table 1. Finally, other mechanisms different from fidelity strategies may be at play that result in the observed behavior of a given node. In the absence of additional knowledge on the behavior underlying the network evolution, we focus on the loyalty $\theta$ to explore whether it can be used as a possible indicator for infection risk, as illustrated in the following subsection.

(Fig. 2A-B and S1 Text), once again indicating the overall global stability of the system’s properties in time and confirming the results observed for the degree. A diverse range of behaviors in establishing new connections vs. repeating existing ones is observed, similarly to the stable or exploratory strategies found in human communication [42]. Two pronounced peaks are observed for $\theta = 0$ and $\theta = 1$, both dominated by low degree nodes for which few loyalty values are allowed, given the definition of Eq. (1) (see S1 Text for the dependence of $\theta$ on nodes’ degree and its analytical understanding). The exact preservation of the neighborhood structure ($\theta = 1$) is more probable in the cattle trade network than in the sexual contact network ($P(\theta = 1)$ being one order of magnitude larger), in agreement with the findings of a higher system-wide memory reported in Fig. 1. Moreover, the cattle trade network exhibits the presence of high loyalty values (in the range $\theta \in [0.7, 0.9]$), differently from the sexual contact network where $P(\theta)$ is always equal to zero in that range except for one pair of consecutive configurations giving a positive probability for $\theta = 0.8$. Farmers in the cattle trade network thus display a more loyal behavior in purchasing cattle batches from other farmers with respect to how sex buyers establish their sexual encounters in the analyzed sexual contact dataset. For the sake of simplification, we divide the set of nodes composing each system into the subset of loyal nodes having $\theta$ greater than a given threshold $\epsilon$, and the subset of disloyal nodes if instead $\theta < \epsilon$. We hereafter call these classes loyalty statuses L and D, respectively, and we will later discuss the role of the chosen value for $\epsilon$.

Table 1. List of variables and their description.

Notation                          Description
$c$                               index for network configurations
$\theta$ or $\theta_i^{c-1,c}$    loyalty of node $i$ between configurations $c-1$, $c$
$\epsilon$                        loyalty threshold
$s$                               epidemic seed
$T$                               duration of the outbreak early stage
$I_s^c$                           set of infected nodes for outbreak starting from $s$ in config $c$
$k$                               degree (in-degree for the cattle trade network)
$\rho_i^c$                        epidemic risk for node $i$ in config $c$
$I_{s,h}^c$, $I_{s,l}^c$          set of infected nodes with high (low) epidemic risk
$P_h$, $P_l$                      probability of a high (low) risk node to be infected
$\omega_s^c$                      predictive power (fraction of infected nodes for which it is possible to compute the epidemic risk)
$b$, $d$                          node probability of becoming active or inactive
$p_\alpha$                        node probability of keeping an in-neighbor
$\alpha$                          number of kept in-neighbors
$\beta_{\mathrm{out}}$            number of new out-neighbors

Epidemic simulations and risk of infection

Sexually transmitted infections spread among the population of individuals through sexual contacts [43, 44], whereas livestock infectious diseases (e.g. Foot-and-mouth disease [45], Bluetongue virus [46], or BVD [47]) can be transmitted from farm to farm mediated by the movements of infected animals (and vectors, where relevant), potentially leading to a rapid propagation of the disease on large geographical scales.

No additional details characterizing the course of infection are considered here (e.g. recovery dynamics), as we focus on a simplified theoretical picture of the main mechanisms of pathogen diffusion and their interplay with the network topology and time-variation, for the prediction of the risk of infection. The aim is to provide a general and conceptually simple framework, leaving to future studies the investigation of more detailed and realistic disease natural histories.

Here, we consider a deterministic process for which the contagion occurs with probability equal to 1, as long as there exists a link connecting the infectious node to a susceptible one. Although a crude assumption, this allows us to simplify the computational aspects while focusing on the risk prediction. The corresponding stochastic cases exploring lower probabilities of transmission per link are reported in S1 Text.

This choice allows us to study the invasion stage only, while the epidemic is no longer trivially confined to the microscopic level. Additional choices for T have been investigated, showing that they do not alter our findings (see S1 Text). Network configurations are kept constant during outbreaks, assuming diseases spread faster than network evolution, at least during their invasion stage. Examples of incidence curves obtained by the simulations are reported in S1 Text.
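The deterministic spread on a fixed configuration can be sketched as a breadth-first expansion truncated after a given number of steps. This is only an illustrative sketch: the function name `simulate_outbreak`, the edge-list input, the discrete time step in which every infectious node infects all its out-neighbors, and the inclusion of the seed in the returned set are assumptions, not details taken from the simulations described here.

```python
def simulate_outbreak(edges, seed, steps):
    """Deterministic spread along directed edges for a fixed number of steps.

    edges: iterable of (source, target) pairs of one static configuration.
    Returns the set of infected nodes (seed included, by assumption).
    """
    out_neighbors = {}
    for source, target in edges:
        out_neighbors.setdefault(source, set()).add(target)

    infected = {seed}
    frontier = {seed}
    for _ in range(steps):
        newly_infected = set()
        for node in frontier:
            # Transmission probability 1: every susceptible out-neighbor is infected.
            newly_infected |= out_neighbors.get(node, set()) - infected
        if not newly_infected:
            break
        infected |= newly_infected
        frontier = newly_infected
    return infected


# Hypothetical configuration: s sells to a, a sells to b, b sells to c.
edges = [("s", "a"), ("a", "b"), ("b", "c"), ("x", "s")]
print(sorted(simulate_outbreak(edges, "s", 2)))  # ['a', 'b', 's']
```

Only the newly infected nodes need to be expanded at each step, because with transmission probability 1 older infected nodes have already reached all of their out-neighbors.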

This is generally considered in the study of highly contagious and rapid infections, and corresponds to regarding a farm as being infected as soon as it receives the infection from neighboring farms following the transport of contagious animals. Under this assumption, both case studies can be analyzed in terms of networks of contacts for disease transmission. In addition, for the sake of simplicity, we do not take into account the natural definition of link weights on the cattle network, representing the size of the moved batches. In S1 Text we generalize our methodology to the weighted case, including a weighted definition of loyalty, reaching results similar to the unweighted case.

The details on the simulations are reported in the Materials and Methods section. We define $I_s^c$ as the set of nodes infected during the early stage invasion. In order to explore how the network topology evolution alters the spread of the disease, we consider an outbreak unfolding on the previous configuration of the system, $c-1$, and characterized by the same epidemiological conditions (same epidemic parameters and same initial seed $s$). By comparing the set of infected nodes $I_s^{c-1}$ obtained in configuration $c-1$ to $I_s^c$, we can assess changes in the two sets and how these depend on the nodes’ loyalty. We define a node’s infection potential $\pi_L^{c-1,c}(s)$ (or $\pi_D^{c-1,c}(s)$) measuring the probability that a node will be infected in configuration $c$ by an epidemic starting from seed $s$, given that it was infected in configuration $c-1$ under the same epidemiological conditions and provided that its loyalty status is L (D):

$$\pi_L^{c-1,c}(s) = \mathrm{Prob}\left[\, i \in I_s^{c} \mid i \in I_s^{c-1} \text{ and } i \in L \,\right]$$

and analogously for $\pi_D^{c-1,c}(s)$ with $i \in D$.

$\pi_L$ and $\pi_D$ thus quantify the effect of the temporal stability of the network at the local level (loyalty of a node) on the stability of a macroscopic process unfolding on the network (infection). They depend on the seed chosen for the start of the epidemic, on the pair $(c-1, c)$ of network configurations considered along its evolution, and also on the threshold value $\epsilon$ assumed for the definition of the loyalty status of the nodes.
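For a single seed and a single pair of configurations, the infection potentials can be estimated as simple conditional frequencies. The sketch below assumes that the two infected sets and a mapping from node to loyalty status ("L" or "D") are already available; the function name `infection_potential` and the choice of returning `None` for an empty class are illustrative assumptions.

```python
def infection_potential(infected_prev, infected_curr, status):
    """Return (pi_L, pi_D) for one seed and one pair of configurations.

    infected_prev, infected_curr: sets of nodes infected in c-1 and in c.
    status: dict mapping node -> "L" (loyal) or "D" (disloyal).
    """
    potentials = {}
    for label in ("L", "D"):
        group = [n for n in infected_prev if status.get(n) == label]
        if not group:
            # Assumption: undefined when no node of this class was infected in c-1.
            potentials[label] = None
            continue
        infected_again = sum(1 for n in group if n in infected_curr)
        potentials[label] = infected_again / len(group)
    return potentials["L"], potentials["D"]


# Hypothetical example: 3 loyal and 2 disloyal nodes were infected in c-1.
status = {1: "L", 2: "L", 3: "L", 4: "D", 5: "D"}
pi_l, pi_d = infection_potential({1, 2, 3, 4, 5}, {1, 2, 4, 6}, status)
print(pi_l, pi_d)  # 2 of 3 loyal and 1 of 2 disloyal nodes are infected again
```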

Results are qualitatively similar in both cases under study, with peaks reached for $\pi_L/\pi_D \simeq 2.5$ in the cattle trade network and $\pi_L/\pi_D \simeq 3$ in the sexual contact network (Fig. 3A-B). An observed infection in $c-1$, based on the knowledge of the epidemiological conditions and no information on the network evolution, is an indicator of an infection risk for the same epidemic in $c$ that is more than twice as large for loyal farms as for disloyal farms. Analogously, loyal sex buyers have a threefold increase in their infection potential with respect to individuals having a larger turnover of partners. Remarkably, small values of the loyalty threshold $\epsilon$ are able to correctly characterize the loyal behavior of nodes with status L. Results shown in Fig. 3A-B are obtained for $\epsilon = 0.1$. Findings are however robust against changes in the choice of the threshold value, as this is induced by the peculiar bimodal shape of the probability distribution curves for the loyalty (see S1 Text). This means that intermediate values of the local stability of the nodes (i.e. $\theta > \epsilon$) imply that a possible risk of being infected is strongly stable, regardless of the dynamics of the network evolution. Valid for all possible seeds and epidemiological conditions, this result indicates that the loyalty of a node can be used as an indicator for the node’s risk of infection, which has important implications for the spreading predictability in case an outbreak emerges.

Focusing on the initial stage of the outbreak, we disregard the effect of interventions (e.g. social distancing, quarantine of infectious nodes, movement bans) or of adaptive behavior following awareness [37, 50, 51, 52, 53, 54]. Such an assumption relies on the study’s focus on the initial stage of the epidemic that may be characterized by a silent spreading phase with propagation occurring before the alert or outbreak detection takes place; or, following an alert, by a contingent delay in the implementation of intervention measures.

Risk assessment analysis

The observed relationship between loyalty and infection potential can be used to define a strategy for the risk assessment analysis of an epidemic unfolding on an unknown networked system at present time, for which we have however information on its past configurations. This may become very useful in practice even in the case of complete datasets, as for example with emerging outbreaks of livestock infectious diseases. Data on livestock movements are routinely collected following European regulations [55]; however, they may not be readily available in a real-time fashion upon an emergency, and a certain delay may thus be expected. Following an alert for an emerging livestock disease epidemic, knowledge of past network configurations may instead be promptly used in order to characterize the loyalty of farmers, simulate the spread of the disease on past configurations and thus provide the expected risk of infection for the farms under the ongoing outbreak. The general scheme of the strategy for the risk assessment analysis is composed of the following steps, assuming that the past network configurations $\{c-n, \ldots, c-1, c\}$ are known and that the epidemic unfolds on the unknown configuration $c+1$:

1. identify the seed $s$ of the ongoing epidemic;
2. characterize the loyalty of the nodes from past configurations by computing $\theta_i^{c-1,c}$ from Eq. (1);
3. predict the loyalty of the nodes for the following unknown configuration $c+1$: $\theta_i^{c,c+1}$;
4. simulate the spread of the epidemic on the past configuration $c$ under the same epidemiological conditions of the ongoing outbreak and identify the infected nodes $I_s^c$;
5. compute the node epidemic risk for nodes in statuses L and D.

It is based on configurations from $c-n$ to $c$ as they are all used to build the probability distributions needed to train our approach. In the cases under study such distributions are quite stable over time so that a small set of configurations ($\{c-2, c-1, c\}$) was shown to be enough.

As with all other variables characterizing the system, indeed, also $\theta$ may fluctuate from a pair of configurations $(c-1, c)$ to another, as nodes may alter their loyal behavior over time, increasing or decreasing the memory of the system across time. Without any additional knowledge or prior assumption on the dynamics driving the system, we measure from available past data the probabilities of (dis)loyal nodes staying (dis)loyal across consecutive configurations, or conversely, of changing their loyalty status. This property can be quantified in terms of probabilities of transition across loyalty statuses. We thus define $T_{LL}^c(k)$ as the probability that a node with degree $k$ being loyal between configurations $c-1$ and $c$ will stay loyal one step after $(c, c+1)$. It is important to note the explicit dependence on the degree $k$ of the node (here defined at time $c$), which may increase or decrease following neighborhood reshaping (it may also assume the value $k = 0$ if the node becomes inactive in configuration $c$). Analogously, $T_{DD}^c(k)$ is the probability of remaining disloyal. The other two possible transition probabilities are easily obtained as $T_{LD} = 1 - T_{LL}$ and $T_{DL} = 1 - T_{DD}$. Fig. 3C-D show the transition probabilities of maintaining the same loyalty status calculated on the two empirical networks for $\epsilon = 0.1$. Stability in time and nontrivial dependences on the degree of the node are found for both networks. In the cattle trade network, loyal farmers tend to remain loyal with a rather high probability ($T_{LL} > 0.6$ for all $k_{\mathrm{in}}$ values). In addition, this probability markedly increases with the degree, reaching $T_{LL} \simeq 1$ for the largest values of $k_{\mathrm{in}}$. Interestingly, the probability that a disloyal farmer stays disloyal the following year dramatically decreases with the degree, reaching 0 in the limit of large degree.
Among the farmers who purchase cattle batches from a large number of different premises, loyal ones have an increased chance to establish business deals with the same partners the following year, whereas previously disloyal ones will more likely turn to being loyal.
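The degree-dependent transition probabilities between loyalty statuses can be estimated from past data as empirical frequencies. The following sketch assumes per-node statuses for the pairs (c-1, c) and (c, c+1) and the degree at time c are given as dictionaries; the function name `transition_probabilities`, the treatment of missing nodes, and the absence of degree binning are illustrative assumptions.

```python
from collections import defaultdict


def transition_probabilities(status_prev, status_next, degree):
    """Estimate T_LL(k) and T_DD(k) as empirical frequencies per degree k.

    status_prev: node -> "L"/"D" between configurations c-1 and c.
    status_next: node -> "L"/"D" between configurations c and c+1.
    degree: node -> degree k in configuration c (0 if missing, by assumption).
    """
    # For each degree and starting status: [number that kept status, total].
    counts = defaultdict(lambda: {"L": [0, 0], "D": [0, 0]})
    for node, before in status_prev.items():
        after = status_next.get(node)
        if after is None:
            continue  # assumption: skip nodes with no status one step later
        k = degree.get(node, 0)
        counts[k][before][1] += 1
        if after == before:
            counts[k][before][0] += 1
    t_ll = {k: c["L"][0] / c["L"][1] for k, c in counts.items() if c["L"][1]}
    t_dd = {k: c["D"][0] / c["D"][1] for k, c in counts.items() if c["D"][1]}
    return t_ll, t_dd


# Hypothetical example with two loyal nodes of degree 2 and two disloyal of degree 1.
status_prev = {"a": "L", "b": "L", "c": "D", "d": "D"}
status_next = {"a": "L", "b": "D", "c": "D", "d": "L"}
degree = {"a": 2, "b": 2, "c": 1, "d": 1}
print(transition_probabilities(status_prev, status_next, degree))  # ({2: 0.5}, {1: 0.5})
```

The complementary probabilities follow as 1 - T_LL(k) and 1 - T_DD(k). With real data, sparse high-degree values would likely need to be binned, which this sketch does not attempt.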

$T_{LL}$ shows a relatively more pronounced dependence on $k$, ranging from 0.3 (low degree nodes) to 0.6 (high degree nodes). Differently from the farmers’ behavior, sex buyers display a large tendency to keep a high rate of partner turnover across time. Moreover, the largest probability of preserving sexual partners is obtained when the number of partners is rather large.

(Fig. 3C-D). With this information, it is then possible to compute the epidemic risk $\rho_i^{c+1}$ of a node $i$ in configuration $c+1$, having degree $k = k_i^c$ in configuration $c$ and known loyalty status $\{L, D\}$ between configurations $c-1$ and $c$. It is important to note that in our framework the epidemic risk is a node property, and not a global characteristic of a specific disease.
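One way to combine the transition probabilities with the infection potentials, stated here as an assumption based on the law of total probability rather than as the authors' exact expression, is the following: a node infected in the simulation on configuration $c$ either keeps or changes its loyalty status, and in each case is infected in $c+1$ with the corresponding infection potential.

```latex
\rho_i^{c+1} =
\begin{cases}
T_{LL}^{c}(k)\,\pi_L + T_{LD}^{c}(k)\,\pi_D, & \text{if } i \text{ has status } L \text{ between } c-1 \text{ and } c,\\[4pt]
T_{DL}^{c}(k)\,\pi_L + T_{DD}^{c}(k)\,\pi_D, & \text{if } i \text{ has status } D \text{ between } c-1 \text{ and } c,
\end{cases}
\qquad
T_{LD} = 1 - T_{LL},\quad T_{DL} = 1 - T_{DD}.
```

Under this reading, with hypothetical values $T_{LL} = 0.8$, $\pi_L = 0.5$ and $\pi_D = 0.2$, a loyal node would have risk $0.8 \cdot 0.5 + 0.2 \cdot 0.2 = 0.44$.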

Validation

We consider the set of nodes $I_s^c$ for which we are able to provide risk predictions and divide it into two subsets, according to their predicted risk of infection $\rho_i^{c+1}$. We indicate with $I_{s,h}^c$ the top 25% highest ranking nodes, and with $I_{s,l}^c$ all the remaining others. We then compute the fraction $P_h$ of nodes in the subset $I_{s,h}^c$, i.e. predicted at high risk, that belong to the set of infected nodes $I_s^{c+1}$ in the simulated epidemic aimed at validation. Analogously, $P_l$ measures the fraction of nodes in $I_{s,l}^c$ that are reached by the infection in the simulation on $c+1$. In other words, $P_h$ ($P_l$) represents the probability for a node having a high (low) risk of infection to indeed get infected. The accuracy of the risk assessment analysis can thus be measured in terms of the relative risk ratio $\nu = P_h/P_l$, where values $\nu \leq 1$ indicate negative or no correlation between our risk predictions and the observed infections, whereas values $\nu > 1$ indicate that the prediction is informative. For both networks we find a significant correlation, signaled by the distributions of the relative risk ratio $\nu$ peaking around values $\nu > 1$ (Fig. 4A-B). The peak positions ($\nu \simeq 1.4$ and $\nu \simeq 1.7$ for cattle and sex, respectively) are remarkably close to the benchmark values represented by the distributions computed on the training sets (red lines in Fig. 4A-B). In addition, the comparison with the distributions from a null model obtained by reshuffling the infection statuses of nodes (dotted curves peaking around $\nu = 1$ in Fig. 4A-B) further confirms the accuracy of the approach. Findings are robust against changes of the value used to define $I_{s,h}^c$ or against alternative definitions of this quantity (see S1 Text).
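The relative risk ratio can be illustrated on toy data. In this sketch the function name `relative_risk_ratio`, the rounding used to size the top 25% group, the arbitrary handling of ties in the ranking, and the return values for degenerate cases are all assumptions made for illustration.

```python
def relative_risk_ratio(risk, infected_next, top_fraction=0.25):
    """Relative risk ratio P_h / P_l for nodes ranked by predicted epidemic risk.

    risk: dict node -> predicted risk, for nodes infected in the simulation on c.
    infected_next: set of nodes infected in the validation simulation on c+1.
    """
    ranked = sorted(risk, key=risk.get, reverse=True)  # ties broken arbitrarily
    n_high = max(1, round(top_fraction * len(ranked)))
    high, low = ranked[:n_high], ranked[n_high:]
    if not low:
        return None  # assumption: undefined when there is no low-risk group
    p_high = sum(n in infected_next for n in high) / len(high)
    p_low = sum(n in infected_next for n in low) / len(low)
    return p_high / p_low if p_low > 0 else float("inf")


# Hypothetical predicted risks for 8 nodes; the top 25% are nodes "a" and "b".
risk = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4,
        "e": 0.3, "f": 0.2, "g": 0.1, "h": 0.05}
infected_next = {"a", "b", "c", "d", "e"}
print(relative_risk_ratio(risk, infected_next))  # P_h = 1.0, P_l = 0.5 -> 2.0
```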

Our predictions indeed are limited to the set $I_s^c$ of nodes that are reached in the simulation performed on past data, proxy for the future outbreak. If a node is not infected by the simulation unfolding on configuration $c$ or it is not active at that given time, our strategy is unable to provide a risk assessment for that node in the future. We can then quantify the predictive power $\omega$ as the fraction of infected nodes for which we could provide the epidemic risk, i.e. $\omega_s^{c+1} = |I_s^{c+1} \cap I_s^{c}| / |I_s^{c+1}|$. High values of $\omega$ indicate that few infections are missed by the risk assessment analysis. Fig. 4C-D display the distributions $P(\omega)$ obtained for the two case studies, showing that a higher predictive power is obtained in the cattle trade network (peak at $\omega \simeq 60\%$) with respect to the sexual contact network (peak at $\omega \simeq 40\%$). Our methodology can potentially be applied to a wide range of networks, other than the ones presented here, as shown with the example of human face-to-face proximity networks relevant for the spread of respiratory diseases reported in S1 Text. We also tested whether our risk measure represents a significant improvement in prediction accuracy with respect to simpler and more immediate centrality measures (namely, the degree). Through a multivariate logistic regression, in S1 Text we show that our definition of node risk is a predictor of infection even after adjusting for node degree.
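The predictive power is a plain set ratio, as the following minimal sketch shows. The function name `predictive_power` and the choice of returning `None` when no node is infected in the validation outbreak are assumptions made for illustration.

```python
def predictive_power(infected_curr, infected_next):
    """Fraction of nodes infected in c+1 that were also infected in c.

    infected_curr: nodes reached by the simulation on the past configuration c.
    infected_next: nodes reached by the outbreak on configuration c+1.
    """
    if not infected_next:
        return None  # assumption: undefined for an empty outbreak
    return len(infected_next & infected_curr) / len(infected_next)


# Hypothetical example: 2 of the 5 nodes infected in c+1 had a risk estimate.
print(predictive_power({1, 2, 3, 4}, {2, 3, 5, 6, 7}))  # 0.4
```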

Memory driven dynamical model

The observed differences in the predictive power of the approach are expected to be induced by the different temporal behavior of the two systems, resulting in a different amount of memory in preserving links (Fig. 1) and different loyalty of nodes and their time-variations (Fig. 2 and 3C-D).

The model is based on a set of parameters that can be tuned to reproduce the empirically observed features of the two networks, i.e.: (i) the topological heterogeneity of each configuration of the network described by a stable probability distribution (Fig. 1A-B); (ii) a vital dynamics to allow for the appearance and disappearance of nodes; (iii) a tunable amount of memory characterizing the time evolution of the network contacts (Fig. 1C-D). These specific properties differentiate our approach from the previously introduced models that display instantaneous homogeneous properties for network configurations [56, 57, 58, 59], reproduce bursty inter-event time distributions but without the explicit introduction of memory [33, 60, 61] or of its control [58].

They are characterized by stable in-degree and out-degree heterogeneous distributions across time (Fig. 5A where high memory and low memory regimes are displayed) and by profiles for the probability distribution of the loyalty as in the empirical networks (Fig. 5B). The number of nodes with zero loyalty can be computed analytically (see Materials and Methods) and it is confirmed by numerical findings (see S1 Text). A high memory regime corresponds to having nodes in the system that display a highly loyal behavior (e.g., $\theta > 0.7$), whereas values in the range $\theta \in [0.7, 1)$ are almost absent in a low memory regime, in agreement with the findings of Fig. 2.

(Fig. 5C). Different degrees of memory are however responsible for the fraction of the system for which a risk assessment can be made. In networks characterized by higher memory, the distribution of the predictive power $\omega$ has a well defined peak, whereas for lower memory it is roughly uniform in the range $\omega \in [0, 0.4]$ (Fig. 5D). Such a regime implies that not enough structure is maintained in the system to control more than 40% of the future infections. Our risk assessment analysis therefore allows accurate predictions across varying memory regimes characterizing the temporal networks, but the degree of memory impacts the amount of predictions that can be made. The model also shows that the analysis is not affected by the choice of the aggregating time window used to define the network configurations [61, 62, 63], as long as the heterogeneous topological features at the system level and the heterogeneous memory at the node level are kept across aggregation, as observed for the empirical networks under study (see [19] and S1 Text).

Conclusions

The measure is local and it is empirically motivated from two case studies relevant for disease transmission. By focusing on the degree of loyalty that each node has in establishing connections with the same partners as time evolves, we are able to connect an egocentric view of the system (the node’s strategy in establishing its neighborhood over time) to the system’s larger-scale properties characterizing the early propagation of an emerging epidemic.

A theoretical model generating synthetic time-varying networks allows us to frame the analysis in a more general perspective and disentangle the role of different features. The accuracy of the proposed risk assessment analysis is stable across variations of the temporal correlations of the system, whereas its predictive power depends on the degree of memory kept in the time evolution. The introduced strategy can be used to inform preventive actions in preparation for an epidemic and for targeted control responses during an outbreak emergency, relying only on past network data.

Methods

Datasets

We consider animal movements during a 5-year period, from 2006 to 2010, involving 215,264 premises and 2,973,710 directed links. Nodes may be active or inactive depending on whether farms sell or buy cattle in a given timeframe. The cattle network is available as S1 Dataset. From the dataset we have removed slaughterhouses (~1% of the nodes), as they are not relevant for transmission.

Timestamped posts are used as proxies for sexual intercourse and multiple entries are considered separately, following previous works [16, 31]. A total of 13,855 individuals establishing 34,509 distinct sexual contacts are considered in the study, after discarding the initial transient of the community growth [16]. Nodes may be active or inactive depending on whether individuals use the service or not, and on whether they join or quit the community. Six-month aggregating snapshots are chosen. A different aggregating time window of three months has been tested, obtaining similar results (see S1 Text).
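The aggregation of timestamped contacts into fixed-length snapshots can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `aggregate_snapshots`, the `(timestamp, source, target)` tuple layout, and the use of day counts with a 182-day window as a stand-in for six months are all assumptions made for illustration. Repeated contacts within a window are counted rather than collapsed, so that multiple entries remain distinguishable.

```python
from collections import Counter, defaultdict


def aggregate_snapshots(contacts, window):
    """Group timestamped contacts into consecutive snapshots.

    contacts: iterable of (timestamp, source, target) tuples (assumed layout);
              timestamps and `window` must share the same unit (e.g. days).
    Returns a list with one Counter per window, in time order, mapping each
    (source, target) pair to the number of contacts recorded in that window.
    """
    contacts = list(contacts)
    if not contacts:
        return []
    t0 = min(t for t, _, _ in contacts)  # first window starts at the earliest contact
    buckets = defaultdict(Counter)
    for t, u, v in contacts:
        buckets[int((t - t0) // window)][(u, v)] += 1
    # Windows without any contact are returned as empty Counters.
    return [buckets.get(i, Counter()) for i in range(max(buckets) + 1)]
```

Calling `aggregate_snapshots(contacts, window=91)` on the same input would give the shorter, roughly three-month snapshots used as a robustness check.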

Risk of infection

This family of functions, which depends on four parameters (see S1 Text for the specific functional form), was chosen because it reproduces well the distribution profiles of the risk potentials, and it was used to compute the nodes’ epidemic risk. A goodness-of-fit test was not performed, as this choice was automatically validated in the validation analysis performed on the whole prediction approach.

Memory driven model

The basic iterative network generation approach builds configuration $c+1$ from configuration $c$ through the following steps:

- vital dynamics: nodes that are inactive in configuration $c$ become active in $c+1$ with probability $b$, while active nodes become inactive with probability $d$;
- memory: active nodes maintain each of their in-neighbors with probability $p_\alpha$; then they form $\beta_{in}$ new in-stubs, where $\beta_{in}$ is drawn from a power-law distribution: $P(\beta_{in}) \sim \beta_{in}^{-\gamma}$;
- out-degree heterogeneity: each node is assigned $\beta_{out}$ out-stubs, where $\beta_{out}$ is drawn from another power-law distribution: $P(\beta_{out}) \sim \beta_{out}^{-\delta}$. Then each of the in-stubs is randomly matched to an out-stub.
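The steps above can be sketched in code as follows. This is a hedged illustration under several assumptions, not the authors' implementation: the names `power_law_int` and `next_configuration` are hypothetical; new in-stubs are assumed to number $\beta_{in}$, drawn from a power law with exponent $\gamma$; integer power-law draws use a floored continuous inverse-transform sampler capped at $N-1$; in-stubs are matched to out-stubs by sampling sources with probability proportional to their number of out-stubs; self-loops and duplicate links are discarded; and a kept link does not require its source to be active.

```python
import random
from itertools import accumulate


def power_law_int(exponent, rng, k_max):
    """Draw an integer in [1, k_max] with an approximately power-law tail.

    Assumed sampler: floor of a continuous Pareto variate with minimum 1.
    """
    u = rng.random()  # u in [0, 1), so 1 - u lies in (0, 1]
    k = int((1.0 - u) ** (-1.0 / (exponent - 1.0)))
    return min(k, k_max)


def next_configuration(active, in_nbrs, nodes, b, d, p_alpha, gamma, delta, rng):
    """Build configuration c+1 from configuration c (illustrative sketch).

    active:  set of nodes active in configuration c
    in_nbrs: dict mapping each active node to the set of its in-neighbors in c
    nodes:   list of all nodes (at least two)
    Returns (new_active, new_in_nbrs) for configuration c+1.
    """
    n = len(nodes)

    # Vital dynamics: active nodes survive with probability 1 - d,
    # inactive nodes are activated with probability b.
    new_active = set()
    for v in nodes:
        survives = (rng.random() >= d) if v in active else (rng.random() < b)
        if survives:
            new_active.add(v)

    # Out-degree heterogeneity: every node receives beta_out out-stubs.
    out_stubs = [power_law_int(delta, rng, n - 1) for _ in nodes]
    cum = list(accumulate(out_stubs))

    new_in = {}
    for v in nodes:
        if v not in new_active:
            continue
        # Memory: keep each previous in-neighbor with probability p_alpha
        # (newly activated nodes have no previous in-neighbors).
        kept = {u for u in in_nbrs.get(v, ()) if rng.random() < p_alpha}
        # New in-stubs, each matched to a source chosen proportionally to out-stubs.
        beta_in = power_law_int(gamma, rng, n - 1)
        fresh = set(rng.choices(nodes, cum_weights=cum, k=beta_in))
        fresh.discard(v)  # no self-loops
        new_in[v] = kept | fresh
    return new_active, new_in
```

Iterating `next_configuration` from an arbitrary initial state and discarding the first configurations mimics the transient removal described in the text. Under the vital dynamics alone, the stationary fraction of active nodes is $b/(b+d)$.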

All five parameters $b$, $d$, $\gamma$, $p_\alpha$, $\delta$ are assumed constant in time and throughout the network. The amount of memory in the system is tuned by the interplay of the two parameters $p_\alpha$ and $d$. Starting from an arbitrarily chosen initial configuration $c = 0$, simulations show that the system rapidly evolves towards a dynamical equilibrium, and successive configurations can be obtained after discarding an initial transient. The parameter values used in the paper are: $N = 10^4$; $b = 0.7$; $d = 0.2$; $\gamma = 2.25$; $\delta = 2.75$; $p_\alpha = 0.3, 0.7$. The influence of these parameters on the network properties is examined in S1 Text. If we denote by $\alpha$ the number of neighbors that a given node keeps across two consecutive configurations $(c-1, c)$, we can express the loyalty simply as: [...] where the superscript $c$ for $\alpha$, $\beta_{in}$ indicates the values used to build configuration $c$. The number of nodes with $\theta = 0$ as a function of the degree can be computed analytically: $P(\theta^{c,c+1} = 0) =$ [...]
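A small sketch may help connect the loyalty to the model quantities. It assumes that the loyalty of a node is the Jaccard index between its in-neighbor sets in two consecutive configurations. This definition is not stated in this section, and the function name `loyalty` is hypothetical. Under that assumption, a node with $k$ in-neighbors in the previous configuration that keeps $\alpha$ of them and gains $\beta_{in}$ distinct new ones has loyalty $\alpha / (k + \beta_{in})$.

```python
def loyalty(prev_in, curr_in):
    """Jaccard index between in-neighbor sets of two consecutive configurations.

    Assumed definition: |intersection| / |union|; returns 0.0 when the node
    has no in-neighbors in either configuration.
    """
    prev_in, curr_in = set(prev_in), set(curr_in)
    union = prev_in | curr_in
    if not union:
        return 0.0
    return len(prev_in & curr_in) / len(union)


# A node with 4 previous in-neighbors keeps 2 of them and gains 1 new one:
# alpha = 2, k = 4, beta_in = 1, hence 2 / (4 + 1).
example = loyalty({"a", "b", "c", "d"}, {"a", "b", "e"})  # 0.4
```

With this definition, a node that keeps none of its neighbors has loyalty 0, which is the case counted by $P(\theta = 0)$ in the text.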

Similarly, it is possible to compute the probability $f^{c,c+1}$ that a link present in configuration $c$ is also present in configuration $c+1$. In S1 Text we show that $f^{c,c+1} \simeq (1-d)\,p_\alpha$ and confirm this result by numerical simulations.
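A heuristic reading of this result, offered as a sketch and not as the derivation given in S1 Text, is the following. A link pointing to node $i$ survives if $i$ stays active and retains that in-neighbor. Otherwise it can reappear only through the random stub matching, with a probability $r$ that is assumed here to be negligible in a large network. The symbol $r$ is introduced for illustration only.

```latex
\begin{align}
f^{c,c+1} &= (1-d)\,\bigl[\,p_\alpha + (1-p_\alpha)\,r\,\bigr] \\
          &\simeq (1-d)\,p_\alpha \qquad (r \ll 1)
\end{align}
```

With the parameter values used in the paper, $d = 0.2$ gives $f^{c,c+1} \simeq 0.56$ for $p_\alpha = 0.7$ and $f^{c,c+1} \simeq 0.24$ for $p_\alpha = 0.3$.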

Supporting Information

Cattle trade network dataset. We provide the cattle trade network as yearly edge lists, from 2006 to 2010. The dataset consists of five CSV files (one for each year) compressed in a ZIP archive. (ZIP)

Additional analyses. We provide a description of the seasonal pattern of the cattle network (Section 1), a more in-depth characterization of loyalty (Section 2), a comparison between loyalty and other similarity measures (Section 3), the specific modeling function for the infection potential (Section 4), the robustness of the risk assessment procedure to variations in parameters and assumptions (Section 5), further analyses of the memory driven model in terms of analytical results (Section 6) and additional properties (Section 7), and extensions of our methodology to take into account transmissibility lower than 1 (Section 8) and link weights (Section 9).

Author Contributions

Performed the experiments: EV. Analyzed the data: EV CP VC. Contributed reagents/materials/analysis tools: EV CP AG DP LS VC. Wrote the paper: EV CP AG DP LS VC.
