Communicable diseases spread through contacts within social or sexual networks. The dynamic structure of these networks governs the spread of infection and can inform public health measures to contain epidemics. A common way to define important network features is through interview and partner tracing, but these techniques are of limited value when the infectious disease has a long incubation period between transmission and disease state and a low transmission rate per contact, as with human immunodeficiency virus (HIV-1). Recent advances in molecular epidemiology have greatly enhanced our ability to characterize transmission networks of infectious diseases.
The high evolutionary rate of HIV-1 gives rise to an essentially unique HIV-1 genetic sequence for each infected individual, enabling detailed studies of local and global epidemics. Because partial HIV-1 pol sequences are generated for routine drug resistance testing, the data necessary to perform such molecular analyses are often readily available and centralized in commercial laboratories. These laboratories interpret HIV sequence data to estimate antiretroviral drug resistance. Without the immediate prospect of a broadly effective vaccine for HIV-1, molecular epidemiology has the potential to identify individuals most likely to transmit infection, who could be targeted for efficient and effective delivery of scarce prevention resources.
In this study, we analyzed HIV-1 pol sequences generated over a period of more than 15 years from recently HIV-1 infected individuals and their sexual and social contacts identified in San Diego, California. Based on these data, we inferred the local molecular transmission network and evaluated if network hubs could be targeted for effective prevention efforts.
HIV-1 screening was offered to adults and adolescents between 1996 and 2011 at multiple HIV-1 testing and counseling sites in San Diego, California. All HIV-1-positive individuals were offered study participation with confidential partner services. All persons identified with recent infection who were antiretroviral treatment (ART)-naïve formed the San Diego Primary Infection Cohort (SDPIC). HIV-1 screening was also provided to recent sexual and social network contacts of newly infected participants. The UCSD Human Research Protections Program approved the study protocol, consent and procedures for consent. All study participants provided voluntary, written informed consent before any study procedures were undertaken.
An estimated date of infection (EDI) was computed for all recently infected participants as previously reported (Supplemental Table S1), using criteria that characterized acute HIV-1 infection as negative HIV-1/2 serologies with a positive HIV-1 RNA test (Procleix HIV-1/HCV Assay: Chiron, Emeryville, California, and Genprobe, San Diego, California). HIV-1 risk behaviors (using computer-assisted self-interviews), blood viral load (Amplicor, Roche) and CD4 count (flow cytometry) were obtained at baseline and every 12 weeks throughout follow-up. All participants were assessed for baseline HIV-1 transmitted drug resistance via bulk sequencing of the partial HIV-1 pol coding region (GeneSeq HIV-1; Monogram Biosciences, Inc., South San Francisco, CA or Viroseq v.2.0; Celera Diagnostics, Alameda, CA). The sequenced region included the protease gene and between 305 and 335 5′ amino-acids of the reverse transcriptase gene. Repeated HIV-1 pol sequences were generated in a subset of participants. The GenBank accession numbers for the 648 baseline pol sequences included in this analysis are KJ722809–KJ723456. To avoid unintended disclosure of study participants' identities, data accompanying HIV sequences are limited to year of sampling, country of origin and a random unique participant ID. Although ART was not provided, treatment was generally encouraged.
Sequence curation, alignment, and network inference were performed using either the HyPhy package or freely available software (https://github.com/veg/HIV-1Clustering, https://github.com/veg/TN93, details and justification provided in Supplemental Methods). After quality control procedures to remove potential contaminant sequences, the partial transmission network was inferred based on the nucleotide genetic distances between bulk HIV-1 pol sequences from each participant (Figure 1). In accordance with previous analyses, we linked two individuals (nodes) in the network whenever their pol sequences were less than 1.5% distant (TN93 distance measure, see Figure S1 in File S1). The degree (connectivity) of each individual was defined as the number of links (edges in the transmission network) to other individuals. Clusters were defined as connected components of the network comprising two or more nodes (Figure 1). Epidemiologic contact information was not a requirement for clustering in the molecular network, since the presence of a link does not imply direct transmission (but rather two recently related viruses). Whenever possible, we assigned a direction to the network edge (i.e., an arrow to indicate the likely transmission direction) if the EDI of the secondary partner (i.e., putative “recipient”) node was at least 30 days past the date at which the initial partner (i.e., putative “source”) sequence was isolated (Figure S2 in File S1). We conservatively assumed that chronically infected subjects had an EDI of at least 180 days from enrollment (based on the reliability of available detuned HIV assays to estimate the duration of recent infection). Since multiple bulk HIV-1 pol sequences generated from participants with longitudinal follow-up were available and could boost our power to detect transmissions originating from chronically infected individuals, we used all available sequences to define links.
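The linking, degree, clustering, and edge-direction rules described above can be illustrated with a minimal Python sketch. It is not the HyPhy or HIV-1Clustering implementation: the pairwise TN93 distances are assumed to have been computed elsewhere and supplied as a dictionary, and the function names, participant IDs, distances, and dates are hypothetical.

```python
from collections import defaultdict
from datetime import date

THRESHOLD = 0.015   # 1.5% TN93 distance, as stated in the text
MIN_GAP_DAYS = 30   # recipient EDI must trail source sampling by >= 30 days


def build_network(distances, threshold=THRESHOLD):
    """Link two individuals when their pairwise distance is below the threshold."""
    adjacency = defaultdict(set)
    for (a, b), dist in distances.items():
        if a != b and dist < threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clusters(adjacency):
    """Connected components with two or more nodes (depth-first search)."""
    seen, found = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= 2:
            found.append(component)
    return found


def edge_direction(a, b, edi, sampled, min_gap=MIN_GAP_DAYS):
    """Return (source, recipient) if exactly one direction fits the rule, else None."""
    def could_infect(src, dst):
        return (src in sampled and dst in edi
                and (edi[dst] - sampled[src]).days >= min_gap)

    a_to_b, b_to_a = could_infect(a, b), could_infect(b, a)
    if a_to_b and not b_to_a:
        return (a, b)
    if b_to_a and not a_to_b:
        return (b, a)
    return None  # edge stays undirected


# Hypothetical toy input: pairwise distances as fractions, not percentages.
distances = {("P1", "P2"): 0.004, ("P2", "P3"): 0.012,
             ("P1", "P3"): 0.021, ("P4", "P5"): 0.058}
net = build_network(distances)
degree = {node: len(neighbors) for node, neighbors in net.items()}

edi = {"P2": date(2006, 3, 1)}                       # P1 has no EDI here
sampled = {"P1": date(2006, 1, 15), "P2": date(2006, 4, 1)}
direction = edge_direction("P1", "P2", edi, sampled)
```

In this sketch, individuals with no link (P4 and P5) never enter the adjacency map, so they are treated as singletons with degree 0.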
Network Properties and Transmission Network Score
In order to determine whether baseline network characteristics could predict an individual's future transmission risk, we formulated a numeric transmission network score (TNS). Using only baseline data, we characterized the risk of HIV-1 transmission within the first year after study entry, for participants entering the study between 2005–2010. At least one year of network follow-up was available for all participants (i.e., network sampling ended in 2011), beginning in 2005 when network sampling was sufficiently dense for these analyses (Figure S3 in File S1). We defined TNS as the function of the total degree (d) of the node at baseline (d = 0 if no connections are inferred), conditioned on the network inferred at the time of each subject's baseline sequence (N). Specifically, TNS(d|N) = Prob(degree of a node in N < d), with the probability computed using the best-fitting parametric density for the network N. In other words, TNS of a node with degree d is the proportion of all network nodes with degrees less than d, estimated from the histogram smoothed by the fitted parametric distribution (Figure S3 in File S1). TNS could range between 0 and 1, with higher values representing nodes with unusually high connectivity (see Supplemental Methods).
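As a rough illustration of the TNS definition, the following Python sketch computes the proportion of nodes with degree strictly less than d. It deliberately substitutes the raw empirical degree distribution for the fitted parametric density used in the study, so its values will differ from the published scores; the function name and the degree list are hypothetical.

```python
def transmission_network_score(d, degrees):
    """Proportion of nodes in the network whose degree is strictly less than d.

    Assumption: uses the unsmoothed empirical distribution rather than the
    best-fitting parametric density described in the text.
    """
    if not degrees:
        raise ValueError("network has no nodes")
    return sum(1 for k in degrees if k < d) / len(degrees)


# Hypothetical baseline degrees for a ten-node network (singletons have d = 0).
baseline_degrees = [0, 0, 0, 0, 1, 1, 2, 3, 5, 9]
scores = {d: transmission_network_score(d, baseline_degrees)
          for d in sorted(set(baseline_degrees))}
```

Because the comparison is strict, every node with d = 0 receives a score of 0, and the score stays below 1 for any node that is itself part of the network.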
Next, we examined associations between the calculated TNS and baseline characteristics, including viral load (VL), CD4 count, risk behaviors (number of sex partners and unprotected anal intercourse [UAI]), stage of infection and demographics. We also investigated the relationship between these characteristics and putative transmissions, as measured by the accumulation of edges with assigned directions away from the participant between baseline and year 1. Finally, we tested whether nodes with higher TNS were associated with a greater risk of participating in putative transmissions (i.e., out-edges) between baseline and year 1 (the calculation of TNS is independent of edge direction). The TNS values were neither shared with study participants nor generated in real time to influence clinical decision-making.
Evaluation of Whether TNS Could Inform Prevention Strategies
To assess how network information and the calculated TNS could inform prevention strategies, we estimated the impact of the timing of ART within our cohort on the transmission dynamics. Specifically, we compared the level of network connectivity between participants who started ART early (i.e., within 12 weeks of EDI) and those who delayed ART >12 weeks from EDI. This analysis was based on the total degree network statistic developed by Wertheim et al. The statistic is the difference between the total degree (defined as the sum of all the node degrees) of the groups. To decide whether this statistic is unusually low or unusually high, a null distribution is generated by permuting node labels, conditioned on the structure of the network. We also modeled the impact of targeted treatment with ART of a subset of individuals on preventing other infections using computer simulations (Figure 2 and Figure S4 in File S1). In this model we liberally assumed that ART would be 100% effective at stopping onward HIV transmission (see Supplemental Methods for details).
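The label-permutation idea behind the total degree statistic can be sketched in Python as follows. This is a simplified illustration and not the implementation of Wertheim et al.: the node degrees are held fixed (conditioning on network structure), only the early/late labels are shuffled, and the one-sided p-value, function names, and toy data are assumptions.

```python
import random


def total_degree_difference(degree, labels):
    """Total degree of the early-ART group minus that of the late-ART group."""
    early = sum(degree[n] for n in degree if labels[n] == "early")
    late = sum(degree[n] for n in degree if labels[n] == "late")
    return early - late


def permutation_test(degree, labels, n_perm=10000, seed=0):
    """One-sided test: is the observed difference unusually low?

    Node degrees stay fixed; only the group labels are permuted, which
    preserves both the network structure and the group sizes.
    """
    rng = random.Random(seed)
    nodes = list(degree)
    values = [labels[n] for n in nodes]
    observed = total_degree_difference(degree, labels)
    at_or_below = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = dict(zip(nodes, values))
        if total_degree_difference(degree, permuted) <= observed:
            at_or_below += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_or_below + 1) / (n_perm + 1)


# Hypothetical toy data: three early starters with few links, three late starters.
degree = {"A": 0, "B": 1, "C": 0, "D": 4, "E": 3, "F": 5}
labels = {"A": "early", "B": "early", "C": "early",
          "D": "late", "E": "late", "F": "late"}
observed, p_value = permutation_test(degree, labels)
```

With only six nodes there are 20 ways to choose the early group, so the smallest attainable p-value in this toy example is about 0.05; real networks give a much finer null distribution.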
For TNS, a value of 0.75 or higher was defined as a “high” score. This represented the top quartile of TNS scores in our sample; all others were classified as “low”. The association between high TNS and patient characteristics, transmission risks, and clinical and epidemiological factors was tested using Wilcoxon-Mann-Whitney test for continuous characteristics, and Fisher's exact test for binary and categorical characteristics. Behavioral characteristics were examined independently. To ensure a linear relationship between each independent variable and the logit of the outcome, we log-transformed the viral load and the number of sex partners in the past year. Wilcoxon rank sum tests and Fisher's exact tests were used to compare participants with a new out-edge network connection (i.e. a putative transmission) to those without a new connection for continuous and categorical variables. A multiple logistic regression model was developed by considering variables that were statistically important (p<0.10) at the univariate level and then removing them using backward elimination (though the same results were obtained using forward and stepwise elimination). Benefits from adding covariates to the model were assessed with the likelihood ratio test. Goodness of fit of the final model was assessed by inspecting residuals and using the Hosmer-Lemeshow test. Confidence intervals on inferred network properties were obtained by drawing 1000 bootstrap replicates of the pol sequences, repeating network inference, and tabulating relevant statistics.
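The bootstrap procedure for network confidence intervals might look like the Python sketch below. Several choices here are assumptions not stated in the text: alignment columns are resampled with replacement, a simple proportion of mismatches (p-distance) stands in for TN93, the interval is taken from percentiles of the replicate counts, and the toy sequences and the 5% toy threshold are hypothetical.

```python
import random
from itertools import combinations


def p_distance(s, t):
    """Proportion of mismatched sites (a stand-in for TN93 in this sketch)."""
    return sum(x != y for x, y in zip(s, t)) / len(s)


def count_clustered(seqs, threshold):
    """Number of individuals linked to at least one other individual."""
    linked = set()
    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < threshold:
            linked.update((a, b))
    return len(linked)


def bootstrap_ci(seqs, threshold, replicates=1000, seed=0):
    """Percentile 95% interval for the clustered count over resampled alignments."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    counts = []
    for _ in range(replicates):
        # Resample alignment columns with replacement, same columns for all sequences.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        counts.append(count_clustered(resampled, threshold))
    counts.sort()
    return counts[int(0.025 * replicates)], counts[int(0.975 * replicates) - 1]


# Hypothetical aligned toy sequences of equal length (40 sites).
seqs = {"P1": "ACGT" * 10,
        "P2": "ACGT" * 9 + "ACGA",
        "P3": "TTTT" * 10}
low, high = bootstrap_ci(seqs, threshold=0.05)
```

The same loop could tabulate any other network statistic, such as the number of clusters, by replacing `count_clustered`.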
Between 1996 and 2011, the SDPIC screening program enrolled 648 HIV-1 infected individuals in the described network analysis, including 478 (73.8%) with recent HIV-1 infection and 170 of their HIV-1-infected sexual and social contacts. For the recently (i.e., acute and early) infected participants, the median time from the estimated date of infection (EDI) to presentation was 70 days (Table 1). Baseline participant characteristics were consistent with the epidemiology of HIV-1 in San Diego: most participants were male (96.0%), with a median age of 33 years, and men reporting sex with other men as the primary HIV-1 risk factor (Table 1). A total of 921 HIV-1 population pol sequences were isolated from 648 persons, nearly all being HIV-1 subtype B (98.5%) with sequences from 17.6% of participants harboring some drug resistance mutations (Figure S5 in File S1). A subset of 89 participants had multiple HIV-1 pol sequences generated during study follow-up with a median of two sequences per individual (range: 2–21) and a median duration of follow-up of 49 weeks (range: 1–413 weeks).
Transmission Network Characteristics
The HIV-1 pol sequences generated from each participant were used to infer the transmission network. Overall, the mean genetic distance between pairs of randomly selected baseline sequences was 5.83% (s.d. 1.46%), and pairwise distances below the 1.5% threshold used to define a link between individuals were rare (0.25%, Figure S1 in File S1). Using this 1.5% threshold, 339 individuals (52.3%, 95% CI: 333-392) were connected to at least one other study participant. Individuals were then divided into connected (i.e., clustered) and disconnected (i.e., singletons) nodes. Connected nodes (Figure 1) were arranged in 90 clusters (95% CI: 68–90), ranging in size from 2 to 62 individuals. It was possible to discern the direction of the putative HIV-1 transmission in 332 of the 540 connections (61.5%) by comparing the sampling date of the secondary partner and the EDI for the putative initial (i.e., transmitting) partner (Figure S2 in File S1). Overall, 18.5% (n = 29) of clustered participants had a new outbound connection within one year of enrollment. A total of 208 connections (38.5%) remained undirected because neither individual had an EDI (n = 29) or neither direction could be ruled out by examining EDI and sampling date information (n = 179). Interestingly, participants enrolled during acute and early infection were not significantly more likely to develop a new outbound edge within the first year of follow-up than persons with established (i.e., chronic) infection (75.5% vs. 67.2%, p = 0.52). However, similar to previously described HIV-1 networks derived mostly from populations of men who have sex with men, our network was best described by a preferential attachment model, indicating that new connections (i.e., putative transmissions) are more likely to form (or “attach”) to nodes that are already more highly connected (Figure S6 in File S1).
High TNS at Baseline Was Associated with Future Connections
Among 339 clustered participants, 157 were identified after 2004 and had TNS determined using only the information available at the time of study enrollment. The top quartile of the TNS distribution was designated as ‘high’ (TNS >0.75, n = 33) and all others as ‘low’ (n = 124). Participants with a high TNS were significantly more likely to have a putative transmission event within the first year, defined as one or more acquired outbound network connections in the first year (44.8% vs. 15.6%, p<0.01). The association between TNS and predicted risk of transmission was robust with regard to the cutoff chosen to determine “high” TNS (p<0.02 for TNS range in 0.70-0.95 [i.e., those in the top 25th percentile]). Even as the network grew over time, the majority of high TNS nodes retained their unusually high connectivity.
Clinical Correlates of Transmission
Participants with a higher baseline VL (median of 5.2 vs. 4.7 HIV RNA log10 copies/ml, p<0.01), and those with more sex partners at baseline (median of 3 vs. 1.5 partners, p = 0.03), were also significantly more likely to have a putative transmission event within the year of enrollment (Table 2). White participants (29.3%) were significantly more likely than Hispanics (11.1%) or participants of other race(s) (5.9%) to experience a putative transmission event (p = 0.02). There were no significant associations between baseline CD4 count (p = 0.53), insertive (p = 0.50) or receptive UAI (p = 0.34), age (p = 0.79), or stage of infection (p = 0.50) and putative transmission events. Baseline VL and high TNS were not significantly correlated in univariate analysis (p = 0.27), but in a multivariable analysis, number of unique sex partners, VL and high TNS (>0.75) at baseline were independently correlated with predicted risk of HIV transmission within the first year after presentation (p = 0.030, p = 0.003 and p = 0.005, respectively). Adding TNS to a logistic regression model of new network connections with VL as an explanatory variable contributed significantly to the model (p<0.001). There were also significant associations between TNS and baseline number of unique sexual partners (≥1) in the past month (p = 0.014) and insertive (p = 0.0455), but not receptive (p = 0.733) UAI.
Evaluation of robustness to network inference error
TNS inferred from an incompletely sampled molecular network was a good predictor of the TNS in the unobserved larger transmission network, implying that despite the limitations of our approach (see Discussion), molecular networks likely retain key qualitative properties of actual transmission networks. Based on 100 simulations of transmission dynamics and sequence evolution of 5,000 HIV-1 pol sequences, followed by a subsampling of 648 sequences, network inference and TNS calculation, we found that in all 100 cases there was significant (p<0.05, Kendall rank correlation test) correlation between the true and inferred TNS. Furthermore, the median positive predictive value for high TNS (top quartile) based on molecular network data was 0.66.
To further understand how network information could be used for prevention efforts, we investigated whether the preventive effect of early initiation of ART could be observed in the sampled network. A total of 177 out of 339 (52.2%) clustered participants initiated ART at a median of 168 days from study entry (i.e., baseline). Retrospective analysis using a network statistic developed by Wertheim et al. showed that ART initiation within 12 weeks of EDI (n = 64) was associated with significantly less putative transmission (i.e., fewer out- and undirected connections) than when ART was started later (p<0.05). We also evaluated whether the network could inform targeting of prevention interventions, and found through simulation that ART given to the 11 individuals with the highest TNS (≥0.90), and assumed to be 100% effective at preventing transmission, showed a greater probability of reducing HIV-1 transmissions compared to ART provided to the same number of randomly selected individuals in the network (Figure 2 and Table 3). This observation held in 91% of simulated scenarios in which treatment was given to those with TNS ≥0.90. We also investigated whether selecting a subset for immediate treatment based on baseline demographics and reported risk behaviors would yield prevention improvements comparable to those achieved by TNS. Such simulations suggested that targeting ART for individuals based on the number of sex partners in the last month, whether or not they reported always having UAI in the last month or whether or not they had another sexually transmitted infection (STI), did not provide a measurable improvement over a randomly selected subset of nodes (probability of improving on a random treatment >0.5 in all cases).
Although HIV-1 network structure and transmission dynamics have characterized temporal trends in identified HIV-1 cases, these studies have rarely, except in Wertheim et al., been used to predict or simulate future HIV-1 infections. By combining methods from classical and molecular epidemiology, we were able to infer and characterize the local transmission network. Specifically, we inferred the HIV-1 transmission network in San Diego, California using HIV-1 sequence data generated during routine drug resistance testing from a well-characterized cohort of recently infected individuals and their sexual and social contacts. We then evaluated if network characteristics could predict future transmission patterns.
As a proof of principle, local HIV-1 transmission network characteristics were used to derive a score (TNS) that estimated the risk of transmission during the first year of HIV-1 infection, when transmission risk may be greatest. This objective score identified a subset of participants (TNS >0.75) who had a significantly greater predicted risk of HIV transmission within their first year of infection than those with lower TNS. As evidence that TNS reflected a biologic correlate of transmission risk, a positive correlation was observed between baseline VL (and TNS) and likelihood of acquiring an outbound edge within the first year. When TNS was incorporated into a multivariate model with VL, the prediction of transmission risk significantly improved, suggesting that VL and TNS are informed by independent transmission risk factors (e.g., per contact transmission risk and number of high risk contacts). Taken together, TNS provides a new method for estimating transmission risk within a network, and this method could likely be extended to infer regional transmission networks from the extensive archives of HIV sequence data stored in commercial databases. Since the TNS is derived only from information available at the time of enrollment, the score could be readily utilized in clinical practice (Figure 3), as patient level TNS results could be integrated into routine baseline genotype test reports providing a general transmission risk interpretation to the patient's healthcare provider and the patient.
We then evaluated if network data can be used to help target prevention strategies. First, we retrospectively observed that self-selected early initiation of ART was associated with a cumulative decrease in putative transmission in our network, as compared to delaying ART. We then used data from our local network to inform simulations, which demonstrated that by targeting individuals with the highest TNS (≥0.90) with highly effective prevention interventions (e.g., fully suppressive ART), we would expect to reduce local network transmissions more efficiently than with the same prevention intervention targeting individuals with the greatest number of recent sexual partners or STI (Table 3). While encouraging, one still must prove that such interventions can disrupt the entire network if they are to appreciably reduce the incidence rate in the at risk community. Based on these robust HIV-1 network transmission observations, we therefore propose a method to use the connectivity of individuals to guide targeted prevention interventions, like early ART.
There are limitations to the interpretation of these results. The inferred transmission network is incomplete and inaccurate, and the presence of a directed link between two individuals does not guarantee an HIV transmission event occurred; it simply reflects recent relatedness of the virus, possibly through a series of unobserved intermediaries. While the inferred network only proposes “putative” transmissions, a limitation of the convenience-based sampling methods used for these analyses is that they are inadequate to discern with certainty true transmission chains from clusters of epidemiologically unlinked persons infected by a common or intermediate source. In addition, individuals who were infected by partners outside the well-sampled area (e.g., infected in a different city) will likely be assigned a low TNS even if they pose a high risk of onward transmission. Similarly, nodes characterized by high TNS values may also represent lower risk individuals who are genetically linked to unobserved (i.e., unsampled) high risk intermediaries. Our recent work on large-scale network reconstruction, where hundreds of thousands of sequences can be included in the analysis of local transmission networks, suggests a possible solution by conditioning the local network in the context of a global network. In addition, by conditioning on the network structure, using robust statistics, and using community level measures, the effect of these unobserved connections can be mitigated. Further, more sophisticated methods are being developed to help better associate molecular and epidemiological links. Also, these results may not be generalizable to other networks. The efficiency of HIV-1 transmission per contact (influenced by sexual behavior, VL, STI, etc.) may vary by geographic region; thus, optimal prevention intervention strategies may depend upon a thorough understanding of local transmission dynamics.
Finally, there remain concerns about the potential loss of privacy related to disclosure of putative transmission between two or more individuals, even though there are significant limitations in proving direct HIV transmission links. Nevertheless, with appropriate privacy protection protocols, it is reasonable to consider using HIV transmission network data to develop prevention intervention strategies (Figure 3).
When adequately sampled, HIV-1 sequence analysis can help characterize local HIV epidemics. This network based study in San Diego, California corroborated previous findings that higher VL was associated with transmission risk and that early ART decreased this risk. This study went further to identify that network connections at baseline also predicted future transmission risk, and prevention efforts targeted to these individuals may be a better use of prevention resources than random implementation or targeting individuals with higher number of sexual partners or recently diagnosed with an STI. While traditional HIV partner services are critical to effective HIV prevention services, when combined with HIV molecular epidemiologic analyses, targeted use of available prevention and treatment resources to maximally limit HIV transmission may significantly reduce network, and ultimately population, HIV incidence. Awareness of HIV-1 transmission network characteristics could also help local public health officials and clinicians to focus HIV-1 screening and prevention education messages for particular groups over time.