Broadly neutralizing antibodies (bnAbs) against HIV-1 are only rarely observed in patients, and discovering HIV-1 vaccine candidates that elicit such bnAbs remains a challenge because of the extensive genetic sequence variability and complex immune evasion strategies of HIV-1 (Burton, 2002; Johnson and Desrosiers, 2002; Haynes and Montefiori, 2006; Prabakaran et al., 2007). Among the factors thwarting the induction of bnAbs, we previously found that all known HIV-1 bnAbs are highly divergent from germline antibodies and that the germline antibodies of bnAbs could not bind to the epitopes of the respective mature antibodies. This led to the hypothesis that HIV-1 may have evolved to exploit “holes” (absent or weak binding of germline-lineage bnAb precursors) in the human germline B cell receptor repertoire (Xiao et al., 2009). Consistent with this hypothesis, when combinatorial phage display libraries mimicking the human antibody repertoire, constructed from human IgM libraries, were used for panning experiments, we did not find any specific binders against the HIV-1 envelope glycoproteins (Envs) but identified binders only against the SARS CoV receptor-binding domain (RBD) and the soluble Hendra virus G protein (sG) (Chen et al., 2012). These findings indicated that the major problem could be related to the high level of somatic mutation required for bnAbs to accurately target the conserved structures on the HIV-1 Envs.
In this article, we used high-throughput 454 sequencing of a large naïve library of human IgM antibodies to explore the antibody repertoire landscape, examining germline usage, somatic mutations, intermediates, and the phylogenetic relationships between the intermediates and the corresponding bnAbs against HIV-1, SARS CoV, and henipaviruses. This study helped to identify germline predecessors of bnAbs in normal individuals and to trace maturation pathways of antiviral bnAbs. Indeed, most of the known HIV-1 bnAbs are highly divergent from their closest respective germlines as well as from their intermediates, as they undergo the somatic mutations required for their neutralization function. The results corroborate the view that HIV-1 may escape adaptive immune responses by avoiding strong binding of germline antibodies, as reflected in the absence of close anti-HIV antibody intermediates, and that finding closer intermediates of bnAbs in rare individuals might help in designing effective vaccines against HIV-1 and other viral diseases.
PCR amplification and high-throughput 454 sequencing
To amplify IgM antibody sequences, cDNA was prepared from peripheral blood B cells of 10 healthy donors, received under the Research Donor Program of the Frederick National Laboratory for Cancer Research, USA, which we previously used to construct a naïve human Fab phage display library for selecting antibodies against SARS CoV and henipaviruses. The complete set of primers used in the PCR amplification of IgM-derived heavy and light chains has been described in detail elsewhere (Zhu and Dimitrov, 2009). For 454 sequencing, the primer combinations used to amplify cDNA in separate reactions included the Roche A and B adaptor sequences along with the target amplification sequences for the heavy and light chain variable domains. The gene fragments were amplified in 20 cycles of PCR using the High Fidelity PCR Master from Roche. A more detailed description of the 454 sequencing can be found in our recent articles (Prabakaran et al., 2011, 2012). The standard Roche 454 GS Titanium shotgun library protocol was adapted, as described in the Roche sequencing technical bulletin.
Databases and tools
For quality control of antibody sequences, we trimmed the 454 sequence data and retained only sequences longer than 300 nucleotides (nt) that covered the entire antibody variable domain, consisting of the three complementarity determining regions (CDRs) along with the framework regions (FRs). For the immunogenetic analysis, we used IMGT/HighV-QUEST (Alamyar et al., 2012), the high-throughput version of IMGT/V-QUEST for deep sequencing (NGS) data. The output of the IMGT/HighV-QUEST analysis, in CSV files, was stored in a PostgreSQL database, and Structured Query Language (SQL) was used to retrieve the data for further analysis. Heatmap generation and statistical calculations involving the distributions of antibody HCDR3 lengths and mutations were carried out using SAS JMP10® statistical software (SAS Institute, Cary, NC).
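As an illustration only, the store-and-retrieve step might look like the following Python sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server. The table name `vh`, the column names, and the sample rows are all hypothetical; they do not reproduce the actual IMGT/HighV-QUEST output format or the schema used in this study.

```python
import csv
import io
import sqlite3

# Hypothetical excerpt of an annotation table. The real IMGT/HighV-QUEST
# CSV output has different and far more numerous columns.
CSV_TEXT = """sequence_id,v_gene,hcdr3_length,v_mutations,productive
r1,IGHV1-69*01,14,0,yes
r2,IGHV1-2*02,17,3,yes
r3,IGHV1-69*01,21,1,no
r4,IGHV3-30*18,12,5,yes
r5,IGHV1-69*06,16,2,yes
"""


def load_annotations(conn, csv_text):
    """Create the (hypothetical) vh table and fill it from CSV text."""
    conn.execute(
        "CREATE TABLE vh (sequence_id TEXT, v_gene TEXT, "
        "hcdr3_length INTEGER, v_mutations INTEGER, productive TEXT)"
    )
    rows = [
        (r["sequence_id"], r["v_gene"], int(r["hcdr3_length"]),
         int(r["v_mutations"]), r["productive"])
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO vh VALUES (?, ?, ?, ?, ?)", rows)


def productive_for_gene(conn, gene):
    """Return (id, HCDR3 length, mutations) for productive reads of one gene.

    The trailing '*' restricts the match to alleles of exactly this gene.
    """
    cur = conn.execute(
        "SELECT sequence_id, hcdr3_length, v_mutations FROM vh "
        "WHERE productive = 'yes' AND v_gene LIKE ? ORDER BY sequence_id",
        (gene + "*%",),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_annotations(conn, CSV_TEXT)
hits = productive_for_gene(conn, "IGHV1-69")
```

With the toy rows above, `hits` contains the two productive IGHV1-69 reads (`r1` and `r5`); the unproductive read `r3` is excluded by the `WHERE` clause.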
Computational analyses of antibody sequences
Translated heavy and light chain variable sequences from the 454 sequencing that shared the IGHV genes of selected antiviral antibodies and associated immunogenetics data including the details of germlines, HCDR3 lengths, and mutations were retrieved from the database by using SQL. Sequence identities between the 454 sequence data and germlines were calculated based on the pairwise alignment using local BLAST as implemented in BioEdit v7.0.9 (Hall, 1999). Phylogenetic analysis was carried out using the Archaeopteryx software (Han and Zmasek, 2009).
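For readers unfamiliar with the identity measure, the following Python sketch shows one common way to compute percent identity from two sequences that have already been aligned. It is a simplified illustration, not the BLAST implementation in BioEdit: it assumes the alignment is supplied as two equal-length strings with `-` for gaps, and it counts a gap opposite a residue as a mismatch, which is only one of several possible conventions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a compared, non-matching column (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment: 4 matching columns out of 6 compared columns.
identity = percent_identity("ACGT-A", "ACCTTA")
```

The choice of denominator (all compared columns here) affects the reported value, so identities computed under different conventions are not directly comparable.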
Germline gene usages of antiviral bnAbs
To analyze the germline origin of antiviral antibodies against HIV-1, SARS CoV, and henipaviruses as expressed in the human IgM repertoire, we performed 454 sequencing of a non-immune library that was previously constructed from peripheral blood B cells of 10 healthy donors and used to select antibodies against SARS CoV and henipaviruses (Prabakaran et al., 2006; Zhu et al., 2006). A total of 113,139 sequences were obtained, of which 91,528 were unique, each >300 nt in length. The total numbers of unique amino acid (aa) sequences for each V-gene subgroup in heavy and light chains that were functionally productive, as determined by IMGT/HighV-QUEST (Alamyar et al., 2012), are shown in Figures 1A,B, respectively. The read coverage, or gene frequencies, observed in this study suggested biased germline usage and was comparable to that of previous studies (Glanville et al., 2009; Prabakaran et al., 2012), but was far lower than the theoretical diversity attainable by antibodies, which can be attributed to several factors such as library sampling, primer efficiency, and sequencing errors and limitations. Nevertheless, we selected known bnAbs against viral targets including HIV-1, SARS CoV, and henipaviruses (Table 1) and created sequence data sets related to those bnAbs from the 454 analysis, as depicted in Figures 1C,D, which show the germline usage frequencies of IGHV genes in the VH domains and of IGKV and IGLV genes in the Vκ and Vλ domains, respectively. We found that, while all antiviral-related germlines were expressed in the human IgM repertoire, some germlines were used preferentially; for example, the HV1-69 gene among the IGHV subgroups and the KV3-20/LV2-14 genes among the IGKV/IGLV subgroups were overrepresented (Figures 1C,D).
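The counting behind such germline usage frequencies can be sketched in a few lines of Python. This is a minimal illustration under assumptions: reads are given as hypothetical `(sequence, v_gene)` pairs, duplicates are collapsed by exact sequence match, and the toy data are invented for the example rather than taken from the study.

```python
from collections import Counter


def germline_usage(records):
    """Fraction of unique sequences assigned to each V gene.

    records: iterable of (sequence, v_gene) pairs; repeated sequences
    are counted once, keeping the first gene assignment seen.
    """
    unique = {}
    for sequence, v_gene in records:
        unique.setdefault(sequence, v_gene)
    counts = Counter(unique.values())
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Toy reads (invented): "AAA" occurs twice and is counted once.
toy_reads = [
    ("AAA", "HV1-69"),
    ("AAA", "HV1-69"),
    ("CCC", "HV1-69"),
    ("GGG", "HV1-2"),
    ("TTT", "HV3-30"),
]
usage = germline_usage(toy_reads)
```

In this toy set, HV1-69 accounts for half of the four unique sequences, which is the kind of overrepresentation the frequency plots are meant to reveal.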
HCDR3 length distributions, somatic VH mutations and unique VDJ frequencies
The role of heavy chains of antiviral antibodies in antigen recognition is associated with longer HCDR3s and extensive VH mutations (Table 1). Most of the bnAbs have long HCDR3s, with lengths ranging from 20 to 30 aa, except for 2G12, VRC01 and m396. The VH genes of all anti-HIV-1 antibodies carry a high degree of somatic mutation compared with those of non-HIV-1 antiviral bnAbs. We analyzed the HCDR3 length distributions and VH mutations preexisting in germline-lineage precursor antiviral antibodies from the IGHV genes of the IgM repertoires from which bnAbs were generated. The box plots in Figures 2A,B display the distributions of HCDR3 lengths and VH mutations, respectively, and indicate a high level of HCDR3 length diversity and a lesser extent of somatic mutation compared with the bnAbs (Table 1).
To assess VDJ repertoire usage among the different antiviral-related IGHV genes, we computed the frequencies of VDJ recombination patterns observed in the VH genes expressed in the human IgM repertoire that involve the IGHV genes of the antiviral antibodies. The heatmap in Figure 3 depicts the most (red) and least (blue) abundant VDJ types in the germline-lineage repertoire for the corresponding IGHV genes, used in association with different IGHD and IGHJ genes. The IGHV genes V1-69 and V1-2 were frequently found to recombine with IGHJ genes J4 and J6, and IGHD genes D3 and D6.
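The quantities a box plot summarizes can be computed directly, as in the following Python sketch. It is an illustration only and does not reproduce the JMP calculations: the quartile convention (`method="inclusive"` in the standard library) is an assumption, and the HCDR3 lengths are toy values.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles, and maximum, as drawn in a basic box plot.

    Requires at least two values; quartiles use the 'inclusive' method
    of statistics.quantiles, one of several common conventions.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }


# Toy HCDR3 lengths in aa (invented for illustration).
summary = five_number_summary([14, 10, 18, 12, 16])
```

Different statistical packages use slightly different quartile definitions, so box edges computed this way may differ by a fraction of a unit from those produced by other software.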
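The tabulation that underlies such a heatmap can be sketched as follows in Python. This is a minimal example under assumptions: each rearrangement is represented as a hypothetical `(V, D, J)` tuple of gene names, and the toy tuples are invented rather than drawn from the sequenced library.

```python
from collections import Counter


def vdj_frequencies(rearrangements):
    """Relative frequency of each (V, D, J) combination.

    rearrangements: iterable of (v_gene, d_gene, j_gene) tuples.
    Returns a dict ordered from most to least abundant combination.
    """
    counts = Counter(rearrangements)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.most_common()}


# Toy rearrangements (invented for illustration).
toy_vdj = [
    ("V1-69", "D3", "J4"),
    ("V1-69", "D3", "J4"),
    ("V1-2", "D6", "J6"),
    ("V1-69", "D6", "J6"),
]
frequencies = vdj_frequencies(toy_vdj)
most_abundant = max(frequencies, key=frequencies.get)
```

Arranging these frequencies with one axis per IGHV gene and the other per IGHD/IGHJ pairing gives the matrix that a heatmap colors from least to most abundant.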
Identification of intermediate antiviral bnAbs and germline-lineage analysis
The intermediate antibodies corresponding to bnAbs against HIV-1, SARS CoV, and henipaviruses were found by analyzing the human IgM repertoire, and the intermediates with the closest similarity to the matured antiviral bnAbs were selected for germline-lineage analysis using a phylogenetic method. IGHV germline gene alleles of the bnAbs were obtained from the IMGT database. The midpoint-rooted neighbor-joining phylogenetic tree showing the evolutionary relationships of the different antiviral antibodies with their corresponding germlines and intermediates is given in Figure 4. We observed that some of the anti-HIV-1 antibodies (2G12, CH01, and VRC01) were located at distal nodes of the phylogenetic tree, indicating high divergence from their corresponding germline and intermediate counterparts. In contrast, the bnAbs against SARS CoV and henipaviruses, m396 and m102, were found closer to their intermediates.
Analysis of intermediates of anti-HIV-1 bnAb b12 and mapping of somatic VH mutations to the complex structure
We found 169 unique IGHV sequences from the V1-3 gene family as intermediates of bnAb b12 in the 454 sequence analysis of a human IgM library. Phylogenetic analysis of those intermediates revealed two major groups, one consisting of germline-related antibodies and the other containing potential intermediates closer to bnAb b12. We then constructed a phylogenetic sub-tree that included only the potential intermediates and the V1-3*01 germline along with bnAb b12. The tree was rooted at V1-3*01, the known germline of bnAb b12, and the phylogram showed the evolutionary relationships among the different intermediates (Figure 5A). One of the intermediates, G3JY1, had the highest sequence identity to bnAb b12 at the aa level, 72% (82% sequence similarity) (Figure 5B). However, the HCDR3 of that intermediate was 17 aa long, which is 3 aa shorter than that of the b12 antibody. To find the closest HCDR3 to that of b12, we scanned 28,925 unique HCDR3 sequences from the entire IgM 454 sequence data. We identified an HCDR3 with the same length (20 aa) and 50% sequence identity to that of b12 (Figure 5C); it was the most similar to the HCDR3 of b12, but the IGHV gene associated with it was V4-b. We used the HIV-1 gp120-b12 complex structure to map the VH somatic mutations, which showed that the mutations overlap with three residues of b12 (N36 from HCDR1, Y59 from HCDR2, and W111.1 from HCDR3) that contribute most of the binding interactions with gp120, as previously observed (Zhou et al., 2007) (Figure 5D).
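A scan of this kind, restricted to HCDR3s of the same length as the query, can be sketched in Python as follows. The sketch is illustrative and makes several assumptions: identity is computed position by position without alignment (reasonable only because lengths are equal), ties are resolved in favor of the first candidate seen, and the peptide strings are invented toy sequences, not the HCDR3 of b12 or any sequence from the library.

```python
def closest_same_length(query, candidates):
    """Find the candidate of equal length with the highest identity.

    Returns (best_sequence, percent_identity), or (None, 0.0) when no
    candidate has the same length as the query.
    """
    best = None
    best_identity = 0.0
    for candidate in candidates:
        if len(candidate) != len(query):
            continue  # only same-length HCDR3s are compared
        matches = sum(1 for a, b in zip(query, candidate) if a == b)
        identity = 100.0 * matches / len(query)
        if best is None or identity > best_identity:
            best, best_identity = candidate, identity
    return best, best_identity


# Toy sequences (invented): one candidate is skipped for its length.
toy_query = "CARGDYW"
toy_candidates = ["CTTGAAW", "CARW", "CAKGDYW"]
best, best_identity = closest_same_length(toy_query, toy_candidates)
```

Here the 4-residue candidate is ignored, and `CAKGDYW` is selected because it matches the query at 6 of 7 positions.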
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.