The ultimate goal of bovine genomics is the identification of genetic variation that modulates corresponding variation in economically important production traits, differential susceptibility to disease, and favorable host response to vaccines, which is expected to enable the improvement of these phenotypes via informed genomic selection (for review see). The bovine genome sequence and first-generation HapMap projects, have directly enabled genome-assisted selective breeding, nascent investigations of non-traditional traits such as marker-assisted vaccination (as diagnostics for enhanced vaccine design or animal response), the development of a new class of anti-infectives known as innate immunologicals, and the elucidation of loci that have evolved under strong selection, thus providing important computational evidence for genomic regions which may underlie economically important traits.
Relevant to the suppression of infectious diseases, the mammalian innate immune system provides host defense against a variety of pathogens without requiring prior exposure,. Consequently, genes that modulate innate immunity have often been considered as candidate loci for improving host resistance to disease in agricultural species-. Among mammals, the Toll-like receptor genes (TLRs) facilitate host recognition of pathogen-associated molecular patterns (PAMPs), thereafter eliciting host innate immune responses, aimed at suppressing invading bacteria, viruses, protozoa, and fungi. Essential to their role in host defense, the mammalian TLRs encode type I transmembrane proteins of the Interleukin-1 receptor (IL-1R) family with N-terminal leucine-rich repeats (LRR) involved in ligand recognition, a transmembrane domain, and a C-terminal intracellular Toll/IL-1 receptor homologous (TIR/IL-1R) domain for signal transduction,,. The mammalian TLR genes are primarily expressed by antigen-presenting cells (i.e., macrophages or dendritic cells), and most of the TLR ligand specificities have been experimentally elucidated, with six gene family members (TLR1, TLR2, TLR4, TLR5, TLR6, TLR9) known to recognize microbial (bacteria, fungi, protozoa) and/or synthetic ligands, and five (TLR3, TLR4, TLR7-TLR9) known to recognize viral components,. Presently, TLR10 remains the only functional human TLR gene family member for which natural and/or synthetic ligands have not been fully elucidated. However, given evidence for functional mammalian TLR protein heterodimers (TLR10/TLR1; TLR2/TLR10), the host protein encoded by TLR10 may collaboratively enable recognition of a diverse array of microbial PAMPs, including those recognized by TLR2-.
Several studies have demonstrated that some naturally occurring TLR variants enhance the risk of severe infections in humans, mice, and domestic cattle, including the potential for increased susceptibility to Johne's disease, a debilitating and economically important disease of ruminants caused by infection with Mycobacterium avium spp paratuberculosis (MAP) (for review see-). Furthermore, several important bovine health-related QTL have also been localized to genomic regions either proximal to or directly overlapping one or more TLR loci (for review see,-). Therefore, we utilized massively parallel pyrosequencing of a pooled TLR amplicon library (TLRs 1-10) to comprehensively evaluate nucleotide variation and haplotype structure for 31 cattle breeds representing Bos taurus taurus, Bos taurus indicus, and their subspecific hybrids (composites). Overall, 276 single nucleotide polymorphisms (SNPs) and 4 insertion-deletion (indel) mutations were detected and validated. Bovine TLR SNPs and indels leveraged from the pyrosequencing study were used in a case-control analysis to identify risk factors underlying differential susceptibility to MAP in U.S. dairy cattle. In addition, we also comprehensively report on bovine TLR haplotype structure, the extent of haplotype sharing among specialized breeds and subspecific lineages, and provide median joining networks as putative representations of bovine TLR haplotype evolution. Finally, we provide computational evidence for several bovine TLR genes evolving under disparate modes of non-neutral evolution, thereby underscoring their potential importance to bovine innate immunity and health-related traits. The results of this study will enable bovine translational genomics, QTL refinement, and ultimately, genome-assisted methods for animal selection to develop cattle populations with enhanced disease resistance and favorable vaccine response.
Bovine TLR pyrosequencing, SNP detection, variant validation, and haplotype inference
For 96 elite bovine sires representing 31 domestic cattle breeds (B. t. taurus; B. t. indicus; and composites), we generated and purified 81 amplicons targeting all 10 bovine TLR genes (n = 7,776 total amplicon targets; see methods). The majority of the amplicons were pooled (n = 6,816) to form a normalized fragment library (Table S1) which was subjected to a workflow involving Roche 454 Titanium pyrosequencing with downstream variant detection using the Neighborhood Quality Standard algorithm as recently described, and the remaining purified amplicons (n = 960) were analyzed by standard dye-terminator cycle sequencing (Sanger) with alignment-based variant detection-. Sanger sequencing was necessary for amplicons that were intolerant to the addition of 5′ oligonucleotide barcodes for PCR amplification. In total, 474 variable sites were predicted from intragenic analyses of all sequence data, which included 212 previously validated SNPs, 4 known insertion-deletion mutations (indels), and 258 new putative SNPs. Evaluation of the genic distributions of all newly predicted TLR variable sites detected within the pyrosequencing data revealed that≥62% of the 258 new putative SNPs were located either within or immediately flanking homopolymer repeats. Nevertheless, to allow for inclusion of all possible SNPs in downstream analyses, we investigated all 474 variable sites via fluorescent allele-specific genotyping assays. Collectively, we validated 280 biallelic TLR variants (276 SNPs + 4 indels; Table S2) using custom genotyping assays applied to the sequencing discovery panel (n = 96 elite sires; 31 breeds), a panel of Holstein dairy cattle (n = 405; 3 herds), and a panel of purebred Angus beef cattle from a single herd (n = 48).
Of the 276 validated SNPs, 71 were predicted to encode nonsynonymous substitutions (nsSNPs), and one was predicted to encode a nonsense mutation in bovine TLR5 (AA substitution R125*; SNP C2332T). For the validated SNPs detected via pyrosequencing (n = 244), we investigated the relationship between minor allele frequencies (MAFs) estimated from the analysis of pyrosequencing data, as compared to corresponding allele frequencies derived from individual fluorescent allele-specific genotyping assays, and found significant correlations across all 10 TLR genes (discovery panel; Table 1). Moreover, an analysis performed across all genes (n = 244 SNPs) revealed that there was little or no bias in the estimates of allele frequencies produced via targeted pyrosequencing (P = 0.999846; Ho: slope = 1; Figure 1).
Collectively, 266 SNPs and 4 indels were successfully incorporated into 243 unique haplotypes via Bayesian reconstructions, (Table 2), which included one discrete haplotype carrying the putative TLR5 nonsense SNP. Ten SNPs (TLR2: 9431, 10047, 12121; TLR3: 3624, 3804, 5201, 6382; TLR4: 8166; TLR5: 1562, 1685; see Table S2) could not be incorporated into discrete haplotypes with best-pair phase probabilities≥0.90. Summary data representing the total number of predicted haplotypes, number of cattle with phase probabilities≥0.90, total number of variable sites with MAF≤0.10, genic distributions of validated variable sites, size of the investigated regions, and average estimates of linkage disequilibrium (LD; r2) between adjacent variable sites are depicted in Table 2. Across all investigated loci (n = 549 cattle; 31 breeds), the MAF spectrum derived from allele-specific genotyping assays ranged from 0.001 to 0.498, with 64% of the validated SNPs possessing MAFs≤0.10 (Table 2).
Characterization of LD architecture, recombination, and intragenic tagSNPs/Indels
Evaluation of the intragenic patterns of LD across all 31 breeds of cattle via 95% confidence intervals constructed for D',, application of the four gamete rule, and estimates of recombination between adjacent variable sites, revealed one or more blocks of strong LD within each of the 10 bovine TLR genes. Statistical evidence for historical recombination was detected within TLR2, TLR3, and TLR6, resulting in at least two detectable LD blocks within each gene. All other genes exhibited a single block of strong LD spanning either all, or the majority of all validated intragenic SNPs and indels, as supported by the majority rule of all three analyses-. A comparison of average intragenic r2 values calculated between adjacent variable sites across all 10 genes revealed a dynamic range of LD (0.09-0.70; all cattle, 31 breeds; Table 2). Discrete regions of high and low LD, the latter due to historical recombination, were also detected using the general model for varying recombination rate,,. Cumulatively, four adjacent SNP sites [TLR2 (1), TLR3 (2), and TLR6 (1)] produced estimates of median recombination rates that exceeded the background rate (),, by a factor of at least 2.5. The highest median estimate of recombination rate was observed in TLR3 (between SNP positions rs42851925, rs55617222; rs55617241, rs55617451, Table S2), and exceeded the background rate by a factor of at least 5.2. Analyses to identify tagSNPs/Indels which predictively captured 100% of the variation at 280 validated variable sites within all 10 genes for all cattle yielded 160 tagSNPs and 2 tagIndels (Table S3). Similar analyses restricted to the B. t. taurus breeds demonstrated that only 118 tagSNPs and 1 tagIndel were predicted to capture 100% of the variation at 235 variable sites (Table S3). Interestingly, the cumulative tagging efficiency (total tags predicted/total number of validated variable sites) was similar for both analyses (all cattle vs B. t. taurus), with this result largely due to the preponderance of taurine cattle in the total sample (94.4%), and the significant sharing of SNPs, indels, and haplotypes among the subspecific lineages.
High resolution bovine TLR haplotype networks and breed distributions
Median joining haplotype networks (Figures 2,3,4, Figure S1; Table S4) constructed for all 10 genes revealed that: 1) The specialized B. t. taurus beef and dairy breeds cannot be genetically discriminated despite an average polymorphism density (266 SNPs + 4 indels; see Table 2) of one variable marker per 158 bp; 2) Haplotype sharing occurs among B. t. taurus and B. t. indicus breeds at all 10 loci; 3) Shared haplotypes were often the highest frequency haplotype(s) within a network; 4) Despite haplotype sharing between the subspecific lineages, the 250 Kyr divergence between B. t. taurus and B. t. indicus was evident in most, but not all, haplotype networks (i.e., TLR1-7, TLR10). With very few exceptions (i.e., TLR3 Network 1, TLR4, TLR10), the high frequency network nodes demonstrating subspecific haplotype sharing often included at least two indicine sires. Using summary data derived from the median joining networks (Table S4), we estimated the relationship between the total number of discrete TLR haplotypes predicted (TLR1-10) in seven major U.S. taurine beef breeds (Angus, Charolais, Gelbvieh, Hereford, Limousin, Red Angus, Simmental), and four U.S. taurine dairy breeds (Braunvieh, Brown Swiss, Holstein, Shorthorn), and found a significant correlation (r = 0.71, P≤0.0224). This correlation was driven by the large number of haplotypes predicted to be shared among the beef and dairy breeds. For the investigated beef breeds, we predicted 84 discrete haplotypes across all 10 TLR loci, and at least 60 (71.4%) were predicted to be shared with the four dairy breeds. However, we also detected disparities between the numbers of haplotypes predicted for TLR4 and TLR5, with the dairy breeds possessing 3.8X and 2.3X more discrete haplotypes for these loci, respectively, than did our beef cattle. Exclusion of these two outlying loci resulted in a nearly perfect correlation (r = 0.98, P<0.0001) between the numbers of discrete haplotypes predicted in beef and dairy breeds across the remaining TLR loci. Interestingly, the single haplotype possessing the TLR5 putative nonsense mutation was almost exclusively predicted in Holstein cattle (Figure S1, TLR5 Node Q; n = 53 Holstein, n = 1 Braford).
Functional modeling of bovine amino acid (AA) substitutions and tests of selection
Using both PolyPhen and SIFT to evaluate the putative functional effects of AA substitutions encoded by TLR SNPs, we determined that 54/72 (75%) of AA substitutions were predicted to be benign and tolerated, whereas 23/72 (32%) were predicted to impact protein function by at least one of the analytical methods employed (Table 3). For those mutations predicted to impact protein function, 18/23 (78%) were detected at frequencies<0.05, and 5/23 (22%) located in TLR2 (1), TLR3 (2), TLR5 (1; putative nonsense SNP), and TLR8 (1) were observed at frequencies≥0.05, with moderate frequency substitutions detected in TLR8 (0.562) and TLR3 (0.432; see Table 3). The MAF for the TLR5 putative nonsense SNP, as estimated from 405 Holsteins in three herds was 0.068 (Table 3). Across all polymorphisms involving AA substitutions, PolyPhen and SIFT produced analogous predictions for 61/72 (85%) observed replacements.
To collectively estimate the extent of functional and/or selective constraint(s) related to bovine TLR protein function, we used a goodness of fit test to examine disparities between the observed distributions of AA phenotypes (PolyPhen + SIFT results; benign/tolerated vs damaging/affect). Assuming equal probabilities for the occurrence of both classes of AA phenotypes across all bovine TLRs, we found there to be significantly fewer substitutions predicted to impact protein function than those classified as benign or tolerated (P = 0.00022). This is consistent with some degree of functional and/or selective constraints that generally operate to maintain the functional products of most protein coding genes-. However, this result describes a general trend across the bovine TLR gene family, and does not provide locus-specific insights regarding the evolutionary origin and magnitude of these constraints.
To elucidate gene-specific departures from a strictly neutral model of molecular evolution, we used Tajima's frequency distribution test (D statistic), as applied to the discovery panel samples (all cattle from 31 breeds vs B. t. taurus), and evaluated the significance of the observed values (D) via coalescent simulation (Table 4). Departures from neutrality were detected for TLR3, TLR8, and TLR10. However, the direction of the deviation was not uniform across all three loci (Table 4), suggesting that disparate modes of evolution (i.e., selection) may have influenced genetic diversity within these genes, and that there may be differences among cattle lineages (Table 4, TLR10). For both TLR3 and TLR8, a significantly positive Tajima's D reflected an excess of moderate frequency alleles, whereas a large negative value for TLR10 (B. t. taurus) reflected an overabundance of rare, low frequency variants consistent with purifying selection. Therefore, it is important to note that although a significant nonrandom trend toward benign or tolerated AA substitutions was detected across all investigated loci, the underlying reason for this functional and/or selective constraint appears to be fundamentally different between some gene family members (i.e., TLR3, TLR8 vs TLR10). Notably, we observed at least one moderate frequency AA substitution that was predicted to impact protein function in both TLR3 and TLR8 (Table 3), whereas all AA substitutions predicted to impact protein function in TLR10 were detected at very low frequencies (Table 3). To further investigate the overall magnitude and origin(s) of the most significant deviations from a strictly neutral model (Tajima's D; pyrosequencing discovery panel; Table 4), we used Fu's FS statistic to estimate the probability of observing a number of haplotypes less than or equal to that predicted in our samples for TLR3 (B. t. taurus); TLR3-1 (B. t. taurus), and TLR8 (all cattle; B. t. taurus). For TLR3, we recognized that the inability to phase all individuals in the pyrosequencing discovery panel could lead to the absence of some low frequency alleles, thus potentially driving both Tajima's D and Fu's FS toward larger positive values. Consequently, we calculated Fu's FS and Tajima's D for TLR3 (B. t. taurus) and TLR3-1 (B. t. taurus) using the following approach: 1) Both test statistics were first calculated only for sires that could be phased with best-pairs probabilities≥0.90, as depicted in Table 4; and 2) If a significant result was achieved in this analysis, we then added the taurine haplotypes with phase probabilities<0.90 into our analyses (D; FS) by choosing the best haplotype pairs reconstructed for each sire. For Fu's FS, only TLR8 displayed unequivocal evidence for a departure from neutrality (All cattle FS = 10.2712, P<0.01; B. t. taurus FS = 10.296, P<0.01), with levels of significance that withstood conservative correction for multiple testing (correction = α/n locus-specific tests, 0.05/2 = Minimal P≤0.025). For Tajima's D, inclusion of the best TLR3 haplotype pairs for sires with phase probabilities<0.90 resulted in very similar test statistics (TLR3 B. t. taurus D = 3.6034, P<0.001; TLR3-1 B. t. taurus D = 3.4895, P<0.002; Table 4), with levels of significance that endured correction for multiple testing (0.05/8 = Minimal P≤0.00625).
A regression-based approach considering all validated variable sites and the effective number of SNPs at each site also demonstrated that TLR3 and TLR8 possess significantly more gene diversity than do the eight other TLR loci (P≤0.05; Figure 5) in taurine and all cattle combined. In contrast, both regression analyses (all cattle; B. t. taurus only) indicated that TLR10 and TLR2 possess significantly less gene diversity than other members of the bovine TLR gene family (Figure 5). With the exception of TLR2, these results are precisely congruent with the results of Tajima's test (D; Table 4).
Single Marker and Haplotype Association Tests with MAP Infection
Unphased diploid genotypes for a subset of the validated SNPs and indels (n = 35; nonsynonymous, putative nonsense, 5' upstream regions, and introns) within bovine TLR genes either known or postulated to primarily recognize bacterial ligands (TLR1, TLR2, TLR4, TLR5, TLR6, TLR9, TLR10) were tested for associations with bacterial culture status for MAP (fecal and/or tissue) in three Holstein dairy herds (n = 68 cases, 270 controls). All nonsynonymous TLR SNPs previously associated with MAP infection (TLR1, TLR2, TLR4) were monomorphic in our samples (n = 549; 31 breeds). Conditional logistic regression models were constructed for each of 35 variable sites meeting our selection criteria (see methods) to estimate the relative odds of MAP infection given the defined diagnostic criteria adjusted for the effects of herd and age. Collectively, six SNPs produced suggestive associations, as evidenced by uncorrected P-values (Table 5). Interestingly, three SNPs in TLR2 and one in TLR6 were associated with increased odds of MAP infection in animals with 1 or more copies of the minor allele (Table 5). Two SNP loci, 1 in TLR4 and 1 in TLR10, were associated with decreased odds of infection given increasing copies of the minor allele (Table 5). Following locus-specific correction of the P-values using the FDR method (http://sdmproject.com/utilities/? show = FDR), two SNPs (TLR6-rs43702941; TLR10-rs55617325) remained significant (P≤0.05), and three SNPs (TLR2-rs68268245, ss470256479, rs43706433) displayed P-values (P≤0.053) that were suggestive of a potential recessive genetic association with MAP infection (Table 5). Two of these SNPs (TLR2-ss470256479, rs43706433) were recently hypothesized to occur on a haplotype associated with an increased risk for Johne's disease. Consequently, we used PHASE 2.1 to test the hypothesis that haplotype frequencies for bacterial-sensing TLRs differ between cases and controls. However, none of the investigated loci possessed significantly different haplotype distributions between cases and controls (P>0.05; 1,000 permutations).
Our detailed analysis of the haplotype structure, LD architecture, and tagSNP/Indel prediction for all 10 bovine TLR genes will enable studies aimed at assessing the statistical and functional relationships between validated variation, and differential susceptibility to infectious disease–, (Table 5). Moreover, because extensive haplotype sharing was confidently predicted for specialized beef and dairy cattle breeds, the deliverables of this study will broadly impact many facets of bovine health research, including the potential for marker-assisted vaccination; using genotypes as indicator variables for enhanced vaccine design or as predictors of animal response.
In view of the emerging global interest in genomic selection in beef and dairy cattle, we provide evidence for balancing selection on at least two of the TLR genes (TLR3 and TLR8), with detection of a weaker selective signal consistent with purifying selection in TLR10
 (Table 4). Interestingly, TLR3 and TLR8 encode molecular sentries that recognize invading double-stranded (ds) and single-stranded (ss) RNA viruses, respectively, thereafter eliciting host innate immune responses (11, 12). Importantly, selection on TLR3 and TLR8 may have direct implications on aspects of differential susceptibility to major viral production diseases such as bluetongue (dsRNA; Reoviridae), foot and mouth disease (ssRNA; Picornaviridae), bovine viral diarrhea (ssRNA; Flaviviridae), calf coronavirus (ssRNA; neonatal diarrhea; Coronaviridae), and bovine parainfluenza 3 (ssRNA; Paramyxoviridae) (see,). Moreover, evolution under repeated exposure to many of these diseases may provide some explanation for the observed patterns of variation detected within TLR3 and TLR8. However, it is also possible that more ancient host-pathogen interactions (i.e., eradicated Rinderpest, ssRNA, Paramyxoviridae; etc) may have contributed to the signatures of selection detected in this study. It should also be noted that because frequency distribution tests generally lack power to detect selection, departures from neutrality noted in this study are likely to underscore the strength of the selective signals observed (for review see). For these reasons, future studies involving all species of the subfamily Bovinae are needed to help elucidate whether selective signals in TLR3 and TLR8 extend beyond modern domestic cattle lineages. Moreover, variation within these genes should be comprehensively evaluated with respect to differences in ligand-induced signaling, disease susceptibility, and the potential for marker-assisted vaccination in domestic cattle.
In addition to selective signals observed for TLR3 and TLR8, several tentative associations were detected between bovine TLR SNPs (Table 5) and differential susceptibility to MAP infection which have not previously been reported, with one implicated locus (TLR10) also exhibiting evidence of purifying selection (Table 4). However, because the natural ligand(s) for TLR10 have yet to be comprehensively elucidated, the precise origin of this selective signal remains unclear. Previous studies, indicate that human TLR10 forms functional heterodimers with both TLR2 and TLR1, thereby enabling the resulting protein complexes to recognize a wide variety of microbial ligands, including those derived from Mycobacteria,,,. Similarly, TLR2 is also known to form functional heterodimers with TLR6. Recently, AA substitutions in human TLR1 and TLR10 were demonstrated to negatively impact receptor function-, with TLR10 ligand recognition similar to the known range of ligands established for TLR1. The results of our single marker association tests indirectly support the biological concept of functional unity with respect to bovine TLR2, TLR6, and TLR10, with variation at all three loci categorically linked to a common microbial phenotype (bacterial culture status for MAP) in Holstein cattle.
DNA Samples for SNP Discovery
Bovine DNA samples (n = 96) representing B. t. taurus, B. t. indicus, and their hybrids were isolated from spermatozoa as previously described,,. Bovine subspecies designation, breed names, and sample sizes (in parentheses) were: B. t. taurus - Angus (5), Belgian Blue (2), Blonde d'Aquitaine (1), Braunvieh (4), Brown Swiss (2), Charolais (6), Chianina-Chiangus (4), Corriente (1), Gelbvieh (4), Hereford (3), Holstein (6), Limousin (4), Maine-Anjou (3), Red Angus (4), Red Poll (1), Salers (2), Senepol (2), Shorthorn (4), Simmental (5), Texas Longhorn (2); B. t. indicus - Brahman (8), Nelore (2); Hybrids, termed Composites - Beefmaster (4), Braford (2), Brahmousin (2), Brangus (3), Piedmontese (1), Red Brangus (2), Romagnola (2), Santa Gertrudis (2), Simbrah (3). Bovine subspecies were assigned based on phenotype and breed origin (http://www.ansi.okstate.edu/breeds/cattle/).
Bovine TLR Sequencing and SNP Detection
Procedures involving primer design, PCR amplification with gene-specific primers, and standard dye-terminator cycle sequencing (Sanger) of all 10 bovine TLRs have previously been described-,. For this study, we synthesized gene-specific amplification primers with a unique 10 bp 5′ barcode (Roche MIDs) for each of the 10 bovine TLR genes (Table S5). Thereafter, we standardized all 96 discovery panel DNAs to 50 ng/µl and created three DNA pools, with each pool consisting of 32 elite sire DNAs mixed at equal concentrations. Notably, larger-scale DNA pooling in a human amplicon study supports the accuracy and reliability of this approach when coupled with Roche 454 pyrosequencing. Three bovine DNA pools were used to amplify all TLR targets via barcoded primers (Table S5), with PCR conditions and thermal parameters as previously described-,. Targets that were intolerant to the addition of 5′ oligonucleotide barcodes for PCR amplification were amplified using standard primers in conjunction with downstream dye-terminator cycle sequencing methods previously described-,, with one exception: A second set of DNA pools (n = 12) was created, with each pool containing equal concentrations of DNA from eight elite sires derived from the sequencing discovery panel. Importantly, both sets of DNA pools (Sanger and Roche 454) were seeded with one or more reference DNAs that had previously been sequenced and/or SNP genotyped across all 10 bovine TLR genes-,, which collectively included≥12 reference DNAs possessing 216 validated diallelic variants (212 SNPs + 4 indels). All amplicons were purified using the Qiaquick PCR purification kit (Qiagen, Valencia, CA) as previously described,, and the concentrations were estimated by Nanodrop. For preparation of a Roche 454 Titanium fragment library, we standardized all barcoded amplicons to 10 ng/µl and devised a normalization procedure that accounted for differences in amplicon size (Table S1). Because the TLR amplicons differed in size, an adjustment was necessary to ensure balanced 454 pyrosequencing results. Specifically, using amplicon size, we computed the mean (bp) and standard deviation (SD; bp) across all PCR targets. Thereafter, any amplicon deviating from the mean by≥0.5 SDs in either direction was subject to proportional adjustment within the fragment library (Table S1). The direction of adjustment (plus or minus) was determined by the direction of the deviation (i.e., smaller = proportionally less template; larger = proportionally more template; Table S1). Because the emulsion PCR process involved in the preparation of Roche 454 Titanium fragment libraries favors smaller fragments, amplicons smaller than the mean by≥0.5 SDs must be proportionally reduced in the final library, whereas the opposite is true for larger amplicons. Following normalization, the bovine TLR sequencing library was constructed via random ligation of sequencing adaptors provided with the GS FLX Titanium library kit (Roche Applied Science, Indianapolis, IN). All library preparation, emulsion PCR, quantitation, and sequencing steps followed the manufacturer's protocol (Roche Applied Science).
SNP detection analyses for the resulting pyrosequencing data employed the Neighborhood Quality Standard algorithm, implemented within CLC Genomics Workbench (v3.7.1), as previously described. Putative SNPs were filtered using a method devised from a priori knowledge of biallelic controls (212 SNPs + 4 indels) that were purposely seeded into the amplicon library. Briefly, we considered the possibility that some SNPs may only be found as one allele in a single elite sire (1/192 total alleles; see reference 30 for examples). Therefore, we filtered all putative SNPs predicted from our analysis of the pyrosequencing data using the following formula: 1/192×(Total SNP Coverage) = Theoretical minimum number of reads, which represents the smallest number of reads required to shuttle putative SNPs into a validation workflow involving custom, allele-specific genotyping assays. This method proved valuable for the discovery and validation of many low frequency SNPs, including those that occurred as one allele for a single discovery panel sire (i.e., TLR5 putative nonsense SNP = 1/192 alleles in the discovery panel). For SNP discovery using standard dye-terminator sequencing reads, we used an alignment-based method of variant detection within the program Sequencher 4.6,. Briefly, high quality electropherograms were manually inspected for any evidence of a double peak. Individual nucleotide sites displaying any evidence of heterozygosity within≥1 sequencing read were shuttled to our SNP validation workflow.
SNP Validation and Genotyping
All 96 DNAs from the pyrosequencing discovery panel were also used for allele-specific genotyping. Additionally for bovine TLRs recognizing bacterial ligands, we also utilized the following industry-relevant DNA panels: Beef (48 Purebred Angus, 1 Herd); Dairy (405 Holstein dairy cows, 3 Herds). SNPs and indels were genotyped using the KASPar allele-specific fluorescent genotyping system (Kbiosciences, Hertfordshire UK), as previously described,. Thermal cycling parameters and reaction concentrations followed manufacturer's recommendations, with some modifications to MgCl2 concentrations. Primer sequences and MgCl2 concentrations are available on request. Genotype clustering and calling was performed using KlusterCaller software (Kbiosciences). Genotype quality was assessed by manually inspecting the clustering data for every individual marker, and by comparing KASPar-derived genotypes to those derived from previously reported sequence data,,. Poor clustering or inconsistent genotypes precipitated the following workflow: 1) Further optimization and/or redesigning the SNP assay followed by; 2) Genotyping the inconsistent samples again. Notably, to minimize the frequency of missing genotypes from a very low proportion of failed assays, most SNPs were genotyped multiple times for every DNA sample. Genotype data are available in Table S6.
Haplotype Inference, LD Estimates and Variant Tagging
Unphased diploid genotypes were compiled and cross-checked for parsing errors using two custom software packages. Haplotype reconstruction and missing data imputation (<0.58%) was performed with PHASE 2.1,, using all validated intragenic polymorphisms, all cattle for a given locus, and the –X10 option. Haplotype estimation using PHASE 2.1 is not sensitive to departures from HWE 31,64,65. Predicted haplotype phases with best pair probabilities≥0.90 were retained for further analysis. Bovine X-linked haplotypes (TLR7, TLR8) were directly ascertained by genotype homozygosity in our sire panel used for SNP discovery. Estimates of recombination across each gene were also assessed in PHASE 2.1 using the general model for varying recombination rate,,. Deviation from the average background recombination rate (), by a factor≥2.5 between adjacent sites was considered evidence for historical recombination.
Intragenic LD was visualized within Haploview using unphased diploid autosomal genotypes and phase-known X-linked data (TLR7, TLR8) for B. t. taurus samples, and all cattle combined. LD patterns and blocks were estimated via majority rule from: 95% confidence intervals constructed for D',; application of the four gamete rule (4th gamete>0.02); and estimates of recombination between adjacent sites,. To further evaluate patterns of LD decay, pairwise r2 values were estimated with Haploview for all validated markers within each gene for B. t. taurus and all cattle combined. A minimal set of tagSNPs/Indels predicted to capture 100% of the variation (r2>0.80) segregating in B. t. taurus and all cattle combined was deduced using the Tagger algorithm implemented in Haploview.
Median Joining Haplotype Networks
Because median joining (MJ) networks require the absence of recombination, genes displaying evidence of historical recombination (TLR2, TLR3, TLR6) were each partitioned into two regions of elevated LD. Haplotypes were reconstructed for each intragenic region and best pairs were used for MJ network analyses. This approach improved the proportion of cattle with best pairs phase probabilities≥0.90 and eliminated regions displaying overt evidence of recombination. MJ networks were constructed using Network 22.214.171.124 (Fluxus Technology Ltd, Suffolk, England), and the default character weights of 10 for SNPs and 20 for indels. Results were visualized, annotated, and adjusted within Network Publisher (Fluxus Technology Ltd, Suffolk, England). Branch angles were adjusted to ensure proper network magnification and clarity without changing branch lengths.
AA Substitution Phenotypes and TLR10 Evolutionary Analyses
Bovine AA substitution phenotypes were predicted using PolyPhen and SIFT (http://genetics.bwh.harvard.edu/pph/; http://genetics.bwh.harvard.edu/pph/pph_help.html; http://sift.jcvi.org/; http://sift.jcvi.org/www/SIFT_help.html) with the default settings. Results other than “benign” or “tolerated” were categorized as substitutions predicted to impact protein function,,. To assess the potential for functional and/or selective constraint across the entire bovine TLR gene family, a goodness of fit test (χ2) was performed assuming equal probabilities for benign or tolerated AA phenotypes versus those predicted to impact protein function. Frequency distribution tests, including Tajima's D
 and Fu's F
, were performed in DnaSP v4.90.1 using all validated SNPs. Significance levels for frequency distribution tests were defined by confidence intervals estimated for each test statistic via coalescent simulation (10,000 replicates). Simulations were performed given the observed number of segregating sites, both with and without recombination,.
At each polymorphism we estimated the effective number of alleles as Ei = 1/(1 - 2pi(1-pi)) = 1/(pi
2 + (1 - pi)2) = 1/(expected HWE frequency of homozygotes) where pi is allele frequency at the ith locus. Thus a measure of polymorphism diversity is log2(Ei) which also represents the information content of each SNP. For monomorphic SNPs log2(Ei) = 0 and for SNPs with pi = 0.5, log2(Ei) = 1. Thus by summing across the Nj polymorphisms within the jth gene we obtain the diversity index Ij = . We used regression analysis to examine the relationship between Ij and Nj for these genes and to test for outliers using 95% confidence estimates for the fitted regression.
Association Tests with MAP infection status
A case-control study was performed to estimate the association between specific TLR genotypes and MAP infection in Holstein cattle. The study population was derived from an established repository that included whole blood samples preserved from adult Holstein cattle in three herds that were characterized on the basis of: 1) MAP bacterial culture of feces; 2) MAP bacterial culture of tissues for harvested cattle; 3) ELISA values for MAP-specific antibody. Cattle from which MAP was cultured in the feces and/or the tissues collected at harvest were selected as cases (n = 68). Herd-matched controls (n = 270) were selected from those cattle in the repository with negative ELISA and bacterial culture data. Cattle with multiple negative tests were preferentially selected to reduce the probability of misclassification relative to infection status due to the low sensitivity of available diagnostic methods for MAP. DNA was extracted from available blood specimens using a commercial kit (MoBio DNA non-spin, Carlsbad, CA) and assessed for quality as well as concentration by standard spectrophotometric methods. Genotypes for validated SNPs and indels in the 5′ upstream regions, introns, and those associated with nonsynonymous or putative nonsense mutations in bovine TLR genes recognizing bacterial ligands (TLR1, TLR2, TLR4, TLR5, TLR6, TLR9, TLR10) (see refs,) were evaluated for further analysis. Loci fixed for the major allele in our dairy population were excluded, leaving 35 nonsynonymous and 1 putative nonsense substitution, and 37 other SNP loci within the 5′ upstream regions or intragenic introns. For these 73 variable sites, we excluded SNPs and indels with MAFs<0.01 in our infected cases, leaving 32 SNPs and 3 indels for association tests (see Table S1).
Conditional logistic regression models were constructed for each of the 35 variable loci to estimate the relative odds of being infected with MAP based on the defined diagnostic criteria adjusted for the effects of herd using the PHREG procedure of SAS (SAS v. 9.2, SAS, Cary, NC). Effects of genotype were estimated using 3 different covariate specifications. First, an additive mode of inheritance was examined whereby the odds of infection associated with each additional copy of the minor allele was modeled as a single continuous covariate. Second, a recessive mode of inheritance was modeled, where the odds of infection in cattle homozygous for the minor allele were estimated relative to cattle heterozygous and homozyzgous for the major allele. Finally, each genotype was modeled as an indicator variable and effect estimates were generated for cattle homozygous for the minor allele, and for heterozygous cattle, both relative to cattle homozygous for the major allele. This allowed evaluation of assumptions in the additive model with respect to the effect of the additional copies of the minor allele being linear in the log odds, and potential intermediate effects of the minor allele not captured in the other models. Potential confounding by age was examined by including birth year as a fixed covariate (where available), and was defined as a change in the relative odds of greater than 20% after addition of the birth year term. For models where evidence of confounding by age was detected, birth year was retained in the model to adjust genotype estimates for this effect. With the exception of TLR1, TLR6, and TLR10, all single marker P-values were corrected for multiple testing by applying the FDR correction (http://sdmproject.com/utilities/?show=FDR) to the raw P-values derived from each investigated gene (locus-specific correction). Given the close physical proximity of TLR1, TLR6, and TLR10 on BTA6, these genes were considered a single locus for correction of multiple tests. However, it should be noted that none of the variable markers within TLR1 met our inclusion criteria (MAFs>0.01), and therefore, locus-specific correction was only applied to raw P-values from TLR6 and TLR10.
Haplotype association tests were performed in PHASE 2.1. Briefly, for dairy cattle with disease classifications based on bacterial culture status of MAP, we tested the hypothesis that haplotypes differ among cases and controls for all genes evaluated in the single marker association analysis (68 cases, 270 controls, n = 338 total). For maximum LD-based resolution of haplotypes, we used all variable markers within seven bovine TLR genes that recognize bacterial ligands. Significance was estimated via 1,000 permutations.