Analytical limit of detection (LOD); performance and confirmation
Capture of target genes demonstrated a preliminary cutoff for mapped reads above background levels in wild type cultures; however, the intended application of these probes was to detect ciprofloxacin resistance mutations. To test sequence coverage of the targeted genetic variants, we assessed strains with mutations known to confer resistance to ciprofloxacin4. Because WGA may incorporate bias22, these experiments used non-amplified whole genome extracted DNA to more accurately determine the assay LOD. Linear regression was utilized to fit data to a line and determine the point at which each genetic variant had 30X read coverage above the cutoff as determined by non-template controls24 (Fig. 4). This crossing point was determined to be the assay LOD. An LOD range was given for each genetic variant based on the standard error of the linear regression. In this context, each genetic variant fell within ranges of detection between 300 and 1200 input genome copies similar to the cutoff levels determined earlier. As demonstrated previously, the percentage of correctly assigned reads was between 80–100% at concentrations near or above the LOD (Fig. 4).
Diagnostic assays require statistically reproducible LODs for clinical applicability. To confirm LOD performance, we used 1.5x the upper LOD range to demonstrate assay reproducibility. Specifically, these analyses tested detection of B. anthracis, Y. pestis, and F. tularensis genetic variants at 2813, 1729, and 1368 total input genome copies, respectively (Fig. 5), across 16 replicates. The number of mapped sequencing reads demonstrated high reproducibility between experiments with the total read coverage at each genetic variant falling near or above the 30× coverage line. No replicate tested showed zero coverage at a given location.
Mock clinical performance
The ability to detect infectious agent directly from a patient sample is essential when evaluating new diagnostic tests. To determine assay performance over a broad range of input genomes in a complex sample matrix, we isolated DNA from whole blood spiked with WGA bacterial DNA. The MIP panel was tested on these samples to demonstrate assay performance in the presence of human genome. To directly compare the results to previous experiments the number of input genomes per reaction was calculated from ng of DNA per ml of blood assuming 100% extraction efficiency during sample processing (Fig. 6). As previously described, non-template controls were used to determine cutoff values for each gene. The number of input genome copies required for sequencing reads above background was approximately 10-fold higher than that of pure bacterial samples. Similar to the analytical analysis Person correlations demonstrated significant correlation between target regions suggesting similar capture efficiencies (Supplementary Table S2).
High Resolution Melt analysis
Sequencing data showed targeted amplification coupled to NGS was functional for determining ciprofloxacin resistance conferring genetic variants; however, this NGS strategy sacrifices cost, time-to-answer, and sensitivity for the benefit of a high degree of target multiplexability. In cases of limited sample volume and a large number of potential targets, real-time PCR or HRM real-time PCR screening is not feasible. A multiplexed upfront amplification strategy could be used to amplify targets of interest allowing for screening with these low cost, quick time-to-answer singleplex assays. To test MIPs applicability as a target amplification technique for other molecular assays, we assessed HRM assays targeting ciprofloxacin resistant genetic variants with amplicons generated by the MIP protocol. Detection of two gyrA mutations and a parE deletion in F. tularensis4 using the MIP protocol as an upfront target enrichment demonstrated differences between wild-type and ciprofloxacin resistant strains (Fig. 7a,b). Melting peaks with the gyrA C248T G259T mutation had an overall decrease in melting temperature, and the parE ΔTTAAA mutation had an increase in melting temp due to the increase in GC content. Furthermore, the use of MIPs significantly amplified target sequences based on Cq values for HRM experiments performed with purified genomic DNA versus amplified MIP DNA (Fig. 7c,d). Specifically, MIPs significantly reduced detection above background for the HRM analysis even after a 1:10,000 dilution of the MIP amplified material was required for the assays to fall within the optimal Cq range (Fig. 6c,d). Overall, cumulative datasets with these MIPs suggest this technique is a highly multiplexable capture system that can be utilized with multiple molecular detection assays for genetic variant identification.
Bacterial organisms frequently evolved innate immunity to antibiotics for survival in environmentally competitive niches25. Recently however ecological studies have shown development of antibiotic resistance in bacterial pathogens caused by increased antibiotic usage in animals, food, and humans26. A continuous cycle of newly acquired resistances results from the acquisition of genetic material or genetic variations in an organism’s genome. Currently, standard AR susceptibility testing is a laborious and lengthy process requiring pure bacterial cultures isolated from patient samples grown in the presence of antibiotics27. While susceptibility testing will likely remain the gold standard for AR confirmation, molecular techniques have arisen offering faster diagnosis to help guide therapeutic regimens528.
Real-time PCR is currently the standard molecular diagnostic for pathogen detection and more recently has been used for AR profiling2930 although NGS is quickly being adopted in clinical laboratories. Real-time PCR is advantageous to other molecular diagnostics in many aspects including cost, time to answer, and sensitivity. One major aspect for time to answer is assay sensitivity and the ability to detect pathogen directly from patient samples without the necessity of culture. Bacteremia in septic patients can range more than 10-fold although<1000 colony forming units/ml of blood is common3132. Real-time PCR offers sensitivities approaching one target copy per reaction282933 allowing sample processing and detection directly from patient samples. NGS capabilities have outpaced Moore’s law boasting continuous technical improvements over the past decade however, relative to real-time PCR, diagnostic sensitivities are still lacking9. Metagenomic sequencing from clinical samples requires ultra-deep sequencing to obtain desired coverage levels of targeted pathogen and is not practical for routine clinical use due to cost and informatics burden. Targeted sequencing helps to bridge the gap between the sensitivities seen with real-time PCR and the quantitative data associated with NGS.
Analytical LODs for MIPs as an upfront amplification strategy for NGS fell within a range of approximately 1000 input genome copies. Mock clinical samples required an approximate 10-fold greater number of genome copies for positive detection above background levels. This difference is likely attributed to the carryover of inhibitors from complex matrices and variable efficiencies of sample processing methods which are known to affect PCR LODs9343536. Other factors affecting LOD include poor signal to noise and misidentification of pooled samples. Specifically, excess MIP backbone within each sample resulted in higher noise at low sample input concentrations (Supplemental Figure S1). De novo assembly of the un-mapped reads showed residual MIP backbone present after the cleanup. This was due to the formation of concatemers during the amplification process. The resulting background was more problematic as input concentrations approached the background cutoff. Similarly, incorrect assignment of dual indexed samples or sample bleeding2223 was more pronounced at lower input concentrations. At these values the percentage of correctly mapped reads dropped below 50% for most capture regions tested. This read misidentification is an inherent limitation with multiplexed NGS which, as technology improves, can be overcome23.
While more sensitive, real-time PCR still suffers from an inability to significantly multiplex reactions. This is problematic with unknown samples which would require numerous screenings for identification and is not feasible with limited patient sample volumes. This is exacerbated in regards to AR which is multifaceted involving enzymes, pumps, and genetic variations requiring multiple target identifications5. Metagenomic sequencing could both classify the organism along with known resistance markers; however, the expertise and computing infrastructure required for analyzing data is still burdensome for clinical and biosurveillance programs. Metagenomic sequencing is not conducive for pooled sample NGS runs due to the difficulty obtaining adequate sequence coverage for genetic variants in complex matrices, increasing per sample costs. Depending on the NGS instrument, ultra-deep sequencing requires runs times ranging from days to over a week depending on desired coverage937. Targeted sequencing allows for significant coverage of desired regions over background genome and simple reference based mapping9. This permits sample pooling resulting in significant cost reduction with sequencing results in hours depending on desired amplicon length38. The cost burden of MIPs as an upfront amplification technology is marginal and increases NGS library preparation time by approximately 5–6 hours which is only slightly longer than multiplexed PCR based amplification technologies.
The requirement of a prior knowledge concerning the input etiologic agent or desired target region for probe design is an inherent caveat to targeted sequencing approaches; however, most new outbreaks such as Ebola virus or new resistant hospital acquired infections like carbapenem-resistant enterobacteriaceae have at least some representation within the public sequence repositories3940. Even completely novel pathogens such as SARS coronavirus have some predicate within the phylogeny that allows for taxonomic placement based on targeted genomic sequence41. For bacterial identification specifically, 16S ribosomal RNA has been used to classify organisms for decades42. Conserved DNA stretches which flank hypervariable regions have been used for PCR amplification and could be adapted to MIPs for pathogen identification4344. These probes along with viral specific targets would cover a significant percentage of human specific pathogens while new probes, due to the multiplex capabilities of MIPs, can be designed as completely novel organisms are classified by whole genome sequencing.
Efficient, specific, and reproducible multiplex capture and enrichment of DNA regions within an organism as well as across diverse species is an important part of a diagnostic or biosurveillance panel for targeted NGS. Techniques such as high level multiplex PCR often have reduced specificity when highly multiplexed due to non-specific amplification resulting from interactions in primer pairs45. Target circularization techniques such as MIPs have increased specificity attributed to 1) dual enzymatic events required for gap fill and ligation, 2) exonuclease digestion of non-circularized material, and 3) a common backbone which physically restricts the two homologous binding arms45. Sequencing data shown here demonstrated the ciprofloxacin MIP assay captured targeted regions of interest within the organism specifically and reproducibly at a high degree of multiplex. In fact, Pearson coefficients for intra-organism multiplex capture of targeted regions in B. anthracis and F. tularensis showed a high degree of correlation (Supplementary Table S2). Similarly, correlation analysis of inter-organism probes from all six capture regions was highly related and reproducible among experiments. This high degree of correlation suggests that the multiplexed probeset reproducibly captured regions of interest with similar efficiencies across a broad range of input DNA.
Diagnostic applicability of MIPs are not solely limited to NGS. MIPs also serve as a general enrichment strategy for other molecular detection technologies such as previously published HRM real-time PCR. Because these probes captured approximately 200 base pairs of genetic information, this allows quick identification for the presence or absence of known genetic variants that can later be characterized at higher granularity via NGS for other potential resistant conferring elements. This triaged approach becomes particularly vital when patient sample volumes are restricted, thereby limiting the application of multiple molecular diagnostic assays or non-multiplexed screening techniques. MIPs represent an overarching multiplexable amplification technology amenable with NGS technologies and other downstream molecular diagnostic techniques allowing for potential diagnostic applications including organism classification, gene identification, and genetic variant detection to name a few.
Strains used and DNA preparation
Bacterial strains used in this study included wildtype and ciprofloxacin resistant Bacillus anthracis ΔANR, Yersinia pestis KIM5, and Francisella tularensis Schu4. B. anthracis wildtype and ciprofloxacin resistant mutant S3–8 clones were originally obtained from Dr. Lance B. Price6, Y. pestis wildtype and ciprofloxacin resistant mutant M1 clones were originally obtained from Dr. Luther Lindler8, and F. tularensis wildtype and ciprofloxacin resistance mutants were obtained from Dr. David Kulesh4. DNAs were extracted and purified using the Qiagen EZ1 kit (Qiagen, Valencia, Ca) according to the manufacturer’s instructions. For Figs 3 and 6,WGA DNA was used and WGA was performed utilizing Qiagen’s REPLI-g-Midi Kit according to manufacturer’s protocol. For Fig. 6, 1 μg of WGA DNA was spiked into 1 ml of whole blood (BioreclamationIVT, Baltimore, MD) and 10 fold serially diluted. 100 μl of each dilution was extracted using TRIzol LS (Thermo Fisher Scientific, Waltham, MA) and the EZ1 (Qiagen) according to the manufacturer’s instructions For non-template controls water or, in Fig. 6, non-spiked extracted whole blood was used in lieu of bacterial DNA. DNA concentration was quantified utilizing Qubit dsDNA BR and HS assay kits (Thermo Fisher Scientific).
MIP identification and design
MIP probes were designed by Bioinnovation Solutions (Lausanne, Switzerland) to flank SNPs and deletions known to confer ciprofloxacin resistance and purchased by USAMRIID. Briefly, B. anthracis AMES gyrA, gyrB, and parC (GenBank NC_003997.3), Y. pestis A1122 (GenBank NC_017168.1), and F. tularensis gyrA and parE (GenBank AJ749949.2) genes were used as BLAST templates. Consensus sequence alignments with the highest E-value hits were then used to identify conserved regions for probe hybridization. Based on these conserved regions, 24 probes targeting B. anthracis, 12 targeting Y. pestis, and 8 targeting F. tularensis were designed. All probes were designed to capture sequences ranging from 180–210 bp. Probes were synthesized by Integrated DNA Technologies (IDT, Coralville, IA). Complimentary probe arms are represented in Supplementary Table S1.
Bacterial MIP probe analysis
Probes were re-suspended in buffer TE and pooled to a final per probe concentration of 3 nM. A total of 44 probes were combined and used as a master probe mix. Probe capture protocol was performed as previously described with slight modifications20. The modified protocol is as follows: Total probe mix was hybridized to 10-fold serial dilutions of WGA or non-amplified extracted DNA samples (see above) of B. anthracis, Y. pestis, and F. tularensis by adding 3 nM probe to 1.5× ampligase buffer (Epicentre, Madison, WI), heat to 94 °C for 2 minutes, and then ramped to 60 °C (0.1 °C/sec temperature decrease) with a 60 min hold at 60 °C. Gap filling and ligation were performed by the addition of Phusion high-fidelity PCR master mix with HF buffer (New England Biolabs, Ipswich, MA), 4 units Ampligase (Epicentre), ampligase buffer, and 0.4 μM dNTPs (BioFire, Salt Lake City, UT). The reaction was then incubated at 60 °C for 60 min and held at 15 °C.
Non-circular DNA was removed by exonuclease I and exonuclease III digestion. Specifically, the reaction was heated to 94 °C for 2 min and held at 37 °C while 10 units of exonuclease I and 50 units of exonuclease III were added. Samples were then incubated at 37 °C for 30 min followed by enzymatic inactivation at 94 °C for 15 min. Capture sequence was then amplified by the addition of Platinum taq (Thermo Fisher Scientific), 0.4μm forward 5′-CAGATGTTATGCTCGCAGGTC and reverse 5′-GGAACGATGAGCCTCCAAC primers, 1.0× buffer with 50mm MgCl2, and 0.4 μM dNTPs. Reactions were amplified at 95 °C for 3 min, 40 cycles of 95 °C for 30 sec, 60 °C for 30 sec, 72 °C for 1 min, and a hold of 72 °C for 10 min. Following amplification samples were purified utilizing Agencourt AMPure XP beads (Beckman Coulter, Pasadena, Ca) per the manufactures protocol with the slight alteration to a 1:1 mixture of bead to sample volume.
Sequencing and analysis
Library preparation was performed on the Apollo 324 system (Wafergen Biosystems, Fremont, CA) utilizing the PrepX ILM 32i DNA library kit (Wafergen Biosystems) and TruSeq dual indexes (Illumina, San Diego, CA) according to the manufacturer’s instructions. Samples were then enriched with the TruSeq HT preparation kit (Illumina) and purified as described above. Samples were quantified with the Qubit dsDNA BR and HS assay kit (Thermo Fisher Scientific) and pooled based on total concentration. Proper adaptor ligation confirmation and final normalization were performed using the KAPA library quantification kit (Kapa Biosystems, Wilmington, MA).
Amplicons were sequenced using the MiSeq platform (Illumina) using the 2 × 150 cycle sequencing kit. Sequencing reads were analyzed using CLC Genomic Workbench (CLC Bio, Cambridge, MA). Reads were trimmed for quality, and the universal primer sequences were used to filter proper capture sequences. The trimmed and filtered reads were then mapped against known reference genes including B. anthracis AMES gyrA, gyrB, and parC (GenBank NC_003997.3), Y. pestis gyrA (GenBank NC_003143.1), and F. tularensis gyrA and parE (GenBank AJ749949.2). Standard settings for read mapping were used and required 70% of the read length matching the reference with at least 80% identity while ignoring non-specific read mappings for positive hits. Non-template controls were run concurrently to measure sample bleeding which occurs when sequencing reads are misidentified during the demultiplexing process. A cutoff was determined as the average number of sequencing reads mapped to the reference gene in the non-template controls plus three times the standard deviation. 30X coverage represents the point at which there is 30 times the read coverage above this cutoff control. LODs were determined based on a defined 30× read coverage at an indicated genetic variant. Variant analysis was performed on mapped sequencing reads using a minimum read coverage of 5× and minimum variant frequency of 10%. Linear regression and Pearson correlations were performed using GraphPad Prism v. 6.04 graphing software (GraphPad, La Jolla, CA).
High Resolution Melt Assays
Amplicons resulting from the MIP capture protocol in Fig. 4 before NGS library preparation were used as templates and diluted 1:10,000 for high-resolution melt analysis. This was compared to concentrations of genomic DNA which were used as an input to the MIP protocol. HPLC purified primers for F. tularensis gyrA (forward 5′-CGGTAAATATCACCCTCATGGAG and reverse 5′- AGGTTGTGCCATTCTGACAATAGTAT) and parE (forward 5′-CTTACATGGCATTTTGAAACTGGAC and reverse 5′- CAGCTTCTAGTTTATGGTCAAGATAGCC) were utilized4. Reaction conditions included 1X Lightcycler 480 High Resolution Melting Master Mix (Roche Diagnostics, Indianapolis, IN), 0.4 μM forward and reverse primers, and MgCl2 to a final concentration of 3 mM. Reactions were performed on a Roche 480 Lightcycler (Roche Diagnostics, Indianapolis, IN) with thermocycling conditions consisting of a pre-incubation cycle of 95 °C for 10 min, a 45 cycle amplification step of 95 °C for 10 sec, 60 °C for 10 sec, and 72 °C for 10 sec with a fluorescence reading taken at the end of each 72 °C step, and a high-resolution melt step of 95 °C for 1 min, 45 °C for 1 min, and melt curve from 65 °C to 95 °C with 25 fluorescent readings taken per 1 °C temperature change. Values were generated using the melt curve genotyping analysis with the integrated Roche LightCycler 480 software version 1.5.1.
How to cite this article: Stefan, C. P. et al. Targeted next-generation sequencing for the detection of ciprofloxacin resistance markers using molecular inversion probes. Sci. Rep.
6, 25904; doi: 10.1038/srep25904 (2016).