Polymerase chain reaction (PCR) coupled to reverse transcription (RT) represents the most significant improvement in the area of RNA virus detection over classical cell culture based methods. In the classical culture based method, the principal mode of virus identification uses growth of the virus in permissive cells and observation of the morphological changes brought about by virus replication in the host cell. Although it is possible to differentiate between cytopathic and non-cytopathic hepatitis A virus (HAV) strains due to a difference in the morphology of infected cells, in practice such morphological identification is of limited value because the morphological effects are cell-line specific, and many viruses in the same genus (e.g. Enterovirus) produce rapid and similar cytopathic changes in many of the cell-lines normally used for virus detection. Moreover, using multiple cell-lines for virus detection is also labor intensive and time consuming, and further confirmation and identification often requires the use of additional techniques such as serotyping.
Molecular methods based on viral RNA amplification by RT-PCR have evolved as rapid alternatives to cell culture for the detection and identification of viral strains. For example, strains within a species can be differentiated by the size of the amplified PCR product (amplicon) on gel electrophoresis or by single-strand conformational polymorphism (SSCP) analysis. Indeed, we have utilized agarose gel electrophoresis following RT-PCR with primer pairs straddling a 14 base insertion in the non-coding region of some HAV genomes to distinguish specific cytopathic strains from non-cytopathic strains of HAV [4, 6]. We also reported the use of SSCP analysis following AluI or HinfI digestion of amplicons generated from the 3’ end of the viral genome to provide differential identification of multiple HAV strains. However, SSCP is a multi-step procedure involving radiolabeling of restriction fragments prior to electrophoretic separation of individual DNA strands. Consequently, this procedure works best when the restriction fragments are small enough to provide sufficient single-stranded DNA separation for effective strain identification. For genetically well-conserved viruses such as HAV, the region to be amplified for SSCP analysis has to be carefully chosen in order to represent areas of reasonable diversity [7, 8]. Due to these considerations, it has been preferable to sequence the PCR amplified DNA fragment in order to specifically identify the genotypes or strains of the viruses. While sequencing amplified PCR products is considered a precise technique for identification, PCR amplification of a mixed population of target sequences may be biased in favor of a dominant (by copy number) target, such that subsequent sequence analysis may not reveal the presence of other closely related target sequences in the starting population. Putative mixed virus populations (e.g. of the same or different species) can exist in isolates obtained from environmental and infected-host samples, particularly those resulting from RNA virus replication, which is known to generate a sub-population of “quasi-species”. Therefore, a threshold number of RNA molecules must carry the same specific mutation in order to be unambiguously detectable by RT-PCR and sequencing, because amplification of a less abundant template can be inhibited by template competition. Conversely, the dominant mutation present in a population may be preferentially amplified, so that sequence analysis would represent only the dominant mutant. Thus, while sequencing remains a “gold-standard” for target sequence identification, the identification of multiple viral species or the tracking of mutations within a species necessitated the development and application of a broader approach to identification prior to undertaking sequence analysis.
As an alternative to sequencing, Proudnikov et al. applied a hybridization-based technique to the detection of genetic variants of poliovirus within a virus population or among viral strains. Oligonucleotide probes are synthesized and then immobilized on a solid surface. A target consisting of amplified viral complementary DNA (cDNA) is then labeled and hybridized to the immobilized probes, and hybridization to the individual probes is detected. A change in the nucleotide sequence of the target is detected by the absence or reduction of hybridization to the wild-type probes around the change, or by the ratio of the signals generated by a mutant against those of a reference strain. Modifications of this technique, including the use of amplified viral complementary RNA (cRNA), were used to identify genetic variations arising during cultivation of a vaccine strain of poliovirus and the emergence of vaccine-derived poliovirus in immunized patients showing signs of vaccine-associated paralytic poliomyelitis [13, 14]. Application of this procedure was restricted, however, to identifying known mutations in specific virus strains.
Advances in microarray technology have allowed the identification of genetic variability over very long stretches of DNA in bacterial genomes. These newly developed high density microarrays contain thousands to hundreds of thousands of oligonucleotide probes, instead of a few dozen, in a single array, thereby expanding the power of identification. In the current investigation we report the design and use of a high density oligonucleotide microarray for the identification of HAV and coxsackievirus (CV), both foodborne human pathogens. Our results indicate that the microarray hybridization technique can be applied to the identification of viruses of different genera and species present in a sample, and can detect single nucleotide polymorphisms (SNPs) to identify closely related viral strains belonging to the same species.
Viruses and Plasmids
Hepatitis A virus strains HM175/clone 1 and 18f, and coxsackievirus (CV) serotypes B1, A3 and A5 strains used in this study were obtained from ATCC (Manassas, VA) and further grown in FRhK4 cells. The plasmid pHAV/7, which contains a full length cDNA copy of wild-type HAV strain HM175 cloned into the vector pGEM-1, was grown and purified as previously described. HM175/clone 1 and 18f are culture-adapted strains derived from continuous culture passage of the wild-type strain HAV HM175.
All microarrays used in this study were manufactured by NimbleGen Systems Inc. (Madison, WI) using a maskless array synthesis (MAS) technology for in situ synthesis of DNA oligonucleotides directly onto glass microscopy slides [16, 17]. Oligonucleotide design was based on available complete viral genome sequences obtained from GenBank for CV (n=25), HAV (n=23), Norovirus genogroup I (n=4), Norovirus genogroup II (n=21), and rotavirus (various species) segments 3 (n=11), 4 (n=19), 8 (n=11), and 11 (n=12), where n equals the number of sequences obtained for each virus group. All genomic sequences within a virus group were aligned using CLUSTALX, and dendrograms were generated and consensus sequences constructed based on these analyses. Examples of these dendrograms are shown for HAV and CV (Figs. 1, 2, respectively). For the purpose of generating representative viral genomic sequences on which to base subsequent oligonucleotide designs, the HAV strains were clustered into 5 groups whose viral genome sequences were constructed as follows: i) a consensus sequence based on the seven genotype Ib (i.e. genotype I, subgenotype b) strains that clustered into group 1, which includes the HAV HM175/wt strain (M14707); ii) a sequence derived from M20273 based on the pairing of M20273 and AF314208 (genotype Ib sequences in group 2); iii) a sequence derived from the single HAV genotype II sequence (IIb) available (AY032861) and assigned as group 3; and iv) two consensus sequences, one for cluster group 4 and one for cluster group 5, derived from the fourteen genotype Ia sequences that clustered into one or the other of these two groups. The three consensus sequences representing cluster groups 1, 4 and 5 were obtained following a group sequence alignment and the assignment of the most frequently occurring nucleotide at positions containing nucleotide differences. 
The clustering of either one or two sequences within a group (as in groups 3 and 2, respectively) resulted in the selection of a single sequence representing that group. Due to the highly diverse (genetic) nature of the CV genome sequences, clustering of strains for generating a group consensus sequence was only done for serotype strains B1 and B3 (groups 1 and 2, respectively). Four additional unique strain sequences were selected as representative sequences for broadly clustered strains identified as groups 3-6. Viral genomic sequences (approximately 3000 bases) from either the 3’ end of the HAV genome group sequences or the 5’ end of the CV genome group sequences were submitted for design of a tiling oligonucleotide array consisting of oligonucleotides of length 29, starting at every 5th base in every sequence, resulting in an overlap of 24 bases in two consecutive oligonucleotides. Similar methods were applied to the development and tiling of oligonucleotides as probes for norovirus and rotavirus sequences on the array. The resulting array contained approximately 13,000 viral probes.
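As an illustration of the two design steps described above (majority-rule consensus, followed by tiling of 29-mers at every 5th base), the following Python sketch may be helpful. It is not the software used in this study: the function names, the toy alignment and the synthetic 3000-base region are hypothetical, alignment gaps are not handled, and ties at a position are resolved arbitrarily in favor of the base encountered first.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Assumption: ties are resolved in favor of the base seen first in the
    column; the source does not say how ties were handled.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    bases = []
    for column in zip(*aligned):
        bases.append(Counter(column).most_common(1)[0][0])
    return "".join(bases)


def tile(sequence, probe_len=29, step=5):
    """Return (start, probe) pairs for a tiling design.

    start is a 0-based offset; consecutive probes overlap by
    probe_len - step bases (24 for the defaults).
    """
    last_start = len(sequence) - probe_len
    return [(i, sequence[i:i + probe_len])
            for i in range(0, last_start + 1, step)]


# Hypothetical toy alignment (three sequences, six columns).
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(consensus(toy_alignment))

# Hypothetical 3000-base region standing in for one group sequence.
region = "ACGT" * 750
probes = tile(region)
print(len(probes))
```

Under these assumptions a 3000-base region yields 595 probes, since probe start positions run from 0 to 2970 in steps of 5.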
Reverse Transcription and PCR of Viral Genomes
All reverse transcription (RT) reactions were completed using RNA templates obtained from linearized plasmid pHAV/7 transcribed in vitro with SP6 polymerase, total cellular RNA (1 µg) isolated from virus infected cells using the RNAqueous Kit (Ambion, Austin, TX), or viral genomic RNAs (equivalent to 5 × 10^6 infectious particles) isolated directly from clarified tissue culture supernatants using the RNeasy Micro Kit (Qiagen, Valencia, CA); a mixture of oligo(dT15) and random hexamers (pdN6) as primers; and AMV reverse transcriptase (Promega, Madison, WI) as previously described [4, 20]. In vitro transcribed and infected cell RNA templates represent in vitro and in vivo replication, respectively. PCR amplification with HAV or CV specific primers was carried out in 50 µl reactions using 5 µl of each RT reaction or 5 ng of pHAV/7 plasmid DNA as template, as previously described. PCR products (5 µl) were analyzed by agarose gel electrophoresis to confirm authenticity of product formation (data not shown).
Two primers, 3399 - 3423 (forward) and 7084 - 7105 (reverse), were used to amplify an approximately 3.7 kb region of the HAV genome [4, 6, 20]. Tables 1 and 2 show the sequences around the primer binding sites of selected HAV strains represented on the array. Tables 3 and 4 contain the sequence alignments at the forward and reverse primer binding sites for selected CV strains. The reverse primer for CV is degenerate owing to sequence differences among strains in this region. The CV primers amplify a 746 bp fragment from several B and A strains (data not shown).
Labeling of PCR Products and Hybridization
PCR products were purified using a spin column procedure (Qiagen, or Stratagene, La Jolla, CA). One µg of each purified PCR product was labeled with biotin-dUTP in a primer extension reaction using random hexamers and Klenow polymerase (Exo-). Labeled products were purified by spin column chromatography and concentrated by centrifugation through Microcon® (Millipore, Billerica, MA) filters. Biotin-labeled DNA was denatured in a total volume of 20 µl of hybridization solution containing 5× SSC, 0.1% SDS, 5 µg poly A, and 5 µg human Cot-1 DNA, and 6 µl was used per hybridization reaction per well of a 12-well sample pod (NimbleGen Systems, Inc.). The microarray slide (NimbleGen) was laid on top (oligonucleotide side down) of the sample pod and held in place in a metal cassette provided by the manufacturer. Hybridization was carried out for 12 h at 42 °C. The slides were washed sequentially with 2× SSC/0.1% SDS and 0.1× SSC/0.1% SDS at 42 °C, then with distilled-deionized water at room temperature. The slides were then stained with a Cy3-streptavidin conjugate (Amersham Biosciences, Piscataway, NJ) as described in Jackson et al.
Data Extraction and Analysis
Hybridized, Cy3-stained microarrays were scanned using an Axon GenePix® 4200A scanner at 5 µm resolution using a 532 nm laser. Fluorescence intensities of each feature (oligonucleotide probe) were extracted utilizing NimbleScan™ software (NimbleGen Systems Inc), and all subsequent data analyses were performed using MS Excel. Data were analyzed without comparison to a reference strain, on the assumption that each virus strain is unique. Following normalization for background fluorescence, the fluorescence intensity of each probe (normalized probe intensity) was plotted against the genome position of each probe to generate a hybridization profile for each viral strain [15, 17]. To generate the average probe intensity for each probe set per hybridized virus strain, the sum of all normalized probe intensities for individual probes within a probe set (i.e. the set of probes derived from an individual strain or group sequence) was divided by the number of probes within that set.
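The average probe intensity calculation described above can be sketched in Python as follows. This is an illustrative reconstruction only (the analyses in this study were performed in MS Excel): the background correction is assumed here to be a simple subtraction clamped at zero, which the text does not specify, and the intensity values in the example are invented.

```python
from collections import defaultdict


def normalize(raw, background):
    """Background-correct one feature.

    Assumption: plain subtraction with negative values clamped to zero;
    the actual normalization procedure is not detailed in the text.
    """
    return max(raw - background, 0.0)


def average_probe_intensity(features, background):
    """Average normalized intensity per probe set.

    features: iterable of (probe_set_name, raw_intensity) pairs.
    Returns {probe_set_name: sum of normalized intensities / probe count}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for probe_set, raw in features:
        totals[probe_set] += normalize(raw, background)
        counts[probe_set] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Invented raw intensities for two probe sets, with two probes each.
example = [
    ("hav1Cb", 1100.0), ("hav1Cb", 900.0),
    ("hav2b", 600.0), ("hav2b", 50.0),
]
print(average_probe_intensity(example, background=100.0))
```

Each probe set is reduced to a single value, so that hybridization to different probe sets can be compared directly for a given target.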
Identification of HAV Genotype by Microarray Hybridization
Fig. (3) shows the hybridization profile obtained with a target synthesized by PCR amplification of the plasmid pHAV/7. This plasmid contains a copy of the entire sequence of the wild-type HAV strain HM175 [19, 27], which originated from an Australian outbreak and was designated as genotype Ib by subsequent sequence analysis [24, 28]. The hybridization signals (normalized probe intensities) produced a profile indicating areas of intense hybridization at the positions where the HAV sequences are clustered in the array. However, variations in the intensity of hybridization can be observed within these sequences, where the target hybridization intensities against group 1 probes (hav1Cb) differ from those against probes derived from group 2 through 5 sequences (hav2b, hav3b, hav4Cb and hav5Cb). This is more clearly observed in Fig. (4), where the normalized probe intensities for individual probes within each group sequence present in the array were converted to average probe intensities and plotted for the target. The plot reveals that the HAV genotype Ib (HM175 wild-type) target hybridized most efficiently to probes from the genotype Ib, group 1 consensus sequence (hav1Cb). These results are consistent with the fact that the viral genome sequence of the HAV HM175 wt strain (M14707) is a member of group 1, and is therefore most closely related to the group 1 derived probe sequences. Probes representing a closely related HAV Ib strain from group 2 (hav2b) hybridize the target about two thirds as efficiently, while probes from the more genetically distant genotype II virus (hav3b) hybridize with the least intensity. The other probe groups (hav4Cb and hav5Cb), both representing genotype Ia consensus sequences (Fig. 1, cluster groups 4 and 5), hybridize less efficiently than genotype Ib. 
Given the readily observable differences in both the normalized and average signal intensities among the genotype group sequence probes (groups 1-5) following genotype Ib target hybridization, and the fact that viruses belonging to different subgenotypes can differ by as much as 7.5% in sequence [21, 24, 28], the data in Fig. (4) indicate that it is possible to identify HAV strains at the level of both genotype and subgenotype with this type of array.
To further explore genotype/subgenotype differentiation, different HAV strains belonging to the same subgenotype Ib sequence (Fig. 1, group 1) were hybridized to the array. As shown in Fig. (5), HAV strains HM175 wt, clone 1 and 18f hybridize most efficiently to genotype Ib (consensus group 1) probes (hav1Cb). Lower efficiencies of hybridization are observed for all three targets against all other probe sets. These results reflect a greater target specificity for the probe set that contains target member sequences than for the other genotype Ib derived probe set (hav2b) that does not contain target member sequences (group 2 in Fig. 1). For all target strains, the remaining probe sets yielded signal intensities equivalent to or less than intensities for probe set hav2b. Thus, in support of the interpretation of results of Fig. (4), this array has the potential to discriminate viral targets at the level of both their genotype and subgenotype.
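The comparison of probe sets described above amounts to ranking the average probe intensities and expressing each as a fraction of the strongest signal. A minimal Python sketch of that ranking is given below. The numerical values are invented solely to mirror the qualitative pattern reported for the genotype Ib target (hav1Cb highest, hav2b about two thirds of that, hav3b lowest) and are not measurements from Figs. (4) or (5); the function name is hypothetical.

```python
def rank_probe_sets(avg_intensity):
    """Rank probe sets by average probe intensity, highest first.

    Returns a list of (probe_set, intensity, fraction_of_top) tuples.
    """
    if not avg_intensity:
        return []
    ranked = sorted(avg_intensity.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[0][1]
    return [(name, value, value / top if top else 0.0)
            for name, value in ranked]


# Invented values that only mimic the ordering described in the text.
illustrative = {
    "hav1Cb": 3000.0,
    "hav2b": 2000.0,
    "hav3b": 500.0,
    "hav4Cb": 1200.0,
    "hav5Cb": 1100.0,
}
for name, value, fraction in rank_probe_sets(illustrative):
    print(f"{name}\t{value:.0f}\t{fraction:.2f}")
```

The top-ranked probe set suggests the genotype/subgenotype group of the target, while the fractions give a rough sense of how well separated the best match is from the alternatives.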
Differential Analysis of Two Target (HAV) Strain Hybridization Profiles Reveals a Correlation with Known Nucleotide Differences
It is important to note that despite the variation in average probe intensities for the individual strains against probe set hav1Cb (Fig. 5), the information as presented cannot be used to identify actual target nucleotide differences. For example, differences in signal height could be attributed to differing hybridization efficiencies between two different experiments. Indeed, a target derived from in vitro synthesized RNA from pHAV/7 (representing in vitro replication of the viral genome) was indistinguishable from plasmid derived target, or the virus following several rounds of replication in culture except for the peak height (data not shown). We pursued, therefore, an alternative method of analysis because the tiling array design offers the potential to distinguish between these closely related strains following hybridization by i) determining the normalized probe intensities for each target, and ii) plotting the change in signal intensity of hybridization by each target to the same probe set as the ratio (fold-change in probe intensity) vs the individual probes. As discussed by Jackson et al., this method of analysis can reveal distinct peaks with defined slopes (above background/signal noise) where changes in signal strength would occur with probes tiled further up or down stream of the nucleotide change. The presence of a mutation in the genome causes a destabilization of a number of probes around the mutation, which can be identified by the appearance of well defined peaks. Therefore, this method of analysis offers the potential to differentiate closely related strains of virus belonging to subgenotype Ib at the level of individual nucleotide differences, thereby producing data that can be used to tell them apart.
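The ratio analysis outlined above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the procedure of Jackson et al.: the peak threshold of 2.0 and the intensity floor of 1.0 are arbitrary choices, probes are assumed to be numbered from 0 in tiling order (29-mers at every 5th base), and the intensity values are invented.

```python
def fold_change(target_a, target_b, floor=1.0):
    """Per-probe ratio a/b of normalized intensities, in probe order.

    floor (an assumed value) guards against division by zero.
    """
    if len(target_a) != len(target_b):
        raise ValueError("both targets must cover the same probe set")
    return [max(a, floor) / max(b, floor) for a, b in zip(target_a, target_b)]


def find_peaks(ratios, threshold=2.0):
    """Index of the highest ratio within each run of probes >= threshold."""
    peaks, run = [], []
    for i, ratio in enumerate(ratios):
        if ratio >= threshold:
            run.append(i)
        elif run:
            peaks.append(max(run, key=lambda j: ratios[j]))
            run = []
    if run:
        peaks.append(max(run, key=lambda j: ratios[j]))
    return peaks


def probes_covering(position, probe_len=29, step=5):
    """0-based indices of tiled probes that overlap a 0-based base position."""
    first = max(0, -(-(position - probe_len + 1) // step))  # ceiling division
    last = position // step
    return list(range(first, last + 1))


# Invented intensities: target b loses signal around a single mismatch.
a = [1000.0] * 12
b = [1000.0, 1000.0, 1000.0, 800.0, 500.0, 300.0,
     250.0, 300.0, 500.0, 800.0, 1000.0, 1000.0]
print(find_peaks(fold_change(a, b)))
print(probes_covering(100))
```

With this tiling geometry a single base is covered by five or six consecutive probes, which is why a point mutation is expected to appear as a peak spanning several adjacent probes rather than as an isolated spike.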
In order to complete this analysis, the two different HM175 strains designated clone 1 and the cytopathic 18f strain were again subjected to hybridization, and the total normalized intensities of all probes belonging to the different HAV probe groups were plotted as in Fig. (3). Again, we found no overall differences in the hybridization profile but rather found peaks of hybridization intensities, with the strongest hybridization intensities for the group 1 (hav1Cb) consensus sequence following calculation of average probe intensity (data not shown). The fold-change in intensity between clone 1 and 18f targets was calculated for each probe in the probe set hav1Cb. As shown in Fig. (6), ten well defined peaks were observed over the range of the hav1Cb probe set, and the probe number that corresponds to each peak was identified. It is important to note that, due to the initial size of the graphical analysis output, it was necessary to compress the scale of the x-axis (hav1Cb probe number) in order to fit all data points within a smaller graph. Consequently, analysis of the hybridization (signal) values revealed two features not readily discernible on the graph: i) a probable single peak at probe 109 rather than what appears as two adjacent (overlapping) peaks, and ii) a possible second overlapping peak adjacent to probe 441. Since the hav1Cb probe set (group 1) is a consensus sequence developed from the alignment of seven strains assigned to this group (Fig. 1), there are nucleotide differences between each group member and the consensus sequence. Plotting the fold-change in intensity between clone 1 and 18f would potentially identify nucleotide sequences in a probe that are identical to clone 1 but not identical to 18f. Indeed, upon comparative analysis of the clone 1 and 18f amplified target sequences with the portion of the hav1Cb probe set sequence corresponding to the target sequences, one would predict a total of 11 peaks to occur by this method of analysis. 
We then sought to determine whether the “peak” probes contained nucleotide differences that could be mapped to nucleotide differences [e.g. single-nucleotide polymorphisms (SNPs), deletions, or insertions] that exist between clone 1 and 18f (and the probe set). As shown in Table 5, we were able to conservatively detect 10 out of 11 predicted nucleotide changes in the 18f genome identifiable by this method of analysis. It is important to note that these nucleotide changes represent mutations arising in the 18f virus during its emergence as a cytopathic strain from the HM175 noncytopathic strain which were identified by direct sequencing. These results demonstrate a strong correlation between results obtained by direct sequencing and array hybridization and strongly suggest that tiling arrays can be used to detect nucleotide changes instead of sequencing amplified PCR products over a much longer span of the genome in a single experiment.
Identification of CV Serotype by Microarray Hybridization
Unlike HAV strains, there is tremendous genetic diversity between CV strains, even within the same species as observed, for example, among serotype B strains although they are all members of HEV species [23, 26]. We sought, therefore, to determine whether this array hybridization technique could be used to identify a CV serotype strain target. A typical hybridization profile with a 746 bp segment amplified from CV strains is shown for CVB1 in Fig. (7, panel A), where the data are presented as average probe intensity for all probes derived from the same group sequence, i.e. probe set. Similar to the results obtained following hybridization with HAV targets, CVB1 targets hybridized very efficiently and with greatest intensity to probes (coxB1Ca) derived from a consensus sequence based on its own sequence, i.e. serotype B1 strains (Fig. 2, group 2). As indicated by the significantly lower probe intensities, minimal hybridization was observed among the remaining 7 CV probe sets, indicating a lower efficiency of hybridization to non-CVB1 sequences represented on the array. In fact, hybridization to probes representing all other (non-CV) viruses was essentially at background signal intensity. The results are consistent with the extensive sequence heterogeneity that exists between the CV serotype A and B virus strains, among the members within a serotype (A or B), and among the probe sets derived from these strains. Importantly, these results demonstrate that even with highly (genetically) diverse viruses, such as coxsackieviruses, this array design can discriminate between strains of the same (or different) virus species. We next sought to determine whether discrimination between virus strains or species was possible when the viral target contains sequences not represented by either an individual or a consensus probe set on the array. To complete this experiment, 746 bp targets derived from coxsackievirus serotype A3 and A5 strains were hybridized to the array. 
CVA3 and CVA5 serotype strains are both members of HEA species; however, the probe sequence (group 7, coxA16a) for the species was derived from CVA16 (Fig. 2). Analysis of normalized probe intensities reveals a striking reduction in the overall level of (normalized) probe hybridization intensities for CVA3 and CVA5 derived targets compared to the values obtained following hybridization with a CVB1 target (data not shown). As shown in Fig. (7, panels B and C), this is also observed following conversion to average probe intensity. The peak average probe intensity for these hybridizations is approximately 2750 units and 900 units with CVA3 and CVA5 targets, respectively. The results indicate that, in the absence of matching probe sets on the array, the sequence heterogeneity between these CV targets and the existing probe sets precludes strong or efficient hybridization to any single probe set. It is important to note, however, that neither of these targets hybridizes to any significant extent to non-CV probe sets, suggesting that the genetic diversity between CV targets and probe sets does not prevent or obscure virus target group (i.e. CV) identification. In addition, these hybridization profiles are distinct not only from B1 (Fig. 7) but also from one another, suggesting the possibility that unique hybridization profile patterns (calculated as normalized and/or averaged probe intensity) could be used for CV serotype target identification. The results from Fig. (7) also suggest that in a single experiment it is possible to identify whether a virus belongs to group A or group B. Indeed, identification of coxsackieviruses at the level of serotype strain may be possible without single nucleotide polymorphism (SNP) analysis, limited only by the number of probe sequences/sets present on the array.
Currently, RT-PCR is the most widely used molecular method for the detection and identification of viruses in biological and environmental sources [27, 28]. Identification of the genotypes of virus strains is based on the amplification of specific regions of the viral genome using gene specific primers, followed by sequencing of the amplicon by standard procedures. In some instances a preliminary identification is possible using single-strand conformational polymorphism (SSCP) or restriction fragment length polymorphism (RFLP) analysis. Multiplex PCR allows the detection of more than one species of virus in a single analyte. However, these techniques are limited in sensitivity and versatility, and require extensive prior knowledge of the sequences to be amplified. The requirement for size differences among the amplicons to be analyzed by gel electrophoresis following amplification by multiplex PCR also limits its utility.
Different strains of HAV and many enteric viruses show variable sequence diversity [23, 24, 26]. This allows easy identification of a virus at the genotype level by sequencing discrete segments of the viral genome amplified by RT-PCR. Ideally, sequencing should be done on amplicons that are known to have multiple nucleotide differences between strains. However, designing PCR primers that will capture a significant number of members of a group requires significant sequence homology at the primer binding sites; therefore, a relatively variable region flanked by conserved regions is needed for sequence based identification. While for some virus groups such as HAV it is relatively easy to find PCR primers that can capture many members, it is much more difficult with CV genomes due to extreme sequence diversity. The length of the amplified region is another constraint for sequence based identification. Sequencing an amplicon larger than 500 bp generally will require designing multiple primers for sequence walking. Although automated sequencing techniques currently available can be used for rapid sequencing of a moderate sized amplicon, the process is still too time consuming to be used on a routine basis where a quick identification is needed.
We investigated whether hybridization of fluorescently labeled amplified DNA (target) to a microarray containing many oligonucleotide probes representing many different viral genomes can identify a virus without sequencing. Unlike sequencing, these arrays can interrogate thousands of bases of a viral genome in a single experiment. We determined the feasibility of this approach by using labeled targets amplified from either the DNA (i.e. as recombinant plasmid) or RNA from several strains of HAV and CV. As shown in Figs. (4-6), a single hybridization experiment using a multi-well array with different samples loaded in different wells of a 12-well sample pod can identify HAV and CVB by the unique profile generated, with no ambiguity or cross-hybridization to oligonucleotides representing an unrelated virus. Within the genus Hepatovirus, of which HAV is the only species member, different genotypes which differ from each other by 5% to 8% of base positions (Fig. 1) can be identified (Figs. 4-6). Within the same subgenotype Ib, strains such as wild-type HM175 and the cell culture adapted variants, including the cytopathic 18f strain, differ by only 0.5% of base positions. We have shown that differentiation of these strains is possible by analyzing the ratio of the probe signal intensities generated by the isolates when hybridized to the probe sets present on this tiling array (Fig. 6 and Table 5). A sequence based identification of the same 3.7 kb amplicon would require several sequencing reactions with multiple primers in order to identify nucleotide differences. In addition, mutations accumulating in the HM175 genome during its evolution into the cytopathic 18f strain can be identified by ratio analysis (Fig. 6 and Table 5). Thus, the present array design is suitable for identification of species (e.g. CV and HAV) and of HAV subgenotypes, even though in the latter case the nucleotide differences are very few.
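To give a sense of scale for the sequencing effort mentioned above, the following Python sketch estimates the minimum number of single-direction sequencing reads needed to walk across the HAV amplicon. It assumes that the primer coordinates given in the Methods (3399 and 7105) bound the amplicon and that each read yields about 500 bp, the figure cited in the Discussion; the overlap parameter and the function name are hypothetical.

```python
import math


def min_reads(amplicon_bp, read_bp=500, overlap_bp=0):
    """Minimum number of reads to cover an amplicon on one strand.

    Assumptions: every read yields read_bp usable bases and successive
    reads overlap by overlap_bp bases (primer-walking style).
    """
    step = read_bp - overlap_bp
    if step <= 0:
        raise ValueError("overlap must be smaller than the read length")
    if amplicon_bp <= read_bp:
        return 1
    return 1 + math.ceil((amplicon_bp - read_bp) / step)


# Amplicon length from the outermost primer coordinates (inclusive).
amplicon = 7105 - 3399 + 1
print(amplicon)
print(min_reads(amplicon))
print(min_reads(amplicon, overlap_bp=50))
```

Under these assumptions the amplicon is 3707 bp and would need at least 8 reads without overlap (9 with a 50 bp overlap) on one strand alone, whereas the array interrogates the tiled region in a single hybridization.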
We have also demonstrated that it is possible to distinguish between CVB and CVA strains by virtue of their hybridization profiles. In addition, individual members of the A and B groups show distinct and characteristic hybridization patterns. Thus, CVA strains such as A3 and A5 can be identified not only as belonging to group A CV, but also as serotype A3 or A5. More virus strains need to be examined to determine if the method is applicable to other members of this group.
The more closely the target sequence matches the probe set, the stronger the hybridization signal. Diversity between and among probe sets representing virus strains within a group such as CV increases the power of discrimination (due to heterogeneity), particularly when the target is highly similar to one of the probe sets. This enables discrimination down to at least the strain level (e.g. among strains within the same serotype group such as CV group B serotypes). This level of discrimination is lost when the target sequence is not represented by a probe set on the array. Again, this has been shown to be problematic with highly (genetically) diverse viruses such as CV. However, despite the loss of serotype discrimination, the diverse nature of such viruses does still enable differentiation between virus groups, as shown between CVA and HAV, NV, and rotavirus.
Our results show that an oligonucleotide array incorporating thousands of probes representing genomes of multiple foodborne RNA viruses, including multiple hepatitis A virus genotype strains and multiple coxsackievirus serotype (A and B) strains, can be used in an array hybridization assay to differentiate between virus members of either genus and to identify the genotype/serotype of these viruses. Because the large number of probes can bind and detect labeled targets over a much larger area of the viral genomes, producing distinctive signal patterns for each genotype/serotype, the need for large scale sequencing is eliminated for this level of discrimination.