Introduction
Multiple sclerosis (MS) is a chronic demyelinating disease of unknown cause, which affects the brain and spinal cord of about 400,000 individuals in the U.S. A number of viral infections of the CNS can lead to demyelination, including distemper (dogs), measles (subacute sclerosing panencephalitis, SSPE, humans), and influenza (humans).. Viruses have long been suspected as causative agents of MS, based on the epidemiology of the disease including geographic patterns, isolated outbreaks, and migration studies [2–5]. Novel viruses that cause human disease continue to be discovered including hepatitis C (1989), corona virus NL63 (2004), bocavirus (2005), and rhinovirus C group (2007) [6–9]. Novel human polyoma and arena viruses were recently identified as causes of serious human diseases by deep sequencing [10–12]. The National Multiple Sclerosis Society itself provides an excellent rationale for the search for viruses in MS and related diseases.
Past serology studies provided some evidence for the involvement of retroviruses in the pathogenesis of MS. Other groups have identified human endogenous retroviruses (HERVs), including HERV-Fc1 and HERV-W, as possible MS pathogens [16–24]. Polymorphisms of human genes involved in the control of retroviral replication, including TRIM5, TRIM22, and BST2, are associated with higher risk of developing MS and recent epidemiologic studies suggest that MS and HIV are mutually restrictive; that is, HIV patients develop MS less frequently than expected, adjusted for age, sex, and other factors [26–28].
Our group has used deep sequencing (also called next generation sequencing) to detect microbial sequences in donor brain tissues. This powerful new technique allowed identification of GBV-C in the brain of one MS subject and HSV or measles virus in the brains of several persons with encephalitis. Deep sequencing is applied here to cryopreserved progressive MS and NMO brain specimens in comparison to normal brain and encephalitis brain specimen controls.
Methods
Primary progressive MS cases were requested from the Human Brain and Spinal Fluid Resource Center (Los Angeles Veterans Administration, California) and the Rocky Mountain MS Center (Englewood, Colorado) brain repositories. Cases were selected by the directors of these two institutions. Specimens were studied from 11 persons with primary progressive MS (PPMS), one with secondary progressive MS (SPMS), and 2 persons with severe progressive neuromyelitis optica (NMO). Progressive MS and NMO cases were specifically selected because these tend to be the most severe subtypes of the MS-related diseases. Fourteen normal control and 5 frozen encephalitis brain specimens these resources were also studied. Two additional de-identified frozen encephalitis specimens were obtained from Dr. Don Gilden at the University of Colorado, Denver. The specimens were collected post-mortem within one day of death, either fresh frozen or snap frozen in liquid nitrogen, and were associated with a neuropathologic diagnosis. The research plan was submitted to the University of Utah Health Sciences IRB for review. This research was reviewed and approved by the University of Utah Health Sciences IRB, #00028658. Since this research involved only de-identified post-mortem material, it was found to be exempt from consent, review, and oversight.
The samples were assigned to one of three groups:
The frozen brain specimens were handled as previously described RNA was extracted from frozen brain with the RNeasy Lipid Tissue Mini Kit (Qiagen, Valencia, CA, Cat No./ID: 74804). The extracted RNA was DNase treated by incubating 15 minutes on the column. The resulting RNA was submitted for sequencing at the University of Utah High Throughput Genomics Shared Resource. Prior to sequencing, RNA was analyzed on an Agilent Bioanalyzer Nano chip (Agilent Technologies, USA) and evaluated for RNA abundance and integrity as previously described Samples were reverse transcribed and libraries were prepared with the Illumina TruSeq kit (Illumina, San Diego, CA). To ensure the inclusion of possible viral RNA genomes, oligo dT selection was not performed. To avoid possible enrichment intrinsic biases, rRNA depletion was not performed.
The samples were sequenced and the sequencing reads were processed and aligned as previously described Briefly, the reads were Illumina HiSeq 2500, 50 bp single-end. FASTQ format sequences were quality-filtered and high quality (HQ) sequences were retained. Reads that aligned to either the human genome (NCBI build GRCh37.p13) or human transcriptome, were removed from the HQ reads, yielding “screened reads” Screened reads were aligned to the non-redundant viral database (NRVDB) using MegaBLAST (v2.2.26) with a word size of 28 bp Viral hit counts (reads aligned to sequences in NRVDB) were normalized (divided) by the number of (HQ) high quality reads in a specimen.
The retroviral gene catalog (RVGC) was prepared by identifying regions of the human genome (build GRCh37.p13) with detectable similarity to core retroviral domains in the Gypsy 2.0 database (GyDB) using BLASTX Alternative retroviral databases were considered for this analysis, and Gypsy was chosen due to the presence of annotations that allowed categorization of the sequences into retroviral domains. The BLASTx subprogram of version 2.2.26 of NCBI blast all was used with default word size and scoring and an expect cutoff of 0.1. Human genomic alignments with length 50% or longer than the subject retroviral domain were placed in RVGC. RVCG entries were named with arbitrary unique codes of the form:
Statistical analysis of the demographic characteristics of the study population was performed using Vassar Stats, an open-source web application, and Microsoft Excel for Mac 2011. Differences among the groups were screened for significance by one way un-weighted ANOVA testing for continuous variables (age, year of collection, and post-mortem interval), and the two-tailed Chi-squared test with Yates correction for discrete variables (sex). For each taxon, viral hit rates (VHR) were compared between each demyelination and OND specimen and the set of controls using the Z-test with Bonferroni correction for multiple comparisons. Taxa where none of the demyelination or OND specimens VHRs were significantly different from controls were excluded from further analysis. Specific differences in VHRs between the groups were tested using the Mann-Whitney U-test or Tukeys HSD test. Correction for multiple comparisons was accomplished using the Benjamini–Hochberg procedure with a false discovery rate (FDR or q) of 0.05. The distribution of the sample types in the Retroviral Domain Discrimination Analysis was tested using the two-tailed Fishers Exact test for 2 × 2 tables.
Subject Demographics
Characteristics of the study samples are shown in table 1. Fourteen demyelination samples were compared to 14 normal controls and 7 OND specimens. All patients in the demyelination group had severe and progressive clinical disease. Eleven were categorized as PPMS, two as NMO, and one as secondary progressive MS. The postmortem interval (PMI), time between death and collection of the brain sample, range was 2–26 hours. There were no significant differences in PMI between the groups (p=0.66). Age and sex information was available for all 14 controls, 13 of 14 demyelination cases, and 5 of 7 OND cases. The proportion of known females in the demyelination group 9/13 (69%) was not significantly different than the control group 6/14 (43%) (p=0.32). Ages of the subjects ranged from 37–93 years. The demyelination group was significantly younger (mean age 58 ± 13 years) than the normal controls (mean 71 ± 12 years, p=0.017). The specimens were collected between 1981 and 2010. The year of collection was not significantly different between the groups (p=0.29).
Sequencing and Analysis
Deep sequencing yielded between 49.4 and 130.1 million 50 bp high quality (HQ) sequences per sample. The mean yield of HQ sequences was not different between the normal control (μ = 96.5 ± 7.1M) and demyelination (μ = 87.2 ± 16.6M) groups. The OND group yielded significantly fewer HQ sequences per sample (μ = 68.3 ± 17.0M) than both the normal control and the demyelination groups (p<0.01 for both comparisons).
A heat map showing log-transformed HRs was generated from the sequencing analysis (Figure 1). Only viral taxa where one or more specimens are significantly overrepresented are displayed in the figure. Fifty viral taxa were over represented in at least one of the demyelination or OND specimens compared to the set of normal controls. In this manner, false positive alignments to human and cloning sequence were removed. GB Virus C, previously shown to be present only in specimen 3840, was the only non-retrovirus definitively present in any of the demyelination samples. Bona fide and spurious viral alignments within the OND samples have been previously described.
Among the 50 viral taxa overrepresented in at least one OND or demyelination samples, 17 were herpes viruses and 9 were retroviruses. Statistical comparisons (corrected for multiple comparisons) of VHRs between all members of the groups were performed. This analysis revealed that only 3 retroviral taxa that were significantly over expressed in the demyelination group compared to the control group: human immunodeficiency virus 1 (HIV-1), human endogenous retroviruses (family), and human endogenous retrovirus K. There were no AIDS or HIV patients in the cohort; thus the high VHR to HIV-1 was inferred to be caused by the 50 bp reads aligning to sequences similar to HIV-1. None of the 9 herpes virus candidate taxa were significantly different between the groups.
Retroviral Analysis
The apparent retroviral sequence enrichment in the demyelination group led to a more inclusive retroviral sequence analysis. A new database called the retroviral gene catalog (RVGC) was prepared (see Methods). RVGC contains endogenous sequences similar to protein-coding retroviral genes: GAG, RT, ENV, etc. Specific information about all the sequences records in the RVGC, including GenBank identifiers, length, and location is available (S1 RVGC Table).
Reads aligning to the RVGC database were binned according to retroviral domains type (e.g. GAG, RT, ENV). GAG refers to the viral core, RT to the reverse transcriptase, and ENV the envelope. The functions of the SCAN domain are not well understood and KRAB is probably a transcriptional repressor. CHR refers to the chromo domain found at the C-terminal end of many retro transposon integrases.
Mean domain-type hit rates (HRs) for the demyelination, NMO and control samples were log2 transformed and centered in the domain-type axis (Figure 2). These values were hierarchically clustered (Pearson correlation) and compared between the demyelination and normal control specimens. The resulting sample clustering reveals the relative separation of demyelination and control specimens into the dominant nodes of the cluster (e.g. 9/10 demyelination samples cluster left, 12/16 control samples cluster right; p=0.004). This data set shows evidence of broad retroviral gene over expression in the brains of the demyelination subjects compared with normal controls. However, the magnitude of the retroviral over expression was small, less than 2-fold, and was not evident for any single RVGC sequence. This retroviral gene expression pattern was more pronounced among the OND (encephalitis) samples than in the demyelination group. Expression of the neural tissue control genes RPL13, RPL19, and UBC was not significantly different between the 3 groups.
Additional analysis showed that some specific retroviral genes are significantly over expressed in the demyelination (N=14) and PPMS only (N=11) groups compared with the normal controls (n=14). Two mapping procedures were employed: “Best Alignment” where each read was mapped to the RVGC only once to its best match, and “Comprehensive Alignment” where every reported Bowtie 2 alignment was counted. The results of this analysis are displayed in Table 2. Only one envelope gene and one RT gene were significantly over expressed by the best alignment procedure. Several integrase, GAG, and envelope genes, along with 3 KRAB genes, were over expressed by the comprehensive alignment procedure; limiting the analysis to the 11 PPMS specimens showed those 2 ENVs and 3 GAGs were over expressed compared to controls. The HERV annotations for these over expressed genes are shown in Table 2.
Discussion and Conclusion
This study employed deep sequencing and metagenomic analysis techniques to comprehensively investigate retroviral expression among frozen demyelination brain samples compared with normal controls and OND (encephalitis) controls. The results show some over expression of HERVs in general among most domains (Figure 2). Over expressed HERV and KRAB sequences were specifically identified corresponding to several retroviral domains, including core, envelope, integrase, and reverse transcriptase (Table 2). These results support the hypothesis that retroviral sequences are over expressed in demyelinated brain samples compared with normal brain. However, the magnitude of the observed retroviral domain over expression was small, less than 3-fold, and the pathological significance of this observation is unknown.
The data from this study are consistent with the concept that HERV over expression is part of the human immune response. Other groups have identified the MSRV (HERV-W) or HERV-Fc1 as possibly contributing to the pathogenesis of MS. Interestingly, the present sequencing study did not specifically confirm these findings (Figure 1). Instead, some other HERV genes from a variety of sources were shown to be significantly over expressed (Table 2). This highlights the difficulties inherent with these studies where multiple similar HERVs have been incorporated into the human genome. The gene expression mapping performed here relies on annotated sequences from the human genome build 37. The data generated in the present sequencing study is comprehensive across the entire human genome, but it is likely to be less specific and less quantitative than qPCR.
Human endogenous retroviruses (HERVs) are remnants of ancient retroviral infections of the host germ line that are transmitted vertically from parents to their offspring. The utility of these elements within the human genome remains largely speculative, although at least one HERV codes for syncitin, an important protein that allows for development of the placenta.
Interestingly, some animal ERVs efficiently interfere with the replication of related exogenous retroviruses. Sheep have been used to study the evolution of ERVs within a mammalian host due to the presence of related exogenous and endogenous retroviruses. The exogenous (i.e., horizontally transmitted) oncogenic retrovirus, Jaagsiekte sheep retrovirus (JSRV), causes fatal lung cancers. A closely related provirus, JSRV-20, entered the host genome within the last 3 million years during speciation within the genus Ovis. Endogenous JSRV has a defective Gag polyprotein resulting in a transdominant phenotype that blocks the replication of the closely related exogenous JSRV [46–48]. That is, the expression of an endogenous retrovirus effectively blocks a fatal infection with an exogenous retrovirus. This animal data strongly suggests that endogenization and selection of ERVs is a mechanism used by the host to fight retroviral infections. Support for this concept in humans is displayed by the increased expression of (endogenous) HERV-K in patients infected with the (exogenous) retrovirus HIV, where HERV-K envelope is neuroprotective.
Most of OND specimens were from patients with either herpes (3) or measles virus (2) encephalitis. The HSV infected specimens (4403, 710, 924) displayed the highest levels of HERV domain expression, as shown in Figure 2. This is consistent with the results of other groups that have shown HSV stimulates reverse transcriptase in PBMCs from MS patients, and HERV-W expression is induced by HSV1 in cell cultures.
One limitation of this deep sequencing method is that the RNA extractions are not completely DNA free, despite a DNAse treatment step. Qubit analysis of extracted brain specimens revealed that 1–5% of the analyzed material is DNA retained from the original sample. Prior to sequencing, the RNA were also analyzed on an Agilent Bioanalyzer Nanochip (Agilent Technologies, USA) and evaluated for RNA size, abundance and integrity. This provided relatively high quality RNA for the subsequent analysis, but it cannot be determined with absolute certainty that retained DNA did not affect the results of the study. Another limitation of the study is the post-mortem interval necessarily associated with these samples obtained from deceased human donors. While there was no detectable difference in the PMI between the demyelination and control specimens, this interval likely allowed some RNA to degrade in all the samples. This likely did interfere with the detection of HERV and other retroviral sequences during the sequencing reactions.