Middle East respiratory syndrome coronavirus (MERS-CoV), first detected in the Kingdom of Saudi Arabia (KSA) in 2012, causes severe acute respiratory tract infection in humans, with a high case fatality rate (CFR) (1–4). Dromedary camels are believed to be important reservoir hosts or vectors for human infection; bats may also be implicated (5–8). As of 17 July 2015, 1,368 laboratory-confirmed cases of human infection with MERS-CoV had been reported to the World Health Organization (WHO), including at least 490 deaths, corresponding to a CFR as high as 35.45% (9). The recent MERS clusters in South Korea are thought to be the largest outbreak outside the Middle East (10). As of 25 July 2015, 186 laboratory-confirmed cases of MERS-CoV infection (including 36 deaths) had been reported in South Korea (9). A South Korean man who was a relative of some of the laboratory-confirmed cases traveled to Guangdong Province (10) and was diagnosed as the first imported MERS-CoV case in China by molecular detection of MERS-CoV (11, 12). The rapid spread of disease in South Korea raised concerns that the imported virus had evolved to become more transmissible. Here, we report a comprehensive phylogenetic analysis of the complete MERS-CoV genome sequence of the first Chinese imported case of MERS (ChinaGD01); the results indicate its probable origin and show evidence of genetic recombination.
Patient and sample history.
The current outbreak in South Korea and China was initiated when a 68-year-old Korean man flew back to Seoul on 4 May 2015 after a visit to four Middle Eastern countries (Bahrain, United Arab Emirates, Saudi Arabia, and Qatar). On 26 May 2015, a 44-year-old South Korean man presented with fever to a hospital in Guangdong. He had been in close contact with the index patient in South Korea on 16 May 2015 (Fig. 1), as well as a suspected second-generation patient. The timeline of the travel history, potential virus exposure, onset of disease, and diagnosis of the first imported MERS-CoV case in China is presented in Fig. 1.
Characterization of genome.
With informed consent and the approval of the ethics committee of the National Institute of Viral Disease Control and Prevention, China Center for Disease Control and Prevention (CDC), nasopharyngeal swabs were collected and used for RNA extraction, followed by reverse transcription PCR and genome sequencing. Through both Sanger and Ion Torrent sequencing, the full-length virus genome (30,144 bp) of ChinaGD01 was obtained and deposited in GenBank (accession no. KT006149). Over 2,000,000 paired-end reads were quality trimmed and processed to remove human genome sequences. Nonhuman reads were assembled into contigs with CLC Genomics Workbench and aligned against representative sequences of MERS-CoV. No nucleotide insertions or deletions were observed in the genome.
The genome sequence of this virus, referred to as ChinaGD01, had high levels of nucleotide identity (99.33% to 99.79%) to previously published MERS-CoV genomes (Fig. 2), with 99.31% to 99.78% sequence identity in the open reading frame 1a and -b (ORF1ab) gene segment and 98.91% to 99.60% identity in the S gene. The E, M, and N genes had 98.93% to 100% identity with previously described MERS-CoV strains. In total, ChinaGD01 possessed 11 nonsynonymous nucleotide substitutions, which occurred in the ORF1ab (n = 8), ORF3 (n = 1), ORF4b (n = 1), and M (n = 1) genes (Table 1). Although there were five nucleotide substitutions in the S gene, none resulted in an amino acid change. Of note, relative to previously published MERS-CoV genomes, the ChinaGD01 genome shows 11 unique amino acid substitutions, 8 of which are shared with the newly released South Korean strains and the latest strains prevalent in Saudi Arabia (Table 1).
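As a rough illustration of how a pairwise nucleotide identity figure of this kind can be computed, the following Python sketch counts matching columns between two sequences that have already been aligned. The function name percent_identity, the choice to skip columns containing a gap or an N, and the toy sequences are assumptions made for illustration only; the text does not state which tool or gap convention was used for the reported values.

```python
def percent_identity(seq_a: str, seq_b: str, skip: str = "-N") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence carries a character listed in `skip`
    (gap or ambiguous base) are ignored; this convention is an assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example (not real MERS-CoV data): one mismatch in ten columns.
identity = percent_identity("ATGCATGCAT", "ATGCATGTAT")
```

Applied to each pair of aligned genomes, or to the columns of a single gene, a function like this would yield an identity range comparable in form to those quoted above.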
To further investigate the genetic relationship between ChinaGD01 and other MERS-CoV strains whose genomes are available, we performed phylogenetic analyses using the complete genome, the ORF1ab gene, and the S gene. From the whole-genome phylogeny, all available MERS-CoV strains can be clustered into two clades, the earlier clade A and the more recent clade B (Fig. 2A). ChinaGD01 fell into group 3 of clade B (Fig. 2A). Within group 3, ChinaGD01 and the South Korean and Saudi Arabian strains from 2015 were closely clustered and formed a long branch, separate from others of group 3. The nearest strain to this branch was Hafr-Al-Batin-1-2013 (GenBank accession no. KF600628), isolated in August 2013. Phylogenetic analysis of the ORF1ab gene indicated a similar topology in which ChinaGD01 and the recent MERS-CoV strains identified in South Korea were closely adjacent to Hafr-Al-Batin-1-2013 in group 3 (Fig. 2B). However, the phylogeny of the S gene differed in that the new viruses fell into group 5 and were closely related to viruses from both humans and dromedaries (Fig. 2C). These findings are consistent with recombination, a phenomenon not uncommon in coronaviruses.
Genetic recombination analysis.
To examine whether genetic recombination has occurred in ChinaGD01, we performed bootscanning analyses. We compared ChinaGD01 with representative viruses from group 3 (Hafr-Al-Batin-1-2013; GenBank accession no. KF600628), group 5 (KSA-CAMEL-378; GenBank accession no. KJ713296), and group 1 (Abu Dhabi_UAE_9_2013; GenBank accession no. KP209312) as controls. As shown in Fig. 3A, ChinaGD01 was more similar to the group 3 strain from position 1 to 15,000 and more similar to the group 5 strain from approximately position 18,000 to 24,000. We then compared the single-nucleotide polymorphisms (SNPs) of ChinaGD01 with consensus sequences of group 3 and group 5 (Fig. 3B; see also Fig. S1 and S2 in the supplemental material). A total of 78 SNPs were identified along the ChinaGD01 genome (Fig. 3B). Before position 17,206, the SNP pattern of ChinaGD01 is nearly identical to that of the group 3 viruses, whereas between positions 17,311 and 23,804 it is more similar to that of the group 5 viruses. The consistency between the bootscanning and SNP analyses supports the hypothesis that the gene segment from approximately position 17,300 to 24,000, representing portions of the ORF1ab and S genes, reflects a recombination event (Fig. 3B).
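The SNP comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: the function name classify_snps, the labels, and the toy sequences are hypothetical, and the sketch considers only alignment columns at which the two consensus sequences differ from each other.

```python
def classify_snps(query: str, cons_g3: str, cons_g5: str):
    """For each column where the two consensus sequences differ, report
    which consensus the query matches.

    Returns a list of (1-based position, label) tuples, where label is
    "group3", "group5", or "neither". All three inputs must be aligned.
    """
    if not (len(query) == len(cons_g3) == len(cons_g5)):
        raise ValueError("all sequences must be aligned to the same length")
    calls = []
    for pos, (q, a, b) in enumerate(zip(query, cons_g3, cons_g5), start=1):
        if a == b:
            continue  # column does not discriminate between the groups
        if q == a:
            calls.append((pos, "group3"))
        elif q == b:
            calls.append((pos, "group5"))
        else:
            calls.append((pos, "neither"))
    return calls


# Toy sequences (hypothetical): the query follows group 3 at the first
# discriminating site and group 5 at the last two.
calls = classify_snps("ACGTATGA", "ACGTACGT", "ATGTATGA")
```

A switch in the labels along the genome, from mostly "group3" to mostly "group5", is the kind of pattern that would be read as a candidate recombination breakpoint.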
Phylogenetic analysis was further performed using BEAST with the complete genome, the nonrecombinant region (positions 1 to 17,300), and the potential recombinant region (positions 17,301 to 24,000) (Fig. 4). The phylogenies revealed by the BEAST trees were consistent with those from the maximum-likelihood trees. In the trees constructed using the complete genome and the nonrecombinant region, ChinaGD01 fell within group 3; in the trees constructed using the recombinant region, however, it clustered with the group 5 sequences.
To date the recombination event, we estimated the time to most recent common ancestor for the novel MERS-CoV from 2015. Although results differed slightly among models, the time to most recent common ancestor of the 2015 cluster was estimated to be 0.5 to 0.7 years before the identification of the imported case, that is, in the latter months of 2014 (Table 2). Given the observation of similar recombination events in the newly released South Korean strains and the latest strains prevalent in Saudi Arabia, the travel histories of patients, and potential opportunities for virus exposure, we surmise that the recombination likely occurred in the Arabian Peninsula.
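The conversion from an interval of 0.5 to 0.7 years to calendar dates can be checked with a short back-of-the-envelope calculation. In the Python sketch below, the reference date of 26 May 2015 (the day the patient presented in Guangdong) is an assumption standing in for the date of identification, the helper years_before is hypothetical, and a year is approximated as 365.25 days.

```python
from datetime import date, timedelta


def years_before(reference: date, years: float) -> date:
    """Calendar date lying `years` before `reference` (365.25-day years)."""
    return reference - timedelta(days=round(years * 365.25))


# Assumption: the presentation date is used as the reference point.
reference = date(2015, 5, 26)
earliest = years_before(reference, 0.7)  # older bound of the estimate
latest = years_before(reference, 0.5)    # younger bound of the estimate
```

Under these assumptions, both bounds fall between September and November 2014, in line with the stated window in the latter months of 2014.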
Over the past 3 years, MERS-CoV infections have continued to increase, posing a serious threat to global public health. Previous studies have revealed that MERS-CoV infections are likely due to repeated introductions of MERS-CoV from dromedary camels to humans (13–15), resulting in only limited human-to-human transmission (16). However, the large number of second- and third-generation cases in South Korea raised concerns that MERS-CoV may have evolved to become more adapted to human-to-human transmission.
Our results indicate that at the whole-genome level, ChinaGD01 is >99% similar to the previously identified MERS-CoV strains. Phylogenetic analysis based on the whole-genome sequence revealed that it belongs to group 3 of clade B MERS-CoV strains and forms a separate small branch with viruses from South Korea and Saudi Arabia from 2015. Different phylogenies were observed in the trees constructed using the full-length genome and the S gene, indicating the possibility of a recombination event. Further evidence of a recombination event was obtained through bootscanning and SNP analyses. BEAST analysis suggested that the recombination event might have occurred recently, in the second half of 2014, in the Middle East.
Genetic recombination has been well established in severe acute respiratory syndrome coronavirus (SARS-CoV) (17, 18); however, there is only one report of genetic recombination in MERS-CoV (19). Dudas and Rambaut point to frequent recombination in MERS-CoV and partition the genome into two parts in which nucleotides 1 to 23,722 and nucleotides 23,723 to 30,126 have independent molecular clock rates. Based on the latest genome sequences from South Korea and the Kingdom of Saudi Arabia, our analysis indicates that a novel type of genetic recombination has occurred in the MERS-CoV strains prevalent in South Korea. We note that six MERS-CoV isolates from 2015 (ChinaGD01, the first MERS-CoV strain from South Korea, and the four latest strains from Saudi Arabia) had high levels of nucleotide identity (99.90% to 99.96%) and showed the same recombination signal in our analyses. We speculate that they arose from a common recombination event. However, more studies are needed to understand the relationship between genetic recombination of MERS-CoV, the biological properties it confers, and its relevance to the recent high rate of transmission.
Full-length genomic sequencing.
Nasopharyngeal swabs from the South Korean patient diagnosed with MERS-CoV infection were collected and used for viral RNA extraction with the QIAamp viral RNA minikit. Forty-four sets of specific primer pairs were designed and used to amplify the complete genome, followed by Sanger sequencing; meanwhile, the extracted viral RNA was also used for next-generation sequencing with the Ion Torrent PGM after random amplification.
We downloaded all (n = 92) available full-length genome sequences of MERS-CoV from GenBank and used RAxML (20) for phylogenetic analyses of the complete genome, the ORF1ab gene, and the S gene. One thousand bootstrap replicates were run. Furthermore, the Bayesian Markov chain Monte Carlo method, implemented in BEAST (21), was used to estimate the time to the most recent common ancestor. Twelve different model combinations were applied. For all the analyses, we used the general time-reversible nucleotide substitution model with gamma-distributed rate heterogeneity. Each Bayesian Markov chain Monte Carlo analysis was run for 50 million steps. Trees and parameters were sampled every 5,000 steps, with the first 10% removed as burn-in.
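The sampling settings above determine how many posterior samples remain for each run. The Python sketch below works through that arithmetic; the helper retained_samples is hypothetical and counts only samples taken after the start of the chain, so a log file that also records the initial state may contain one additional row.

```python
def retained_samples(chain_length: int, sample_every: int, burnin_percent: int):
    """Return (total samples, samples discarded as burn-in, samples kept).

    Integer arithmetic is used throughout to avoid floating-point rounding.
    """
    total = chain_length // sample_every
    discarded = total * burnin_percent // 100
    return total, discarded, total - discarded


# Settings stated in the text: 50 million steps, sampled every 5,000 steps,
# first 10% removed as burn-in.
total, discarded, kept = retained_samples(50_000_000, 5_000, 10)
```

With these settings, each run yields 10,000 sampled states, of which 1,000 are discarded and 9,000 are retained for summarizing trees and parameters.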
Genetic recombination analysis.
Similarity plots and bootscanning analyses were generated with SimPlot (22) using a sliding window of 200 nucleotides moving in 20-nucleotide steps. Single-nucleotide-difference analysis was used to confirm the recombination event.
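To make the windowing parameters concrete, the following Python sketch computes a simple sliding-window identity profile between two aligned sequences, using the 200-nucleotide window and 20-nucleotide step stated above. It illustrates the similarity-plot idea only and is not a reimplementation of SimPlot: bootscanning additionally builds bootstrapped trees in each window, which is not attempted here. The function name sliding_identity and the toy sequences are assumptions.

```python
def sliding_identity(query: str, reference: str, window: int = 200, step: int = 20):
    """Percent identity of `query` to `reference` in overlapping windows.

    Returns a list of (1-based window start, percent identity) tuples.
    Gap characters are treated like any other character in this sketch.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        matches = sum(a == b for a, b in zip(q, r))
        profile.append((start + 1, 100.0 * matches / window))
    return profile


# Toy example (not real data): identical first half, divergent second half.
profile = sliding_identity("A" * 400, "A" * 200 + "C" * 200)
```

In the toy profile, identity falls from 100% in the first window to 0% in the last, which is the shape a similarity plot takes when the query stops resembling one reference partway along the genome.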
Nucleotide sequence accession number.
The full-length virus genome (30,144 bp) of ChinaGD01 was deposited in GenBank under accession no. KT006149.