By definition, metagenomics refers to the direct study of microbes’ genetic material in their natural habitat. It is a genomic approach that allows for the identification of both cultivable and uncultivable microbes in a mixed community. The application of metagenomics was first reported in the late 20th century, when Norman Pace’s laboratory conceived the notion of gross extraction of deoxyribonucleic acid (DNA) from a sample containing a mixture of nucleic acids. Since then, significant progress has been made in metagenomics across different types of environmental compartments. Nucleic acids have been identified from diverse environments such as soil, ocean, sediment and groundwater, as well as from clinical samples. Currently, metagenomics is being explored in marine environments and in disease diagnosis, to name but a few. In addition, metagenomic studies have been conducted in human and animal genetics, veterinary medicine, the textile industry, food and pharmaceutical products, biosensors, and agricultural biotechnology. Metagenomic approaches, via next generation sequencing (NGS) technology, have become an emerging and alternative tool for the study of viral taxonomy and functional composition within aquatic environments. The merits of metagenomics include the study and discovery of microbial genomes that could not previously be determined because of cultivation difficulties.
NGS is a genomic sequencing technique that enables massively parallel sequencing of small fragments of the entire genetic material obtained from a microbial community, generating massive data output in a single run on high-throughput instrumentation. NGS technologies are spread across different sequencing platforms, though they follow the same experimental workflow. The general experimental workflow for a metagenomics study applying NGS is presented in Figure 1.
Metagenome analysis by NGS involves several distinct steps, the most important being the extraction of high-quality total DNA from a sample. This is followed by fragmentation and ligation of the adapters appropriate to the chosen platform for library preparation and sequencing. The fragments and voluminous data generated by the different high-throughput platforms are then sorted and assembled into contigs using bioinformatics tools, which is usually the most challenging and tedious task in a metagenomics project. Filtering of the raw sequences is the first step before downstream analysis, and is achieved by eliminating low-quality reads and the adapters that were attached to the primer sequences. For instance, tools such as Btrim, Cutadapt, AdapterRemoval, FASTX toolkit and Krakeen are very efficient for filtering low-quality reads, removing adapters and barcodes, and performing detailed quality control on raw reads. The reads are then assembled into contigs using various assembly tools. Over the years, many assembly tools and algorithms have been developed, each depending on specific parameters for assembling the raw reads. Assembly is performed either through reference-guided genome assembly or through de novo genome assembly. Assembly tools such as SSAKE, Edena, Velvet, VCAKE, SOAPdenovo, de Bruijn graph-based assemblers and EULER have been used to assemble reads, each with its own strengths and weaknesses. After assembly, the sequences are mapped or aligned against a reference database that contains genomes specific to taxonomic classification. In this regard, tools and software packages such as Newbler, MIRA, AMOS, Bowtie, BLAT, Bfast, BWA, NovoAlign and MetaAMOS are commonly used in metagenomics for performing reference-based assemblies.
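The read-filtering step described above can be illustrated with a minimal Python sketch. It is not the algorithm of Cutadapt or any other named tool; the adapter sequence, the quality and length thresholds, and all function names are assumptions chosen for illustration, and only exact adapter matches are handled.

```python
# Toy read filter: trims a 3' adapter and drops short or low-quality reads.
# Adapter, thresholds and names are hypothetical, for illustration only.

def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string (Phred+33 assumed) to integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def filter_reads(reads, adapter, min_mean_q=20, min_len=5):
    """Yield (name, seq, qual) for reads that survive trimming and filtering."""
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) < min_len:
            continue  # too short after adapter removal
        scores = phred_scores(qual)
        if sum(scores) / len(scores) < min_mean_q:
            continue  # mean base quality below threshold
        yield name, seq, qual


example_reads = [
    ("read1", "ACGTACGTAC" + "AGATCGGAAG", "I" * 20),  # good quality, adapter at 3' end
    ("read2", "ACGTACGTACGTACGTACGT", "!" * 20),       # Phred 0 throughout
    ("read3", "ACG" + "AGATCGGAAG" + "ACGTACG", "I" * 20),  # too short once trimmed
]
kept = list(filter_reads(example_reads, adapter="AGATCGGAAG"))
for name, seq, qual in kept:
    print(name, seq)
```

In this toy input only the first read is retained: the second fails the mean-quality threshold and the third is shorter than the minimum length after trimming. Production tools additionally tolerate mismatches in the adapter and trim low-quality ends base by base.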
The taxonomic designations and phylogenetic tree analysis of the organisms are done using sequences already deposited in public sequence databases that are specifically designed for nucleotide and protein sequences, with examples such as the European Molecular Biology Laboratory (EMBL), GenBank, Basic Local Alignment Search Tool (BLAST), Reference Sequence (RefSeq) and SWISS-PROT. Numerous programmes and software packages such as ARB, Naïve Bayes Classification (NBC), k-SLAM, CLARK, MEGAN, SILVAngs, MetaPhlAn, Kraken, CARMA and interpolated Markov models, to name just a few, have been used. Bioinformatics tools play significant roles in many fields: in medicine for the treatment and cure of some notable diseases, in drug discovery and testing, microbial genomics, gene discovery and therapy, agriculture, antibiotic resistance, alternative energy sources, and also in the study of climate change.
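To make the idea of sequence-based taxonomic assignment concrete, the following Python sketch classifies reads by shared k-mers against a small reference set. It is a toy illustration only, loosely in the spirit of k-mer-based classifiers; it does not reproduce the algorithm of Kraken, CLARK or any other tool named above. The reference labels, sequences and the value of k are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k):
    """Assign the label supported by the most label-unique k-mers."""
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, set())
        if len(labels) == 1:  # ignore k-mers shared between references
            votes[next(iter(labels))] += 1
    if not votes:
        return "unclassified"
    return votes.most_common(1)[0][0]


# Hypothetical reference sequences, far shorter than real genomes.
references = {
    "virus_A": "ATGGCGTACGTTAGC",
    "virus_B": "TTGACCGGTAACCGT",
}
K = 5
index = build_index(references, K)

for read in ["GCGTACGTT", "CCGGTAACC", "AAAAAAAAA"]:
    print(read, "->", classify(read, index, K))
```

Reads with no k-mer in the index are reported as unclassified, which mirrors the observation that many metagenomic sequences have no match in existing databases.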
Viruses play a vital part in the environment, such as recycling carbon in the marine environment and infecting and destroying bacteria in aquatic microbial communities. The existence and great quantity of viruses on Earth has been pointed out, and this has increased awareness of their wide diversity. Generally, viruses are known to be intracellular parasites made up of a nucleic acid core enclosed by a protein coat known as the capsid. Viruses replicate through adsorption, penetration, uncoating, viral genome replication, maturation and release, which is only possible within the living cells of bacteria, animals and plants. Viruses depend on their host cells’ metabolism for energy, enzymes and precursors in order to replicate and multiply. A virion is made up of a protein coat and genomic information encoded in DNA or RNA. Viruses are categorized on the basis of their dimension, mode of replication, chemical configuration and morphology, as well as whether their genomes are single stranded or double stranded, linear or circular. The main function of the virion is to deliver its genome into the host cell for expression and replication. Viruses are host specific, and they depend on the host organism to supply the complex metabolic and biosynthetic machinery of eukaryotic or prokaryotic cells. For viruses to propagate successfully in any cell, the virion must be able to identify and bind to its cellular receptor, as well as replicate its own genome.
Studies have shown that the most prominent viral species within the aquatic ecosystem are human enteric viruses (HEV), which have the ability to survive in the intestinal tract of humans and animals. At present, over 140 enteric viral serotypes are acknowledged to infect humans, and the major illness associated with HEV is gastrointestinal illness. HEV have also been implicated in acute illnesses such as meningitis, conjunctivitis, hepatitis, poliomyelitis, respiratory diseases and severe fever. These groups of viruses are easily transported and transmitted via adsorption phenomena: from one contaminated water point to another (especially through the fecal–oral route), from wastewater treatment plants’ effluents, and through agricultural runoff, leaking septic tank systems, recreation and food products. Although HEV cannot reproduce outside their host’s cells, they can still persist for extended periods of time within the aquatic environment. Moreover, some serotypes have a strong resistance to chlorine disinfection, which is the most common treatment used at many wastewater treatment facilities. This resistance may be due to their highly resistant protein coat. Nevertheless, after treatment, the effluents are released into aquatic ecosystems, which are the main sources of water for drinking, aquaculture and recreation. Outbreaks of HEV disease in both developed and undeveloped nations have been documented globally by the World Health Organization (WHO). In the United Kingdom, for instance, these outbreaks have led to a huge strain on the healthcare system, an economic burden, and decreased productivity in affected persons. Table 1 shows some known and identified HEV that are a threat to the global aquatic ecosystem.
In South Africa, hepatitis A virus, adenoviruses, astroviruses, noroviruses, enteroviruses, rotaviruses and bacteriophages have been detected in surface water, wastewater treatment plants, and treated drinking water sources in some provinces. The identification and quantification of HEV in South Africa has mostly been done using conventional and traditional methods in both clinical and environmental samples. Figure 2 shows the different provinces in South Africa where HEV have been studied and identified in different aquatic environments. Over the years, Taylor and co-workers have extensively investigated consecutive outbreaks and the presence of some HEV in patients exposed to surface waters, dams and wastewater treatment plants (WWTPs). Metagenomics is still an emerging technique for the identification and diversity analysis of HEV in both environmental and clinical samples in South Africa. There is little knowledge pertaining to the viral content and diversity in wastewater systems in South Africa, which demonstrates the need to survey viral communities using metagenomics. Given the limitations of the existing molecular methods, which target specific viruses and specific bacterial indicators, new methodologies such as metagenomics are vital for the identification of unique or unexpected viruses in aquatic ecosystems.
2.1. Culture Based Methods
In vitro growth methods such as cell culture are the most established traditional standards used to identify and detect the occurrence of HEV in environmental samples. Cell culture is a technique whereby cells are grown under carefully controlled conditions outside of the living animal. It is a very time-consuming, laborious and expensive approach that usually demands prior knowledge of the targeted species. The limiting factor with this method is that some viral species do not produce any cytopathic effect when propagated on a cell line. HEV detection has also been explored using integrated cell culture polymerase chain reaction (ICC-PCR), and this technique has been used for the discovery of HEV in environmental samples. The merit of this technique is that it allows several modifications of the protocols and enhances the direct analysis and monitoring of HEV in environmental samples.
Epifluorescence and transmission microscopy are other conventional techniques that have been explored for abundance, morphological and enumeration studies of viral entities within aquatic environments. Here, the virus-like particles are stained with fluorescent nucleic acid stains and counted through visualisation. Flow cytometry and vortex flow filtration (VFF) have also been used for the quantification and counting of virus-like particles and prokaryotes in aquatic environments. Figure 3 shows the numerous molecular approaches that have been used in the diagnostics and identification of HEV in environmental samples.
2.2. Polymerase Chain Reaction Methods (PCR Assays)
Polymerase chain reaction (PCR) is a sensitive conventional assay technique used for targeted amplification of viral DNA or RNA over several orders of magnitude to produce thousands or millions of copies. PCR methods are designed to amplify a single specific nucleic acid sequence a million times in three distinct steps: denaturation, annealing and extension. For denaturation to take place, the target DNA is subjected to a high temperature in order for the DNA strands to be separated. Annealing of the primers to the target DNA at a lower temperature allows the DNA polymerase to selectively amplify the target DNA. PCR assays are very sensitive and highly specific, which makes them particularly attractive for the detection of non-cultivable infectious agents and other target pathogens. A comprehensive array of PCR systems exists for rapid detection and confirmation of the presence of HEV in different environmental samples. These samples include water sediments, wastewater treatment plants (WWTP), treated and untreated sewage, groundwater, and surface water. A wide range of primers have been designed for the precise detection of many HEV, and a brief overview of these is presented in Table 3. The chief limitation of PCR techniques is that they are incapable of distinguishing between active and inactive targets, and they are prone to inhibition through interaction with the DNA or interference with the DNA polymerase, which increases false negative results. In addition, the need for virus-specific primer sequences makes PCR inappropriate for the discovery of unique viruses. Prior knowledge of the viral sequence is, therefore, a prerequisite for any PCR reaction. Various modifications of the PCR assay have been used for detection of HEV, including nested, multiplex, real-time and reverse-transcription polymerase chain reaction, each with its own merits and demerits.
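The scale of PCR amplification can be made concrete with a short calculation. Under the idealised assumption that every cycle of denaturation, annealing and extension doubles the number of target copies, the yield after n cycles is N0 × 2^n; with a per-cycle efficiency E between 0 and 1 it becomes N0 × (1 + E)^n. The Python sketch below applies this textbook model. The function names are hypothetical, and real reactions plateau as reagents are depleted, which the model ignores.

```python
import math


def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealised PCR yield: each cycle multiplies copies by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles needed to reach target_copies."""
    if efficiency <= 0.0:
        raise ValueError("efficiency must be greater than 0")
    ratio = target_copies / initial_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))


# One template molecule with perfect doubling for 20 cycles.
print(copies_after_cycles(1, 20))        # 1048576.0

# Cycles needed to exceed one million copies from a single template.
print(cycles_to_reach(1, 1_000_000))     # 20
```

Under perfect doubling, about 20 cycles suffice to turn a single template into roughly a million copies, which is consistent with the "million times" amplification described above.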
The presence of norovirus, astrovirus and enterovirus has been established in surface water, groundwater and wastewater samples via multiplex and nested PCR. Other modified PCR techniques are the reverse-transcriptase polymerase chain reaction (RT-PCR) and real-time or quantitative polymerase chain reaction (qRT-PCR). RT-PCR is able to amplify and detect HEV that possess only RNA genomes. These techniques have been implemented for the identification of different groups of HEV in various environments. They also offer better rates of detection, and great sensitivity and accuracy. In addition, they are precise, and they reduce both experiment time and the possible sources of contamination. A summary of the numerous molecular techniques, principles, merits and limitations is presented in Table 4.
2.3. Viral Metagenomics
Viral metagenomics is a modern genomic technique used for studying viral communities in their natural habitat, without the isolation and laboratory cultivation of single species. Sequencing of the genomic information using metagenomics can be achieved either through PCR amplicon sequencing or via shotgun metagenomics. The PCR amplicon approach is mainly used for targeted species; the identification and characterization of specific genomic regions is done through the use of specific primers. The second approach, shotgun metagenomics, is a technique whereby unculturable and difficult microbes are analysed and studied extensively without prior knowledge of the state of these communities. There is no individual gene marker common to most viral genomes, like the 16S rRNA gene used for bacteria, and this has limited the investigation of viruses by amplicon sequencing and ribosomal DNA profiling. Studies on viral metagenomics have revealed that many of the generated sequences do not match known viruses, hence the need for viral metagenomic analysis in the virology field. Specifically, viral metagenomics has enabled the detection of viral species presumed to be a potential threat to human health, provided a means for virus discovery, and allowed the characterization of viral populations. Figure 4A,B provides an overview of the number of research articles on metagenomic studies of the human virome in diverse parts of the world. It also indicates how the number of research articles has risen from around 200 articles in 2002 to more than 12,000 articles in 2017. As a result, more metagenomic datasets of viruses have been established. Africa is still far behind in terms of research articles being produced, with approximately 50 articles available to date.
First-generation sequencing is a chain-termination technique, where sequencing is achieved by the selective incorporation of chemical analogues of deoxyribonucleotide triphosphates (dNTPs), the monomers for DNA strand synthesis, with reads of approximately 1200 bp. This technique has been used to characterize the different groups of human adenoviruses (HAdVs) in environmental samples. The main setback of this technology is its low throughput, which limits it as a means for diagnosis; it is also labour intensive and slow. In 2004, a revolution in sequencing began with the introduction of the second-generation sequencing platforms. The second-generation platforms include the 454 Roche platform, Ion Torrent Personal Genome Machine, AB SOLiD and Illumina Solexa sequencers. The 454 sequencing platform has been used to examine the diversity of human RNA viruses present in Lake Needwood, a freshwater lake in Maryland, USA, with results indicating the presence of four different types of viruses. Likewise, the 454 platform was able to detect and study the dominant DNA and RNA viral species in reclaimed water; the study showed that both the reclaimed and potable water were dominated by phages. It has also been used as a monitoring tool for identification of viral agents of animal, plant and human diseases in freshwater samples. The Ion Torrent platform has also been explored for the sequencing and microbial profiling of multiple viral groups from animal samples and sediments from the Athabasca River. The Illumina Solexa technology seems to be the most favoured of the existing second-generation platforms. Its sequencing is based on sequencing by synthesis (SBS), with upgraded system versions. Illumina systems have been used to sequence viruses from both clinical and environmental samples.
Table 5 shows the strengths and weaknesses of the second- and third-generation platforms. The basic workflow for second-generation sequencing is shown in Figure 4.
Recently, third-generation sequencing technologies have been introduced in the genomics world, namely Pacific Biosciences Single Molecule Real Time (SMRT) sequencing, Nanopore sequencing by Oxford Nanopore, and the Helicos™ Genetic Analysis System. These technologies have the potential to generate long reads of up to 100,000 bp within hours, but are very expensive to acquire. The most recent third-generation technology is Nanopore Technology, which involves the use of a small device or membrane with a pore size of approximately 1.5–2 nm. The distinguishing feature of all the third-generation sequencing platforms is that they do not require an amplification step during library preparation. In addition, the read lengths are between 25 and 15,000 bp, with a run time of approximately 30 min, when compared with the second-generation platforms. Pacific Biosciences Single Molecule Real Time technologies have been used to explore some microbial populations. Currently, these technologies are being developed and upgraded, but they have not been fully explored for the determination and analysis of HEV, probably due to the cost of set-up and the lack of technical skills.
3. Metagenomics and Its Application in Africa
In certain countries, viral metagenomic studies have increased gradually. Viral metagenomics is emerging as an alternative technique for assessing viral identity, diversity and abundance in a range of environmental samples, including the ocean environment, surface freshwater bodies and lakes, ballast water, wastewater plants, reclaimed water, the atmosphere, plants and aquaculture, and in clinical samples such as feces and blood, as well as in some animals. Despite advances in the biological sciences and the gradually falling cost of sequencing, developing countries such as South Africa are still a long way from benefiting from the technology. Over the years, environmental metagenomic studies in South Africa have focused mainly on studying the diversity and abundance of bacteria in different aquatic ecosystems and extreme environments.
In 2015, Tekere and co-workers carried out a metagenomic analysis of a thermal hot spring in Limpopo. The aim was to define the genetic and phylogenetic diversity of thermophiles in this environment. The community composition, distribution and abundance of the thermophiles living in the different hot spring waters and biofilms of South Africa were assessed. In addition, abundant halophilic bacteria were identified from a salt pan in the Limpopo province. In 2018, Abia and co-workers used metagenomics to analyse the functional profiles of some bacterial populations in sediments as well as in surface water samples. The abundance and diversity of bacteria was attributed mainly to the presence of an unapproved informal settlement with poor infrastructure. The functional profiling revealed that the bacteria could play a possible role in human diseases. In addition to natural environments, man-made extreme environments such as industrial wastewater were also explored for bacterial diversity.
Metagenomics is progressing slightly in Kenya, since it has been observed that arthropods, which act as blood-feeding vectors for viruses, could cause an exceptional health concern. Intercontinental virome diversity studies on Culex mosquitoes were done using samples from Kenya and China and analysed using NGS. The study revealed that mosquitoes are vital vectors and that viruses are harbored by these arthropods. The study also indicated the presence of viruses specific to vertebrates, invertebrates, plants and protozoa, as well as an uncategorized assemblage of viruses. Another part of Africa in which metagenomics is gaining momentum is Namibia. Metagenomics has been employed to better understand virus abundance, ecology and diversity in soil samples. The enumeration of viral particles in different types of soils has shown that viral abundance can range from 1.5 × 10^8 to 6.4 × 10^8 per gram of soil. NGS has also been used to determine the diverse ecological patterns in the Namib Desert, the cold Miers Valley, and the Antarctic hyper-arid deserts, so as to understand microbial response and adaptation to environmental stressors. Likewise, comparative metagenomic studies have been conducted on the mechanisms that are likely responsible for the stress response in hypoliths in extremely hot hyper-arid desert soils. In Kampala, Uganda, the diversity and richness of some HEV was investigated in wastewater and surface water samples using viral metagenomics. In this study, numerous human and vertebrate viruses, such as Herpesvirales, Iridoviridae, Poxviridae, Circoviridae, Parvoviridae and Bunyaviridae, were discovered in the effluent samples. The study also established that the discharge from the wastewater treatment plant appears to influence the quality of the surface water through high viral concentrations.
In this study, however, only the sampling and filtering of the water samples was done in Uganda; the NGS analysis and data interpretation were done at Michigan State University in the United States. This was probably because most of the infrastructure, funding and manpower required for the metagenomic study and pipeline were not available locally.
In South Africa, the study of viral diversity using metagenomics has not been explored to the fullest, except in a few environments. In the Kogelberg Biosphere Reserve in South Africa, the unique plant viral biodiversity of a vegetation type in the Western Cape province was explored using a metaviromic technique. The DNA recovered from the soil samples was sequenced on the Illumina platform, and the bioinformatics analysis detected biodiversity among the Caudovirales group.
The functional and phylogenetic analysis of the metaviromes revealed a high percentage of phages, while the viromes were distinct from known isolates. New and emerging phage-related protein sequences were also identified in this study, presenting a prospect for further research in such environments to explore more viral diversity using metagenomics.
Metagenomics was also explored in the Western Cape province of South Africa to determine the unique diversity and interactions of viruses in an African hot spring community; this was achieved via electron microscopy and sequencing. In this study, the metavirome analysis was able to detect the presence of salterproviruses using a polymerase B gene phylogeny. A diverse range of phages, as well as novel archaeal viruses, was also discovered in the hot spring. Likewise, a research group in the Eastern Cape province employed viral metagenomics to screen, identify and recover the prevalent species of Human Adenovirus (HAdV) present in sewage and mussel samples, which are associated with human infections. In this study, the metaviromes indicated the predominant presence of HAdV-D17 in mussel samples. This indicates that environmental samples should not be the only priority; both food products and clinical samples should be screened thoroughly. The presence of HAdV-D17 in the seafood samples raises an alarm about the ecological health of the river as well as the extent of contamination in the Swartkops River estuary. Table 6 shows the trends of the metagenomics approach using different sequencing platforms in Africa.
4. Open Research Work and Implications for Environmental Genomes
Insight into viral ecology has expanded since the commencement of viral metagenomics. At present, in South Africa, conventional molecular techniques have mainly been used in the isolation, quantification and identification of HEV. With the conventional approaches used thus far, our knowledge of the different species of viruses in the environment has been limited. Much information about the occurrence, abundance, diversity and ecological richness of these microbes remains unexplored due to lack of skills and technology. Characterization of viral communities through conventional methods or protocols is often biased, as they do not allow for total viral community analyses. Some of these techniques are tedious and specific to a gene or organism, and no single molecular assay has the potential to determine all viruses present in a sample in one run. NGS has achieved huge success and application in viral ecology in various matrices where other techniques have had setbacks. Based on the literature and scientific reports, identification of HEV using metagenomics is still an upcoming approach in resource-poor settings such as underdeveloped or developing regions. Continuous monitoring of bio-indicators in wastewater systems using metagenomics could also contribute to evaluating the distribution patterns of viral infections, as well as to microbial risk assessment, which can provide early warning of potential disease outbreaks. The South African aquatic systems hold the prospect of an almost unimaginable microbial diversity, despite the water scarcity being experienced in recent years. Techniques such as viral metagenomics can be used to improve surveillance of viral pathogens, to understand the evolution and decline of viral species due to climate change, and to support food security and public health.
5. Conclusions and Future Perspectives
Since the introduction of metagenomics and NGS, the field has gained momentum, giving room and opportunity for the characterization of all possible microbes in a sample. Since there has not been much development in cutting-edge technologies in developing nations, the quest for information regarding the state of our water systems continues to be hampered.
Emerging and recurring viral species are not a setback facing developing countries only, but a problem that the entire world faces. This is because these viruses have a mysterious way of contaminating and polluting the world’s entire aquatic ecosystem. Investigating the prevalence of microorganisms within aquatic systems is essential because diverse activities that depend on these systems are carried out in various parts of the world. The relatively high cost of modern molecular technologies, as well as the shortage of computational expertise for the analysis of the data generated, has greatly contributed to the slow growth of the viral microbial ecology research community in Africa. NGS is undeniably a key technology; however, the implementation of this technique is still a challenge in Africa. Researchers in Africa face a wide range of challenges, such as limited scientific resources, limited human skills, insufficient training and lack of access to genome sequencing facilities. In addition, we recommend that more energy should be directed towards instituting more water and safety programmes in emerging nations, as this may help to break the barriers and restrictions that are holding back the scientific community.