Dataset: 11.1K articles from the COVID-19 Open Research Dataset (PMC Open Access subset)
All articles are made available under a Creative Commons or similar license. Specific licensing information for individual articles can be found in the PMC source and CORD-19 metadata



Vaccinomics Approach for Designing Potential Peptide Vaccine by Targeting Shigella spp. Serine Protease Autotransporter Subfamily Protein SigA

1. Background

Shigella is a Gram-negative, facultatively anaerobic, nonmotile, non-spore-forming, rod-shaped bacterium closely related to Salmonella and Escherichia coli. The infection caused by this organism, shigellosis (also known as bacillary dysentery or Marlow syndrome), is most typically associated with diarrhoea and other gastrointestinal symptoms in humans. The pathogen is usually found in water contaminated with human feces, is transmitted via the fecal-oral route, and mainly affects children under 5 years of age living in conditions of poor hygiene. Infection can occur after ingestion of as few as ten to one hundred organisms. Each year, 165 million cases of Shigella infection are reported worldwide, of which 163 million occur in developing countries, ultimately resulting in millions of deaths. According to the recent Global Enteric Multicenter Study (GEMS), Bangladesh has the highest rates of shigellosis in Asia. The study also revealed that Shigella is the third leading cause of diarrhoea in children [3, 4].

Shigella species are usually classified into four serogroups, S. dysenteriae (12 serotypes), S. flexneri (6 serotypes), S. boydii (18 serotypes), and S. sonnei (one serotype), based on their biochemical properties and the group-specific O antigens in the outer portion of the cell membrane. S. dysenteriae, S. flexneri, and S. boydii are physiologically similar to one another, in contrast to S. sonnei. Among them, S. flexneri is the most frequently isolated species globally and accounts for 60% of cases in developing countries, whereas S. sonnei causes 77% of cases in industrialized countries.

The underlying therapeutic challenge in managing Shigella is its acquired resistance to the most commonly used antibiotics, such as ampicillin, tetracycline, streptomycin, nalidixic acid, and sulfamethoxazole-trimethoprim. Earlier, ciprofloxacin, a third-generation fluoroquinolone antibiotic, was used effectively for the treatment of bacillary dysentery. However, this antibiotic is no longer effective in South Asian countries, including Bangladesh, because of the dissemination of fluoroquinolone-resistant strains and their equivalent clones across these countries [7, 8]. Hence, it is essential to find a sustainable approach such as vaccinomics, which can elicit long-term and consistent immunological responses against Shigella.

The sigA gene is annotated in the she pathogenicity island of Shigella and encodes the SigA protein, which belongs to the serine protease autotransporters of Enterobacteriaceae (SPATE) subgroup. The autotransporter proteins of Gram-negative bacteria have an N-terminal signal sequence, required for secretion across the inner membrane, and a C-terminal domain that forms an amphipathic β-barrel pore allowing passage of the functional domain across the outer membrane. Exported proteins of this type either remain attached to the cell surface or are released from the cell by proteolytic cleavage. SigA is a multifunctional protein that can degrade casein and has cytotoxic and enterotoxic effects. Moreover, SigA is cytopathic for human epithelial type-2 (HEp-2) cells, causing morphological changes and loss of integrity of the cell monolayers, which is important for the pathogenesis of Shigella. Its chromosomal location makes SigA less vulnerable to loss than the other virulence factors harboured on the plasmid, and as a secreted toxin it is more exposed to immune cells. Most importantly, this protein has been shown to be immunogenic following infection with Shigella. The generalized modules of membrane antigen- (GMMA-) based outer membrane proteins, including SigA, were also shown to be highly immunogenic, which prompted us to target SigA as one of the best vaccine candidates and to design a potential peptide vaccine covering all Shigella spp. and most regions of the world.

Epitope-based immunizing agents are often an inexpensive option for preventing enteric Shigella infection. The identification of specific epitopes derived from infectious pathogens has considerably advanced the development of epitope-based vaccines (EVs). Better understanding of the molecular basis of antigen recognition and of human leukocyte antigen- (HLA-) binding motifs has led to rationally designed vaccines that depend solely on algorithms predicting a peptide's binding to human HLA. The traditional process of vaccine development is very complex compared with that of an epitope-based vaccine; in addition, an epitope-based vaccine is chemically stable, more specific, and free of any infectious or oncogenic hazard. However, identifying candidate epitopes in the wet laboratory is expensive and laborious, requiring many experiments for the final selection of epitopes. Hence, interest among researchers in predicting epitopes by computational (in silico) strategies, which reduce this effort, is growing steadily.

Vaccinomics is the application of integrated knowledge from different disciplines, including immunogenetics and immunogenomics, to develop candidate next-generation vaccines and to understand the immune response to them. Currently, various vaccinomics databases are available for identifying distinctive B-lymphocyte epitopes and HLA ligands with high sensitivity and specificity [15–17]. The vaccinomics approach has already proven its potency in identifying conserved epitopes in the cases of human immunodeficiency virus, multiple sclerosis, tuberculosis, and malaria, with the desired results. In this study, we applied vaccinomics approaches to screen for potentially conserved epitopes in the target protein SigA.

2.1. Sequence Retrieval and Antigenic Protein Determination

The SigA protein sequences of different strains of Shigella species were retrieved from the NCBI GenBank database and analysed with the VaxiJen v2.0 server to determine the most potent antigenic protein. Additionally, the target protein was cross-checked against human pathogens and other similar pathogens for orthologous entries using the BLAST-P and OrthoMCL databases.

2.2. T-Cell Epitope Prediction and Affinity with MHC

Epitopes of the selected protein were predicted, and their affinity scores with MHC class I and class II alleles were measured, following a previously used approach [27, 28]. Briefly, the NetCTL v1.2 server was used to predict potential cytotoxic T-lymphocyte (CTL) epitopes from the most antigenic protein. A combined algorithm integrating MHC-I binding, transporter of antigenic peptide (TAP) transport efficiency, and proteasomal C-terminal cleavage prediction was employed for the T-cell epitope prediction. The epitopes with the highest scores for 12 MHC class I supertypes were selected.
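The combined scoring described above can be sketched as a weighted sum of the three component predictions. This is a minimal illustration, not the NetCTL implementation: the class and function names are hypothetical, and the default weights (0.15 for cleavage, 0.05 for TAP) and the 0.75 threshold are assumed here as typical server defaults rather than values stated in this study.

```python
from dataclasses import dataclass


@dataclass
class CandidatePeptide:
    sequence: str
    mhc_binding: float  # MHC class I binding score (assumed already rescaled)
    cleavage: float     # proteasomal C-terminal cleavage score
    tap: float          # TAP transport efficiency score


def combined_score(p, w_cleavage=0.15, w_tap=0.05):
    """Weighted sum of the three predictions (weights are assumptions)."""
    return p.mhc_binding + w_cleavage * p.cleavage + w_tap * p.tap


def rank_candidates(peptides, threshold=0.75):
    """Keep peptides scoring at or above the threshold, best first."""
    scored = [(p.sequence, combined_score(p)) for p in peptides]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Component scores below are made up for illustration only.
    demo = [
        CandidatePeptide("IELAGTLTL", 0.80, 0.90, 1.00),
        CandidatePeptide("FHTVTVNTL", 0.70, 0.95, 0.40),
    ]
    for sequence, score in rank_candidates(demo):
        print(sequence, round(score, 4))
```

Keeping the weights as parameters makes it easy to check how sensitive a ranking is to the relative importance given to cleavage and TAP transport.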

The T Cell Epitope Prediction Tools from the Immune Epitope Database and Analysis Resource (IEDB-AR) were used to predict affinity with MHC class I and MHC class II [31, 32]. The stabilized matrix method (SMM) was used to calculate the half-maximal inhibitory concentration (IC50) of peptide binding to MHC class I for the preselected 9-mer epitopes. The peptides were also assessed for HLA I binding affinity with the EPISOPT software. For the analysis of MHC class II binding, the IEDB-recommended method was used for the HLA-DP, HLA-DQ, and HLA-DR loci. Fifteen-mer epitopes were designed for the MHC class II binding analysis by considering each preselected 9-mer epitope and its conserved region in the Shigella strains. Epitopes with IC50 < 250 nM for MHC class I alleles and IC50 < 100 nM for MHC class II alleles were selected for further analysis. The MHC class II binding prediction tool PREDIVAC was also used to assess their affinity with HLA_DRB_1.
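The IC50 cutoffs above amount to a simple filter over (peptide, allele, IC50) predictions. The sketch below assumes the predictions have already been exported as tuples; the function name and the example rows are hypothetical, and only the two cutoffs (250 nM and 100 nM) come from the text.

```python
MHC_I_CUTOFF_NM = 250.0   # class I cutoff stated in the text
MHC_II_CUTOFF_NM = 100.0  # class II cutoff stated in the text


def select_binders(predictions, mhc_class):
    """Group alleles by peptide, keeping pairs whose IC50 is below the cutoff.

    predictions: iterable of (peptide, allele, ic50_nM) tuples.
    mhc_class: "I" or "II".
    """
    if mhc_class not in ("I", "II"):
        raise ValueError("mhc_class must be 'I' or 'II'")
    cutoff = MHC_I_CUTOFF_NM if mhc_class == "I" else MHC_II_CUTOFF_NM
    binders = {}
    for peptide, allele, ic50 in predictions:
        if ic50 < cutoff:
            binders.setdefault(peptide, []).append(allele)
    return binders


if __name__ == "__main__":
    # IC50 values below are invented for illustration only.
    rows = [
        ("IELAGTLTL", "HLA-B*40:01", 120.0),
        ("IELAGTLTL", "HLA-A*02:01", 900.0),
        ("FHTVTVNTL", "HLA-C*03:03", 249.9),
    ]
    for peptide, alleles in select_binders(rows, "I").items():
        print(peptide, len(alleles), alleles)
```

Counting the alleles retained per peptide gives the "number of binding alleles" used later to rank candidates.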

2.3. Cluster Analysis of the MHC Restricted Alleles

Furthermore, the MHCcluster v2.0 server was used to identify clusters of MHC-restricted alleles with appropriate peptides, to further strengthen our prediction. This served as an additional cross-check of the MHC-restricted allele analysis predicted by the IEDB analysis resource. The output from this server is a static heat map and a graphical tree describing the functional relationship between peptides and HLAs.

2.4. Epitope Conservancy and Population Coverage Analyses

Epitope conservancy of the candidate epitopes was examined using the web-based epitope conservancy tool available in the IEDB analysis resource. The conservancy level of each potential epitope was calculated from its identity across all SigA protein sequences of the different strains retrieved from the database. Multiple sequence alignment (MSA) was employed to locate the positions of the epitopes within the sequences. As the SPATE family is highly specific to the enterobacteria, in particular E. coli and Shigella, we also included two E. coli sequences (gi|693049347| and gi|699401135|) along with those of the four species of Shigella for MSA construction. The Jalview tool was used for this analysis. The conservancy of the selected peptides was also substantiated with the Protein Variability Software (PVS). Population coverage for the epitopes was assessed with the IEDB population coverage calculation tool. The combined score for MHC classes I and II was used for the population coverage analysis.
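At a 100% identity threshold, conservancy reduces to the fraction of sequences that contain the epitope as an exact substring. The following is a minimal sketch under that assumption, not the IEDB tool itself; the function names are hypothetical and the three short sequences are toy fragments, not real SigA entries.

```python
def conservancy(epitope, sequences):
    """Percentage of sequences containing the epitope as an exact match."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(1 for seq in sequences if epitope in seq)
    return 100.0 * hits / len(sequences)


def epitope_positions(epitope, sequence):
    """1-based start positions of every exact occurrence of the epitope."""
    positions = []
    start = sequence.find(epitope)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(epitope, start + 1)
    return positions


if __name__ == "__main__":
    # Toy fragments for illustration only (not real SigA sequences).
    toy_sequences = [
        "MKAIELAGTLTLTGTP",
        "MKAIELAGTLTLSGTP",
        "MKAIELAGSLTLTGTP",
    ]
    for epitope in ("IELAGTLTL", "KAIELAGTLTLTGTP"):
        print(epitope, round(conservancy(epitope, toy_sequences), 2))
```

An epitope reported as 100% conserved in this sense would be found unchanged in every retrieved sequence.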

2.5. Homology Modelling and Structural Frustration Analysis

A homology model of the conserved region was obtained by MODELLER v9, and the predicted model was assessed by the PROCHECK [38, 39] server. For the disorder prediction among the amino acid sequences, DISOPRED v3 was used. The protein frustratometer server was employed for the detection of the stability and energy differences of the 3D structure of the protein.

2.6. Molecular Docking Analysis and HLA Allele Interaction

Docking studies were also performed with the best candidate epitope, following the strategy used in previous studies [27, 28]. AutoDock Vina was used for the docking analysis. We selected the HLA-E∗01:01 molecule as the candidate for MHC class I and HLA-DQA1 as the candidate for MHC class II because structures for both were available in the Protein Data Bank (PDB). The PDB structures 2ESV (a complex of HLA-E, a T-cell receptor, and the human cytomegalovirus peptide VMAPRTLIL) and 3PL6 (the autoimmune TCR Hy.1B11 in complex with HLA-DQ1) were retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) protein database. The structures were then simplified using PyMOL (the PyMOL Molecular Graphics System, Schrödinger, LLC) for the final docking.

The PEP-FOLD server was used to generate 3D structures of the epitope “IELAGTLTL” for the MHC I molecule and the epitope “KAIELAGTLTLTGTP” for the MHC II molecule in order to analyse their interactions with the HLA alleles.

Finally, molecular docking for the MHC I molecule was performed with the grid box centred at X: 77.8087, Y: −3.2264, and Z: −9.5769 and with dimensions (angstrom) of X: 31.4432, Y: 29.9517, and Z: 19.0455. For the MHC II molecule, the grid box was centred at X: 38.5584, Y: 46.6132, and Z: −36.4392 with dimensions (angstrom) of X: 34.8104, Y: 40.4401, and Z: 37.3366. Additionally, we performed a control docking with an experimentally known peptide-MHC complex, using the PDB structure 2ESV and its bound VMAPRTLIL peptide. For the control, the grid box was centred at X: 77.3404, Y: −3.5159, and Z: −9.5829.
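For reproducibility, the search boxes reported above can be written into AutoDock Vina configuration files. The sketch below uses the centre and size values from the text; the receptor, ligand, and output file names are hypothetical placeholders, and other settings (such as exhaustiveness) are left at Vina's defaults because they are not stated here.

```python
def vina_config(receptor, ligand, center, size, out):
    """Build the text of an AutoDock Vina configuration file."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value}")
    lines.append(f"out = {out}")
    return "\n".join(lines) + "\n"


# Search boxes reported in the text (angstrom).
MHC_I_BOX = {
    "center": (77.8087, -3.2264, -9.5769),
    "size": (31.4432, 29.9517, 19.0455),
}
MHC_II_BOX = {
    "center": (38.5584, 46.6132, -36.4392),
    "size": (34.8104, 40.4401, 37.3366),
}

if __name__ == "__main__":
    # File names are hypothetical placeholders.
    text = vina_config(
        receptor="hla_e_receptor.pdbqt",
        ligand="IELAGTLTL.pdbqt",
        center=MHC_I_BOX["center"],
        size=MHC_I_BOX["size"],
        out="mhc1_docked.pdbqt",
    )
    print(text)
```

A file written from this text could then be passed to the program as `vina --config <file>`.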

2.7. Allergenicity Investigation and B-Cell Epitope Prediction

The AllerHunter server, which relies on a support vector machine (SVM) algorithm, was used to predict the allergenicity of our proposed epitope as a further safety check. The predicted T-cell epitope (15-mer) was also screened with a number of web-based IEDB-AR tools for its suitability as a B-cell epitope [47–49].

3.1. Analysis of the Retrieved Sequences and Their Antigenicity

A total of 44 SigA proteins from different variants of S. flexneri, S. dysenteriae, S. boydii, and S. sonnei were retrieved from the GenBank database (Table S1 in the Supplementary Material available online). Thereafter, analysis with the VaxiJen v2.0 server showed that the protein with accession number gi|745767180| had the highest antigenicity score, 0.6699 (Table S1). This highly antigenic protein was analysed further to detect highly immunogenic epitopes. No significant entry was found in the orthologous entry search for our targeted protein.

3.2. T-Cell Epitope Identification

The NetCTLv1.2 server identified the T-cell epitopes, where the epitope prediction was confined to 12 MHC class I supertypes. Based on the combined score, the top twelve epitopes (Table 1) were listed for further analysis.

3.3. MHC Restriction and Cluster Analysis

The IEDB analysis resource predicted both MHC class I and MHC class II restricted alleles on the basis of the IC50 value. All the predicted epitopes in Table 1 were assessed in the MHC interaction analysis. Epitopes for the MHC class I alleles are presented in Table 2. The peptide IELAGTLTL was predicted to bind the highest number of MHC class I alleles: five in total, namely HLA-E∗01:01, HLA-B∗40:01, HLA-B∗15:02, HLA-C∗03:03, and HLA-C∗12:03. Furthermore, the interacting alleles were reassessed by cluster analysis and are shown in Figure 2(a) as a heat map and in Figure S1A as a dynamic tree. The peptides were also reassessed for HLA I binding with the EPISOPT software, and IELAGTLTL was found to have affinity with six HLA I alleles (Table S3). From this analysis, we selected the top four peptides, VTARAGLGY, FHTVTVNTL, HTTWTLTGY, and IELAGTLTL, according to their affinity with the largest number of MHC class I alleles.

Epitopes for the MHC class II alleles are presented in Table 3. Three 15-mer peptide candidates were selected on the basis of their IC50 values and the number of MHC class II alleles bound. The peptides NSGFHTVTVNTLDAT, KAIELAGTLTLTGTP, and AAKSYMSGNYKAFLT were predicted to have high affinity with MHC class II alleles, interacting with 32, 29, and 24 alleles, respectively. These data were cross-checked with another software tool, PREDIVAC. The PREDIVAC scores of the two core peptides FHTVTVNTL and IELAGTLTL were promising for binding to HLA_DRB_1 (Table 3). Combining the MHC class I and MHC class II allele-based analyses, the peptides FHTVTVNTL and IELAGTLTL had the best scores as potential vaccine candidates.

3.4. Conservancy Analysis and Position of the Epitopes

Conservancy of all the proposed epitopes was assessed with the IEDB conservancy analysis tool and is summarized in Table 4. FHTVTVNTL, IELAGTLTL, NYAWVNGNI, and SMYNTLWRV were shown to be 100% conserved across all the SigA proteins. The positions of all the predicted epitopes are shown in a multiple sequence alignment of SigA proteins in Figure 3. Here, we used only our desired sequences for proper annotation. Of the most promising candidates, only two, FHTVTVNTL and IELAGTLTL, were found to be fully conserved. The top four epitopes are shown within the protein in Figure 4. The conservancy of both of these peptides was cross-checked with the PVS software, which showed that they are located in the conserved region of the SigA protein (Figure S5). The epitopes are positioned on the surface of the protein, indicating that they would be accessible to the immune system, especially to B-cells.

3.5. Model Validation and Structural Frustration Analysis

MODELLER was used to model the three-dimensional structure of the targeted protein through the best multiple template-based modelling approach. The model was validated with the PROCHECK server through the Ramachandran plot depicted in Figure S2, in which 88.8% of the amino acid residues were found within the favoured region. Furthermore, the predicted model was assessed by frustration analysis, as depicted in Figure 5. The DISOPRED server was likewise used to assess disorder among the targeted protein sequences, as shown in Figure S3.

3.6. Population Coverage Analysis

The IEDB analysis resource predicted both MHC class I- and MHC class II-based coverage of the selected epitopes for the world population, to assess their feasibility as potential vaccine candidates. The combined prediction was also assessed. The epitope “IELAGTLTL” had the highest population coverage, 83.86% of the whole world population (shown graphically in Figure 6), whereas the other potential epitope, “FHTVTVNTL”, had 50.61% population coverage (Table S2).
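The idea behind a population coverage figure can be illustrated with a deliberately simplified calculation. This sketch is not the IEDB tool: it assumes Hardy-Weinberg proportions within each HLA locus and independence between loci, the function names are hypothetical, and the allele frequencies are invented for illustration.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one epitope-binding allele
    at a single locus, assuming Hardy-Weinberg proportions."""
    total = sum(allele_freqs)
    if not 0.0 <= total <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(loci):
    """Coverage across several loci, assuming the loci are independent.

    loci: dict mapping a locus name to a list of frequencies of the
    alleles predicted to bind the epitope.
    """
    uncovered = 1.0
    for freqs in loci.values():
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered


if __name__ == "__main__":
    # Frequencies below are invented for illustration only.
    example = {
        "HLA-B": [0.05, 0.03],
        "HLA-C": [0.10],
        "HLA-DRB1": [0.08, 0.04, 0.02],
    }
    print(round(100.0 * combined_coverage(example), 2))
```

Under these assumptions, adding binding alleles at more loci can only increase coverage, which is why a combined class I and class II score is at least as high as either class alone.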

3.7. Molecular Docking Analysis

The 9-mer core epitope (IELAGTLTL) and its 15-mer extension (KAIELAGTLTLTGTP) bound in the grooves of HLA-E∗01:01 and HLA-DQA1 with energies of −7.8 and −9.7 kcal/mol, respectively. AutoDock Vina generated different poses of the docked peptide, and the best one, at an RMSD (root-mean-square deviation) value of 0.0, was picked for the final calculation. The docking interface was visualized with the PyMOL Molecular Graphics System. The 9-mer epitope interacted with Arg-61, Asn-62, and Glu-152 through steric interactions and formed a hydrogen bond with the Glu-156 residue. The 15-mer epitope, on the other hand, interacted with Asp-55 through electrostatic interaction and with Glu-66 through steric interaction, and formed hydrogen bonds with the Gly-58, Arg-61, Asn-62, and Asn-82 residues. The docking output and the interacting residues are shown in Figures 7 and 8 in different orientations. Furthermore, the control docking energy was found to be −6.8 kcal/mol, as illustrated in Figure S4.

3.8. Allergenicity Analysis

The AllerHunter web server was used for sequence-based allergenicity prediction. The allergenicity score of the queried core epitope (IELAGTLTL) was 0.05 (sensitivity = 98.4%, specificity = 27.4%), and that of the 15-mer epitope (KAIELAGTLTLTGTP) was 0.05 (sensitivity = 98.4%, specificity = 27.0%).

3.9. B-Cell Epitope Prediction

B-cell epitope prediction was carried out for the 15-mer peptide KAIELAGTLTLTGTP through sequence-based approaches, with predicted values for the different parameters ranging from −0.6464 to 1.137. These values are the propensity scores of the different methods, predicted with thresholds ranging from −0.352 to 1.037 (Figure 9). The Kolaskar and Tongaonkar antigenicity scale was employed to evaluate the antigenic property of the peptide, giving a maximum of 1.072. The antigenicity plot is shown in Figure 9(a). Peptide surface accessibility is another important criterion for a potential B-cell epitope. Hence, Emini surface accessibility prediction was employed, giving a maximum propensity score of 1.137 (Figure 9(b)). To further support the prediction that the epitope can elicit a B-cell response, the Parker hydrophilicity prediction was also employed, giving a maximum score of 1.086, as depicted in Figure 9(c).
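Several sequence-based propensity methods score a peptide by averaging per-residue values over a sliding window and assigning the result to the window's central residue. The sketch below shows only that general mechanism: the function name, the window length of 7, and the scale values are assumptions made for illustration and are not the published Kolaskar and Tongaonkar, Emini, or Parker parameters (some of these methods also combine residue values differently from a plain average).

```python
def window_scores(sequence, scale, window=7):
    """Average per-residue propensities over a sliding window.

    Returns a list of (centre_position, mean_score) pairs, where
    centre_position is the 1-based index of the window's central residue.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(scale[residue] for residue in segment) / window
        scores.append((start + window // 2 + 1, mean))
    return scores


if __name__ == "__main__":
    # Hypothetical scale values for illustration only.
    toy_scale = {
        "K": 0.9, "A": 1.1, "I": 1.2, "E": 0.9,
        "L": 1.2, "G": 0.9, "T": 0.9, "P": 1.1,
    }
    peptide = "KAIELAGTLTLTGTP"
    for position, score in window_scores(peptide, toy_scale):
        print(position, peptide[position - 1], round(score, 3))
```

Residues whose windowed score exceeds a method-specific threshold are the ones such tools flag as likely parts of a B-cell epitope.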

4. Discussion

Enteric infections are a foremost cause of morbidity and mortality throughout the world, and Shigella infections alone result in over a million deaths annually. The ever-rising number of multidrug-resistant (MDR) strains of Shigella is another international concern, prompting researchers to seek new solutions for preventing these deaths [50, 51]. Recently, several studies have focused on the development of vaccines against Shigella, and some continue in clinical trials. Most of them use attenuated or inactivated preparations of the bacteria to elicit immune responses, which carries some potential escape risk [52–54]. In this study, we sought an alternative way to address this global burden through vaccinomics approaches targeting the immunogenic and toxic protein SigA. The sequences of different strains of Shigella showed that there is a small island of conserved sequence throughout the species, and we focused on that target for designing the vaccine candidate. The orthologous entry search for our targeted protein revealed no significant similarity with human pathogens or other closely related pathogens. These results further strengthen our prediction by indicating no cross immunity.

At present, most vaccines are based on B-cell immunity, but vaccines based on T-cell epitopes have recently attracted renewed interest. This is because the humoral response from memory B-cells can be overcome by antigenic drift over time, whereas cell-mediated immunity often provides long-term immunity [56, 57]. Consequently, a T-lymphocyte epitope elicits a robust and distinctive immune response through the cytotoxic T-lymphocyte- (CTL-) mediated pathway, and the CTLs impede the spread of infectious agents by recognizing and killing infected cells or by secreting specific cytokines.

The epitopes VTARAGLGY, FHTVTVNTL, HTTWTLTGY, and IELAGTLTL were initially selected for vaccine design on the basis of their affinity with MHC class I, and their presence was additionally confirmed alongside that of the ancestral homologue in E. coli (Figure 2). Finally, after substantiation with different parameters, the core epitopes IELAGTLTL and FHTVTVNTL (in 15-mer form, KAIELAGTLTLTGTP and NSGFHTVTVNTLDAT, resp.) were found to be the most promising candidates, interacting with the largest numbers of HLA alleles for MHC class II molecules. Furthermore, we used PSORTb to predict the subcellular localization of SigA and obtained a score of 5.87 for localization in the outer membrane and a score of 4.13 for extracellular localization. This result is consistent with the localization of other SPATE proteins both on the bacterial cell surface and in secreted form.

The three-dimensional model built with MODELLER, and validated by a Ramachandran plot within an acceptable range, showed the epitope positioned on the surface of the structure. Because the epitope was found on the surface of the model (Figure 4), it is more likely to interact with the immune system early. Furthermore, the results from the DISOPRED and frustration analysis servers strengthen our prediction, as there is no disorder or energetic frustration in the epitope region of the sequences and the model, respectively (Figure 5 and Figure S3).

To be acceptable, vaccine candidates must have wide population coverage, and this should be established before design. In our analysis, the proposed epitope IELAGTLTL had a combined population coverage of 83.86%, whereas the other most promising candidate, FHTVTVNTL, had a combined population coverage of 50.61%. This output indicates that the proposed epitopes would have wide coverage in vitro.

Molecular docking supports the prediction, with high docking scores and well-oriented interactions between both MHC molecules and the predicted 9-mer and 15-mer epitopes. Additionally, comparative analysis with the experimentally known peptide-MHC complex supports the precision of our prediction through similar binding energies and interacting residues. Another significant finding is the conservancy result. Analysis of all the retrieved sequences showed that our predicted epitopes have 100% conservancy, and they could therefore be potential candidates against all Shigella spp. Our proposed epitopes are nonallergenic in nature according to the FAO/WHO allergenicity evaluation scheme.

Finally, the core epitope “IELAGTLTL” was also found to be a potential B-cell epitope candidate by the sequence-based approaches, including the Kolaskar and Tongaonkar antigenicity scale, Emini surface accessibility prediction, and Parker hydrophilicity prediction. From the above analysis, we envisage that our suggested epitope would also elicit an immune response in vitro.

5. Conclusion

Improved knowledge of antigen recognition at the molecular level has enabled the development of rationally designed peptide vaccines. The idea of a peptide vaccine is based on the identification and chemical synthesis of immunodominant B-cell and T-cell epitopes capable of evoking specific immune responses. In this study, we used different computational tools to identify potential epitope targets against Shigella, which should help to reduce the cost and time of wet lab experiments. Our bioinformatic analyses suggest that the selected part of SigA, an outer membrane and highly immunogenic protein, is a potential candidate for a peptide vaccine. It might also contribute to reducing SigA-mediated pathogenicity to the host. However, further wet lab validation is necessary to confirm the efficiency of our identified peptide sequence as an epitope vaccine against Shigella.