
Computational Analysis of Cysteine Proteases (Clan CA, Family C1) of Leishmania major to Find Potential Epitopic Regions


Leishmania (Order: Kinetoplastida, Family: Trypanosomatidae) is an obligate intracellular parasite responsible for a broad spectrum of diseases, ranging from simple cutaneous to invasive visceral leishmaniasis (1). Protozoan parasites of the genus Leishmania present two forms in their life cycle: the promastigote, which multiplies in the midgut of the sand fly vector, and the amastigote, the obligate intracellular form that lives within phagolysosomes of the vertebrate host (2, 3). Three major types of leishmaniasis, namely cutaneous, mucocutaneous and visceral, occur in humans depending on the species of Leishmania. Infection by species such as L. major, L. tropica and L. mexicana may cause localized cutaneous lesions, resulting in lifelong immunity. Infection by L. braziliensis and L. panamensis initially presents as cutaneous lesions that may then spread or metastasize, causing mucocutaneous lesions. Infection by L. donovani, L. infantum and L. chagasi may result in a chronic disseminating visceral disease in the liver and spleen (4).

It has been recognized for many years that proteases of pathogenic organisms may modulate the host’s defense mechanisms (5). Proteases are grouped into clans and families on the basis of the architecture of their catalytic dyad or triad (6). Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan. L. major has cysteine proteases (CPs) of eight families within clan CA. Family C1 contains CPA and CPB, which are both cathepsin L-like in terms of primary amino acid sequence, and CPC, which is cathepsin B-like. CPB is unusual in that it has a 100-amino acid C-terminal extension in comparison with most CPs of the group, and exists as multiple isoenzymes, which are encoded by a tandem array of similar CPB genes located in a single locus (the arrays comprise eight genes in L. major). The CPBs of L. mexicana are stage-regulated and the isoforms present differences in their substrate specificity and catalytic properties (7).

Although the exact roles of CPs in Leishmania pathogenesis are unclear, it has been demonstrated that Leishmania cannot grow within macrophages in the presence of CP inhibitors (8). These observations provide evidence of the importance of these molecules in the survival of both promastigote and amastigote forms of these parasites (9). Although trypanosomatid CPs may be instrumental in modulating the host’s immune response to favor parasite survival and proliferation, they are themselves immunogenic. L. mexicana CP is a T cell immunogen, resulting in the development of potentially protective Th1 cell lines (10). This finding suggests that the CP itself is a vaccine candidate and that homologous enzymes in other species may also be so. A CP of L. pifanoi, however, provided rather little protection for the host against infection with the parasite (11), although more recently a similar L. amazonensis CP provided some protection against subsequent challenge, apparently through inducing a Th1-associated response (12). These differing results presumably reflect the complexity of the immune response to the active parasite enzymes and how the response may be determined by the precise immunization conditions. It is therefore encouraging that a CP-rich fraction of L. major was shown to be a strong inducer of a primed human immune response and may have a protective function (13). These observations suggest that trypanosomatid CPs have potential as vaccines, although attempts to exploit them are only just beginning (14, 15). In addition, it has been shown that infected dendritic cells are the critical antigen-presenting cells responsible for T cell priming in Leishmania infections. Amastigotes, but not the infectious promastigotes, are the main targets for the phagocytic activity of dendritic cells (16, 17).

A key step in the design of subunit vaccines is the identification of epitopes from overlapping synthetic peptides. This method decreases the possibility of missed epitopes, but many peptides need to be synthesized at a high cost. New developments in immunoinformatics and other computational methodologies, combined with the broad versatility in the design and synthesis of genetic (DNA) vaccines, underlie new strategies for the design of antigen-specific, epitope-based vaccines against many pathogens that have so far proven refractory to conventional vaccine therapy (18). Epitopes are selected by prediction with software, which saves the expense of synthetic peptides and working time (19, 20).

The recognition of antigenic epitopes by the immune system, either small discrete T cell epitopes or large conformational epitopes recognized by B cells and soluble antibodies, is the key molecular event at the heart of the immune response to pathogens (21). The objective of this bioinformatics-based study is to improve the selection of epitopic regions of the clan CA, family C1 cysteine proteases as potential targets of the immune response. Consensus sequence methodology was used to identify sequences of 9 amino acids or longer with complete conservation in 80% or more of the C1 family cysteine proteases. These conserved sequences were further analyzed to identify targets for candidate epitope-based T cell vaccine formulations against L. major. Furthermore, because leishmaniasis also activates human humoral immunity, B cell epitopes were predicted based on propensity scales for each of the 20 amino acids. Using bioinformatics servers to screen vaccine candidates is a time-saving approach that could significantly increase our knowledge of the molecular biology of pathogens. These theoretical predictions can then be tested by experimental and complementary methods.

Prediction of MHC class I binding peptides

For the prediction of major histocompatibility complex (MHC) class I binding peptides of the C1 family of cysteine proteases, one sequence for each of the cysteine proteases A, B and C was selected. In the case of CPB, because of the high similarity among these cysteine proteases, only one of the eight sequences was chosen (LmjF08.1050). This sequence is a good representative of CPB because it is identical to the CPB consensus sequence. All overlapping nonamer peptides were generated from this dataset and were screened for potential T cell antigens using the NetCTL algorithm, from which 716 peptides were short-listed. Most of the peptides were found to exhibit mono-supertype specificity, meaning that they bind to a single supertype. Some of them, however, appeared to bind to multiple supertypes; the highest number of supertypes a given nonamer could bind is 5 out of the 12 supertypes tested. In fact, out of the 716 human leukocyte antigen (HLA)-binding nonamers, one peptide binds to 5 supertypes, 4 bind to 4 supertypes, 9 bind to 3 supertypes and 44 bind to 2 supertypes. The sequences of the peptides that bind to 4 or 5 supertypes and the names of the proteins they belong to are summarized in Table 1.
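The nonamer generation and the tally of supertype promiscuity described above can be illustrated with a minimal sketch. It is not the NetCTL procedure itself: the sequence, the peptide-to-supertype assignments and the helper names (`nonamers`, `promiscuity_histogram`) are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter


def nonamers(sequence, k=9):
    """Return every overlapping k-mer (default: nonamer) of a protein sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def promiscuity_histogram(binders):
    """Count peptides by the number of supertypes they are predicted to bind.

    `binders` maps each peptide to the set of supertypes it binds;
    peptides with an empty set are ignored.
    """
    return Counter(len(supertypes) for supertypes in binders.values() if supertypes)


# Hypothetical toy sequence and predictions (not data from this study).
toy_sequence = "MKTLLLAVAVLSAQ"
peptides = nonamers(toy_sequence)
toy_binders = {
    peptides[0]: {"A2"},
    peptides[1]: {"A2", "B62"},
    peptides[2]: {"A1", "A3", "B7", "B58"},
}
histogram = promiscuity_histogram(toy_binders)
```

A protein of length n yields n - 8 overlapping nonamers, so the toy sequence of 14 residues gives 6 peptides; the histogram keys correspond to the "binds to k supertypes" categories reported in the text.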

Knowing the number of binding peptides of each of the analyzed proteins is important, considering the polymorphic nature of HLA and its diversity in populations of different geographical regions. Therefore, a good T cell antigen should have peptides recognized by many HLA alleles. The analysis revealed that CPB has the maximum number of binding peptides, followed by CPC and CPA, respectively (Table 2).

For short-listing potential vaccine candidates, it is important to analyze the binding profiles from a supertype perspective. Of the 12 supertypes studied, the largest number of nonamers was found to be recognized by supertype B62 (53), followed by B58 (38), A2 (36), A24 (24), A1, B8 and B39 (21 each), B44 (20), A3 and B7 (19 each), B27 (17) and A26 (15), as illustrated in Figure 1.

The score with which a peptide binds to HLA ranges from 0.75 to 3.1949. In general, the binding score of CPC peptides to A1 locus supertypes is higher compared with CPB, CPA and other supertypes (Table 3).

Putative promiscuous T cell epitopes may be localized in clusters, as reported in studies of HIV-1 (22-25), the outer membrane of Chlamydia trachomatis (26), and others (27, 28). Such clusters are also ideal for developing epitope-based vaccines because they contain multiple promiscuous epitopes. The number of immunogenic hotspots for CPA, CPB and CPC is 4, 5 and 0, respectively, as shown in Table 4.

The identification of conserved sequences is very important for designing peptide vaccines, because vaccines developed on the basis of segments conserved among candidate proteins can be used against a large majority of the pathogen’s variants. In Figure 2, three segments (I, II and III) with ≥90% identity and a length of ≥9 amino acids are shown as conserved regions. Epitopes predicted in these regions are therefore of particular interest. For immunological applications, a minimum conserved sequence length of 9 amino acids is required because this represents the typical length of peptides that bind to HLA molecules (29). The features of potential epitopes located in conserved regions with maximum scores are summarized in Table 5.
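As an illustration of how conserved regions of this kind can be located in a multiple alignment, the following minimal sketch scans the alignment column by column. It assumes that "identity" is evaluated per column as the fraction of sequences sharing the most common residue, which is one possible reading of the criterion; the function names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_identity(column):
    """Fraction of sequences sharing the most common non-gap residue in a column."""
    counts = Counter(residue for residue in column if residue != "-")
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(column)


def conserved_segments(alignment, min_identity=0.9, min_length=9):
    """Return 0-based, half-open (start, end) column ranges of conserved runs.

    A run is kept only if every column reaches `min_identity` and the run
    spans at least `min_length` columns.
    """
    n_cols = len(alignment[0])
    segments, start = [], None
    for col in range(n_cols + 1):  # one extra step closes a run at the end
        conserved = (
            col < n_cols
            and column_identity([seq[col] for seq in alignment]) >= min_identity
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                segments.append((start, col))
            start = None
    return segments


# Hypothetical toy alignment (equal-length, already aligned sequences).
toy_alignment = [
    "ACDEFGHIKLMN",
    "ACDEFGHIKLMW",
    "TCDEFGHIKLMN",
]
toy_segments = conserved_segments(toy_alignment)
```

With only three toy sequences, a 90% threshold requires full identity in a column, so the single conserved run covers columns 1 to 10 and is reported as `(1, 11)`.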

B cell epitope prediction

Before B cell epitopes of CPB (LmjF08.1050) were predicted, the signal peptide of this protein was excluded. The signal peptide was predicted by the SignalP 3.0 hidden Markov model (HMM) (signal peptide probability 0.999, signal anchor probability 0.001, and cleavage site probability 0.760 between residues 27 and 28). Hydrophilicity, flexibility, accessibility, turns, exposed surface, polarity and antigenic propensity scales were applied to predict B cell epitopes. These parameters were correlated with the location of continuous epitopes. As a result, 9 regions were predicted to be B cell epitopes (Figure 2).


The aim of this investigation was to apply bioinformatics methods to study the B and T cell epitopic sites of the C1 family cysteine proteases of L. major. To support vaccine development, it is necessary to understand the structural basis of the cell-mediated immune response (30). Accurate bioinformatics prediction of T cell epitopes can greatly reduce the experimental cost of candidate epitope identification (31).

In the present study, the NetCTL program was used to predict MHC class I binding peptides of cysteine proteases A, B and C of L. major (32). CP proteins are immunogenic and are potential vaccine candidates. Efficient processing and presentation of vaccine antigens by class I and/or class II MHC are essential for a good T cell response. Since humans carry only a limited number of co-dominant HLA alleles in their genome (2 each for the A, B and C loci), out of hundreds of polymorphic alleles present in the population, a candidate vaccine must generate peptides that bind to a wide range of HLA molecules to provide good population coverage.

In this work we found that the generated peptides bind in larger numbers to B supertypes. However, almost all of the peptides with the highest binding scores belong to CPC. In other words, CPC is the major source of peptides that bind to HLA loci with higher affinity. These observations suggest that greater emphasis should be placed on the cytotoxic T lymphocyte (CTL) response generated by antigen presentation through B alleles, and that epitope-based vaccines should be designed to target these HLA molecules.

T cell epitopes specific to multiple HLA supertypes are advantageous for vaccine design because they effectively increase the number of epitopes to which an individual can respond, and provide much more extensive coverage of the population (33). Peptides that bind to more than one HLA are termed promiscuous, and such peptides are of prime interest for vaccine design because they cover a higher proportion of human populations. In silico approaches can help to predict some of the HLA-binding motifs that could act as promiscuous epitopes (34). Most of the nonameric peptides generated in this work are mono-allelic binders. To cover the majority of the population, vaccine candidates must show multi-binding behavior. Consequently, peptides able to bind ≥4 supertypes were taken as promiscuous epitopes. Note that each supertype consists of multiple HLA alleles, so peptides that can bind to ≥4 supertypes have great potential to activate T cells in a large proportion of the population.

It is generally recognized that conserved protein sequences represent important functional domains (35), for which mutations would be detrimental to the survival of the pathogen. The functions of conserved sequences can be elucidated using databases that comprise data on protein families, domains and functional sites, such as the Pfam database (36). In addition to the ClustalW consensus sequence, Figure 2 shows the results of the Pfam database search and the highly conserved regions with a length of ≥9 amino acids. Predicted epitopes located in the conserved segments are therefore likely to be more reliable.

Finally, identifying proteins whose peptides bind to a larger number of alleles, and assessing which MHC alleles or supertypes bind more peptides than others, are of great importance in selecting epitopes as vaccine candidates. In addition, allelic variation in binding affinity, immunological hotspots, HLA distribution analysis and the similarity of epitopes to self proteins play a key role in the identification of these epitopes (34-36).

In proteins, turns are located on the surface; these parts are accessible and hydrophilic, whereas the core regions are mostly devoid of water molecules (37). Antigenic determinants lie in regions that are hydrophilic, exposed and polar, and the accessibility and flexibility of these segments are high. This has led to rules that allow the position of B cell epitopes to be predicted from these features of the protein sequence (37, 38).

In conclusion, recognizing epitopes on proteins is essential for developing synthetic vaccines and can facilitate immunotherapy of leishmaniasis and many other infectious diseases. In the present work, a bioinformatics approach was used to identify a set of peptides that can be used in either a natural or a synthetic vaccine cocktail. This approach could be extended to the entire proteome of L. major to identify new sets of potentially antigenic proteins while reducing the number of T and B cell antigens requiring experimental verification. Studies of this kind can also be used to exclude non-functional protein sequences, which would help in designing new immunological methods.

Amino acid sequences

The sequences of ten cysteine proteases (clan CA, family C1) of L. major were obtained from the GeneDB database (39). The sequences included one CPA (Systematic ID: LmjF19.1420), one CPC (LmjF29.0820) and eight CPBs (LmjF08.1010, LmjF08.1020, LmjF08.1030, LmjF08.1040, LmjF08.1050, LmjF08.1060, LmjF08.1070 and LmjF08.1080).

Prediction of MHC class I binding peptides

The NetCTL program version 1.2 (40) predicts peptides restricted to 12 HLA class I supertypes (A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58 and B62) by integrating predictions of HLA binding, proteasomal C-terminal cleavage and transport efficiency by the transporter associated with antigen processing (TAP). HLA binding and proteasomal cleavage predictions were performed by an artificial neural network (ANN) method, and TAP transport efficiency was predicted using a weight matrix method. The parameters used for NetCTL prediction were: 0.15 weight on C-terminal cleavage, 0.05 weight on TAP transport efficiency, and 0.75 threshold for HLA supertype binding. The final scores are the predicted MHC class I affinities in the form of –logIC50 and IC50 values.
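The role of the two weights and the threshold can be sketched as follows. This is an illustration only, assuming that the integrated score is a weighted sum of the three component predictions and that the 0.75 threshold is applied to that sum; the component values and function names are hypothetical and do not reproduce NetCTL's internal scaling.

```python
# Parameters quoted in the text.
CLEAVAGE_WEIGHT = 0.15  # weight on proteasomal C-terminal cleavage
TAP_WEIGHT = 0.05       # weight on TAP transport efficiency
THRESHOLD = 0.75        # threshold for calling a supertype binder


def combined_score(mhc, cleavage, tap,
                   cleavage_weight=CLEAVAGE_WEIGHT, tap_weight=TAP_WEIGHT):
    """Weighted sum of the three component scores (assumed form)."""
    return mhc + cleavage_weight * cleavage + tap_weight * tap


def is_predicted_binder(mhc, cleavage, tap, threshold=THRESHOLD):
    """True if the combined score reaches the threshold."""
    return combined_score(mhc, cleavage, tap) >= threshold


# Hypothetical component scores for two peptides.
strong = combined_score(0.6, 0.9, 1.0)  # 0.6 + 0.135 + 0.05
weak = combined_score(0.5, 0.5, 0.5)    # 0.5 + 0.075 + 0.025
```

Under these assumptions the small weights mean that cleavage and TAP transport can only tip peptides whose binding score is already close to the threshold.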

B cell epitope prediction

All prediction calculations were based on propensity scales for each of the 20 amino acids. The sequence of each protein was read with a moving window. In order to compare the profiles obtained by different methods, the scales were normalized so that the original values of each scale lay between +3 and –3. Hydrophilicity (41), flexibility (42), accessibility (43), turns (44), exposed surface (45), polarity (46) and antigenic propensity (47) scales were applied to predict B cell epitopes using the BcePred server with the default threshold.
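The moving-window calculation and the rescaling to the range –3 to +3 can be sketched as below. This is a minimal illustration, not the BcePred implementation: it assumes a linear rescaling and an odd window size (neither is specified here), and the three-residue scale is a hypothetical toy rather than any of the published scales.

```python
def normalize_scale(scale, low=-3.0, high=3.0):
    """Linearly rescale a propensity scale so its values span [low, high]."""
    smallest, largest = min(scale.values()), max(scale.values())
    factor = (high - low) / (largest - smallest)
    return {aa: low + (value - smallest) * factor for aa, value in scale.items()}


def window_profile(sequence, scale, window=7):
    """Average the scale over a centered moving window (window must be odd).

    Returns one value per residue that has a full window around it, so the
    first and last `window // 2` residues get no score.
    """
    half = window // 2
    profile = []
    for center in range(half, len(sequence) - half):
        segment = sequence[center - half:center + half + 1]
        profile.append(sum(scale[aa] for aa in segment) / window)
    return profile


# Hypothetical toy scale covering only three residues.
toy_scale = normalize_scale({"A": 0.0, "K": 10.0, "L": -10.0})
toy_profile = window_profile("LLLKKKAAA", toy_scale, window=3)
```

Residues whose window average exceeds a chosen threshold would then be grouped into candidate continuous epitopes; normalizing every scale to the same range is what makes a single threshold comparable across the seven properties.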

Signal peptide prediction

Because the signal peptides of CPBs are removed before the proteins are secreted to the outside of the infected cells, this region must be excluded from the protein sequence before the prediction analysis is carried out. Signal peptide prediction was performed using SignalP 3.0 HMM (48).

Immunogenic hotspot prediction

Putative promiscuous T cell epitopes may be localized in clusters that are also ideal for developing epitope-based vaccines because they contain multiple promiscuous epitopes. The MULTIPRED server (49) was used to determine the immunogenic hotspots.


Sequences were aligned using the ClustalW program (50) from the BioEdit v5.0.9 package (51).

Authors’ contributions

BS conceived the study and carried out the computational analysis. HM supervised the study. BS and HM prepared the manuscript. Both authors read and approved the final manuscript.

Competing interests

The authors have declared that no competing interests exist.