Dataset: 11.1K articles from the COVID-19 Open Research Dataset (PMC Open Access subset)
All articles are made available under a Creative Commons or similar license. Specific licensing information for individual articles can be found in the PMC source and CORD-19 metadata

Made by DATEXIS (Data Science and Text-based Information Systems) at Beuth University of Applied Sciences Berlin

Deep Learning Technology: Sebastian Arnold, Betty van Aken, Paul Grundmann, Felix A. Gers and Alexander Löser. Learning Contextualized Document Representations for Healthcare Answer Retrieval. The Web Conference 2020 (WWW'20)

Funded by The Federal Ministry for Economic Affairs and Energy; Grant: 01MD19013D, Smart-MD Project, Digital Technologies

Computer-aided design of amino acid-based therapeutics: a review


Diseases may be caused by pathogens or malfunctioning organs, and the use of therapeutic agents to treat them has a long recorded history. Small molecules are conventional therapeutic candidates that can be easily synthesized and administered. However, many of these small molecules are not specific to their targets and may lead to side effects.1 Moreover, a number of diseases are caused by a deficiency in a specific protein or enzyme. Thus, they can be treated using biologically based therapies that are able to recognize a specific target within crowded cells.2 Under biologic conditions, some macromolecules such as proteins and peptides are optimized to recognize specific targets.3 Therefore, they can overcome the shortcomings of small molecules.3 Recently, pharmaceutical scientists have shown interest in engineering amino acid-based therapeutics such as proteins, peptides and peptidomimetics.4–6

Theoretical and experimental techniques can predict the structure and folding of amino acid sequences and provide an insight into how structure and function are encoded in the sequence. Such predictions may be valuable for interpreting genomic information and many life processes. Moreover, engineering novel proteins or redesigning existing proteins has opened ways to obtain novel biologic macromolecules with desirable therapeutic functions.7 Protein sequences comprise tens to thousands of amino acids. In addition, the backbone and side chain degrees of freedom lead to a large number of configurations for a single amino acid sequence. Protein design techniques aim for minimal frustration through precise identification of sequences and their characteristics.8–11 According to energy landscape theory, natural proteins are minimally frustrated when their native state is sufficiently low in energy.7 The de novo design of a sequence is difficult because the number of possible sequences is huge: 20^N for an N-residue protein built from only the 20 natural amino acids.12
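To give a feel for how quickly the sequence space grows, the following minimal Python sketch evaluates 20^N for a few chain lengths. The function name `sequence_space_size` and the chosen lengths are illustrative assumptions, not part of the reviewed methods.

```python
# Size of the sequence space for an N-residue protein built from the
# 20 natural amino acids: 20**N. Names here are illustrative only.
NUM_AMINO_ACIDS = 20


def sequence_space_size(n_residues: int, alphabet_size: int = NUM_AMINO_ACIDS) -> int:
    """Return the number of distinct sequences of the given length."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return alphabet_size ** n_residues


if __name__ == "__main__":
    for n in (5, 10, 50, 100):
        size = sequence_space_size(n)
        # Report only the order of magnitude; the exact integers are very long.
        print(f"N = {n:3d}: about 10^{len(str(size)) - 1} sequences")
```

Even for a short 10-residue peptide the count already exceeds 10^13, which is why exhaustive enumeration is rarely an option.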

Peptide design should incorporate computational approaches. It can benefit from drawing on the more advanced fields of small-molecule and protein design.13 However, the straightforward adoption of computational approaches employed for small-molecule and protein design has not been accepted as a reasonable solution to the peptide design problem.14–16 In peptide drug design, the conformational space accessible to peptides challenges small-molecule computational approaches. In addition, the need for nonstandard amino acids and various cyclization chemistries challenges the available tools for protein modeling.13 Furthermore, the aggregation of peptide drugs during production or storage can be an unavoidable problem in the peptide design procedure. Rational design of a peptide ligand is also challenging because of the elusive affinity and intrinsic flexibility of peptides.17 Peptide-focused in silico methods have been increasingly developed to make testable predictions and refine design hypotheses. Consequently, the peptide-focused approaches reduce the chemical space of theoretical peptides to more tractable, focused “drug-like” spaces and reduce the problems associated with aggregation and flexibility.13,18 For the discussions that follow, peptides can be defined as relatively small (2–30 residues) polymers of amino acids.18

Under physiological conditions, several problems such as degradation by specific or nonspecific peptidases may limit the clinical application of natural peptides.19 Moreover, the promiscuity of peptides for their receptors arises from a high degree of conformational flexibility, which can cause undesirable side effects.20 In addition, some properties of therapeutic peptides, such as high molecular mass and low chemical stability, can result in a weak pharmacokinetic profile. Therefore, peptidomimetic design can be a valuable solution to circumvent some of the undesirable properties of therapeutic peptides.21,22

In the biologic environment, peptidomimetics can mimic the biologic activity of parent peptides with the advantages of improving both pharmacokinetic and pharmacodynamic properties including bioavailability, selectivity, efficacy and stability. A wide range of peptidomimetics have been introduced, such as those isolated as natural products,23 synthesized from novel scaffolds,24 designed based on X-ray crystallographic data25 and predicted to mimic the biologic manner of natural peptides.26

Using hierarchical strategies, it is possible to convert a peptide into mimetic derivatives that have fewer of the undesirable properties of the original peptide.27 Over the past 10 years, computational methods have been developed to discover peptidomimetics.28 Part of this review summarizes novel computational methods introduced for peptidomimetic design.

Peptidomimetics can be categorized as follows: peptide backbone mimetics (Type 1), functional mimetics (Type 2) and topographical mimetics (Type 3).29 The first generation of peptidomimetics (Type 1) mimics the local topography of the amide bond. It includes amide bond isosteres,30 pyrrolinones31 or short fragments of secondary structure, such as beta-turns.32 Such mimetics generally match the peptide backbone atom-for-atom, and comprise chemical groups that also mimic the functionality of the natural side chains of amino acids. A number of successful examples of Type 1 peptidomimetics have been reported.33

The second type of peptidomimetics is described as functional mimetics or Type 2 mimetics, which include small, non-peptide compounds that are able to identify the biologic targets of their parent peptide.34 At first, they were assumed to be conservative structural analogs of parent peptides. However, using site-directed mutagenesis, their binding sites to biologic targets were investigated. The results indicated that Type 2 peptidomimetics routinely bind to protein sites that are different from those selected by the original peptide.35 Therefore, Type 2 mimetics maintain the ability to interfere with the peptide–protein interaction process without the necessity to mimic the structure of the natural peptide.28

Type 3 peptidomimetics best embody the concept of peptidomimetics. They consist of the necessary chemical groups that act as topographical mimetics and contain novel chemical scaffolds that are unrelated to natural peptides.36

Here, theoretical and computational techniques to design proteins, peptides and peptidomimetics are reviewed. However, the current review does not deeply highlight the computational aspects of amino acid-based therapeutic design, but only discusses the methods used to design the mentioned therapeutics. Figure 1 summarizes the key concepts presented in this study.

As examples, the structures of Aldesleukin, Leuprolide and Spaglumic acid, important amino acid-based therapeutics approved by the US Food and Drug Administration (FDA), are shown in Figure 2A–C. The X-ray crystallographic structures of Aldesleukin (PDB ID: 1M47; Figure 2A) and Leuprolide (PDB ID: 1YY2; Figure 2B) were obtained from the Protein Data Bank (PDB) and visualized with PyMOL. The structure of Spaglumic acid was retrieved (in MOL format) from the PubChem database with the PubChem ID 188803 (Figure 2C) and visualized using PyMOL. Aldesleukin, a lymphokine, is a recombinant protein used to treat adults with metastatic renal cell carcinoma. Leuprolide, a synthetic nine-residue peptide analog of gonadotropin releasing hormone, is used to treat advanced prostate cancer. Spaglumic acid is used in allergic conditions such as allergic conjunctivitis. The drug belongs to a class of peptidomimetics known as hybrid peptides. Hybrid peptides contain at least two dissimilar types of amino acids (alpha, beta, gamma or delta) linked to each other via a peptide bond.

In the current study, all FDA-approved therapeutics (in 2018) were retrieved from DrugBank (https://www.drugbank.ca/biotech_drugs) and an analysis was conducted to compare their percentages. Protein-based therapies, gene or nucleic acid-based therapies, vaccines, allergenics and cell transplant therapies made up 8.05%, 0.17%, 2.64%, 16.20% and 0.14% of total approved therapeutics, respectively. Small-molecule drugs made up 72.76% of the approved therapeutics (Figure 3).

Selecting the template (scaffold) protein

The template (also called the scaffold protein) provides a set of backbone atom coordinates. The coordinates can be retrieved from an available X-ray crystal structure or, with caution, from a nuclear magnetic resonance (NMR) structure.39 Fixing the backbone decreases the computational complexity, but it may prevent the main chain from adjusting to sequence alterations.7 Backbone flexibility can generate designed functionalities beyond the protein’s normal function. It is introduced by incorporating other closely related conformations alongside an existing structure.40–42 Recently, new functionalities were effectively introduced into the TIM-barrel topology.43 This fold has been detected as one of the most shared structures, occurring in 21 distinct protein superfamilies.44

Sequence search and characterization

In a design procedure, a protein sequence is selected such that it meets the energetic and geometric constraints established by the chosen fold. Sequence search techniques sample different sequences and estimate their energies to find the one with the minimum energy.3

To identify the sequences that satisfy an objective function or a specific energy, diverse strategies including optimization and probabilistic approaches have been developed.45 Optimization processes may identify candidate sequences using stochastic or deterministic methods.45 Probabilistic approaches focus on characterizing the sequence space probabilistically.

Deterministic methods: To obtain a sequence that folds into the global minimum energy conformation, deterministic methods search the whole sequence space and identify the global optimum.3,7 These methods include dead-end elimination (DEE),46 self-consistent mean field,47 graph decomposition and linear programming.48 In contrast, stochastic algorithms search the sequence space in an exploratory manner.3 These algorithms include Monte Carlo algorithms (simulated annealing),49 graph search methods50 and genetic algorithms.51 Some of the most commonly used methods are discussed below.

DEE is regarded as an exhaustive search algorithm. To find and remove sequence-rotamer states that cannot be part of the global minimum energy conformation, DEE compares two amino acid rotamers and removes the one with the greater interaction energy.52 Interaction energies are computed for each rotamer of the test amino acid, along with all rotamers of every other amino acid.3 The elimination criterion is applied repeatedly to all amino acid states and their rotamers until no further eliminations are possible.52,53 Increasing the sequence length increases the combinatorial complexity of DEE exponentially. Therefore, the application of DEE may be restricted for sequences of 30 amino acids or more.54 Details of the theorem are explained elsewhere.3,7
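The pruning step described above can be sketched in a few lines of Python. This is a minimal illustration of the original (Desmet-style) elimination criterion: a rotamer r at position i is discarded if its best-case energy is still worse than the worst-case energy of some competing rotamer t at the same position. The data layout (`self_e`, `pair_e`), the function names and the toy energies are assumptions made for this example and are not taken from any specific design package.

```python
from typing import Dict, List, Tuple

SelfEnergies = List[List[float]]                          # self_e[i][r]
PairEnergies = Dict[Tuple[int, int], List[List[float]]]   # pair_e[(i, j)][r][s], i < j


def pair_energy(pair_e: PairEnergies, i: int, r: int, j: int, s: int) -> float:
    """Look up the pairwise energy regardless of the order of the positions."""
    if i < j:
        return pair_e[(i, j)][r][s]
    return pair_e[(j, i)][s][r]


def dead_end_elimination(self_e: SelfEnergies, pair_e: PairEnergies) -> List[List[int]]:
    """Return, for every position, the rotamer indices that survive pruning."""
    n = len(self_e)
    alive = [set(range(len(rotamers))) for rotamers in self_e]
    changed = True
    while changed:  # repeat until no rotamer can be eliminated any more
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                # Best case for r: most favorable partner rotamer everywhere.
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i
                )
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Worst case for t: least favorable partner rotamer everywhere.
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i
                    )
                    if best_r > worst_t:
                        alive[i].discard(r)  # r cannot be in the global minimum
                        changed = True
                        break
    return [sorted(a) for a in alive]


# Toy system (made-up numbers): two positions with two rotamers each.
toy_self = [[0.0, 5.0], [0.0, 1.0]]
toy_pair = {(0, 1): [[-1.0, 0.0], [0.5, 1.0]]}

if __name__ == "__main__":
    print(dead_end_elimination(toy_self, toy_pair))
```

In the toy system the second rotamer at each position is pruned, leaving a single combination that coincides with the global minimum found by enumerating all four states.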

Stochastic search algorithms: As mentioned before, deterministic approaches are well suited to designing small proteins but become impractical as the sequence length grows. Stochastic or heuristic methods are valuable for designing large proteins.3 The most widely used method for protein design is Monte Carlo sampling.3,7

The Monte Carlo method samples the configurations of complex proteins according to a selected probability distribution such as the Boltzmann distribution, which preferentially weights low-energy configurations. The Monte Carlo algorithm performs an iterative series of calculations. At the initial step of each search, a partially random trial sequence is generated, and its energy is calculated via a physical potential. During this step, both rotamer state and amino acid identity are modified, and an effective temperature controls the likely energy changes. In the next step, named simulated annealing, the temperature is gradually decreased, which favors sampling of lower-energy configurations.55 Multiple independent calculations are carried out so that the system converges toward the global minimum.3,7 For further explanation of the theorems and details of the formulation of the probability distribution and weights, readers are referred to previous reports.3,7
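The annealing loop described above can be illustrated with a short, self-contained Python sketch. It uses the standard Metropolis acceptance rule with a geometric cooling schedule. The energy function `toy_energy` (a penalty for hydrophobic/polar mismatches against a burial pattern), the hydrophobic residue set and all parameter values are hypothetical stand-ins for a physical potential, and only amino acid identity is mutated here, whereas real protocols also change rotamer states.

```python
import math
import random
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVW")  # simplified, assumed classification


def toy_energy(sequence: Sequence[str], burial_pattern: Sequence[bool]) -> float:
    """Hypothetical energy: +1 for every residue whose polarity mismatches burial."""
    return sum(
        1.0
        for aa, buried in zip(sequence, burial_pattern)
        if (aa in HYDROPHOBIC) != buried
    )


def anneal(
    burial_pattern: Sequence[bool],
    steps: int = 5000,
    t_start: float = 2.0,
    t_end: float = 0.05,
    seed: int = 0,
) -> Tuple[str, float]:
    """Simulated annealing over sequences; returns the best sequence and its energy."""
    rng = random.Random(seed)
    seq: List[str] = [rng.choice(AMINO_ACIDS) for _ in burial_pattern]
    energy = toy_energy(seq, burial_pattern)
    best_seq, best_energy = list(seq), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(seq))
        old_aa = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # trial mutation
        new_energy = toy_energy(seq, burial_pattern)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old_aa  # reject the move
    return "".join(best_seq), best_energy


if __name__ == "__main__":
    pattern = [True, False, True, True, False, False, True, False]
    print(anneal(pattern))
```

Running several independent searches with different seeds, as the text notes, is the usual way to gain confidence that the lowest-energy sequence found is close to the global minimum.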

Probabilistic approach: Probabilistic approaches are frequently employed when complete information is not available for protein design. In a probabilistic approach, site-specific amino acid probabilities may be utilized, rather than particular sequences. The procedure is partially motivated by the uncertainties in finding sequences consistent with a specific structure. Briefly, the backbone atoms are fixed or greatly constrained, side chain conformations are handled discretely, energy functions are estimated and solvation is handled by simple models.7 Nevertheless, to offer valuable sequence information for design experiments and to find structurally significant amino acids, probabilistic techniques leverage structural characteristics of interatomic interactions.7

Generally, Monte Carlo methods give a probabilistic sampling of sequences.49,55 In addition, an entropy-based formalism has been defined to predict amino acid probabilities for a certain backbone structure.56,57 The method employs concepts from statistical thermodynamics to assess the site-specific probabilities. Because the theory addresses the whole space of available compositions, it is not limited by computational enumeration and sampling. Large protein structures with >100 variable residues can be handled readily.7

Sampling sequence space to generate conformations

The chemical variability of a sequence and the number of different amino acids permitted at each position are defined as the “degrees of freedom for each amino acid”. Allowing each of the 20 natural residues at every position means searching the whole sequence space.58 To decrease the degrees of freedom for each amino acid and the size of the sequence space to be searched, diverse approaches such as hydrophobic patterning have been proposed.58 Monomers other than the naturally occurring amino acids61 can also be used to probe a protein structure59 and improve its function.60

Sampling of side chain conformational space to form conformations

Side chain conformations are typically consistent with the energy minima of molecular potentials and can be obtained from a structural database.62 Rotamer states correspond to the frequently observed values of the dihedral angles in the side chain of each amino acid. For example, the simplest amino acids, alanine and glycine, have only one rotamer state, while the larger amino acids have >80 different rotamer states.62

A variety of rotamer libraries including backbone-dependent, secondary structure-dependent and backbone-independent libraries have been developed for protein design.62,63 By using a rotamer library, one can discretize a meaningful state space to decrease the computational difficulty. Rotamer libraries can be extended beyond the 20 natural amino acids. Such extended rotamers can model cofactors, ligands, water and posttranslational modifications. For example, to improve the modeling of protein–protein interactions and to model water within protein interiors, structurally defined water molecules can be included as a solvated rotamer library.61

Scoring functions (energy functions)

Energy functions have been employed to quantify sequence–structure compatibilities.64 They include linear combinations of terms for hydrogen bonds made by backbone atoms, repulsion among atoms, hydrophobic attraction among non-polar groups and electrostatic interactions among sequential neighbors.65 The sequence of a protein is selected so that it satisfies the energetic and geometric constraints imposed by the desired fold. Constraints typically contain several intramolecular interactions such as van der Waals, hydrophobic, polar and electrostatic interactions, as well as hydrogen bonds. Generally, a scoring function makes it possible to take the energetic contributions of these interactions into account.3,7,65
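As a minimal sketch of what a linear combination of energy terms looks like in practice, the Python snippet below sums weighted contributions into one score. The term names, the weights in `DEFAULT_WEIGHTS` and the example values are hypothetical and are not taken from any published force field or design program.

```python
from typing import Dict, Mapping

# Hypothetical weights for illustration only; real scoring functions fit
# their weights against experimental or structural data.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "van_der_waals": 1.0,
    "hydrogen_bond": 1.0,
    "electrostatic": 0.5,
    "hydrophobic": 0.8,
}


def total_score(terms: Mapping[str, float],
                weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Return the weighted sum of the individual energy terms."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


if __name__ == "__main__":
    example_terms = {
        "van_der_waals": -2.0,
        "hydrogen_bond": -1.0,
        "electrostatic": -4.0,
        "hydrophobic": -5.0,
    }
    print(total_score(example_terms))
```

Keeping the weights separate from the term values makes it straightforward to re-weight individual contributions when comparing candidate sequences on the same backbone.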

De novo design: designing the sequence and 3D structure

Through assembly of protein fragments66,67 or secondary-structure elements,68,69 novel structures can be modeled de novo. In the design procedures, the backbone coordinates are generally constrained.

Table 1 summarizes the important findings for some proteins designed using computational approaches, including a retroaldol enzyme,43 the Kemp elimination enzyme,70 a novel βαβ protein,71 a redesigned procarboxypeptidase,72 a novel α/β protein structure and TOP7.73

Ligand-based peptide design

The ligand-based design has been classified as follows: 1) sequence-based, 2) property-based and 3) conformation-based design.

The sequence-based approach uses the information of conserved regions and analyzes multiple sequence alignments. This method is guided by the hypothesis that conserved regions are functionally and structurally significant.13 Computational tools allow ligand-based peptide design, although they lag behind the bioinformatics strategies developed for protein design.13 Recently, using a method based on a PAM250 matrix, the relationship between a series of 35 collagen peptides and antiangiogenic activity including proliferation, migration and adhesion was analyzed.74 The PAM250 matrix captured information on mutation rates among all pairs of amino acids. Based on the results, regions at the C and N termini of the peptides were found to be significant for ideal activity and suggested as two distinct binding sites. The approach showed the potential value of sequence-based peptide design.74 In another report, a computational platform called SARvision was developed to support sequence-based design. SARvision represents an important step for peptide sequence/activity relationship (SAR) analysis. Moreover, it combines improved visualization abilities with advanced sequence/activity analysis.75

Compared to small molecules, property-based design methods for peptides are in the early stages of development. In a recent study, the ΔG decomposition per residue and the physicochemical characteristics of amino acids, such as hydrophilicity, hydrophobicity and volume, were used to model peptide binding to targets of interest.76,77 Finally, a model was built to estimate peptide ΔG values for binding to the class I major histocompatibility complex (MHC) protein HLA-A*0201.78 Furthermore, in a wide range of studies, antimicrobial peptides were successfully analyzed by using the property-based approach.79 For example, a machine-learning method was employed to design novel antimicrobial peptides.80 The success of the property-based methods with antimicrobial peptides may be explained by the fact that the desired biologic activity of membrane disruption is relatively nonspecific.13
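Property-based models start from per-residue physicochemical descriptors such as those mentioned above. The Python sketch below computes two simple descriptors for a peptide: the mean hydropathy on the Kyte-Doolittle scale and a crude net charge. The choice of this particular scale, the charge rule (Lys/Arg as +1, Asp/Glu as -1, ignoring His and the termini) and the function name are assumptions for illustration; the cited studies may use different descriptors.

```python
from typing import Dict, Union

# Kyte-Doolittle hydropathy values for the 20 natural amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_descriptors(sequence: str) -> Dict[str, Union[int, float]]:
    """Return length, mean hydropathy and an approximate net charge."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
    # Crude charge estimate near neutral pH (assumption: His and termini ignored).
    net_charge = sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)
    return {
        "length": len(seq),
        "mean_hydropathy": mean_hydropathy,
        "net_charge": net_charge,
    }


if __name__ == "__main__":
    print(peptide_descriptors("KKLL"))
```

Descriptor vectors of this kind can then be fed to a regression or machine-learning model that relates them to measured activity.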

In the case of conformation-based peptide design, computational techniques were developed to predict the conformational ensembles or structure of peptides and analyze the SARs.81,82 PEP-FOLD is an online tool used to predict the 3D structures of peptides of length 9–36 residues.81 Notably, the reported data suggest that PEP-FOLD may largely address the conformational sampling problem.13,81

To search the conformational space of a peptide, long-timescale molecular dynamics simulations have been employed.83,84 In addition, quantum mechanical calculations are promising for addressing the scoring deficiency in peptide conformational analysis.85 However, major theoretical and technical issues must be resolved before such computationally sophisticated and costly procedures can positively affect peptide design.

The conformation of a peptide may be modeled to generate a 3D pharmacophore hypothesis. A given pharmacophore hypothesis is useful to determine the ADME/Tox activities or particular potencies of a peptide.86 For example, screening of a peptide library was combined with the generation of a pharmacophore hypothesis to identify potent agonists of melanocortin-4 receptor isoforms. A combinatorial tetrapeptide library was screened, and SAR and ligand-derived pharmacophore templates were generated. The pharmacophore hypothesis was proposed to support continued efforts in the rational design of melanocortin receptor molecules.86

Structure survey

Recently, an increase in the number of protein–peptide 3D structures deposited in the PDB has helped to elucidate the molecular mechanism and structural basis of peptide recognition and binding.87 Information from crystal structures of protein–peptide complexes can improve our knowledge of the chemical forces involved in binding and of specific binding modes. Dynamic data on the complexes can be partially extracted from the solution NMR structures deposited in the PDB. To record the structures and functions of various protein–peptide complexes, the experimentally resolved structure data were gathered, annotated and analyzed, and several distinctive databases such as PepX,88 PepBind89 and PeptiDB were generated.90 The PepX database, derived from the PDB, comprises unique protein–peptide interface collections.88 The PepBind database contains 4,986 protein–peptide complex structures from the PDB.89 PeptiDB is a curated database of 103 protein–peptide complexes.90

The abundant structural information available, specifically on monomeric proteins, could be exploited to design protein–peptide interactions without requiring sequence homology.91

Protein–peptide docking

Precise docking of a highly flexible peptide is a major challenge.18 Traditional docking protocols, such as AutoDock, Vina92,93 and MOE-Dock,94 developed for docking of small molecules, were also used to dock a peptide to a protein receptor. However, comparative studies revealed that these techniques tend to fail if the docked peptides are >3 residues long.95 Therefore, development of peptide-focused docking protocols is very important.96 Protein–protein docking tools such as ZDOCK and Hex have also been used for computational peptide design in some studies.96 Below, details of recently developed peptide-focused docking approaches are discussed.

First, heuristic evolution procedures were applied to search the large conformational space of linear peptides before binding.97 However, these procedures were not efficient and their use was limited.18 Then, schemes based on conformational sampling became common in peptide docking. In addition, several representative approaches were proposed to balance the accuracy and efficiency of flexible peptide docking. In this respect, DocScheme,98 DynaDock99 and pepspec100 were introduced and integrated into user-friendly online interfaces.

Recently, PepCrawler101 and FlexPepDock102 were developed as the peptide docking tools.18 It is reported that FlexPepDock102 has sub-angstrom accuracy in reproducing the crystal structures of protein–peptide complexes.103 All of the FlexPepDock-based methods assume previous information about the peptide-binding site.13

AnchorDock, a recently described algorithm, allows powerful blind docking calculations by relaxing this constraint.104 The program predicts anchoring spots on a protein surface. Following identification of the anchoring spots, an assumed peptide conformation is refined using an anchor-constrained molecular dynamics process.105

HADDOCK, a well-known protein–protein docking tool, has been recently expanded to run flexible peptide–protein docking.105 To guide a docking procedure, HADDOCK uses ambiguous interaction restraints based on experimental information about intermolecular interactions. Rigid-body peptide docking is followed by a flexible simulated annealing refinement. The novel HADDOCK strategy initiates docking computations from an ensemble of three dissimilar peptide conformations (eg, α-helix, extended and polyproline-II) that are highly informative inputs.105

CABS-dock is a recently introduced protein–peptide docking tool that runs a primary docking procedure whose outcomes can be refined by other tools such as FlexPepDock.106 In the primary phase of the procedure, random conformations of a peptide are generated and placed around the protein target of interest. The process is followed by replica exchange Monte Carlo dynamics. Subsequently, 10 models are selected for the final optimization using the Modeller tool to obtain accurate scoring and ranking of poses.13,106

GalaxyPepDock was developed to use experimentally resolved protein–peptide structures for running template-based docking combined with flexible energy-based optimization.107
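For readers unfamiliar with replica exchange, the core of the method is the rule for swapping configurations between two replicas run at different temperatures. The Python sketch below shows the generic textbook acceptance probability, min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T). It is not the CABS-dock implementation; the function name and the reduced units (k_B = 1 by default) are assumptions for illustration.

```python
import math


def swap_probability(energy_i: float, energy_j: float,
                     temp_i: float, temp_j: float,
                     k_b: float = 1.0) -> float:
    """Probability of exchanging the configurations of replicas i and j.

    Generic replica-exchange criterion in reduced units (k_b = 1 by default).
    """
    if temp_i <= 0 or temp_j <= 0:
        raise ValueError("temperatures must be positive")
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0 else math.exp(delta)


if __name__ == "__main__":
    # Cold replica (T = 1) currently holds the higher-energy configuration:
    print(swap_probability(5.0, 1.0, temp_i=1.0, temp_j=2.0))
    # Cold replica already holds the lower-energy configuration:
    print(swap_probability(1.0, 5.0, temp_i=1.0, temp_j=2.0))
```

Swaps that move a low-energy configuration to the colder replica are always accepted, while the reverse swap is accepted only occasionally, which lets high-temperature replicas cross barriers on behalf of the low-temperature ones.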

Atomistic simulation

Atomistic Monte Carlo and molecular dynamics simulations are accurate but computationally demanding techniques for investigating peptide–protein binding interactions. These techniques can also reveal the thermodynamic profile and trajectory involved in protein–peptide recognition. These methods predict the relationship between the conformations of a peptide in solution and in the protein-bound state.108 In one study, a long-timescale molecular dynamics simulation was used to describe the binding of a decapeptide to its cognate SH3 receptor, and a two-state model was built.109 In the first step, a relatively quick diffusion phase, nonspecific encounter complexes were generated and stabilized by electrostatic energy. The second step was a slow modification phase, in which the water molecules were expelled from the space between the peptide ligand and the receptor.109 In another report, a Monte Carlo method was used to trace the binding routes of some oligopeptides to various PDZ (postsynaptic density protein, Drosophila disc large tumor suppressor and zonula occludens-1 protein) domains, which verified the two-state model.110

The affinity of BH3 peptides for the Bcl-2 protein was investigated, and the results showed that peptides bound with higher affinity when they were less disordered in the unbound state, and vice versa.111 These results showed that highly structured peptides could increase their affinity by reducing the entropic loss associated with binding. Overall, in addition to the electrostatic and hydrophobic forces, protein–peptide interactions can be affected by the entropic effect and conformational flexibility, which can be readily examined with atomistic simulations.111

Very recently, using a fast molecular dynamics simulation, the energetic and dynamic features of protein–peptide interactions were studied. In most cases, the native binding sites and native-like poses of protein–peptide complexes were recapitulated. Additional investigation showed that including mobility and flexibility in the simulation could meaningfully improve the accuracy of protein–peptide binding prediction.112

Peptide affinity prediction

Most aspects of computational peptide design depend on the accuracy and efficiency of affinity prediction. Hence, the fast and reliable prediction of peptide–protein affinity is significant for rational peptide design.18 In this respect, two categories of prediction algorithms, sequence-based and structure-based approaches, were developed. The sequence-based method uses the information derived from primary polypeptide sequences to estimate the binding affinity. The structure-based process uses the information derived from 3D structures of protein–peptide complexes to predict the binding affinity.113

At the sequence level, quantitative structure–activity relationships (QSARs) have been widely utilized to predict the binding affinity of peptides and infer their biologic function.114 To model the statistical correlation between sequence patterns and biologic activities of experimentally assessed peptides, machine-learning methods such as partial least squares (PLS), artificial neural networks (ANN) and support vector machine (SVM) have been used. The obtained correlations have been used to predict the activities of peptides that have not been experimentally characterized.115

The relationship between biologic activity and molecular structure is an important issue in biology and biochemistry. QSAR is a well-established method employed in pharmaceutical chemistry and has become a standard tool for drug discovery. However, the predictive capacity of QSAR techniques is generally weaker than that of statistics-based approaches. Therefore, a combination of the QSAR method with a statistics-based technique may bring out the best in both and may be a trend in future developments of drug discovery.114

At the structural level, numerous reports on affinity prediction have addressed the MHC-binding peptides. Numerous MHC–peptide complex structures have been deposited in the PDB.116

The significance of domain–peptide recognition in metabolic pathways and cell signaling has been recently illustrated.117 To predict protein–peptide binding potency, a number of rigorous approaches based on free energy perturbation were suggested. These approaches computed the change in free energy upon the interaction between the phosphotyrosine tetrapeptide (pYEEI) and the human Lck SH2 domain.118 Furthermore, to obtain a deep insight into the structural and energetic aspects of peptide recognition by the SH3 domain, a number of molecular modeling techniques such as homology modeling, molecular docking and molecular dynamics were used.119 Peptide array strategies confirmed that some peptide candidates may be potent binders of the Abl SH3 domain.120 Very recently, an approach including quantum mechanics/molecular mechanics, semi-empirical Poisson–Boltzmann/surface area and empirical conformational free energy analysis was developed to quantitatively describe the energetic contributions involved in the affinity loss of the PDZ domain and OppA protein for their peptide ligands.121,122

De novo peptide design

Recently, two notable methodologies for de novo target-based peptide design were introduced: the VitAL method and an approach developed by Bhattacherjee and Wallin. The VitAL method combines the Viterbi algorithm with AutoDock to design peptides for the binding sites of a target.123 The “Bhattacherjee and Wallin” approach explores both peptide sequence and conformational space around a protein target at the same time.124 This approach was tested on three dissimilar peptide–protein domains to assess its ability.13

A brief list of the existing computational resources employed in peptide design is presented in Table 2.

In silico peptidomimetics design

In recent years, some computational methods have been proposed to design peptidomimetics. These methods can be classified based on their specificity to translate peptides to peptidomimetics.28 To select the best method, awareness about the structure of peptide–protein complexes is important.28,96 Herein, recently introduced methods for computer-aided design of peptidomimetics are presented.

De novo design method

GrowMol is a combinatorial algorithm employed in peptidomimetic design. GrowMol searches a variety of candidate ligands for the binding sites of a target protein125 and produces molecules with chemical and steric complementarity to the 3D structure of the binding sites.

This method was used to generate peptidomimetic inhibitors of thermolysin, HIV protease and pepsin. Using the X-ray crystal structures of pepstatin–pepsin complexes, GrowMol predicted therapeutic peptidomimetics against the aspartic proteases. The algorithm created some cyclic inhibitors bridging the side chains of cysteine residues in the P1 and P3 inhibitor subsites. The binding modes were checked using X-ray crystallography.125,126

LUDI is another notable program that follows the de novo methodology.127 Using natural and non-natural amino acids as building blocks, the software designed peptidomimetics against renin, thermolysin and elastase.127 The conformational flexibility of each novel peptidomimetic was explored by sampling multiple conformers of each amino acid.127

Peptide-driven pharmacophoric method

The peptide-driven pharmacophoric hypothesis is the most intuitive computational technique used in peptidomimetic design. The method is especially useful when X-ray structures of protein–protein complexes exist.28 The main idea is to translate the hot spot concept into the associated pharmacophoric features. Combined with a pharmacophore-based virtual screening process, this strategy can identify novel type 3 mimetics.128 In fact, the side chain of each amino acid can be readily categorized by conventional pharmacophoric characteristics, such as hydrogen bond donors and acceptors, aromatic rings, and charged and hydrophobic centers.
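As an illustration of how side chains can be categorized by pharmacophoric characteristics, the sketch below maps the 20 natural amino acids to coarse feature labels. The table `SIDE_CHAIN_FEATURES` and the helper names are assumptions made for this example; the assignments are a simplification (side chains only, approximate charge states near neutral pH) and do not reproduce the feature definitions of any particular pharmacophore program. The example sequence is the linear form of the cyclic peptide cyclo-[CPFVKTQLC] cited in this review.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

# Coarse, simplified side-chain feature assignments (an assumption for
# illustration; real pharmacophore tools use 3D, atom-level definitions).
SIDE_CHAIN_FEATURES: Dict[str, FrozenSet[str]] = {
    "G": frozenset(),
    "A": frozenset({"hydrophobic"}),
    "V": frozenset({"hydrophobic"}),
    "L": frozenset({"hydrophobic"}),
    "I": frozenset({"hydrophobic"}),
    "M": frozenset({"hydrophobic"}),
    "P": frozenset({"hydrophobic"}),
    "C": frozenset({"hydrophobic"}),
    "F": frozenset({"aromatic", "hydrophobic"}),
    "W": frozenset({"aromatic", "hydrophobic", "donor"}),
    "Y": frozenset({"aromatic", "donor", "acceptor"}),
    "H": frozenset({"aromatic", "donor", "acceptor"}),
    "S": frozenset({"donor", "acceptor"}),
    "T": frozenset({"donor", "acceptor"}),
    "N": frozenset({"donor", "acceptor"}),
    "Q": frozenset({"donor", "acceptor"}),
    "D": frozenset({"negative", "acceptor"}),
    "E": frozenset({"negative", "acceptor"}),
    "K": frozenset({"positive", "donor"}),
    "R": frozenset({"positive", "donor"}),
}


def peptide_features(sequence: str) -> List[FrozenSet[str]]:
    """Return the feature set of each residue, in sequence order."""
    return [SIDE_CHAIN_FEATURES[aa] for aa in sequence.upper()]


def feature_counts(sequence: str) -> Counter:
    """Count how often each pharmacophoric feature occurs in the peptide."""
    counts: Counter = Counter()
    for features in peptide_features(sequence):
        counts.update(features)
    return counts


# Linear sequence of the cyclic peptide cyclo-[CPFVKTQLC].
EXAMPLE_COUNTS = feature_counts("CPFVKTQLC")
```

A real pharmacophore hypothesis would additionally carry the 3D positions of these features, which is why structural knowledge of the complex matters for this method.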

For example, in one report, pharmacophore model-directed synthesis of non-peptide analogs of a cationic antimicrobial peptide identified anti-staphylococcal activity.129 To build the pharmacophore hypothesis, a model of RNA III-inhibiting peptide (RIP), a well-known heptapeptide inhibitor of staphylococcal pathogenesis, was used. Virtual screening of 300,000 commercially available small molecules against the RIP-based pharmacophore identified hamamelitannin as a non-peptide mimetic of RIP. Hamamelitannin is a tannin derivative extracted from Hamamelis virginiana.28,129

In another study, two rounds of in silico screening were performed to discover potential peptidomimetics able to mimic a cyclic peptide (cyclo-[CPFVKTQLC]) that is known to bind the αvβ3 integrin receptor.130 At the end of the process, the most potent representatives were at least 2,000 times more potent than the original cyclopeptide (around 2 mM).130

In a successful example, virtual screening was performed on multi-conformer forms of a large commercial library. A target-based pharmacophoric model mapped the CD4-binding site on HIV-1 gp120. The pharmacophore hypothesis was built from a homology model of the protein cavity. In a cell-based assay, two of the top-scoring molecules were identified as micromolar inhibitors of HIV-1 replication.131

Pharmacophore-based screening was used to find novel Alzheimer’s therapeutics as mimetics of neurotrophins.132 The therapeutic use of neurotrophins may be restricted by several deficiencies, such as their limited central nervous system penetration, low stability and potential to promote neuronal death through interaction with the p75NTR receptor. Mimicking particular nerve growth factor domains could inhibit neuronal death. Peptidomimetics of the loop 1 and loop 4 domains of nerve growth factor can prevent neuronal death induced by p75NTR-dependent and Trk-related signaling.132

In another study, a fully computational pharmacophore-based approach assessed FDA-approved drugs as candidates to inhibit protein–protein interactions.133 Peptide structures were described in terms of pharmacophores and searched against the FDA-approved drugs to detect similar molecules. The top-ranking drug matches contained several nuclear receptor ligands and matched allosterically to the binding site on the target protein. The top-ranking drug matches were docked to the peptide-binding site. The majority of them showed a negative free energy change upon binding that was comparable to that of the standard peptide.133
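To relate a free energy change upon binding to a more familiar affinity scale, the standard thermodynamic relation ΔG° = RT ln Kd (1 M reference state) can be used. The sketch below is a generic conversion and is not taken from the cited study; the function names and the default temperature are assumptions, and docking scores are only rough estimates of ΔG°, so the resulting values should be read as order-of-magnitude guides.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (M) from a standard binding free energy (kcal/mol)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


def dg_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant (M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)
```

Under these assumptions, a micromolar binder corresponds to roughly -8.2 kcal/mol at 298.15 K, and each additional -1.4 kcal/mol tightens binding by about one order of magnitude.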

Geometry similarity method

Geometry similarity methods establish a geometric similarity between non-peptide templates and peptide patches. In one study, the SuperMimic tool was developed to identify peptide mimetics.134 The program contains a library of peptidomimetics alongside several protein structure libraries. Moreover, SuperMimic includes D-peptides, synthetic components (reported as beta-turn or gamma-turn mimetics) and peptidomimetic ligands obtained from the PDB.134 The search process allows scanning a library of small molecules for those that mimic the tertiary structure of a query peptide, followed by scanning a protein library for positions where a query small molecule can be fitted into the backbone.28,134

Sequence-based method

Recently, a method has been developed to rank peptide–compound matches; it is limited to short linear motifs in proteins and to compounds with amino acid-like substituents.135 The algorithm maps the side chain-like substituents on every compound of a large chemical library. The complete molecule can then be represented by a short sequence, in which each fragment of the molecule is denoted by a distinct letter abbreviation.28 A cross-search between the PubChem database (about 5.4 million molecules) and a non-redundant collection of 11,488 peptides obtained from the PDB demonstrated that the algorithm can be useful for high-throughput measurements.28 To identify true positives, the method screened known protein motifs against the National Cancer Institute Developmental Therapeutics Program compound database.135

In another study, the Similarity of Amino Acid Motifs to Compounds web server was developed to ease screening of identified motif structures against bioactive compound databases.136 The methodology was reported to be efficient because the compound databases were preprocessed to maximize the accessible data and the required input was minimal.136 In Similarity of Amino Acid Motifs to Compounds, motif matching can be full or partial, which may decrease or increase the number of potential mimetics, respectively. Using a novel search algorithm, the web service can rapidly screen known or putative motifs against ready-to-use compound libraries. The classified results can be examined by following links to the appropriate databases.28,136
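The difference between full and partial motif matching can be sketched as follows. This is an illustrative toy, not the search algorithm of the web server: it assumes each compound has already been encoded as a string of one-letter codes for its side chain-like substituents, it ignores substituent order and geometry, and the library entries and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def shared_letters(motif: str, compound_code: str) -> int:
    """Number of motif letters that can be paired with compound letters.

    Uses multiset intersection, so a repeated residue in the motif needs a
    repeated substituent in the compound.
    """
    overlap = Counter(motif) & Counter(compound_code)
    return sum(overlap.values())


def match_compounds(
    motif: str,
    library: Dict[str, str],
    min_shared: Optional[int] = None,
) -> List[str]:
    """Return names of compounds matching the motif.

    With min_shared=None every motif letter must be matched (full match);
    a smaller min_shared allows partial matches.
    """
    required = len(motif) if min_shared is None else min_shared
    return sorted(
        name
        for name, code in library.items()
        if shared_letters(motif, code) >= required
    )


# Hypothetical pre-encoded library: compound name -> substituent letters.
EXAMPLE_LIBRARY = {"cpd1": "FYL", "cpd2": "FL", "cpd3": "KE", "cpd4": "LYFW"}
FULL_HITS = match_compounds("FYL", EXAMPLE_LIBRARY)
PARTIAL_HITS = match_compounds("FYL", EXAMPLE_LIBRARY, min_shared=2)
```

In this toy library, the full match keeps only compounds carrying all three substituents, whereas allowing two of three also admits `cpd2`, which illustrates why partial matching enlarges the list of candidate mimetics.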

Fragment-based method

Replacement with Partial Ligand Alternatives through Computational Enrichment is a fragment-based approach.137

By using structures of peptide-bound proteins as design anchors, the program can computationally find a non-peptide mimetic for specific determinants of known peptide ligands.137

Hybrid peptide-driven shape and pharmacophoric method

The development and application of pharmacophore modeling strategies indicate that the medicinal chemistry community has broadly accepted the intuitive nature of the pharmacophore concept. In addition, shape complementarity has been identified as a significant element in molecular recognition between ligands and their targets.28 In virtual screening efforts, using pharmacophore- and shape-based techniques separately may increase the rate of false-positive results.128 Therefore, incorporating both pharmacophore- and shape-matching techniques into one program can potentially reduce the rate of false positives.128

Recently, to discover novel peptidomimetics, a web-oriented virtual screening tool named pepMMsMIMIC138 was developed to combine conventional pharmacophore matching with shape complementarity. A library of 17 million conformers was generated from 3.9 million commercially available chemicals and gathered in the MMsINC database. The database was used as the backbone for developing pepMMsMIMIC.139 In the pepMMsMIMIC interface, the 3D structure of a protein-bound peptide is used as the input. Chemical structures that mimic the pharmacophore and shape of the original peptide are then proposed as candidates to take part in the protein–protein recognition.139
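The rationale for combining both criteria can be sketched with a simple consensus filter. This is not the scoring scheme of pepMMsMIMIC: the Tanimoto comparison of feature sets, the precomputed shape similarities, the cutoffs and the candidate names are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def screen(
    query_features: Set[str],
    candidates: Dict[str, Tuple[Set[str], float]],
    shape_cutoff: float = 0.7,
    pharm_cutoff: float = 0.5,
) -> Dict[str, List[str]]:
    """Return hits passing the shape filter, the pharmacophore filter, and both.

    candidates maps a name to (pharmacophore feature set, shape similarity),
    where the shape similarity to the query is assumed to be precomputed
    and to lie in the range 0 to 1.
    """
    hits: Dict[str, List[str]] = {"shape": [], "pharmacophore": [], "both": []}
    for name in sorted(candidates):
        features, shape_similarity = candidates[name]
        shape_ok = shape_similarity >= shape_cutoff
        pharm_ok = tanimoto(query_features, features) >= pharm_cutoff
        if shape_ok:
            hits["shape"].append(name)
        if pharm_ok:
            hits["pharmacophore"].append(name)
        if shape_ok and pharm_ok:
            hits["both"].append(name)
    return hits


# Invented query and candidates for illustration only.
EXAMPLE_QUERY = {"donor", "aromatic", "positive", "hydrophobic"}
EXAMPLE_CANDIDATES = {
    "m1": ({"donor", "aromatic", "positive"}, 0.82),
    "m2": ({"acceptor"}, 0.90),
    "m3": ({"donor", "aromatic", "positive", "hydrophobic"}, 0.30),
}
EXAMPLE_HITS = screen(EXAMPLE_QUERY, EXAMPLE_CANDIDATES)
```

In this invented example, each single filter returns two candidates, but only `m1` satisfies both, which mirrors the argument that requiring shape and pharmacophore agreement removes hits that match on only one criterion.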

A list of in silico methods used to design potential peptidomimetics along with their strengths and weaknesses is presented in Table 3.


Overall, the design and development of therapeutics are tedious, expensive and time-consuming procedures. Therefore, modern approaches such as computer-aided design can shorten the testing phase and reduce the cost and failure rate of therapeutics discovery. Computational methods used to design amino acid-based therapeutics can widen the range of available biotherapeutics. Benefiting from the dramatic advances in bioinformatics, computational tools can be used to find and develop therapeutic proteins, peptides and peptidomimetics.140,141 Moreover, using computational tools decreases the cost of therapeutics development, from concept to market, by up to 50%.140

However, computational protein design faces some challenges, such as our inadequate knowledge of folding and of the physical forces that stabilize protein structures. Moreover, sequences and local structures have many degrees of freedom, which complicates the sequence search. Therefore, effective methods are required to find sequences compatible with a particular structure and to measure essential protein folding criteria.

Overall, in silico design of amino acid-based therapeutics involves many challenges that should be overcome to improve the overall performance of the design processes. For example, although determining the structures of all disease-related proteins through crystallography and NMR is a laborious task, it is necessary to gather more structural information on peptide–protein interactions. In addition, the development of robust algorithms to calculate protein–protein binding energies is essential. Estimating the binding constant between two macromolecules with an appropriate speed–accuracy tradeoff requires millisecond-scale molecular dynamics. Moreover, understanding of both protein–protein and protein–peptidomimetic recognition processes at the molecular level can be improved using more accurate force fields, such as quantum mechanical polarizable force fields.

In recent years, a growing number of monoclonal antibodies (therapeutic antibodies) have been approved by the FDA for the treatment of various diseases. This important area of amino acid-based therapeutics has been covered in more depth elsewhere.142,143 For further explanation of the theory and details of antibody informatics for drug discovery, as well as computer-aided antibody design, readers are referred to previous reports.142,143