Dataset: 11.1K articles from the COVID-19 Open Research Dataset (PMC Open Access subset)
All articles are made available under a Creative Commons or similar license. Specific licensing information for individual articles can be found in the PMC source and CORD-19 metadata
More datasets: Wikipedia | CORD-19

Logo Beuth University of Applied Sciences Berlin

Made by DATEXIS (Data Science and Text-based Information Systems) at Beuth University of Applied Sciences Berlin

Deep Learning Technology: Sebastian Arnold, Betty van Aken, Paul Grundmann, Felix A. Gers and Alexander Löser. Learning Contextualized Document Representations for Healthcare Answer Retrieval. The Web Conference 2020 (WWW'20)

Funded by The Federal Ministry for Economic Affairs and Energy; Grant: 01MD19013D, Smart-MD Project, Digital Technologies

Imprint / Contact

Highlight for Query ‹SARS-CoV-2 medication

Potential Broad Spectrum Inhibitors of the Coronavirus 3CLpro: A Virtual Screening and Structure-Based Drug Design Study

1. Introduction

Coronaviruses (CoV) affect a diverse group of animal hosts, and cause a plethora of diseases in animals including progressive peritonitis, acute and chronic hepatitis, gastroenteritis, nephritis, and encephalitis. In humans, coronavirus infection results in respiratory tract complications with varying degree of severity and has been associated with gastroenteritis. Four human coronaviruses (HCoV; 229E, OC43, NL63 and HKU1) are endemic in the human population and are mainly associated with mild, self-limiting respiratory illnesses. Another two human coronaviruses, namely severe acute respiratory syndrome (SARS)-CoV and Middle East respiratory syndrome (MERS)-CoV, cause severe respiratory syndromes and present a significant threat with their high fatality rates. The clinical presentation of the four non-severe, endemic, human coronaviruses is largely indistinguishable symptomatically, commonly presenting with rhinorrhea, sore throat, cough and fever. Majority of infections are associated with self-limiting upper respiratory tract disease or “the common cold” but can also present with high morbidity outcomes of the lower respiratory tract including bronchiolitis, pneumonia,, asthmatic exacerbations acute exacerbations of chronic obstructive pulmonary disease (COPD) and croup in HCoV-NL63 infected patients, and commonly results in hospitalization. Febrile seizures have been reported for most human coronaviruses but the significance of HKU1 is alarming with one study indicating that 50% of patients infected with HKU1 experience febrile seizures.

HCoVs affect all age groups but elicit more serious disease in young, elderly and immunocompromised, frequently resulting in hospitalization. Reports on the prevalence of HCoVs and their association with upper and lower respiratory tract disease vary but range between 3.3% and 16%. Over 70% of the general public has seroconverted towards all four non-severe HCoVs with primary infection shown to occur in childhood and reinfection occurring throughout life. Given the high prevalence of respiratory infections, human coronaviruses represent a substantial disease burden, which is exacerbated by the high implications of healthcare workers in coronavirus outbreaks.

The outbreak of SARS-CoV in 2003 and MERS-CoV less than 10 years later highlights an additional significant threat of coronaviruses to humans and confirms that the SARS outbreak in 2003 was not an isolated incident. With the ever increasing diversity of animal coronavirus species, especially within bats, the likelihood of recombination leading to future outbreaks is high and the threat of potential pandemics is real as highly pathogenic coronaviruses continue to spill over from zoonotic sources into the human population. Misdiagnosis of these outbreaks pose a further substantial threat to healthcare workers with nosocomial spread to other patients putting further pressure on an already strained healthcare system. As there is currently no antiviral strategy to combat infection with human coronaviruses, and supportive care is the primary treatment regime, the importance of identifying broad spectrum lead compounds is increasing.

The 3CLpro has proven to be a valuable target in drug discovery efforts and has been validated as an effective drug target in several studies. It has even been termed “the Achilles’ heel of coronaviruses”. A wide variety of inhibitor classes, including peptidomimetic analogues, covalent inhibitors and small molecule inhibitors have been assessed. These compounds have been identified by ab initio structure-based design, high throughput and virtual screening, where inhibitors either target the enzyme active site or the allosteric dimerization domain. The first generation of 3CLpro inhibitors were irreversible peptidomimetic structures, often five residues in length with at reactive warhead at the terminus that formed a covalent bond between the thiolate anion of the catalytic Cys145 residue and the reactive atom of the warhead. These reactive warheads have included Michael acceptors, aldehydes, epoxy-ketones, halo-methyl ketones, and trifluoromethyl ketones. Peptide derivative warhead inhibitors were later followed by the development of non-peptidic covalent inhibitors. The use of covalent inhibitors is however limited due there propensity for off-target side-effects and toxicity. Recent studies have therefore focused more on the development of noncovalent inhibitors, which have generally produced large peptidomimetic compounds with low ligand efficiency and currently there is still no effective therapy for the treatment of HCoVs. All coronavirus 3CLpro share a high sequence homology, as well as main chain architecture and substrate conservation, which makes the identification of broad spectrum lead compounds more viable.

The substrate binding site of the 3CLpro has two deeply buried S1 and S2 subsites, as well as shallow S1′, S3 and S4 subsites with varying degrees of solvent exposure. Substrate specificity of coronavirus 3CLpro is mainly determined by the P1, P2 and P1′ positions. The P1 position has an absolute specificity for glutamine which stabilizes the S1 subsite via a hydrogen bond with the imidazole Nε2 of His162/3 and van der Waals interactions with surrounding residues of the S1 pocket. The P2 site has a preference for leucine or methionine to fill the hydrophobic S2 pocket. The sidechains of the S3 site are solvent-exposed and therefore this site is expected to tolerate a wide range of functionality, but shows a preference for basic residues. Sidechains and backbones of residues surrounding the S4 site create a highly congested pocket which favors a small, hydrophobic residue in the P4 position, either Ser, Thr, Val or Pro. The S1′ and S2′ subsites also accommodate small residues in the P1′ and P2′ positions, which may include Ser, Ala or Gly. A typical cleavage recognition site is therefore (Ser, Ala)-(Val, Thr)-Leu-Glu ↓ (Ser, Ala, Gly), which is conserved among all coronavirus 3CLpro. These features can therefore be exploited in the design of potential broad spectrum lead compounds.

2.1. Consensus Docking and Scoring with Vina, Glide, Gold and Molecular Mechanics-Generalized Born Surface Area (MM-GBSA)

Crystal structures of the 3CLpro for 229E, NL63, HKU1 and SARS-CoV were obtained from PDB. The 3CLpro of OC43 was obtained by homology modeling. Vina performed the initial high throughput screen of the Drugs-Now dataset from ZINC, comprising a total of approximately 6.5 million compounds. Ligand files were obtained in 3D SDF format and converted to PDB with Open Babel. Ligand PDB files were then converted to PDBQT with the script.

The initial virtual screen was conducted with 2ZU2 (3CLpro of HCoV-229E) as the receptor as it comprises the largest binding pocket and therefore should identify the largest amount of primary leads. The bound ligands of apo structures were removed to create an appropriate cavity within the binding site. For docking with Vina, the receptor was processed with Autodock Tools where water molecules were removed and polar hydrogens and Gasteiger charges were added. Histidine was protonated in the neutral state at the epsilon nitrogen (Nε2) in accordance with crystal data. The grid box for searching was centered on the active His41 residue with grid box spacing set to 1 Å and extended 26 grid points in each direction to encompass the entire binding pocket. The top ranked 1500 ligands from the virtual screen formed the primary hit-list as this was an acceptable quantity for the medium throughput analysis and subsequent refinement. The lowest binding score within the hit-list corresponded to a binding affinity of −9.5 kcal/mol and therefore all compounds with an equivalent binding affinity were progressed.

For docking and later refinement with Glide the 3CLpro structures were prepared in Maestro with Schrödinger’s Protein Preparation Wizard. Receptors were minimized by a restrained minimization using the OPLS2005 force field, during which, heavy atoms were restrained to remain within 0.3 Å of the original structure. To define the binding site for docking with Glide, each structure was aligned and an active inhibitor (obtained from the crystal structure of the HKU1 3CLpro) was superimposed over the active site. For docking in Glide the receptor grid was generated around the superimposed ligand to accurately encompass the binding pocket. The grid box was generated 10 Å in each direction and a constraint was set for the ligand to form a hydrogen bond with the pivotal His162 residue of the S1 active pocket. All other settings remained default to Glide receptor grid generation. Glide docking was performed with default settings using standard precision (SP) mode with the fulfilment of the previously defined constraint. Ligands which did not fulfil this constraint were automatically excluded. A binding score cut-off, defined as −7 kcal/mol, was used to further filter the hit-list. This cut-off was predetermined based on the binding affinity of a known inhibitor.

Filtered compounds were subsequently imported into GOLD for redocking into the 2ZU2 structure prepared in Maestro where the active site was again defined by superimposition of the active ligand. An automated docking procedure with default parameters were used including 50 docking attempts and clusters defined by heavy atoms of ligands separated by >0.75 Å root-mean-square deviation (RMSD). The GOLDScore fitness scoring function was used to rank ligands based on a GOLDScore cut-off of 60. To ensure an accurate prediction of binding pose, the RMSD between the poses predicted by Vina, Glide and GOLD were measured using the script of Schrödinger. Ligands where the RMSD was greater than 2 Å between predicted poses were excluded.

The Prime/MM-GBSA method was used to calculate the binding free energy of each ligand within the docking complex, where the pose predicted by Glide was utilized. The free energy of binding is the calculated difference between the minimized receptor-inhibitor complex and the sum of the energies of the minimized unbound inhibitor and receptor. Ligand poses were minimized using the local optimization feature in Prime, where energies of the complex were calculated with the OPLS-2005 force field and GBSA continuum solvent model. As a final filter a binding free energy cut-off of −80 kcal/mol was defined by determining the free energy of binding of a known active inhibitor.

2.2. Pharmacophore Model for VS

Pharmacophore tools of Molecular Operating Environment (MOE) were used to define 3D pharmacophoric features using the 3IH ligand and 3D23 structure for the initial high throughput virtual screen. Features included an essential hydrogen bond with the His163 residue and a hydrophobic moiety in the P2 position. Excluded volumes of 1 Å were set to better consider the steric effects. Non-essential features included a hydrogen bond donor and acceptor in the Glu166 position, a hydrogen bond acceptor in the P4 pocket, a single hydrogen bond donor in the P2 and P3 pocket and an acceptor in the P1′ pocket. A total of four pharmacophoric features were required within the excluded volumes of 1 Å. The pharmacophore model again screened the Drugs-Now dataset. Ligands satisfying the pharmacophore query within an RMSD of 0.5 Å were protonated with Open Babel and steric clashes caused by the ligand in order to satisfy pharmacophore constraints in the absence of explicit hydrogens were removed by refine only docking with Glide. A hydrogen bond constraint with His163 was used as a primary filter. Compounds were immediately assessed with the MM-GBSA scoring function, to avoid possible false negatives of the less reliable docking scoring functions, where the previously described MM-GBSA cut-off of −80 kcal/mol was again implemented.

2.3. Characterization of Broad Spectrum Potential

The above mentioned techniques conducted a high throughput virtual screen of the ZINC Drugs-Now dataset by docking and pharmacophore modeling. The structure-based screen identified potential binders of 2ZU2, where the ligand-based approach focused on 3D23. To identify which of these compounds would potentially form broad spectrum inhibitors the remaining three 3CLpro structures were incorporated into the study. All five structures were aligned and ligands identified by docking and pharmacophore modeling were superimposed over individual active sites. Side chains of residues in the active pocket assuming orientations that resulted in a steric clash with the ligand were refined and the ligand was minimized with Prime to better fit the pocket. MM-GBSA methods described previously were again applied to predict the free binding energy of the complex. As before, a binding energy of −80 kcal/mol was used as a cut-off and the hydrogen bond constraint in the S1 pocket was again applied to discriminate between potential binders and non-binders. In an effort to not miss potential leads the interpretation of certain parameters were left flexible if other parameters supported the advancement of the compound.

2.4. Molecular Dynamic Simulation

The CHARMM27 all atom force field was assigned to receptor structures using the three point TIP3P water model. All input hydrogens in the coordinate file were ignored and were assigned according to the force field. Histidine protonation states were assigned to the epsilon nitrogen in the neutral form in accordance with crystallographic data. The ligand, in the pose predicted by molecular docking, was prepared by SwisParam. The protein-ligand complex was then placed in the center of a solvated, cubic box, 1 nm from the box edge. The system was solvated with the spc216 water model and sodium counter ions were added to neutralize the system. The system was energy minimized by a brief dynamics simulation using Steepest Descent Minimization until steepest descent converged to Fmax <1000. Long range electrostatic interactions were calculated with the Particle Mesh Ewald (PME) method with a cutoff of 1 nm and periodic boundary conditions were applied. Both NVT and NPT equilibrations were run for 50,000 steps or 100 ps using a 2 fs time step and a leap-frog integrator. All bonds were constrained by the LINCS algorithm. PME was again used for long range electrostatics with a 0.16 Fourier spacing. Short range electrostatic and van der Waals cutoffs were 1.0 nm and 1.4 nm respectively. Coordinates, velocities and energies were saved every 0.2 ps. During NVT equilibration temperature coupling was achieved by V-rescale algorithm with a target temperature of 300 K. Pressure coupling during NPT equilibration was achieved via the Parrinello-Rahman algorithm and is incorporated in the NPT equilibration once temperature is stable to ensure the proper density is reached (approximately 1000 kg/m3). Following NVT and NPT system equilibration an unrestrained, 10 ns simulation (5,000,000 steps) was run with the leap frog integrator, saving coordinates, velocities and energies every 2 ps. All other parameters were maintained from the NVT and NPT equilibration including both temperature and pressure coupling.

3.1. Starting Structures

The 3CLpro of 229E was previously solved in complex with an inhibitor by X-ray crystallography at a high resolution of 1.8 Å. The corresponding crystallographic structure, with Protein Data Bank (PDB) code 2ZU2, shows a dimeric structure with the inhibitor, zinc N-ethyl-N-phenyldithiocarbamate (EPDTC), coordinated to the catalytic dyad residues of His41 and Cys144 with a zinc-centered tetrahedral geometry. The apo form of the NL63 3CLpro was solved in its dimeric form by X-ray crystallography to a high resolution of 1.6 Å with the respective PDB code, 3TLO. The HCoV-NL63 3CLpro structure (3TLO) however remains unpublished in any academic journals. Recently, our group solved and published the 3CLpro structure of OC43 by homology modeling.

The PDB code, 3D23, represents the structure of the 3CLpro of HKU1. It was previously solved by X-ray crystallography to a resolution of 2.5 Å in complex with a peptidomimetic, covalent inhibitor. The PDB entry contains a total of four chains, two chains form the active dimeric structure with the remaining two found in monomeric forms. Each chain contains the covalent inhibitor bound to the active site which undergoes nucleophilic attack of the active cysteine Sγ to form an irreversible, covalent bond. As this inhibitor possesses several pharmacophoric features common to the natural substrate and favorable to drug development it was utilized to develop a pharmacophore model for virtual screening.

There are several structures available which represent the SARS-CoV 3CLpro. However, the majority of these include structures in complex with a peptidomimetic, covalent inhibitor or mutational studies. At the time this study commenced, only one structure could be identified in complex with a small molecule inhibitor. This structure, with PDB code 3V3M, also represented the only coronavirus 3CLpro with a noncovalent inhibitor. As this structure could be used for necessary benchmarking in latter studies, it was chosen to represent the SARS-CoV 3CLpro. The crystallographic structure contains a noncovalent, furyl amide ligand, ML188, which was identified by a high throughput screen, with subsequent lead optimization based on structure-activity relationships within the S1, S1′ and S2 binding pockets.

3.2. Defining Benchmarks and Determining Accuracy of Vina and Glide

Molecular docking algorithms are often calibrated on a training set of experimental ligand-protein complexes and accuracy of these docking programs is often highly dependent on the training set used. This highlights the importance of confirming that the docking software used for virtual screening is capable of replicating the binding mode of a known, experimental inhibitor for the class of enzymes studied.

In order to confirm that the molecular docking algorithms used in this study are well suited to the 3CLpro of human coronaviruses, and their related class, a previously solved experimental structure with a bound, noncovalent inhibitor is required. Accurate replication of the experimental binding pose by molecular docking will confirm the suitability of the docking algorithm for virtual screening. Initial studies on structure-based design of inhibitors, targeting the 3CLpro of coronaviruses, were focused on peptidomimetic compounds with a reactive warhead group that formed an irreversible, covalent interaction with the active Cys residue. These large, covalent inhibitors would be incompatible with molecular docking as replication of the quantum mechanics necessary for covalent bond formation is outside the capabilities of molecular docking. From the PDB several experimental structures of the 3CLpro of various species of coronaviruses with bound inhibitor could be identified. Almost all of these structures represented large, peptidomimetic, covalent inhibitors, with the exception of 3V3M. The structure of 3V3M represented the only noncovalent inhibitor of modest size that we could identify from the PDB. Replication of the binding mode of the bound inhibitor, ML188, by the docking algorithms used in the virtual screen and subsequent refinement process would indicate that they are well calibrated for this class of receptor.

Default parameters applied to the high throughput virtual screen for Vina and refinement with Glide were applied to the inhibitor and receptor of 3V3M. An RMSD below 2 Å for heavy atoms (excluding hydrogens) between the experimental structure and predicted pose of docking is a well-defined benchmark to assess the accuracy of molecular docking sampling algorithms. Figure 1 represents the experimental pose of the ML188 ligand with overlaid predicted pose of Vina and Glide.

Vina was able to accurately predict the pose of ML188, where the top ranked pose was within an RMSD of only 0.244 Å to the experimental structure within the pocket of 3V3M (Figure 1a). The orientation of the P1′ furan is somewhat distorted with the conformation of the ring structure being assumed to orientate at a right angle to the experimental pose. The overall predicted pose of Vina corresponds to the experimental pose with a high level of accuracy. Glide was also capable of accurately predicting the pose of ML188 within the pocket of 3V3M within an RMSD of 0.79 Å (Figure 1b). This indicates that sampling by both Vina and Glide are well calibrated for the class of enzyme studied and, as the remaining structures are highly homologous to 3V3M, they will achieve an equivalent level of accuracy. These results strongly support the use of Vina as high throughput tool for virtual screening and indicates that with the use of Glide as a subsequent refinement tool, the accuracy of predicting the binding pose of novel ligands will be high.

The ML188 lead compound also provided a means to set benchmarks for scoring functions and free energy of binding estimation. The inhibitor achieved a binding affinity estimation of −8.1 kcal/mol when bound to 3V3M. In an effort to reduce the numbers of false negatives this value was decreased to −7 kcal/mol for filtering purposes. For MM-GBSA scoring it is advisable that the threshold value is defined by an experimental structure. The highly homologous structure of 3V3M with the known active inhibitor ML188 was therefore scored with MM-GBSA. The free energy of binding between 3V3M and ML188, where ligand strain was taken into account, was equal to −85 kcal/mol. As this ligand is highly suited towards the structure of 3V3M, where our goal is the identification of broad spectrum inhibitors, the cut-off value was decreased to −80 kcal/mol. This was done to ensure that ligands with potential broad spectrum activity were not missed based on their predicted inability to bind to a single receptor.

3.3. Molecular Docking

Vina was chosen as the tool to conduct the initial high throughput screen of the approximately 6.5 million compounds of the Drugs-Now dataset of the ZINC database. For the initial screen, 2ZU2 was selected as the receptor to identify the highly refined subset of potential inhibitors. This receptor possessed the largest binding pocket of all structures when analyzed by SiteMap (data not shown) and it was therefore proposed to provide the largest number of initial hit compounds. The primary hit-list consisted of the top 1500 ranked compounds from the high throughput virtual screen (HTVS) of the ZINC Drugs-Now dataset conducted by Vina, a quantity where further refinement and analysis would be achievable. Ranking the initial virtual screen based on docking score alone serves as a starting point in the selection criteria and assumes that the predicted binding affinity somewhat reflects true binding affinity. The lowest binding score within this hit-list corresponded to −9.5 kcal/mol. Scoring functions have shown success in differentiating between binders and non-binders by ranking the binding score. For this reason, all ligands which achieved this score were also included in the study, yielding a total of 1527 hit compounds. Further assessment and refinement of this initial hit-list of 1527 ligands was conducted by redocking with Glide and GOLD and scoring with their respective scoring functions and MM-GBSA.

To improve the likelihood of identifying actual active compounds, three sampling algorithms and four scoring functions were applied to the hit-list identified by Vina in a consensus based approach. This approach has frequently improved enrichment rates of active compounds in a virtual screening context, in comparison to the use of a single algorithm for docking and scoring. The initial stage of refinement included redocking with Glide under standard precision and rescoring with its internal Glide GScore scoring function. In order to define a threshold binding score cut-off to prefilter hits, the crystallographic structure of 3V3M, with bound inhibitor ML188, was used. As most scoring functions cannot be compared between structures it is important to define a binding score cut-off based on a related experimental structure. The inhibitor ML188 achieved a binding affinity estimation of −8.1 kcal/mol when bound to 3V3M. In an effort to reduce the numbers of false negatives this value was decreased to −7 kcal/mol. As a subsequent stage of hit-list refinement the identified hit compounds and 2ZU2 receptor were directly imported into GOLD for redocking and rescoring. A GOLDScore Fitness cut-off was defined as 60 and again used to prefilter hits.

At this stage the hit-list was refined to a point where the binding pose of each hit could be analyzed by manual inspection. The natural substrate of the 3CLpro forms a hydrogen bond with His162/3 in the S1 pocket where the S2 pocket is filled by a hydrophobic moiety. Each ligand was therefore assessed with the Pose Filter script of Schrödinger and manually in Maestro to confirm that each ligand identified in the hit-list met these pharmacophoric constraints. These comprehensive filters of three sampling and scoring algorithms and pharmacophoric feature constraints provided us with a highly defined hit-list of 116 compounds. Assessing the free energy of binding with the highly advanced MM-GBSA scoring function would provide further confidence that the hits identified do in fact represent high-likelihood active compounds.

Rescoring the final hit-list with MM-GBSA identified a total of 22 ligands that passed all filters and pharmacophoric constraints for the 2ZU2 receptor. This included a total of three sampling algorithms and four scoring functions from Vina, Glide and GOLD and the highly advanced MM-GBSA, respectively, and two pharmacophoric constraints with a hydrogen bond acceptor in the S1 pocket and a hydrophobic moiety filling the S2 pocket. This consensus docking and scoring approach is in comparison to several studies, which identified novel lead compounds from a virtual screen, where only a single molecular docking algorithm was incorporated. The comprehensive list of techniques used in this study heightens the probability that the ligands identified in the final hit-list do represent novel scaffolds of active molecules and with the high homology between the 3CLpro of various human coronaviruses these compounds should exhibit a high potential for broad spectrum effects.

3.4. Pharmacophore Modeling

Varying techniques in drug design have led to the identification of different ligands from the same dataset and even identification of divergent scaffolds. The use of multiple techniques therefore improves the probability of identifying a diverse set of lead compounds. The pharmacophore model was based largely on a single covalent inhibitor. This inhibitor was chosen as it possesses several favorable noncovalent features for the generation of a pharmacophore model. As general practice, several inhibitors are often used to generate a pharmacophore model where features that are frequently repeated are included in the pharmacophore. For 3CLpro these features are well documented and the presence of several active ligands would only serve to replicate the prior knowledge. The essential features of the pharmacophore model included a hydrogen bond acceptor in the P1 position and a hydrophobic moiety in the P2 position. Non-essential features included a hydrogen bond donor and acceptor in the position to interact with Glu166, a hydrogen bond acceptor in the P4 pocket, a single hydrogen bond donor in the P2 and P3 pockets and an acceptor in the P1′ pocket. A total of four features, including the two essential features and any two of the non-essential features, within an excluded volume of 1 Å were required for a potential hit to satisfy the pharmacophore model.

Molecular Operating Environment (MOE) was used to generate the pharmacophore model and conduct the virtual screen against the entire ZINC Drugs-Now dataset. To ensure a primary hit-list of manageable size, only ligands satisfying the pharmacophore within an RMSD of 0.5 Å where filtered to the initial hit-list, which included approximately 25,000 compounds. Since MOE utilizes a united atom force field, it was necessary to add hydrogens with Open Babel. The protonation of the ligands generated several steric clashes within the active pocket of 3D23 and refine-only docking with Glide was therefore used to remove these clashes. As the pharmacophore model only filters compounds based on the presence of a hydrogen bond acceptor in the P1 position, and does not take into account distance and geometry essential for the formation of a hydrogen bond, the Pose Filter script of Schrödinger was again used to further refine the hit-list, which now contained 3569 hits. Scoring this hit-list with MM-GBSA further reduced its size to 15 compounds with the previously defined cut-off value.

3.5. Identification of Potential Broad Spectrum Inhibitors

The two virtual screening approaches, including molecular docking against 2ZU2, and pharmacophore modeling with the active inhibitor of 3D23, provided us with 22 and 15 hits respectively. Initial attempts to redock these hits into the individual binding pockets of the remaining four structures in order to identify potential broad spectrum inhibitors proved highly unsuccessful. Neither the binding mode nor binding affinity or free energy of binding could be replicated for any complex studied. This finding was in contradiction to our hypothesis that the high degree of homology between various active sites should allow for development of broad spectrum inhibitors.

The static nature of the receptor during docking did not allow for any degree of the induced-fit phenomena, which is commonly observed during ligand or substrate binding. Although the residues within the active site of individual 3CLpro are highly homologous, the orientation of their sidechains is variable between structures, which hinder ligand binding. In a dynamic, physiological system, these sidechains would accommodate ligand binding by the induced-fit phenomena, however in molecular docking this process is circumvented by the static nature of the receptor. This was the sole reason for the inability to replicate the binding mode, seen in 2ZU2 and 3D23, in the remaining structures. The orientation of residue sidechains inserting into the active pocket were highly variable and the inability to replicate the dynamic movements of these sidechains during docking led to the highly variable and inefficient binding modes with decreased binding affinity.

To overcome this short fall of molecular docking, each receptor was aligned and ligands were superimposed over individual active sites using coordinates identified while docking to 2ZU2 or refine-only docking with 3D23. Residue sidechains resulting in a steric clash with the ligand were refined using the Prime module of Schrödinger, which replicates their dynamic, flexible movements allowing the sidechains to accommodate the ligand in the pocket. The ligand was then minimized within the pocket to remove any further steric clashes.

The binding mode predicted using this approach was then further assessed using the pharmacophoric constraints of a hydrogen bond in the S1 and a hydrophobic moiety in the S2 pockets and scoring with MM-GBSA where the cut-off previously defined was again implemented. Of the 22 compounds identified in the hit-list of potential 2ZU2 binders, 15 compounds could replicate the interactions in the S1 and S2 pockets and fulfil the MM-GBSA cut-off (Table 1), where four of the compounds identified by pharmacophore modeling could fulfil these requirements (Table 2). This final list of 19 ligands represented 15 distinct clusters (Figure 2). Free energy of binding between a known active inhibitor, ML188, and respective coronavirus 3CLpro is also shown in Table 1 and Table 2 for comparative purposes.

3.6.1. Candidate Inhibitor Identified by Molecular Docking, ZINC27332786

The conformation of the ZINC27332786 ligand, as identified by molecular docking results, indicated that it binds to the active site and forms similar key interactions throughout all five proteases studied. A pyridine filled the S1 pocket and formed a hydrogen bond between the pyridine nitrogen and imidazole Nε2 of His163. A 1,2,4-triazole formed a linker in the center of the structure and mediated the formation of a hydrogen bond between N2 and the backbone of Glu165/6, which formed a third hydrogen bond via the backbone carbonyl and ligand NH. OC43 3CLpro exhibited a fourth hydrogen bond between N4 of the 1,2,4-triazole and Gln189. This residue is conserved in 3D23 and 3V3M structures, however it was orientated towards the solvent exposed exterior of the pocket. In 2ZU2 and 3TLO the position of Gln is replaced by Pro188/9, which therefore did not contribute to the formation of a bond. Two 2-chloro-6-fluorobenzyl rings also filled the hydrophobic S2 and S4 pockets (Figure 3).

Molecular dynamics studies indicated that for 3V3M all these interactions were highly conserved, forming an average of three bonds over the simulation with high occupancies between His163 and both bonds at Glu166. The remaining four structures exhibited similar profiles however with varying degrees of occupancy, with an average bond formation of 2.5 for 2ZU2, 1.7 for 3D23, 2.6 for 3TLO and 2.1 for OC43 3CLpro. All structures displayed high occupancy at both Glu166 positions, however a decreased occupancy at the His162/3 position. OC43 3CLpro and 3D23 observed the lowest occupancy of 13.63% and 7.27% respectively and 2ZU2 and 3TLO displayed moderate occupancies of 25.30% and 41.05% respectively. The loss of hydrogen bond formation at this position was often caused by the pyridine ring orientating at a right angle to the His162/3 Nε2. With the exception of 3TLO all structures also formed hydrogen bonds with moderate frequency with Gln191/2. These bonds were predominantly formed via the backbone, however 2ZU2 was capable of forming with both the backbone and sidechain, where the sidechain exhibits a higher frequency. The bond formed between Glu189 and the N4 of the 1,2,4-triazole, as seen in docking results for OC43 3CLpro, exhibits a relatively low occupancy of 17.95%.

This indicated that results achieved from molecular dynamics were in relatively strong agreement with that of molecular docking, where all bonds were formed with high frequency, with the exception of the bond between the pyridine N and His162/3, which remained present, however at lower occupancy. The compound does, however, still provide an excellent scaffold for lead optimization with the substructures filling the S2, S3 and S4 subsites making extensive and stable interactions with surrounding residues including two hydrogen bonds between the central 1,2,4-triazole and NH and the backbone of Glu165/6. Further lead optimization may explore the replacement of the pyridine ring with the related lactam ring, which is known to mediate two hydrogen bonds, one between the ligand carbonyl and the His162/3 Nε2 and a second hydrogen bond between the lactam NH and Glu166. The presence of this second hydrogen bond and van der Waals interactions, mediated via ring substituents, may further stabilize the compound within the S1 pocket. A phenylalanine substituent or imidazole ring, has also shown to improve the inhibitory effects of peptidomimetic inhibitors.

Analysis of ligand RMSD indicated that all structures deviated to some extent from their starting structures at the beginning of the simulation, but once in this position were stable and only minor fluctuations were observed from this point. Solvent accessible surface area of the ligand also remained consistent throughout the duration of the simulation and fluctuated within a range of 4.8 nm2–5.4 nm2 (Figure 4). Assessment of ligand radius of gyration also indicated that the compactness of the ligand did not alter greatly during the 10 ns simulation and fluctuated within 4.6 Å and 4.9 Å for all structures. This again indicated that the ligand is stable within the pocket and did not dissociate into surrounding solvent.

3.6.2. Candidate Inhibitor Identified by Pharmacophore Modeling, ZINC09411012

The 3CLpro structures with bound inhibitor identified by pharmacophore modeling again indicated conserved interactions between the receptor and ligand. As with the previous compound a pyridine ring fills the S1 pocket and formed a hydrogen bond with the imidazole Nε2 of His162/3. A backbone carbonyl of His164 for 3V3M formed a hydrogen bond with a ligand NH. This residue was replaced by a Gln in all remaining structures, however a backbone carbonyl of these two residues were perfectly aligned thereby allowing the formation of the hydrogen bond in all structures. The backbone NH of Glu165/6 again formed a hydrogen bond with a ligand carbonyl. A fourth hydrogen bond is observed in 3V3M, 3D23 and OC43 3CLpro between the sidechain of Gln189 and a ligand carbonyl. The position of this Gln in 2ZU2 and 3TLO was not aligned structurally to 3V3M, 3D23 or OC43 3CLpro, with structural superimposition showing a Pro in this position accounting for the loss of the hydrogen bond in 2ZU2 and 3TLO. Two methyl groups insert into the deep S2 pocket and made extensive hydrophobic interactions with surrounding residues. A potential problematic feature of this ligand was a solvent exposed 1,3-dioxolane, which made few interactions with surrounding residues. Further lead optimization steps may explore the removal of this substituent to ascertain any structure-activity relationship (Figure 5).

Molecular dynamics trajectories again showed a strong correlation with that seen in our pharmacophore models, with the exception of the 2ZU2-ZINC09411012 complex. In this structure the pyridine ring gradually dissociated out of the S1 pocket at around 4 ns followed by almost complete dissociation of the ligand from the active site, where the 1,3-dioxolane became completely solvent exposed. The two methyl groups that filled the hydrophobic S2 pocket remained the only substructures of the ligand to maintain any structural conformation in the active subsite, indicating the importance of this group in ligand stability. The orientation of all S1 subsite residues including His162, Phe139, Glu165, His171 and Tyr160 remained unaltered and therefore could not attribute to this loss of stability. The movement may most likely be attributed to by an initial minor loss of stability caused by the movement of Ala143 closer to the pyridine ring, with subsequent bond formation between the Ala backbone and pyridine N. This movement destabilized the pyridine ring, which subsequently formed a new hydrogen bond with Gly142. At this point the pyridine had almost completely lost its orientation in the S1 pocket, which resulted in further loss of the van der Waals interactions leading to complete dissociation from the pocket (Figure 5). This movement also corresponded to a gradually increasing RMSD, assuming few stable conformations, indicative of an unstable ligand.

The position of Ala143 in 2ZU2 is transformed to Ser in 3V3M, 3D23 and OC43 3CLpro, suggesting that the hydrophobic sidechain of the Ala in 2ZU2 may have promoted the movement towards the pyridine, encouraging rotation of the nitrogen towards the Ala backbone. The orientation of the pyridine ring in 3TLO is somewhat stabilized by a bifurcated interaction with the sidechain of Glu166 which promoted the movement of the pyridine N away from Ala143 without disrupting the interaction with His162. Lead optimization steps may therefore preferentially incorporate a core structure that prevents rotation of the pyridine by resulting steric clashes. As mentioned previously, the exploration of a phenylalanine substituent, imidazole or lactam ring in this position may also contribute positively to the stability of the ligand within the S1 pocket.

The remaining structures did however show stability of the ligand within the active pocket, where the occupancy of the hydrogen bond with the Nε2 of His162/3 ranged from 64% for 3TLO, 75% for OC43 3CLpro, 83% for 3V3M and 89% for 3D23. Interactions with the Glu165/6 were also highly conserved with a frequency of 86% for 3TLO, 91% for OC43 3CLpro and 3V3M and 94% for 3D23. The position of 164, occupied by a His in 3V3M and a Gln in the remaining structures, showed varying degrees of bond formation, with the presence of the His in 3V3M resulting in a significant reduction in bond formation. Gln showed moderate, 41% for 3D23, to high, 88% for 3TLO, occupancy with OC43 3CLpro having an occupancy of 61%. The sidechain of Gln189 was shown to form a hydrogen bond with the ligand in 3V3M, 3D23 and OC43 3CLpro, however this residue was highly solvent exposed and therefore resulted in a high degree of flexibility of the sidechain, which should hinder the formation of this bond. For 3D23 and OC43 3CLpro this was observed, where the frequency of bond formation was 27% and 6% respectively. In 3V3M the ligand was however able to stabilize the sidechain to maintain the formation of the bond with a high degree of frequency at 73%.

Ligand RMSD again showed similar profiles to the previously described complexes, where initially a large movement was observed followed by subsequent stabilization of the ligand within the active site. Structures, including 3V3M and 3D23, observed significant fluctuations within their stable conformation which was largely attributed to by the presence of the solvent exposed dioxolane moiety, which had few features and interactions to stabilize it within the pocket. As this structure contributed few interactions, future studies may explore the removal of this substituent to ascertain its effects on stability. Radius of gyration again confirmed RMSD results where all structures fluctuate between 4.7 Å and 5.0 Å, with the exception of 2ZU2 which deterred greatly at approximately 6 ns that signified the point at which the pyridine ring completely dissociated from the S1 pocket. This however did not result in an increase in solvent accessible surface area where all structures fluctuated between 3.8 nm2 and 4.4 nm2. With the exception of 2ZU2, these structures confirmed, with great confidence, the interactions and conformations observed within our rigid pharmacophore model. The representative ligand identified by molecular docking was also able to largely fulfil the interactions and orientations observed.

4. Discussion

Lead compounds that have shown promise as potential inhibitors of the 3CLpro of coronaviruses are largely represented by peptidomimetic compounds that covalently alter the thiolate anion of the active Cys residue. These compounds are generally large and have a high likelihood of off target effects through covalent modification and, as yet, there remains no commercially available molecular entity for the treatment of coronaviruses. Few studies have focused on the development of broad spectrum, noncovalent inhibitors and, to our knowledge, none have aimed to prioritize the identification of broad spectrum lead compounds that inhibit the 3CLpro related to the coronaviruses that most frequently affect humans, namely 229E, OC43, NL63 and HKU1. For this reason, there is insufficient knowledge surrounding the structure of broad spectrum inhibitors targeting this family of proteases. This study, focusing on these and the coronavirus associated with SARS, provides valuable insight into the structure of ligand scaffolds of potential noncovalent inhibitors.

Different techniques used in virtual screening have led to the identification of a diverse set of lead compounds, where one technique is able to identify compounds the other may have missed. For this reason, we used a joint docking and pharmacophore modeling based virtual screening strategy to identify a highly defined list of potential lead compounds. Molecular docking and pharmacophore modeling are extremely useful tools in the design and identification of novel lead compounds and have relatively low infrastructure and fiscal dependence. Its importance in the pharmaceutical industry cannot be overlooked even though it does have several shortcomings. Fundamental flaws in these structure-based techniques include the static nature of the receptor and the implicit solvent models used for virtual screening. Molecular dynamics is the most advanced in silico tool, which allows us to fill this void by incorporating Newtonian physics to simulate the dynamic movements of proteins in an explicit solvent model thereby replicating the native environment of the protein. Molecular dynamic simulations also provide a mechanistic explanation for any ligand inactivity, which would be impossible to identify with in vitro techniques. In this study, it has identified the key feature responsible for the destabilization of the ZINC09411012 ligand within the 2ZU2 binding pocket and thereby allowed us to identify which substituents should be replaced in further rounds of lead optimization. As ligand inactivity may be a result of a single misplaced substituent this mechanistic view point provides extensive information for future lead optimization.

The essential pharmacophoric constraints, including an acceptor in the P1 and hydrophobic moiety in P2, used in both docking and pharmacophore modeling drug discovery methods would be expected to generate a hit-list with similar scaffolds. The final hit-list, containing a total of 19 ligands (15 from molecular docking and four identified by pharmacophore modeling), however were highly divergent and represented 15 distinct clusters. These were determined using the PubChem ( structure clustering web-service with shape + feature 3D similarity and recommended parameters for achieving statistical significance at the 95% confidence interval (Figure 5). 3D similarity calculation, using shape-only and feature-only, yielded similar results (data not shown). As all cluster members were already preselected for features to fill the P1 and P2 pockets, have similar estimated binding affinities and the structure clusters are significantly dissimilar, they represent 15 valuable scaffolds that can be further experimentally validated and to be used as a basis for future lead optimization and anti-coronaviral inhibitor discovery experiments.

In our simulations of potential inhibitors, previously identified by structure and ligand-based drug design techniques including molecular docking and pharmacophore modeling, we could largely replicate all interactions observed in our static models. The pyridine ring that fills the S1 pocket in 3V3M is stable in both modelled ligand-protein complexes and with the known inhibitor, ML188, it is evident that this feature contributes positively to antiviral effects of the inhibitor. However, with the remaining 3CLpro, several complexes showed instability of the pyridine ring within the S1 pocket. We therefore propose that future lead optimization explores the effects of incorporating a lactam ring in this position as it is known to better replicate the interactions of the natural substrate and is also effective against the SARS-CoV 3CLpro. A phenylalanine substituent and an imidazole ring have also been shown to be efficacious in this P1 position. The replication of the majority of interactions that were observed in the static structure-based models for ZINC27332786, from molecular docking, and ZINC09411012, from pharmacophore modeling, suggests that the structure-based techniques were correct in identifying the key interactions and orientation of the ligand within the pocket. We can therefore confidently assume that the remaining ligands identified in the structure-based techniques are, in fact, in their correct conformation and do represent promising lead compounds.

As the second generation of lead discovery focusing on non-peptidic, noncovalent inhibitors is still largely in its infancy, future in vitro activity assays of the ligands identified in this study will provide vital information on novel scaffolds for lead optimization. The diverse and comprehensive approach of hit-list refinement also ensures a high likelihood of the presence of bioactives within the list. This study provides a promising start in the pursuit of active lead compounds for the broad spectrum inhibition of the 3CLpro associated with human coronaviruses. These results strongly support further in vitro testing of this highly refined hit-list. Activity data will provide valuable information on the structure of bioactive scaffolds and could represent the next generation of small lead molecules. In conclusion, in the drug-design sphere, a variety of computational approaches have been used at the various stages of the drug-design process in support of the experimental techniques. Structure-based and ligand-based drug design approaches have developed into valuable drug discovery tools, owing to their versatility and synergistic character. Importantly, however, predictions from virtual screening methods are not meant to replace experimental affinity determination, but can complement the experimental methods by producing an enriched subset of a large chemical database, with an increased possibility of the compounds actually binding to the drug target of interest. Therefore, since past research has shown that many predicted drug candidates fail to prove effective in vivo the experimental validation of the identified 19 ligands reported in this manuscript is essential.