
Unreported intrinsic disorder in proteins: Building connections to the literature on IDPs

Introduction

Recent years have seen an exponential increase in the number of papers on intrinsically disordered proteins (IDPs), indicating that the phenomenon of protein intrinsic disorder is becoming a popular research topic. This point is also reflected in the articles of the “Digested Disorder” series published in this journal.1-3 Some numbers make the point more concrete. A PubMed search (as of August 3, 2014) for ((intrinsically disordered) OR (natively unfolded) OR (intrinsically unstructured) OR (intrinsically unfolded) OR (intrinsically flexible protein)) returned 2,490 hits. Restricting this search to the past year and a half (i.e., searching for ((intrinsically disordered) OR (natively unfolded) OR (intrinsically unstructured) OR (intrinsically unfolded) OR (intrinsically flexible protein)) AND (“2013/01/01”[Date - Publication]: “3000”[Date - Publication])) gives 659 hits. Curiously, although papers on IDPs (2,490) constitute just 0.045% of all the protein-related papers (5,569,481) published during the past 115 years, the fraction of IDP-related papers increases to 0.18% if only publications of the past year and a half are taken into account. It is of interest to compare these trends with those of research on proteins in general. Protein-related papers (5,569,481) constitute ∼23% of all the papers published during the past 115 years and annotated in PubMed (24,068,783). However, this number decreases to ∼21% if only papers published during the past year and a half are taken into account (in 2013–2014, the overall number of published papers is 1,719,568; the number of papers dealing with proteins is 362,612). This suggests that IDPs are a new rising star in the field of protein science.
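The percentages quoted above follow directly from the hit counts. The short Python sketch below reproduces that arithmetic; the variable names and the `percent` helper are illustrative and not part of the original analysis.

```python
# Hit counts quoted in the text (PubMed, as of August 3, 2014).
idp_all = 2_490            # IDP-related papers, all years
idp_recent = 659           # IDP-related papers, 2013-2014
protein_all = 5_569_481    # protein-related papers, all years
protein_recent = 362_612   # protein-related papers, 2013-2014
pubmed_all = 24_068_783    # all papers annotated in PubMed
pubmed_recent = 1_719_568  # all papers annotated in PubMed, 2013-2014


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of IDP papers among protein papers, and of protein papers in PubMed.
shares = {
    "IDP / protein, all years": percent(idp_all, protein_all),
    "IDP / protein, 2013-2014": percent(idp_recent, protein_recent),
    "protein / PubMed, all years": percent(protein_all, pubmed_all),
    "protein / PubMed, 2013-2014": percent(protein_recent, pubmed_recent),
}

for label, value in shares.items():
    print(f"{label}: {value:.3f}%")
```

The IDP share grows roughly fourfold between the two windows, while the overall protein share of PubMed slightly shrinks, which is the contrast the paragraph draws.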

Despite this steady increase in the appreciation of protein intrinsic disorder, there are still numerous instances when this concept is overlooked, missed, or simply ignored. Unfortunately, this “missed disorder” phenomenon continues to be rather common in the modern literature. The original goal of the first article in the new “Unreported disorder in proteins” series was to find several papers published in different journals that discuss IDPs (or hybrid proteins containing both ordered and disordered regions) without recognizing that they are discussing such proteins, and to show how consideration of intrinsic disorder can be used to strengthen conclusions and draw relationships to prior discoveries in the field. However, analysis of recent publications in just 2 journals revealed that “missed disorder” is considerably more abundant than originally expected. In fact, as of August 3, 2014, of 677 papers published during 2013–2014 in the Journal of Molecular Biology, 609 dealt with proteins, with 87 papers being dedicated to IDPs or hybrid proteins. Papers were considered to be dealing with intrinsic disorder if their texts contained at least a single mention of disorder, flexibility, conformational flexibility, or unfoldedness in relation to the protein of interest or any of its regions. Obviously, this is a very relaxed and inclusive approach, since a simple mention of disorder or conformational flexibility is not sufficient for the detailed elucidation of the functional role of disorder/flexibility.
A direct PubMed search for the IDP-related papers published in the Journal of Molecular Biology using the search criteria ((intrinsic disorder) OR (intrinsically disordered) OR (natively unfolded) OR (intrinsically unstructured) OR (intrinsically unfolded) OR (intrinsically flexible protein)) AND (“2013/01/01”[Date - Publication]: “3000”[Date - Publication]) AND (“Journal of Molecular Biology”[Journal]) generated just 17 hits. Analysis of the table of contents of this journal combined with a brief computational analysis of related proteins revealed that there are at least 22 more papers whose research subjects are IDPs. Analogous analysis of the publications in Biochemistry during 2013–2014 provided comparable data: of 1455 published papers, 25 were “officially” dedicated to intrinsic disorder or to intrinsically disordered/natively unfolded/intrinsically unstructured/intrinsically unfolded/intrinsically flexible proteins. Mining of papers published in Biochemistry during 2013–2014 revealed that some 189 papers contained at least a single mention of “conformational flexibility” OR “intrinsic disorder” OR “disordered protein” OR “disordered peptide” OR “disordered region” OR “natively unfolded” in relation to the protein of interest or any of its regions. Furthermore, based on the combination of literature mining and a brief computational analysis, 20 “hidden gems” were found; i.e., papers that missed protein disorder.
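The inclusive "at least a single mention" criterion can be illustrated with a simple keyword filter. The Python sketch below is only an approximation under stated assumptions: it uses the six phrases listed for the Biochemistry mining step, matches them case-insensitively, and cannot check that a mention actually relates to the protein of interest, which the original analysis required. The function name `mentions_disorder` is hypothetical.

```python
import re

# Phrases quoted in the text for the full-text mining step.
DISORDER_TERMS = (
    "conformational flexibility",
    "intrinsic disorder",
    "disordered protein",
    "disordered peptide",
    "disordered region",
    "natively unfolded",
)

# One alternation pattern; re.escape guards against regex metacharacters.
_PATTERN = re.compile(
    "|".join(re.escape(term) for term in DISORDER_TERMS),
    re.IGNORECASE,
)


def mentions_disorder(text: str) -> bool:
    """Return True if the text contains at least one of the listed phrases."""
    return _PATTERN.search(text) is not None


if __name__ == "__main__":
    sample = "The C-terminal tail forms a long disordered region."
    print(mentions_disorder(sample))
```

A filter of this kind only flags candidate papers; deciding whether disorder was genuinely overlooked still needs the manual reading and computational analysis described above.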

The mentioned 42 papers dealing with the unreported intrinsic disorder and their related IDPs or hybrid proteins containing ordered and intrinsically disordered regions are briefly outlined below. Table 1 contains all the proteins considered in this review. Proteins are arranged according to the increase in the extent of their disorder evaluated as percentage of the residues predicted to be disordered (i.e., possessing disorder scores above 0.5) by PONDR® VSL2, which is among the more accurate disorder predictors.4

Table 1 shows that the extent of unreported intrinsic disorder in 143 proteins ranges from 10% to 100%, indicating that all these proteins belong to the category of moderately or highly disordered proteins; i.e., proteins with the disorder content ranging from 10% to 30% and from 30% to 100%, respectively. Table 1 also shows that a very significant fraction of such proteins with overlooked disorder (>75%) belongs to the “highly disordered” category.
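As an illustration of how the disorder content in Table 1 is defined, the sketch below computes the percentage of residues with a score above 0.5 and assigns the category named in the text. It assumes the per-residue scores are already available as a list of floats (for example, exported from a predictor such as PONDR® VSL2); the function names, the label for proteins below 10%, and the handling of the exact 30% boundary are assumptions made for this example.

```python
from typing import Sequence


def percent_disordered(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Percentage of residues whose disorder score exceeds the threshold."""
    if not scores:
        raise ValueError("scores must contain at least one residue")
    n_disordered = sum(1 for s in scores if s > threshold)
    return 100.0 * n_disordered / len(scores)


def disorder_category(percent: float) -> str:
    """Map a disorder content (in percent) to the categories used in the text.

    Assumption: exactly 30% is counted as highly disordered, and values
    below 10% simply receive a descriptive label.
    """
    if percent < 10.0:
        return "below 10% disorder"
    if percent < 30.0:
        return "moderately disordered"
    return "highly disordered"


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.51, 0.4, 0.3, 0.2, 0.1]
    pct = percent_disordered(toy_scores)
    print(pct, disorder_category(pct))
```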

The order of sections in this review crudely follows the order of proteins in Table 1. When several proteins are discussed in a paper, the corresponding section within the text is placed at a position ascribed by Table 1 to a protein with the lowest disorder content. The corresponding sections are organized in the following way: first, a brief description of the paper is provided; then, the biological importance of the related proteins is discussed; next, the results of the computational and bioinformatics analyses of the disorder status of a given protein (or set of proteins) are presented to show how consideration of intrinsic disorder can enhance conclusions of a paper.

Eukaryotic translation initiation factors 4A, 4B and 4G (eIF4A, eIF4B, and eIF4G)

Andreou & Klostermeier investigated the peculiarities of the combined action of 3 eukaryotic translation initiation factors, eIF4A, eIF4B, and eIF4G, in RNA unwinding needed to resolve secondary structure elements from the 5′-untranslated region of mRNAs to enable ribosome scanning.5

A set of eukaryotic translation initiation factors (eIFs) is involved in the complex, multi-stage process of initiation of mRNA translation in eukaryotes. In the first step of this process, the eIF4F complex binds to the m7Gppp cap on the 5′-end of the mRNA. This eIF4F complex consists of the cap-binding protein eIF4E, the scaffolding protein eIF4G, and the DEAD-box helicase eIF4A.6,7 In the next steps of translation initiation, the eIF4F complex recruits the 43S ribosomal pre-initiation complex, which scans the 5′-untranslated region in the 5′-to-3′ direction in search of the translation start codon.8,9 One of the important points in this process is the resolution of secondary and tertiary structures in the mRNA, which is typically done by the DEAD-box helicase eIF4A, which possesses RNA-dependent ATPase and ATP-dependent RNA helicase activities.10 These 2 activities of eIF4A are enhanced by some auxiliary proteins, such as eIF4B and eIF4G, which act synergistically to stimulate the eIF4A helicase activity in the mRNA scanning process.5

Figure 1 illustrates the disorder propensities of the 3 proteins related to the mRNA unwinding. Disorder was evaluated by a family of PONDR predictors. Here, scores above 0.5 correspond to disordered residues/regions. PONDR® VSL2B is one of the most accurate stand-alone disorder predictors,11 PONDR® VL3 possesses high accuracy in finding long IDPRs,12 PONDR® VLXT is not the most accurate predictor but has high sensitivity to local sequence peculiarities which are often associated with disorder-based interaction sites,13 whereas PONDR-FIT represents a metapredictor which, being moderately more accurate than each of the component predictors, is one of the most accurate disorder predictors.14 The various predictors often give different predictions of disorder for the same protein, perhaps leading to some confusion when trying to understand the implications of this and the following figures. These differences arise because of different computational methods and especially because different training sets were used. Despite the differences, the overall disorder prediction trends are typically similar for each protein.
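To make the 0.5 threshold concrete, the following sketch shows one way to turn a per-residue score profile into a list of contiguous disordered regions. It is an illustration only: it assumes the scores are supplied as a plain list, uses 1-based residue numbering, and introduces a hypothetical `min_length` parameter; it does not reproduce the internals of any PONDR predictor.

```python
from typing import List, Sequence, Tuple


def disordered_regions(
    scores: Sequence[float],
    threshold: float = 0.5,
    min_length: int = 1,
) -> List[Tuple[int, int]]:
    """Return (start, end) residue ranges (1-based, inclusive) whose scores
    stay above the threshold for at least min_length consecutive residues."""
    regions: List[Tuple[int, int]] = []
    start = None  # first residue of the region currently being extended
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position
        elif start is not None:
            # The run ended at the previous residue.
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    toy_profile = [0.8, 0.7, 0.2, 0.3, 0.9, 0.6, 0.55, 0.1]
    print(disordered_regions(toy_profile))
    print(disordered_regions(toy_profile, min_length=3))
```

Running the same function on profiles from different predictors is a simple way to see where their region boundaries agree and where they diverge.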

In agreement with the notion that catalytic functions require specific and ordered structure, eIF4A, with its ATPase and ATP-dependent RNA helicase activities, is predicted to be a mostly ordered protein (see Fig. 1A; UniProt ID: P10081). The mostly ordered nature of this enzyme is further supported by the fact that almost the entire sequence is seen in the crystal structure (see red structure in Fig. 1B, PDB ID: 2VSO), except for residues 1–11, 126–135, 351–356, and 394–395. In contrast, the auxiliary proteins eIF4B and eIF4G are predicted to possess long disordered regions (see Figs. 1C and 1D; UniProt IDs: P34167 and P39935, respectively). eIF4G is also predicted to have an ordered region that coincides with its middle, or eIF4A-interacting, domain (residues 571–854). This region was co-crystallized with the eIF4A factor (see blue structure in Fig. 1B). Curiously, several eIF4G regions were missing in the structure of the eIF4A-eIF4G complex (e.g., residues 571–576, 583–596, 686–688, 717–729, 803–811, and 853–854).

Long intrinsically disordered regions are important for functions of eIF4B and eIF4G. In fact, according to ANCHOR,15,16 eIF4B (UniProt ID: P34167) is expected to have 14 disorder-based binding sites (residues 13–20, 33–40, 48–53, 68–74, 103–109, 159–166, 214–219, 234–244, 259–271, 286–298, 312–324, 338–355, 376–386, and 407–426). Similarly, eIF4G (UniProt ID: P39935) is predicted to have 18 ANCHOR-indicated binding sites (AiBSs): residues 5–24, 32–53, 59–120, 130–158, 198–226, 230–241, 292–307, 329–342, 351–398, 411–433, 450–465, 474–485, 505–515, 570–583, 818–827, 832–844, 893–920, and 933–949.
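Residue-range lists like those above are easy to check programmatically. The sketch below parses the two lists as quoted in the text, counts the sites, and sums the residues they cover; the helper names `parse_ranges` and `covered_residues` are illustrative, and the code does not call ANCHOR itself.

```python
from typing import List, Tuple

# Binding-site ranges exactly as listed in the text.
EIF4B_SITES = (
    "13–20, 33–40, 48–53, 68–74, 103–109, 159–166, 214–219, 234–244, "
    "259–271, 286–298, 312–324, 338–355, 376–386, 407–426"
)
EIF4G_SITES = (
    "5–24, 32–53, 59–120, 130–158, 198–226, 230–241, 292–307, 329–342, "
    "351–398, 411–433, 450–465, 474–485, 505–515, 570–583, 818–827, "
    "832–844, 893–920, 933–949"
)


def parse_ranges(text: str) -> List[Tuple[int, int]]:
    """Parse 'a–b, c–d, ...' (en dash or hyphen) into (start, end) tuples."""
    ranges = []
    for chunk in text.split(","):
        start, end = chunk.strip().replace("–", "-").split("-")
        ranges.append((int(start), int(end)))
    return ranges


def covered_residues(ranges: List[Tuple[int, int]]) -> int:
    """Total number of residues covered by inclusive, non-overlapping ranges."""
    return sum(end - start + 1 for start, end in ranges)


if __name__ == "__main__":
    for name, sites in (("eIF4B", EIF4B_SITES), ("eIF4G", EIF4G_SITES)):
        parsed = parse_ranges(sites)
        print(name, len(parsed), "sites,", covered_residues(parsed), "residues")
```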

Human plasminogen activator inhibitor 1 (PAI-1)

Florova et al. investigated the mechanisms of the stabilization of human plasminogen activator inhibitor 1 (PAI-1, also known as serpin E1) during the formation of the transient ternary “molecular sandwich” complex (MSC) containing PAI-1, vitronectin (Vn), and the target enzyme.17 Major players involved in the formation of this complex are PAI-1, which is a major endogenous inhibitor of plasminogen activators, a cell adhesive glycoprotein Vn, and a proteinase inhibited by PAI-1. An important feature of PAI-1 is its ability to exist in 2 forms: a metastable active conformation with a solvent-accessible reactive center loop (RCL) and a thermodynamically stable, inactive, latent conformation, in which the RCL is spontaneously inserted into the middle of the β-sheet B of PAI-1.17 This “magic” chameleon-like RCL is located in the 331–350 region. Figure 2A shows that although human PAI-1 (UniProt ID: P05121) is predicted to be mostly ordered, its RCL is located within a grayish area with a mean disorder score close to 0.5, suggesting that this region is characterized by noticeable intrinsic mobility.

T4 UvsW helicase and Single-Stranded DNA Binding Protein gp32

Perumal et al. reported that a specific interaction between the bacteriophage T4 UvsW helicase and the T4 single-stranded DNA (ssDNA) binding protein gp32 is required for enhancement of the UvsW DNA unwinding function.18 Curiously, UvsW interacts with gp32, both in the presence and absence of DNA, through the C-terminal acidic tail of the gp32 protein. In the absence of this interaction, the ssDNA annealing and ATP-dependent translocation activities of UvsW are severely inhibited when gp32 coats the ssDNA lattice. However, when UvsW and gp32 do interact, UvsW is able to efficiently displace the gp32 protein from the ssDNA. The tail-less gp32 inhibits the activities of UvsW on DNA.18

The UvsW helicase, a member of the SF2 superfamily of helicases19 (i.e., enzymes that use ATP to carry out mechanical work related to the unwinding of duplex DNA structures and reorganization of RNA secondary structures20), is one of the 3 helicases encoded by the bacteriophage T4,21 where it plays a number of roles in a variety of DNA repair and recombination pathways.22-27 The involvement of this helicase in DNA replication is achieved via the UvsW-mediated unwinding of R-loop structures generated at the replication origin by the host RNA polymerase.22 UvsW (UniProt ID: P20703) activity involves the generation and consumption of single-stranded DNA (ssDNA). In agreement with the overall low disorder predisposition (see Fig. 3A), almost the entire sequence of UvsW was successfully crystallized (Fig. 3B, PDB ID: 2OCA).

Naked ssDNA rarely occurs within T4-infected cells, as it is immediately coated with the gp32 ssDNA-binding protein, which serves to protect it against degradation by endonucleases. The gp32 is also thought to coordinate the many activities necessary to carry out DNA replication, recombination and repair by recruiting and modulating the activity of the enzymes involved in these processes.28 The bacteriophage T4 gp32 (UniProt ID: P03695) is a 301-residue-long protein. The crystal structure of the gp32 central region (residues 22–239) is known (see Fig. 3C).29 Gp32 binds preferentially to ssDNA and destabilizes double-stranded DNA. It is involved in DNA replication, repair and recombination, binds ssDNA as the replication fork advances, and stimulates the replisome processivity and accuracy. The N-terminal region of gp32 is important for the cooperative binding of gp32 monomers on ssDNA, whereas the C-terminal acidic tail is involved in interaction with the replicative DNA polymerase, the primase and helicase loader protein.30-32 Also, this C-terminal region (residues 254–301) was shown to play a crucial role in controlling the D-loop unwinding capability of UvsW, since the presence of a mutant gp32 protein lacking the acidic tail (delta254–301, gp32-A) led to a significant reduction in the unwinding of the D-loop substrate.18

Fig. 3D shows that a significant portion of the C-terminal half of this protein is predicted to be mostly disordered (residues 200–301). Curiously, this disordered region includes the functionally important C-terminal tail that defines the ability of gp32 to modulate the UvsW activity.

PilF and PilQ of the Pseudomonas aeruginosa Type IV pilus system

Some bacteria and archaea contain specific cell envelope-spanning biomolecular machines, Type IV pili (T4P), which they use to interact with the environment. Pilus biogenesis is controlled by the assembly of the outer membrane PilQ secretin channel through which the pili are extruded. Koo et al. investigated the roles of the Pseudomonas aeruginosa T4P pilotin protein PilF in the outer membrane PilQ secretin channel targeting, oligomerization, and function.33

PilF belongs to the class 1 pilotins, which are α-helical proteins containing several tetratricopeptide (TPR) motifs, each comprising approximately 34 amino acids, that are able to form a superhelical fold that mediates protein–protein interactions in both prokaryotes and eukaryotes.33 Secretins (including PilQ secretin) possess a conserved C-terminal region, containing the secretin domain that is putatively embedded into the outer membrane, and a variable, system-specific N-terminal region.34,35

Figure 4A shows that the P. aeruginosa PilF (UniProt ID: Q51385) is predicted to be a hybrid protein possessing both ordered and disordered regions. Curiously, about 2/3 of this protein possesses disorder scores close to 0.5. This grayish area is likely to be characterized by high conformational plasticity and is predicted to contain 4 AiBSs, residues 5–16, 41–46, 76–81 and 111–115. Similarly, PilQ (UniProt ID: P34750) is predicted to be a hybrid protein (see Fig. 4B) with several disorder-based binding sites (residues 101–107, 120–122, 245–251, 502–505, and 533–547).

Interaction of 14–3–3-ζ with integrin tails

Bonet et al. analyzed the molecular mechanisms of interaction of the 14–3–3ζ protein with several integrin proteins.36 The authors provided a detailed biophysical characterization of the cytoplasmic tails of α4, β1, β2 and β3 integrins binding to 14–3–3ζ and showed that binding affinities and interaction modes of different integrins with this 14–3–3 are rather different. Furthermore, although many structural features of these interactions are similar to other known 14–3–3 complexes, the binding exhibits specific features involving secondary sites. Particularly, in addition to a canonical binding mode for the α4 phospho-peptide, some residues outside the consensus 14–3–3ζ binding motif of this integrin were shown to be essential for an efficient interaction. Although a short β2 phospho-peptide is sufficient for high-affinity binding to 14–3–3ζ, the authors also found novel 14–3–3ζ/integrin tail interactions that were independent of phosphorylation.36

The members of the 14–3–3 family are highly conserved acidic proteins of ∼30 kDa that are abundantly expressed in all eukaryotic cells. In humans, there are 7 14–3–3 isoforms (β, γ, ε, η, σ, τ and ζ) forming homodimers or heterodimers. These proteins regulate and control many signaling pathways, such as cytoskeletal dynamics, programmed cell death, and cell cycle progression, and are involved in the pathogenesis of several human diseases, such as cancer and neurological disorders.37,38 Although 14–3–3 proteins are considered as phosphoserine/threonine binding modules possessing 2 consensus recognition motifs, RSXpSXP and RXF/YXpSXP,39 some other binding modes are known, including interaction with some unphosphorylated motifs.40,41 X-ray structures of various 14–3–3 isoforms and their numerous complexes are known, providing considerable knowledge on the various binding modes.42,43 Curiously, a detailed analysis of more than 200 binding partners of 14–3–3 proteins showed neither structural nor functional relatedness in this group of proteins.44 However, bioinformatics analysis established that >90% of the 14–3–3 interactors contain disordered regions and that almost all 14–3–3-binding sites are located inside disordered regions.44

Among various partners of 14–3–3 proteins are membrane-spanning receptors, integrins, which are involved in numerous biological functions, such as cell adhesion, migration and differentiation and are associated with a wide range of diseases.45,46 Integrins are heterodimers formed by α- and β-subunits composed of large extracellular (ecto) domains, trans-membrane domains and short (13–70 residues) C-terminal cytoplasmic domains,47 which are flexible tails acting as hubs for numerous integrin-based protein-protein interactions.48-50 These cytoplasmic tails of the 2 integrin subunits also mediate the bidirectional inside-to-outside and outside-to-inside signaling, where signals transmitted by integrins from outside to inside the cell promote cell survival and proliferation, and where integrin affinity for the extracellular ligands can also be controlled by intracellular factors.47

Figure 5A represents a structure of the complex between the 14–3–3ζ protein (blue cloud) and a phosphorylated peptide from the β2 integrin tail (red chain).51 It is seen that, similar to many other 14–3–3 complexes, the β2 integrin tail is bound in a highly extended form. To illustrate interactivity of the β2 integrin, Figure 5B shows the results of the analysis of this protein by STRING. Here, settings were chosen to find binding partners with the highest confidence (0.9). Finally, Figure 5C shows some disorder-based alignments of the C-terminal tails used in the work of Bonet et al.,36 namely residues 1001–1032 of human integrin α4 (black line, UniProt ID P13612), residues 752–801 of human integrin β1 (black line, UniProt ID P05556), residues 724–769 of human integrin β2 (black line, UniProt ID P05107), residues 742–788 of human integrin β3 (black line, UniProt ID P05106), and residues 747–798 of human integrin β7 (black line, UniProt ID P26010). Figure 5C clearly shows that all these integrin tails involved in the specific interaction with the 14–3–3ζ protein are mostly disordered.

TRIMunity

In the review by Rajsbaum et al., an important family of proteins, tripartite motif (TRIM) proteins, is introduced together with the multitude of their functional roles in the innate antiviral immunity.52 Since there are more than 70 distinct members in the family of human TRIM proteins, it is physically impossible to consider all of them even very superficially. The feature that links all these proteins together is the fact that they share 3 conserved N-terminal domains: a Really Interesting New Gene (RING) domain, one or 2 B-Boxes (B1/B2) and a coiled-coil domain.53-56 In addition to immune-related functions, many TRIM proteins are involved in a wide range of biological activities, such as transcriptional regulation, apoptosis, cell differentiation, development, and oncogenesis, and act as ubiquitin E3 ligases and as E3 ligases for other ubiquitin-like molecules such as SUMO and the IFN-inducible protein ISG15.57 Computational analysis (see Table 1) revealed that all members of the human TRIM family are extensively disordered.

Multiple roles of viperin in the innate antiviral response

Helbig and Beard reviewed the ability of viperin to modulate conditions within the cell and to interfere with proviral host proteins in order to create an unfavorable environment for viral replication.58 Viperin is one of the few products of numerous interferon-stimulated genes (ISGs) possessing direct antiviral activity, being able to limit a broad range of viruses and to play an emerging role in modulating innate immune signaling. This is a protein highly conserved across species, consisting of 3 distinct domains: a variable N-terminal domain that contains an amphipathic helix and a leucine zipper region, a highly conserved central domain containing a “radical SAM domain,” and a conserved C-terminal domain critical for the antiviral properties of viperin against a number of viruses.58 The authors showed that viperin plays a role in innate immune signaling, limits different viruses through both direct inhibition of replication and interference with viral budding/release, and disrupts the actin cytoskeleton to increase infectivity of human cytomegalovirus (HCMV) in a known example of evolutionary escape of HCMV from the antiviral properties of viperin.58 Viperin is able to interact with the 5 host proteins FPPS, TFP, IRAK1, VAP-A and TRAF6 and the 3 viral proteins DENV NS3, HCV NS5A and HCMV vMIA in order to accomplish these diverse biological functions.58

Figure 6A represents the results of the STRING59-based evaluation of viperin interactivity. Although it has been emphasized that “It is unusual for viperin to be able to interact with such a divergent range of other proteins and to potentially mediate quite distinct cellular functions”,58 the mystery of such “unusual” polyfunctionality is naturally solved by the presence of extensive disorder in this protein (see Fig. 6B; UniProt ID: Q8WXG1).

SAMHD1 host restriction factor

Sze et al. overviewed available information on the sterile α motif and histidine-aspartic domain (HD) containing protein 1 (SAMHD1), which is a member of the unique group of host restriction factors that limit retroviral replication at distinct stages of the viral life cycle.60 SAMHD1 is a deoxynucleoside triphosphate triphosphohydrolase responsible for the degradation of the deoxynucleoside triphosphates (dNTP) into their deoxynucleosides (dN) and inorganic triphosphates, thus depleting the cellular dNTP pool required for cellular DNA polymerase.61 Activity of this protein is modulated by post-translational modifications, cell-cycle-dependent functions and cytokine-mediated changes, as well as via interaction with the Vpx accessory protein.60

SAMHD1 is a predominantly nuclear protein composed of 2 functional domains, the sterile α motif (SAM) domain involved in protein–protein and SAMHD1-nucleic acid interactions,62 and the HD domain containing the enzymatic sites crucial for its triphosphohydrolase activity, RNA binding and nuclease activity.63 Another functionally important region is located at the C-terminus of SAMHD1, where a V-domain capable of interaction with the HIV-2/SIVsm Vpx accessory protein is located (residues 595–626).64 Based on the intrinsic disorder propensity analysis (Fig. 2B), SAMHD1 (UniProt ID: Q9Y3Z3) is expected to be a predominantly ordered protein possessing disordered N- and C-terminal tails (residues 1–100 and 580–626). Obviously, intrinsic disorder in the C-terminal tail is important for the SAMHD1 interaction with the HIV-2/SIVsm Vpx accessory protein.

Interaction between the major bacterial heat shock chaperone GroESL and an RNA chaperone CspC

In their recent paper, Lenz & Ron described a novel interaction between the major heat shock chaperone GroESL of E. coli (Hsp60) and an RNA chaperone (CspC), leading to the CspC proteolysis needed for the transient nature of the heat shock response.65

Protein chaperones and proteases have multiple functions in controlling the wellbeing of a cell, acting as protein quality control that enables the cells to cope with the unfolding and aggregation of proteins.66,67 CspC is a member of the cold shock protein (Csp) family that consists of 7 small homologues with high affinity to single-strand nucleic acid and that serve as RNA chaperones.68-71 It has been shown that CspC is degraded during heat shock and that interaction of CspC with the major protein chaperone GroESL plays a role in the enhanced, temperature-dependent proteolysis of CspC.65

The fact that CspC is able to interact with GroESL (which is known to bind partially unfolded and misfolded protein species) is a clear indication that CspC does not possess rigid structure, at least under the conditions favoring such interaction. The likely explanation for this binding is in the potential intrinsically disordered nature of CspC. In fact, many protein and RNA chaperones were shown to be intrinsically disordered or hybrid proteins possessing both ordered and intrinsically disordered regions/domains.72 In agreement with this observation, Figure 2C represents the PONDR plots for the CspC from E. coli (UniProt ID: P0A9Y6) and shows that a significant portion of this small protein (up to 35%) is predicted to be disordered.

Interaction of PTEN with phosphatidylinositol phosphate

Kalli et al. used multiscale molecular dynamics simulations to define the interaction mechanisms of phosphatase and tensin homolog (PTEN; UniProt ID: P60484) and of the PTEN domain of Ciona intestinalis voltage sensitive phosphatase (Ci-VSP; UniProt ID: Q4W8A1) with phosphatidylinositol phosphate (PIP)-containing lipid bilayers.73 The authors revealed that the association of the PTEN with such bilayers involves the formation of an initial electrostatics-driven encounter complex between the protein and bilayer followed by reorientation of the protein to optimize its interactions with PIP molecules in the membrane.73 PTEN is a cytosolic enzyme that can interact with the inner leaflet of the plasma membrane and, being bound to membrane, catalyzes dephosphorylation of PI(3,4,5)P3 to PtdIns(4,5)P2.74 PTEN has 4 domains, an N-terminal PIP2-binding module, a phosphatase domain (PD), a C2 domain, and a C-terminal tail. Several recent studies indicated that both N- and C-tails of this protein are intrinsically disordered.75-77 Furthermore, it has been emphasized that post-translational modifications, conserved eukaryotic linear motifs, and molecular recognition features are present in the disordered C-tail of PTEN, enhancing protein-protein interactions of this protein needed for the various cellular functions of PTEN.75

Domain swapped Pukovnik Xis

Singh et al. determined the structure of Pukovnik Xis by X-ray crystallography.78 Xis is the recombination directionality factor that serves as a DNA bending machine that defines the outcome of integrase-mediated site-specific recombination by redesign of higher-order protein–DNA architectures.79 Mycobacterium phage Pukovnik (which is a relative of phage L5) contains a small Xis protein (56 residues) that binds cooperatively to attR DNA at specific X1–X4 binding sequences. Similar to L5, Pukovnik Xis stimulates integrase-mediated excision and inhibits integration.80,81 The presence of both Xis and integrase is needed for the formation of an attR intasome.78 The cooperative binding of several Xis proteins to DNA is driven by a winged-helix motif and relies on the use of contacts between a central loop and the wing motif to mediate interactions between Xis proteins when bound to DNA, leading to the formation of a micronucleoprotein filament with a modest bend.78,82,83

Singh et al. showed that in the DNA-bound form, 5 individual Pukovnik Xis subunits stack onto each other through an extensive array of protein–protein interactions forming a filament with left-handed superhelical twist.78 Stacking of Xis subunits within the filament is rather regular, and each monomer exhibits a twist of 40° and an ∼60° bending angle. Within the asymmetric unit, the filament region contains 4 Xis protomers whereas the fifth protomer is positioned adjacent to the filament. This “outside” protomer forms the domain-swapped complex with the Xis protomer 3. This domain-swapped Xis pair is created due to a dramatic rearrangement within the loop connecting β1 and β2 (residues 36–39) allowing the C-terminal residues 40–56 of one subunit to interact with the N-terminal 35 residues of a neighboring Xis monomer.78 It has been emphasized that the most important protein-protein interactions mediating the filament formation are formed by residues 51–54 of Pukovnik Xis, which make extensive contacts with the wing of the adjacent monomer and whose deletion greatly diminishes DNA binding affinity and abolishes excision.78 Therefore, although the Xis filament is composed of identical subunits, the individual Xis protomers are involved in very different interactions and form very different interfaces. To further illustrate this point, Figure 7A represents a crystal structure of the Pukovnik Xis pentamer (PDB ID: 4J2N) and 4 pairs of different dimers found within this pentameric filament. Figure 7B shows that Pukovnik Xis (UniProt ID: B3VGI6) is predicted to contain a substantial amount of intrinsic disorder (note scale of the Y-axis), including disordered N- and C-terminal tails. Therefore, it is likely that the intrinsically disordered nature determines the ability of this small protein to be involved in a complex net of protein-protein and protein-DNA interactions, including its ability to form a wide array of regular and domain-swapped dimers.

Proline-rich domain of the AIPL1 chaperone

Li et al. provided a detailed characterization of the functional mechanisms of interesting Hsp90 co-chaperones, human aryl hydrocarbon receptor (AHR) interacting protein (AIP) and AIP like 1 (AIPL1).84 AIP and AIPL1 share 49% sequence identity, contain an N-terminal FKBP-like prolyl peptidyl isomerase (PPIase) domain (which is inactive in both proteins) followed by a tetratricopeptide repeat (TPR) domain. In addition, AIPL1 harbors a unique C-terminal proline-rich domain (PRD).84 The authors showed that AIP is inactive as a chaperone, whereas AIPL1 exhibits chaperone activity and prevents the aggregation of non-native proteins, suggesting that PRD is crucial for the chaperone function of this protein providing a means for efficient binding of AIPL1 to non-native proteins.84 Since AIPL1 possesses decreased affinity to Hsp90, the C-terminal PRD plays a role of a negative regulator of the AIPL1-Hsp90 interaction.84

Figure 8A shows that the major portion of the human AIPL1 (N-terminal 75%) containing FKBP-like prolyl peptidyl isomerase and TPR domains is predicted to be mostly ordered (UniProt ID: Q9NZN9), whereas the C-terminal domain is predicted to be highly disordered. On the other hand, Figure 8B indicates that the human AIP (UniProt ID: O00170) is expected to be mostly ordered. In agreement with these disorder predictions, 3D structures are known for the PPIase FKBP-type domain of human AIP (residues 2–166, PDB ID: 2LKN) and for its TPR domain (residues 173–330, PDB ID: 4AIF). The intrinsically disordered nature of the C-terminal PRD of the human AIPL1 possessing chaperone activity is in accord with earlier observations that chaperones are often either entirely disordered or contain long disordered regions.72,85

Basic residues in the activated protein C (APC) exosite

Takeyama et al. provided a detailed description of the interaction between the activated protein C (APC) and factor (F) VIIIa.86 The authors showed that the basic residues located within the 39-, 60-, and 70–80-loops of APC constitute an exosite that contributes to the binding of FVIII and is therefore important for the subsequent proteolytic inactivation of FVIII.86 APC, which is also known as vitamin K-dependent protein C, anticoagulant protein C, autoprothrombin IIA, and blood coagulation factor XIV, is a single chain vitamin K-dependent zymogen for a plasma serine protease that, upon activation by the thrombin–thrombomodulin complex, downregulates the coagulation cascade by limited proteolysis of FVa and FVIIIa.87-89 Analysis of the crystal structure of human APC (PDB ID: 1AUT) revealed that there are 3 surface loops (39, 60, and 70–80), rich in basic residues, located in the protease domain of APC near the active site pocket.86,90 FVIIIa contains acidic C-terminal sequences that may potentially provide interactive sites for the basic exosite of APC.86

Figure 8C represents the results of the computational disorder analysis of human APC (UniProt ID: P04070) and shows that the loops enriched in basic residues mentioned above are predicted to be disordered or very flexible, thereby suggesting an interesting mechanistic explanation for the molecular basis of APC recognition and binding of FVIII.

The Escherichia coli primosomal DnaT protein

Molecular mechanisms of the oligomerization of the E. coli primosomal DnaT protein were uncovered in the study by Szymanski et al.91 The authors showed that the removal of the short C-terminal region dramatically affects the oligomerization process: instead of a trimer, the isolated N-terminal domain of DnaT forms a dimer.91

Priming of the DNA strand during the replication process is catalyzed by a multiprotein-DNA complex known as the primosome.92,93 The translocation of this complex along the DNA is fueled by NTP hydrolysis. Besides being responsible for the synthesis of short oligoribonucleotide primers used to initiate synthesis of the cDNA strand, the primosome plays a role in restarting the stalled replication fork at damaged DNA sites.93,94 The assembly of the primosome is driven by an essential replication protein in Escherichia coli, the DnaT protein. Primosome assembly is initiated by recognition of a specific primosome assembly site (PAS) of the replicating DNA or the damaged DNA site by the PriA protein, or the PriB protein–PriA complex, followed by the association of the DnaT and the PriC protein.92-94 This primary DNA-protein complex is a scaffold, specifically recognized by the DnaB helicase–DnaC protein complex, which results in formation of the preprimosome. Next, the preprimosome is recognized by the primase, and a functional primosome is formed. The DnaT protein is crucial for the specific entry of the DnaB helicase into the primosome complex.92-94

The DnaT monomer consists of a large N-terminal core domain that includes the first 161 residues of the protein and a small C-terminal region containing the 18 remaining amino acids.91 Disorder analysis of this protein (UniProt ID: P0A8J2) revealed that although the majority of DnaT is predicted to be ordered, the C-terminal tail, which is crucial for trimerization, is expected to be completely disordered (see Fig. 8D). This observation is in agreement with the recent notion that disordered protein tails are commonly involved in a wide array of important functions.95

Platelet endothelial cell adhesion molecule 1 (PECAM-1)

Tourdot et al. dedicated their research to the analysis of the effect of the sequential phosphorylation of immunoreceptor tyrosine-based inhibitory motifs (ITIMs) on function of the platelet endothelial cell adhesion molecule-1 (PECAM-1).96 PECAM-1 is a dual ITIM-containing receptor of the Ig superfamily that is capable of inhibiting immunoreceptor tyrosine-based activation motif (ITAM)-induced activation of B cells, T cells, and mast cells, as well as GPVI/FcRγ chain-mediated platelet activation.97 The inhibitory properties of PECAM-1 require phosphorylation of both of its ITIMs located in the vicinity of tyrosine residues at positions 663 (VQY663TEV) and 686 (TVY686SEV).96 Curiously, the ITIMs of PECAM-1 are not phosphorylated in resting cells but are phosphorylated upon cellular activation. Furthermore, these PECAM-1 ITIMs are phosphorylated sequentially, with phosphorylation of Y663 depending on prior phosphorylation of Y686.98 This sequential process of the PECAM-1 phosphorylation starts with the phosphorylation of the C-terminal ITIM by Src family kinases, which enables phosphorylation of the N-terminal ITIM of PECAM-1 by other Src homology 2 domain-containing nonreceptor tyrosine kinases (NRTKs).96

Figure 8E shows that although human PECAM-1 (UniProt ID: P16284) is predicted to be mostly ordered, the residues, phosphorylation of which is crucial for its function (Y663 and Y686), are located within the highly disordered C-terminal tail.

Mouse serine/arginine-rich splicing factor 1 (SRSF1)

Aubol et al. studied the molecular mechanisms of the phosphorylation of SR splicing factors, which are the proteins possessing regions enriched in Arg–Ser dipeptide repeats known as RS domains, by a specific serine kinase SRPK1.99 Phosphorylation of RS domains is crucial for the regulation of the activities of the SR splicing factors. RS domains of SR proteins can range from 50 to over 300 residues in length, and the Arg–Ser dipeptide repeats can vary in both length and position. SRPK1 phosphorylates the prototype SR protein, serine/arginine-rich splicing factor 1 (SRSF1), via a directional mechanism where 11 serines flanked by arginines are sequentially fed from a docking groove in the large lobe of the kinase domain to the active site.99

Figure 8F shows that SRSF1 (UniProt ID: Q6PDM2) is predicted to be a highly disordered protein. The disordered nature of this splicing factor is in agreement with recent bioinformatics studies, which showed that human100 and yeast spliceosomes101 are enriched in intrinsic disorder, and abundant disordered regions are crucial for functions of numerous auxiliary protein factors involved in the formation of this ribonucleoprotein complex.100,101 The potential functionality of the disordered regions found in mouse SRSF1 was further supported by ANCHOR-based analysis,15,16 which revealed that this protein contains 6 AiBSs, residues 34–43, 75–87, 123–129, 144–177, 186–197, and 221–245.
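As a small illustration of how a list of predicted binding sites such as the one above can be summarized, the following Python sketch computes the length of each AiBS and the total number of residues they cover. The residue ranges are taken from the text; the function and variable names are illustrative assumptions, and the script does not run ANCHOR itself.

```python
# AiBS residue ranges for mouse SRSF1 as listed in the text (1-based, inclusive).
SRSF1_AIBS = [(34, 43), (75, 87), (123, 129), (144, 177), (186, 197), (221, 245)]


def segment_length(start, end):
    """Number of residues in an inclusive, 1-based residue range."""
    return end - start + 1


def total_coverage(segments):
    """Count distinct residues covered by the segments (overlaps counted once)."""
    covered = set()
    for start, end in segments:
        covered.update(range(start, end + 1))
    return len(covered)


lengths = [segment_length(s, e) for s, e in SRSF1_AIBS]
total = total_coverage(SRSF1_AIBS)

if __name__ == "__main__":
    for (s, e), n in zip(SRSF1_AIBS, lengths):
        print(f"AiBS {s}-{e}: {n} residues")
    print(f"Total residues in AiBSs: {total}")
```

Using a set of covered positions rather than a plain sum keeps the total correct even if a predictor reports overlapping segments.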

The members of human interferon-inducible transmembrane protein (IFITM) family

The roles of human interferon-inducible transmembrane proteins (IFITMs) in antagonizing viral infection were reviewed by Perreira et al.102 IFITMs are representatives of a diverse group of host restriction factors fighting viral replication at different steps of the viral life cycle. The expression of many restriction factors is transcriptionally controlled by the antiviral cytokine, interferon. The interferon-inducible transmembrane proteins (IFITM) 1, 2, and 3 that inhibit the replication of multiple pathogenic viruses are among the products of such interferon-stimulated genes.103-105 Among the variety of pathogenic viruses inhibited by IFITMs are influenza A virus (IAV) and influenza B virus, West Nile virus, dengue virus (DENV), severe acute respiratory syndrome coronavirus (SARS CoV), hepatitis C virus (HCV), Ebola virus (EBOV), Marburg virus (MARV), and a multitude of the filoviruses, the flaviviruses, the bunyaviruses, and the rhabdoviruses.102

Replication of viruses requires breaching the plasma membrane of the host cell. The entry of the enveloped viruses relies on the specialized fusion proteins. There are 3 major classes of viral fusion proteins, all containing a fusion peptide that inserts into the cytolemma, thereby anchoring the 2 membranes side by side.106 Human IFITMs block the entry of viruses from each of the 3 classes of viral fusion proteins,102 by preventing viral-host membrane fusion subsequent to viral binding and endocytosis.104,107 In humans, there are 5 IFITMs (IFITM1, 2, 3, 5, and 10; corresponding UniProt IDs: P13164, Q01629, Q01628, A6NNB3, and A6NMD0). These proteins each contain 2 hydrophobic membrane-associated domains separated by a conserved intracellular loop. All of them also contain the cytosolically-located N-terminal domain (NTD) of various length.102

Figure 9 shows that the NTDs of all 5 human IFITMs are predicted to be intrinsically disordered. The likely functional role of these disordered tails is reflected in the fact that the longest NTD, that of IFITM10, contains 3 AiBSs, residues 8–29, 51–64, and 71–82. Shorter tails of other human IFITMs might also be involved in protein-protein interactions, since all of them contain characteristic “dips,” which are commonly associated with the existence of specific molecular recognition features (MoRFs), short folding-prone fragments of disordered regions that are able to fold (at least partially) upon interaction with specific binding partners.108-112

Chameleon region undergoing α-helix to β-strand transition in the mutant form of the type 2 ryanodine receptor domain A (RyR2A) linked with a heritable cardiomyopathy

Using solution NMR spectroscopy and X-ray crystallography, Amador et al. studied the effects of mutations associated with arrhythmogenic right ventricular dysplasia type 2 (ARVD2) on the structure and dynamics of the N-terminal domain of the mouse ryanodine receptor RyR2A (UniProt ID: E9Q401).113 Ryanodine receptors (RyRs) are the largest tetrameric Ca2+ release channels (∼2.2 MDa) of the sarcoplasmic reticulum in electrically excitable cells. Here, RyRs reside in close proximity to plasma membrane voltage-gated dihydropyridine receptors (DHPRs) and respond to the DHPR activity through a direct interaction and/or indirect Ca2+ sensitivity, propagating sarcoplasmic reticulum luminal Ca2+ release, thereby playing an integral role in excitation–contraction coupling in skeletal muscle cells and cardiomyocytes.114,115 Mutations in the RyR genes are associated with several heritable human diseases, with most disease-associated mutations making RyRs hypersensitive to activating stimuli such as Ca2+ on either the luminal or the cytosolic side of the receptor. In one such mutation, the removal of exon 3 in RyR2 (RyR2AΔ3) results in a 35-residue deletion (N57-G91) that does not destabilize the protein.116,117 Earlier structural analysis revealed that the β-trefoil fold of this domain is rescued by the transformation of part of the preceding region of unknown structure (residues 88–109) into the missing β-strand, leading to a noticeable increase in the thermal stability of the RyR2AΔ3 domain.117 Using solution NMR analysis, Amador et al. showed that this “part of the preceding region of unknown structure” corresponds to a previously unresolved α2-helix (residues 95–104) in the wild-type RyR2A.113 Curiously, this α-helix undergoes structural transformation to a β-strand, thereby replacing the β-strand deleted in RyR2AΔ3 due to the removal of exon 3 in the RyR2 gene.
Therefore, the authors revealed that this newly discovered α2-helix corresponds to the region of missing electron density in the earlier crystal structures of the protein, is rather dynamic, exhibits a greater backbone RMSD compared to the remaining secondary structure elements, and is able to undergo an α-to-β transition after the deletion of residues N57-G91 in RyR2AΔ3.113 In other words, applying NMR to look at the crystallographically invisible region, an important chameleon region was discovered. Although this chameleon region is relatively small, it is really mighty, with at least some mightiness being attributed to its intrinsically disordered nature.

ATP-induced dimerization of the F0F1 ε subunit from Bacillus PS3

Rodriguez et al. used complementary biophysical techniques for examining the ATP-induced conformational switching of the isolated ε subunit of the F0F1 ATP synthase from the thermophile Bacillus PS3 (Tε).118 The authors showed that ATP binding induces a large-scale conformational transition consistent with the formation of stable helical structure in the isolated Tε.118 Based on the complementary hydrogen/deuterium exchange (HDX) mass spectrometry analysis, it has been concluded that this transition is accompanied by a pronounced stabilization in the vicinity of the ATP-binding pocket.118

Structurally, the ε subunit consists of a β-sandwich and 2 C-terminal helices, α1 and α2.119 Curiously, this protein is able to undergo dramatic structural rearrangements from a compact structure in the non-bound form (see Fig. 10A; PDB ID: 1AQT)119 to a highly extended open form in a complex with other subunits of the bacterial F1 complex (see Fig. 10B; PDB ID: 3OAA). The ability of the ε subunit to undergo a large conformational change associated with binding to the F1 complex is determined by the presence of a long IDPR in the C-terminal region (see Fig. 10C; UniProt ID: P0A6E6).

Functional implications of the cyclin CaPcl5 phosphorylation

Simon et al. investigated functional peculiarities of the Candida albicans cyclin CaPcl5.120 The authors showed that the activation of the cyclin-dependent kinase Pho85 that induces phosphorylation of the transcription factor CaGcn4 is controlled via the specific binding of CaPcl5 to Pho85. Furthermore, it is shown that CaPcl5 induces its own phosphorylation at 2 adjacent sites in the N-terminal region of the protein and that this phosphorylation causes degradation of the cyclin in vivo, whereas in vitro studies indicated that this phosphorylation was accompanied by a loss of specific substrate recognition thereby representing a novel mechanism for limiting cyclin activity.120

Activity of cyclin-dependent kinases (CDKs) is regulated via binding of ancillary subunits, the cyclins,121 which also participate in targeting the kinase to specific substrates.122 Similar to other members of the CDK family, a yeast (Saccharomyces cerevisiae) CDK Pho85 can bind up to 10 different Pho85 cyclins (Pcls),123 which defines the ability of this CDK to phosphorylate different substrates leading to a wide range of Pho85 functions in cellular regulation. Among various regulatory functions of Pho85 complexed with different cyclins, is the Pho85/Pcl5-driven phosphorylation of the bZIP transcription factor Gcn4 leading to its degradation.124 Curiously, it was shown that cyclin Pcl5 coevolved with its substrate Gcn4, and in the pathogenic yeast Candida albicans, degradation of CaGcn4 depends on CaPcl5.125

Mutagenic analysis of the CaPcl5 (UniProt ID: Q5AK08) revealed 2 potential CDK phosphorylation sites located in the N-terminal segment, at positions 38 and 43.120

Figure 11 shows that both of these phosphorylated sites are located within the disordered region, providing further illustration for an important notion stating that sites of posttranslational modifications in proteins are commonly located within the disordered regions.126

Nuclear localization of the alternatively spliced sirtuin-2 isoform

Rack et al. described a discovery of a new alternatively spliced isoform of the human sirtuin-2.127 The new isoform results from skipping of exons 2–4 in the SIRT2 gene.127

Sirtuin-2 is a member of a large family of protein-modifying enzymes, NAD+-dependent deacetylases, which are highly conserved throughout bacteria, archaea, and eukaryotes, and are heavily involved in modulation of various cellular processes ranging from metabolic homeostasis, to cell cycle control, to development, and to chromatin organization.128-133 In their turn, the numerous activities of sirtuins are regulated by alternative splicing, modulation of expression levels, posttranslational modifications or subcellular compartmentalization.134-137 In humans, there are 7 sirtuin homologues with distinct cellular locations. For example, Sirt1, Sirt6 and Sirt7 are primarily localized to the nucleus, whereas Sirt3, Sirt4 and Sirt5 are mitochondrial proteins, and Sirt2 is believed to have a predominant cytosolic localization.138

Earlier, 4 variants of the human SIRT2 gene were reported, with alternative splicing affecting the N-terminal part of the sirtuin-2 protein in 2 isoforms (residues 1–37 are missing in the isoform-2 and the residues 1–38 (MAEPDPSHPLETQAGKVQEAQDSDSDSEGGAAGGEADM) are substituted by residues MPLAECPSCRCLSSFRSV in the isoform-3), and with the isoform-4 possessing changes at the C-terminus, where residues 266–271 (VQPFAS) are substituted by GRGLAG, and residues 272–389 are missing. These isoforms contain a leucine-rich nuclear export signal (NES) within their N-terminal region, which mediates their cytosolic localization,139 where they have numerous functions. The recent finding of a new isoform-5 with predominantly nuclear localization helped to resolve an apparent contradiction between the predominant cytosolic localization of this protein and the existence of multiple nuclear functions.127 The newly discovered alternative splicing event results in the substitution of the codons for amino acids 6–76 of the full-length ORF of isoform-1 by an arginine codon in the new isoform. This isoform-5 is predominantly localized in the nucleus, does not exhibit detectable deacetylase activity, but is properly folded and retains the ability to bind p300.127
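The isoform descriptions above amount to simple edits of the canonical sequence: a residue range is deleted or replaced by a different stretch. A minimal Python sketch of this bookkeeping is given below. It uses only the N-terminal 38 residues of isoform-1 and the isoform-3 replacement quoted in the text; the helper name apply_splice_variant is a hypothetical choice made for illustration, and the full-length sequence is not reproduced here.

```python
def apply_splice_variant(sequence, start, end, replacement=""):
    """Replace residues start..end (1-based, inclusive) with `replacement`.

    An empty replacement models a deletion ("residues are missing").
    """
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("residue range lies outside the sequence")
    return sequence[:start - 1] + replacement + sequence[end:]


# N-terminal residues 1-38 of isoform-1, as quoted in the text.
ISOFORM1_NTERM = "MAEPDPSHPLETQAGKVQEAQDSDSDSEGGAAGGEADM"
# Replacement segment reported for isoform-3, as quoted in the text.
ISOFORM3_NTERM = "MPLAECPSCRCLSSFRSV"

# Isoform-2: residues 1-37 are missing, so only residue 38 of this fragment remains.
iso2_fragment = apply_splice_variant(ISOFORM1_NTERM, 1, 37)

# Isoform-3: residues 1-38 are substituted by the alternative segment.
iso3_fragment = apply_splice_variant(ISOFORM1_NTERM, 1, 38, ISOFORM3_NTERM)

if __name__ == "__main__":
    print("isoform-2 fragment:", iso2_fragment)
    print("isoform-3 fragment:", iso3_fragment)
```

Given a full-length isoform-1 sequence, the isoform-5 event described above would correspond to apply_splice_variant(full_sequence, 6, 76, "R"), which removes 71 residues and adds one, shortening the protein by 70 residues.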

It has been established that the protein segments affected by alternative splicing are most often intrinsically disordered, suggesting that alternative splicing enables functional and regulatory diversity while avoiding structural complications associated with the deletion of portions of the well-folded proteins.140 In line with these observations, later studies revealed that the tissue-specific protein segments produced by alternative splicing often contain disordered regions, are enriched in posttranslational modification sites, and frequently embed conserved binding motifs.141 Also, it was proposed that alternative splicing of intrinsically disordered regions containing linear interaction motifs and/or post-translational modification sites results in complete rewiring of protein interactions.142

Figure 12 shows that, in agreement with these earlier observations, the alternatively spliced isoform-5 of human sirtuin-2 differs from the canonical isoform (UniProt ID: Q8IXJ6) by lacking the intrinsically disordered N-terminal tail (cf. Figs. 12A and 12B). Curiously, the ANCHOR analysis15,16 of the human sirtuin-2 revealed that this alternative splicing event removed 2 AiBSs (located at the residues 1–15 and 36–49 of the isoform-1). Therefore, functionality of sirtuin-2 is modulated by alternative splicing that removes potential binding sites from the disordered region of this protein.

Role of the C-terminal tail in competitive binding of semaphorin-3 to neuropilin-1

Parker et al. investigated the role of the amino acid sequence at the C-terminal region of human semaphorin-3 in the interaction of this protein with neuropilin-1.143 Neuropilin-1 is a member of the family of type I transmembrane receptors that are essential in development, homeostasis, and pathogenesis, being involved in the coordination of several important signaling events in the cardiovascular and nervous systems. The members of the class III Semaphorin (Sema3) family of axon guidance molecules are among the ligands of neuropilins (Nrps). Ligand binding of Nrps is a subject of intensive research due to the involvement of these transmembrane receptors in various pathogenic conditions. It has been established that all known Nrp ligands require a C-terminal arginine (CR) for binding to a conserved pocket in the Nrp b1 domain.144-147 To better understand the mechanism of interaction between the Nrps and their ligands and to understand the role of residues upstream of the CR, Parker et al. synthesized a library of semaphorin-3 derived peptides. This study revealed that the C-1 residue (i.e., the residue preceding the CR) serves the critical role of positioning the CR and C-2 residues to promote concurrent Nrp binding.143 Bioinformatics analysis revealed that the analyzed sequence (WDQKKPRNRR) is part of a long intrinsically disordered tail that spans the last 100 residues of human semaphorin-3 (UniProt ID: Q13275, see Fig. 9F).

Structure of the unmodified 3 Glu-OCN form of bovine osteocalcin

Malashkevich et al. reported the X-ray crystal structure of the unmodified 3 Glu-OCN form of bovine osteocalcin.148 This is an important contribution since it adds crucial structural information on osteocalcin, which is a small, abundant noncollagenous protein synthesized by osteoblasts that typically contains 3 γ-carboxyglutamic acid (3 Gla-OCN) residues generated post-translationally in a vitamin K-dependent process149,150 by the vitamin K-dependent (VKD) carboxylase.151 Earlier, based on a circular dichroism study, it was concluded that in the presence of Ca2+, the 3 Glu-OCN molecule contained significantly less α-helical structure than the 3 Gla-OCN form.152 Furthermore, it was shown that Ca2+ no longer binds to decarboxylated, unmodified osteocalcin.153 Malashkevich et al. revealed that the crystal structure of the thermally decarboxylated bovine osteocalcin contained residues 17–47, was rather similar to 3 Gla Ca2+-OCN, consisted of 3 α-helices surrounding a hydrophobic core, and contained a C23–C29 disulfide bond between 2 of the helices but did not contain bound Ca2+.148 This is an interesting finding since earlier work clearly showed that in their apo-forms, modified and non-modified osteocalcins from different sources are mostly disordered.148 The explanation for this apparent contradiction can be derived from the analysis of conditions used for the 3 Glu-OCN crystallization, where protein in 20 mM NaCl and 10 mM CaCl2 (pH 7.0) was mixed with the reservoir solution containing 2.5 M ammonium sulfate and 0.1 M Bis-Tris propane (pH 7.0) in a 1:1 ratio.148 Therefore, a high ionic strength solution was used to partially neutralize the high anionic charge in the helical regions of osteocalcin. Although this approach permitted better realization of the α-helix-forming potential of the non-modified osteocalcin, the crystallization conditions are clearly non-physiological.

Coordinated transcriptional regulation of the Hspa1a gene by multiple transcription factors

Sasi et al. investigated the peculiarities of the regulation of expression of the inducible heat shock protein HSPA1A at the transcription level by several transcription factors.154 HSPA1A and HSPA1B are members of the human inducible heat shock protein family (Hsp70), altered expression of which has been linked to the pathogenesis of various diseases, such as cancer, cardiovascular diseases, and neurodegenerative disorders.155-158 It is known that the heat shock response-related proteins (including Hsp70) can be expressed during normal conditions (e.g., during cell growth and development) or can be induced by various pathological conditions, such as infection, inflammation, and protein conformation diseases. Sasi et al. showed that transcription factors HSF-1, CREB, NF-Y, and NF-κB synergistically regulate expression of the heat-shock-induced gene Hspa1a.154

The initiation of the heat shock response is manifested by the activation of the heat shock transcription factors (HSFs), a family of related transcription factors which, in mammals, is composed of HSF-1, -2, -3, -4, -5, -Y and -X.159 cAMP response element binding protein (CREB) is the cAMP-regulated transcription factor that has been shown to stimulate target gene expression, often via association with the coactivator paralogs P300 and CREB binding protein (CBP).160-162 Complex formation between CREB and CBP/P300 requires protein kinase A (PKA)-mediated phosphorylation of CREB at Ser-133.160 The nuclear transcription factor Y (NF-Y) is a sequence-specific DNA-binding protein that recognizes the Y box, which is a promoter element common to all major histocompatibility complex class II genes.163 Also, NF-Y is known to interact with a CCAAT box, which is one of the most common elements in eukaryotic promoters, found in the forward or reverse orientation.164 Furthermore, NF-Y has been reported to play a key role in the basal expression of many Hsps through this CCAAT box.165,166 NF-Y is a heterotrimeric transcription factor composed of 3 components, NF-YA (subunit α, UniProt ID: P23511), NF-YB (subunit β, UniProt ID: P25208), and NF-YC (subunit γ, UniProt ID: Q13952), where the dimerization of NF-YB and NF-YC is a prerequisite for the NF-YA association and DNA binding.167 Finally, the transcription factor nuclear factor-kappa B (NF-κB; UniProt ID: P19838) is a key player in an intracellular signaling cascade regulating many inflammatory mediators.168 NF-κB is a common transcription factor present in almost all cell types and is the endpoint of a series of signal transduction events that are initiated by a vast array of stimuli related to many biological processes such as inflammation, immunity, differentiation, cell growth, tumorigenesis and apoptosis.
NF-κB is a homo- or heterodimeric complex formed by the Rel-like domain-containing proteins RELA/p65, RELB, NF-κB1/p105, NF-κB1/p50, REL and NF-κB2/p52; the heterodimeric p65-p50 complex appears to be the most abundant one.

Figure 13 represents the results of the multiparametric evaluation of intrinsic disorder propensities of HSF-1 (A, UniProt ID: Q00613), CREB (B, UniProt ID: P16220), NF-YA (C, UniProt ID: P23511), NF-YB (D, UniProt ID: P25208), NF-YC (E, UniProt ID: Q13952), and NF-κB (F, UniProt ID: P19838) and shows that these proteins are predicted to be highly disordered. A high level of functional intrinsic disorder is a “family signature” of the transcription factors in general.169 Furthermore, the recent analysis of the abundance and importance of IDPRs in functions of various HSFs clearly indicated that the heat shock response requires HSF flexibility to be more efficient.170 Also, the importance of intrinsic disorder in functions of various proteins involved in the innate immune response (including NF-κB) has been recently analyzed.171

Phosphorylation sites in melanopsin

Blasic et al. analyzed the peculiarities of phosphorylation of the C-terminal domain of the photopigment melanopsin, which possesses 37 serine and threonine residues that are potential sites for phosphorylation by a G protein-coupled receptor kinase (GRK).172 Melanopsin is a member of the G protein coupled receptor (GPCR) family that undergoes light-dependent phosphorylation that is involved in deactivation of the photoresponse. Blasic et al. revealed that of the 37 phosphorylation sites, a small cluster of 6 or 7 sites in the proximal region of the C-tail is critical for mediating deactivation.172

Figure 14A shows that mouse melanopsin (UniProt ID: Q9QXZ9) is predicted to have a long disordered C-tail (residues 380–521). This is a rather expected outcome since phosphorylation sites are known to be preferentially located within the IDPRs.173-175

Human defensins

A recent review by Wilson et al. is dedicated to the antiviral activities of human defensins.176 These peptides with a wide range of antimicrobial activities are important components of the innate immune system. Among various antiviral mechanisms of human defensins are direct targeting of viral envelopes, glycoproteins, and capsids, inhibition of viral fusion and post-entry neutralization, inhibition of viral replication via disruption of host intracellular signaling and binding and modulation of host cell surface receptors, and augmentation and altering of adaptive immune responses.176 Of the 2 types of defensins found in humans, α- and β-defensins, significant knowledge has accumulated about the molecular mechanisms of the α-defensin action, whereas molecular details on the β-defensin activities are more scattered.176

Defensins are small (∼29–129 amino acids) cationic, amphipathic polypeptides with a predominantly β-sheet structure stabilized by 3 disulfide bonds.176 Human α-defensins are typically shorter than β-defensins, with α-defensins staying in a range of 29–39 residues and some β-defensins being as long as 129 residues. A search of UniProt for human defensins produces about 40 hits, of which 6 entries are α-defensins. Defensins are produced in the form of a proprotein containing a signal peptide and/or a propeptide. Figure 15 represents disorder profiles for several representative members of the human defensin family and shows that all proproteins possess a noticeable amount of intrinsic disorder, the content of which varies significantly between different defensins. It is likely that the presence of noticeable disordered tails plays a role in the maturation of defensins since sites of the proteolytic attack are commonly located within regions of intrinsic disorder.177 It is also possible that these disordered tails serve as entropic bristle domains178 defining solubility of pro-defensins. In β-defensins, long disordered regions can have other functions. For example, the ANCHOR analysis15,16 of human β-defensin 129 (UniProt ID: Q9H1M3) revealed that this longest member of the human defensin family (there are 129 amino acids in a mature protein and 183 residues in a pro-protein) contains 4 AiBSs, residues 78–87, 89–100, 117–122, and 149–164.

Inhibition of E2F transcription factor by the phosphorylated C-terminal domain of retinoblastoma protein

Burke et al. investigated the roles of the phosphorylation of the C-terminal domain of the retinoblastoma protein (RbC) in interaction of Rb with the E2F transcription factor and related regulation of the Rb activities in growth suppression.179

Deregulation of the broad-functioning tumor suppressor retinoblastoma protein (Rb) is associated with several human cancers.180,181 Among numerous functions ascribed to this important protein is a negative regulation of cell division at the G1–S transition of the cell cycle,182 where Rb forms a growth-repressive complex with E2F transcription factors183 in a phosphorylation-dependent manner. Burke et al. showed that there are at least 2 different mechanisms by which RbC phosphorylation inhibits E2F binding. Here, phosphorylation of S788 and S795 weakens the direct association between the RbC and the marked-box domains of E2F, whereas phosphorylation of S788, S795, S807 and S811 induces an intramolecular association between RbC and the pocket domain, which overlaps with the site of E2F transactivation domain binding.179

Figure 14B shows that human retinoblastoma-associated protein (UniProt ID: P06400) is predicted to have numerous disordered regions, including long disordered tails. The longest disordered region is the C-terminal tail (residues 760–928) that contains all the phosphorylation sites discussed above. According to the ANCHOR analysis,15,16 disordered tails of Rb are enriched in potential binding sites located at residues 7–20, 41–49, 57–64, 830–840, 844–860, 872–881, and 891–919. N-terminal tail of Rb contains 2 regions of compositional bias (poly-Ala and poly-Pro regions, residues 10–18 and 20–29, respectively). Known sites of the Rb interaction with LIMD1 and E4F1 are located within the intrinsically disordered C-terminal tail (residues 763–928 and 771–928, respectively). The fact that phosphorylation sites of the RbC are located within the disordered region is in agreement with the general trend, where sites of many catalytically-induced posttranslational modifications including phosphorylation173 and ubiquitination184 are typically found in regions of intrinsic disorder.126,174,185-187

The tetratricopeptide repeat (TPR) motif-containing protein LGN and its binding partner Frmpd1 (FERM and PDZ domain containing 1)

Pan et al. described structural peculiarities of the human LGN-Frmpd1 complex.188 To this end, a crystal structure of the complex between the 15–350 fragment of human LGN containing 8 tetratricopeptide repeat motifs (UniProt ID: P81274) and the 901–951 fragment of human Frmpd1 (UniProt ID: Q5SYB0) was solved at 2.4 Å resolution.188 In this LGN-TPR/Frmpd1 complex, almost the entire length of LGN-TPR was well resolved, and most residues of the Frmpd1 fragment were clearly resolved, adopting an extended conformation that occupied most of the concave channel formed by the 8 TPR motifs and buried a total of 2541 Å² of surface area.188 The noticeable exceptions were the 11 residues in the loop connecting the αA and αB of TPR3 (amino acids 153–163) and the 6 residues at the C terminus (amino acids 345–350) in the LGN-TPR, and the N-terminal 8 residues (amino acids 912–919) and the last 2 residues at the C terminus (amino acids 937–938) of the Frmpd1 fragment, all of which were missing in the structure of the LGN-TPR/Frmpd1 complex.188

Leu-Gly-Asn repeat-enriched protein LGN/AGS3 in mammals (Pins, or partner of inscuteable in Drosophila neuroblasts) plays a number of crucial roles in regulation of cell polarity and spindle orientation during cellular differentiation and self-renewal in the development of multicellular eukaryotes.189-191 Structurally, LGN consists of 8 tetratricopeptide repeat (TPR) motifs in its N-terminal half and 3 or 4 GoLoco motifs (also referred to as G-protein regulatory or GPR motifs) in the C-terminal half.192-194 Both of these motifs are crucial for protein-protein interactions,195,196 with each GoLoco motif of Pins/LGN being capable of binding to GDP-bound Gαi197 leading to stable cortical localization of Pins/LGN,198 and with the TPR motifs possessing multiple binding partners. The canonical TPR motif is a 34-amino-acid protein–protein interaction module, multiple copies of which are found in a wide range of proteins with diverse functions such as cell cycle regulation, gene transcription and splicing processes, protein trafficking, and protein folding.195,199

Figure 15A shows that human LGN protein is involved in a wide range of protein-protein interactions, clearly serving as an important hub protein. This interactivity analysis is done by STRING.59

Frmpd1 protein (FERM and PDZ domain containing 1) is known to serve as a regulatory binding partner of AGS3, with a short fragment of Frmpd1 (amino acids 901–938) being shown to bind to the 8 TPR motifs of AGS3.200 Also, a 50-residue fragment (amino acids 901–951) of Frmpd1 was shown to bind to LGN with a Kd∼1 μM.188,194

Curiously, besides the above-mentioned regions of missing electron density in the LGN-TPR/Frmpd1 complex, Figures 15B and 15C show that full-length LGN and Frmpd1 proteins both belong to the category of hybrid proteins possessing some ordered domains and a significant number of intrinsically disordered protein regions (IDPRs). In fact, the GoLoco motif-containing C-terminal half of human LGN (residues 350–684) and the major portion of Frmpd1 (residues 511–1390) are predicted to be mostly disordered by all the computational tools used in this study. Overall, more than 40% of the LGN residues and more than 50% of the Frmpd1 residues are predicted to be disordered, which clearly places these proteins in the category of highly disordered proteins.

Furthermore, both proteins are expected to have numerous disorder-based interaction sites. These can be identified by ANCHOR,15,16 a computational tool that relies on the pair-wise energy estimation approach developed for the general disorder prediction method IUPred,201,202 and is based on the hypothesis that long regions of disorder contain localized potential binding sites that cannot form enough favorable intra-chain interactions to fold on their own, but are likely to gain stabilizing energy by interacting with a globular protein partner.15,16 Here the term ANCHOR-indicated binding site (AiBS) is used to identify a region of a protein suggested by the ANCHOR algorithm to have significant potential to be a binding site for an appropriate but typically unidentified partner protein. This analysis revealed that there are 10 AiBSs in LGN (residues 370–379, 436–452, 491–499, 508–513, 531–537, 545–553, 566–573, 596–607, 629–638 and 661–671), whereas Frmpd1 contains 23 AiBSs (residues 534–554, 609–617, 622–636, 652–662, 704–715, 747–779, 792–804, 828–878, 896–909, 932–949, 958–967, 983–1019, 1023–1077, 1087–1100, 1106–1114, 1119–1129, 1138–1171, 1192–1220, 1245–1250, 1290–1297, 1312–1319, 1336–1342, and 1424–1431). Of special interest is the fact that the 901–951 fragment of human Frmpd1 used in the mentioned structural analysis188 is predicted to overlap with 2 AiBSs (residues 896–909 and 932–949). This is a clear indication that intrinsic disorder plays a role in the formation of the LGN-TPR/Frmpd1 complex.
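The overlap between the crystallized Frmpd1 fragment and the predicted AiBSs can be checked with simple interval arithmetic. The following is a minimal sketch, not part of the original analysis: the function name `overlapping_sites` and the variable names are hypothetical, and the residue ranges are copied from the text above (inclusive, 1-based numbering is assumed).

```python
# Hypothetical helper: report which predicted binding sites overlap a fragment.
# All ranges are inclusive (start, end) residue numbers taken from the text.

def overlapping_sites(fragment, sites):
    """Return the sites that share at least one residue with the fragment."""
    frag_start, frag_end = fragment
    return [(s, e) for (s, e) in sites if s <= frag_end and e >= frag_start]

# The 23 AiBSs listed in the text for human Frmpd1
FRMPD1_AIBS = [
    (534, 554), (609, 617), (622, 636), (652, 662), (704, 715), (747, 779),
    (792, 804), (828, 878), (896, 909), (932, 949), (958, 967), (983, 1019),
    (1023, 1077), (1087, 1100), (1106, 1114), (1119, 1129), (1138, 1171),
    (1192, 1220), (1245, 1250), (1290, 1297), (1312, 1319), (1336, 1342),
    (1424, 1431),
]

# Fragment of Frmpd1 used in the crystallographic study
CRYSTAL_FRAGMENT = (901, 951)

hits = overlapping_sites(CRYSTAL_FRAGMENT, FRMPD1_AIBS)
print(hits)  # [(896, 909), (932, 949)]
```

The same helper could be applied to any other fragment and site list quoted in this article, provided the ranges are entered as inclusive pairs.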

Structural and functional analysis of human sirtuin-1

Davenport et al. reported a crystal structure of the human sirtuin-1 catalytic domain (residues 234–510) in complex with its C-terminal regulatory segment (CTR, residues 641–665).203 In this structure (see Fig. 17A, PDB ID: 4IG9), the catalytic NAD+-binding domain adopts the canonical sirtuin fold, whereas CTR forms a β-hairpin structure complementing the β-sheet of the catalytic domain and covering an essentially invariant hydrophobic surface.203

Biological significance of the sirtuin family was outlined in the previous section. Human sirtuin-1 (Sirt1) is implicated in a wide range of human diseases due to the fact that this enzyme plays multiple roles in cellular processes ranging from energy metabolism to cell survival via its ability to deacetylate a wide range of substrates, such as p53, NF-κB, FOXO transcription factors, and PGC-1α.204-206 Deacetylation activity of Sirt1 was shown to be affected by various regions located within the long N- and C-termini that flank the Sirt1 catalytic domain.207,208

Figure 17B illustrates that, in agreement with a recent bioinformatics study revealing that N- and C-terminal segments of sirtuins in all known organisms are expected to be intrinsically disordered,209 both termini of human Sirt1 (UniProt ID: Q96EB6, residues 1–250 and 500–747) are predicted to be mostly disordered, whereas the central catalytic domain is mostly ordered. The ANCHOR analysis15,16 revealed that human Sirt1 contains 14 AiBSs located at the residues 1–33, 43–49, 53–125, 133–169, 184–193, 220–225, 498–503, 521–528, 549–574, 588–593, 618–624, 637–672, 691–706, and 710–744. It is important to emphasize here that the CTR (residues 641–665) used in the above-mentioned structural analysis203 of human Sirt1 is completely embedded within one of the AiBSs (residues 637–672).
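The statement that the CTR is completely embedded within an AiBS is a containment check rather than a mere overlap. A minimal sketch under the same assumptions as elsewhere (inclusive residue ranges copied from the text; the names `contains`, `SIRT1_AIBS`, and `CTR` are hypothetical) might look as follows.

```python
# Hypothetical helper: test whether a segment lies entirely inside a site.
# Ranges are inclusive (start, end) residue numbers taken from the text.

def contains(site, segment):
    """True if every residue of the segment falls within the site."""
    return site[0] <= segment[0] and segment[1] <= site[1]

# The 14 AiBSs listed in the text for human Sirt1
SIRT1_AIBS = [
    (1, 33), (43, 49), (53, 125), (133, 169), (184, 193), (220, 225),
    (498, 503), (521, 528), (549, 574), (588, 593), (618, 624),
    (637, 672), (691, 706), (710, 744),
]

# C-terminal regulatory segment used in the crystal structure
CTR = (641, 665)

enclosing = [site for site in SIRT1_AIBS if contains(site, CTR)]
print(enclosing)  # [(637, 672)]
```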

The endosome-associated deubiquitinase AMSH

Davies et al. performed a systematic mutational analysis to elucidate the molecular mechanisms of activation of AMSH [associated molecule with a Src homology 3 domain of signal transducing adaptor molecule (STAM)], a deubiquitinating enzyme (DUB) with exquisite specificity for Lys63-linked polyubiquitin chains, and of the recruitment of this protein to the endosomal sorting complexes required for transport (ESCRT) machinery.210 The fact that mutations in AMSH were implicated in microcephaly capillary malformation (MIC-CAP) syndrome in children211 further reiterates the importance of this study. AMSH is a member of the JAMM (JAB1/MPN/MOV34) family of DUBs involved in the regulation of ubiquitin signaling by catalyzing the hydrolysis of isopeptide (or peptide) bonds between ubiquitin and target proteins or within polymeric chains of ubiquitin.212 The catalytic activity of AMSH is stimulated upon binding to STAM, which is a member of the ESCRT-0 complex. Here, the AMSH-STAM interaction is conducted through the binding of the SH3 binding motif of AMSH to the SH3 domain of STAM.213

Davies et al. used in their analysis the catalytic domain of AMSH (residues 219–424).210

Figure 14C represents the results of the disorder analysis of human AMSH (UniProt ID: O95630) and shows that this protein contains a long IDPR (residues 90–250), thereby illustrating that the N-terminal part of the analyzed catalytic domain is predicted to be disordered. According to the ANCHOR analysis, AMSH contains a disorder-based binding site at position 205–213 that is located in close proximity to the known STAM-binding motif of this protein. A search of the ELM server (http://elm.eu.org/)214 for functional eukaryotic linear motifs (ELMs) revealed that AMSH has several ELMs in the 200–250 region, such as WDR5 WD40 repeat-binding ligand (residues 199–213), FHA phosphopeptide ligand (residues 208–214), SH3 ligand (residues 224–230), PCSK cleavage site (residues 235–239), cyclin recognition site (residues 235–239), and GSK3 phosphorylation sites (residues 207–219 and 240–247), among several other ELMs.

Interactions of the transperiplasmic protein TonB with outer membrane transporters BtuB, FecA, and FhuA

Freed et al. looked at the peculiarities of the interaction of the E. coli transperiplasmic protein TonB with outer membrane transporters BtuB, FecA, and FhuA.215

TonB is an E. coli inner membrane protein, which possesses a polyproline motif (residues 70–81) that may span the length of the periplasmic space216 and a globular C-terminal domain (residues 150–239) that interacts with the transporters.217,218

Figure 18A shows that the N-terminal half of the periplasmic domain of TonB protein (UniProt ID: P02929) is predicted to be highly disordered, whereas its C-terminal domain possesses a significant amount of order. Figure 18B illustrates an important point that the dimer formed by the truncated periplasmic domain (residues 150–239) is a highly intertwined and elongated structure with noticeable domain swapping. A crystal structure of the TonB-FhuA complex is shown in Figure 18C. Curiously, although the entire periplasmic domain of TonB was used in the crystallization experiment, residues 39–157 and 236–239 are not seen in the resulting structure, clearly indicating that these regions are likely to preserve mostly disordered structure even in the bound form of TonB. The disordered N-terminal and C-terminal tails clearly have functional importance since they contain AiBSs, 3 at the N-terminus (residues 39–65, 92–95, and 124–164) and one at the C-terminus (residues 232–239). One more AiBS is predicted at residues 171–188.

Structure and function of human DnaJ homolog subfamily A member 1 (DNAJA1)

Stark et al. performed functional analysis of the important human chaperone, DnaJ homolog subfamily A member 1 (DNAJA1), and solved the solution structure of its J-domain (residues 1–67).219 Human DNAJA1 has multiple important functions and acts as a protein chaperone, a regulator of androgen receptor signaling, and an activator of the DnaK protein. Furthermore, levels of the human DNAJA1 serve as a biomarker for pancreatic cancer, since the expression of this protein in pancreatic cancer cells is downregulated 5-fold.220 NMR analysis revealed that the structure of the human DNAJA1 J-domain consists of 4 α-helices, residues 17–21 (α1), 29–42 (α2), 52–65 (α3), and 68–75 (α4) (see Fig. 19A; PDB ID: 2M6Y).219 On the contrary, the evaluation of intrinsic disorder propensity of the full-length protein suggested that DNAJA1 (UniProt ID: P31689) possesses a significant amount of disorder throughout its N- and C-terminal tails (see Fig. 19B). Curiously, by ANCHOR analysis, there are 3 disorder-based binding sites in DNAJA1, residues 26–31, 85–93, and 392–397, with the first of these sites (residues 26–31) overlapping with α2 (residues 29–42).

Structure and functions of the purine-rich element binding protein B (Purβ)

In a recent comprehensive study, Romora et al. showed that the purine-rich element binding protein B (Purβ) serves as a suppressor of myofibroblast differentiation and ACTA2 repression.221 Purα and Purβ are members of a small family of nucleic acid-binding proteins that interact with purine-rich ssDNA or RNA sequences homologous to the so-called PUR element originally described in eukaryotic gene flanking regions and origins of DNA replication.222-224

Figure 14D shows that human Purβ protein (UniProt ID: O35295) is predicted to possess a significant amount of functionally important intrinsic disorder. Importance of the long disordered regions in this protein is supported by finding 10 AiBSs (residues 1–6, 18–27, 47–75, 87–95, 99–106, 135–137, 142–144, 186–201, 243–256, and 277–287). In agreement with these predictions, the authors showed that several regions of Purβ (residues 41–112, 125–210, and 125–303) were intrinsically unstable.221

The host restriction factor tetherin

Hotter et al. provided a comprehensive review of the antiviral role of one of the host restriction factors (i.e., specific cellular antiviral factors that inhibit retroviral replication at different steps of the viral life cycle), human tetherin.225

Among various antiviral mechanisms ascribed to tetherin are the inhibition of the release of diverse enveloped viruses by tethering them to the cell surface, induction of an inflammatory response, activation of NF-κB, action as an innate immune sensor of viral infections, and activation of immune response through interactions with the immunoglobulin-like transcript 7 (ILT7, LILRA4).225 As commonly seen in the virus-host arms race, effective antiviral strategies of the host are counterbalanced by the development of novel invasive strategies, and various viruses have evolved antagonists against this host restriction factor, such as accessory HIV-1 (human immunodeficiency virus type 1) Vpu protein that counteracts the effects of tetherin.225

Tetherin is a type II transmembrane protein that contains both an N-terminal transmembrane region and a C-terminal glycosyl-phosphatidylinositol (GPI) anchor.226 Structurally, human tetherin exists as a disulfide-linked parallel homodimer, which is formed via the extracellular part that is involved in the formation of a canonical coiled-coil structure, where a long α-helix contains 3 cysteines that form disulfide bonds with the corresponding cysteines of a second tetherin molecule.227 It is believed that tetherin-caused viral retention relies on this unusual architecture, where one transmembrane anchor may remain in the cellular plasma membrane, whereas the other is able to stick to the viral membrane.228,229

Figure 20A represents the results of the multi-tool disorder predictions for human tetherin (also known as bone marrow stromal antigen, UniProt ID: Q10589) and shows that this protein is expected to be mostly disordered. This is not a very surprising observation, since coiled-coil proteins are expected to be intrinsically disordered in their monomeric states and to fold into a helical structure upon formation of the coiled coil through interaction with the corresponding partners.230

Cyclin D1/Cdk2 complex

To clarify some uncertainties in previous studies regarding formation and activities of the cyclin D1/cyclin-dependent kinase 2 (Cdk2) complexes, Jahn et al. utilized a novel p21-PCNA fusion protein and p21 mutant proteins to understand the mechanisms of the cyclin D1/Cdk2 complex formation.231 The authors show that p21 serves as an important scaffolding protein, which is required for the formation of the functional cyclin D1/Cdk2 complex, since cyclin D1 and Cdk2 fail to form a complex in its absence.231

Figure 20B shows that the scaffolding protein p21 (which is a cyclin-dependent kinase inhibitor 1, UniProt ID: P38936) is predicted to be mostly disordered and packed with disorder-based binding sites. In fact, there are 5 AiBSs in this relatively small protein, residues 36–41, 68–79, 100–105, 109–123, and 145–164. Using this approach, the authors were able to identify a novel Cdk2 substrate, polypyrimidine tract binding protein-associated splicing factor (PSF). Figure 20C shows that PSF (UniProt ID: P23246) is predicted to have highly disordered tails. ANCHOR analysis15,16 revealed that intrinsic disorder is crucial for the PSF functions, since this protein was predicted to have 16 AiBSs (residues 1–72, 93–97, 103–146, 154–164, 171–232, 235–257, 271–287, 293–304, 430–439, 481–496, 523–541, 579–581, 596–605, 612–678, 687–696, and 703–707).

Human and mouse apolipoproteins A-I

Nguen et al. analyzed mechanistic peculiarities of the interaction between the high density lipoprotein (HDL) particles and human or mouse apolipoprotein A-I (ApoA-I).232 The authors established that the C-terminal domains (CTDs) of human and mouse ApoA-I proteins are the major players facilitating interactions of these proteins with HDL. Curiously, human CTD, being a bit more hydrophobic than mouse CTD, binds to the HDL particle with higher affinity. On the other hand, the isolated N-terminal helix bundle domains (residues 1–190) of human and mouse ApoA-I proteins bind HDL poorly.232 ApoA-I stabilizes discoidal HDL particles by forming a double-belt structure233,234 and plays a major role in the reverse cholesterol transport pathway.235,236

ApoA-I contains a globular N-terminal domain (residues 1–43) and a lipid-binding C-terminal domain (residues 44–243).234 In the belt-like structure of the smallest discoidal HDL, 2 apoA-I molecules wrap around a small patch of bilayer containing 160 lipid molecules. The C-terminal domain of each monomer is a ring-like, curved, planar amphipathic α-helix with the hydrophobic surface curved toward the lipids.234

Figure 21 shows that both human and mouse ApoA-I proteins (UniProt IDs: P02647 and Q00623) are predicted to be mostly disordered, with the mouse protein possessing a bit more disordered structure. According to the ANCHOR analysis, there are 3 AiBSs in both proteins (residues 15–21, 158–164, and 221–230 in human protein, and residues 13–18, 173–204, and 207–230 in mouse ApoA-I). Figure 21C represents a structure of the human ApoA-I dimer in a complex with lipids (PDB ID: 3K2S). This model was built by uniting several synergistic experimental techniques, such as small angle neutron scattering (SANS) with contrast variation, isotopic deuteration of selected macromolecule components, and hydrogen/deuterium exchange tandem mass spectrometry (HD-MS/MS).237 In agreement with the results of disorder predictions, the ApoA-I dimer represents a highly intertwined double superhelix with a minimal number of intrachain contacts, which is stabilized mostly by the interchain interactions. In other words, this complex is characterized by a very large interface area, and, therefore, according to Gunasekaran et al.238 it belongs to the category of 2-state dimers, where the monomers are unfolded in the unbound state and fold simultaneously with the complex formation.

Function and regulation of MAVS

Jacobs and Coyne provided an important overview of the mitochondrial antiviral signaling protein (MAVS, also known as IPS-1/VISA/Cardif).239 MAVS is the mitochondrially-located innate immune signaling adaptor regulating and coordinating signals received from 2 independent cytosolic pathogen recognition receptors to induce antiviral genes. MAVS can be regulated by host cell factors that inhibit MAVS signaling by direct protein–protein interactions, by altering mitochondrial properties or dynamics, or by post-translational modifications.239 The tip of the MAVS C-terminus contains a mitochondrial intermembrane region (residues 535–540) needed for the interaction with the mitochondrial membrane. According to UniProt, human MAVS (UniProt ID: Q7Z434) is involved in a wide array of protein-protein interactions. Among established MAVS binding proteins are: DDX58/RIG-I, IFIH1/MDA5, TRAF2, TRAF6, C1QBP, IRF3, FADD, RIPK1, CHUK, IKBKB, HCV and hepatitis GB virus B NS3/4A proteases, HHAV protein 3ABC, NLRX1, PSMA7, TRAFD1, PCBP2, IPS1, ITCH, CYLD, SRC, DHX58/LGP2, DDX58/RIG-I, IKBKE, TMEM173/MITA, IFIT3, TBK1, and human respiratory syncytial virus (HRSV) NS1 protein (http://www.uniprot.org/uniprot/Q7Z434). MAVS contains multiple alternatively spliced isoforms, sites of posttranslational modifications, and a proline-rich region (residues 103–153).

In other words, MAVS satisfies all the major criteria to be considered as an intrinsically disordered protein. Figure 22A represents the vast interactome of MAVS evaluated by STRING. Figure 22B shows that human MAVS is predicted to be a highly disordered protein. ANCHOR analysis15,16 revealed that it contains 8 AiBSs located at the residues 123–148, 156–268, 277–290, 194–325, 338–377, 397–410, 416–449, and 463–479. Some of the AiBSs coincide or overlap with known functional sites of the human MAVS, e.g., regions of interaction with TRAF2 (residues 143–147) and TRAF6 (residues 153–158 and 455–460). All this clearly indicates that MAVS is a crucial intrinsically disordered adaptor of the innate immune signaling.

Regulation of human myosin light chain phosphatase via phosphorylation at 2 sites of the regulatory subunit

Khasnis et al. analyzed the mechanisms and functional outputs of the phosphorylation/dephosphorylation events at Thr696 and Thr853 sites of the MYPT1 protein, which is a regulatory subunit of the myosin light chain phosphatase (MLCP).240 MLCP is a cytoskeleton-associated protein phosphatase-1 (PP1) that serves as a RhoA/ROCK effector, regulating dynamic reorganization of the cytoskeleton crucial for cell motility. The authors also showed that the C-terminal domain of human MYPT1 (residues 495–1030) was responsible for the binding to the N-terminal portion of myosin light meromyosin.240 The binding promiscuity of MYPT1 is illustrated by Figure 23A representing the results of STRING analysis. Figure 23B shows that a very significant part of MYPT1 (UniProt ID: O14974) is expected to be intrinsically disordered, with the longest disordered region covering ∼72% of the protein length (residues 290–1030) and containing 18 disorder-based binding sites (residues 318–338, 374–403, 406–419, 434–454, 472–477, 492–506, 548–564, 579–588, 626–649, 666–673, 697–712, 754–774, 780–815, 853–860, 887–892, 897–917, 932–943, and 1021–1030). Clearly, intrinsic disorder is crucial for the function of this protein.
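The ∼72% coverage quoted for the longest disordered region of MYPT1 follows directly from the residue numbers. As a minimal sketch (the function name `coverage_percent` is hypothetical, and the protein length of 1030 residues is an assumption inferred from the last residue number cited in the text):

```python
# Hypothetical helper: percentage of a protein covered by one inclusive region.

def coverage_percent(region, protein_length):
    """Length of an inclusive (start, end) region as a percentage of the protein."""
    start, end = region
    return 100.0 * (end - start + 1) / protein_length

MYPT1_LENGTH = 1030                  # assumed from the residue range 495-1030 in the text
LONGEST_DISORDERED = (290, 1030)     # longest predicted disordered region

pct = coverage_percent(LONGEST_DISORDERED, MYPT1_LENGTH)
print(f"{pct:.1f}%")  # 71.9%
```

The region spans 741 residues, which rounds to the ∼72% stated above.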

GerD forms a novel trimeric superhelical rope fold

Li et al. investigated structural peculiarities of the core polypeptide of the inner membrane GerD lipoprotein from Geobacillus stearothermophilus.241 Some bacteria are able to form endospores in response to adverse growth conditions.242,243 Spores formed as a result of such sporulation process are extremely resistant to various environmental insults thereby providing the bacteria with an important means to exist in the metabolically dormant state indefinitely and remain viable for hundreds of years without water or nutrients.244,245 Restoration of the favorable conditions triggers spore germination leading to the fast (within minutes) “awakening” of the normal metabolism in bacteria followed by outgrowth to generate growing cells.243,244,246 This awakening is controlled by the specific nutrient sensors, cognate germinant receptors (GRs) located in the inner membrane of the spore.

GerD is one of such bacterial cognate germinant receptors that trigger spore germination in the presence of specific nutrients called germinants in the environments of the spores.241 Li et al. have determined the crystal structure of the 121-residue core domain of the Geobacillus stearothermophilus GerD protein (GerD60-180) that lacks the N-terminal signal and lipobox sequences (residues 1–28), the H01 and H02 helices (residues 29–59), and the C-terminal acidic tail (residues 181–195).241 By a set of biophysical techniques, this GerD60-180 was shown to form a stable, well-ordered 3-helix bundle in solution. In the crystal structure, the GerD60-180 trimer consists of 3 parallel polypeptide chains twisted into a superhelical, right-handed rope (PDB ID: 4O8W).241 In this elongated trimer, each of the individual GerD60-180 chains forms 8 helices (H1 to H8) linked by short turns. Structures of the individual chains can be superimposed on each other except for the loosely packed H8 helices. Helices H1–H7 in each GerD60-180 chain twist around the central axis to form 2 complete turns of a right-handed supercoil with a pitch of ∼44 Å. Figure 24A represents the crystal structure of the GerD60-180 trimer as a molecular surface and ribbon diagram and also shows the highly extended structure of one of the GerD60-180 monomers.241 Analysis of the overall shapes of the highly intertwined and extended monomers within the GerD60-180 trimer suggests that the formation of this trimer represents an example of the folding-upon-binding process. In agreement with this hypothesis, Figure 24B shows that the core domain of the Geobacillus stearothermophilus GerD protein (UniProt ID: Q5L3Q1) is mostly disordered.

Prokaryotic ubiquitin-like protein (Pup), the Pup ligase PafA and bacterial proteasome

Forer et al. investigated the interaction between the prokaryotic ubiquitin-like protein (Pup), the proteasome regulatory subunit Mpa (mycobacterium proteasome ATPase), and proteasome accessory factor A (PafA), an enzyme that forms an isopeptide bond between the γ-carboxylate of a glutamate at the C-terminus of Pup and the ε-amine of a substrate lysine.247 The authors show that Pup is involved in simultaneous interaction with both Mpa and PafA, and that Mpa forms a complex with PafA, suggesting that PafA and the proteasome act as a modular machine for the tagging and degradation of cytoplasmic proteins.247

Figure 25A represents a crystal structure of 3 Pup molecules bound to the Mpa hexamer (PDB ID: 3M9D), and the results of disorder prediction for the Mycobacterium tuberculosis Pup (UniProt ID: P9WHN5) are shown in Figure 25B. Pup is obviously a highly disordered protein that partially folds at binding. In fact, of 68 residues used in the crystallization experiment, only 31 are visible in the structure, whereas the remaining 37 residues (1–20 and 52–68) are located within the regions of missing electron density. According to the ANCHOR analysis, Pup possesses 2 disorder-based binding regions, residues 1–15 and 34–64. Curiously, Forer et al. suggested that Pup has 2 binding sites for PafA located at residues 38–47 and 51–58, and that the first PafA binding site overlaps with the region responsible for the Mpa binding (residues 21–51).247
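The residue bookkeeping for Pup can be verified in a few lines. This is only an illustrative sketch with hypothetical names (`region_length`, `missing_regions`); the numbers are those given in the text, with inclusive residue ranges assumed.

```python
# Hypothetical helper: count residues in inclusive (start, end) regions.

def region_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1

construct_length = 68                   # residues used for crystallization
missing_regions = [(1, 20), (52, 68)]   # regions without electron density

missing = sum(region_length(s, e) for s, e in missing_regions)
visible = construct_length - missing
print(missing, visible)  # 37 31
```

The 31 visible residues correspond to positions 21–51, which is also the region the text associates with Mpa binding.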

The nucleolar PICT-1/GLTSCR2 protein

Borodianskiy-Shteinberg et al. studied self-association of the human protein interacting with carboxyl terminus-1 (PICT-1), also known as the glioma tumor suppressor candidate region 2 gene product (GLTSCR2), a nucleolar protein with unknown function.248 PICT-1/GLTSCR2 is conserved among eukaryotes and is assumed to be essential for preimplantation embryogenesis and embryonic stem cell survival and proliferation.249 It was suggested that PICT-1 might act as a potential tumor suppressor, being involved in interaction with the C-terminal region of the tumor suppressor phosphatase and tensin homolog (PTEN), promoting its phosphorylation and stabilization.250 It was also suggested that PICT-1 can function as an oncogene through the inhibition of p53.249

Figure 26A represents the results of the evaluation of the interactivity of the human PICT-1/GLTSCR2 (UniProt ID: Q9NZM5) by STRING and shows a dense interaction network of this intriguing protein. Figure 26B provides an explanation for the binding promiscuity of PICT-1 by showing that this protein is predicted to be highly disordered.

Recognition of modified tRNA by the HIV-1 nucleocapsid protein p7 and related peptides

Spears et al. studied the molecular mechanisms underlying specific recognition of highly post-transcriptionally modified human tRNALys3UUU (which is the primer for HIV replication) by the HIV-1 nucleocapsid protein p7.251 It is known that p7 recruits htRNALys3UUU from the host cell by binding to and remodeling the tRNA structure. An intrigue here is in the fact that htRNALys3UUU is one of the most uniquely processed tRNAs that has a broad spectrum of chemically different post-transcriptional modifications playing crucial roles in regulation of the conformation and function of this tRNA during protein synthesis.252 Using phage display approach, Spears et al. showed that p7 contains a specific htRNALys3UUU–binding region whose interaction with fully modified tRNA can be mimicked by the 15- and 16-amino acid peptides with the signature sequence of R-W-Q/N-H-X2-F-Pho-X-G/A-W-R-X2-G (where X can be most amino acids, and Pho is any hydrophobic residue).251

HIV-1 nucleocapsid protein p7 is a 55-residue-long protein containing 2 zinc finger domains flanked by basic amino acids required for interaction with nucleic acids.253,254 In addition to the above-discussed capability of p7 to specifically recognize htRNALys3UUU, the major function of p7 is to bind specifically to the packaging signal of the full-length viral RNAs and to deliver them into the assembling virion.255 As it is a highly charged basic protein, p7 binds single-stranded nucleic acids nonspecifically. Consequently, it coats the genomic RNA, protecting it from nucleases and compacting viral RNA within the core. Protein p7 also serves as an RNA chaperone that enhances several nucleic acid-dependent steps of viral life, such as melting RNA secondary structures, promoting DNA strand exchange reactions during reverse transcription,256-258 and stimulating integration.259

In the NMR solution structure of p7, regions corresponding to the 2 zinc fingers (residues 15–28 and 36–49) possess well-resolved structures, whereas residues 1–13, 32–34, and 52–55 were highly dynamic and did not converge to the unique conformations.260,261 Recent intrinsic disorder propensity analysis revealed that p7 is a highly disordered protein, with regions corresponding to the zinc fingers predicted to be more ordered than the remainder of the protein and identified as potential α-MoRFs.262

Concluding Remarks

This article represents a set of papers dealing with interesting and important proteins with diverse functions from various organisms. The only uniting theme for all these studies is the notion that they represent noticeable disorder overlooks. In fact, all proteins studied there contain intrinsic disorder. The amount of disorder varies: in some proteins only relatively short regions are disordered or highly flexible, whereas other proteins are almost entirely disordered. However, very often this disorder has functional implications, and consideration of the studied proteins in light of intrinsic disorder often provides interesting mechanistic clues about their action.

We hope that the readers will find this new series useful, since it might increase the awareness of the scientific community about the importance of intrinsic disorder for protein function. We also hope to see submissions from readers who found some important intrinsic disorder overlooks in published articles. Obviously, papers in this series will range in their size, and we envision several types of submissions, such as comprehensive reviews of unreported disorder in papers published in particular journals (i.e., similar to this review), reviews of unreported disorder in specific areas of protein-related research, as well as brief notes about overlooked disorder in individual papers. The biggest hope though is that this series will become obsolete, and protein intrinsic disorder will find its way to be robustly present in the scientific literature.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.