The tertiary architectures RNAs adopt are crucial for modulating gene expression across all domains of life, making them important targets of structural and dynamics studies. For instance, for riboswitches, the presence or absence of specific ligands drives the folding of one of two or more mutually exclusive, regulatory states (1–4). In viral RNA genomes, structured, untranslated regions commonly exercise direct control over viral gene expression (5,6). In the ribosome, the ability to distinguish between cognate and near-cognate tRNAs is governed in part by the extrahelical flipping of adenines A1492 and A1493 (7). Both the global architecture and the subtle motions of specific base and ribose moieties are thus demonstrably important and can profoundly modulate an RNA's function (8).
However, in spite of this importance, directly establishing how dynamics modulates the structure and function of RNAs has been difficult because X-ray crystallography and nuclear magnetic resonance spectroscopy are plagued by distinct but equally challenging problems. In crystallography, motions can only be observed in the ps–ns timescales and the strain imposed by crystal packing can obscure and distort structural data (9,10).
In contrast, NMR spectroscopy can probe dynamic fluctuations directly over a wide range of timescales. Unfortunately, NMR suffers from both narrow chemical shift dispersion and rapid signal decay, exacerbated by direct one-bond and multi-bond spin-spin couplings. The former leads to spectral crowding and the spin-spin couplings can lead to decreased spectral resolution and inaccurate measurements of 13C relaxation rates such as longitudinal relaxation rates (R1), transverse relaxation rates (R2), and heteronuclear Overhauser effect (hNOE) (11–13). Furthermore, these problems become more pronounced as the size of the RNA increases: the spectral quality deteriorates because of increased line broadening.
Addressing these problems requires the development of new technologies. In the past, spectral overlap has been addressed using heteronuclear multi-dimensional pulse sequences applied to uniformly 13C/15N labeled RNAs (14–16). By spreading the poorly dispersed proton resonances over the better resolved carbon and/or nitrogen dimensions, it is possible to resolve overlapped proton peaks in small RNAs. While these advances have greatly aided NMR structural studies of RNAs with a median size of 30 nt, they fail for RNAs larger than 60 nt. Out of 460 RNA structures in the PDB (Protein Data Bank), only seven RNA structures with sizes >60 nt have been solved by NMR (17–25). Of these seven, three RNA structures of 101, 132 and 155 nt have been solved using mostly homonuclear two-dimensional (2D) NOESY methods based on nucleotide-specific and fragmentation-based segmental 2H-labeling approaches (18,20,25). Thus, current uniform labeling approaches, while valuable, are quite limiting (26,27).
Also of great interest are the large couplings between adjacent 13C nuclei within the ribose and base ring systems, which cause several complications in RNA relaxation measurements. The foremost concern is that uniform labeling introduces strong couplings that can render 13C R1, hNOE and CPMG (Carr–Purcell–Meiboom–Gill) relaxation measurements inaccurate. These couplings also complicate and limit the range of applicability of CEST (Chemical Exchange Saturation Transfer) and rotating-frame relaxation rate (R1ρ) measurements and analyses while also decreasing the attainable resolution and sensitivity of NMR experiments (11,13,28–34).
Numerous robust spectroscopic solutions have been proposed in the past to circumvent these coupling problems (33–41). Unwanted splittings can be removed using constant time (CT) evolution (35–38), adiabatic band selective decoupling (39–41), or a series of selective pulses. Constant time evolution limits the acquisition time that can be used to obtain adequate resolution. Improving resolution requires long constant-time delays that lead to significant signal loss for large RNA molecules (41). Additionally, obtaining accurate relaxation parameters is problematic for 13C-CPMG based relaxation dispersion rates used to quantify millisecond (ms) time-scale processes, as well as for R1 and proton–carbon hNOE (28,42), which are important for quantifying ns-ps time-scale motions in RNA (43,44). Several precautions are needed to obtain accurate R1 and R1ρ measurements (28,45–46): provided R1 is derived from the initial slope of the relaxation decay curve, fairly accurate rates can be extracted for small RNAs; for R1ρ experiments, distortions can arise from transfer between adjacent 13C atoms with similar chemical shifts via a Hartmann–Hahn mechanism and these need to be minimized (28,45–46); to suppress the echo-modulation caused by the large scalar couplings during the 13C relaxation delay, R1ρ can be measured instead of CPMG (28,47–49). Still, for nucleic acids, high power spinlocks (>1 kHz) are needed to study isolated spin pairs such as C2 found in adenine and C8 in both adenine and guanine. For low spin lock power levels (<1 kHz), oscillations can be observed in the monoexponential decay of peak intensity, arising from residual scalar coupling interactions with neighbouring nuclei (48,50–51).
In addition, application of selective cross-polarization (52–55) using weak radio-frequency fields can effectively decouple homonuclear J-couplings. This elegant spectroscopic solution has been exploited to measure nitrogen R1ρ in proteins and both carbon and nitrogen R1ρ in uniformly labeled nucleic acids (56–59). While this scheme obviates the need for selective 13C isotopic enrichment, in uniformly labeled samples the presence of large homonuclear scalar couplings again limits the range of applicability of these methods (56,51). Finally, building upon schemes for protein 15N and 13C CEST measurements by Kay and co-workers, Zhang et al. developed a set of nucleic-acid-optimized 1D/2D 13C CEST experiments that use various shaped pulses to refocus carbon–carbon scalar coupling and showed that accurate exchange parameters can be obtained for all CEST profiles of purine and ribose carbons in uniformly labeled RNA samples (34,60–62). Nonetheless, they and others acknowledged the following limitations for both CEST and R1ρ in uniformly labeled RNA and protein samples (33–34,60–63). First, the lowest spinlock or saturating B1 field that can be used is limited (∼3× the scalar coupling) to ∼45 Hz for Ade C2, ∼45 Hz for purine C8, and ∼150 Hz for C1′. For pyrimidine ring carbons with large carbon–carbon couplings of ∼66 Hz, spinlock fields of ∼200 Hz would be required for C5 and C6, which is clearly intractable with uniformly labeled samples. Second, even though 13C–13C couplings do not introduce errors in extracted chemical shifts for purines, these homonuclear couplings decrease the resolution. Ultimately, the coupling effects need to be considered in the CEST data analyses for couplings greater than 15 Hz. Otherwise, exchange rates (kex) are overestimated and population ratios are underestimated (33). Thus, uniformly labeled samples do limit the wide applicability of both CEST and R1ρ to biological problems.
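The rule of thumb quoted above (a lowest usable B1 field of roughly three times the scalar coupling) can be written out as a short calculation. The following is only an illustrative sketch: the function name `min_b1_field_hz` is hypothetical, and the factor of 3 and the ∼66 Hz pyrimidine coupling are simply the values stated in the text.

```python
# Rule of thumb from the text: the weakest usable spin-lock or saturating
# B1 field is roughly three times the 13C-13C scalar coupling.
SAFETY_FACTOR = 3.0  # "~3x the scalar coupling", as stated in the text


def min_b1_field_hz(j_cc_hz: float, factor: float = SAFETY_FACTOR) -> float:
    """Return the approximate lower limit on the B1 field (Hz).

    j_cc_hz is the magnitude of the 13C-13C scalar coupling in Hz.
    The function name and signature are illustrative assumptions.
    """
    if j_cc_hz < 0:
        raise ValueError("scalar coupling magnitude must be non-negative")
    return factor * j_cc_hz


# Pyrimidine C5-C6 coupling of ~66 Hz, the value quoted in the text.
pyrimidine_limit = min_b1_field_hz(66.0)
print(f"C5/C6 lower limit: ~{pyrimidine_limit:.0f} Hz")
```

For the ∼66 Hz coupling this gives a limit close to the ∼200 Hz quoted for pyrimidine C5 and C6.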
These spectroscopic tools notwithstanding, an alternative, straightforward and effective solution for overcoming the problem of spectral crowding and J-coupling would complement existing methodologies. A promising method is to synthesize site-specific isotopically labeled nucleotides (11,64–65) as we recently demonstrated with our chemo-enzymatic production of pyrimidine nucleotides (30,31). Here, we extend that approach to improving the synthesis of purine nucleotides. Our synthesis offers improvements in speed, streamlined reaction conditions, and higher yields. By combining the newly developed purine nucleotides with our previous pyrimidine nucleotides, we present an improvement to the traditional NOESY structural assignment protocol. Additionally, we show that the measurements of relaxation parameters using CPMG, R1ρ, and CEST are possible for both small and large RNAs. Furthermore, we demonstrate substantial improvements in signal-to-noise and line width for transverse relaxation optimized spectroscopy (TROSY) experiments compared to the traditional heteronuclear single quantum coherence (HSQC) experiments for isolated two-spin systems approximated by our purine and pyrimidine labeling schemes (30–31,66–67).
Reagents and solvents were purchased from Sigma-Aldrich. 8-13C adenine and 8-13C guanine were either purchased from Cambridge Isotope Laboratories or synthesized as described in the supplementary materials. Similarly, preparative chemical synthesis of labeled adenine and guanine, chemo-enzymatic nucleotide synthesis, RNA preparation, and NMR experiments are detailed in the supplementary materials.
NMR spectral processing was done in Topspin (Bruker Biospin) and NVFx (One Moon Scientific). Peak intensities were extracted using in-house software by David Fushman. CPMG data were fit to a two-state exchange model using the full Bloch-McConnell matrix. The time-dependent evolution of magnetization during the CPMG period was solved numerically, and the exchange parameters were obtained by non-linear least squares fitting using in-house Matlab software. Errors in fits were calculated using either the Jacobian or 200 Monte Carlo simulations (33), and the larger of the two errors was reported for CPMG and CEST relaxation dispersion analysis.
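To illustrate the kind of calculation described above, the following is a minimal Python sketch of a two-state Bloch–McConnell CPMG model combined with a non-linear least squares fit. It is not the in-house Matlab software used in this work. It assumes ideal 180° pulses, equal intrinsic R2 for both states, and a fit restricted to kex and pb with the shift difference and R2 held fixed. The function names (`cpmg_r2eff`, `fit_cpmg`) and the values of `dw`, `r2` and `t_relax` are hypothetical; only the kex and pb used to generate the synthetic curve are taken from the A-site values quoted later in the text.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import least_squares


def cpmg_r2eff(nu_cpmg, kex, pb, dw, r2, t_relax):
    """Effective R2 (s^-1) for two-state exchange A <-> B under CPMG.

    nu_cpmg : CPMG frequencies (Hz); t_relax * nu should be an integer
    kex     : exchange rate kAB + kBA (s^-1)
    pb      : fractional population of the minor state B
    dw      : chemical shift difference between B and A (rad/s)
    r2      : intrinsic transverse rate, assumed equal for A and B (s^-1)
    t_relax : constant relaxation delay (s)
    """
    pa = 1.0 - pb
    kab, kba = kex * pb, kex * pa
    # Bloch-McConnell matrix for transverse magnetization (M+_A, M+_B),
    # written in the rotating frame of state A.
    liouvillian = np.array(
        [[-r2 - kab, kba],
         [kab, -r2 - kba - 1j * dw]],
        dtype=complex,
    )
    m0 = np.array([pa, pb], dtype=complex)  # start from equilibrium populations
    r2eff = []
    for nu in np.atleast_1d(nu_cpmg):
        tau = 1.0 / (4.0 * nu)  # delay in each tau-180-tau echo
        n_cyc = max(1, int(round(t_relax * nu)))  # number of double-echo blocks
        p = expm(liouvillian * tau)
        # An ideal 180 pulse conjugates M+, so evolution between the two
        # pulses of a block is described by the conjugated propagator.
        p_conj = p.conj()
        block = p @ p_conj @ p_conj @ p  # tau-180-2tau-180-tau
        m = np.linalg.matrix_power(block, n_cyc) @ m0
        total_time = n_cyc * 4.0 * tau
        r2eff.append(-np.log(np.abs(m[0]) / pa) / total_time)
    return np.array(r2eff)


def fit_cpmg(nu_cpmg, r2eff_obs, dw, r2, t_relax, guess=(2000.0, 0.03)):
    """Fit kex and pb by non-linear least squares (dw and r2 held fixed)."""
    def residuals(params):
        kex, pb = params
        return cpmg_r2eff(nu_cpmg, kex, pb, dw, r2, t_relax) - r2eff_obs

    return least_squares(
        residuals, guess,
        bounds=([1.0, 1e-4], [1e5, 0.5]),
        x_scale=[1e3, 1e-2],  # kex and pb differ by orders of magnitude
    )


# Synthetic, noise-free dispersion curve. dw, r2 and t_relax are hypothetical.
nu = np.arange(50.0, 1001.0, 50.0)  # CPMG frequencies in Hz
true = dict(kex=3800.0, pb=0.018, dw=2 * np.pi * 150.0, r2=15.0, t_relax=0.04)
synthetic = cpmg_r2eff(nu, **true)
fit = fit_cpmg(nu, synthetic, true["dw"], true["r2"], true["t_relax"])
print("fitted kex (s^-1) and pb:", fit.x)
```

In practice the fit would be repeated on noise-perturbed copies of the data to obtain Monte Carlo error estimates, which this sketch omits. With data from a single static field, fixing `dw` is a strong assumption and is made here only to keep the example well determined.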
NMRViewJ was used for peak assignments. Hydrogen and carbon chemical shifts were predicted based on the secondary structure of the input RNA molecule. Expected cross-peaks for different experiment types and labeling patterns were then generated using the RNA Peak Generator tool: for HSQC spectra based on the covalent structure, and for NOESY spectra using inter-atomic distances typically observed in RNA helices. For the bacterial A-site RNA, for which there are no deposited chemical shifts in the BMRB database, the RNA Peak Generator predicted 15 of the 18 expected C1′-H1′ resonances, 7 of the 9 C2′-H2′ resonances, and 20 of the 27 C6/8-H6/8 resonances within 0.1 ppm of their actual values in the HSQC spectra. Since the NOESY peak generator was used in a mode where it only predicts peaks in helical regions, peaks in bulge and tetraloop regions of the A-site RNA were not predicted. Further assignment of the NOESY spectra utilized the RNA peak slider tool within NMRViewJ. This tool links the predicted peaks into a network connected by atoms shared between the different peaks. Peaks are then interactively positioned in a way that utilizes the network of peaks typically connected within the NOESY ‘walk’. Overall, the combined tools of NMRViewJ allowed for relatively rapid assignment of the resonances in the specifically labeled A-site RNA model system and provide a powerful tool that, when combined with selective labeling, can streamline resonance assignment for RNA beyond what was previously reported (41).
Chemo-enzymatic synthesis of GTP and ATP
We have created an improved method for the synthesis of site-selective isotopically labeled ATP and GTP with increased yields and speed of synthesis. Both purine reactions proceed to completion without the need to purify intermediate species. Final yields of >90% and >75% respectively for ATP and GTP were achieved relative to starting input adenine or guanine. Both yields are better than previously reported (72–77). In addition to improved yields, ATP synthesis is complete in 4–5 h while GTP synthesis is complete in 7–8 h. Previously, ATP synthesis was reported to take 29–48 h and GTP synthesis 48–70 h (72–77). These improvements allow reactions to be complete in a single day. Additionally we have taken advantage of the ability of creatine kinase to act on a variety of substrates to convert NDPs to NTPs and adapted the use of dATP as the energy source in the energy regeneration system (77–79). The use of dATP is ideal since the lack of a 2′-OH of the ribose in dATP prevents its interaction with the boronate column used to purify ATP and GTP. This offers a more robust synthesis, free of contaminants, and does not dilute the synthesized labels with unlabeled ATP. The effectiveness of these nucleotides is demonstrated by their incorporation into a number of interesting RNAs.
The production of GTP was achieved in a two-step, one-pot reaction. Specifically labeled ribose and guanine were combined in the presence of phosphoribosyl pyrophosphate synthetase (PRPPS), ribokinase (RK), and xanthine-guanine phosphoribosyl transferase (XGPRT), with a dATP regeneration system. The dATP regeneration system was composed of myokinase and creatine kinase, with creatine phosphate acting as the high energy phosphate donor. The formation of GMP was monitored by FPLC and NMR (Supplementary Figure S2). Because of the low solubility of guanine (0.01 mM), FPLC was unsuitable for tracking its disappearance, which made it difficult to monitor the progression of the reaction by this method. By NMR spectroscopy, however, the chemical shift difference between the labeled 13C-1′ position of unreacted ribose and that of newly formed GMP was used to determine the completion of the first step of the reaction (Supplementary Figure S2A). When the majority of guanine was converted to GMP, in approximately 4–5 h, guanylate kinase was added to the reaction. Guanylate kinase phosphorylates GMP to GDP. GDP is then phosphorylated to GTP by creatine kinase (CK), which acts promiscuously to convert NDPs to NTPs. This was unexpected, as CK is said to be highly specific (78,79). GMP is completely converted to GTP in an additional 3 h. We confirmed by FPLC that conversion was complete and further validated this observation by 31P NMR (Supplementary Figure S2B).
The production of ATP and the progression of the reaction were monitored as reported for GTP. A notable difference is that adenine's greater solubility (8 mM) allowed the use of FPLC to monitor the disappearance of uncoupled base and the formation of product for all steps of the reaction. Labeled adenine and ribose were combined in the presence of PRPPS, RK, adenine phosphoribosyl transferase (APRT), and the dATP regeneration system. The dATP regeneration system acts on both AMP and ADP and takes the reaction to completion in ∼4 h. The reaction was similarly monitored by FPLC and NMR (Supplementary Figure S2C&D).
Synthesis of nucleotides with isolated base C6 or C8 offers large sensitivity improvements
When studying large RNAs (>40 nt) by NMR, slow molecular tumbling leads to broadened linewidths and losses in signal intensity. Careful selection of appropriate NMR experiments to address these losses is necessary for successful measurement of many NMR parameters. TROSY experiments take advantage of the interference between the dipolar coupling and chemical shift anisotropy (CSA) components of T2 relaxation (66). For the base C8 position of adenine and guanine, these contributions effectively cancel at ∼800 MHz field strength, leading to a reduction in the R2 relaxation rate (80,81). Thus, RNAs synthesized with our site-specifically labeled NTPs should benefit from TROSY-based NMR experiments that reduce the problems of crowding, fast signal decay, low resolution, and decreased S/N ratios (12,34,31,66–67,80–81).
The benefits of TROSY increase with the size of the RNA. For small RNAs such as IRE (29 nt) we saw appreciable improvements for the base region. These improvements in signal intensities ranged from 2.2- to 3.4-fold (average: 2.9 ± 0.5) when comparing TROSY with conventional HSQC sequences (Figures 1A and 2A). For the larger HCV SARS RNA (59 nt), the signal improvements are larger and ranged from 2.0- to 5.4-fold (average: 3.3 ± 1.0) (Figures 1B and 2B). For the C1′ and C5′ peaks the improvements were more modest since these positions have lower CSA values (Supplementary Figure S3). Our labeled C8 approximates the isolated two-spin system necessary for these gains in signal. Thus, the large improvements seen for these positions when using our site-specifically labeled nucleotides can be harnessed for assignment, structural, and dynamics measurements (82).
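The fold-improvement statistics quoted above can be computed from matched peak lists as in the following sketch. The peak names and intensities are hypothetical placeholders, not data from this work, and treating the quoted ± value as a sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def fold_improvements(trosy, hsqc):
    """Return TROSY/HSQC intensity ratios for peaks present in both spectra."""
    shared = sorted(set(trosy) & set(hsqc))
    return {peak: trosy[peak] / hsqc[peak] for peak in shared}


# Hypothetical peak intensities (arbitrary units), NOT data from this work.
hsqc_intensity = {"A5.C8": 1.0e6, "G12.C8": 0.8e6, "G20.C8": 1.2e6}
trosy_intensity = {"A5.C8": 2.5e6, "G12.C8": 2.4e6, "G20.C8": 3.0e6}

ratios = fold_improvements(trosy_intensity, hsqc_intensity)
values = list(ratios.values())
print(f"range: {min(values):.1f}- to {max(values):.1f}-fold")
# Assumption: the reported uncertainty is the sample standard deviation.
print(f"average: {mean(values):.1f} +/- {stdev(values):.1f}")
```

Both spectra should be acquired and processed with matched parameters for the ratios to be meaningful.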
CEST measurements on RNAs >50 nucleotides
The above observations led us to run a 13C-TROSY version of the 15N-TROSY experiment of Kay et al. (68). To validate this TROSY pulse sequence, we decided it would be appropriate to mirror measurements made previously on a fluoride riboswitch construct (34). Interestingly, our construct isolated from Bacillus anthracis showed similar behavior to the construct from Bacillus cereus studied by Zhang et al., albeit with different exchange parameters. While for B. cereus kex = 112 ± 4 s−1 and pb = 10.1 ± 0.1% at 30°C, the B. anthracis construct had exchange parameters of kex = 617 ± 54 s−1 and pb = 3.0 ± 0.1% at 35°C for a global fit to both C1′ and C6 (Figure 3). Further, when comparing TROSY CEST (68) to the traditional HSQC CEST (34), S/N improvements of ≈2:1 were obtained (Supplementary Figure S4). While the fits of both the HSQC and TROSY CEST data sets gave similar exchange parameters, comparing the χ2 of the fits showed significant improvements for the TROSY CEST experiment (0.6–42.5) when both experiments were run using the same parameters and experiment time. These measurements were made on the C1′ and C6 positions of a 1′,6-13C-1,3-15N-5-2H UTP labeled sample.
What then are some of the benefits of a selectively labeled sample when uniformly labeled samples have been shown to be adequate? Eliminating the strong coupling between ribose carbons allowed a straightforward analysis of the CEST data without the need to account and correct for J-coupling (33,34). Obtaining CEST data for pyrimidine C6 is particularly problematic in uniformly labeled samples because of the complications mentioned in the introduction: the B1 field strengths of >180 Hz that would be needed preclude the use of such samples. With our selectively labeled samples, we were able to readily obtain excellent CEST profiles for both purines and pyrimidines.
CPMG measurements on purine nucleotides and RNAs >50 nucleotides
CPMG relaxation dispersion measurements facilitate the extraction of information about exchange phenomena occurring on the μs–ms timescale (83–89). Previously, others have used similar approaches to run CPMG experiments on RNAs smaller than 50 nucleotides with specifically labeled pyrimidine bases (35,36). Here, we present data that illustrate the effect of creating isolated, labeled C8 and C2′ positions in our nucleotides, and show that measurements of CPMG parameters are readily accessible without the problem of J-coupling-induced oscillations (28,90).
We have transcribed a 59 nt viral RNA with a 1′,8-13C labeling pattern as a proof of concept. The data indicate that while a majority of the nucleotides within the RNA do not experience exchange on the ms time-scale, a few residues sample a lowly populated state. Without data being fit at multiple static magnetic field strengths, the only meaningful parameter that can be extracted is a kex value (Figure 4A). The exchange rates extracted from the CPMG experiments on the viral RNA match well with those from CEST experiments (unpublished). Even though similar information, and perhaps more, can be derived from R1ρ data, we find that CPMG experiments are more straightforward to set up and analyse than R1ρ experiments. Thus, having labeled RNA that facilitates CPMG measurements is important for the field.
Using in-house Matlab scripts, CPMG data were fit to a two-state exchange model using the Bloch–McConnell matrix as previously described by Kay et al. (90). Site-selective labels allow us to prepare isolated two-spin systems without the carbon–carbon or carbon–nitrogen scalar couplings. In the past, such scalar couplings have hindered the interpretation of relaxation dispersion data (89,90). Using the bacterial A-site RNA as a model system, we were able to capture motions on the microsecond timescale using CPMG experiments to monitor exchange at the ribose C2′ positions. It is widely accepted that motions in residues A1492 and A1493 are involved in the discrimination between cognate and near-cognate tRNAs (7–8,91–93). Most notably, A1493, a residue that flips in and out of the bulge region of the A-site, showed characteristic dispersion profiles (Figure 4B). The extracted kex and pb values of 3800 ± 200 s−1 and 1.8 ± 0.1% match well with the previously reported values of 4000 s−1 and 2.5% determined by relaxation dispersion measurements on the C1′ positions of the ribose moieties (8). Thus, our labels can be used to readily and straightforwardly capture lowly populated states in RNA.
Reduction of spectral crowding for large RNAs
The relatively narrow spectral width over which base and sugar carbons and protons resonate is a major limitation of RNA NMR that must be overcome (82). Overlap of signals is only partially alleviated by 2D and 3D NMR experiments in samples in which all 4 nucleotides are uniformly 13C- and 15N-labeled. We reasoned that what would be critical for de-cluttering spectra to manageable levels for large RNAs is not only the ability to choose which of the four nucleotides to label, but also which of the atomic sites to isotopically enrich. To demonstrate the power of this approach, we have examined RNAs ranging in size from 27 to 59 nucleotides in length.
For a large RNA transcribed with only 1′,8-13C2 ATP, the resonances that belong to the adenine C8 can be identified rapidly when compared to a sample that has all four nucleotides fully labeled (not shown). While it is possible to achieve a similar result using a sample in which only ATP is fully labeled, one-bond 13C–13C and 13C–15N couplings quickly degrade the quality of the spectrum. With a view to designing a new NOESY assignment protocol, we synthesized RNA samples that maximize the information content of their spectra while simultaneously alleviating spectral overlap.
NOESY resonance assignments: an alternating 13C-1′ and 13C-2′ labeling scheme
The classic approach to assign resonances in a helical stretch of an RNA employs a NOESY walk methodology (41,94). Protons close in space (<5 Å) can produce cross peaks in a NOESY spectrum indicative of a through-space transfer of longitudinal magnetization between the adjacent nuclei. For nucleotides in a helix, the protons attached to the C8/C6 of the base and the C1′/C2′ of the sugar fulfill this distance requirement. By labeling all nucleotides at the C1′ and C8/C6, the base and ribose of adjacent nucleotides can be connected. However, as the size of the RNAs increases, spectral crowding becomes especially pronounced in the sugar resonances and may lead to incorrect peak assignments. In the past the solution to this problem might have been to remove these resonances by transcribing the RNA with unlabeled cytosine. While the spectra would then be simplified, the NOESY walk is broken in any helical stretches that contain cytosine. Here we propose an alternative approach. Instead of transcribing the RNA with unlabeled cytosine, a different labeling pattern such as 2′,8-13C could be used. In this way, the NOESY walk is preserved while removing the overlapping C1′ resonances.
Thus, by combining our previous work on pyrimidine synthesis with our current purine synthesis, we can make RNAs with labeling patterns that enable an important advance in NOESY assignment strategies (41,94). For the conventional uniformly labeled samples, the C2′ and C1′ resonances are both extremely crowded as discussed above. In a traditional NOESY walk all nucleotides or various permutations are fully labeled. NOE cross-peaks between protons attached to the C1′ and C2′ and the C8/C6 of the same and previous nucleotides are observed for helical regions. As we have illustrated, spectral crowding can severely hinder this assignment process. However, by labeling the base C6/C8 of each nucleotide and alternating the label on the ribose between C1′ and C2′, a sample is created that not only distinguishes the purines from the pyrimidines but also the A–U and the G–C pairs. We first made nucleotide-specific labeled samples, and from the overlaid spectra, we could immediately tell that C/U and G/A showed more spectral overlap in their sugar resonances. Thus, it was necessary to label C/G on their C1′ carbons and U/A on their C2′ carbons. As a proof of concept we have labeled the bacterial A-site RNA with 1′,6-13C-1,3-15N CTP, 2′,6-13C-1,3-15N UTP, 1′,8-13C GTP, and 2′,8-13C ATP (Figure 5). By combining this alternative labeling strategy with NOESY experiments that allow for filtering/editing of 1H cross-peaks based on the attached carbons (12C versus 13C), we can create a unique and powerful system to assign resonances without ambiguity (94–96). For ambiguous or overlapped cross-peaks, we utilized 3D 13C-NOESY-HSQC experiments. This alternating ribose pattern allowed us to unambiguously assign helical regions of RNA. In future work, we will streamline this methodology for use in larger RNAs.
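The alternating labeling pattern described above can be summarized as a small lookup table, from which the 13C-attached protons expected along a helical NOESY walk follow directly. The sketch below is illustrative only: the function names and the example sequence are hypothetical, and it assumes ideal helical geometry in which the base proton of residue i+1 shows an NOE to the labeled sugar proton of residue i. Only the label assignments (C and G on C1′, U and A on C2′, pyrimidines on C6, purines on C8) come from the text.

```python
# Labeled carbons per nucleotide, as used for the A-site RNA in the text:
# C and G carry 13C at C1', U and A at C2'; pyrimidines at C6, purines at C8.
LABELS = {
    "C": ("C1'", "C6"),
    "U": ("C2'", "C6"),
    "G": ("C1'", "C8"),
    "A": ("C2'", "C8"),
}


def labeled_protons(nucleotide):
    """Return the (sugar, base) protons attached to the 13C-labeled carbons."""
    ribose_c, base_c = LABELS[nucleotide]
    return "H" + ribose_c[1:], "H" + base_c[1:]


def noesy_walk(sequence):
    """List expected (base proton, sugar proton) cross-peaks for a helix.

    For each residue i the intra-residue peak is listed first, followed by
    the sequential peak from the base proton of residue i+1 to the labeled
    sugar proton of residue i (helical geometry is assumed).
    """
    peaks = []
    for i, nt in enumerate(sequence):
        sugar_h, base_h = labeled_protons(nt)
        residue = f"{nt}{i + 1}"
        peaks.append((f"{residue}.{base_h}", f"{residue}.{sugar_h}"))
        if i + 1 < len(sequence):
            nxt = sequence[i + 1]
            _, next_base_h = labeled_protons(nxt)
            peaks.append((f"{nxt}{i + 2}.{next_base_h}", f"{residue}.{sugar_h}"))
    return peaks


# Hypothetical helical strand, not a sequence from this work.
for base_proton, sugar_proton in noesy_walk("GGCUA"):
    print(base_proton, "<->", sugar_proton)
```

Because the sugar proton alternates between H1′ and H2′ according to nucleotide type, each cross-peak in the walk also reports on the identity of the residue that contributes the sugar proton.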
The resulting assignments matched those previously determined (8). In situations where there is significant overlap in the base region, samples in which certain bases are unlabeled or even deuterated can be made, allowing the assignment bottleneck to be quickly circumvented.
This work extends our previous synthesis of pyrimidine (30,31) to purine nucleotides. We have shown that the ability to easily synthesize a variety of purine and pyrimidine nucleotides facilitates the study of large RNAs. These nucleotides are suitable for use in three key aspects of RNA NMR structural biology: assignment, structural and dynamics measurements.
The first advantage of these new site-specific labels is the potential for new assignment schemes. We have coupled alternate labeling of either C1′ or C2′ labeled ribose to C8 labeled purine bases. These combinations have allowed us to develop a new NOESY assignment strategy that benefits from reduced spectral crowding. This new strategy takes advantage of the large proton chemical shift differences between the C1′ and C2′ ribose carbons. By using an alternating C1′ and C2′ pattern with labeled bases, the NOESY spectrum is greatly simplified without compromising the information content present. Since all nucleotides are labeled, a complete NOESY walk is possible in helical regions. Additionally if the purines and pyrimidines labeled with C1′ and C2′ enrichment are reversed, orthogonal data is generated that can confirm the previous assignment.
The second advantage of these new labels is that the removal of the strong 13C J-coupling leads to substantial improvements in signal intensity at the protonated base C6 and C8 positions. Additionally, these isolated spin pairs have facilitated the measurement of μs-ms dynamics using CPMG and CEST pulse sequences without the complications of large carbon–carbon couplings. Finally, with these isolated ‘two-spin’ labels, these couplings need not be explicitly taken into account in the data analysis of CEST profiles, as was required in previous studies using uniformly labeled RNA or protein (33,34), and more useful sites such as pyrimidine C5 and C6 can also be probed. It is important to note that other dispersion experiments such as R1ρ will also benefit from using RNAs transcribed with isolated spin systems. The improvements we see from TROSY-based pulse sequences scale with the size of the RNA.
A price to pay for not using spectroscopic tools to minimize the 13C-13C coupling problem is that the number of probe sites is now limited to the labeled sites. Nonetheless, the approach remains useful because it allows for very rapid accumulation of chemical shifts, a set of parameters that are easily and accurately measured and available at very early stages of NMR data analyses. Thus, by measuring various chemical shifts (H1′/C1′, H2′/C2′, H5′,H5″/C5′, H2/C2, C4, H5/C5, H6/C6, H8/C8, N1, N3, N7, N9), we think the availability of such parameters will facilitate chemical shift based structure calculations of RNA, especially for constructing structural models of transiently and sparsely populated RNA states, as has been done so far only for proteins (97).
We, therefore, anticipate that as the size of the RNAs under investigation becomes greater than 100 nucleotides the combined use of these selective labels with TROSY- and HMQC-based pulse elements will be critical for advancing NMR for the study of the structure and dynamics of a large number of new and interesting RNAs.