Bioinformatics, 2017, 1–6
doi: 10.1093/bioinformatics/btx615
Advance Access Publication Date: 25 September 2017
Original Paper
Structural bioinformatics

RRDB: a comprehensive and non-redundant benchmark for RNA–RNA docking and scoring

Yumeng Yan and Sheng-You Huang*
School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People’s Republic of China
*To whom correspondence should be addressed.
Associate Editor: Alfonso Valencia
Received on May 8, 2017; revised on August 22, 2017; editorial decision on September 22, 2017; accepted on September 23, 2017

Abstract
Motivation: With the discovery of more and more noncoding RNAs and their versatile functions, RNA–RNA interactions have received increasing attention. Determining their complex structures is therefore valuable for understanding the molecular mechanisms of these interactions. Given the high cost of experimental methods, computational approaches such as molecular docking have played an important role in the determination of complex structures, and a benchmark is critical for the development of docking algorithms.
Results: To meet this need, we have developed the first comprehensive and nonredundant RNA–RNA docking benchmark (RRDB). The diverse dataset of 123 targets consists of 78 unbound-unbound and 45 bound-unbound (or unbound-bound) test cases. The dataset was classified into three groups according to the interface conformational changes between bound and unbound structures: 47 ‘easy’, 38 ‘medium’ and 38 ‘difficult’ targets. A docking test on the benchmark using ZDOCK 2.1 demonstrated the challenging nature of the RNA–RNA docking problem and the value of the present benchmark. The bound and unbound cases of the benchmark will be beneficial for the development and optimization of docking and scoring algorithms for RNA–RNA interactions.
Availability and implementation: The benchmark is available at http://huanglab.phys.hust.edu.cn/RRDbenchmark/.
Contact: email@example.com
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: firstname.lastname@example.org

1 Introduction
RNA–RNA interactions play important roles in the regulation of gene expression and cell development (Engreitz et al., 2014; Guil and Esteller, 2015; Morris and Mattick, 2014). Determining their complex structures is valuable for understanding the molecular mechanisms of the related biological processes and thus for developing therapeutic interventions or drugs that target RNA–RNA interactions (Capriotti and Marti-Renom, 2008). Base pairing is believed to be a major contributor to the stability of intermolecular RNA–RNA interactions, much like intramolecular base pairs in RNA secondary structure. However, it has been found that, unlike in RNA secondary structure prediction, the overall Gibbs free energy of stacked base pairs is also critical for successful RNA–RNA interaction prediction (Lai and Meyer, 2016). Therefore, molecular docking, which computationally predicts the interactions between molecules by ranking sampled binding modes according to their binding free energies, is expected to play an important role in predicting RNA–RNA interactions, given the high cost and technical difficulties of experimental methods (Huang, 2014; Janin et al., 2003; Wodak and Janin, 1978).

An important aspect of molecular docking is the selection of appropriate structures for benchmarking, which is critical for the development of docking algorithms and scoring functions. First, benchmark datasets are needed to validate docking algorithms and scoring functions. Second, comparative assessments of different docking and scoring algorithms on the same benchmark datasets can provide valuable insights into how to improve existing algorithms and how to develop new methods (Huang, 2014).
However, although considerable progress has been made in protein-protein and protein-DNA/RNA docking benchmarks as well as in RNA structural databases (Barik et al., 2012; Berman et al., 1992; Chen et al., 2003; Coimbatore Narayanan et al., 2014; Huang and Zou, 2013; Hwang et al., 2008, 2010; Kastritis et al., 2011; Mintseris et al., 2005; Nithin et al., 2017; Perez-Cano et al., 2012; Petrov et al., 2013; Rahrig et al., 2010; van Dijk and Bonvin, 2008; Vreven et al., 2015), little effort has been devoted to benchmarking RNA–RNA docking because of the limited number of experimental structures of individual RNAs and RNA–RNA complexes in the Protein Data Bank (PDB) (Berman et al., 2000). With the increasing number of experimentally determined RNA structures deposited in the PDB, developing a benchmark dataset for RNA–RNA docking has become feasible. In addition, with the discovery of more and more noncoding RNAs and their interactions, molecular docking is expected to play an increasing role. A benchmark is therefore urgently needed for the development of RNA–RNA docking and scoring algorithms.

To meet this need, we have developed a comprehensive and nonredundant benchmark of 123 diverse targets for RNA–RNA docking that includes all types of RNA–RNA complexes with at least one unbound structure in the PDB. Each target in the benchmark dataset includes both the native bound partners and their corresponding unbound structures, so as to reflect the conformational changes of RNAs upon binding. The benchmark dataset will be beneficial for the development and improvement of docking algorithms and scoring functions for RNA–RNA interactions.

2 Materials and methods
An appropriate set of structures for benchmarking molecular docking should possess three features. First, a benchmark dataset should consist of diverse targets to test the robustness of docking/scoring algorithms.
Second, experimentally determined structures rather than models should be preferred for benchmarking, to avoid introducing computational errors from modeling. Finally, the benchmark structures should include both the bound and unbound structures of the interacting partners, to reflect realistic conformational changes upon binding.

To construct a benchmark meeting these criteria, we queried the PDB for all X-ray crystal structures with a resolution better than 4.0 Å, as well as all NMR structures, to identify the entries that contain at least two RNA chains but no protein or DNA chains. As of March 12, 2017, the search yielded a total of 554 entries. Figure 1A shows that most complexes (i.e. 537 entries) have a resolution better than 3.25 Å, indicating the good quality of the queried structures. These PDB entries were manually examined, and only reasonable RNA–RNA complexes were kept. Here, a reasonable RNA–RNA complex was defined as a structure that meets all of the following criteria. First, the interacting RNA chains should belong to the same biological unit. Second, complexes in which the RNAs contain only backbone atoms were excluded. Third, there must be direct interactions, with at least one base pair, between the interacting RNA chains. A total of 440 RNA–RNA complexes met the three criteria, and their structures were downloaded from the PDB. The receptor and ligand RNAs were extracted from each complex and subjected to manual inspection. To obtain a nonredundant dataset, all the receptor and ligand structures were converted into sequences and clustered according to their sequence similarities. Here, the program Align of the FASTA package was used to perform all-against-all pairwise sequence alignments of the RNA structures of the 440 complexes (Myers and Miller, 1988).
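The all-against-all comparison can be sketched as follows. This is a minimal illustration, not the FASTA Align program: it uses a plain quadratic-memory Needleman–Wunsch global alignment instead of the linear-space Myers and Miller algorithm, and the scoring scheme (match +1, mismatch −1, gap −2) and the identity definition (identical aligned positions divided by the length of the shorter sequence) are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical scoring scheme; FASTA's Align uses its own parameters.
MATCH, MISMATCH, GAP = 1, -1, -2


def global_align(a: str, b: str) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment; returns two gapped strings of equal length."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        pair = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical aligned positions / length of the shorter ungapped sequence (assumed definition)."""
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    shorter = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    return 100.0 * identical / shorter if shorter else 0.0


def all_against_all(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent identity for every unordered pair of named sequences."""
    return {
        (x, y): percent_identity(*global_align(seqs[x], seqs[y]))
        for x, y in combinations(sorted(seqs), 2)
    }
```

For example, `all_against_all({"r1": "GGGACU", "r2": "GGACU"})` aligns the two toy sequences with a single gap and reports 100% identity for the pair, because every aligned position of the shorter sequence matches.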
The sequence identity cutoff for clustering the RNA sequences was set to 60%, as it has been shown that the twilight zone of RNA alignment begins at 60% pairwise sequence identity (Gardner, 2005), in contrast to 20% for proteins. Namely, if the receptor and ligand of one complex both had at least 60% sequence identity with the receptor and ligand of another complex, respectively, the two RNA complexes were grouped into the same cluster. According to this criterion, the 440 complexes were grouped into 160 clusters. For each cluster, the RNA–RNA complex with the best resolution, or the NMR structure, was selected as the representative. Preference was also given to the cases with the least ligand mediation, as we focused on RNA–RNA interactions. This yielded a nonredundant set of 160 RNA–RNA complex structures.

Next, we tried to identify the corresponding unbound structures for the 160 complexes. We searched the sequences of the bound structures against all the RNA sequences in the PDB using the FASTA program (Pearson and Lipman, 1988). If an RNA structure in the PDB had more than 90% sequence identity to a bound structure and the alignment covered at least 90% of the bound sequence, the structure was considered a candidate unbound structure. If there were multiple unbound candidates for a bound structure, the unbound structure was selected according to the following priorities: highest sequence identity, structure in free form, and highest-resolution crystal structure unless only NMR structures were available. If the selected NMR structure consisted of an ensemble of models, the first model was taken as the representative unbound structure. To make the bound and unbound structures comparable, only the residues of the unbound structures that were aligned in the sequence alignment were kept, so that the bound and unbound structures had the same number of residues.
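The 60% clustering rule and the unbound-candidate selection can be sketched as below. Assumptions: clusters are formed by single linkage (union-find), so two complexes can also end up together through an intermediate complex; `naive_identity` is a hypothetical stand-in that compares sequences position by position without any alignment (a real pipeline would plug in alignment-based identities); and the `Candidate` fields are illustrative names, not fields of any PDB or FASTA output.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


def naive_identity(a: str, b: str) -> float:
    """Hypothetical stand-in: position-by-position identity over the shorter sequence."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / shorter


def cluster_complexes(
    complexes: Dict[str, Tuple[str, str]],  # complex id -> (receptor seq, ligand seq)
    identity: Callable[[str, str], float] = naive_identity,
    cutoff: float = 60.0,
) -> List[List[str]]:
    """Link two complexes when receptors AND ligands both reach the identity cutoff."""
    parent = {c: c for c in complexes}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    ids = sorted(complexes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (rec_a, lig_a), (rec_b, lig_b) = complexes[a], complexes[b]
            if identity(rec_a, rec_b) >= cutoff and identity(lig_a, lig_b) >= cutoff:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for c in ids:
        clusters.setdefault(find(c), []).append(c)
    return sorted(clusters.values())


@dataclass
class Candidate:
    entry: str                          # hypothetical identifier of a PDB RNA chain
    identity: float                     # % identity to the bound chain
    coverage: float                     # % of the bound sequence covered by the alignment
    free_form: bool                     # True if not bound to a partner (assumed flag)
    is_nmr: bool                        # True for NMR, False for X-ray
    resolution: Optional[float] = None  # in Å; None for NMR


def select_unbound(candidates: List[Candidate]) -> Optional[Candidate]:
    """Apply the >90% identity / >=90% coverage filter, then the stated priorities."""
    eligible = [c for c in candidates if c.identity > 90.0 and c.coverage >= 90.0]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda c: (
            -c.identity,       # 1. highest sequence identity
            not c.free_form,   # 2. free-form structures first
            c.is_nmr,          # 3. crystal structures before NMR ...
            c.resolution if c.resolution is not None else math.inf,  # ... best resolution first
        ),
    )
```

Single linkage is only one reading of "grouped into the same cluster"; a stricter variant would require every member to pass the cutoff against a cluster centre, which can yield more clusters.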
Only the targets with at least one unbound structure were kept, resulting in a final set of 123 RNA–RNA test cases, most of which (i.e. 113 cases) have a resolution better than 3.0 Å (Fig. 1B). These 123 targets form our nonredundant benchmark dataset of bound and unbound structures for RNA–RNA docking and scoring.

Fig. 1. The numbers of complexes at different resolution thresholds from 2.0 to 4.0 Å in the initial 554 queried complexes (A) and the final set of 123 targets (B)

3 Results and discussion
3.1 Benchmark dataset
The nonredundant benchmark of 123 targets for RNA–RNA docking is listed in Supplementary Table S1 and can also be accessed from our website at http://huanglab.phys.hust.edu.cn/RRDbenchmark/. For convenience, each target is named after the PDB entry of its bound complex. The table shows that the benchmark covers a wide range of RNA molecules in terms of molecule type, sequence length, conformational change and complex type. For example, the sequence length ranges from 6 nt for the short RNAs composed of UGGGGU in target 4RKV to 2765 nt for the 23S ribosomal RNA in target 3JQ4. The unbound structures show a wide range of conformational changes, with maximum RMSDs of 29.88 Å for the receptor of target 2D1A (the HIV-1 dimerization initiation site in the extended-duplex dimer), 25.85 Å for the ligand of target 1D4R (a fragment of human SRP RNA helix 6) and 25.48 Å for the binding interface of target 1D4R.

To make the benchmark dataset easy to use, the unbound structures of the receptor and ligand RNAs were superimposed onto their respective bound structures using UCSF Chimera (Pettersen et al., 2004). For each target, a residue number mapping between the bound and unbound structures was derived from the sequence alignment for the receptor and ligand RNAs, respectively.
Thus, every target in the benchmark dataset consists of two pairs of bound and unbound structures from the PDB, one pair for the receptor and one for the ligand, together with two residue-mapping files for the receptor and ligand, respectively. All the binding interfaces of the bound and unbound structures were manually checked to ensure that no gaps or ligands would significantly affect the binding interface between the two RNA partners. The table also lists other important information about the receptor and ligand RNA structures, including the presence of kink-turns (Klein et al., 2001), the strand type of the structures and HETATM atoms such as ligands and non-standard nucleic acid bases. Here, the kink-turns were detected using the program DSSR (Lu et al., 2015). To measure the size of the binding interface of each target, we calculated the change in solvent accessible surface area (ΔSA) of the receptor and ligand RNAs upon binding, defined as the SA of the receptor plus the SA of the ligand minus the SA of the complex. Here, the SA was calculated with the program FreeSASA (Mitternacht, 2016) using a probe radius of 1.4 Å. Supplementary Table S1 shows that ΔSA also spans a wide range, from 580 Å² for target 4E59 (an RNA duplex containing CCG repeats) to 5382 Å² for target 2YIE (the aptamer domain of the FMN riboswitch).

3.2 Difficulty classification
An important function of a docking benchmark is to test how well a docking algorithm can handle the conformational changes encountered in realistic docking. Therefore, the 123 targets of the benchmark were grouped into three categories, ‘easy’, ‘medium’ and ‘difficult’, according to the root mean square deviation (RMSD) of the interface region between the bound and unbound structures of a target after optimal superimposition. The interface was defined as the residues of the bound structures having at least one atom within 10 Å of the other partner.
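The ΔSA measure and the 10 Å interface definition above reduce to a subtraction and a distance test. A minimal sketch, assuming residues are supplied as lists of atom coordinates in Å and that the per-structure SA values have already been computed (for example with FreeSASA and a 1.4 Å probe); the brute-force distance loop and the use of ≤ for "within 10 Å" are illustrative choices, not details given in the text.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def interface_residues(
    residues: Dict[str, List[Coord]],   # residue id -> atom coordinates of one partner
    partner_atoms: Sequence[Coord],     # all atom coordinates of the other partner
    cutoff: float = 10.0,
) -> Set[str]:
    """Residues with at least one atom within `cutoff` Å of any atom of the partner."""
    return {
        res_id
        for res_id, atoms in residues.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    }


def delta_sa(sa_receptor: float, sa_ligand: float, sa_complex: float) -> float:
    """Surface buried on binding: SA(receptor) + SA(ligand) - SA(complex), in Å^2."""
    return sa_receptor + sa_ligand - sa_complex
```

The all-pairs loop is fine for a sketch but scales with the product of the two atom counts; a spatial grid or k-d tree would be the usual choice for ribosome-sized targets.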
The superimposition was based on one backbone atom per nucleotide, namely the C4’ atom (Huang and Zou, 2013, 2014). It should be noted that RNA molecules can have different reduced representations for nucleotides, such as P or C4’. An advantage of using C4’ over P is that each nucleotide of an RNA molecule normally contains a C4’ atom, whereas P atoms may be missing from the terminal residues in some PDB files (Berman et al., 2000). Nevertheless, different reduced representations of RNAs should not result in significant differences in the measured RMSD values. According to these criteria, the benchmark dataset contains 47 ‘easy’ targets, 38 ‘medium’ targets and 38 ‘difficult’ targets (Table 1).

Table 1. Criteria to categorize targets by interface RMSD (Irmsd)
Category     Criterion                  # of cases
Easy         Irmsd ≤ 1.5 Å              47
Medium       1.5 Å < Irmsd ≤ 3.0 Å      38
Difficult    Irmsd > 3.0 Å              38

Supplementary Table S1 also shows that most of the targets have unbound/unbound (U/U) structures for the receptor/ligand, but for some cases an unbound structure was found in the PDB for only one of the two binding partners. The other binding partner had no available unbound structure because of the limited number of experimental RNA structures; we categorized these as bound/unbound (B/U) or unbound/bound (U/B) cases. There are 78 U/U cases and 45 B/U (or U/B) targets in the present benchmark. Although the B/U and U/B targets have only one unbound structure, they have the same benchmarking value as the U/U cases, as docking depends not only on the receptor RNA but also on the ligand RNA. As shown in Supplementary Table S1, the B/U (or U/B) cases cover all three difficulty categories, with interface RMSDs ranging from 0.10 Å for the easy target 1Y0Q (an active group I ribozyme-product complex) to 21.23 Å for the difficult target 2D1A (the HIV-1 dimerization initiation site in the extended-duplex dimer).
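The interface RMSD after optimal superimposition, and the Table 1 thresholds, can be sketched as below. This is an illustration under stated assumptions, not the authors' code: it expects two equally ordered lists of C4’ coordinates for the interface residues (paired through the bound/unbound residue mapping), and instead of the common SVD-based Kabsch method it obtains the minimal RMSD from the largest eigenvalue of Horn's 4×4 quaternion matrix, computed with a small Jacobi eigenvalue routine so that the example needs only the standard library.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _largest_eigenvalue(a: List[List[float]], sweeps: int = 50) -> float:
    """Cyclic Jacobi iteration for a small symmetric matrix; returns its largest eigenvalue."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(x: Sequence[Coord], y: Sequence[Coord]) -> float:
    """Minimal RMSD between paired coordinates over all rotations and translations."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty, equally long coordinate lists")
    n = len(x)
    cx = [sum(p[i] for p in x) / n for i in range(3)]
    cy = [sum(p[i] for p in y) / n for i in range(3)]
    xs = [[p[i] - cx[i] for i in range(3)] for p in x]
    ys = [[p[i] - cy[i] for i in range(3)] for p in y]
    g = sum(v * v for p in xs for v in p) + sum(v * v for p in ys for v in p)
    # Cross-covariance S[i][j] = sum_k xs[k][i] * ys[k][j]
    s = [[sum(xs[k][i] * ys[k][j] for k in range(n)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(horn)  # equals the best achievable sum of x . (R y)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


def classify(irmsd: float) -> str:
    """Difficulty category from the Table 1 thresholds (Å)."""
    if irmsd <= 1.5:
        return "easy"
    if irmsd <= 3.0:
        return "medium"
    return "difficult"
```

Applied to the Irmsd values quoted for the Figure 2 examples, `classify` places 2OUE (0.23 Å), 2ADT (1.52 Å) and 2L1F (17.78 Å) in the easy, medium and difficult categories, respectively.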
Figure 2 shows three representative examples of the ‘easy’, ‘medium’ and ‘difficult’ cases, respectively. For the easy target 2OUE, the conformational changes in both the receptor and the ligand are very small, and the backbones of its bound and unbound structures almost overlap (Fig. 2A). Easy targets like 2OUE are good for validating the performance of semi-rigid docking algorithms in which flexibility is considered implicitly. The easy targets can also be used to examine the efficiency of rigid-body sampling, a first step of docking algorithms.

Fig. 2. Comparison of the bound and unbound structures of three representative targets, in which the bound receptor/ligand structures are colored in red/cyan and the corresponding unbound structures are colored in blue/yellow. (A) ‘Easy’ target 2OUE, a junctionless all-RNA hairpin ribozyme (Irmsd = 0.23 Å). (B) ‘Medium’ target 2ADT, a GAAA tetraloop-receptor complex (Irmsd = 1.52 Å). (C) ‘Difficult’ target 2L1F, a conserved retroviral RNA packaging element (Irmsd = 17.78 Å). The pictures were prepared using UCSF Chimera (Pettersen et al., 2004) (Color version of this figure is available at Bioinformatics online.)

The ‘medium’ targets like 2ADT often involve significant conformational changes in the receptor and/or ligand RNAs (Fig. 2B). Docking the ‘medium’ targets may therefore require explicit consideration of flexibility during sampling; otherwise, the correct binding modes may not be ranked among the top predictions. For ‘difficult’ targets like 2L1F, there are often global conformational changes, such as large bending or twisting, between the unbound and bound structures (Fig. 2C). In some cases, the RNA may change from an extended conformation to a collapsed one, or vice versa, between the bound and unbound states. Therefore, flexibility must be considered when docking difficult targets.
The correct binding mode may be completely missed if the large conformational change is not explicitly considered during sampling.

3.3 Docking test
One basic role of a docking benchmark is to objectively evaluate the performance of a docking algorithm, namely whether or not the algorithm can predict correct binding modes within its top predictions. A good benchmark should be diverse in terms of molecular type and sequence, as discussed above. Another important aspect is that the benchmark should reflect realistic applications and thus be challenging, at least for basic docking algorithms such as shape-based methods; otherwise, the benchmark may not provide enough insight into existing docking algorithms to benefit the development of advanced ones. We therefore conducted a preliminary docking test with both the bound and unbound structures of the benchmark using ZDOCK 2.1, a widely used FFT-based docking algorithm with a pairwise shape-complementarity scoring function (Chen and Weng, 2003). When preparing the input structures for docking, the larger RNA was set as the receptor and the smaller RNA as the ligand. If the two RNAs had comparable sizes, the receptor and ligand were assigned according to the order of their chain IDs. It should be noted that FFT-based docking algorithms like ZDOCK perform a systematic global search of the ligand relative to the receptor. Therefore, although the designation of receptor and ligand could be arbitrary, it would not significantly change the docking results. Here, the naming of receptor and ligand is purely a matter of speed, because in FFT-based docking, where the receptor is fixed and the ligand is rotated by default, the computational time is proportional to the geometric size of the receptor plus the radius of the ligand.
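The receptor/ligand assignment rule can be written as a small helper. A sketch under assumptions: size is measured here by atom count, and "comparable size" is taken, hypothetically, to mean that the smaller RNA has at least 90% as many atoms as the larger one; the text gives no numerical threshold, so `comparable_ratio` is an illustrative parameter.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RnaChain:
    chain_id: str
    n_atoms: int  # size measure (assumption: atom count)


def assign_receptor_ligand(
    a: RnaChain, b: RnaChain, comparable_ratio: float = 0.9
) -> Tuple[RnaChain, RnaChain]:
    """Return (receptor, ligand): the larger RNA is the receptor."""
    small, large = sorted((a, b), key=lambda c: c.n_atoms)
    if small.n_atoms >= comparable_ratio * large.n_atoms:
        # Hypothetical 'comparable size' rule: fall back to chain-ID order.
        first, second = sorted((a, b), key=lambda c: c.chain_id)
        return first, second
    return large, small
```

For instance, `assign_receptor_ligand(RnaChain("A", 500), RnaChain("B", 3000))` makes chain B the receptor, whereas two chains of 980 and 1000 atoms fall under the hypothetical comparable-size rule and are ordered by chain ID.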
To minimize the effect of initial positions, the starting ligand structure was also randomized before docking. The default docking parameters were used, which generated 2000 binding modes for each docking run. With the prepared structures, we successfully docked 122 of the 123 cases in the benchmark; the exception was target 3JQ4 (the complex of the large ribosomal subunit), whose number of atoms exceeds the limit of ZDOCK 2.1.

From the docking results, we calculated the success rates of ZDOCK 2.1 in binding mode prediction for docking with bound structures (bound docking) and with unbound structures (unbound docking). Here, a prediction was considered successful, i.e. a hit, if the interface RMSD between the predicted binding mode and the native complex structure was less than 5.0 Å. The success rate was defined as the number of cases with at least one hit divided by the total number of cases in the benchmark when a specific number of top predictions was considered. In addition, we calculated the average number of hits per complex. Figure 3 shows the success rates and average numbers of hits obtained by ZDOCK 2.1 as a function of the number of top predictions for both bound and unbound docking.

Fig. 3. The success rates (A) and average number of hits per complex (B) obtained by ZDOCK 2.1 as a function of the number of top considered predictions for the bound and unbound cases of the benchmark. For comparison, the merged results for bound and unbound docking are also shown in the figure. Here, a hit is a prediction with an interface RMSD of < 5 Å from its native complex structure, and the success rate was defined as the fraction of the test cases with at least one hit when a specific number of top predictions were considered

Fig. 4. The ranking (A) and accuracy (B) of the first hit vs. the average number of involved nucleotides in the receptor and ligand RNA structures for the test cases in the benchmark predicted by ZDOCK 2.1, where a hit is a prediction with an interface RMSD of < 5 Å. The dashed lines in panels A and B indicate a docking failure.

The bound docking serves as a primary test of the scoring function, as no conformational change is involved during docking. Figure 3A shows that the present RNA–RNA docking benchmark is challenging for ZDOCK 2.1, which achieved success rates of only 10.7 and 23.8% for bound docking when the top 1 and 10 predictions were considered. However, all the cases were correctly predicted when the top 2000 predictions were considered, suggesting that sampling was not the reason for the low success rates. The benchmark was much more challenging for unbound docking, with success rates of 4.1 and 9.0% when the top 1 and 10 predictions were considered, indicating the significant impact of conformational changes on docking (Fig. 3A). Similar trends can be observed between bound and unbound docking in the number of hits per case. On average, ZDOCK 2.1 obtained 8.2 and 14.9 hits per complex for bound and unbound docking, respectively, when the top 1000 predictions were considered (Fig. 3B). The low success rates of ZDOCK 2.1 indicate that the RNA–RNA docking problem is not trivial and requires the development of advanced docking algorithms and more accurate energy scoring functions for RNA–RNA interactions. The present benchmark is intended to serve exactly this purpose for RNA–RNA docking and scoring.

To further investigate the impact of RNA flexibility on docking, we merged the results of bound and unbound docking and reranked the binding modes according to their docking scores. This kind of merged docking is similar to flexible docking through ensemble docking of multiple RNA structures (Huang and Zou, 2006).
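The hit, success-rate and merged-docking definitions can be made concrete with a short sketch. Assumptions: each docking run is summarized as a list of (score, interface RMSD) pairs, a higher score means a better pose (as for a shape-complementarity score), and merged docking simply pools the bound and unbound poses of a target and reranks them by score; the function names are illustrative.

```python
from typing import Dict, List, Tuple

Pose = Tuple[float, float]  # (docking score, interface RMSD to the native complex in Å)
HIT_CUTOFF = 5.0            # a hit has interface RMSD < 5 Å


def ranked(poses: List[Pose]) -> List[Pose]:
    """Sort poses from best (highest score) to worst."""
    return sorted(poses, key=lambda p: p[0], reverse=True)


def hits_in_top(poses: List[Pose], top_n: int) -> int:
    """Number of hits among the top_n ranked poses."""
    return sum(1 for _, irmsd in ranked(poses)[:top_n] if irmsd < HIT_CUTOFF)


def success_rate(results: Dict[str, List[Pose]], top_n: int) -> float:
    """Fraction of targets with at least one hit among the top_n predictions."""
    if not results:
        return 0.0
    return sum(hits_in_top(p, top_n) > 0 for p in results.values()) / len(results)


def average_hits(results: Dict[str, List[Pose]], top_n: int) -> float:
    """Mean number of hits per target among the top_n predictions."""
    if not results:
        return 0.0
    return sum(hits_in_top(p, top_n) for p in results.values()) / len(results)


def merge_runs(
    bound: Dict[str, List[Pose]], unbound: Dict[str, List[Pose]]
) -> Dict[str, List[Pose]]:
    """Pool bound and unbound poses per target; reranking happens in `ranked`."""
    return {t: bound.get(t, []) + unbound.get(t, []) for t in bound.keys() | unbound.keys()}
```

With toy data in which one target's bound run ranks a false pose first while its unbound run contains a better-scored hit, `success_rate(merge_runs(bound, unbound), 1)` exceeds the top-1 success rate of either run alone, which illustrates how pooling conformations can help.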
Figure 3 shows that, although there are only two conformations for each RNA, the merged docking protocol significantly improved the success rate and achieved performance comparable to that of bound docking. These results indicate the importance of considering conformational changes in docking.

As the size of an RNA is a critical factor in RNA structure prediction, we also investigated the impact of RNA size on docking accuracy. Figure 4 shows the ranking and interface RMSD of the first hit vs. the average number of nucleotides in the receptor and ligand RNAs for bound and unbound docking. Smaller rank numbers and lower RMSDs correspond to higher docking accuracy. As expected, the docking accuracies for bound docking are overall significantly better than those for unbound docking because of the conformational changes in the unbound structures. Two common features can also be observed for bound and unbound docking. First, docking tends to become more challenging, in terms of both ranking and RMSD, as the number of nucleotides increases (Fig. 4). Second, the rankings can be roughly divided into three regions according to RNA size (Fig. 4A). The first region corresponds to long RNAs with more than 100 nucleotides, where docking is the most challenging and the first hit appears far down the ranked list, or docking fails altogether. The second region is for short RNAs with fewer than 10 nucleotides, where docking did not perform well either: the first hit was typically ranked around 100, with a few failed cases. The third region corresponds to RNAs with 10–100 nucleotides. In this intermediate region, the docking results vary widely, from a first hit ranked #1 to a complete failure. These findings may be understood as follows. On the one hand, docking tends to be more challenging for larger molecules because their larger surface areas result in more possible binding modes during sampling.
This is why the docking accuracy was poor for the large RNAs. On the other hand, docking is also challenging for short RNAs because small RNAs tend to be less structured, which makes shape complementarity less discriminative in scoring. Therefore, docking did not perform well for short RNAs either. Compared with very large and very short RNAs, the intermediate-size RNAs achieve a good balance between the sampling issue and the scoring challenge owing to their intermediate surface areas and well-folded structures. These findings should be beneficial for the development of docking algorithms and scoring functions for RNA–RNA interactions.

4 Conclusion
We have constructed a comprehensive and nonredundant benchmark of 123 diverse targets for RNA–RNA docking and scoring. The benchmark consists of 78 unbound-unbound cases and 45 bound-unbound (or unbound-bound) cases derived from experimental structures in the PDB. According to the conformational change at the interface, the 123 targets were grouped into 47 ‘easy’, 38 ‘medium’ and 38 ‘difficult’ cases based on their interface RMSDs between the bound and unbound structures. A preliminary docking test on the benchmark showed that the RNA–RNA docking problem remains challenging and requires the development of advanced docking algorithms and scoring functions. The present benchmark is valuable for this purpose. The RNA–RNA docking benchmark is scheduled to be updated annually and is available at http://huanglab.phys.hust.edu.cn/RRDbenchmark/.

Funding
This work was supported by the National Natural Science Foundation of China (grant No. 31670724), the National Key Research and Development Program of China (grant Nos. 2016YFC1305800 and 2016YFC1305805) and the startup grant of Huazhong University of Science and Technology.

Conflict of Interest: none declared.

References
Barik,A. et al. (2012) A protein-RNA docking benchmark (I): nonredundant cases. Proteins, 80, 1866–1871.
Berman,H.M. et al. (1992) The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J., 63, 751–759.
Berman,H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
Capriotti,E. and Marti-Renom,M.A. (2008) Computational RNA structure prediction. Curr. Bioinform., 3, 32–45.
Chen,R. et al. (2003) A protein-protein docking benchmark. Proteins, 52, 88–91.
Chen,R. and Weng,Z.P. (2003) A novel shape complementarity scoring function for protein–protein docking. Proteins, 51, 397–408.
Coimbatore Narayanan,B. et al. (2014) The Nucleic Acid Database: new features and capabilities. Nucleic Acids Res., 42, D114–D122.
Engreitz,J.M. et al. (2014) RNA–RNA interactions enable specific targeting of noncoding RNAs to nascent Pre-mRNAs and chromatin sites. Cell, 159, 188–199.
Gardner,P.P. (2005) A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res., 33, 2433–2439.
Guil,S. and Esteller,M. (2015) RNA–RNA interactions in gene regulation: the coding and noncoding players. Trends Biochem. Sci., 40, 248–256.
Huang,S.-Y. and Zou,X. (2006) Ensemble docking of multiple protein structures: considering protein structural variations in molecular docking. Proteins, 66, 399–421.
Huang,S.-Y. and Zou,X. (2013) A nonredundant structure dataset for benchmarking protein-RNA computational docking. J. Comput. Chem., 34, 311–318.
Huang,S.-Y. and Zou,X. (2014) A knowledge-based scoring function for protein-RNA interactions derived from a statistical mechanics-based iterative method. Nucleic Acids Res., 42, e55.
Huang,S.-Y. (2014) Search strategies and evaluation in protein–protein docking: principles, advances and challenges. Drug Discov. Today, 19, 1081–1096.
Hwang,H. et al. (2008) Protein–protein docking benchmark version 3.0. Proteins, 73, 705–709.
Hwang,H. et al. (2010) Protein–protein docking benchmark version 4.0. Proteins, 78, 3111–3114.
Janin,J. et al. (2003) CAPRI: a critical assessment of predicted interactions. Proteins, 52, 2–9.
Kastritis,P. et al. (2011) A structure-based benchmark for protein-protein binding affinity. Protein Sci., 20, 482–491.
Klein,D.J. et al. (2001) The kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.
Lai,D. and Meyer,I.M. (2016) A comprehensive comparison of general RNA–RNA interaction prediction methods. Nucleic Acids Res., 44, e61.
Lu,X.J. et al. (2015) DSSR: an integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res., 43, e142.
Mintseris,J. et al. (2005) Protein–protein docking benchmark 2.0: an update. Proteins, 60, 214–216.
Mitternacht,S. (2016) FreeSASA: an open source C library for solvent accessible surface area calculations. F1000Res., 5, 189.
Morris,K.V. and Mattick,J.S. (2014) The rise of regulatory RNA. Nat. Rev. Genet., 15, 423–437.
Myers,E.W. and Miller,W. (1988) Optimal alignments in linear space. Comput. Appl. Biosci., 4, 11–17.
Nithin,C. et al. (2017) A non-redundant protein-RNA docking benchmark version 2.0. Proteins, 85, 256–267.
Pearson,W.R. and Lipman,D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA, 85, 2444–2448.
Perez-Cano,L. et al. (2012) A protein-RNA docking benchmark (II): extended set from experimental and homology modeling data. Proteins, 80, 1872–1882.
Petrov,A.I. et al. (2013) Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas. RNA, 19, 1327–1340.
Pettersen,E.F. et al. (2004) UCSF Chimera–a visualization system for exploratory research and analysis. J. Comput. Chem., 25, 1605–1612.
Rahrig,R.R. et al. (2010) R3D Align: global pairwise alignment of RNA 3D structures using local superpositions. Bioinformatics, 26, 2689–2697.
van Dijk,M. and Bonvin,A.M. (2008) A protein-DNA docking benchmark. Nucleic Acids Res., 36, e88.
Vreven,T. et al. (2015) Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J. Mol. Biol., 427, 3031–3041.
Wodak,S.J. and Janin,J. (1978) Computer analysis of protein–protein interaction. J. Mol. Biol., 124, 323–342.