Rebecca F. Alford (ralford3@jhu.edu)
PI: Jeffrey J. Gray (Johns Hopkins ChemBE)
Test created 6/6/19
The purpose of this test is to evaluate the scientific performance of franklin2019, the default energy function for membrane protein structure prediction and design. Specifically, this probes the ability of franklin2019 to discriminate near-native from non-native decoys.
The benchmark dataset includes four targets: V-ATPase (VATP; 2bl2), Bacteriorhodopsin (BRD7; 1py6), Fumarate Reductase (FMR5; 1qla), and Rhodopsin (RHOD; 1u19). For each target, we use two sets of decoys. The first set was generated by Yarov-Yarovoy et al. 2006 [1] through ab initio folding and includes 5,000 models per target. These decoys are between 5 and 40 Å RMSD from the native crystal structure. The second set was generated by Dutagaci et al. 2017 [2] through molecular dynamics simulations and includes between 75 and 110 decoys per target. These decoys are between 1 and 11 Å RMSD from the native crystal structure.
To balance the data, we randomly selected a subset of 100 decoys from the low-resolution (ab initio) set. The same random set of 100 models is used for all testing.
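A fixed subset like this can be reproduced with a seeded random sample. The following is a minimal sketch, not the script used for this benchmark: the function name select_decoy_subset, the seed value, and the decoy file names are all hypothetical.

```python
import random


def select_decoy_subset(decoy_names, n=100, seed=2019):
    """Return a reproducible random subset of n decoy names.

    Assumptions (not from the benchmark itself): decoys are identified
    by unique name strings, and the seed value is arbitrary.
    """
    # Sort first so the result does not depend on the input order.
    ordered = sorted(decoy_names)
    # A private Random instance keeps the draw independent of global state.
    rng = random.Random(seed)
    return sorted(rng.sample(ordered, n))


# Hypothetical names standing in for the 5,000 ab initio models of one target.
all_decoys = [f"decoy_{i:04d}.pdb" for i in range(5000)]
subset = select_decoy_subset(all_decoys)
```

Because the seed is fixed and the input is sorted before sampling, repeated calls return the same 100 names, which is what allows the same subset to be reused across test runs.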
References for datasets:
1. Yarov-Yarovoy V, Schonbrun J, Baker D (2006) "Multipass membrane protein structure prediction using Rosetta" Proteins 62(4):1010-25
2. Dutagaci B, Wittayanarakul K, Mori T, Feig M (2017) "Discrimination of native-like states of membrane proteins with implicit membrane-based scoring functions" J Chem Theory Comput 13(6):3049-3059.
The input files are the PDB coordinate files for each decoy and a span file generated using the mp_span_from_pdb application. The PDB coordinates were downloaded from the Orientations of Proteins in Membranes database.
Each decoy is refined using the RosettaMPRelax protocol with the franklin2019 energy function. The franklin2019 energy function is described in (Alford, R. F., Fleming, P. J., Fleming, K. G. & Gray, J. J. Protein Structure Prediction and Design in a Biologically Realistic Implicit Membrane. Biophys. J. 118, 2042–2055 (2020)) and the RosettaMPRelax protocol is described in (Alford RF, Koehler Leman J, Weitzner BD, Duran AM, Tilley DC, Elazar A, Gray JJ (2015) "An integrated framework advancing membrane protein modeling and design" PLoS Comput Biol 11(9): e1004398.)
This benchmark test takes approximately 2,000 CPU hours.
We use the Boltzmann-weighted RMS (Wrms) metric to evaluate decoy discrimination. We chose this metric because it identifies the average RMS, accounting for the likelihood of those structures occurring in nature according to their energies. This metric is further described in:
(Bhardwaj G, Mulligan VK, Bahl CD, et al. (2016) "Accurate de novo design of hyperstable constrained peptides" Nature 538(7625):329-335)
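As an illustration of the idea, the sketch below computes a Boltzmann-weighted average of RMS values from per-decoy energies. It is a minimal example under stated assumptions, not the analysis script of this test: the function name, the default kT of 1.0 (in the same units as the energies), and the example numbers are hypothetical.

```python
import math


def boltzmann_weighted_rms(energies, rms_values, kT=1.0):
    """Boltzmann-weighted average of rms_values.

    Each decoy i gets weight exp(-E_i / kT); the result is
    sum(w_i * rms_i) / sum(w_i). The kT default is an assumption.
    """
    if len(energies) != len(rms_values) or len(energies) == 0:
        raise ValueError("energies and rms_values must be non-empty and equal in length")
    # Shift by the minimum energy so the exponentials cannot overflow;
    # the shift cancels in the ratio and does not change the result.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, rms_values)) / total


# Made-up values: the lowest-energy decoy dominates the average.
wrms = boltzmann_weighted_rms([-310.0, -295.0, -280.0], [1.8, 6.5, 12.0])
```

When all decoys have the same energy the result reduces to the plain mean RMS; when one decoy is much lower in energy than the rest, the result approaches that decoy's RMS.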
Pass/fail is defined by comparison of calculated Wrms values for this benchmark with previously established values. The test passes if Wrms is within 0.5Å of the established value.
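The pass/fail rule can be sketched as a per-target comparison against stored reference values. This is an illustrative sketch only: the function name, the inclusive treatment of the 0.5 Å boundary, and every number in the example dictionaries are hypothetical placeholders, not established values from this benchmark.

```python
def wrms_passes(calculated, established, cutoff=0.5):
    """Return True if calculated Wrms is within cutoff (in Å) of the established value.

    Treating the boundary as passing is an assumption.
    """
    return abs(calculated - established) <= cutoff


# Placeholder numbers for illustration only; not real benchmark results.
established = {"VATP": 5.0, "BRD7": 7.0}
calculated = {"VATP": 5.3, "BRD7": 8.1}

results = {tag: wrms_passes(calculated[tag], established[tag]) for tag in established}
```

With these placeholder numbers, VATP passes (difference of 0.3 Å) and BRD7 fails (difference of 1.1 Å).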
In the plot generated by this test, the sampled RMS is shown as a blue line and the Wrms is shown as a red line.
The calculated Wrms values are compared with baseline Wrms values from the soluble ref2015 energy function and previous versions of the membrane energy function.
The membrane normal and center, in addition to the coordinate frame setup, are described in:
(Alford RF, Koehler Leman J, Weitzner BD, Duran AM, Tilley DC, Elazar A, Gray JJ (2015) "An integrated framework advancing membrane protein modeling and design" PLoS Comput. Biol. 11(9):e1004398.)
The PNear and Wrms metrics evaluate discrimination based on the root-mean-square deviation from the coordinates of the native crystal structure. While this is a good general metric for proteins, it loses information about the membrane. I am also not in favor of rerunning Rosetta scoring for the analysis step; this should be done either as a second step during the submission or directly in the XML script. One inconsistency here is that the fa_water_to_bilayer score term differs sizeably between the scoring and re-scoring steps, which needs to be looked into.
Furthermore, Lactose Permease (LTPA; 1pv6) was taken out of the benchmark; I am not sure why.