Scientific test: relax_fast

FAILURES

    None

RESULTS

alternative text

## AUTHOR AND DATE

Julia Koehler Leman, julia.koehler.leman@gmail.com, Richard Bonneau, Oct 2018

## PURPOSE OF THE TEST

This benchmark tests different versions of the relax protocol (fast_relax with 1 iteration (basic), 5 iterations (default) and cart_relax) to see what RMSD and score ranges are sampled with different protocols and how close the decoys stay to the native structure.

## BENCHMARK DATASET

There are 12 proteins in the benchmark set that are taken from (Conway, P., Tyka, M. D., DiMaio, F., Konerding, D. E. & Baker, D. Relaxation of backbone bond geometry improves protein energy landscape modeling. Protein Sci. 23, 47–55 (2014).). These are single-chain proteins ranging from 75 to 384 residues. The input files are cleaned PDBs.

## PROTOCOL

This is running RosettaScripts with the FastRelax mover, then superimposes the decoys to the native and computes RMSDs via SimpleMetrics. [It is running RosettaScripts because the fast_relax application wouldn't compute RMSDs, even with the native flag.] We use the ref2015 high-res scorefunction and superimpose on the backbone heavy atoms. 100 decoys are created for each protein in the set. Each relax protocol takes about 120 CPU hours for the entire set, which makes this 360 CPU hours for all three protocols (fast_relax with 1 iteration (basic), 5 iterations (default) and cart_relax) for the entire benchmark set.

## PERFORMANCE METRICS

We use the total Rosetta score and RMSD to the native structure as we are interested in the sampling range and the score ranges in sampling. A passed test constitutes 90% of the decoys generated have a smaller than the defined score and RMSD cutoffs. The cutoffs were defined by looking at these ranges in a single run without outlier decoys. Specifically, for RMSD we use the maximum RMSD + stdev + 1A. The +1A is to account for the very narrow funnels with small stdevs. For score, we use the maximum score + stdev.

## KEY RESULTS

Typically, we see deep funnels (vertical score-vs-RMSD plots) for most proteins, with sampling ranges around 1-2A RMSD and scores in the -200 to -1200 ranges. 5 iterations of fastrelax moves the decoys away from the native a little bit more than a single iteration and have slightly lower scores, both of which are expected. Cartrelax gets lower scores than even 5 iterations of fastrelax, sometimes significantly lower scores. The RMSD range for cartrelax is similar to fastrelax, except for the outliers, where cartrelax samples a more narrow range. Outliers are (1) 2DCF which has a chainbreak and a flexible N-terminus, sampling RMSDs up to 4A in fastrelax, up to 8A in fastrelax 5 iterations, but samples a normal narrow range in cartrelax <2A; (2) 2FKK has a chainbreak and a very long disordered region (~100 residues towards the C-terminus). Sampling ranges are up to 10A for fastrelax (1 or 5 iterations), but <4A in cartrelax.

## DEFINITIONS AND COMMENTS

## LIMITATIONS

The dataset is somewhat diverse and realistic in terms of protein size, alpha/beta content, and loop content. However, it is unclear how relax performs on multi-chain and symmetric proteins. Further, I am unclear whether this is the latest state-of-the-art Rosetta protocol to use.