core

PyRosettaCluster is a class for reproducible, high-throughput job distribution of user-defined PyRosetta protocols efficiently parallelized on the user’s local computer, high-performance computing (HPC) cluster, or elastic cloud computing infrastructure with available compute resources.

Args:
tasks: A list of dict objects, a callable or called function returning a list of dict objects, or a callable or called generator yielding a list of dict objects. Each dictionary object element of the list is accessible via kwargs in the user-defined PyRosetta protocols. In order to initialize PyRosetta with user-defined PyRosetta command line options at the start of each user-defined PyRosetta protocol, either extra_options and/or options must be a key of each dictionary object, where the value is a str, tuple, list, set, or dict of PyRosetta command line options. Default: [{}]
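The accepted forms can be sketched in plain Python; the flags and the "task_index" key below are placeholders for illustration, not required names:

```python
# A minimal sketch of the `tasks` forms described above; "-ex1 -ex2aro" and
# "task_index" are illustrative placeholders, not required values or keys.
def create_tasks():
    """A generator yielding one task dict per decoy to simulate."""
    for index in range(4):
        yield {
            "extra_options": "-ex1 -ex2aro",  # PyRosetta command line options
            "task_index": index,  # arbitrary key, visible via kwargs
        }

# Equivalent list form: each dict becomes the kwargs of one protocol run.
tasks_as_list = [
    {"extra_options": "-ex1 -ex2aro", "task_index": i} for i in range(4)
]

assert list(create_tasks()) == tasks_as_list
```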

input_packed_pose: Optional input PackedPose object that is accessible via the first argument of the first user-defined PyRosetta protocol. Default: None

seeds: A list of int objects specifying the random number generator seeds to use for each user-defined PyRosetta protocol. The number of seeds provided must be equal to the number of user-defined input PyRosetta protocols. Seeds are used in the same order that the user-defined PyRosetta protocols are executed. Default: None

decoy_ids: A list of int objects specifying the decoy numbers to keep after executing user-defined PyRosetta protocols. User-provided PyRosetta protocols may return a list of Pose and/or PackedPose objects, or yield multiple Pose and/or PackedPose objects. To reproduce a particular decoy generated via the chain of user-provided PyRosetta protocols, the decoy number to keep for each protocol may be specified, and the other decoys are discarded. Decoy numbers use zero-based indexing, so 0 is the first decoy generated from a particular PyRosetta protocol. The number of decoy_ids provided must be equal to the number of user-defined input PyRosetta protocols, so that one decoy is saved for each user-defined PyRosetta protocol. Decoy ids are applied in the same order that the user-defined PyRosetta protocols are executed. Default: None
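The per-protocol correspondence can be sketched with placeholder protocols; the functions below stand in for real user-defined PyRosetta protocols, and only their count matters:

```python
# Placeholder functions standing in for user-defined PyRosetta protocols.
def protocol_1(packed_pose, **kwargs): ...
def protocol_2(packed_pose, **kwargs): ...
def protocol_3(packed_pose, **kwargs): ...

protocols = [protocol_1, protocol_2, protocol_3]
seeds = [111, 222, 333]  # one RNG seed per protocol, applied in execution order
decoy_ids = [0, 2, 0]  # zero-based decoy number to keep from each protocol

# Both options require exactly one entry per user-defined protocol:
assert len(seeds) == len(protocols)
assert len(decoy_ids) == len(protocols)
```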

client: An initialized dask distributed.client.Client object to be used as the dask client interface to the local or remote compute cluster. If None, then PyRosettaCluster initializes its own dask client based on the PyRosettaCluster(scheduler=…) class attribute. Deprecated in favor of the PyRosettaCluster(clients=…) class attribute, but supported for legacy purposes. Either or both of the client or clients attribute parameters must be None. Default: None

clients: A list or tuple object of initialized dask distributed.client.Client objects to be used as the dask client interface(s) to the local or remote compute cluster(s). If None, then PyRosettaCluster initializes its own dask client based on the PyRosettaCluster(scheduler=…) class attribute. Optionally used in combination with the PyRosettaCluster().distribute(clients_indices=…) method. Either or both of the client or clients attribute parameters must be None. See the PyRosettaCluster().distribute() method docstring for usage examples. Default: None

scheduler: A str of either “sge” or “slurm”, or None. If “sge”, then PyRosettaCluster schedules jobs using SGECluster with dask-jobqueue. If “slurm”, then PyRosettaCluster schedules jobs using SLURMCluster with dask-jobqueue. If None, then PyRosettaCluster schedules jobs using LocalCluster with dask.distributed. If PyRosettaCluster(client=…) or PyRosettaCluster(clients=…) is provided, then PyRosettaCluster(scheduler=…) is ignored. Default: None

cores: An int object specifying the total number of cores per job, which is input to the dask_jobqueue.SLURMCluster(cores=…) argument or the dask_jobqueue.SGECluster(cores=…) argument. Default: 1

processes: An int object specifying the total number of processes per job, which is input to the dask_jobqueue.SLURMCluster(processes=…) argument or the dask_jobqueue.SGECluster(processes=…) argument. This cuts the job up into this many processes. Default: 1

memory: A str object specifying the total amount of memory per job, which is input to the dask_jobqueue.SLURMCluster(memory=…) argument or the dask_jobqueue.SGECluster(memory=…) argument. Default: “4g”

scratch_dir: A str object specifying the path to a scratch directory where dask litter may go. Default: “/temp” if it exists, otherwise the current working directory
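The stated default resolution can be sketched with the standard library (a hypothetical helper, not the library's own code):

```python
import os

def default_scratch_dir():
    """Hypothetical sketch of the rule above: "/temp" if it exists,
    otherwise the current working directory."""
    return "/temp" if os.path.isdir("/temp") else os.getcwd()

scratch = default_scratch_dir()
assert scratch == "/temp" or scratch == os.getcwd()
```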

min_workers: An int object specifying the minimum number of workers to which to adapt during parallelization of user-provided PyRosetta protocols. Default: 1

max_workers: An int object specifying the maximum number of workers to which to adapt during parallelization of user-provided PyRosetta protocols. Default: 1000 if the initial number of tasks is <1000, otherwise the initial number of tasks
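The stated default rule, sketched as a hypothetical helper:

```python
def default_max_workers(num_initial_tasks):
    """Hypothetical sketch of the default above: cap at 1000 unless the
    initial number of tasks reaches 1000, in which case use that number."""
    return 1000 if num_initial_tasks < 1000 else num_initial_tasks

assert default_max_workers(8) == 1000
assert default_max_workers(25000) == 25000
```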

dashboard_address: A str object specifying the port over which the dask dashboard is forwarded. Particularly useful for diagnosing PyRosettaCluster performance in real time. Default: “:8787”

nstruct: An int object specifying the number of repeats of the first user-provided PyRosetta protocol. The user can control the number of repeats of subsequent user-provided PyRosetta protocols by returning multiple clones of the output pose(s) from a user-provided PyRosetta protocol run earlier, or by cloning the input pose(s) multiple times in a user-provided PyRosetta protocol run later. Default: 1

compressed: A bool object specifying whether or not to compress the output “.pdb”, “.pkl_pose”, “.b64_pose”, and “.init” files with bzip2, resulting in appending “.bz2” to decoy output files and PyRosetta initialization files. Also see the ‘output_decoy_types’ and ‘output_init_file’ keyword arguments. Default: True
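The bzip2 naming convention can be sketched with the standard library; the file contents below are a stand-in, not a real decoy:

```python
import bz2
import os
import tempfile

# Sketch of the convention above: with compressed=True the decoy is
# bzip2-compressed and ".bz2" is appended to the original filename.
record = b"ATOM      1  N   ALA A   1"  # stand-in for PDB file contents
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "decoy_0.pdb") + ".bz2"
    with open(path, "wb") as handle:
        handle.write(bz2.compress(record))
    with open(path, "rb") as handle:
        assert bz2.decompress(handle.read()) == record
    assert path.endswith(".pdb.bz2")
```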

compression: A str object of ‘xz’, ‘zlib’ or ‘bz2’, or a bool or NoneType object representing the internal compression library for pickled PackedPose objects and user-defined PyRosetta protocol kwargs objects. The default of True uses ‘xz’ for serialization if it’s installed, otherwise uses ‘zlib’ for serialization. Default: True

system_info: A dict or NoneType object specifying the system information required to reproduce the simulation. If None is provided, then PyRosettaCluster automatically detects the platform and returns this attribute as a dictionary {‘sys.platform’: sys.platform} (for example, {‘sys.platform’: ‘linux’}). If a dict is provided, then validate that the ‘sys.platform’ key has a value equal to the current sys.platform, and log a warning message if not. Additional system information such as Amazon Machine Image (AMI) identifier and compute fleet instance type identifier may be stored in this dictionary, but is not validated. This information is stored in the simulation records for accounting. Default: None

pyrosetta_build: A str or NoneType object specifying the PyRosetta build as output by pyrosetta._version_string(). If None is provided, then PyRosettaCluster automatically detects the PyRosetta build and sets this attribute as the str. If a non-empty str is provided, then validate that the input PyRosetta build is equal to the active PyRosetta build, and raise an error if not. This ensures that reproduction simulations use a PyRosetta build identical to that of the original simulation. To bypass PyRosetta build validation with a warning message, an empty string (‘’) may be provided (but this does not ensure reproducibility). Default: None

sha1: A str or NoneType object specifying the git SHA1 hash string of the particular git commit being simulated. If a non-empty str object is provided, then it is validated to match the SHA1 hash string of the current HEAD, and then it is added to the simulation record for accounting. If an empty string is provided, then ensure that everything in the working directory is committed to the repository. If None is provided, then bypass SHA1 hash string validation and set this attribute to an empty string. Default: “”

project_name: A str object specifying the project name of this simulation. This option just adds the user-provided project_name to the scorefile for accounting. Default: datetime.now().strftime(“%Y.%m.%d.%H.%M.%S.%f”) if not specified, or “PyRosettaCluster” if None is provided
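The default timestamp format can be generated and checked with the standard library:

```python
from datetime import datetime

# The documented default project_name is a microsecond-resolution timestamp,
# e.g. "2025.06.01.12.30.45.123456".
project_name = datetime.now().strftime("%Y.%m.%d.%H.%M.%S.%f")

# The string round-trips through strptime, confirming the format.
assert datetime.strptime(project_name, "%Y.%m.%d.%H.%M.%S.%f") is not None
assert len(project_name.split(".")) == 7
```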

simulation_name: A str object specifying the name of this simulation. This option just adds the user-provided simulation_name to the scorefile for accounting. Default: project_name if not specified, or “PyRosettaCluster” if None is provided

environment: A NoneType or str object specifying the active conda environment YML file string. If a NoneType object is provided, then generate a YML file string for the active conda environment and save it to the full simulation record. If a non-empty str object is provided, then validate it against the active conda environment YML file string and save it to the full simulation record. This ensures that reproduction simulations use a conda environment identical to that of the original simulation. To bypass conda environment validation with a warning message, an empty string (‘’) may be provided (but this does not ensure reproducibility). Default: None

output_path: A str object specifying the full path of the output directory (to be created if it doesn’t exist) where the output results will be saved to disk. Default: “./outputs”

output_init_file: A str object specifying the output “.init” file path that caches the ‘input_packed_pose’ keyword argument parameter upon PyRosettaCluster instantiation (not including any output decoys). This file is optionally used by the pyrosetta.distributed.cluster.export_init_file() function to export PyRosetta initialization files with output decoys after the simulation completes (see the ‘output_decoy_types’ keyword argument). If a NoneType object or an empty str object (‘’) is provided, or dry_run=True, then writing an output “.init” file upon PyRosettaCluster instantiation is skipped; in that case, it is recommended to run pyrosetta.dump_init_file() before or after the simulation. If compressed=True, then the output file is further compressed by bzip2, and “.bz2” is appended to the filename. Default: output_path/`project_name`_`simulation_name`_pyrosetta.init

output_decoy_types: An iterable of str objects representing the output decoy filetypes to save during the simulation. Available options are: “.pdb” for PDB files; “.pkl_pose” for pickled Pose files; “.b64_pose” for base64-encoded pickled Pose files; and “.init” for PyRosetta initialization files, each caching the host node PyRosetta initialization options (and input files, if any), the ‘input_packed_pose’ keyword argument parameter (if any), and an output decoy. Because each “.init” file contains a copy of the PyRosetta initialization input files and the input PackedPose object, unless these objects are relatively small in size or relatively few output decoys are expected, it is recommended to omit “.init” here and instead run pyrosetta.distributed.cluster.export_init_file() on only the decoys of interest after the simulation completes. If compressed=True, then each decoy output file is further compressed by bzip2, and “.bz2” is appended to the filename. Default: [“.pdb”,]

output_scorefile_types: An iterable of str objects representing the output scorefile filetypes to save during the simulation. Available options are: “.json” for a JSON-encoded scorefile, and any filename extensions accepted by pandas.DataFrame().to_pickle(compression=”infer”) (including “.gz”, “.bz2”, and “.xz”) for pickled pandas.DataFrame objects of scorefile data that can later be analyzed using pyrosetta.distributed.cluster.io.secure_read_pickle(compression=”infer”). Note that in order to save pickled pandas.DataFrame objects, please ensure that pyrosetta.secure_unpickle.add_secure_package(“pandas”) has first been run. Default: [“.json”,]

scorefile_name: A str object specifying the name of the output JSON-formatted scorefile, which must end in “.json”. The scorefile location is always output_path/scorefile_name. If “.json” is not in the ‘output_scorefile_types’ keyword argument parameter, the JSON-formatted scorefile will not be output, but other scorefile types will get the same filename before the “.json” extension. Default: “scores.json”
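A hypothetical helper illustrating the naming rule; the exact filename logic is an assumption based on the description above:

```python
import os

def scorefile_path(output_path, scorefile_name, filetype):
    """Hypothetical sketch: non-JSON scorefile types reuse the filename
    before the ".json" extension, with their own extension appended."""
    stem, ext = os.path.splitext(scorefile_name)
    assert ext == ".json", "scorefile_name must end in '.json'"
    return os.path.join(output_path, stem + filetype)

assert scorefile_path("./outputs", "scores.json", ".json") == "./outputs/scores.json"
assert scorefile_path("./outputs", "scores.json", ".xz") == "./outputs/scores.xz"
```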

simulation_records_in_scorefile: A bool object specifying whether or not to write full simulation records to the scorefile. If True, then write full simulation records to the scorefile. This results in some redundant information on each line, allowing downstream reproduction of a decoy from the scorefile, but a larger scorefile. If False, then write curtailed simulation records to the scorefile. This results in minimally redundant information on each line, disallowing downstream reproduction of a decoy from the scorefile, but a smaller scorefile. If False, also write the active conda environment to a YML file in ‘output_path’. Full simulation records are always written to the output ‘.pdb’ or ‘.pdb.bz2’ file(s), which can be used to reproduce any decoy without the scorefile. Default: False
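Assuming the JSON scorefile is written one record per line (one decoy per line), it can be loaded without pandas; the "output_file" and "scores" keys below are illustrative assumptions, not the exact record schema:

```python
import io
import json

# Hypothetical two-line scorefile standing in for output_path/scores.json.
scorefile = io.StringIO(
    '{"output_file": "decoy_0.pdb.bz2", "scores": {"total_score": -310.2}}\n'
    '{"output_file": "decoy_1.pdb.bz2", "scores": {"total_score": -295.7}}\n'
)
records = [json.loads(line) for line in scorefile]
assert len(records) == 2
assert records[0]["scores"]["total_score"] == -310.2
```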

decoy_dir_name: A str object specifying the directory name where the output decoys will be saved. The directory location is always output_path/decoy_dir_name. Default: “decoys”

logs_dir_name: A str object specifying the directory name where the output log files will be saved. The directory location is always output_path/logs_dir_name. Default: “logs”

logging_level: A str object specifying the logging level of python tracer output to write to the log file: one of “NOTSET”, “DEBUG”, “INFO”, “WARNING”, “ERROR”, or “CRITICAL”. The output log file is always written to output_path/logs_dir_name/simulation_name.log on disk. Default: “INFO”

logging_address: A str object specifying the socket endpoint for sending and receiving log messages across a network, so that log messages from user-provided PyRosetta protocols may be written to a single log file on the host node. The str object must take the format ‘host:port’, where ‘host’ is an IP address, ‘localhost’, or a Domain Name System (DNS)-accessible domain name, and ‘port’ is a non-negative integer. If ‘port’ is ‘0’, then the next free port is selected. Default: ‘localhost:0’ if scheduler=None or if either the client or clients keyword argument parameters specify instances of dask.distributed.LocalCluster, otherwise ‘0.0.0.0:0’
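The ‘host:port’ handling can be sketched with the standard library; resolve_logging_address is a hypothetical helper, and binding a socket to port 0 emulates "select the next free port":

```python
import socket

def resolve_logging_address(address):
    """Hypothetical sketch of the 'host:port' form described above;
    port 0 is replaced by a free port chosen by the operating system."""
    host, port_str = address.rsplit(":", 1)
    port = int(port_str)
    if port == 0:
        with socket.socket() as sock:
            # Binding to port 0 asks the OS for the next free port.
            sock.bind(("" if host == "0.0.0.0" else host, 0))
            port = sock.getsockname()[1]
    return host, port

host, port = resolve_logging_address("localhost:0")
assert host == "localhost" and port > 0
```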

ignore_errors: A bool object specifying whether PyRosettaCluster ignores errors raised in the user-provided PyRosetta protocols. This comes in handy when well-defined errors are sparse and sporadic (such as rare Segmentation Faults), and the user would like PyRosettaCluster to run without raising them. Default: False

timeout: A float or int object specifying how many seconds to wait between PyRosettaCluster checking in on the running user-provided PyRosetta protocols. If each user-provided PyRosetta protocol is expected to run quickly, then 0.1 seconds seems reasonable. If each user-provided PyRosetta protocol is expected to run slowly, then >1 second seems reasonable. Default: 0.5

max_delay_time: A float or int object specifying the maximum number of seconds to sleep before returning the result(s) from each user-provided PyRosetta protocol back to the client. If a dask worker returns the result(s) from a user-provided PyRosetta protocol too quickly, the dask scheduler needs to first register that the task is processing before it completes. In practice, in each user-provided PyRosetta protocol the runtime is subtracted from max_delay_time, and the dask worker sleeps for the remainder of the time, if any, before returning the result(s). It’s recommended to set this option to at least 1 second, but longer times may be used as a safety throttle in cases of overwhelmed dask scheduler processes. Default: 3.0

filter_results: A bool object specifying whether or not to filter out empty PackedPose objects between user-provided PyRosetta protocols. When a protocol returns or yields NoneType, PyRosettaCluster converts it to an empty PackedPose object that gets passed to the next protocol. If True, then filter out any empty PackedPose objects where there are no residues in the conformation as given by Pose.empty(); if False, then continue to pass empty PackedPose objects to the next protocol. This is used for filtering out decoys mid-trajectory through user-provided PyRosetta protocols if protocols return or yield any None, empty Pose, or empty PackedPose objects. Default: True

save_all: A bool object specifying whether or not to save all of the returned or yielded Pose and PackedPose objects from all user-provided PyRosetta protocols. This option may be used for checkpointing trajectories. To save arbitrary poses to disk, from within any user-provided PyRosetta protocol: `pose.dump_pdb(os.path.join(kwargs[“PyRosettaCluster_output_path”], “checkpoint.pdb”))` Default: False

dry_run: A bool object specifying whether or not to save ‘.pdb’ files to disk. If True, then do not write ‘.pdb’ or ‘.pdb.bz2’ files to disk. Default: False

cooldown_time: A float or int object specifying how many seconds to sleep after the simulation is complete to allow loggers to flush. For very slow network filesystems, 2.0 or more seconds may be reasonable. Default: 0.5

norm_task_options: A bool object specifying whether or not to normalize the task ‘options’ and ‘extra_options’ values after PyRosetta initialization on the remote compute cluster. If True, then this enables more facile simulation reproduction by the use of the ProtocolSettingsMetric SimpleMetric to normalize the PyRosetta initialization options, and by relativization of any input files and directory paths to the current working directory from which the task is running. Default: True

author: An optional str object specifying the author(s) of the simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file. Default: “”

email: An optional str object specifying the email address(es) of the author(s) of the simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file. Default: “”

license: An optional str object specifying the license of the output data of the simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file (e.g., “ODC-ODbL”, “CC BY-ND”, “CDLA Permissive-2.0”, etc.). Default: “”

Returns:

A PyRosettaCluster instance.

class pyrosetta.distributed.cluster.core.PyRosettaCluster(*, tasks: Any = [{}], nstruct=1, input_packed_pose: Any = None, seeds: Optional[Any] = None, decoy_ids: Optional[Any] = None, client: Optional[Client] = None, clients: Optional[List[Client]] = None, scheduler: str = None, cores=1, processes=1, memory='4g', scratch_dir: Any = None, min_workers=1, max_workers=_Nothing.NOTHING, dashboard_address=':8787', project_name='2025.10.08.20.40.44.474398', simulation_name=_Nothing.NOTHING, output_path='./outputs', output_decoy_types: Any = None, output_scorefile_types: Any = None, scorefile_name='scores.json', simulation_records_in_scorefile=False, decoy_dir_name='decoys', logs_dir_name='logs', logging_level='INFO', logging_address: str = _Nothing.NOTHING, compressed=True, compression: Optional[Union[str, bool]] = True, sha1: Any = '', ignore_errors=False, timeout=0.5, max_delay_time=3.0, filter_results: Any = None, save_all=False, dry_run=False, norm_task_options: Any = None, cooldown_time=0.5, system_info: Any = None, pyrosetta_build: Any = None, environment: Any = None, author=None, email=None, license=None, output_init_file=_Nothing.NOTHING)

Bases: IO[G], LoggingSupport[G], SchedulerManager[G], TaskBase[G]

PyRosettaCluster is a class for reproducible, high-throughput job distribution of user-defined PyRosetta protocols efficiently parallelized on the user’s local computer, high-performance computing (HPC) cluster, or elastic cloud computing infrastructure with available compute resources.

Args:
tasks: A list of dict objects, a callable or called function returning

a list of dict objects, or a callable or called generator yielding a list of dict objects. Each dictionary object element of the list is accessible via kwargs in the user-defined PyRosetta protocols. In order to initialize PyRosetta with user-defined PyRosetta command line options at the start of each user-defined PyRosetta protocol, either extra_options and/or options must be a key of each dictionary object, where the value is a str, tuple, list, set, or dict of PyRosetta command line options. Default: [{}]

input_packed_pose: Optional input PackedPose object that is accessible via

the first argument of the first user-defined PyRosetta protocol. Default: None

seeds: A list of int objects specifying the random number generator seeds

to use for each user-defined PyRosetta protocol. The number of seeds provided must be equal to the number of user-defined input PyRosetta protocols. Seeds are used in the same order that the user-defined PyRosetta protocols are executed. Default: None

decoy_ids: A list of int objects specifying the decoy numbers to keep after

executing user-defined PyRosetta protocols. User-provided PyRosetta protocols may return a list of Pose and/or PackedPose objects, or yield multiple Pose and/or PackedPose objects. To reproduce a particular decoy generated via the chain of user-provided PyRosetta protocols, the decoy number to keep for each protocol may be specified, where other decoys are discarded. Decoy numbers use zero-based indexing, so 0 is the first decoy generated from a particular PyRosetta protocol. The number of decoy_ids provided must be equal to the number of user-defined input PyRosetta protocols, so that one decoy is saved for each user-defined PyRosetta protocol. Decoy ids are applied in the same order that the user-defined PyRosetta protocols are executed. Default: None

client: An initialized dask distributed.client.Client object to be used as

the dask client interface to the local or remote compute cluster. If None, then PyRosettaCluster initializes its own dask client based on the PyRosettaCluster(scheduler=…) class attribute. Deprecated by the PyRosettaCluster(clients=…) class attribute, but supported for legacy purposes. Either or both of the client or clients attribute parameters must be None. Default: None

clients: A list or tuple object of initialized dask distributed.client.Client

objects to be used as the dask client interface(s) to the local or remote compute cluster(s). If None, then PyRosettaCluster initializes its own dask client based on the PyRosettaCluster(scheduler=…) class attribute. Optionally used in combination with the PyRosettaCluster().distribute(clients_indices=…) method. Either or both of the client or clients attribute parameters must be None. See the PyRosettaCluster().distribute() method docstring for usage examples. Default: None

scheduler: A str of either “sge” or “slurm”, or None. If “sge”, then

PyRosettaCluster schedules jobs using SGECluster with dask-jobqueue. If “slurm”, then PyRosettaCluster schedules jobs using SLURMCluster with dask-jobqueue. If None, then PyRosettaCluster schedules jobs using LocalCluster with dask.distributed. If PyRosettaCluster(client=…) or PyRosettaCluster(clients=…) is provided, then PyRosettaCluster(scheduler=…) is ignored. Default: None

cores: An int object specifying the total number of cores per job, which

is input to the dask_jobqueue.SLURMCluster(cores=…) argument or the dask_jobqueue.SGECluster(cores=…) argument. Default: 1

processes: An int object specifying the total number of processes per job,

which is input to the dask_jobqueue.SLURMCluster(processes=…) argument or the dask_jobqueue.SGECluster(processes=…) argument. This cuts the job up into this many processes. Default: 1

memory: A str object specifying the total amount of memory per job, which

is input to the dask_jobqueue.SLURMCluster(memory=…) argument or the dask_jobqueue.SGECluster(memory=…) argument. Default: “4g”

scratch_dir: A str object specifying the path to a scratch directory where

dask litter may go. Default: “/temp” if it exists, otherwise the current working directory

min_workers: An int object specifying the minimum number of workers to

which to adapt during parallelization of user-provided PyRosetta protocols. Default: 1

max_workers: An int object specifying the maximum number of workers to

which to adapt during parallelization of user-provided PyRosetta protocols. Default: 1000 if the initial number of tasks is <1000, else use the

the initial number of tasks

dashboard_address: A str object specifying the port over which the dask

dashboard is forwarded. Particularly useful for diagnosing PyRosettaCluster performance in real-time. Default=”:8787”

nstruct: An int object specifying the number of repeats of the first

user-provided PyRosetta protocol. The user can control the number of repeats of subsequent user-provided PyRosetta protocols via returning multiple clones of the output pose(s) from a user-provided PyRosetta protocol run earlier, or cloning the input pose(s) multiple times in a user-provided PyRosetta protocol run later. Default: 1

compressed: A bool object specifying whether or not to compress the output

“.pdb”, “.pkl_pose”, “.b64_pose”, and “.init” files with bzip2, resulting in appending “.bz2” to decoy output files and PyRosetta initialization files. Also see the ‘output_decoy_types’ and ‘output_init_file’ keyword arguments. Default: True

compression: A str object of ‘xz’, ‘zlib’ or ‘bz2’, or a bool or NoneType

object representing the internal compression library for pickled PackedPose objects and user-defined PyRosetta protocol kwargs objects. The default of True uses ‘xz’ for serialization if it’s installed, otherwise uses ‘zlib’ for serialization. Default: True

system_info: A dict or NoneType object specifying the system information

required to reproduce the simulation. If None is provided, then PyRosettaCluster automatically detects the platform and returns this attribute as a dictionary {‘sys.platform’: sys.platform} (for example, {‘sys.platform’: ‘linux’}). If a dict is provided, then validate that the ‘sys.platform’ key has a value equal to the current sys.platform, and log a warning message if not. Additional system information such as Amazon Machine Image (AMI) identifier and compute fleet instance type identifier may be stored in this dictionary, but is not validated. This information is stored in the simulation records for accounting. Default: None

pyrosetta_build: A str or NoneType object specifying the PyRosetta build as

output by pyrosetta._version_string(). If None is provided, then PyRosettaCluster automatically detects the PyRosetta build and sets this attribute as the str. If a non-empty str is provided, then validate that the input PyRosetta build is equal to the active PyRosetta build, and raise an error if not. This ensures that reproduction simulations use an identical PyRosetta build from the original simulation. To bypass PyRosetta build validation with a warning message, an empty string (‘’) may be provided (but does not ensure reproducibility). Default: None

sha1: A str or NoneType object specifying the git SHA1 hash string of the

particular git commit being simulated. If a non-empty str object is provided, then it is validated to match the SHA1 hash string of the current HEAD, and then it is added to the simulation record for accounting. If an empty string is provided, then ensure that everything in the working directory is committed to the repository. If None is provided, then bypass SHA1 hash string validation and set this attribute to an empty string. Default: “”

project_name: A str object specifying the project name of this simulation.

This option just adds the user-provided project_name to the scorefile for accounting. Default: datetime.now().strftime(“%Y.%m.%d.%H.%M.%S.%f”) if not specified,

else “PyRosettaCluster” if None

simulation_name: A str object specifying the name of this simulation.

This option just adds the user-provided simulation_name to the scorefile for accounting. Default: project_name if not specified, else “PyRosettaCluster” if None

environment: A NoneType or str object specifying the active conda environment

YML file string. If a NoneType object is provided, then generate a YML file string for the active conda environment and save it to the full simulation record. If a non-empty str object is provided, then validate it against the active conda environment YML file string and save it to the full simulation record. This ensures that reproduction simulations use an identical conda environment from the original simulation. To bypass conda environment validation with a warning message, an empty string (‘’) may be provided (but does not ensure reproducibility). Default: None

output_path: A str object specifying the full path of the output directory

(to be created if it doesn’t exist) where the output results will be saved to disk. Default: “./outputs”

output_init_file: A str object specifying the output “.init” file path that caches

the ‘input_packed_pose’ keyword argument parameter upon PyRosettaCluster instantiation, and not including any output decoys, which is optionally used for exporting PyRosetta initialization files with output decoys by the pyrosetta.distributed.cluster.export_init_file() function after the simulation completes (see the ‘output_decoy_types’ keyword argument). If a NoneType object (or an empty str object (‘’)) is provided, or dry_run=True, then skip writing an output “.init” file upon PyRosettaCluster instantiation. If skipped, it is recommended to run pyrosetta.dump_init_file() before or after the simulation. If compressed=True, then the output file is further compressed by bzip2, and “.bz2” is appended to the filename. Default: output_path/`project_name`_`simulation_name`_pyrosetta.init

output_decoy_types: An iterable of str objects representing the output decoy

filetypes to save during the simulation. Available options are: “.pdb” for PDB files; “.pkl_pose” for pickled Pose files; “.b64_pose” for base64-encoded pickled Pose files; and “.init” for PyRosetta initialization files, each caching the host node PyRosetta initialization options (and input files, if any), the ‘input_packed_pose’ keyword argument parameter (if any) and an output decoy. Because each “.init” file contains a copy of the PyRosetta initialization input files and input PackedPose object, unless these objects are relatively small in size or there are relatively few expected output decoys, then it is recommended to run pyrosetta.distributed.cluster.export_init_file() on only decoys of interest after the simulation completes without specifying “.init”. If compressed=True, then each decoy output file is further compressed by bzip2, and “.bz2” is appended to the filename. Default: [“.pdb”,]

output_scorefile_types: An iterable of str objects representing the output scorefile

filetypes to save during the simulation. Available options are: “.json” for a JSON-encoded scorefile, and any filename extensions accepted by pandas.DataFrame().to_pickle(compression=”infer”) (including “.gz”, “.bz2”, and “.xz”) for pickled pandas.DataFrame objects of scorefile data that can later be analyzed using pyrosetta.distributed.cluster.io.secure_read_pickle(compression=”infer”). Note that in order to save pickled pandas.DataFrame objects, please ensure that pyrosetta.secure_unpickle.add_secure_package(“pandas”) has been first run. Default: [“.json”,]

scorefile_name: A str object specifying the name of the output JSON-formatted

scorefile, which must end in “.json”. The scorefile location is always output_path/scorefile_name. If “.json” is not in the ‘output_scorefile_types’ keyword argument parameter, the JSON-formatted scorefile will not be output, but other scorefile types will get the same filename before the “.json” extension. Default: “scores.json”

simulation_records_in_scorefile: A bool object specifying whether or not to

write full simulation records to the scorefile. If True, then write full simulation records to the scorefile. This results in some redundant information on each line, allowing downstream reproduction of a decoy from the scorefile, but a larger scorefile. If False, then write curtailed simulation records to the scorefile. This results in minimally redundant information on each line, disallowing downstream reproduction of a decoy from the scorefile, but a smaller scorefile. If False, also write the active conda environment to a YML file in ‘output_path’. Full simulation records are always written to the output ‘.pdb’ or ‘.pdb.bz2’ file(s), which can be used to reproduce any decoy without the scorefile. Default: False

decoy_dir_name: A str object specifying the directory name where the

output decoys will be saved. The directory location is always output_path/decoy_dir_name. Default: “decoys”

logs_dir_name: A str object specifying the directory name where the

output log files will be saved. The directory location is always output_path/logs_dir_name. Default: “logs”

logging_level: A str object specifying the logging level of python tracer

output to write to the log file of either “NOTSET”, “DEBUG”, “INFO”, “WARNING”, “ERROR”, or “CRITICAL”. The output log file is always written to output_path/logs_dir_name/simulation_name.log on disk. Default: “INFO”

logging_address: A str object specifying the socket endpoint for sending and receiving

log messages across a network, so log messages from user-provided PyRosetta protocols may be written to a single log file on the host node. The str object must take the format ‘host:port’, where ‘host’ is an IP address, ‘localhost’, or a Domain Name System (DNS)-accessible domain name, and ‘port’ is a non-negative integer. If ‘port’ is ‘0’, then the next free port is selected. Default: ‘localhost:0’ if scheduler=None or if either the ‘client’ or ‘clients’ keyword argument parameters specify instances of dask.distributed.LocalCluster; otherwise ‘0.0.0.0:0’
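The ‘host:port’ convention above, including selection of the next free port when ‘port’ is ‘0’, can be sketched with the standard library (the helper name is hypothetical, not part of the PyRosettaCluster API):

```python
import socket

def resolve_logging_address(address):
    """Split a 'host:port' string; if port is 0, bind a temporary socket
    so the operating system assigns the next free port."""
    host, _, port = address.rpartition(":")
    port = int(port)
    if port == 0:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, 0))  # port 0 asks the OS for an unused port
            port = s.getsockname()[1]
    return host, port

host, port = resolve_logging_address("localhost:0")
assert host == "localhost" and 0 < port < 65536
```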

ignore_errors: A bool object specifying for PyRosettaCluster to ignore errors

raised in the user-provided PyRosetta protocols. This comes in handy when well-defined errors are sparse and sporadic (such as rare Segmentation Faults), and the user would like PyRosettaCluster to run without raising the errors. Default: False

timeout: A float or int object specifying how many seconds to wait between

successive PyRosettaCluster checks on the running user-provided PyRosetta protocols. If each user-provided PyRosetta protocol is expected to run quickly, then 0.1 seconds is reasonable. If each is expected to run slowly, then >1 second is reasonable. Default: 0.5

max_delay_time: A float or int object specifying the maximum number of seconds to

sleep before returning the result(s) from each user-provided PyRosetta protocol back to the client. If a dask worker returns the result(s) from a user-provided PyRosetta protocol too quickly, the dask scheduler needs to first register that the task is processing before it completes. In practice, in each user-provided PyRosetta protocol the runtime is subtracted from max_delay_time, and the dask worker sleeps for the remainder of the time, if any, before returning the result(s). It’s recommended to set this option to at least 1 second, but longer times may be used as a safety throttle in cases of overwhelmed dask scheduler processes. Default: 3.0
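The remainder-sleep behavior described above can be sketched as follows (the function and argument names are illustrative, not the actual implementation):

```python
import time

def throttled_return(result, runtime, max_delay_time=3.0):
    """Sleep for whatever remains of max_delay_time after the protocol's
    runtime, if any, then return the result."""
    remainder = max_delay_time - runtime
    if remainder > 0:
        time.sleep(remainder)
    return result

# A protocol that finished instantly with max_delay_time=0.05 sleeps ~0.05 s:
start = time.monotonic()
result = throttled_return("decoy_0", runtime=0.0, max_delay_time=0.05)
elapsed = time.monotonic() - start
assert result == "decoy_0" and elapsed >= 0.04
```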

filter_results: A bool object specifying whether or not to filter out empty

PackedPose objects between user-provided PyRosetta protocols. When a protocol returns or yields NoneType, PyRosettaCluster converts it to an empty PackedPose object that gets passed to the next protocol. If True, then filter out any empty PackedPose objects where there are no residues in the conformation as given by Pose.empty(), otherwise if False then continue to pass empty PackedPose objects to the next protocol. This is used for filtering out decoys mid-trajectory through user-provided PyRosetta protocols if protocols return or yield any None, empty Pose, or empty PackedPose objects. Default: True

save_all: A bool object specifying whether or not to save all of the returned

or yielded Pose and PackedPose objects from all user-provided PyRosetta protocols. This option may be used for checkpointing trajectories. To save arbitrary poses to disk from within any user-provided PyRosetta protocol:

pose.dump_pdb(
    os.path.join(kwargs["PyRosettaCluster_output_path"], "checkpoint.pdb")
)

Default: False

dry_run: A bool object specifying whether or not to save ‘.pdb’ files to

disk. If True, then do not write ‘.pdb’ or ‘.pdb.bz2’ files to disk. Default: False

cooldown_time: A float or int object specifying how many seconds to sleep after the

simulation is complete to allow loggers to flush. For very slow network filesystems, 2.0 or more seconds may be reasonable. Default: 0.5

norm_task_options: A bool object specifying whether or not to normalize the task

‘options’ and ‘extra_options’ values after PyRosetta initialization on the remote compute cluster. If True, then simulation reproduction is made easier by using the ProtocolSettingsMetric SimpleMetric to normalize the PyRosetta initialization options, and by relativizing any input file and directory paths to the current working directory from which the task is running. Default: True
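The path-relativization idea can be illustrated with the standard library (a sketch of the concept; the helper name is hypothetical and this is not the actual normalization code):

```python
import os

def relativize_input_path(path, cwd):
    """Return path expressed relative to the task's working directory."""
    return os.path.relpath(path, start=cwd)

assert relativize_input_path("/data/inputs/pose.pdb", "/data") == os.path.join("inputs", "pose.pdb")
```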

author: An optional str object specifying the author(s) of the simulation that is

written to the full simulation records and the PyRosetta initialization ‘.init’ file. Default: “”

email: An optional str object specifying the email address(es) of the author(s) of

the simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file. Default: “”

license: An optional str object specifying the license of the output data of the

simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file (e.g., “ODC-ODbL”, “CC BY-ND”, “CDLA Permissive-2.0”, etc.). Default: “”

Returns:

A PyRosettaCluster instance.

tasks
nstruct
tasks_size
input_packed_pose
seeds
decoy_ids
client
clients
scheduler
cores
processes
memory
scratch_dir
adapt_threshold
min_workers
max_workers
dashboard_address
project_name
simulation_name
output_path
output_decoy_types
output_scorefile_types
scorefile_name
scorefile_path
simulation_records_in_scorefile
decoy_dir_name
decoy_path
logs_dir_name
logs_path
logging_level
logging_file
logging_address
compressed
compression
sha1
ignore_errors
timeout
max_delay_time
filter_results
save_all
dry_run
norm_task_options
yield_results
cooldown_time
protocols_key
system_info
pyrosetta_build
environment
author
email
license
output_init_file
environment_file
pyrosetta_init_args
_create_future(client: Client, protocol_name: str, compressed_protocol: bytes, compressed_packed_pose: bytes, compressed_kwargs: bytes, pyrosetta_init_kwargs: Dict[str, Any], extra_args: Dict[str, Any], passkey: bytes, resource: Optional[Dict[Any, Any]]) → Future

Scatter data and return submitted ‘user_spawn_thread’ future.

_run(*args: Any, protocols: Any = None, clients_indices: Any = None, resources: Any = None) → Union[NoReturn, Generator[Tuple[PackedPose, Dict[Any, Any]], None, None]]

Run user-provided PyRosetta protocols on a local or remote compute cluster using the user-customized PyRosettaCluster instance. Either arguments or the ‘protocols’ keyword argument is required. If both are provided, then the ‘protocols’ keyword argument gets concatenated after the input arguments.

Examples:

PyRosettaCluster().distribute(protocol_1)
PyRosettaCluster().distribute(protocols=protocol_1)
PyRosettaCluster().distribute(protocol_1, protocol_2, protocol_3)
PyRosettaCluster().distribute(protocols=(protocol_1, protocol_2, protocol_3))
PyRosettaCluster().distribute(protocol_1, protocol_2, protocols=[protocol_3, protocol_4])

# Run protocol_1 on client_1,
# then protocol_2 on client_2,
# then protocol_3 on client_1,
# then protocol_4 on client_2:
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3, protocol_4],
    clients_indices=[0, 1, 0, 1],
)

# Run protocol_1 on client_2,
# then protocol_2 on client_3,
# then protocol_3 on client_1:
PyRosettaCluster(clients=[client_1, client_2, client_3]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    clients_indices=[1, 2, 0],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_1 with dask worker resource constraints "MEMORY=100e9",
# then protocol_3 on client_1 without dask worker resource constraints:
PyRosettaCluster(client=client_1).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}, None],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_2 with dask worker resource constraints "MEMORY=100e9":
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2],
    clients_indices=[0, 1],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}],
)

Args:
*args: Optional instances of type types.GeneratorType or types.FunctionType,

in the order of protocols to be executed.

protocols: An optional iterable of extra callable PyRosetta protocols,

i.e. an iterable of objects of types.GeneratorType and/or types.FunctionType types; or a single instance of type types.GeneratorType or types.FunctionType. Default: None

clients_indices: An optional list or tuple object of int objects, where each int object represents

a zero-based index corresponding to the initialized dask distributed.client.Client object(s) passed to the PyRosettaCluster(clients=…) class attribute. If not None, then the length of the clients_indices object must equal the number of protocols passed to the PyRosettaCluster().distribute method. Default: None

resources: An optional list or tuple object of dict objects, where each dict object represents

an abstract, arbitrary resource to constrain which dask workers run the user-defined PyRosetta protocols. If None, then do not impose resource constraints on any protocols. If not None, then the length of the resources object must equal the number of protocols passed to the PyRosettaCluster().distribute method, such that each dict specifies the resource constraints for the protocol at the corresponding index. Note that this feature is only useful when one passes in their own instantiated client(s) with dask workers set up with various resource constraints. If no dask workers were instantiated to satisfy the specified resource constraints, the affected protocols will hang indefinitely, because the dask scheduler waits for workers matching those constraints before scheduling them. See https://distributed.dask.org/en/stable/resources.html for more information. Default: None
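For the resource constraints above to be satisfiable, workers must be started with matching resource tags. A hedged sketch of the dask worker CLI (the scheduler address is a placeholder; see the dask documentation linked above):

```shell
# Workers advertise abstract resources at startup; the scheduler only
# assigns a constrained protocol to a worker whose tags satisfy it.
dask-worker tcp://scheduler-address:8786 --resources "GPU=2"
dask-worker tcp://scheduler-address:8786 --resources "MEMORY=100e9"
```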

generate(*args: Any, protocols: Any = None, clients_indices: Any = None, resources: Any = None) → Union[NoReturn, Generator[Tuple[PackedPose, Dict[Any, Any]], None, None]]

Run user-provided PyRosetta protocols on a local or remote compute cluster using the user-customized PyRosettaCluster instance. Either arguments or the ‘protocols’ keyword argument is required. If both are provided, then the ‘protocols’ keyword argument gets concatenated after the input arguments.

Examples:

PyRosettaCluster().distribute(protocol_1)
PyRosettaCluster().distribute(protocols=protocol_1)
PyRosettaCluster().distribute(protocol_1, protocol_2, protocol_3)
PyRosettaCluster().distribute(protocols=(protocol_1, protocol_2, protocol_3))
PyRosettaCluster().distribute(protocol_1, protocol_2, protocols=[protocol_3, protocol_4])

# Run protocol_1 on client_1,
# then protocol_2 on client_2,
# then protocol_3 on client_1,
# then protocol_4 on client_2:
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3, protocol_4],
    clients_indices=[0, 1, 0, 1],
)

# Run protocol_1 on client_2,
# then protocol_2 on client_3,
# then protocol_3 on client_1:
PyRosettaCluster(clients=[client_1, client_2, client_3]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    clients_indices=[1, 2, 0],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_1 with dask worker resource constraints "MEMORY=100e9",
# then protocol_3 on client_1 without dask worker resource constraints:
PyRosettaCluster(client=client_1).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}, None],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_2 with dask worker resource constraints "MEMORY=100e9":
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2],
    clients_indices=[0, 1],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}],
)

Args:
*args: Optional instances of type types.GeneratorType or types.FunctionType,

in the order of protocols to be executed.

protocols: An optional iterable of extra callable PyRosetta protocols,

i.e. an iterable of objects of types.GeneratorType and/or types.FunctionType types; or a single instance of type types.GeneratorType or types.FunctionType. Default: None

clients_indices: An optional list or tuple object of int objects, where each int object represents

a zero-based index corresponding to the initialized dask distributed.client.Client object(s) passed to the PyRosettaCluster(clients=…) class attribute. If not None, then the length of the clients_indices object must equal the number of protocols passed to the PyRosettaCluster().distribute method. Default: None

resources: An optional list or tuple object of dict objects, where each dict object represents

an abstract, arbitrary resource to constrain which dask workers run the user-defined PyRosetta protocols. If None, then do not impose resource constraints on any protocols. If not None, then the length of the resources object must equal the number of protocols passed to the PyRosettaCluster().distribute method, such that each dict specifies the resource constraints for the protocol at the corresponding index. Note that this feature is only useful when one passes in their own instantiated client(s) with dask workers set up with various resource constraints. If no dask workers were instantiated to satisfy the specified resource constraints, the affected protocols will hang indefinitely, because the dask scheduler waits for workers matching those constraints before scheduling them. See https://distributed.dask.org/en/stable/resources.html for more information. Default: None

Extra information:

The PyRosettaCluster.generate method may be used for developing PyRosetta protocols on a local or remote compute cluster and optionally post-processing or visualizing output PackedPose objects in memory. Importantly, subsequent code run on the yielded results is not captured by PyRosettaCluster, and so use of this method does not ensure reproducibility of the simulation. Use the PyRosettaCluster.distribute method for reproducible simulations.

Each yielded result is a tuple object with a PackedPose object as the first element and a dict object as the second element. The PackedPose object represents a returned or yielded PackedPose (or Pose or NoneType) object from the most recently run user-provided PyRosetta protocol. The dict object represents the optionally returned or yielded user-defined PyRosetta protocol kwargs dictionary object from the same most recently run user-provided PyRosetta protocol (see ‘protocols’ argument). If PyRosettaCluster(save_all=True), results are yielded after each user-provided PyRosetta protocol, otherwise results are yielded after the final user-defined PyRosetta protocol. Results are yielded in the order in which they arrive back to the client(s) from the distributed cluster (which may differ from the order that tasks are submitted, due to tasks running asynchronously). If PyRosettaCluster(dry_run=True), results are still yielded but ‘.pdb’ or ‘.pdb.bz2’ files are not saved to disk. See https://docs.dask.org/en/latest/futures.html#distributed.as_completed for more information.

Extra examples:

# Iterate over results in real-time as they are yielded from the cluster:
for packed_pose, kwargs in PyRosettaCluster().generate(protocols):
    ...

# Iterate over submissions to the same client:
client = Client()
for packed_pose, kwargs in PyRosettaCluster(client=client).generate(protocols):
    # Post-process results on host node asynchronously from results generation
    prc = PyRosettaCluster(
        input_packed_pose=packed_pose,
        client=client,
        logs_dir_name=f"logs_{uuid.uuid4().hex}",  # Make sure to write new log files
    )
    for packed_pose, kwargs in prc.generate(other_protocols):
        ...

# Iterate over multiple clients:
client_1 = Client()
client_2 = Client()
for packed_pose, kwargs in PyRosettaCluster(client=client_1).generate(protocols):
    # Post-process results on host node asynchronously from results generation
    prc = PyRosettaCluster(
        input_packed_pose=packed_pose,
        client=client_2,
        logs_dir_name=f"logs_{uuid.uuid4().hex}",  # Make sure to write new log files
    )
    for packed_pose, kwargs in prc.generate(other_protocols):
        ...

# Using multiple dask.distributed.as_completed iterators on the host node creates additional overhead.
# If post-processing on the host node is not required between user-provided PyRosetta protocols,
# the preferred method is to distribute protocols within a single PyRosettaCluster().generate()
# method call using the clients_indices keyword argument:
prc_generate = PyRosettaCluster(clients=[client_1, client_2]).generate(
    protocols=[protocol_1, protocol_2],
    clients_indices=[0, 1],
    resources=[{"GPU": 1}, {"CPU": 1}],
)
for packed_pose, kwargs in prc_generate:
    # Post-process results on host node asynchronously from results generation
    ...

Yields:

(PackedPose, dict) tuples from the most recently run user-provided PyRosetta protocol if PyRosettaCluster(save_all=True) otherwise from the final user-defined PyRosetta protocol.

distribute(*args: Any, protocols: Any = None, clients_indices: Any = None, resources: Any = None) → Optional[NoReturn]

Run user-provided PyRosetta protocols on a local or remote compute cluster using the user-customized PyRosettaCluster instance. Either arguments or the ‘protocols’ keyword argument is required. If both are provided, then the ‘protocols’ keyword argument gets concatenated after the input arguments.

Examples:

PyRosettaCluster().distribute(protocol_1)
PyRosettaCluster().distribute(protocols=protocol_1)
PyRosettaCluster().distribute(protocol_1, protocol_2, protocol_3)
PyRosettaCluster().distribute(protocols=(protocol_1, protocol_2, protocol_3))
PyRosettaCluster().distribute(protocol_1, protocol_2, protocols=[protocol_3, protocol_4])

# Run protocol_1 on client_1,
# then protocol_2 on client_2,
# then protocol_3 on client_1,
# then protocol_4 on client_2:
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3, protocol_4],
    clients_indices=[0, 1, 0, 1],
)

# Run protocol_1 on client_2,
# then protocol_2 on client_3,
# then protocol_3 on client_1:
PyRosettaCluster(clients=[client_1, client_2, client_3]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    clients_indices=[1, 2, 0],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_1 with dask worker resource constraints "MEMORY=100e9",
# then protocol_3 on client_1 without dask worker resource constraints:
PyRosettaCluster(client=client_1).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}, None],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_2 with dask worker resource constraints "MEMORY=100e9":
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2],
    clients_indices=[0, 1],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}],
)

Args:
*args: Optional instances of type types.GeneratorType or types.FunctionType,

in the order of protocols to be executed.

protocols: An optional iterable of extra callable PyRosetta protocols,

i.e. an iterable of objects of types.GeneratorType and/or types.FunctionType types; or a single instance of type types.GeneratorType or types.FunctionType. Default: None

clients_indices: An optional list or tuple object of int objects, where each int object represents

a zero-based index corresponding to the initialized dask distributed.client.Client object(s) passed to the PyRosettaCluster(clients=…) class attribute. If not None, then the length of the clients_indices object must equal the number of protocols passed to the PyRosettaCluster().distribute method. Default: None

resources: An optional list or tuple object of dict objects, where each dict object represents

an abstract, arbitrary resource to constrain which dask workers run the user-defined PyRosetta protocols. If None, then do not impose resource constraints on any protocols. If not None, then the length of the resources object must equal the number of protocols passed to the PyRosettaCluster().distribute method, such that each dict specifies the resource constraints for the protocol at the corresponding index. Note that this feature is only useful when one passes in their own instantiated client(s) with dask workers set up with various resource constraints. If no dask workers were instantiated to satisfy the specified resource constraints, the affected protocols will hang indefinitely, because the dask scheduler waits for workers matching those constraints before scheduling them. See https://distributed.dask.org/en/stable/resources.html for more information. Default: None

Returns:

None

DATETIME_FORMAT: str = '%Y-%m-%d %H:%M:%S.%f'
REMARK_FORMAT: str = 'REMARK PyRosettaCluster: '
__init__(*, tasks: Any = [{}], nstruct=1, input_packed_pose: Any = None, seeds: Optional[Any] = None, decoy_ids: Optional[Any] = None, client: Optional[Client] = None, clients: Optional[List[Client]] = None, scheduler: str = None, cores=1, processes=1, memory='4g', scratch_dir: Any = None, min_workers=1, max_workers=_Nothing.NOTHING, dashboard_address=':8787', project_name='2025.10.08.20.40.44.474398', simulation_name=_Nothing.NOTHING, output_path='/home/benchmark/rosetta/source/build/PyRosetta/Linux-5.4.0-84-generic-x86_64-with-glibc2.27/clang-6.0.0/python-3.11/minsizerel.serialization.thread/documentation/outputs', output_decoy_types: Any = None, output_scorefile_types: Any = None, scorefile_name='scores.json', simulation_records_in_scorefile=False, decoy_dir_name='decoys', logs_dir_name='logs', logging_level='INFO', logging_address: str = _Nothing.NOTHING, compressed=True, compression: Optional[Union[str, bool]] = True, sha1: Any = '', ignore_errors=False, timeout=0.5, max_delay_time=3.0, filter_results: Any = None, save_all=False, dry_run=False, norm_task_options: Any = None, cooldown_time=0.5, system_info: Any = None, pyrosetta_build: Any = None, environment: Any = None, author=None, email=None, license=None, output_init_file=_Nothing.NOTHING) → None

Method generated by attrs for class PyRosettaCluster.

static _add_pose_comment(packed_pose: PackedPose, pdbfile_data: str) → PackedPose

Cache simulation data as a pose comment.

_close_logger() → None

Close the logger for the client instance.

_close_socket_listener(clients: Dict[int, Client]) → None

Close the logging socket listener.

_close_socket_logger_plugins(clients: Dict[int, Client]) → None

Purge cached logging socket addresses on all dask workers.

_cooldown() → None
_dump_init_file(filename: str, input_packed_pose: Optional[PackedPose] = None, output_packed_pose: Optional[PackedPose] = None, verbose: bool = True) → None

Dump compressed PyRosetta initialization input files and poses to the input filename.

static _dump_json(data: Dict[str, Any]) → str

Return JSON-serialized data.

static _filter_scores_dict(scores_dict: Dict[Any, Any]) → Dict[Any, Any]
_format_result(result: Union[Pose, PackedPose]) → Tuple[str, Dict[Any, Any], PackedPose]

Given a Pose or PackedPose object, return a tuple containing the pdb string and a scores dictionary.

_get_clients_index(clients_indices: List[int], protocols: List[Callable[[...], Any]]) → int

Return the clients index for the current protocol.

_get_cluster() → ClusterType

Given user input arguments, return the requested cluster instance.

_get_init_file_json(packed_pose: PackedPose) → str

Return a PyRosetta initialization file as a JSON-serialized string.

_get_instance_and_metadata(kwargs: Dict[Any, Any]) → Tuple[Dict[Any, Any], Dict[Any, Any]]

Get the current state of the PyRosettaCluster instance, and split the kwargs into the PyRosettaCluster instance kwargs and ancillary metadata.

_get_output_dir(decoy_dir: str) → str

Get the output directory in which to write files to disk.

_get_resource(resources: List[Dict[Any, Any]], protocols: List[Callable[[...], Any]]) → Optional[Dict[Any, Any]]

Return the resource for the current protocol.

_get_seed(protocols: Sized) → Optional[str]

Get the seed for the input user-provided PyRosetta protocol.

_get_task_state(protocols: List[Callable[[...], Any]]) → Tuple[List[Callable[[...], Any]], Callable[[...], Any], Optional[str]]

Given the current state of protocols, return a tuple of the updated state of protocols and the current protocol and seed.

_is_protocol = False
_maybe_adapt(adaptive: Optional[AdaptiveType]) → None

Adjust max_workers.

_maybe_teardown(clients: Dict[int, ClientType], cluster: Optional[ClusterType]) → None

Tear down the client and cluster.

_parse_results(results: Optional[Union[Iterable[Optional[Union[Pose, PackedPose, bytes]]], Pose, PackedPose]]) → Union[List[Tuple[str, Dict[Any, Any]]], NoReturn]

Format output results on the distributed worker. The input argument results can be a Pose, PackedPose, or None object, a list or tuple of Pose and/or PackedPose objects, or an empty list or tuple. Returns a list of tuples, each tuple containing the pdb string and a scores dictionary.

_process_kwargs(kwargs: Dict[Any, Any]) → Dict[Any, Any]

Remove the seed specification from ‘extra_options’ or ‘options’, and remove protocols_key from kwargs.

_register_socket_logger_plugin(clients: Dict[int, Client]) → None

Register SocketLoggerPlugin as a dask worker plugin on dask clients.

_save_results(results: Any, kwargs: Dict[Any, Any]) → None

Write results and kwargs to disk.

_setup_clients_cluster_adaptive() → Tuple[Dict[int, ClientType], Optional[ClusterType], Optional[AdaptiveType]]

Given user input arguments, return the requested client, cluster, and adaptive instance.

_setup_clients_dict() → Union[Dict[int, ClientType], NoReturn]
_setup_initial_kwargs(protocols: List[Callable[[...], Any]], seed: Optional[str], task: Dict[Any, Any]) → Tuple[bytes, Dict[str, Any]]

Set up the kwargs for the initial task.

_setup_kwargs(kwargs: Dict[Any, Any], clients_indices: List[int], resources: Optional[Dict[Any, Any]]) → Tuple[bytes, Dict[str, Any], Callable[[...], Any], int, Optional[Dict[Any, Any]]]

Set up the kwargs for the subsequent tasks.

_setup_logger() → None

Open the logger for the client instance.

_setup_protocols_protocol_seed(args: Tuple[Any, ...], protocols: Any, clients_indices: Any, resources: Any) → Tuple[List[Callable[[...], Any]], Callable[[...], Any], Optional[str], int, Optional[Dict[Any, Any]]]

Parse, validate, and set up the user-provided PyRosetta protocol(s).

_setup_pyrosetta_init_kwargs(kwargs: Dict[Any, Any]) → Dict[str, Any]
_setup_seed(kwargs: Dict[Any, Any], seed: Optional[str]) → Dict[Any, Any]

Set up the ‘options’ or ‘extra_options’ task kwargs with the -run:jran PyRosetta command line flag.

_setup_socket_listener(clients: Dict[int, Client]) → Tuple[Tuple[str, int], bytes]

Set up the logging socket listener.

_write_environment_file(filename: str) → None

Write the YML string to the input filename.

_write_init_file() → None

Maybe write the PyRosetta initialization file to disk.