core¶
PyRosettaCluster is a class for reproducible, high-throughput job distribution of user-defined PyRosetta protocols efficiently parallelized on the user’s local computer, high-performance computing (HPC) cluster, or elastic cloud computing infrastructure with available compute resources.
- Args:
- tasks: A list of dict objects, a callable or called function returning
a list of dict objects, or a callable or called generator yielding a list of dict objects. Each dictionary object element of the list is accessible via kwargs in the user-defined PyRosetta protocols. To initialize PyRosetta with user-defined PyRosetta command line options at the start of each user-defined PyRosetta protocol, ‘extra_options’ and/or ‘options’ must be a key of each dictionary object, where the value is a str, tuple, list, set, or dict of PyRosetta command line options. Default: [{}]
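For instance, tasks may be built with a plain Python generator (a sketch; the ‘sequence’ key is a hypothetical custom key that the user-defined protocols would read back via kwargs):

```python
# A sketch of a tasks generator. "extra_options" initializes PyRosetta on the
# worker for each task; "sequence" is a hypothetical custom key passed through
# to the user-defined protocols via kwargs.
def create_tasks():
    for sequence in ("ACDEFGHIKL", "MNPQRSTVWY"):
        yield {
            "extra_options": "-out:level 300",
            "sequence": sequence,
        }

tasks = list(create_tasks())  # a list of dict objects
```

The callable itself may also be passed directly, i.e. tasks=create_tasks.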
- input_packed_pose: Optional input PackedPose object that is accessible via
the first argument of the first user-defined PyRosetta protocol. Default: None
- seeds: A list of int objects specifying the random number generator seeds
to use for each user-defined PyRosetta protocol. The number of seeds provided must be equal to the number of user-defined input PyRosetta protocols. Seeds are used in the same order that the user-defined PyRosetta protocols are executed. Default: None
- decoy_ids: A list of int objects specifying the decoy numbers to keep after
executing user-defined PyRosetta protocols. User-provided PyRosetta protocols may return a list of Pose and/or PackedPose objects, or yield multiple Pose and/or PackedPose objects. To reproduce a particular decoy generated via the chain of user-provided PyRosetta protocols, the decoy number to keep for each protocol may be specified, where other decoys are discarded. Decoy numbers use zero-based indexing, so 0 is the first decoy generated from a particular PyRosetta protocol. The number of decoy_ids provided must be equal to the number of user-defined input PyRosetta protocols, so that one decoy is saved for each user-defined PyRosetta protocol. Decoy ids are applied in the same order that the user-defined PyRosetta protocols are executed. Default: None
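The zero-based decoy_ids bookkeeping can be illustrated in plain Python (a sketch of the selection semantics only; the decoy labels are hypothetical):

```python
# Suppose protocol 1 yields three decoys and protocol 2 yields two.
decoys_per_protocol = [["decoy_0", "decoy_1", "decoy_2"], ["decoy_0", "decoy_1"]]
decoy_ids = [2, 0]  # one entry per protocol, applied in execution order

# Keep the third decoy from protocol 1, then the first from protocol 2;
# all other decoys are discarded.
kept = [decoys[i] for decoys, i in zip(decoys_per_protocol, decoy_ids)]
```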
- client: An initialized dask distributed.client.Client object to be used as
the dask client interface to the local or remote compute cluster. If None, then PyRosettaCluster initializes its own dask client based on the PyRosettaCluster(scheduler=…) class attribute. Deprecated by the PyRosettaCluster(clients=…) class attribute, but supported for legacy purposes. Either or both of the client or clients attribute parameters must be None. Default: None
- clients: A list or tuple object of initialized dask distributed.client.Client
objects to be used as the dask client interface(s) to the local or remote compute cluster(s). If None, then PyRosettaCluster initializes its own dask client based on the PyRosettaCluster(scheduler=…) class attribute. Optionally used in combination with the PyRosettaCluster().distribute(clients_indices=…) method. Either or both of the client or clients attribute parameters must be None. See the PyRosettaCluster().distribute() method docstring for usage examples. Default: None
- scheduler: A str of either “sge” or “slurm”, or None. If “sge”, then
PyRosettaCluster schedules jobs using SGECluster with dask-jobqueue. If “slurm”, then PyRosettaCluster schedules jobs using SLURMCluster with dask-jobqueue. If None, then PyRosettaCluster schedules jobs using LocalCluster with dask.distributed. If PyRosettaCluster(client=…) or PyRosettaCluster(clients=…) is provided, then PyRosettaCluster(scheduler=…) is ignored. Default: None
- cores: An int object specifying the total number of cores per job, which
is input to the dask_jobqueue.SLURMCluster(cores=…) argument or the dask_jobqueue.SGECluster(cores=…) argument. Default: 1
- processes: An int object specifying the total number of processes per job,
which is input to the dask_jobqueue.SLURMCluster(processes=…) argument or the dask_jobqueue.SGECluster(processes=…) argument. This cuts the job up into this many processes. Default: 1
- memory: A str object specifying the total amount of memory per job, which
is input to the dask_jobqueue.SLURMCluster(memory=…) argument or the dask_jobqueue.SGECluster(memory=…) argument. Default: “4g”
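Taken together, a SLURM-backed configuration might be collected as constructor keyword arguments like the following (a sketch; the scratch path is hypothetical, and the commented line shows, without executing, how the dict would be unpacked into the constructor):

```python
# Hypothetical settings that PyRosettaCluster forwards to
# dask_jobqueue.SLURMCluster(...); adjust to your cluster.
slurm_kwargs = dict(
    scheduler="slurm",
    cores=4,        # total cores per SLURM job
    processes=4,    # split each job into this many worker processes
    memory="16g",   # total memory per job
    scratch_dir="/net/scratch",  # hypothetical scratch path for dask litter
)
# PyRosettaCluster(tasks=create_tasks, **slurm_kwargs)  # not executed here
```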
- scratch_dir: A str object specifying the path to a scratch directory where
dask litter may go. Default: “/temp” if it exists, otherwise the current working directory
- min_workers: An int object specifying the minimum number of workers to
which to adapt during parallelization of user-provided PyRosetta protocols. Default: 1
- max_workers: An int object specifying the maximum number of workers to
which to adapt during parallelization of user-provided PyRosetta protocols. Default: 1000 if the initial number of tasks is <1000, otherwise the initial number of tasks
- dashboard_address: A str object specifying the port over which the dask
dashboard is forwarded. Particularly useful for diagnosing PyRosettaCluster performance in real time. Default: “:8787”
- nstruct: An int object specifying the number of repeats of the first
user-provided PyRosetta protocol. The user can control the number of repeats of subsequent user-provided PyRosetta protocols via returning multiple clones of the output pose(s) from a user-provided PyRosetta protocol run earlier, or cloning the input pose(s) multiple times in a user-provided PyRosetta protocol run later. Default: 1
- compressed: A bool object specifying whether or not to compress the output
“.pdb”, “.pkl_pose”, “.b64_pose”, and “.init” files with bzip2, resulting in appending “.bz2” to decoy output files and PyRosetta initialization files. Also see the ‘output_decoy_types’ and ‘output_init_file’ keyword arguments. Default: True
- compression: A str object of ‘xz’, ‘zlib’ or ‘bz2’, or a bool or NoneType
object representing the internal compression library for pickled PackedPose objects and user-defined PyRosetta protocol kwargs objects. The default of True uses ‘xz’ for serialization if it’s installed, otherwise uses ‘zlib’ for serialization. Default: True
- system_info: A dict or NoneType object specifying the system information
required to reproduce the simulation. If None is provided, then PyRosettaCluster automatically detects the platform and returns this attribute as a dictionary {‘sys.platform’: sys.platform} (for example, {‘sys.platform’: ‘linux’}). If a dict is provided, then validate that the ‘sys.platform’ key has a value equal to the current sys.platform, and log a warning message if not. Additional system information such as Amazon Machine Image (AMI) identifier and compute fleet instance type identifier may be stored in this dictionary, but is not validated. This information is stored in the simulation records for accounting. Default: None
- pyrosetta_build: A str or NoneType object specifying the PyRosetta build as
output by pyrosetta._version_string(). If None is provided, then PyRosettaCluster automatically detects the PyRosetta build and sets this attribute as the str. If a non-empty str is provided, then validate that the input PyRosetta build is equal to the active PyRosetta build, and raise an error if not. This ensures that reproduction simulations use an identical PyRosetta build from the original simulation. To bypass PyRosetta build validation with a warning message, an empty string (‘’) may be provided (but does not ensure reproducibility). Default: None
- sha1: A str or NoneType object specifying the git SHA1 hash string of the
particular git commit being simulated. If a non-empty str object is provided, then it is validated to match the SHA1 hash string of the current HEAD, and then it is added to the simulation record for accounting. If an empty string is provided, then ensure that everything in the working directory is committed to the repository. If None is provided, then bypass SHA1 hash string validation and set this attribute to an empty string. Default: “”
- project_name: A str object specifying the project name of this simulation.
This option adds the user-provided project_name to the scorefile for accounting. Default: datetime.now().strftime(“%Y.%m.%d.%H.%M.%S.%f”) if not specified, or “PyRosettaCluster” if None is provided
- simulation_name: A str object specifying the name of this simulation.
This option adds the user-provided simulation_name to the scorefile for accounting. Default: project_name if not specified, or “PyRosettaCluster” if None is provided
- environment: A NoneType or str object specifying the active conda environment
YML file string. If a NoneType object is provided, then generate a YML file string for the active conda environment and save it to the full simulation record. If a non-empty str object is provided, then validate it against the active conda environment YML file string and save it to the full simulation record. This ensures that reproduction simulations use an identical conda environment from the original simulation. To bypass conda environment validation with a warning message, an empty string (‘’) may be provided (but does not ensure reproducibility). Default: None
- output_path: A str object specifying the full path of the output directory
(to be created if it doesn’t exist) where the output results will be saved to disk. Default: “./outputs”
- output_init_file: A str object specifying the output “.init” file path that caches
the ‘input_packed_pose’ keyword argument parameter upon PyRosettaCluster instantiation (not including any output decoys). The file may later be used to export PyRosetta initialization files with output decoys via the pyrosetta.distributed.cluster.export_init_file() function after the simulation completes (see the ‘output_decoy_types’ keyword argument). If a NoneType object or an empty str object (‘’) is provided, or dry_run=True, then skip writing an output “.init” file upon PyRosettaCluster instantiation. If skipped, it is recommended to run pyrosetta.dump_init_file() before or after the simulation. If compressed=True, then the output file is further compressed by bzip2, and “.bz2” is appended to the filename. Default: output_path/`project_name`_`simulation_name`_pyrosetta.init
- output_decoy_types: An iterable of str objects representing the output decoy
filetypes to save during the simulation. Available options are: “.pdb” for PDB files; “.pkl_pose” for pickled Pose files; “.b64_pose” for base64-encoded pickled Pose files; and “.init” for PyRosetta initialization files, each caching the host node PyRosetta initialization options (and input files, if any), the ‘input_packed_pose’ keyword argument parameter (if any), and an output decoy. Because each “.init” file contains a copy of the PyRosetta initialization input files and the input PackedPose object, unless these objects are relatively small or there are relatively few expected output decoys, it is recommended to omit “.init” here and instead run pyrosetta.distributed.cluster.export_init_file() on only the decoys of interest after the simulation completes. If compressed=True, then each decoy output file is further compressed by bzip2, and “.bz2” is appended to the filename. Default: [“.pdb”,]
- output_scorefile_types: An iterable of str objects representing the output scorefile
filetypes to save during the simulation. Available options are: “.json” for a JSON-encoded scorefile, and any filename extensions accepted by pandas.DataFrame().to_pickle(compression=”infer”) (including “.gz”, “.bz2”, and “.xz”) for pickled pandas.DataFrame objects of scorefile data that can later be analyzed using pyrosetta.distributed.cluster.io.secure_read_pickle(compression=”infer”). Note that in order to save pickled pandas.DataFrame objects, please ensure that pyrosetta.secure_unpickle.add_secure_package(“pandas”) has been first run. Default: [“.json”,]
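With the default “.json” scorefile type and simulation_records_in_scorefile=False, the scorefile holds one JSON record per line, so it can be parsed with the standard library (a sketch over hypothetical score data and record keys):

```python
import json

# Two hypothetical scorefile lines (one JSON record per output decoy):
scorefile_lines = [
    '{"metadata": {"decoy_name": "decoy_0"}, "scores": {"total_score": -101.2}}',
    '{"metadata": {"decoy_name": "decoy_1"}, "scores": {"total_score": -98.7}}',
]
records = [json.loads(line) for line in scorefile_lines]

# Pick the lowest-scoring decoy:
best = min(records, key=lambda record: record["scores"]["total_score"])
```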
- scorefile_name: A str object specifying the name of the output JSON-formatted
scorefile, which must end in “.json”. The scorefile location is always output_path/scorefile_name. If “.json” is not in the ‘output_scorefile_types’ keyword argument parameter, then the JSON-formatted scorefile is not output, but other scorefile types use the same base filename (the name before the “.json” extension). Default: “scores.json”
- simulation_records_in_scorefile: A bool object specifying whether or not to
write full simulation records to the scorefile. If True, then write full simulation records to the scorefile. This results in some redundant information on each line, allowing downstream reproduction of a decoy from the scorefile, but a larger scorefile. If False, then write curtailed simulation records to the scorefile. This results in minimally redundant information on each line, disallowing downstream reproduction of a decoy from the scorefile, but a smaller scorefile. If False, also write the active conda environment to a YML file in ‘output_path’. Full simulation records are always written to the output ‘.pdb’ or ‘.pdb.bz2’ file(s), which can be used to reproduce any decoy without the scorefile. Default: False
- decoy_dir_name: A str object specifying the directory name where the
output decoys will be saved. The directory location is always output_path/decoy_dir_name. Default: “decoys”
- logs_dir_name: A str object specifying the directory name where the
output log files will be saved. The directory location is always output_path/logs_dir_name. Default: “logs”
- logging_level: A str object specifying the logging level of python tracer
output to write to the log file of either “NOTSET”, “DEBUG”, “INFO”, “WARNING”, “ERROR”, or “CRITICAL”. The output log file is always written to output_path/logs_dir_name/simulation_name.log on disk. Default: “INFO”
- logging_address: A str object specifying the socket endpoint for sending and receiving
log messages across a network, so log messages from user-provided PyRosetta protocols may be written to a single log file on the host node. The str object must take the format ‘host:port’, where ‘host’ is an IP address, ‘localhost’, or a Domain Name System (DNS)-accessible domain name, and ‘port’ is an integer greater than or equal to 0. If ‘port’ is ‘0’, then the next free port is selected. Default: ‘localhost:0’ if scheduler=None or if either the client or clients keyword argument parameters specify instances of dask.distributed.LocalCluster, otherwise ‘0.0.0.0:0’
- ignore_errors: A bool object specifying for PyRosettaCluster to ignore errors
raised in the user-provided PyRosetta protocols. This comes in handy when well-defined errors are sparse and sporadic (such as rare Segmentation Faults), and the user would like PyRosettaCluster to run without raising the errors. Default: False
- timeout: A float or int object specifying how many seconds to wait between
PyRosettaCluster checking-in on the running user-provided PyRosetta protocols. If each user-provided PyRosetta protocol is expected to run quickly, then 0.1 seconds seems reasonable. If each user-provided PyRosetta protocol is expected to run slowly, then >1 second seems reasonable. Default: 0.5
- max_delay_time: A float or int object specifying the maximum number of seconds to
sleep before returning the result(s) from each user-provided PyRosetta protocol back to the client. If a dask worker returns the result(s) from a user-provided PyRosetta protocol too quickly, the dask scheduler needs to first register that the task is processing before it completes. In practice, in each user-provided PyRosetta protocol the runtime is subtracted from max_delay_time, and the dask worker sleeps for the remainder of the time, if any, before returning the result(s). It’s recommended to set this option to at least 1 second, but longer times may be used as a safety throttle in cases of overwhelmed dask scheduler processes. Default: 3.0
- filter_results: A bool object specifying whether or not to filter out empty
PackedPose objects between user-provided PyRosetta protocols. When a protocol returns or yields NoneType, PyRosettaCluster converts it to an empty PackedPose object that gets passed to the next protocol. If True, then filter out any empty PackedPose objects where there are no residues in the conformation as given by Pose.empty(), otherwise if False then continue to pass empty PackedPose objects to the next protocol. This is used for filtering out decoys mid-trajectory through user-provided PyRosetta protocols if protocols return or yield any None, empty Pose, or empty PackedPose objects. Default: True
- save_all: A bool object specifying whether or not to save all of the returned
or yielded Pose and PackedPose objects from all user-provided PyRosetta protocols. This option may be used for checkpointing trajectories. To save arbitrary poses to disk, from within any user-provided PyRosetta protocol:
- `pose.dump_pdb(os.path.join(kwargs["PyRosettaCluster_output_path"], "checkpoint.pdb"))`
Default: False
- dry_run: A bool object specifying whether or not to save ‘.pdb’ files to
disk. If True, then do not write ‘.pdb’ or ‘.pdb.bz2’ files to disk. Default: False
- cooldown_time: A float or int object specifying how many seconds to sleep after the
simulation is complete to allow loggers to flush. For very slow network filesystems, 2.0 or more seconds may be reasonable. Default: 0.5
- norm_task_options: A bool object specifying whether or not to normalize the task
‘options’ and ‘extra_options’ values after PyRosetta initialization on the remote compute cluster. If True, then this enables more facile simulation reproduction by the use of the ProtocolSettingsMetric SimpleMetric to normalize the PyRosetta initialization options and by relativization of any input files and directory paths to the current working directory from which the task is running. Default: True
- author: An optional str object specifying the author(s) of the simulation that is
written to the full simulation records and the PyRosetta initialization ‘.init’ file. Default: “”
- email: An optional str object specifying the email address(es) of the author(s) of
the simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file. Default: “”
- license: An optional str object specifying the license of the output data of the
simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file (e.g., “ODC-ODbL”, “CC BY-ND”, “CDLA Permissive-2.0”, etc.). Default: “”
- Returns:
A PyRosettaCluster instance.
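Putting the arguments above together, a minimal usage sketch (assuming PyRosetta and dask are installed; my_protocol and its sequence are hypothetical, and the commented-out lines show, without executing, how the instance would be created and run):

```python
def my_protocol(packed_pose, **kwargs):
    """Hypothetical user-defined PyRosetta protocol run on a dask worker."""
    import pyrosetta  # deferred: PyRosetta is initialized on the worker first

    # kwargs holds the task dict plus PyRosettaCluster bookkeeping keys.
    return pyrosetta.io.pose_from_sequence(kwargs["sequence"])

# Constructor arguments for a local run (scheduler=None uses LocalCluster):
cluster_kwargs = dict(
    tasks=[{"extra_options": "-out:level 300", "sequence": "TESTPEPTIDE"}],
    nstruct=5,               # repeat my_protocol 5 times for the task
    output_path="./outputs",
)
# from pyrosetta.distributed.cluster import PyRosettaCluster
# PyRosettaCluster(**cluster_kwargs).distribute(protocols=[my_protocol])
```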
- class pyrosetta.distributed.cluster.core.PyRosettaCluster(*, tasks: Any = [{}], nstruct=1, input_packed_pose: Any = None, seeds: Optional[Any] = None, decoy_ids: Optional[Any] = None, client: Optional[Client] = None, clients: Optional[List[Client]] = None, scheduler: str = None, cores=1, processes=1, memory='4g', scratch_dir: Any = None, min_workers=1, max_workers=_Nothing.NOTHING, dashboard_address=':8787', project_name='2025.10.08.20.40.44.474398', simulation_name=_Nothing.NOTHING, output_path='/home/benchmark/rosetta/source/build/PyRosetta/Linux-5.4.0-84-generic-x86_64-with-glibc2.27/clang-6.0.0/python-3.11/minsizerel.serialization.thread/documentation/outputs', output_decoy_types: Any = None, output_scorefile_types: Any = None, scorefile_name='scores.json', simulation_records_in_scorefile=False, decoy_dir_name='decoys', logs_dir_name='logs', logging_level='INFO', logging_address: str = _Nothing.NOTHING, compressed=True, compression: Optional[Union[str, bool]] = True, sha1: Any = '', ignore_errors=False, timeout=0.5, max_delay_time=3.0, filter_results: Any = None, save_all=False, dry_run=False, norm_task_options: Any = None, cooldown_time=0.5, system_info: Any = None, pyrosetta_build: Any = None, environment: Any = None, author=None, email=None, license=None, output_init_file=_Nothing.NOTHING)¶
Bases: IO[G], LoggingSupport[G], SchedulerManager[G], TaskBase[G]
- Args:
- tasks: A list of dict objects, a callable or called function returning
a list of dict objects, or a callable or called generator yielding a list of dict objects. Each dictionary object element of the list is accessible via kwargs in the user-defined PyRosetta protocols. In order to initialize PyRosetta with user-defined PyRosetta command line options at the start of each user-defined PyRosetta protocol, either extra_options and/or options must be a key of each dictionary object, where the value is a str, tuple, list, set, or dict of PyRosetta command line options. Default: [{}]
- input_packed_pose: Optional input PackedPose object that is accessible via
the first argument of the first user-defined PyRosetta protocol. Default: None
- seeds: A list of int objects specifying the random number generator seeds
to use for each user-defined PyRosetta protocol. The number of seeds provided must be equal to the number of user-defined input PyRosetta protocols. Seeds are used in the same order that the user-defined PyRosetta protocols are executed. Default: None
- decoy_ids: A list of int objects specifying the decoy numbers to keep after
executing user-defined PyRosetta protocols. User-provided PyRosetta protocols may return a list of Pose and/or PackedPose objects, or yield multiple Pose and/or PackedPose objects. To reproduce a particular decoy generated via the chain of user-provided PyRosetta protocols, the decoy number to keep for each protocol may be specified, where other decoys are discarded. Decoy numbers use zero-based indexing, so 0 is the first decoy generated from a particular PyRosetta protocol. The number of decoy_ids provided must be equal to the number of user-defined input PyRosetta protocols, so that one decoy is saved for each user-defined PyRosetta protocol. Decoy ids are applied in the same order that the user-defined PyRosetta protocols are executed. Default: None
- client: An initialized dask distributed.client.Client object to be used as
the dask client interface to the local or remote compute cluster. If None, then PyRosettaCluster initializes its own dask client based on the PyRosettaCluster(scheduler=…) class attribute. Deprecated by the PyRosettaCluster(clients=…) class attribute, but supported for legacy purposes. Either or both of the client or clients attribute parameters must be None. Default: None
- clients: A list or tuple object of initialized dask distributed.client.Client
objects to be used as the dask client interface(s) to the local or remote compute cluster(s). If None, then PyRosettaCluster initializes its own dask client based on the PyRosettaCluster(scheduler=…) class attribute. Optionally used in combination with the PyRosettaCluster().distribute(clients_indices=…) method. Either or both of the client or clients attribute parameters must be None. See the PyRosettaCluster().distribute() method docstring for usage examples. Default: None
- scheduler: A str of either “sge” or “slurm”, or None. If “sge”, then
PyRosettaCluster schedules jobs using SGECluster with dask-jobqueue. If “slurm”, then PyRosettaCluster schedules jobs using SLURMCluster with dask-jobqueue. If None, then PyRosettaCluster schedules jobs using LocalCluster with dask.distributed. If PyRosettaCluster(client=…) or PyRosettaCluster(clients=…) is provided, then PyRosettaCluster(scheduler=…) is ignored. Default: None
- cores: An int object specifying the total number of cores per job, which
is input to the dask_jobqueue.SLURMCluster(cores=…) argument or the dask_jobqueue.SGECluster(cores=…) argument. Default: 1
- processes: An int object specifying the total number of processes per job,
which is input to the dask_jobqueue.SLURMCluster(processes=…) argument or the dask_jobqueue.SGECluster(processes=…) argument. This cuts the job up into this many processes. Default: 1
- memory: A str object specifying the total amount of memory per job, which
is input to the dask_jobqueue.SLURMCluster(memory=…) argument or the dask_jobqueue.SGECluster(memory=…) argument. Default: “4g”
- scratch_dir: A str object specifying the path to a scratch directory where
dask litter may go. Default: “/temp” if it exists, otherwise the current working directory
- min_workers: An int object specifying the minimum number of workers to
which to adapt during parallelization of user-provided PyRosetta protocols. Default: 1
- max_workers: An int object specifying the maximum number of workers to
which to adapt during parallelization of user-provided PyRosetta protocols. Default: 1000 if the initial number of tasks is <1000, else use the
the initial number of tasks
- dashboard_address: A str object specifying the port over which the dask
dashboard is forwarded. Particularly useful for diagnosing PyRosettaCluster performance in real-time. Default=”:8787”
- nstruct: An int object specifying the number of repeats of the first
user-provided PyRosetta protocol. The user can control the number of repeats of subsequent user-provided PyRosetta protocols via returning multiple clones of the output pose(s) from a user-provided PyRosetta protocol run earlier, or cloning the input pose(s) multiple times in a user-provided PyRosetta protocol run later. Default: 1
- compressed: A bool object specifying whether or not to compress the output
“.pdb”, “.pkl_pose”, “.b64_pose”, and “.init” files with bzip2, resulting in appending “.bz2” to decoy output files and PyRosetta initialization files. Also see the ‘output_decoy_types’ and ‘output_init_file’ keyword arguments. Default: True
- compression: A str object of ‘xz’, ‘zlib’ or ‘bz2’, or a bool or NoneType
object representing the internal compression library for pickled PackedPose objects and user-defined PyRosetta protocol kwargs objects. The default of True uses ‘xz’ for serialization if it’s installed, otherwise uses ‘zlib’ for serialization. Default: True
- system_info: A dict or NoneType object specifying the system information
required to reproduce the simulation. If None is provided, then PyRosettaCluster automatically detects the platform and returns this attribute as a dictionary {‘sys.platform’: sys.platform} (for example, {‘sys.platform’: ‘linux’}). If a dict is provided, then validate that the ‘sys.platform’ key has a value equal to the current sys.platform, and log a warning message if not. Additional system information such as Amazon Machine Image (AMI) identifier and compute fleet instance type identifier may be stored in this dictionary, but is not validated. This information is stored in the simulation records for accounting. Default: None
- pyrosetta_build: A str or NoneType object specifying the PyRosetta build as
output by pyrosetta._version_string(). If None is provided, then PyRosettaCluster automatically detects the PyRosetta build and sets this attribute as the str. If a non-empty str is provided, then validate that the input PyRosetta build is equal to the active PyRosetta build, and raise an error if not. This ensures that reproduction simulations use an identical PyRosetta build from the original simulation. To bypass PyRosetta build validation with a warning message, an empty string (‘’) may be provided (but does not ensure reproducibility). Default: None
- sha1: A str or NoneType object specifying the git SHA1 hash string of the
particular git commit being simulated. If a non-empty str object is provided, then it is validated to match the SHA1 hash string of the current HEAD, and then it is added to the simulation record for accounting. If an empty string is provided, then ensure that everything in the working directory is committed to the repository. If None is provided, then bypass SHA1 hash string validation and set this attribute to an empty string. Default: “”
- project_name: A str object specifying the project name of this simulation.
This option just adds the user-provided project_name to the scorefile for accounting. Default: datetime.now().strftime(“%Y.%m.%d.%H.%M.%S.%f”) if not specified,
else “PyRosettaCluster” if None
- simulation_name: A str object specifying the name of this simulation.
This option just adds the user-provided simulation_name to the scorefile for accounting. Default: project_name if not specified, else “PyRosettaCluster” if None
- environment: A NoneType or str object specifying the active conda environment
YML file string. If a NoneType object is provided, then generate a YML file string for the active conda environment and save it to the full simulation record. If a non-empty str object is provided, then validate it against the active conda environment YML file string and save it to the full simulation record. This ensures that reproduction simulations use an identical conda environment from the original simulation. To bypass conda environment validation with a warning message, an empty string (‘’) may be provided (but does not ensure reproducibility). Default: None
- output_path: A str object specifying the full path of the output directory
(to be created if it doesn’t exist) where the output results will be saved to disk. Default: “./outputs”
- output_init_file: A str object specifying the output “.init” file path that caches
the ‘input_packed_pose’ keyword argument parameter upon PyRosettaCluster instantiation, and not including any output decoys, which is optionally used for exporting PyRosetta initialization files with output decoys by the pyrosetta.distributed.cluster.export_init_file() function after the simulation completes (see the ‘output_decoy_types’ keyword argument). If a NoneType object (or an empty str object (‘’)) is provided, or dry_run=True, then skip writing an output “.init” file upon PyRosettaCluster instantiation. If skipped, it is recommended to run pyrosetta.dump_init_file() before or after the simulation. If compressed=True, then the output file is further compressed by bzip2, and “.bz2” is appended to the filename. Default: output_path/`project_name`_`simulation_name`_pyrosetta.init
- output_decoy_types: An iterable of str objects representing the output decoy
filetypes to save during the simulation. Available options are: “.pdb” for PDB files; “.pkl_pose” for pickled Pose files; “.b64_pose” for base64-encoded pickled Pose files; and “.init” for PyRosetta initialization files, each caching the host node PyRosetta initialization options (and input files, if any), the ‘input_packed_pose’ keyword argument parameter (if any) and an output decoy. Because each “.init” file contains a copy of the PyRosetta initialization input files and input PackedPose object, unless these objects are relatively small in size or there are relatively few expected output decoys, then it is recommended to run pyrosetta.distributed.cluster.export_init_file() on only decoys of interest after the simulation completes without specifying “.init”. If compressed=True, then each decoy output file is further compressed by bzip2, and “.bz2” is appended to the filename. Default: [“.pdb”,]
- output_scorefile_types: An iterable of str objects representing the output scorefile
filetypes to save during the simulation. Available options are: “.json” for a JSON-encoded scorefile, and any filename extensions accepted by pandas.DataFrame().to_pickle(compression=”infer”) (including “.gz”, “.bz2”, and “.xz”) for pickled pandas.DataFrame objects of scorefile data that can later be analyzed using pyrosetta.distributed.cluster.io.secure_read_pickle(compression=”infer”). Note that in order to save pickled pandas.DataFrame objects, please ensure that pyrosetta.secure_unpickle.add_secure_package(“pandas”) has been first run. Default: [“.json”,]
- scorefile_name: A str object specifying the name of the output JSON-formatted
scorefile, which must end in “.json”. The scorefile location is always output_path/scorefile_name. If “.json” is not in the ‘output_scorefile_types’ keyword argument parameter, the JSON-formatted scorefile will not be output, but other scorefile types will get the same filename before the “.json” extension. Default: “scores.json”
- simulation_records_in_scorefile: A bool object specifying whether or not to
write full simulation records to the scorefile. If True, then write full simulation records to the scorefile. This results in some redundant information on each line, allowing downstream reproduction of a decoy from the scorefile, but a larger scorefile. If False, then write curtailed simulation records to the scorefile. This results in minimally redundant information on each line, disallowing downstream reproduction of a decoy from the scorefile, but a smaller scorefile. If False, also write the active conda environment to a YML file in ‘output_path’. Full simulation records are always written to the output ‘.pdb’ or ‘.pdb.bz2’ file(s), which can be used to reproduce any decoy without the scorefile. Default: False
- decoy_dir_name: A str object specifying the directory name where the
output decoys will be saved. The directory location is always output_path/decoy_dir_name. Default: “decoys”
- logs_dir_name: A str object specifying the directory name where the
output log files will be saved. The directory location is always output_path/logs_dir_name. Default: “logs”
- logging_level: A str object specifying the logging level of Python tracer
output written to the log file: one of “NOTSET”, “DEBUG”, “INFO”, “WARNING”, “ERROR”, or “CRITICAL”. The output log file is always written to output_path/logs_dir_name/simulation_name.log on disk. Default: “INFO”
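A sketch of equivalent file logging with Python’s standard logging module; the logger name and record format below are illustrative, not what PyRosettaCluster uses internally:

```python
import logging

def setup_simulation_logger(log_file, logging_level="INFO"):
    """Sketch: write tracer output at or above the named level to a log
    file, analogous to output_path/logs_dir_name/simulation_name.log."""
    logger = logging.getLogger("pyrosettacluster.example")  # illustrative name
    logger.setLevel(getattr(logging, logging_level))
    handler = logging.FileHandler(log_file)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```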
- logging_address: A str object specifying the socket endpoint for sending and receiving
log messages across a network, so that log messages from user-provided PyRosetta protocols may be written to a single log file on the host node. The str object must take the format ‘host:port’, where ‘host’ is an IP address, ‘localhost’, or a Domain Name System (DNS)-accessible domain name, and ‘port’ is an integer greater than or equal to 0. If ‘port’ is ‘0’, then the next free port is selected. Default: ‘localhost:0’ if scheduler=None or if either the client or clients
keyword argument parameters specify instances of dask.distributed.LocalCluster, otherwise ‘0.0.0.0:0’
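The ‘host:port’ format and the port-0 semantics can be sketched with the standard library; the helper names are illustrative:

```python
import socket

def parse_logging_address(address):
    """Split a 'host:port' logging_address into (host, port); port 0 means
    'pick the next free port'."""
    host, _, port = address.rpartition(":")
    return host, int(port)

def resolve_port(host, port):
    """Sketch of port-0 semantics: binding to port 0 lets the OS choose a
    free port, whose number is then reported back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, port))
        return s.getsockname()[1]
```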
- ignore_errors: A bool object specifying whether PyRosettaCluster ignores errors
raised in the user-provided PyRosetta protocols. This comes in handy when well-defined errors are sparse and sporadic (such as rare segmentation faults), and the user would like PyRosettaCluster to keep running without raising them. Default: False
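The semantics can be sketched as follows; call_protocol is a hypothetical helper, not the actual implementation:

```python
def call_protocol(protocol, ignore_errors=False, **kwargs):
    """Sketch of ignore_errors: swallow exceptions raised by a user-provided
    protocol and return None instead of propagating them."""
    try:
        return protocol(**kwargs)
    except Exception:
        if ignore_errors:
            return None
        raise
```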
- timeout: A float or int object specifying how many seconds to wait between
PyRosettaCluster check-ins on the running user-provided PyRosetta protocols. If each user-provided PyRosetta protocol is expected to run quickly, then 0.1 seconds is reasonable; if each is expected to run slowly, then >1 second is reasonable. Default: 0.5
- max_delay_time: A float or int object specifying the maximum number of seconds to
sleep before returning the result(s) from each user-provided PyRosetta protocol back to the client. If a dask worker returns the result(s) from a user-provided PyRosetta protocol too quickly, the dask scheduler needs to first register that the task is processing before it completes. In practice, in each user-provided PyRosetta protocol the runtime is subtracted from max_delay_time, and the dask worker sleeps for the remainder of the time, if any, before returning the result(s). It’s recommended to set this option to at least 1 second, but longer times may be used as a safety throttle in cases of overwhelmed dask scheduler processes. Default: 3.0
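The throttle can be sketched as follows; run_with_min_delay is illustrative, not the actual worker code:

```python
import time

def run_with_min_delay(protocol, max_delay_time=3.0):
    """Sketch of the max_delay_time throttle: subtract the protocol's runtime
    from max_delay_time and sleep for the remainder, if any, before returning
    the result, so fast tasks are not returned before the dask scheduler
    registers them as processing."""
    start = time.time()
    result = protocol()
    remaining = max_delay_time - (time.time() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result
```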
- filter_results: A bool object specifying whether or not to filter out empty
PackedPose objects between user-provided PyRosetta protocols. When a protocol returns or yields NoneType, PyRosettaCluster converts it to an empty PackedPose object that gets passed to the next protocol. If True, then filter out any empty PackedPose objects where there are no residues in the conformation as given by Pose.empty(), otherwise if False then continue to pass empty PackedPose objects to the next protocol. This is used for filtering out decoys mid-trajectory through user-provided PyRosetta protocols if protocols return or yield any None, empty Pose, or empty PackedPose objects. Default: True
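The filtering semantics can be sketched with a stand-in pose type; FakePose and filter_empty_results are hypothetical, for illustration only:

```python
class FakePose:
    """Stand-in for a PackedPose, exposing only the residue count used here."""
    def __init__(self, n_residues):
        self.n_residues = n_residues

def filter_empty_results(results, filter_results=True):
    """Sketch of filter_results: drop poses with no residues between
    protocols when True; pass everything through when False."""
    if not filter_results:
        return list(results)
    return [p for p in results if p.n_residues > 0]
```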
- save_all: A bool object specifying whether or not to save all of the returned
or yielded Pose and PackedPose objects from all user-provided PyRosetta protocols. This option may be used for checkpointing trajectories. To save arbitrary poses to disk, from within any user-provided PyRosetta protocol:
- `pose.dump_pdb(os.path.join(kwargs["PyRosettaCluster_output_path"], "checkpoint.pdb"))`
Default: False
- dry_run: A bool object specifying whether or not to save ‘.pdb’ files to
disk. If True, then do not write ‘.pdb’ or ‘.pdb.bz2’ files to disk. Default: False
- cooldown_time: A float or int object specifying how many seconds to sleep after the
simulation is complete to allow loggers to flush. For very slow network filesystems, 2.0 or more seconds may be reasonable. Default: 0.5
- norm_task_options: A bool object specifying whether or not to normalize the task
‘options’ and ‘extra_options’ values after PyRosetta initialization on the remote compute cluster. If True, then simulations are easier to reproduce: the ProtocolSettingsMetric SimpleMetric normalizes the PyRosetta initialization options, and any input file and directory paths are relativized to the current working directory from which the task is running. Default: True
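The path relativization can be sketched as follows; the helper is illustrative, not the actual normalization code:

```python
import os

def relativize_input_path(path, cwd):
    """Sketch of input-path relativization under norm_task_options: express
    an input file path relative to the task's working directory so that
    simulation records stay portable across machines."""
    return os.path.relpath(path, start=cwd)
```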
- author: An optional str object specifying the author(s) of the simulation that is
written to the full simulation records and the PyRosetta initialization ‘.init’ file. Default: “”
- email: An optional str object specifying the email address(es) of the author(s) of
the simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file. Default: “”
- license: An optional str object specifying the license of the output data of the
simulation that is written to the full simulation records and the PyRosetta initialization ‘.init’ file (e.g., “ODC-ODbL”, “CC BY-ND”, “CDLA Permissive-2.0”, etc.). Default: “”
- Returns:
A PyRosettaCluster instance.
- tasks¶
- nstruct¶
- tasks_size¶
- input_packed_pose¶
- seeds¶
- decoy_ids¶
- client¶
- clients¶
- scheduler¶
- cores¶
- processes¶
- memory¶
- scratch_dir¶
- adapt_threshold¶
- min_workers¶
- max_workers¶
- dashboard_address¶
- project_name¶
- simulation_name¶
- output_path¶
- output_decoy_types¶
- output_scorefile_types¶
- scorefile_name¶
- scorefile_path¶
- simulation_records_in_scorefile¶
- decoy_dir_name¶
- decoy_path¶
- logs_dir_name¶
- logs_path¶
- logging_level¶
- logging_file¶
- logging_address¶
- compressed¶
- compression¶
- sha1¶
- ignore_errors¶
- timeout¶
- max_delay_time¶
- filter_results¶
- save_all¶
- dry_run¶
- norm_task_options¶
- yield_results¶
- cooldown_time¶
- protocols_key¶
- system_info¶
- pyrosetta_build¶
- environment¶
- author¶
- email¶
- license¶
- output_init_file¶
- environment_file¶
- pyrosetta_init_args¶
- _create_future(client: Client, protocol_name: str, compressed_protocol: bytes, compressed_packed_pose: bytes, compressed_kwargs: bytes, pyrosetta_init_kwargs: Dict[str, Any], extra_args: Dict[str, Any], passkey: bytes, resource: Optional[Dict[Any, Any]]) Future ¶
Scatter data and return submitted ‘user_spawn_thread’ future.
- _run(*args: Any, protocols: Any = None, clients_indices: Any = None, resources: Any = None) Union[NoReturn, Generator[Tuple[PackedPose, Dict[Any, Any]], None, None]] ¶
Run user-provided PyRosetta protocols on a local or remote compute cluster using the user-customized PyRosettaCluster instance. Either positional arguments or the ‘protocols’ keyword argument is required. If both are provided, then the ‘protocols’ keyword argument is concatenated after the positional arguments.
- Examples:
PyRosettaCluster().distribute(protocol_1)
PyRosettaCluster().distribute(protocols=protocol_1)
PyRosettaCluster().distribute(protocol_1, protocol_2, protocol_3)
PyRosettaCluster().distribute(protocols=(protocol_1, protocol_2, protocol_3))
PyRosettaCluster().distribute(protocol_1, protocol_2, protocols=[protocol_3, protocol_4])

# Run protocol_1 on client_1,
# then protocol_2 on client_2,
# then protocol_3 on client_1,
# then protocol_4 on client_2:
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3, protocol_4],
    clients_indices=[0, 1, 0, 1],
)

# Run protocol_1 on client_2,
# then protocol_2 on client_3,
# then protocol_3 on client_1:
PyRosettaCluster(clients=[client_1, client_2, client_3]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    clients_indices=[1, 2, 0],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_1 with dask worker resource constraints "MEMORY=100e9",
# then protocol_3 on client_1 without dask worker resource constraints:
PyRosettaCluster(client=client_1).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}, None],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_2 with dask worker resource constraints "MEMORY=100e9":
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2],
    clients_indices=[0, 1],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}],
)
- Args:
- *args: Optional instances of type types.GeneratorType or types.FunctionType,
in the order of protocols to be executed.
- protocols: An optional iterable of extra callable PyRosetta protocols,
i.e. an iterable of objects of types.GeneratorType and/or types.FunctionType types; or a single instance of type types.GeneratorType or types.FunctionType. Default: None
- clients_indices: An optional list or tuple object of int objects, where each int object represents
a zero-based index corresponding to the initialized dask distributed.client.Client object(s) passed to the PyRosettaCluster(clients=…) keyword argument. If not None, then the length of the clients_indices object must equal the number of protocols passed to the PyRosettaCluster().distribute method. Default: None
- resources: An optional list or tuple object of dict objects, where each dict object represents
an abstract, arbitrary resource constraining which dask workers run the user-defined PyRosetta protocols. If None, then do not impose resource constraints on any protocols. If not None, then the length of the resources object must equal the number of protocols passed to the PyRosettaCluster().distribute method, such that each dict specifies the resource constraints for the protocol at the corresponding index. Note that this feature is only useful when one passes in their own instantiated client(s) whose dask workers were set up with the corresponding resources; if no dask workers satisfy the specified resource constraints, the protocols will hang indefinitely, because the dask scheduler waits for workers matching those constraints before scheduling the protocols. See https://distributed.dask.org/en/stable/resources.html for more information. Default: None
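The index-matched pairing of protocols with clients_indices and resources can be sketched as follows; plan_submissions is a hypothetical helper, not part of the API:

```python
def plan_submissions(protocols, clients_indices=None, resources=None):
    """Sketch of the index matching: protocol i runs on the client at
    clients_indices[i] under the constraints in resources[i]; both lists
    default to no-op values and must match the number of protocols."""
    n = len(protocols)
    if clients_indices is None:
        clients_indices = [0] * n
    if resources is None:
        resources = [None] * n
    if len(clients_indices) != n or len(resources) != n:
        raise ValueError("clients_indices and resources must match protocols in length")
    return list(zip(protocols, clients_indices, resources))
```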
- generate(*args: Any, protocols: Any = None, clients_indices: Any = None, resources: Any = None) Union[NoReturn, Generator[Tuple[PackedPose, Dict[Any, Any]], None, None]] ¶
Run user-provided PyRosetta protocols on a local or remote compute cluster using the user-customized PyRosettaCluster instance. Either positional arguments or the ‘protocols’ keyword argument is required. If both are provided, then the ‘protocols’ keyword argument is concatenated after the positional arguments.
- Examples:
PyRosettaCluster().distribute(protocol_1)
PyRosettaCluster().distribute(protocols=protocol_1)
PyRosettaCluster().distribute(protocol_1, protocol_2, protocol_3)
PyRosettaCluster().distribute(protocols=(protocol_1, protocol_2, protocol_3))
PyRosettaCluster().distribute(protocol_1, protocol_2, protocols=[protocol_3, protocol_4])

# Run protocol_1 on client_1,
# then protocol_2 on client_2,
# then protocol_3 on client_1,
# then protocol_4 on client_2:
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3, protocol_4],
    clients_indices=[0, 1, 0, 1],
)

# Run protocol_1 on client_2,
# then protocol_2 on client_3,
# then protocol_3 on client_1:
PyRosettaCluster(clients=[client_1, client_2, client_3]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    clients_indices=[1, 2, 0],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_1 with dask worker resource constraints "MEMORY=100e9",
# then protocol_3 on client_1 without dask worker resource constraints:
PyRosettaCluster(client=client_1).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}, None],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_2 with dask worker resource constraints "MEMORY=100e9":
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2],
    clients_indices=[0, 1],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}],
)
- Args:
- *args: Optional instances of type types.GeneratorType or types.FunctionType,
in the order of protocols to be executed.
- protocols: An optional iterable of extra callable PyRosetta protocols,
i.e. an iterable of objects of types.GeneratorType and/or types.FunctionType types; or a single instance of type types.GeneratorType or types.FunctionType. Default: None
- clients_indices: An optional list or tuple object of int objects, where each int object represents
a zero-based index corresponding to the initialized dask distributed.client.Client object(s) passed to the PyRosettaCluster(clients=…) keyword argument. If not None, then the length of the clients_indices object must equal the number of protocols passed to the PyRosettaCluster().distribute method. Default: None
- resources: An optional list or tuple object of dict objects, where each dict object represents
an abstract, arbitrary resource constraining which dask workers run the user-defined PyRosetta protocols. If None, then do not impose resource constraints on any protocols. If not None, then the length of the resources object must equal the number of protocols passed to the PyRosettaCluster().distribute method, such that each dict specifies the resource constraints for the protocol at the corresponding index. Note that this feature is only useful when one passes in their own instantiated client(s) whose dask workers were set up with the corresponding resources; if no dask workers satisfy the specified resource constraints, the protocols will hang indefinitely, because the dask scheduler waits for workers matching those constraints before scheduling the protocols. See https://distributed.dask.org/en/stable/resources.html for more information. Default: None
Extra information:
The PyRosettaCluster.generate method may be used for developing PyRosetta protocols on a local or remote compute cluster and optionally post-processing or visualizing output PackedPose objects in memory. Importantly, subsequent code run on the yielded results is not captured by PyRosettaCluster, and so use of this method does not ensure reproducibility of the simulation. Use the PyRosettaCluster.distribute method for reproducible simulations.
Each yielded result is a tuple object with a PackedPose object as the first element and a dict object as the second element. The PackedPose object represents a returned or yielded PackedPose (or Pose or NoneType) object from the most recently run user-provided PyRosetta protocol. The dict object represents the optionally returned or yielded user-defined PyRosetta protocol kwargs dictionary object from the same most recently run user-provided PyRosetta protocol (see ‘protocols’ argument). If PyRosettaCluster(save_all=True), results are yielded after each user-provided PyRosetta protocol, otherwise results are yielded after the final user-defined PyRosetta protocol. Results are yielded in the order in which they arrive back to the client(s) from the distributed cluster (which may differ from the order that tasks are submitted, due to tasks running asynchronously). If PyRosettaCluster(dry_run=True), results are still yielded but ‘.pdb’ or ‘.pdb.bz2’ files are not saved to disk. See https://docs.dask.org/en/latest/futures.html#distributed.as_completed for more information.
Extra examples:
# Iterate over results in real-time as they are yielded from the cluster:
for packed_pose, kwargs in PyRosettaCluster().generate(protocols):
    ...

# Iterate over submissions to the same client:
client = Client()
for packed_pose, kwargs in PyRosettaCluster(client=client).generate(protocols):
    # Post-process results on host node asynchronously from results generation
    prc = PyRosettaCluster(
        input_packed_pose=packed_pose,
        client=client,
        logs_dir_name=f"logs_{uuid.uuid4().hex}",  # Make sure to write new log files
    )
    for packed_pose, kwargs in prc.generate(other_protocols):
        ...

# Iterate over multiple clients:
client_1 = Client()
client_2 = Client()
for packed_pose, kwargs in PyRosettaCluster(client=client_1).generate(protocols):
    # Post-process results on host node asynchronously from results generation
    prc = PyRosettaCluster(
        input_packed_pose=packed_pose,
        client=client_2,
        logs_dir_name=f"logs_{uuid.uuid4().hex}",  # Make sure to write new log files
    )
    for packed_pose, kwargs in prc.generate(other_protocols):
        ...

# Using multiple dask.distributed.as_completed iterators on the host node creates additional overhead.
# If post-processing on the host node is not required between user-provided PyRosetta protocols,
# the preferred method is to distribute protocols within a single PyRosettaCluster().generate()
# method call using the clients_indices keyword argument:
prc_generate = PyRosettaCluster(clients=[client_1, client_2]).generate(
    protocols=[protocol_1, protocol_2],
    clients_indices=[0, 1],
    resources=[{"GPU": 1}, {"CPU": 1}],
)
for packed_pose, kwargs in prc_generate:
    # Post-process results on host node asynchronously from results generation
    ...
- Yields:
(PackedPose, dict) tuples from the most recently run user-provided PyRosetta protocol if PyRosettaCluster(save_all=True) otherwise from the final user-defined PyRosetta protocol.
- distribute(*args: Any, protocols: Any = None, clients_indices: Any = None, resources: Any = None) Optional[NoReturn] ¶
Run user-provided PyRosetta protocols on a local or remote compute cluster using the user-customized PyRosettaCluster instance. Either positional arguments or the ‘protocols’ keyword argument is required. If both are provided, then the ‘protocols’ keyword argument is concatenated after the positional arguments.
- Examples:
PyRosettaCluster().distribute(protocol_1)
PyRosettaCluster().distribute(protocols=protocol_1)
PyRosettaCluster().distribute(protocol_1, protocol_2, protocol_3)
PyRosettaCluster().distribute(protocols=(protocol_1, protocol_2, protocol_3))
PyRosettaCluster().distribute(protocol_1, protocol_2, protocols=[protocol_3, protocol_4])

# Run protocol_1 on client_1,
# then protocol_2 on client_2,
# then protocol_3 on client_1,
# then protocol_4 on client_2:
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3, protocol_4],
    clients_indices=[0, 1, 0, 1],
)

# Run protocol_1 on client_2,
# then protocol_2 on client_3,
# then protocol_3 on client_1:
PyRosettaCluster(clients=[client_1, client_2, client_3]).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    clients_indices=[1, 2, 0],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_1 with dask worker resource constraints "MEMORY=100e9",
# then protocol_3 on client_1 without dask worker resource constraints:
PyRosettaCluster(client=client_1).distribute(
    protocols=[protocol_1, protocol_2, protocol_3],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}, None],
)

# Run protocol_1 on client_1 with dask worker resource constraints "GPU=2",
# then protocol_2 on client_2 with dask worker resource constraints "MEMORY=100e9":
PyRosettaCluster(clients=[client_1, client_2]).distribute(
    protocols=[protocol_1, protocol_2],
    clients_indices=[0, 1],
    resources=[{"GPU": 2}, {"MEMORY": 100e9}],
)
- Args:
- *args: Optional instances of type types.GeneratorType or types.FunctionType,
in the order of protocols to be executed.
- protocols: An optional iterable of extra callable PyRosetta protocols,
i.e. an iterable of objects of types.GeneratorType and/or types.FunctionType types; or a single instance of type types.GeneratorType or types.FunctionType. Default: None
- clients_indices: An optional list or tuple object of int objects, where each int object represents
a zero-based index corresponding to the initialized dask distributed.client.Client object(s) passed to the PyRosettaCluster(clients=…) keyword argument. If not None, then the length of the clients_indices object must equal the number of protocols passed to the PyRosettaCluster().distribute method. Default: None
- resources: An optional list or tuple object of dict objects, where each dict object represents
an abstract, arbitrary resource constraining which dask workers run the user-defined PyRosetta protocols. If None, then do not impose resource constraints on any protocols. If not None, then the length of the resources object must equal the number of protocols passed to the PyRosettaCluster().distribute method, such that each dict specifies the resource constraints for the protocol at the corresponding index. Note that this feature is only useful when one passes in their own instantiated client(s) whose dask workers were set up with the corresponding resources; if no dask workers satisfy the specified resource constraints, the protocols will hang indefinitely, because the dask scheduler waits for workers matching those constraints before scheduling the protocols. See https://distributed.dask.org/en/stable/resources.html for more information. Default: None
- Returns:
None
- DATETIME_FORMAT: str = '%Y-%m-%d %H:%M:%S.%f'¶
- REMARK_FORMAT: str = 'REMARK PyRosettaCluster: '¶
- __init__(*, tasks: Any = [{}], nstruct=1, input_packed_pose: Any = None, seeds: Optional[Any] = None, decoy_ids: Optional[Any] = None, client: Optional[Client] = None, clients: Optional[List[Client]] = None, scheduler: str = None, cores=1, processes=1, memory='4g', scratch_dir: Any = None, min_workers=1, max_workers=_Nothing.NOTHING, dashboard_address=':8787', project_name='2025.10.08.20.40.44.474398', simulation_name=_Nothing.NOTHING, output_path='/home/benchmark/rosetta/source/build/PyRosetta/Linux-5.4.0-84-generic-x86_64-with-glibc2.27/clang-6.0.0/python-3.11/minsizerel.serialization.thread/documentation/outputs', output_decoy_types: Any = None, output_scorefile_types: Any = None, scorefile_name='scores.json', simulation_records_in_scorefile=False, decoy_dir_name='decoys', logs_dir_name='logs', logging_level='INFO', logging_address: str = _Nothing.NOTHING, compressed=True, compression: Optional[Union[str, bool]] = True, sha1: Any = '', ignore_errors=False, timeout=0.5, max_delay_time=3.0, filter_results: Any = None, save_all=False, dry_run=False, norm_task_options: Any = None, cooldown_time=0.5, system_info: Any = None, pyrosetta_build: Any = None, environment: Any = None, author=None, email=None, license=None, output_init_file=_Nothing.NOTHING) None ¶
Method generated by attrs for class PyRosettaCluster.
- static _add_pose_comment(packed_pose: PackedPose, pdbfile_data: str) PackedPose ¶
Cache simulation data as a pose comment.
- _close_socket_logger_plugins(clients: Dict[int, Client]) None ¶
Purge cached logging socket addresses on all dask workers.
- _dump_init_file(filename: str, input_packed_pose: Optional[PackedPose] = None, output_packed_pose: Optional[PackedPose] = None, verbose: bool = True) None ¶
Dump compressed PyRosetta initialization input files and poses to the input filename.
- static _dump_json(data: Dict[str, Any]) str ¶
Return JSON-serialized data.
- static _filter_scores_dict(scores_dict: Dict[Any, Any]) Dict[Any, Any] ¶
- _format_result(result: Union[Pose, PackedPose]) Tuple[str, Dict[Any, Any], PackedPose] ¶
Given a Pose or PackedPose object, return a tuple containing the pdb string and a scores dictionary.
- _get_clients_index(clients_indices: List[int], protocols: List[Callable[[...], Any]]) int ¶
Return the clients index for the current protocol.
- _get_cluster() ClusterType ¶
Given user input arguments, return the requested cluster instance.
- _get_init_file_json(packed_pose: PackedPose) str ¶
Return a PyRosetta initialization file as a JSON-serialized string.
- _get_instance_and_metadata(kwargs: Dict[Any, Any]) Tuple[Dict[Any, Any], Dict[Any, Any]] ¶
Get the current state of the PyRosettaCluster instance, and split the kwargs into the PyRosettaCluster instance kwargs and ancillary metadata.
- _get_output_dir(decoy_dir: str) str ¶
Get the output directory in which to write files to disk.
- _get_resource(resources: List[Dict[Any, Any]], protocols: List[Callable[[...], Any]]) Optional[Dict[Any, Any]] ¶
Return the resource for the current protocol.
- _get_seed(protocols: Sized) Optional[str] ¶
Get the seed for the input user-provided PyRosetta protocol.
- _get_task_state(protocols: List[Callable[[...], Any]]) Tuple[List[Callable[[...], Any]], Callable[[...], Any], Optional[str]] ¶
Given the current state of protocols, returns a tuple of the updated state of protocols and current protocol and seed.
- _is_protocol = False¶
- _maybe_teardown(clients: Dict[int, ClientType], cluster: Optional[ClusterType]) None ¶
Teardown client and cluster.
- _parse_results(results: Optional[Union[Iterable[Optional[Union[Pose, PackedPose, bytes]]], Pose, PackedPose]]) Union[List[Tuple[str, Dict[Any, Any]]], NoReturn] ¶
Format output results on distributed worker. Input argument results can be a Pose, PackedPose, or None object, or a list or tuple of Pose and/or PackedPose objects, or an empty list or tuple. Returns a list of tuples, each tuple containing the pdb string and a scores dictionary.
- _process_kwargs(kwargs: Dict[Any, Any]) Dict[Any, Any] ¶
Remove seed specification from ‘extra_options’ or ‘options’, and remove protocols_key from kwargs.
- _register_socket_logger_plugin(clients: Dict[int, Client]) None ¶
Register SocketLoggerPlugin as a dask worker plugin on dask clients.
- _setup_clients_cluster_adaptive() Tuple[Dict[int, ClientType], Optional[ClusterType], Optional[AdaptiveType]] ¶
Given user input arguments, return the requested client, cluster, and adaptive instance.
- _setup_clients_dict() Union[Dict[int, ClientType], NoReturn] ¶
- _setup_initial_kwargs(protocols: List[Callable[[...], Any]], seed: Optional[str], task: Dict[Any, Any]) Tuple[bytes, Dict[str, Any]] ¶
Setup the kwargs for the initial task.
- _setup_kwargs(kwargs: Dict[Any, Any], clients_indices: List[int], resources: Optional[Dict[Any, Any]]) Tuple[bytes, Dict[str, Any], Callable[[...], Any], int, Optional[Dict[Any, Any]]] ¶
Setup the kwargs for the subsequent tasks.
- _setup_protocols_protocol_seed(args: Tuple[Any, ...], protocols: Any, clients_indices: Any, resources: Any) Tuple[List[Callable[[...], Any]], Callable[[...], Any], Optional[str], int, Optional[Dict[Any, Any]]] ¶
Parse, validate, and setup the user-provided PyRosetta protocol(s).
- _setup_pyrosetta_init_kwargs(kwargs: Dict[Any, Any]) Dict[str, Any] ¶
- _setup_seed(kwargs: Dict[Any, Any], seed: Optional[str]) Dict[Any, Any] ¶
Setup the ‘options’ or ‘extra_options’ task kwargs with the -run:jran PyRosetta command line flag.
- _setup_socket_listener(clients: Dict[int, Client]) Tuple[Tuple[str, int], bytes] ¶
Setup logging socket listener.