security
- class pyrosetta.distributed.cluster.security.SecurityIO
Bases: object
Security methods for PyRosettaCluster.
- _register_task_security_plugin(clients: Dict[int, Client], prk: MaskedBytes) -> None
Register TaskSecurityPlugin as a Dask worker plugin on Dask clients.
- _clients_dict_has_security() -> bool
Test if the clients_dict attribute has security enabled on all Dask clients, excluding Dask clients with LocalCluster clusters.
- _setup_with_nonce() -> bool
Post-init hook to set up the PyRosettaCluster.with_nonce instance attribute.
- pyrosetta.distributed.cluster.security.generate_dask_tls_security(output_dir: str = '.', common_name: str = 'dask_tls_security', days: int = 365, openssl_bin: str = 'openssl', overwrite: bool = False, san_dns: Optional[Iterable[str]] = None, san_ip: Optional[Iterable[str]] = None, cleanup: bool = True) -> Security
Create cryptographic certificates and private keys for securing a Dask cluster, and return a Dask distributed.Security object that can be passed directly to the security keyword argument of PyRosettaCluster. See https://distributed.dask.org/en/latest/tls.html for more information.
This function uses the openssl command-line tool to generate the following:
- A “certificate authority” certificate and key that act as a trusted parent identity used to sign other certificates:
ca.pem (the certificate)
ca.key (the private key)
- A “leaf” certificate and key that represent the actual Dask processes (scheduler, workers, and client):
tls.crt (the certificate)
tls.key (the private key)
By default, the leaf certificate is signed by the certificate authority, meaning that any process configured with this authority will trust the leaf certificate as valid. All generated files are placed in the output_dir keyword argument value, which defaults to the current working directory.
- Example:
Generate a new set of certificates and a configured Dask distributed.Security object:
>>> security = generate_dask_tls_security(
...     output_dir="./dask_certs",
...     common_name="my-cluster",
...     san_dns=["localhost", "my-host.local"],
...     san_ip=["127.0.0.1"],
...     cleanup=False,
... )
After running this function, the directory ./dask_certs will contain:
ca.pem: certificate authority certificate (used by Dask)
ca.key: certificate authority private key
tls.crt: leaf certificate (used by Dask)
tls.key: leaf private key (used by Dask)
index.txt, serial, and ca.cnf: bookkeeping files used by OpenSSL (with cleanup=False)
Then, simply use the configured Dask distributed.Security object with PyRosettaCluster:
>>> PyRosettaCluster(security=security, ...).distribute(...)
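As a quick sanity check after the call above, the following sketch reports which of the four documented certificate and key files are absent from the output directory. The helper name missing_tls_files is hypothetical and is not part of PyRosetta; only the filenames come from the list above.

```python
from pathlib import Path

# Filenames documented above as the output of generate_dask_tls_security.
EXPECTED_FILES = ("ca.pem", "ca.key", "tls.crt", "tls.key")


def missing_tls_files(output_dir):
    """Return the expected certificate/key filenames absent from output_dir."""
    directory = Path(output_dir)
    # Keep the documented order so the result is easy to read.
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

Calling missing_tls_files("./dask_certs") after the example above should return an empty list if all four files were written.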
- Additional Notes:
A “certificate authority” (CA) acts as a trusted parent identity that confirms whether a certificate is genuine. In this function, the user generates their own local CA for the Dask cluster.
A “leaf certificate” is the actual identity used by a running process (i.e., the scheduler, a worker, or a client).
“Subject Alternative Names” (SANs) are extra hostnames or IP addresses for which the certificate is valid. This enables the user to connect using either a machine name or an IP address without validation errors.
File permissions are automatically set for private keys using chmod 600 so they are restricted to the owner (read/write only) for basic security.
This function generates all necessary files in a single directory. For proper TLS verification in a distributed Dask setup, the CA certificate must be accessible from all nodes (i.e., the scheduler, workers, and client). Leaf certificates and keys must be accessible by the process using them. For example, all files can be placed in a common directory from which all processes can read, or the directory can be mounted (e.g., if using Docker, Apptainer, or other container applications).
If cleanup=False and the same directory is used for multiple function calls, then OpenSSL may create additional files in the output directory (e.g., *.pem, index.txt.attr, index.txt.old, and serial.old). These are bookkeeping files used internally by OpenSSL and are not required by Dask, so they can be safely deleted after the leaf certificate has been issued.
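The note above about private-key permissions can be verified with a short sketch. It assumes a POSIX filesystem, and the helper name is_owner_only is hypothetical (not part of PyRosetta); it only inspects the file mode and does not change it.

```python
import os
import stat


def is_owner_only(path):
    """Return True if the file mode is exactly 0o600 (owner read/write only)."""
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600
```

For example, is_owner_only("./dask_certs/tls.key") is expected to be True for a key written by this function, given the chmod 600 behavior described above.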
- Args:
- output_dir: str
A str object representing the directory where all certificate and key files will be written. The directory will be created if it does not exist. All generated files (CA certificate, leaf certificate, leaf private key, and optional bookkeeping files) are output to this single directory. Therefore, for a distributed Dask setup, this directory must be readable by the scheduler, workers, and client processes, either via a shared filesystem or via copying and mounting (e.g., if using Docker, Apptainer, or other container applications).
Default: “.”
- common_name: str
A str object representing the “Common Name” placed inside the leaf certificate. This is a human-readable identifier that typically names the system or service to which the certificate belongs.
Default: “dask_tls_security”
- days: int
An int object representing the number of days the certificates will be valid before expiring.
Default: 365
- openssl_bin: str
A str object representing the path or name of the openssl executable. If the OpenSSL executable is not in the system “PATH” environment variable, then the full path must be provided.
Default: “openssl”
- overwrite: bool
A bool object specifying whether or not to overwrite existing files in the output_dir keyword argument value. If True is provided, existing files with the same filenames are deleted and replaced with newly generated ones. If False is provided, then existing files are re-used.
Default: False
- san_dns: Iterable[str] | None
An optional iterable of str objects representing hostnames (e.g., [“localhost”, “cluster.example.com”]) that should be accepted when verifying the certificate. These are included in an extension field called “Subject Alternative Names”.
Default: None
- san_ip: Iterable[str] | None
An optional iterable of str objects representing IP addresses (e.g., [“127.0.0.1”, “111.111.111.1”]) that should be accepted when verifying the certificate. These are also included in the “Subject Alternative Names” field.
Default: None
- cleanup: bool
An optional bool object specifying whether or not to delete the index.txt and serial bookkeeping files used by OpenSSL.
Default: True
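Before calling the function, it may help to check two of the arguments described above: that the openssl executable can be found, and that each san_ip entry parses as an IP address. The sketch below uses only the Python standard library. The helper name preflight is hypothetical, and the check is an illustration rather than something the function is documented to perform itself.

```python
import ipaddress
import shutil


def preflight(openssl_bin="openssl", san_ip=None):
    """Return a list of human-readable problems found in the arguments."""
    problems = []
    # shutil.which resolves a bare name via PATH, or checks an explicit path.
    if shutil.which(openssl_bin) is None:
        problems.append(f"openssl executable not found: {openssl_bin!r}")
    for value in san_ip or ():
        try:
            ipaddress.ip_address(value)  # accepts IPv4 and IPv6 literals
        except ValueError:
            problems.append(f"not a valid IP address: {value!r}")
    return problems
```

An empty list means neither check found a problem; it does not guarantee that certificate generation will succeed.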
- Returns:
A distributed.Security instance configured to require encryption (i.e., with the require_encryption keyword argument value set to True) and configured to use the generated certificates and private keys for the Dask scheduler, workers, and client.
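On a node that only has copies of the generated files, an equivalent object can be rebuilt by hand. The sketch below maps the documented filenames onto the keyword arguments of distributed.Security, assuming the single leaf certificate and key are shared by the scheduler, workers, and client as described above. The helper names tls_security_kwargs and make_security are hypothetical, and this is not necessarily how the function constructs its own return value.

```python
from pathlib import Path


def tls_security_kwargs(cert_dir):
    """Map the generated files onto distributed.Security keyword arguments."""
    directory = Path(cert_dir)
    ca, cert, key = (str(directory / n) for n in ("ca.pem", "tls.crt", "tls.key"))
    kwargs = {"tls_ca_file": ca, "require_encryption": True}
    # Assumption: the same leaf certificate and key serve every role.
    for role in ("scheduler", "worker", "client"):
        kwargs[f"tls_{role}_cert"] = cert
        kwargs[f"tls_{role}_key"] = key
    return kwargs


def make_security(cert_dir):
    """Build a distributed.Security object from files in cert_dir."""
    # Imported lazily so the mapping above can be inspected without Dask installed.
    from distributed import Security

    return Security(**tls_security_kwargs(cert_dir))
```

The result of make_security("./dask_certs") can then be passed to the security keyword argument of PyRosettaCluster in the same way as the object returned by generate_dask_tls_security.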