SSHCluster(hosts: 'list[str] | None' = None, connect_options: 'dict | list[dict]' = {}, worker_options: 'dict' = {}, scheduler_options: 'dict' = {}, worker_module: 'str' = 'deprecated', worker_class: 'str' = 'distributed.Nanny', remote_python: 'str | list[str] | None' = None, **kwargs)
Deploy a Dask cluster using SSH.

The SSHCluster function deploys a Dask Scheduler and Workers for you on a set of machine addresses that you provide. The first address will be used for the scheduler and the rest will be used for the workers (repeat the first hostname if you want the scheduler and a worker to share one machine).
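For instance, a minimal sketch of the host ordering (the hostnames are placeholders for machines reachable over SSH):

>>> from dask.distributed import SSHCluster
>>> # the scheduler runs on the first address, workers on the rest
>>> cluster = SSHCluster(["scheduler-host", "worker-1", "worker-2"])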
You may configure the scheduler and workers by passing scheduler_options and worker_options dictionaries of keywords. See the dask.distributed.Scheduler and dask.distributed.Worker classes for details on the available options, but the defaults should work in most situations.
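As a sketch, assuming the Scheduler and Worker of your installed version accept these particular options (nthreads and memory_limit for workers, idle_timeout for the scheduler):

>>> from dask.distributed import SSHCluster
>>> cluster = SSHCluster(
...     ["localhost", "localhost"],
...     worker_options={"nthreads": 2, "memory_limit": "4GB"},
...     scheduler_options={"idle_timeout": "1h"},
... )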
You may configure your use of SSH itself using the connect_options keyword, which passes values to the asyncssh.connect function. For more information on these see the documentation for the asyncssh library: https://asyncssh.readthedocs.io .
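For example, a hedged sketch of common SSH settings (the port and username values here are made up; known_hosts=None disables host key checking, as in the examples below):

>>> from dask.distributed import SSHCluster
>>> cluster = SSHCluster(
...     ["localhost", "localhost"],
...     connect_options={"port": 2222, "username": "alice", "known_hosts": None},
... )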
Parameters
----------
hosts : list[str]
    List of hostnames or addresses on which to launch our cluster. The
    first will be used for the scheduler and the rest for workers.
connect_options : dict or list of dict, optional
    Keywords to pass through to asyncssh.connect. This could include
    things such as port, username, password or known_hosts. See docs
    for asyncssh.connect and asyncssh.SSHClientConnectionOptions for
    full information. If a list, it must have the same length as hosts
    (a per-host sketch is shown after this list).
worker_options : dict, optional
    Keywords to pass on to workers.
scheduler_options : dict, optional
    Keywords to pass on to scheduler.
worker_class : str
    The python class to use to create the worker(s).
remote_python : str or list of str, optional
    Path to Python on remote nodes.
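As noted under connect_options, you can supply one dict per host; a sketch using hypothetical hostnames, usernames, and remote Python path:

>>> from dask.distributed import SSHCluster
>>> cluster = SSHCluster(
...     ["host1", "host2", "host3"],
...     connect_options=[
...         {"username": "alice"},  # scheduler host
...         {"username": "bob"},    # first worker host
...         {"username": "carol"},  # second worker host
...     ],
...     remote_python="/usr/bin/python3",  # hypothetical path on the remote nodes
... )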
Examples
--------
Create a cluster with one worker:
>>> from dask.distributed import Client, SSHCluster
>>> cluster = SSHCluster(["localhost", "localhost"])
>>> client = Client(cluster)
Create a cluster with three workers, each with two threads, and host the dashboard on port 8797:
>>> from dask.distributed import Client, SSHCluster
>>> cluster = SSHCluster(
...     ["localhost", "localhost", "localhost", "localhost"],
...     connect_options={"known_hosts": None},
...     worker_options={"nthreads": 2},
...     scheduler_options={"port": 0, "dashboard_address": ":8797"},
... )
>>> client = Client(cluster)
An example using a different worker class, in particular the CUDAWorker from the dask-cuda project:
>>> from dask.distributed import Client, SSHCluster
>>> cluster = SSHCluster(
...     ["localhost", "hostwithgpus", "anothergpuhost"],
...     connect_options={"known_hosts": None},
...     scheduler_options={"port": 0, "dashboard_address": ":8797"},
...     worker_class="dask_cuda.CUDAWorker",
... )
>>> client = Client(cluster)
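Once the client is connected you can submit work and shut things down as usual; a minimal sketch:

>>> future = client.submit(lambda x: x + 1, 10)  # runs on one of the SSH workers
>>> future.result()
11
>>> client.close()
>>> cluster.close()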