persist(self, collections, optimize_graph=True, workers=None, allow_other_workers=None, resources=None, retries=None, priority=0, fifo_timeout='60s', actors=None, **kwargs)
Starts computation of the collection on the cluster in the background. Returns a new dask collection that is semantically identical to the original, but is now backed by futures that are currently executing.
Collections like dask.array, dask.dataframe, or dask.delayed objects
Whether or not to optimize the underlying graphs
A set of worker hostnames on which computations may be performed. Leave empty to default to all workers (common case)
Used with `workers`. Indicates whether or not the computations may be performed on workers that are not in the `workers` set(s).
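The interaction between `workers` and `allow_other_workers` can be sketched as follows. This is a minimal illustration, assuming a small in-process cluster created just for the demonstration; the names `addresses`, `target`, and `placed_on` are hypothetical, and worker addresses are used where the description above mentions hostnames.

```python
import dask.array as da
from dask.distributed import Client, wait

# Assumption: a throwaway in-process cluster with two single-threaded workers.
client = Client(processes=False, n_workers=2, threads_per_worker=1,
                dashboard_address=None)

# Pick one worker address to restrict the computation to.
addresses = sorted(client.scheduler_info()["workers"])
target = addresses[0]

x = da.arange(100, chunks=10)

# allow_other_workers=False keeps the restriction strict;
# True would treat `workers` as a preference instead.
xx = client.persist(x, workers=[target], allow_other_workers=False)
wait(xx)

# Map each persisted key to the workers holding it, then collect the addresses.
holders = client.who_has(xx)
placed_on = {addr for addrs in holders.values() for addr in addrs}

total = int(xx.sum().compute())
client.close()
```

Inspecting `placed_on` afterwards is one way to check where the persisted chunks ended up.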
Number of allowed automatic retries if computing a result fails
Optional prioritization of task. Zero is default. Higher priorities take precedence
Allowed amount of time between calls to consider the same priority
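As a rough sketch of how `retries` and `priority` might be passed, the example below persists two small delayed computations with different priorities. The function `square` and the variable names are hypothetical, and the in-process cluster is an assumption made only so the snippet is self-contained.

```python
import dask
from dask.distributed import Client

# Assumption: a throwaway in-process cluster with a single worker thread.
client = Client(processes=False, n_workers=1, threads_per_worker=1,
                dashboard_address=None)

@dask.delayed
def square(n):
    return n * n

low = dask.delayed(sum)([square(i) for i in range(5)])
high = dask.delayed(sum)([square(i) for i in range(5, 10)])

# Higher priority values take precedence; retries allows one automatic
# re-run of a task whose computation fails.
low_p = client.persist(low, priority=0, retries=1)
high_p = client.persist(high, priority=10, retries=1)

low_result = low_p.compute()
high_result = high_p.compute()
client.close()
```

With so little work, the scheduling order is hard to observe; the point is only where the keyword arguments go.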
Defines the `resources` each instance of this mapped task requires on the worker; e.g. {'GPU': 2}. See worker resources <resources> for details on defining resources.
Whether these tasks should exist on the worker as stateful actors. Specified on a global (True/False) or per-task ({'x': True, 'y': False}) basis. See actors for additional details.
Options to pass to the graph optimize calls
Persist dask collections on cluster
>>> xx = client.persist(x)  # doctest: +SKIP
>>> xx, yy = client.persist([x, y])  # doctest: +SKIP
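The calls above assume an existing `client`, `x`, and `y`. A self-contained sketch, assuming an in-process cluster and two illustrative dask arrays, might look like this:

```python
import dask.array as da
from dask.distributed import Client

# Assumption: a throwaway in-process cluster, so no separate scheduler is needed.
client = Client(processes=False, n_workers=1, threads_per_worker=2,
                dashboard_address=None)

x = da.ones((1000, 1000), chunks=(250, 250))  # lazy array of ones
y = (x + 1).sum(axis=0)                       # still lazy

# persist returns immediately; the chunks are computed in the background
# and the returned collections are backed by futures.
xx, yy = client.persist([x, y])

# The persisted collections behave like the originals.
total = float(yy.sum().compute())
client.close()
```

Here `xx` and `yy` are still dask arrays with the same shape and chunking as `x` and `y`, so later operations on them reuse the results already held on the cluster.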
The following pages refer to this document, either explicitly or through code examples that use it.
distributed.client.Client.normalize_collection
distributed.client.Client.publish_dataset