Document

distributed 2021.10.0

.. attribute:: key: str
    The key is the unique identifier of a task, generally formed
    from the name of the function, followed by a hash of the function
    and arguments, like ``'inc-ab31c010444977004d656610d2d421ec'``.

.. attribute:: prefix: TaskPrefix
    The broad class of tasks to which this task belongs like "inc" or
    "read_csv"

.. attribute:: run_spec: object
    A specification of how to run the task.  The type and meaning of this
    value is opaque to the scheduler, as it is only interpreted by the
    worker to which the task is sent for executing.

    As a special case, this attribute may also be ``None``, in which case
    the task is "pure data" (such as, for example, a piece of data loaded
    in the scheduler using :meth:`Client.scatter`).  A "pure data" task
    cannot be computed again if its value is lost.

.. attribute:: priority: tuple
    The priority provides each task with a relative ranking which is used
    to break ties when many tasks are being considered for execution.

    This ranking is generally a 2-item tuple.  The first (and dominant)
    item corresponds to when it was submitted.  Generally, earlier tasks
    take precedence.  The second item is determined by the client, and is
    a way to prioritize tasks within a large graph that may be important,
    such as if they are on the critical path, or good to run in order to
    release many dependencies.  This is explained further in
    :doc:`Scheduling Policy <scheduling-policies>`.

.. attribute:: state: str
    This task's current state.  Valid states include ``released``,
    ``waiting``, ``no-worker``, ``processing``, ``memory``, ``erred``
    and ``forgotten``.  If it is ``forgotten``, the task isn't stored
    in the ``tasks`` dictionary anymore and will probably disappear
    soon from memory.

.. attribute:: dependencies: {TaskState}
    The set of tasks this task depends on for proper execution.  Only
    tasks still alive are listed in this set.  If, for whatever reason,
    this task also depends on a forgotten task, the
    :attr:`has_lost_dependencies` flag is set.

    A task can only be executed once all its dependencies have already
    been successfully executed and have their result stored on at least
    one worker.  This is tracked by progressively draining the
    :attr:`waiting_on` set.

.. attribute:: dependents: {TaskState}
    The set of tasks which depend on this task.  Only tasks still alive
    are listed in this set.

    This is the reverse mapping of :attr:`dependencies`.

.. attribute:: has_lost_dependencies: bool
    Whether any of the dependencies of this task has been forgotten.
    For memory consumption reasons, forgotten tasks are not kept in
    memory even though they may have dependent tasks.  When a task is
    forgotten, therefore, each of its dependents has their
    :attr:`has_lost_dependencies` attribute set to ``True``.

    If :attr:`has_lost_dependencies` is true, this task cannot go
    into the "processing" state anymore.

.. attribute:: waiting_on: {TaskState}
    The set of tasks this task is waiting on *before* it can be executed.
    This is always a subset of :attr:`dependencies`.  Each time one of the
    dependencies has finished processing, it is removed from the
    :attr:`waiting_on` set.

    Once :attr:`waiting_on` becomes empty, this task can move from the
    "waiting" state to the "processing" state (unless one of the
    dependencies errored out, in which case this task is instead
    marked "erred").

.. attribute:: waiters: {TaskState}
    The set of tasks which need this task to remain alive.  This is always
    a subset of :attr:`dependents`.  Each time one of the dependents
    has finished processing, it is removed from the :attr:`waiters`
    set.

    Once both :attr:`waiters` and :attr:`who_wants` become empty, this
    task can be released (if it has a non-empty :attr:`run_spec`) or
    forgotten (otherwise) by the scheduler, and by any workers
    in :attr:`who_has`.

    .. note:: Counter-intuitively, :attr:`waiting_on` and
       :attr:`waiters` are not reverse mappings of each other.

.. attribute:: who_wants: {ClientState}
    The set of clients who want this task's result to remain alive.
    This is the reverse mapping of :attr:`ClientState.wants_what`.

    When a client submits a graph to the scheduler it also specifies
    which output tasks it desires, such that their results are not released
    from memory.

    Once a task has finished executing (i.e. moves into the "memory"
    or "erred" state), the clients in :attr:`who_wants` are notified.

    Once both :attr:`waiters` and :attr:`who_wants` become empty, this
    task can be released (if it has a non-empty :attr:`run_spec`) or
    forgotten (otherwise) by the scheduler, and by any workers
    in :attr:`who_has`.

.. attribute:: who_has: {WorkerState}
    The set of workers who have this task's result in memory.
    It is non-empty iff the task is in the "memory" state.  There can be
    more than one worker in this set if, for example, :meth:`Client.scatter`
    or :meth:`Client.replicate` was used.

    This is the reverse mapping of :attr:`WorkerState.has_what`.

.. attribute:: processing_on: WorkerState (or None)
    If this task is in the "processing" state, which worker is currently
    processing it.  Otherwise this is ``None``.

    This attribute is kept in sync with :attr:`WorkerState.processing`.

.. attribute:: retries: int
    The number of times this task can automatically be retried in case
    of failure.  If a task fails executing (the worker returns with
    an error), its :attr:`retries` attribute is checked.  If it is
    equal to 0, the task is marked "erred".  If it is greater than 0,
    the :attr:`retries` attribute is decremented and execution is
    attempted again.

.. attribute:: nbytes: int (or None)
    The number of bytes, as determined by ``sizeof``, of the result
    of a finished task.  This number is used for diagnostics and to
    help prioritize work.

.. attribute:: type: str
    The type of the object as a string.  Only present for tasks that have
    been computed.

.. attribute:: exception: object
    If this task failed executing, the exception object is stored here.
    Otherwise this is ``None``.

.. attribute:: traceback: object
    If this task failed executing, the traceback object is stored here.
    Otherwise this is ``None``.

.. attribute:: exception_blame: TaskState (or None)
    If this task or one of its dependencies failed executing, the
    failed task is stored here (possibly itself).  Otherwise this
    is ``None``.

.. attribute:: erred_on: set(str)
    Worker addresses on which errors appeared causing this task to be in an error state.

.. attribute:: suspicious: int
    The number of times this task has been involved in a worker death.

    Some tasks may cause workers to die (such as calling ``os._exit(0)``).
    When a worker dies, all of the tasks on that worker are reassigned
    to others.  This combination of behaviors can cause a bad task to
    catastrophically destroy all workers on the cluster, one after
    another.  Whenever a worker dies, we mark each task currently
    processing on that worker (as recorded by
    :attr:`WorkerState.processing`) as suspicious.

    If a task is involved in three deaths (or some other fixed constant)
    then we mark the task as ``erred``.

.. attribute:: host_restrictions: {hostnames}
    A set of hostnames where this task can be run (or ``None`` if empty).
    Usually this is empty unless the task has been specifically restricted
    to only run on certain hosts.  A hostname may correspond to one or
    several connected workers.

.. attribute:: worker_restrictions: {worker addresses}
    A set of complete worker addresses where this can be run (or ``None``
    if empty).  Usually this is empty unless the task has been specifically
    restricted to only run on certain workers.

    Note this is tracking worker addresses, not worker states, since
    the specific workers may not be connected at this time.

.. attribute:: resource_restrictions: {resource: quantity}
    Resources required by this task, such as ``{'gpu': 1}`` or
    ``{'memory': 1e9}`` (or ``None`` if empty).  These are user-defined
    names and are matched against the contents of each
    :attr:`WorkerState.resources` dictionary.

.. attribute:: loose_restrictions: bool
    If ``False``, each of :attr:`host_restrictions`,
    :attr:`worker_restrictions` and :attr:`resource_restrictions` is
    a hard constraint: if no worker is available satisfying those
    restrictions, the task cannot go into the "processing" state and
    will instead go into the "no-worker" state.

    If ``True``, the above restrictions are mere preferences: if no worker
    is available satisfying those restrictions, the task can still go
    into the "processing" state and be sent for execution to another
    connected worker.

.. attribute:: metadata: dict
    Metadata related to task.

.. attribute:: actor: bool
    Whether or not this task is an Actor.

.. attribute:: group: TaskGroup
    The group of tasks to which this one belongs.

.. attribute:: annotations: dict
    Task annotations

A simple object holding information about a task.

Examples

See :

Back References

The following pages refer to to this document either explicitly or contain code examples using this.

distributed.scheduler.Computation

Local connectivity graph

Hover to see nodes names; edges to Self not shown, Caped at 50 nodes.

Using a canvas is more power efficient and can get hundred of nodes ; but does not allow hyperlinks; , arrows or text (beyond on hover)

SVG is more flexible but power hungry; and does not scale well to 50 + nodes.

All aboves nodes referred to, (or are referred from) current nodes; Edges from Self to other have been omitted (or all nodes would be connected to the central node "self" which is not useful). Nodes are colored by the library they belong to, and scaled with the number of references pointing them

File: /distributed/scheduler.py#1142
type: <class 'type'>
Commit: