This semaphore will track leases on the scheduler which can be acquired and released by an instance of this class. If the maximum amount of leases are already acquired, it is not possible to acquire more and the caller waits until another lease has been released.
The lifetime or leases are controlled using a timeout. This timeout is refreshed in regular intervals by the Client
of this instance and provides protection from deadlocks or resource starvation in case of worker failure. The timeout can be controlled using the configuration option distributed.scheduler.locks.lease-timeout
and the interval in which the scheduler verifies the timeout is set using the option distributed.scheduler.locks.lease-validation-interval
.
A noticeable difference to the Semaphore of the python standard library is that this implementation does not allow to release more often than it was acquired. If this happens, a warning is emitted but the internal state is not modified.
This implementation is still in an experimental state and subtle changes in behavior may occur without any change in the major version of this library.
This implementation is susceptible to lease overbooking in case of lease timeouts. It is advised to monitor log information and adjust above configuration options to suitable values for the user application.
If a client attempts to release the semaphore but doesn't have a lease acquired, this will raise an exception.
When a semaphore is closed, if, for that closed semaphore, a client attempts to:
Acquire a lease: an exception will be raised.
Release: a warning will be logged.
Close: nothing will happen.
dask executes functions by default assuming they are pure, when using semaphore acquire/releases inside such a function, it must be noted that there are in fact side-effects, thus, the function can no longer be considered pure. If this is not taken into account, this may lead to unexpected behavior.
The maximum amount of leases that may be granted at the same time. This effectively sets an upper limit to the amount of parallel access to a specific resource. Defaults to 1.
Name of the semaphore to acquire. Choosing the same name allows two disconnected processes to coordinate. If not given, a random name will be generated.
If True, register the semaphore with the scheduler. This needs to be done before any leases can be acquired. If not done during initialization, this can also be done by calling the register method of this class. When registering, this needs to be awaited.
The ConnectionPool to connect to the scheduler. If None is provided, it uses the worker or client pool. This paramter is mostly used for testing.
The event loop this instance is using. If None is provided, reuse the loop of the active worker or client.
Semaphore
>>> from distributed import SemaphoreSee :
... sem = Semaphore(max_leases=2, name='my_database') ... ... def access_resource(s, sem): ... # This automatically acquires a lease from the semaphore (if available) which will be ... # released when leaving the context manager. ... with sem: ... pass ... ... futures = client.map(access_resource, range(10), sem=sem) ... client.gather(futures) ... # Once done, close the semaphore to clean up the state on scheduler side. ... sem.close()
The following pages refer to to this document either explicitly or contain code examples using this.
distributed.semaphore.Semaphore
Hover to see nodes names; edges to Self not shown, Caped at 50 nodes.
Using a canvas is more power efficient and can get hundred of nodes ; but does not allow hyperlinks; , arrows or text (beyond on hover)
SVG is more flexible but power hungry; and does not scale well to 50 + nodes.
All aboves nodes referred to, (or are referred from) current nodes; Edges from Self to other have been omitted (or all nodes would be connected to the central node "self" which is not useful). Nodes are colored by the library they belong to, and scaled with the number of references pointing them