dual_annealing(func, bounds, args=(), maxiter=1000, minimizer_kwargs=None, initial_temp=5230.0, restart_temp_ratio=2e-05, visit=2.62, accept=-5.0, maxfun=10000000.0, seed=None, no_local_search=False, callback=None, x0=None, local_search_options=None)
This function implements the Dual Annealing optimization. This stochastic approach combines the generalization of CSA (Classical Simulated Annealing) and FSA (Fast Simulated Annealing), coupled with a strategy for applying a local search on accepted locations. Alternative implementations of this algorithm, together with benchmarks, have been described in the literature. This approach introduces an advanced method to refine the solution found by the generalized annealing process. The algorithm uses a distorted Cauchy-Lorentz visiting distribution, with its shape controlled by the parameter $q_{v}$:
$$g_{q_{v}}(\Delta x(t)) \propto \frac{\left[T_{q_{v}}(t)\right]^{-\frac{D}{3-q_{v}}}}{\left[1+(q_{v}-1)\frac{(\Delta x(t))^{2}}{\left[T_{q_{v}}(t)\right]^{\frac{2}{3-q_{v}}}}\right]^{\frac{1}{q_{v}-1}+\frac{D-1}{2}}}$$
where $t$ is the artificial time and $D$ the dimension. This visiting distribution is used to generate a trial jump distance $\Delta x(t)$ of variable $x(t)$ under artificial temperature $T_{q_{v}}(t)$.
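As a rough illustration, the unnormalized density can be evaluated directly from the formula above. This is only a sketch of the equation, not SciPy's internal implementation; the helper name `visiting_density` and the fixed temperature value are hypothetical.

```python
def visiting_density(dx, T, qv=2.62, D=1):
    """Unnormalized distorted Cauchy-Lorentz visiting density g_{qv}
    at jump distance dx under temperature T (hypothetical helper)."""
    num = T ** (-D / (3.0 - qv))
    den = (1.0 + (qv - 1.0) * dx ** 2
           / T ** (2.0 / (3.0 - qv))) ** (1.0 / (qv - 1.0) + (D - 1.0) / 2.0)
    return num / den

# A larger qv gives the distribution a heavier tail, so distant jumps
# are relatively more likely than with a smaller qv:
ratio_heavy = visiting_density(50.0, T=6.0, qv=2.62) / visiting_density(0.0, T=6.0, qv=2.62)
ratio_light = visiting_density(50.0, T=6.0, qv=1.5) / visiting_density(0.0, T=6.0, qv=1.5)
```

Here the ratio of the density at a distant point to the density at the origin is much closer to 1 for the heavier-tailed `qv = 2.62` than for `qv = 1.5`.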
From the starting point, after calling the visiting distribution function, the acceptance probability is computed as follows:
$$p_{q_{a}} = \min\left\{1,\left[1-(1-q_{a})\,\beta \Delta E \right]^{\frac{1}{1-q_{a}}}\right\}$$
where $q_{a}$ is an acceptance parameter. For $q_{a}<1$, zero acceptance probability is assigned to the cases where
$$\left[1-(1-q_{a})\,\beta \Delta E\right] < 0$$
The artificial temperature $T_{q_{v}}(t)$ is decreased according to
$$T_{q_{v}}(t) = T_{q_{v}}(1) \frac{2^{q_{v}-1}-1}{\left(1 + t\right)^{q_{v}-1}-1}$$
where $q_{v}$ is the visiting parameter.
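Both the acceptance probability and the temperature schedule are straightforward to sketch in code. The helper names below are hypothetical and the code is only an illustration of the two equations above, not SciPy's internal implementation.

```python
def acceptance_probability(dE, T, qa=-5.0):
    """Generalized acceptance probability p_{qa} for an energy change
    dE at temperature T, with beta = 1/T (hypothetical helper)."""
    arg = 1.0 - (1.0 - qa) * dE / T
    if arg < 0.0:
        # For qa < 1, zero acceptance probability is assigned here.
        return 0.0
    return min(1.0, arg ** (1.0 / (1.0 - qa)))

def temperature(t, qv=2.62, T1=5230.0):
    """Artificial temperature T_{qv}(t), decreasing with time t >= 1."""
    return T1 * (2.0 ** (qv - 1.0) - 1.0) / ((1.0 + t) ** (qv - 1.0) - 1.0)
```

An improving move (`dE < 0`) is always accepted, a strongly worsening move at low temperature is rejected outright, and the schedule starts at `temperature(1) == T1` and decays with `t`.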
func : callable
    The objective function to be minimized. Must be in the form ``f(x, *args)``, where ``x`` is the argument in the form of a 1-D array and ``args`` is a tuple of any additional fixed parameters needed to completely specify the function.
bounds : sequence
    Bounds for variables. ``(min, max)`` pairs for each element in ``x``, defining bounds for the objective function parameter.
args : tuple, optional
    Any additional fixed parameters needed to completely specify the objective function.
maxiter : int, optional
    The maximum number of global search iterations. Default value is 1000.
minimizer_kwargs : dict, optional
    Extra keyword arguments to be passed to the local minimizer (`minimize`). Some important options could be: ``method`` for the minimizer method to use and ``args`` for objective function additional arguments.
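For example, one plausible use of `minimizer_kwargs` is to pin the local minimizer to L-BFGS-B and forward the same bounds to it. This is a sketch; the choice of method, bounds forwarding, seed, and iteration count here are illustrative, not required.

```python
import numpy as np
from scipy.optimize import dual_annealing

# 2-D Rastrigin function, global minimum f = 0 at the origin.
func = lambda x: np.sum(x * x - 10 * np.cos(2 * np.pi * x)) + 10 * np.size(x)
bounds = [(-5.12, 5.12)] * 2

ret = dual_annealing(
    func,
    bounds=bounds,
    minimizer_kwargs={"method": "L-BFGS-B", "bounds": bounds},
    seed=42,
    maxiter=200,
)
```

Passing ``bounds`` inside ``minimizer_kwargs`` keeps the local search inside the global search box; otherwise out-of-bounds local results are simply discarded.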
initial_temp : float, optional
    The initial temperature; use higher values to facilitate a wider search of the energy landscape, allowing dual_annealing to escape local minima in which it is trapped. Default value is 5230. Range is (0.01, 5.e4].
restart_temp_ratio : float, optional
    During the annealing process the temperature is decreasing; when it reaches ``initial_temp * restart_temp_ratio``, the reannealing process is triggered. Default value of the ratio is 2e-5. Range is (0, 1).
visit : float, optional
    Parameter for the visiting distribution. Default value is 2.62. Higher values give the visiting distribution a heavier tail, which makes the algorithm jump to more distant regions. The value range is (1, 3].
accept : float, optional
    Parameter for the acceptance distribution. It is used to control the probability of acceptance. The lower the acceptance parameter, the smaller the probability of acceptance. Default value is -5.0 with a range (-1e4, -5].
maxfun : int, optional
    Soft limit for the number of objective function calls. If the algorithm is in the middle of a local search when this number is exceeded, the algorithm will stop just after the local search is done. Default value is 1e7.
seed : {None, int, `numpy.random.Generator`, `numpy.random.RandomState`}, optional
    If `seed` is None (or `np.random`), the `numpy.random.RandomState` singleton is used. If `seed` is an int, a new ``RandomState`` instance is used, seeded with `seed`. If `seed` is already a ``Generator`` or ``RandomState`` instance, then that instance is used. Specify `seed` for repeatable minimizations. The random numbers generated with this seed only affect the visiting distribution function and the generation of new coordinates.
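A quick reproducibility check can illustrate this. The test function, seed value, and iteration count below are illustrative choices, not part of the API.

```python
import numpy as np
from scipy.optimize import dual_annealing

# 2-D Rastrigin function, global minimum f = 0 at the origin.
func = lambda x: np.sum(x * x - 10 * np.cos(2 * np.pi * x)) + 10 * np.size(x)
bounds = [(-5.12, 5.12)] * 2

# Two runs with the same integer seed follow the same visiting sequence
# and therefore return the same result.
res1 = dual_annealing(func, bounds=bounds, seed=1234, maxiter=100)
res2 = dual_annealing(func, bounds=bounds, seed=1234, maxiter=100)
```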
no_local_search : bool, optional
    If `no_local_search` is set to True, a traditional Generalized Simulated Annealing will be performed with no local search strategy applied.
callback : callable, optional
    A callback function with signature ``callback(x, f, context)``, which will be called for all minima found. ``x`` and ``f`` are the coordinates and function value of the latest minimum found, and ``context`` has a value in [0, 1, 2], with the following meaning:

        - 0: minimum detected in the annealing process.
        - 1: detection occurred in the local search process.
        - 2: detection done in the dual annealing process.

    If the callback implementation returns True, the algorithm will stop.
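A minimal callback sketch is shown below; the recorder function, its name, and the early-stop threshold are illustrative.

```python
import numpy as np
from scipy.optimize import dual_annealing

# 2-D Rastrigin function, global minimum f = 0 at the origin.
func = lambda x: np.sum(x * x - 10 * np.cos(2 * np.pi * x)) + 10 * np.size(x)

minima = []  # every (x, f, context) reported by the solver

def record_and_stop(x, f, context):
    minima.append((np.copy(x), f, context))
    # Returning True asks dual_annealing to stop early.
    return f < 1e-8

ret = dual_annealing(func, bounds=[(-5.12, 5.12)] * 2,
                     seed=42, callback=record_and_stop)
```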
x0 : ndarray, shape(n,), optional
    Coordinates of a single N-D starting point.
local_search_options : dict, optional
    Backwards-compatible flag for `minimizer_kwargs`; only one of these should be supplied.
res : OptimizeResult
    The optimization result represented as an `OptimizeResult` object. Important attributes are: ``x`` the solution array, ``fun`` the value of the function at the solution, and ``message`` which describes the cause of the termination. See `OptimizeResult` for a description of other attributes.
Find the global minimum of a function using Dual Annealing.
The following example is a 10-D problem with many local minima. The function involved is called Rastrigin (https://en.wikipedia.org/wiki/Rastrigin_function).

>>> import numpy as np
>>> from scipy.optimize import dual_annealing
>>> func = lambda x: np.sum(x*x - 10*np.cos(2*np.pi*x)) + 10*np.size(x)
>>> lw = [-5.12] * 10
>>> up = [5.12] * 10
>>> ret = dual_annealing(func, bounds=list(zip(lw, up)))
>>> ret.x
array([-4.26437714e-09, -3.91699361e-09, -1.86149218e-09, -3.97165720e-09,
       -6.29151648e-09, -6.53145322e-09, -3.93616815e-09, -6.55623025e-09,
       -6.05775280e-09, -5.00668935e-09])  # random
>>> ret.fun
0.000000