differential_evolution(func, bounds, args=(), strategy='best1bin', maxiter=1000, popsize=15, tol=0.01, mutation=(0.5, 1), recombination=0.7, seed=None, callback=None, disp=False, polish=True, init='latinhypercube', atol=0, updating='immediate', workers=1, constraints=(), x0=None)
Differential evolution is stochastic in nature: it does not use gradient methods to find the minimum. It can search large areas of candidate space, but often requires more function evaluations than conventional gradient-based techniques.
The algorithm is due to Storn and Price.
Differential evolution is a stochastic, population-based method that is useful for global optimization problems. At each pass through the population the algorithm mutates each candidate solution by mixing it with other candidate solutions to create a trial candidate. There are several strategies for creating trial candidates, which suit some problems more than others. The 'best1bin' strategy is a good starting point for many systems. In this strategy two members of the population are randomly chosen. Their difference is used to mutate the best member so far (the 'best' in 'best1bin'), $b_0$:
$$b' = b_0 + mutation * (population[rand0] - population[rand1])$$
A trial vector is then constructed. Starting with a randomly chosen ith parameter, the trial is sequentially filled (in modulo) with parameters from b' or the original candidate. The choice of whether to use b' or the original candidate is made with a binomial distribution (the 'bin' in 'best1bin'): a random number in [0, 1) is generated. If this number is less than the recombination constant then the parameter is loaded from b', otherwise it is loaded from the original candidate. The final parameter is always loaded from b'. Once the trial candidate is built its fitness is assessed. If the trial is better than the original candidate then it takes its place. If it is also better than the best overall candidate it replaces that too.
To improve your chances of finding a global minimum use higher popsize values, with higher mutation values (and dithering), but lower recombination values. This has the effect of widening the search radius, but slowing convergence.
By default the best solution vector is updated continuously within a single iteration (updating='immediate'). This is a modification of the original differential evolution algorithm which can lead to faster convergence, as trial vectors can immediately benefit from improved solutions. To use the original Storn and Price behaviour, updating the best solution once per iteration, set updating='deferred'.
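The 'best1bin' steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not SciPy's implementation: the function name best1bin_trial and its arguments are hypothetical, and instead of the sequential modulo fill it forces one randomly chosen parameter to come from b', which is assumed here to have the same effect.

```python
import numpy as np

def best1bin_trial(population, energies, candidate_idx, mutation, recombination, rng):
    """Build one 'best1bin' trial vector (illustrative sketch, not SciPy's code)."""
    n_pop, n_params = population.shape
    best = population[np.argmin(energies)]  # b_0, the best member so far

    # Pick two distinct members other than the candidate itself.
    others = [i for i in range(n_pop) if i != candidate_idx]
    rand0, rand1 = rng.choice(others, size=2, replace=False)

    # Mutation: b' = b_0 + mutation * (population[rand0] - population[rand1])
    mutant = best + mutation * (population[rand0] - population[rand1])

    # Binomial crossover: take each parameter from b' when a uniform draw in
    # [0, 1) is below `recombination`; one random parameter always comes from b'.
    take_from_mutant = rng.random(n_params) < recombination
    take_from_mutant[rng.integers(n_params)] = True
    return np.where(take_from_mutant, mutant, population[candidate_idx])

rng = np.random.default_rng(0)
population = rng.uniform(0, 2, size=(10, 3))   # 10 candidates, 3 parameters
energies = np.sum(population**2, axis=1)       # toy objective: sum of squares
trial = best1bin_trial(population, energies, 0, mutation=0.8, recombination=0.7, rng=rng)
print(trial.shape)
```

In a full solver the trial would then be evaluated and would replace the original candidate only if its energy is lower.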
The objective function to be minimized. Must be in the form f(x, *args), where x is the argument in the form of a 1-D array and args is a tuple of any additional fixed parameters needed to completely specify the function.
Bounds for variables. There are two ways to specify the bounds:
1. Instance of Bounds class.
2. (min, max) pairs for each element in x, defining the finite lower and upper bounds for the optimizing argument of func. It is required to have len(bounds) == len(x). len(bounds) is used to determine the number of parameters in x.
Any additional fixed parameters needed to completely specify the objective function.
The differential evolution strategy to use. Should be one of:
'best1bin'
'best1exp'
'rand1exp'
'randtobest1exp'
'currenttobest1exp'
'best2exp'
'rand2exp'
'randtobest1bin'
'currenttobest1bin'
'best2bin'
'rand2bin'
'rand1bin'
The default is 'best1bin'.
The maximum number of generations over which the entire population is evolved. The maximum number of function evaluations (with no polishing) is: (maxiter + 1) * popsize * len(x)
A multiplier for setting the total population size. The population has popsize * len(x) individuals. This keyword is overridden if an initial population is supplied via the init keyword. When using init='sobol' the population size is calculated as the next power of 2 after popsize * len(x).
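As a worked example of the arithmetic above, the following sketch computes the population size and the evaluation budget for the default settings on a 5-parameter problem. The helper names are hypothetical, and "next power of 2" is assumed here to mean the smallest power of 2 that is at least popsize * len(x).

```python
def population_size(popsize, n_params, init="latinhypercube"):
    """Total number of individuals implied by `popsize` (illustrative helper)."""
    size = popsize * n_params
    if init == "sobol":
        # Assumption: round up to the smallest power of 2 that is >= size.
        power = 1
        while power < size:
            power *= 2
        return power
    return size

def max_function_evaluations(maxiter, popsize, n_params):
    """Upper bound on evaluations without polishing: (maxiter + 1) * popsize * len(x)."""
    return (maxiter + 1) * popsize * n_params

print(population_size(15, 5))                   # 75
print(population_size(15, 5, init="sobol"))     # 128
print(max_function_evaluations(1000, 15, 5))    # 75075
```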
Relative tolerance for convergence. Solving stops when np.std(pop) <= atol + tol * np.abs(np.mean(population_energies)), where atol and tol are the absolute and relative tolerance respectively.
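The stopping rule can be written as a small predicate. This is a sketch under one assumption: the spread on the left-hand side is taken over the population energies (the objective values of all members), which is how the right-hand side is expressed. The function name has_converged is hypothetical.

```python
import numpy as np

def has_converged(population_energies, tol=0.01, atol=0.0):
    """Return True when the spread of energies is within the tolerances."""
    # Assumption: the standard deviation is computed over the energies.
    energies = np.asarray(population_energies, dtype=float)
    return bool(np.std(energies) <= atol + tol * np.abs(np.mean(energies)))

print(has_converged([1.00, 1.01, 0.99]))   # True: spread is under 1% of the mean
print(has_converged([1.0, 2.0, 3.0]))      # False: members still disagree
```

Note that when the mean energy is close to zero the relative term vanishes, which is when a non-zero atol becomes useful.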
The mutation constant. In the literature this is also known as differential weight, being denoted by F. If specified as a float it should be in the range [0, 2]. If specified as a tuple (min, max) dithering is employed. Dithering randomly changes the mutation constant on a generation by generation basis. The mutation constant for that generation is taken from U[min, max). Dithering can help speed convergence significantly. Increasing the mutation constant increases the search radius, but will slow down convergence.
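Dithering as described above amounts to drawing a fresh mutation constant once per generation. The sketch below illustrates that idea with a hypothetical helper, mutation_for_generation; it is not taken from SciPy's source.

```python
import numpy as np

def mutation_for_generation(mutation, rng):
    """Return the mutation constant F to use for one generation."""
    if np.isscalar(mutation):
        return float(mutation)        # fixed F, no dithering
    low, high = mutation
    return rng.uniform(low, high)     # dithering: F drawn from U[low, high)

rng = np.random.default_rng(42)
factors = [mutation_for_generation((0.5, 1.0), rng) for _ in range(5)]
print(factors)                               # five values in [0.5, 1.0)
print(mutation_for_generation(0.8, rng))     # 0.8
```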
The recombination constant, should be in the range [0, 1]. In the literature this is also known as the crossover probability, being denoted by CR. Increasing this value allows a larger number of mutants to progress into the next generation, but at the risk of population stability.
{None, int, numpy.random.Generator, numpy.random.RandomState}, optional
If seed is None (or np.random), the numpy.random.RandomState singleton is used. If seed is an int, a new RandomState instance is used, seeded with seed. If seed is already a Generator or RandomState instance then that instance is used. Specify seed for repeatable minimizations.
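A quick way to see the effect of seed is to run the same problem twice with the same integer. The sketch below assumes a serial run (workers=1); the seed value, the 3-parameter bounds and maxiter=100 are arbitrary choices for illustration.

```python
from scipy.optimize import differential_evolution, rosen

bounds = [(0, 2)] * 3

# Two runs with the same integer seed start from the same random state.
first = differential_evolution(rosen, bounds, seed=1234, maxiter=100)
second = differential_evolution(rosen, bounds, seed=1234, maxiter=100)

print((first.x == second.x).all(), first.fun == second.fun)
```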
Prints the evaluated func at every iteration.
A function to follow the progress of the minimization, called as callback(xk, convergence=val). xk is the best solution found so far. val represents the fractional value of the population convergence. When val is greater than one the function halts. If callback returns True, then the minimization is halted (any polishing is still carried out).
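The following sketch shows a callback that records progress and requests an early stop. It assumes the callback is invoked once per generation with the best vector and the convergence value (named convergence here); the function name record_progress and the three-generation cut-off are hypothetical choices for illustration.

```python
from scipy.optimize import differential_evolution, rosen

history = []

def record_progress(xk, convergence):
    """Store the best vector and convergence value; halt after 3 generations."""
    history.append((xk.copy(), convergence))
    return len(history) >= 3    # returning True halts the minimization

result = differential_evolution(rosen, [(0, 2)] * 3,
                                callback=record_progress, seed=1)
print(len(history), result.message)
```

Because polishing is still carried out after the halt, result.x may be better than the last vector stored in history.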
If True (default), then scipy.optimize.minimize with the L-BFGS-B method is used to polish the best population member at the end, which can improve the minimization slightly. If a constrained problem is being studied then the trust-constr method is used instead.
Specify which type of population initialization is performed. Should be one of:
'latinhypercube'
'sobol'
'halton'
'random'
array specifying the initial population. The array should have shape (M, len(x)), where M is the total population size and len(x) is the number of parameters. init is clipped to bounds before use.
The default is 'latinhypercube'. Latin Hypercube sampling tries to maximize coverage of the available parameter space.
'sobol' and 'halton' are superior alternatives and cover the parameter space even more thoroughly. 'sobol' will enforce an initial population size which is calculated as the next power of 2 after popsize * len(x). 'halton' has no requirements but is a bit less efficient. See scipy.stats.qmc for more details.
'random' initializes the population randomly. This has the drawback that clustering can occur, preventing the whole of parameter space being covered. An array can be used to specify a population, for example, to create a tight bunch of initial guesses in a location where the solution is known to exist, thereby reducing time for convergence.
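A tight bunch of initial guesses can be built with NumPy before calling the solver. This is a minimal sketch: the guess location, the spread of 0.05 and the population size of 20 are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
lower = np.array([0.0, 0.0])
upper = np.array([2.0, 2.0])
guess = np.array([1.0, 1.0])   # hypothetical region where the solution is expected

# M = 20 individuals scattered tightly around the guess, shape (M, len(x)).
init_population = guess + 0.05 * rng.standard_normal((20, 2))

# Keep every individual inside the bounds, mirroring the clipping described above.
init_population = np.clip(init_population, lower, upper)

print(init_population.shape)   # (20, 2)
# The array could then be supplied as differential_evolution(..., init=init_population).
```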
Absolute tolerance for convergence. Solving stops when np.std(pop) <= atol + tol * np.abs(np.mean(population_energies)), where atol and tol are the absolute and relative tolerance respectively.
If 'immediate', the best solution vector is continuously updated within a single generation. This can lead to faster convergence as trial vectors can take advantage of continuous improvements in the best solution. With 'deferred', the best solution vector is updated once per generation. Only 'deferred' is compatible with parallelization, and the workers keyword can override this option.
If workers is an int the population is subdivided into workers sections and evaluated in parallel (uses multiprocessing.Pool). Supply -1 to use all available CPU cores. Alternatively supply a map-like callable, such as multiprocessing.Pool.map, for evaluating the population in parallel. This evaluation is carried out as workers(func, iterable). This option will override the updating keyword to updating='deferred' if workers != 1. Requires that func be pickleable.
Constraints on the solver, over and above those applied by the bounds keyword. Uses the approach by Lampinen.
Provides an initial guess to the minimization. Once the population has been initialized this vector replaces the first (best) member. This replacement is done even if init is given an initial population.
The optimization result represented as an OptimizeResult object. Important attributes are: x, the solution array; success, a Boolean flag indicating if the optimizer exited successfully; and message, which describes the cause of the termination. See OptimizeResult for a description of other attributes. If polish was employed, and a lower minimum was obtained by the polishing, then OptimizeResult also contains the jac attribute. If the eventual solution does not satisfy the applied constraints, success will be False.
Finds the global minimum of a multivariate function.
Let us consider the problem of minimizing the Rosenbrock function. This function is implemented in rosen in scipy.optimize.
>>> from scipy.optimize import rosen, differential_evolution
>>> bounds = [(0, 2), (0, 2), (0, 2), (0, 2), (0, 2)]
>>> result = differential_evolution(rosen, bounds)
>>> result.x, result.fun
(array([1., 1., 1., 1., 1.]), 1.9216496320061384e-19)
Now repeat, but with parallelization.
>>> bounds = [(0, 2), (0, 2), (0, 2), (0, 2), (0, 2)]
>>> result = differential_evolution(rosen, bounds, updating='deferred',
...                                 workers=2)
>>> result.x, result.fun
(array([1., 1., 1., 1., 1.]), 1.9216496320061384e-19)
Let's try a constrained minimization.
>>> import numpy as np
>>> from scipy.optimize import NonlinearConstraint, Bounds
>>> def constr_f(x):
...     return np.array(x[0] + x[1])
>>> # the sum of x[0] and x[1] must be less than 1.9
>>> nlc = NonlinearConstraint(constr_f, -np.inf, 1.9)
>>> # specify limits using a `Bounds` object.
>>> bounds = Bounds([0., 0.], [2., 2.])
>>> result = differential_evolution(rosen, bounds, constraints=(nlc,),
...                                 seed=1)
>>> result.x, result.fun
(array([0.96633867, 0.93363577]), 0.0011361355854792312)
Next find the minimum of the Ackley function (https://en.wikipedia.org/wiki/Test_functions_for_optimization).
>>> from scipy.optimize import differential_evolution
>>> import numpy as np
>>> def ackley(x):
...     arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2))
...     arg2 = 0.5 * (np.cos(2. * np.pi * x[0]) + np.cos(2. * np.pi * x[1]))
...     return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e
>>> bounds = [(-5, 5), (-5, 5)]
>>> result = differential_evolution(ackley, bounds)
>>> result.x, result.fun
(array([ 0., 0.]), 4.4408920985006262e-16)