scipy 1.8.0 Pypi GitHub Homepage
cdist(XA, XB, metric='euclidean', *, out=None, **kwargs)

See Notes for common calling conventions.

Notes

The following are common calling conventions:

  1. Y = cdist(XA, XB, 'euclidean')

    Computes the distance between each point of XA and each point of XB using Euclidean distance (2-norm) as the distance metric. The points are arranged as $n$-dimensional row vectors: $m_A$ rows in the matrix XA and $m_B$ rows in the matrix XB.

  2. Y = cdist(XA, XB, 'minkowski', p=2.)

    Computes the distances using the Minkowski distance $\|u-v\|_p$ ( $p$ -norm) where $p > 0$ (note that this is only a quasi-metric if $0 < p < 1$ ).

  3. Y = cdist(XA, XB, 'cityblock')

    Computes the city block or Manhattan distance between the points.

  4. Y = cdist(XA, XB, 'seuclidean', V=None)

    Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is

    $$\sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}.$$

    V is the variance vector; V[i] is the variance computed over all the i'th components of the points. If not passed, it is automatically computed.

  5. Y = cdist(XA, XB, 'sqeuclidean')

    Computes the squared Euclidean distance $\|u-v\|_2^2$ between the vectors.

  6. Y = cdist(XA, XB, 'cosine')

    Computes the cosine distance between vectors u and v,

    $$1 - \frac{u \cdot v} {{\|u\|}_2 {\|v\|}_2}$$

    where $\|*\|_2$ is the 2-norm of its argument * , and $u \cdot v$ is the dot product of $u$ and $v$ .

  7. Y = cdist(XA, XB, 'correlation')

    Computes the correlation distance between vectors u and v. This is

    $$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})} {{\|(u - \bar{u})\|}_2 {\|(v - \bar{v})\|}_2}$$

    where $\bar{v}$ is the mean of the elements of vector v, and $x \cdot y$ is the dot product of $x$ and $y$ .

  8. Y = cdist(XA, XB, 'hamming')

    Computes the normalized Hamming distance, that is, the proportion of vector elements in which two n-vectors u and v disagree. To save memory, the matrices XA and XB can be of type boolean.

  9. Y = cdist(XA, XB, 'jaccard')

    Computes the Jaccard distance between the points. Given two vectors, u and v , the Jaccard distance is the proportion of those elements u[i] and v[i] that disagree where at least one of them is non-zero.

  10. Y = cdist(XA, XB, 'jensenshannon')

    Computes the Jensen-Shannon distance between two probability arrays. Given two probability vectors, $p$ and $q$ , the Jensen-Shannon distance is

    $$\sqrt{\frac{D(p \parallel m) + D(q \parallel m)}{2}}$$

    where $m$ is the pointwise mean of $p$ and $q$ and $D$ is the Kullback-Leibler divergence.

  11. Y = cdist(XA, XB, 'chebyshev')

    Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors u and v is the maximum norm-1 distance between their respective elements. More precisely, the distance is given by

    $$d(u,v) = \max_i {|u_i-v_i|}.$$

  12. Y = cdist(XA, XB, 'canberra')

    Computes the Canberra distance between the points. The Canberra distance between two points u and v is

    $$d(u,v) = \sum_i \frac{|u_i-v_i|} {|u_i|+|v_i|}.$$

  13. Y = cdist(XA, XB, 'braycurtis')

    Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points u and v is

    $$d(u,v) = \frac{\sum_i (|u_i-v_i|)} {\sum_i (|u_i+v_i|)}$$

  14. Y = cdist(XA, XB, 'mahalanobis', VI=None)

    Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points u and v is $\sqrt{(u-v)(1/V)(u-v)^T}$ where $(1/V)$ (the VI variable) is the inverse covariance. If VI is not None, VI will be used as the inverse covariance matrix.

  15. Y = cdist(XA, XB, 'yule')

    Computes the Yule distance between the boolean vectors. (see yule function documentation)

  16. Y = cdist(XA, XB, 'matching')

    Synonym for 'hamming'.

  17. Y = cdist(XA, XB, 'dice')

    Computes the Dice distance between the boolean vectors. (see dice function documentation)

  18. Y = cdist(XA, XB, 'kulsinski')

    Computes the Kulsinski distance between the boolean vectors. (see kulsinski function documentation)

  19. Y = cdist(XA, XB, 'rogerstanimoto')

    Computes the Rogers-Tanimoto distance between the boolean vectors. (see rogerstanimoto function documentation)

  20. Y = cdist(XA, XB, 'russellrao')

    Computes the Russell-Rao distance between the boolean vectors. (see russellrao function documentation)

  21. Y = cdist(XA, XB, 'sokalmichener')

    Computes the Sokal-Michener distance between the boolean vectors. (see sokalmichener function documentation)

  22. Y = cdist(XA, XB, 'sokalsneath')

    Computes the Sokal-Sneath distance between the vectors. (see sokalsneath function documentation)

  23. Y = cdist(XA, XB, f)

    Computes the distance between all pairs of vectors, one taken from XA and one from XB, using the user-supplied 2-arity function f. For example, Euclidean distance between the vectors could be computed as follows:

    dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))

    Note that you should avoid passing a reference to one of the distance functions defined in this library. For example,

    dm = cdist(XA, XB, sokalsneath)

    would calculate the pairwise distances between the vectors in XA and XB using the Python function sokalsneath. This would result in sokalsneath being called once per pair, that is $m_A m_B$ times, which is inefficient. Instead, the optimized C version is more efficient, and we call it using the following syntax:

    dm = cdist(XA, XB, 'sokalsneath')
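A few of the calling conventions above can be cross-checked against each other. The following is a minimal sketch, assuming small random inputs (the array shapes and the seed are illustrative choices, not part of the documentation); it checks that 'minkowski' with p=2 agrees with 'euclidean', that 'sqeuclidean' is its square, and that a user-supplied callable gives the same result as the string form.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative inputs: 4 and 5 points in 3-D (shapes and seed are arbitrary).
rng = np.random.default_rng(0)
XA = rng.random((4, 3))
XB = rng.random((5, 3))

d_euclid = cdist(XA, XB, 'euclidean')
d_minkowski = cdist(XA, XB, 'minkowski', p=2.)
d_sq = cdist(XA, XB, 'sqeuclidean')
d_city = cdist(XA, XB, 'cityblock')
d_cheb = cdist(XA, XB, 'chebyshev')

# Callable form: works, but runs a Python function once per pair of rows.
d_callable = cdist(XA, XB, lambda u, v: np.sqrt(((u - v) ** 2).sum()))

# Minkowski with p=2 is the Euclidean distance; sqeuclidean is its square.
assert np.allclose(d_minkowski, d_euclid)
assert np.allclose(d_sq, d_euclid ** 2)
assert np.allclose(d_callable, d_euclid)

# Manual reference via broadcasting: |u_i - v_i| for every pair of rows.
abs_diff = np.abs(XA[:, None, :] - XB[None, :, :])
assert np.allclose(d_city, abs_diff.sum(axis=-1))   # city block: sum
assert np.allclose(d_cheb, abs_diff.max(axis=-1))   # Chebyshev: max
```

The callable is included only for comparison; for metrics that have a string name, the string form is the one recommended above.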

Parameters

XA : array_like

An $m_A$ by $n$ array of $m_A$ original observations in an $n$ -dimensional space. Inputs are converted to float type.

XB : array_like

An $m_B$ by $n$ array of $m_B$ original observations in an $n$ -dimensional space. Inputs are converted to float type.

metric : str or callable, optional

The distance metric to use. If a string, the distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'kulczynski1', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.

**kwargs : dict, optional

Extra arguments to metric: refer to each metric's documentation for a list of all possible arguments.

Some possible arguments, as used in the Notes, are p (for 'minkowski'), V (for 'seuclidean'), and VI (for 'mahalanobis').
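As an illustration of metric-specific keyword arguments, the sketch below passes V to 'seuclidean' and VI to 'mahalanobis' and compares the results with the formulas given in the Notes. Estimating V and VI from the stacked rows of XA and XB is an assumption made here for the example; any variance vector or inverse covariance matrix of matching size could be supplied instead.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
XA = rng.random((6, 3))
XB = rng.random((4, 3))

# Assumption for this sketch: estimate the statistics from all rows together.
stacked = np.vstack([XA, XB])
V = stacked.var(axis=0, ddof=1)          # per-component variance vector
VI = np.linalg.inv(np.cov(stacked.T))    # inverse covariance matrix

d_seu = cdist(XA, XB, 'seuclidean', V=V)
d_mah = cdist(XA, XB, 'mahalanobis', VI=VI)

# Reference values computed directly from the formulas.
diff = XA[:, None, :] - XB[None, :, :]   # shape (6, 4, 3)
manual_seu = np.sqrt((diff ** 2 / V).sum(axis=-1))
manual_mah = np.sqrt(np.einsum('ijk,kl,ijl->ij', diff, VI, diff))

assert np.allclose(d_seu, manual_seu)
assert np.allclose(d_mah, manual_mah)
```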

Raises

ValueError

An exception is thrown if XA and XB do not have the same number of columns.
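The column-count check can be observed with two arrays of deliberately mismatched width. This is a small sketch; the zero-filled arrays are placeholders chosen only to trigger the error.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.zeros((3, 2))   # 3 observations in 2-D
XB = np.zeros((4, 3))   # 4 observations in 3-D: column counts differ

try:
    cdist(XA, XB, 'euclidean')
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)   # keep the message for inspection

assert raised
```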

Returns

Y : ndarray

An $m_A$ by $m_B$ distance matrix is returned. For each $i$ and $j$, the metric dist(u=XA[i], v=XB[j]) is computed and stored in the $ij$-th entry.
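The shape and indexing of the result can be checked on a small hand-made input. The sketch below uses illustrative coordinates and compares each entry of Y with scipy.spatial.distance.euclidean applied to the corresponding pair of rows.

```python
import numpy as np
from scipy.spatial.distance import cdist, euclidean

XA = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # m_A = 3
XB = np.array([[3.0, 4.0], [1.0, 1.0]])               # m_B = 2

Y = cdist(XA, XB, 'euclidean')

# One row per observation in XA, one column per observation in XB.
assert Y.shape == (3, 2)

# Y[i, j] is the distance between XA[i] and XB[j].
for i in range(XA.shape[0]):
    for j in range(XB.shape[0]):
        assert np.isclose(Y[i, j], euclidean(XA[i], XB[j]))
```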

Compute distance between each pair of the two collections of inputs.

Examples

Find the Euclidean distances between four 2-D coordinates:

>>> from scipy.spatial import distance
>>> coords = [(35.0456, -85.2672),
...           (35.1174, -89.9711),
...           (35.9728, -83.9422),
...           (36.1667, -86.7833)]
>>> distance.cdist(coords, coords, 'euclidean')
array([[ 0.    ,  4.7044,  1.6172,  1.8856],
       [ 4.7044,  0.    ,  6.0893,  3.3561],
       [ 1.6172,  6.0893,  0.    ,  2.8477],
       [ 1.8856,  3.3561,  2.8477,  0.    ]])

Find the Manhattan distance from a 3-D point to the corners of the unit cube:

>>> import numpy as np
>>> a = np.array([[0, 0, 0],
...               [0, 0, 1],
...               [0, 1, 0],
...               [0, 1, 1],
...               [1, 0, 0],
...               [1, 0, 1],
...               [1, 1, 0],
...               [1, 1, 1]])
>>> b = np.array([[ 0.1, 0.2, 0.4]])
>>> distance.cdist(a, b, 'cityblock')
array([[ 0.7],
       [ 0.9],
       [ 1.3],
       [ 1.5],
       [ 1.5],
       [ 1.7],
       [ 2.1],
       [ 2.3]])

Back References

The following pages refer to this document either explicitly or contain code examples using this.

scipy.spatial.distance.cdist skimage.feature.match.match_descriptors


GitHub : /scipy/spatial/distance.py#2617
type: <class 'function'>