clarkson_woodruff_transform(input_matrix, sketch_size, seed=None)
Applies a Clarkson-Woodruff Transform/sketch to the input matrix.

Given an input matrix `A` of size `(n, d)`, compute a matrix `A'` of size `(sketch_size, d)` such that

$$\|Ax\| \approx \|A'x\|$$

with high probability, via the Clarkson-Woodruff Transform, otherwise known as the CountSketch matrix.
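Concretely, the CountSketch matrix $S$ has exactly one nonzero entry, a random $\pm 1$, in each column, so the sketch $A' = SA$ can be formed in a single pass over $A$. Below is a minimal construction for illustration only; the sizes and seed are our own, not part of the library API:

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> n, k = 8, 3
>>> rows = rng.integers(0, k, size=n)        # hash each of the n input rows to one sketch row
>>> signs = rng.choice([-1.0, 1.0], size=n)  # independent random sign per input row
>>> S = np.zeros((k, n))
>>> S[rows, np.arange(n)] = signs            # exactly one +/-1 per column of S

Multiplying `S @ A` then touches each nonzero of `A` exactly once, which is what makes the transform fast on sparse input.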
To make the statement

$$\|Ax\| \approx \|A'x\|$$

precise, observe the following result, adapted from the proof of Theorem 14 of [2] via Markov's inequality. If we have a sketch size `sketch_size=k` which is at least

$$k \geq \frac{2}{\epsilon^2 \delta},$$

then for any fixed vector $x$,

$$\|Ax\| = (1 \pm \epsilon)\|A'x\|$$

with probability at least $1 - \delta$.
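For instance, the bound can be inverted to size a sketch for a target accuracy; the values of $\epsilon$ and $\delta$ below are illustrative, not recommendations:

>>> import numpy as np
>>> epsilon, delta = 0.1, 0.05
>>> k = int(np.ceil(2 / (epsilon**2 * delta)))  # smallest sketch size satisfying the bound
>>> k
4000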
This implementation takes advantage of sparsity: computing a sketch takes time proportional to `A.nnz`. Data `A` in `scipy.sparse.csc_matrix` format gives the quickest computation time for sparse input.
>>> import numpy as np
>>> from scipy import linalg
>>> from scipy import sparse
>>> rng = np.random.default_rng()
>>> n_rows, n_columns, density, sketch_n_rows = 15000, 100, 0.01, 200
>>> A = sparse.rand(n_rows, n_columns, density=density, format='csc')
>>> B = sparse.rand(n_rows, n_columns, density=density, format='csr')
>>> C = sparse.rand(n_rows, n_columns, density=density, format='coo')
>>> D = rng.standard_normal((n_rows, n_columns))
>>> SA = linalg.clarkson_woodruff_transform(A, sketch_n_rows)  # fastest
>>> SB = linalg.clarkson_woodruff_transform(B, sketch_n_rows)  # fast
>>> SC = linalg.clarkson_woodruff_transform(C, sketch_n_rows)  # slower
>>> SD = linalg.clarkson_woodruff_transform(D, sketch_n_rows)  # slowest
That said, this method still performs well on dense inputs; it is simply slower in relative terms.
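To quantify the gap on a particular machine, the calls can be timed directly. This rough measurement sketch reuses `A` (CSC) and `D` (dense) from the example above; absolute timings depend on hardware:

>>> import time
>>> t0 = time.perf_counter()
>>> _ = linalg.clarkson_woodruff_transform(A, sketch_n_rows)  # sparse CSC input
>>> t_sparse = time.perf_counter() - t0
>>> t0 = time.perf_counter()
>>> _ = linalg.clarkson_woodruff_transform(D, sketch_n_rows)  # dense input
>>> t_dense = time.perf_counter() - t0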
Parameters
----------
input_matrix : array_like
    Input matrix, of shape `(n, d)`.
sketch_size : int
    Number of rows for the sketch.
seed : {None, int, `numpy.random.Generator`, `numpy.random.RandomState`}, optional
    If `seed` is None (or `np.random`), the `numpy.random.RandomState`
    singleton is used. If `seed` is an int, a new `RandomState` instance
    is used, seeded with `seed`. If `seed` is already a `Generator` or
    `RandomState` instance, that instance is used.

Returns
-------
A' : array_like
    Sketch of the input matrix `A`, of size `(sketch_size, d)`.
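Passing an int `seed` makes the sketch deterministic across calls, which is handy for reproducible experiments. A brief check (the shapes and seed values here are illustrative):

>>> import numpy as np
>>> from scipy import linalg
>>> A = np.random.default_rng(7).standard_normal((1000, 20))
>>> S1 = linalg.clarkson_woodruff_transform(A, 50, seed=42)
>>> S2 = linalg.clarkson_woodruff_transform(A, 50, seed=42)
>>> np.array_equal(S1, S2)
True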
Examples
--------
Given a big dense matrix `A`:
>>> import numpy as np
>>> from scipy import linalg
>>> n_rows, n_columns, sketch_n_rows = 15000, 100, 200
>>> rng = np.random.default_rng()
>>> A = rng.standard_normal((n_rows, n_columns))
>>> sketch = linalg.clarkson_woodruff_transform(A, sketch_n_rows)
>>> sketch.shape
(200, 100)
>>> norm_A = np.linalg.norm(A)
>>> norm_sketch = np.linalg.norm(sketch)
Now with high probability, the true norm `norm_A` is close to the sketched norm `norm_sketch` in absolute value.
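One can check this empirically by computing the relative error; at this sketch size it is typically a few percent, though the exact value varies from run to run:

>>> rel_err = abs(norm_A - norm_sketch) / norm_A  # small with high probability, not always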
Similarly, applying the sketch approximately preserves the solution of the linear regression problem $\min_x \|Ax - b\|$.
>>> import numpy as np
>>> from scipy import linalg
>>> n_rows, n_columns, sketch_n_rows = 15000, 100, 200
>>> rng = np.random.default_rng()
>>> A = rng.standard_normal((n_rows, n_columns))
>>> b = rng.standard_normal(n_rows)
>>> x = np.linalg.lstsq(A, b, rcond=None)[0]
>>> Ab = np.hstack((A, b.reshape(-1, 1)))
>>> SAb = linalg.clarkson_woodruff_transform(Ab, sketch_n_rows)
>>> SA, Sb = SAb[:, :-1], SAb[:, -1]
>>> x_sketched = np.linalg.lstsq(SA, Sb, rcond=None)[0]
As with the matrix norm example, `np.linalg.norm(A @ x - b)` is close to `np.linalg.norm(A @ x_sketched - b)` with high probability.
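The two residual norms can be compared directly; their ratio is typically close to one at this sketch size (an empirical observation, not a guarantee):

>>> r = np.linalg.norm(A @ x - b)
>>> r_sketched = np.linalg.norm(A @ x_sketched - b)
>>> ratio = r_sketched / r  # typically close to 1 with high probability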