Note that you want SciPy >= 0.7.2
Warning
In SciPy 0.6, scipy.csc_matrix.dot has a bug with singleton dimensions, and there may be more bugs. Its implementation of sparse matrices is also inconsistent.
We do not test against SciPy versions below 0.7.2.
A compressed sparse matrix sp is made up of four member variables (for csc, csr and bsr at least):
- sp.shape
- gives the shape of the matrix.
- sp.data
- gives the values of the non-zero entries. For CSC (in canonical form), these are ordered column by column: reading down each column, from the leftmost column to the rightmost.
- sp.indices
- gives the locations of the non-zero entries. For CSC, sp.indices[i] is the row index of sp.data[i].
- sp.indptr
- gives the other coordinate of the non-zero entries, implicitly. For CSC, indptr has as many values as there are columns in the matrix, plus one. sp.indptr[k] = x and sp.indptr[k+1] = y means that column k contains sp.data[x:y], i.e. the x-th through the (y-1)-th non-zero values.
See the example below for details.
>>> import scipy.sparse
>>> sp = scipy.sparse.csc_matrix((5, 10))
>>> sp[4, 0] = 20
/u/lisa/local/byhost/test_maggie46.iro.umontreal.ca/lib64/python2.5/site-packages/scipy/sparse/compressed.py:494: SparseEfficiencyWarning: changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
>>> sp[0, 0] = 10
>>> sp[2, 3] = 30
>>> sp.todense()
matrix([[ 10., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 30., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 20., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
>>> print sp
(0, 0) 10.0
(4, 0) 20.0
(2, 3) 30.0
>>> sp.shape
(5, 10)
>>> sp.data
array([ 10., 20., 30.])
>>> sp.indices
array([0, 4, 2], dtype=int32)
>>> sp.indptr
array([0, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3], dtype=int32)
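As a check on the reading of indptr given above, the same matrix can be rebuilt entry by entry from data, indices and indptr (plain scipy.sparse; the LIL detour just avoids the SparseEfficiencyWarning):

```python
import scipy.sparse

# Build the same 5x10 matrix as above, via LIL, then convert to CSC.
m = scipy.sparse.lil_matrix((5, 10))
m[0, 0] = 10
m[4, 0] = 20
m[2, 3] = 30
sp = m.tocsc()

# Column k holds sp.data[sp.indptr[k]:sp.indptr[k+1]], and the matching
# slice of sp.indices gives the row index of each of those values.
entries = {}
for k in range(sp.shape[1]):
    lo, hi = sp.indptr[k], sp.indptr[k + 1]
    for row, val in zip(sp.indices[lo:hi], sp.data[lo:hi]):
        entries[(int(row), k)] = float(val)
print(entries)  # {(0, 0): 10.0, (4, 0): 20.0, (2, 3): 30.0}
```

This recovers exactly the (row, column) -> value triples that `print sp` displayed.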
Several things should be learned from the above example:
TODO: Rewrite this documentation to do things in a smarter way.
The sparse equivalent of dmatrix is csc_matrix and csr_matrix.
Often, a sparse matrix is used because the structure of its non-zeros is meaningful. The gradient on entries outside that structure has no meaning, so it is computationally efficient not to compute it.
Use StructuredDot when you want the gradient to be zero at the entries that are zero (not stored) in the sparse matrix.
TrueDot and StructuredDot have different gradients, but their perform functions should be the same.
The gradient of TrueDot can have non-zeros where the sparse matrix had zeros; the gradient of StructuredDot cannot.
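The claim that the perform (forward) computations agree can be checked at the scipy level; this is just an illustration with scipy.sparse, not the Theano Ops themselves:

```python
import numpy as np
import scipy.sparse

rng = np.random.RandomState(0)
w = rng.randn(5, 5)            # dense operand
x_dense = np.diag(np.ones(5))  # structured operand: non-zeros only on the diagonal
x = scipy.sparse.csc_matrix(x_dense)

# Whatever gradient an Op defines, the forward product is the same product.
out_sparse = x.dot(w)
out_dense = x_dense.dot(w)
print(np.allclose(out_sparse, out_dense))  # True
```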
Suppose you have dot(x, w), where x and w are square matrices. If w is dense, like randn((5,5)), and x is of full rank (though potentially sparse, like a diagonal matrix of 1s), then the output will be dense too. But the density of the output is a red herring: what matters is the density of the gradient on the output. If the gradient on the output is dense, and w is dense (as we said it was), then the true gradient on x will be dense. If our dot is a TrueDot, it will report that the gradient on x is dense. If our dot is a StructuredDot, it will report a gradient on x that is defined only on the diagonal, ignoring the gradients on the off-diagonal entries.
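A numpy sketch of this situation (the mask is only an illustration of what StructuredDot's gradient keeps; the real Op works on the sparse representation directly):

```python
import numpy as np

rng = np.random.RandomState(42)
w = rng.randn(5, 5)      # dense weight matrix
x = np.eye(5)            # full rank but sparse-structured: a diagonal of 1s
g_out = np.ones((5, 5))  # suppose the gradient on the output is dense

# True gradient on x for y = x.dot(w): dL/dx = g_out.dot(w.T), dense in general.
g_true = g_out.dot(w.T)

# Structured gradient: the same values, kept only where x had non-zeros.
g_structured = g_true * (x != 0)

print(np.count_nonzero(g_structured))  # at most 5: off-diagonal entries dropped
```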