Sparse Matrices in Numba


Introduction

Numba is a widely-used just-in-time (JIT) compiler for Python that optimizes the execution of numerical computations by translating them into efficient machine code. It is popular for its ability to accelerate a wide range of scientific and computational tasks. However, one limitation of Numba is the absence of a built-in implementation of sparse matrices (while many elements of SciPy were implemented in Numba, sparse matrices were never one of them). Sparse matrices – matrices in which most of the elements are zero – are prevalent in several scientific fields such as graph theory, natural language processing, and computational biology.

In what follows, we will introduce the smn (Sparse Matrices in Numba) library, which can be used in conjunction with Numba to handle sparse matrices – meaning it can be used inside functions compiled with Numba – providing an effective solution for scientific computing tasks that involve sparse data structures. The present document can be seen both as a brief lesson on sparse matrices and as documentation of smn, with mathematical details.

First, let’s import smn, which can be found in my GitHub repository; for the code presented here to work, it suffices to have the file smn.py in the same directory. We will also import the standard numpy library for numerical analysis in Python, which will later be used to check the accuracy of the functions implemented in smn.

import smn
import numpy as np
import numba

What is a sparse matrix?

A sparse matrix is a data structure meant to represent matrices for which the vast majority of elements are zero. Instead of allocating memory for all elements of the matrix, these data structures store only the non-zero elements (and their corresponding indices). This can vastly reduce memory usage and allow for faster computations. For example, let us consider the following matrix
\[
A = \begin{bmatrix}
0. & 2. & 0. & 0. & 0. & 0. \\
1. & 0. & 2. & 0. & 0. & 0. \\
0. & 1. & 0. & 2. & 0. & 0. \\
0. & 0. & 1. & 0. & 2. & 0. \\
0. & 0. & 0. & 1. & 0. & 2. \\
0. & 0. & 0. & 0. & 1. & 0.
\end{bmatrix} \ .
\]

This could instead be represented by a triad of one-dimensional arrays:

Array index             0     1     2     3     4     5     6     7     8     9
\(A_{\text{lines}}\)     0     1     2     3     4     5     4     3     2     1
\(A_{\text{columns}}\)   1     2     3     4     5     4     3     2     1     0
\(A_{\text{values}}\)    2.0   2.0   2.0   2.0   2.0   1.0   1.0   1.0   1.0   1.0

The sparse matrix representation shown above is known as the “coordinate list” (COO) format. In the COO format, the non-zero elements of the matrix are represented by their corresponding row indices, column indices, and values. This is the way in which smn represents sparse matrices. Other formats, such as the “compressed sparse row” (CSR), “compressed sparse column” (CSC), and “dictionary of keys” (DOK), could be used instead, with higher efficiency in some specific applications, but for smn we opted for COO for its more general applicability and ease of understanding.

Given that matrix \(A\) is 6×6, it has a total of 36 elements. In a dense representation, we would need to store all 36 of them, even though the majority are zero. The representation above, on the other hand, stores only the non-zero elements (10 in this example) along with their corresponding row and column indices (30 stored numbers in total). While this is a modest reduction, the savings become far more significant for larger matrices, where the number of non-zero elements typically stays small relative to the total size. In the present document we proceed with 6×6 matrices for pedagogical purposes, but the smn library can deal with arbitrarily large matrix dimensions.
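As a rough illustration of the memory trade-off, we can compare the footprint of a dense matrix with that of a COO triad using numpy’s nbytes attribute. The figures below assume a 1000×1000 matrix with about 3 non-zero elements per row and 64-bit indices and values; they are only meant to convey the order of magnitude.

# dense representation: every entry is stored, zeros included
dense = np.zeros((1000, 1000))                        # 10^6 float64 entries
print(dense.nbytes)                                   # 8000000 bytes

# COO triad for ~3 non-zero elements per row: only indices and values are stored
nnz = 3000
lines   = np.zeros(nnz, dtype=np.int64)
columns = np.zeros(nnz, dtype=np.int64)
values  = np.zeros(nnz, dtype=np.float64)
print(lines.nbytes + columns.nbytes + values.nbytes)  # 72000 bytes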

Creating a sparse matrix

The smn library has a specialized class called sparse_matrix that is designed to store sparse matrices in the COO format. Our first example consists of generating a sparse matrix using the smn library. For this, one has to explicitly specify the line (row), column and value of each non-zero term of the matrix. One example that will be used throughout this document is presented below.

smn.sparse_matrix(lines,columns,values,shape)

Parameters:

lines : numpy.array of integers.

columns : numpy.array of integers.

values : numpy.array of floats.

shape : integer, the dimension of the matrix (if not specified, it is taken as the maximum element of lines); as of now only square matrices of dimension (shape, shape) are implemented.

Returns: The equivalent sparse matrix as an object of the smn.sparse_matrix class.

lines   = np.array((0,1,2,3,4,5,4,3,2,1),dtype=np.int64)   # row index of each non-zero element
columns = np.array((1,2,3,4,5,4,3,2,1,0),dtype=np.int64)   # column index of each non-zero element
values  = np.array((2,2,2,2,2,1,1,1,1,1),dtype=np.float64) # value of each non-zero element

A = smn.sparse_matrix(lines,columns,values)

This creates our sparse matrix. One can recover the lines, columns and values arrays using the respective attributes:

A.lines
array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])
A.columns
array([1, 2, 3, 4, 5, 4, 3, 2, 1, 0])
A.values
array([2., 2., 2., 2., 2., 1., 1., 1., 1., 1.])

Sometimes, for ease of inspection, one may want to see the sparse matrix as a dense matrix. For this, we have the to_dense method

A.to_dense()
array([[0., 2., 0., 0., 0., 0.],
       [1., 0., 2., 0., 0., 0.],
       [0., 1., 0., 2., 0., 0.],
       [0., 0., 1., 0., 2., 0.],
       [0., 0., 0., 1., 0., 2.],
       [0., 0., 0., 0., 1., 0.]])

which returns a 2-dimensional numpy array equivalent to the sparse matrix. Now, we proceed to show how the basic matrix operations are implemented in smn.
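Before doing so, it may help to see what the dense conversion amounts to: start from an all-zero array and scatter the stored values into place. The sketch below (with a hypothetical helper name) illustrates the idea and is not necessarily smn’s internal code.

def coo_to_dense(lines, columns, values, shape):
    # scatter the stored triplets onto an all-zero matrix;
    # += is used so that duplicate (row, column) pairs add up
    dense = np.zeros((shape, shape))
    for k in range(len(values)):
        dense[lines[k], columns[k]] += values[k]
    return dense

coo_to_dense(A.lines, A.columns, A.values, 6)  # same array as A.to_dense()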

Adding and subtracting sparse matrices

In order to present how to add and subtract matrices within our library, let us first generate another sparse matrix, \(B\),

lines   = np.arange(6,dtype=np.int64)       # rows 0..5
columns = np.arange(5,-1,-1,dtype=np.int64) # columns 5..0, i.e. the anti-diagonal
values  = np.ones(6,dtype=np.float64)       # all stored values equal to 1

B = smn.sparse_matrix(lines,columns,values)
B.to_dense()
array([[0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0.]])

Then we can add the two sparse matrices, \(A\) and \(B\), using the usual + operator

C=A+B
C.to_dense()
array([[0., 2., 0., 0., 0., 1.],
       [1., 0., 2., 0., 1., 0.],
       [0., 1., 0., 3., 0., 0.],
       [0., 0., 2., 0., 2., 0.],
       [0., 1., 0., 1., 0., 2.],
       [1., 0., 0., 0., 1., 0.]])

and, similarly, subtract matrices using the usual - operation

D=A-B
D.to_dense()
array([[ 0.,  2.,  0.,  0.,  0., -1.],
       [ 1.,  0.,  2.,  0., -1.,  0.],
       [ 0.,  1.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  2.,  0.],
       [ 0., -1.,  0.,  1.,  0.,  2.],
       [-1.,  0.,  0.,  0.,  1.,  0.]])

Here it is important to understand how smn adds or subtracts sparse matrices: it simply concatenates the arrays of indices and values. The result may therefore contain duplicate entries, i.e. repeated (row, column) pairs. Let us look at the matrix \(D = A-B\) as an example

D.lines,D.columns,D.values
(array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5]),
 array([1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 5, 4, 3, 2, 1, 0]),
 array([ 2.,  2.,  2.,  2.,  2.,  1.,  1.,  1.,  1.,  1., -1., -1., -1.,
        -1., -1., -1.]))

Note that even though the element (3,2) of \(D\) is 0, it still appears twice in the attributes of \(D\): once with value 1 and once with value -1, whose sum, 0, is the correct final value.
It is important to mention that such redundancy still leads to correct results when performing matrix operations, as the duplicate entries contribute to the final result exactly as they should.
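In other words, a subtraction such as \(A-B\) amounts to concatenating the two triads, with the values of the second matrix negated. The sketch below reproduces that idea with plain numpy calls (an illustration only, not necessarily smn’s exact internal code).

# A - B in COO form: concatenate the index arrays and negate the second set of values
D_lines   = np.concatenate((A.lines,   B.lines))
D_columns = np.concatenate((A.columns, B.columns))
D_values  = np.concatenate((A.values,  -B.values))
D_sketch  = smn.sparse_matrix(D_lines, D_columns, D_values)  # same entries as D above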

Nevertheless, eliminating redundancy in sparse matrix representations can significantly reduce memory usage and improve computational performance, particularly for large matrices. As a result, we recommend minimizing the use of the + and - operators. If one still needs them and wants to keep the representation compact, the simplify method efficiently removes the redundant entries.

D.simplify()
D.lines,D.columns,D.values
(array([0, 0, 1, 1, 1, 2, 2, 3, 4, 4, 4, 5, 5]),
 array([1, 5, 0, 2, 4, 1, 3, 4, 1, 3, 5, 0, 4]),
 array([ 2., -1.,  1.,  2., -1.,  1.,  1.,  2., -1.,  1.,  2., -1.,  1.]))
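Conceptually, such a simplification sums all values sharing the same (row, column) pair and discards the entries that cancel to zero. A minimal sketch of this idea in pure Python is shown below (with a hypothetical helper name, for illustration; smn’s own simplify may differ in its details).

def coo_simplify(lines, columns, values):
    # accumulate duplicated (row, column) pairs and drop the ones that sum to zero
    acc = {}
    for i, j, v in zip(lines, columns, values):
        acc[(i, j)] = acc.get((i, j), 0.0) + v
    kept = [(i, j, v) for (i, j), v in acc.items() if v != 0.0]
    if not kept:
        return (np.empty(0, dtype=np.int64),
                np.empty(0, dtype=np.int64),
                np.empty(0, dtype=np.float64))
    l, c, v = zip(*kept)
    return (np.array(l, dtype=np.int64),
            np.array(c, dtype=np.int64),
            np.array(v, dtype=np.float64))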

For a more dramatic example let’s create the null matrix as \(N=A-A\).

N = A-A
N.to_dense()
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

Even though \(N\) is a null matrix, it still occupies far more memory than necessary if the simplify method is not used, as can be seen below:

N.lines,N.columns,N.values
(array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 4, 3, 2, 1]),
 array([1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]),
 array([ 2.,  2.,  2.,  2.,  2.,  1.,  1.,  1.,  1.,  1., -2., -2., -2.,
        -2., -2., -1., -1., -1., -1., -1.]))
N.simplify()
N.lines,N.columns,N.values
(array([], dtype=int64), array([], dtype=int64), array([], dtype=float64))

After simplification, \(N\) is an empty sparse matrix, as it stores no elements at all.

Multiplying sparse matrices

Similarly to addition and subtraction, we can multiply sparse matrices using the usual * operator

H = A*B
H.to_dense()
array([[0., 0., 0., 0., 2., 0.],
       [0., 0., 0., 2., 0., 1.],
       [0., 0., 2., 0., 1., 0.],
       [0., 2., 0., 1., 0., 0.],
       [2., 0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.]])
H.lines,H.columns,H.values
(array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]),
 array([4, 3, 5, 2, 4, 1, 3, 0, 2, 1]),
 array([2., 2., 1., 2., 1., 2., 1., 2., 1., 1.]))

which is consistent with the dot function in numpy

np.dot(A.to_dense(),B.to_dense())
array([[0., 0., 0., 0., 2., 0.],
       [0., 0., 0., 2., 0., 1.],
       [0., 0., 2., 0., 1., 0.],
       [0., 2., 0., 1., 0., 0.],
       [2., 0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.]])

Note, however, that this is not interchangeable with the * operator for dense numpy.ndarray objects, which multiplies the elements term by term

A.to_dense()*B.to_dense()
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 2., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
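Returning to the sparse product itself: for intuition on what the * operator computes on COO data, here is a naive sketch of a sparse matrix product, in which every pair of stored entries \(a_{ik}\) and \(b_{kj}\) whose inner indices match contributes \(a_{ik}b_{kj}\) to the result entry \((i,j)\). This (with a hypothetical helper name) is only an illustration of the mechanism, not smn’s actual algorithm.

def coo_matmul(A_l, A_c, A_v, B_l, B_c, B_v):
    # naive O(nnz_A * nnz_B) product: only pairs with matching inner indices contribute
    out = {}
    for i, k, a in zip(A_l, A_c, A_v):
        for k2, j, b in zip(B_l, B_c, B_v):
            if k == k2:
                out[(i, j)] = out.get((i, j), 0.0) + a * b
    lines   = np.array([i for i, _ in out], dtype=np.int64)
    columns = np.array([j for _, j in out], dtype=np.int64)
    values  = np.array(list(out.values()), dtype=np.float64)
    return lines, columns, values

coo_matmul(A.lines, A.columns, A.values, B.lines, B.columns, B.values)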

Multiply matrix and vector

While multiplying a matrix by a matrix is interesting, in many scientific applications the focus lies on multiplying matrices by vectors. This is especially common in physics, where vectors are typically represented as column matrices and multiplied on the right-hand side

\[
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
v_1 \\ v_2 \\ \vdots \\ v_n
\end{bmatrix}
=
\begin{bmatrix}
\sum_i a_{1i}v_i \\ \sum_i a_{2i}v_i \\ \vdots \\ \sum_i a_{mi}v_i
\end{bmatrix} \ .
\]

Here the sparse matrix format has a further opportunity to shine, as it never needs to compute the terms for which the matrix element is zero. Note that when multiplying a sparse matrix by another matrix, the product can acquire non-zero elements in previously empty positions, leading to a denser result; this fill-in arises from the interaction between the non-zero elements of the two factors.

When multiplying a sparse matrix by a (dense) vector, on the other hand, the result is simply another vector of the same size, so the memory usage remains bounded by the vector’s length and the efficiency associated with the sparse representation is preserved.

Because of this, for better performance it is recommended to avoid direct matrix-matrix multiplications and to opt for successive matrix-vector products whenever possible.
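The core of a sparse matrix-vector product is a single pass over the stored triplets, so its cost scales with the number of non-zero elements rather than with the full \(n^2\) entries of a dense matrix. A minimal sketch of the idea (with a hypothetical helper name; an illustration, not necessarily smn’s code):

def coo_matvec(lines, columns, values, v):
    # y[i] = sum_j A[i, j] * v[j], visiting only the stored (non-zero) entries
    y = np.zeros(len(v))
    for k in range(len(values)):
        y[lines[k]] += values[k] * v[columns[k]]
    return y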

In smn, the multiplication of a sparse matrix by a (column) vector as above is implemented in the function sm_times_array

v = np.array((.25,.5,.75,1.,1.25,1.5))
smn.sm_times_array(A,v)
array([1.  , 1.75, 2.5 , 3.25, 4.  , 1.25])

which is equivalent to the numpy dot function for numpy arrays

np.dot(A.to_dense(),v)
array([1.  , 1.75, 2.5 , 3.25, 4.  , 1.25])

Similarly, the multiplication of a (row) vector, represented by an array, by a matrix, as is often preferred in fields such as stochastic processes,

\[
\begin{bmatrix}
v_1 & v_2 & \ldots & v_n
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
\sum_i v_i a_{i1} & \sum_i v_i a_{i2} & \ldots & \sum_i v_i a_{im}
\end{bmatrix}
\]
is implemented in the function array_times_sm

smn.array_times_sm(v,A)
array([0.5 , 1.25, 2.  , 2.75, 3.5 , 2.5 ])

which is equivalent to the numpy version.

np.dot(v,A.to_dense())
array([0.5 , 1.25, 2.  , 2.75, 3.5 , 2.5 ])

The dot function

Both sm_times_array and array_times_sm can be used within Numba. However, in order to offer a syntax familiar to numpy users, we also implemented an equivalent dot function that handles matrix multiplication

I = smn.dot(A,B)
I.lines,I.columns,I.values
(array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]),
 array([4, 3, 5, 2, 4, 1, 3, 0, 2, 1]),
 array([2., 2., 1., 2., 1., 2., 1., 2., 1., 1.]))

multiplication by a column vector

smn.dot(A,v)
array([1.  , 1.75, 2.5 , 3.25, 4.  , 1.25])

multiplication by a row vector

smn.dot(v,A)
array([0.5 , 1.25, 2.  , 2.75, 3.5 , 2.5 ])

and the regular dot product of numpy arrays.

smn.dot(v,v)
5.6875

Note that since dot returns results of different data types depending on its arguments, it is not compiled with Numba and therefore cannot be used inside a Numba-compiled function. If this is needed, we recommend using the product operator *, sm_times_array, array_times_sm, and numpy.dot, respectively.
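For instance, inside a compiled function one calls the type-specific routines directly. The sketch below (with a hypothetical function name) applies the matrix twice to a vector from within nopython mode:

@numba.njit
def apply_A_twice(A, v):
    # smn.dot cannot be used here, but sm_times_array can
    w = smn.sm_times_array(A, v)
    return smn.sm_times_array(A, w)

apply_A_twice(A, v)  # equivalent to smn.dot(A, smn.dot(A, v))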

Summing over a dimension

Now, let us consider a function that sums the elements of a matrix line by line, meaning \(f\) such that

\[
f(A) = f \left( \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix} \right)
=
\begin{bmatrix}
\sum_i a_{1i} \\ \sum_i a_{2i} \\ \vdots \\ \sum_i a_{mi}
\end{bmatrix} \ .
\]

which is implemented in smn through the method line_sum

A.line_sum()
array([2., 3., 3., 3., 3., 1.])

which is equivalent to the numpy function sum with the parameter axis=1 for dense matrices

np.sum(A.to_dense(),axis=1)
array([2., 3., 3., 3., 3., 1.])

and the column sum, meaning \(g\) such that

\[
g(A) = g \left( \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix} \right)
=
\begin{bmatrix}
\sum_i a_{i1} \\ \sum_i a_{i2} \\ \vdots \\ \sum_i a_{im}
\end{bmatrix} \ .
\]
which is implemented in smn as the column_sum method

A.column_sum()
array([1., 3., 3., 3., 3., 2.])

equivalent to the numpy function sum with the parameter axis=0

np.sum(A.to_dense(),axis=0)
array([1., 3., 3., 3., 3., 2.])
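To see why these reductions are cheap for a COO matrix, note that a line sum is just one accumulation pass over the stored triplets. A minimal sketch (with a hypothetical helper name; an illustration, not necessarily smn’s implementation):

def coo_line_sum(lines, columns, values, shape):
    # add each stored value to the row it belongs to;
    # zeros are never visited, so the cost is O(number of stored elements)
    out = np.zeros(shape)
    for k in range(len(values)):
        out[lines[k]] += values[k]
    return out

coo_line_sum(A.lines, A.columns, A.values, 6)  # array([2., 3., 3., 3., 3., 1.])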

Now, in a manner similar to the dot function implemented above, we also implemented the sum function for greater similarity with numpy

smn.sum(A,axis=1) #line
array([2., 3., 3., 3., 3., 1.])
smn.sum(A,axis=0) #column
array([1., 3., 3., 3., 3., 2.])

and when the parameter axis is not specified, it gives the sum of all elements, as in numpy

smn.sum(A)
15.0
np.sum(A.to_dense())
15.0

Please note that, as with dot, smn.sum is not compiled with Numba.
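If a per-axis sum is needed inside compiled code, one can work directly with the triplet attributes, which are plain numpy arrays. The sketch below (with a hypothetical function name) assumes these attributes are accessible in nopython mode, as the rest of the class is designed to be:

@numba.njit
def compiled_column_sum(A, shape):
    # accumulate the stored values per column directly from the COO triplets
    out = np.zeros(shape)
    for k in range(len(A.values)):
        out[A.columns[k]] += A.values[k]
    return out

compiled_column_sum(A, 6)  # array([1., 3., 3., 3., 3., 2.])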

Wall time comparison

It is reasonable to expect that algorithms using sparse matrices will be more efficient than their dense counterparts, and, as discussed above, their memory requirements are significantly smaller. To validate this expectation, however, we need to benchmark and compare the actual performance of the sparse and dense versions of the same operations. For this, we shall first create a large sparse matrix

N = 1000 #size of the matrix, change for further testing
lines_1   = np.array(np.arange(N),dtype=np.int64)
columns_1 = np.array(np.arange(N),dtype=np.int64)
values_1  = np.random.rand(N) #diagonal matrix with random elements

lines_2   = np.array(np.arange(N),dtype=np.int64)
columns_2 = np.array(np.random.randint(N,size=N),dtype=np.int64) #matrix with 1 extra element per line in a random column
values_2  = np.random.rand(N) 

Z = smn.sparse_matrix(lines_1,columns_1,values_1)+smn.sparse_matrix(columns_2,lines_2,values_2)

Z_dense = Z.to_dense() 
Z_dense
array([[0.28965617, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.04267883, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.14100828, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.91822673, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.33303376,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.66147908]])

In order to compare the execution times of matrix products, we shall use the %timeit magic command (based on the timeit module)

import timeit

allowing us to verify that the time spent calculating a matrix product with an smn sparse matrix is considerably smaller than with numpy and the equivalent dense matrix.

%timeit (Z*Z)
3.44 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Z_dense = Z.to_dense() 
%timeit np.dot(Z_dense,Z_dense)
15.8 ms ± 841 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The gain is even more noticeable for matrix-vector operations (when compared to the dense numpy version)

rho = np.random.rand(N)
%timeit smn.dot(rho,Z)
4.71 µs ± 46.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit np.dot(rho,Z_dense)
141 µs ± 9.32 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

and when summing by dimension

%timeit smn.sum(Z,axis=0)
2.83 µs ± 35.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit np.sum(Z_dense,axis=0)
375 µs ± 19.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Why use smn? – compiling Numba functions

While the previous examples show that the sparse matrix structure makes matrix operations considerably faster, that is not the main reason why smn is useful. Rather, the reason that led to smn’s development is that objects of the smn.sparse_matrix class can be used within Numba-compiled functions.

As an example, let us analyze a function with three parameters: \(\rho\), a row vector; \(A\), a (sparse) matrix; and \(k\), a positive integer. The goal is to calculate \(\rho A^k\). Using smn, this function can be written as

@numba.njit
def evolve_smnandnumba(rho,A,k):
    res = rho
    for i in range(k):
        res = smn.array_times_sm(res,A) # multiply by A from the right, k times in total
    return res

The @numba.njit decorator above instructs Numba to compile the decorated function using just-in-time (JIT) compilation. The first time the function is run, Numba analyzes its code and generates machine code specialized to the data types encountered during execution. The JIT compilation happens in the background, and subsequent calls use the compiled version.
With that in mind, we can benchmark the execution time of the JIT-compiled code,

void = evolve_smnandnumba(rho,Z,50) #Forces JIT compilation
%timeit evolve_smnandnumba(rho,Z,50)
150 µs ± 969 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

and compare it to another version that uses the sparse matrix but is not compiled

def evolve_smn(rho,A,k):
    res = rho*1.0
    for i in range(k):
        res = smn.array_times_sm(res,A)
    return res
%timeit evolve_smn(rho,Z,50)
225 µs ± 7.69 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

The gain is even more dramatic when compared to compiled code using a regular dense numpy array

@numba.njit
def evolve_numpy(rho,A,k):
    res = rho*1.0
    for i in range(k):
        res = np.dot(res,A)
    return res

void = evolve_numpy(rho,Z_dense,50)
%timeit evolve_numpy(rho,Z_dense,50)
6.09 ms ± 53.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Conclusion

The present post introduces the world of sparse matrices, which offers a plethora of possibilities for optimizing data processing and numerical analysis. With a deeper understanding of their underlying concepts and advantages, it becomes evident that sparse matrices are indispensable in handling large datasets and dynamical processes efficiently.

The smn library presented here enables operations on sparse matrices within Numba-compiled functions. By leveraging smn, users can enjoy the advantages of compiled code while still using Python syntax and programming tools.
