It is often useful to be able to call BLAS routines directly from Cython. Doing so avoids calling the corresponding NumPy functions (which would incur a performance penalty of running interpreted code and type and shape checking) as well as re-implementing linear algebra operations in Cython (which will likely be both incorrect and slower).
Existing Cython BLAS wrappers
Correspondingly, there are several ways of doing so.
- CythonGSL provides Cython wrappers for the GNU Scientific Library.
- tokyo wraps a lot of BLAS routines in Cython functions.
- This StackOverflow thread suggests a way of calling the BLAS version bundled with SciPy.
If these projects fit your requirements, great! You can read no further. In my code, however, I often find myself needing only one or two BLAS routines that are called in a tight inner loop – and in these cases I find it preferable to write my own quick wrapper with just these two functions.
Calling BLAS directly
Declaring BLAS functions is a straightforward application of the Cython cdef extern
machinery.
Getting the BLAS level 1 double inner product function is very straightforward:
cdef extern from 'cblas.h':
double ddot 'cblas_ddot'(int N,
double* X, int incX,
double* Y, int incY) nogil
This gives a function that takes the length the vectors N
, the pointers to the first element of X
and Y
, and their strides incX
and incY
.
Calling it is also very easy:
cpdef double run_blas_dot(double[::1] x,
double[::1] y,
int dim):
# Get the pointers.
cdef double* x_ptr = &x[0]
cdef double* y_ptr = &y[0]
return ddot(dim, x_ptr, 1, y_ptr, 1)
Declaring level 2 and level 3 functions is a little bit trickier as we need to take care of the various flags passed into the routines. Taking DGEMV
(double matrix-vector product) as an example, we need:
cdef extern from 'cblas.h':
ctypedef enum CBLAS_ORDER:
CblasRowMajor
CblasColMajor
ctypedef enum CBLAS_TRANSPOSE:
CblasNoTrans
CblasTrans
CblasConjTrans
void dgemv 'cblas_dgemv'(CBLAS_ORDER order,
CBLAS_TRANSPOSE transpose,
int M, int N,
double alpha, double* A, int lda,
double* X, int incX,
double beta, double* Y, int incY) nogil
The first two ctypedef
s give us the flags governing the matrix-vector product operation:
CBLAS_ORDER
determines whether the matrixA
uses row-major or column-major storage (C and Fortran arrays in NumPy parlance), andCBLAS_TRANSPOSE
determines whether the matrixA
should be transposed for the multiplication.
The final lines gives us the actual function signature. To call it, we need:
- the two parameters above,
- the dimensions of the
A
matrix,M
byN
, - the scaling constants
alpha
andbeta
, - the pointers to the
A
matrix andX
andY
vector (where theY
vector stores the result), and - the strides of the
X
andY
arrays.
To call it:
cpdef run_blas_dgemv(double[:, ::1] A,
double[::1] x,
double[::1] y,
int M,
int N,
double alpha,
double beta):
cdef double* A_ptr = &A[0, 0]
cdef double* x_ptr = &x[0]
cdef double* y_ptr = &y[0]
dgemv(CblasRowMajor,
CblasNoTrans,
M,
N,
alpha,
A_ptr,
N,
x_ptr,
1,
beta,
y_ptr,
1)
And that’s it: good enough for quick and dirty projects.
Real-world examples
For some good examples of using Cython BLAS bindings in anger, sklearn uses the BLAS headers approach;
statsmodels and gensim extract function pointers out of scipy
.