It is often useful to be able to call BLAS routines directly from Cython. Doing so avoids calling the corresponding NumPy functions (which incur the overhead of interpreted code plus type and shape checking) as well as re-implementing linear algebra operations in Cython (where a hand-rolled version is likely to be both slower and more error-prone).
Existing Cython BLAS wrappers
Several existing projects make this possible:
- CythonGSL provides Cython wrappers for the GNU Scientific Library.
- tokyo wraps a lot of BLAS routines in Cython functions.
- This StackOverflow thread suggests a way of calling the BLAS version bundled with SciPy.
If these projects fit your requirements, great! You need read no further. In my code, however, I often find myself needing only one or two BLAS routines that are called in a tight inner loop, and in these cases I find it preferable to write my own quick wrapper with just those functions.
Calling BLAS directly
Declaring BLAS functions is a straightforward application of the Cython cdef extern machinery.
Getting the BLAS level 1 double-precision inner product function takes just a few lines:

```
cdef extern from 'cblas.h':
    double ddot 'cblas_ddot'(int N,
                             double* X, int incX,
                             double* Y, int incY) nogil
```

This gives a function that takes the length of the vectors N, pointers to the first elements of X and Y, and their strides incX and incY.
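Before wiring it into Cython, it can help to see exactly what cblas_ddot computes, strides included. The following pure-NumPy model is illustrative only (the function name is mine, not part of any library): it sums N products taken at strides incX and incY.

```python
import numpy as np

def ddot_reference(n, x, inc_x, y, inc_y):
    """Pure-NumPy model of cblas_ddot: sum of n strided products."""
    return float(np.dot(x[: n * inc_x : inc_x], y[: n * inc_y : inc_y]))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 6.0, 7.0, 8.0])

print(ddot_reference(4, x, 1, y, 1))  # 1*5 + 2*6 + 3*7 + 4*8 = 70.0
print(ddot_reference(2, x, 2, y, 2))  # 1*5 + 3*7 = 26.0
```

Note that with a stride of 2, only every other element participates in the product.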
Calling it is also very easy:
```
cpdef double run_blas_dot(double[::1] x,
                          double[::1] y,
                          int dim):
    # Get pointers to the underlying buffers.
    cdef double* x_ptr = &x[0]
    cdef double* y_ptr = &y[0]

    return ddot(dim, x_ptr, 1, y_ptr, 1)
```

Declaring level 2 and level 3 functions is a little trickier, as we need to take care of the various flags passed into the routines. Taking DGEMV (double-precision matrix-vector product) as an example, we need:
```
cdef extern from 'cblas.h':
    ctypedef enum CBLAS_ORDER:
        CblasRowMajor
        CblasColMajor
    ctypedef enum CBLAS_TRANSPOSE:
        CblasNoTrans
        CblasTrans
        CblasConjTrans
    void dgemv 'cblas_dgemv'(CBLAS_ORDER order,
                             CBLAS_TRANSPOSE transpose,
                             int M, int N,
                             double alpha, double* A, int lda,
                             double* X, int incX,
                             double beta, double* Y, int incY) nogil
```

The first two ctypedefs give us the flags governing the matrix-vector product operation:
- CBLAS_ORDER determines whether the matrix A uses row-major or column-major storage (C and Fortran arrays in NumPy parlance), and
- CBLAS_TRANSPOSE determines whether the matrix A should be transposed for the multiplication.
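The storage-order distinction is easy to see in NumPy, where the same logical matrix can be laid out either way in memory:

```python
import numpy as np

A_c = np.array([[1.0, 2.0],
                [3.0, 4.0]])       # row-major ("C") storage
A_f = np.asfortranarray(A_c)       # column-major ("Fortran") storage

print(A_c.flags['C_CONTIGUOUS'])   # True
print(A_f.flags['F_CONTIGUOUS'])   # True

# Same logical matrix, different memory order:
print(A_c.ravel(order='K'))        # [1. 2. 3. 4.]
print(A_f.ravel(order='K'))        # [1. 3. 2. 4.]
```

Passing the wrong order flag to a CBLAS routine effectively hands it the transpose of the matrix you intended.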
The final lines give us the actual function signature. To call it, we need:
- the two flags above,
- the dimensions of the A matrix, M by N,
- the scaling constants alpha and beta,
- the pointers to the A matrix and the X and Y vectors (where the Y vector stores the result), and
- the strides of the X and Y arrays.
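The operation DGEMV performs is y ← alpha * op(A) @ x + beta * y, where op(A) is A or its transpose depending on the transpose flag. A pure-NumPy model (the function name is illustrative, not part of any library):

```python
import numpy as np

def dgemv_reference(alpha, A, x, beta, y, trans=False):
    """Pure-NumPy model of cblas_dgemv: alpha * op(A) @ x + beta * y."""
    op_A = A.T if trans else A
    return alpha * op_A @ x + beta * y

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])
y = np.array([10.0, 10.0])

print(dgemv_reference(2.0, A, x, 0.5, y))  # 2*[3, 7] + 0.5*[10, 10] = [11. 19.]
```

Note that y participates on both sides: BLAS overwrites the Y buffer with the result, so a non-zero beta folds the old contents into the output.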
To call it:
```
cpdef run_blas_dgemv(double[:, ::1] A,
                     double[::1] x,
                     double[::1] y,
                     int M,
                     int N,
                     double alpha,
                     double beta):
    # Get pointers to the underlying buffers.
    cdef double* A_ptr = &A[0, 0]
    cdef double* x_ptr = &x[0]
    cdef double* y_ptr = &y[0]

    dgemv(CblasRowMajor,
          CblasNoTrans,
          M,
          N,
          alpha,
          A_ptr,
          N,      # lda: the leading dimension of A (its row length in row-major storage)
          x_ptr,
          1,
          beta,
          y_ptr,
          1)
```

And that's it: good enough for quick and dirty projects.
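To compile such a wrapper, the extension module has to be linked against a CBLAS implementation. A minimal setup.py sketch follows; the module name blas_wrappers.pyx is a placeholder, and the library name varies by system (e.g. cblas, openblas, or plain blas):

```python
# setup.py -- minimal build sketch. 'blas_wrappers' and the BLAS library
# name below are assumptions; adjust them for your project and system.
from setuptools import Extension, setup
from Cython.Build import cythonize

extensions = [
    Extension(
        'blas_wrappers',
        sources=['blas_wrappers.pyx'],
        libraries=['cblas'],  # may be 'openblas' or 'blas' on your system
    )
]

setup(ext_modules=cythonize(extensions))
```

Build in place with `python setup.py build_ext --inplace`, making sure cblas.h and the BLAS library are on the compiler's include and link paths.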
Real-world examples
For some good examples of using Cython BLAS bindings in anger: sklearn uses the BLAS headers approach, while statsmodels and gensim extract function pointers out of scipy.