Calling BLAS from Cython - Mostly Engineering

It is often useful to be able to call BLAS routines directly from Cython. Doing so avoids calling the corresponding NumPy functions (which would incur a performance penalty of running interpreted code and type and shape checking) as well as re-implementing linear algebra operations in Cython (which will likely be both incorrect and slower).

Existing Cython BLAS wrappers

Correspondingly, there are several ways of doing so.

CythonGSL provides Cython wrappers for the GNU Scientific Library.
tokyo wraps a lot of BLAS routines in Cython functions.
This StackOverflow thread suggests a way of calling the BLAS version bundled with SciPy.

If these projects fit your requirements, great! You can read no further. In my code, however, I often find myself needing only one or two BLAS routines that are called in a tight inner loop – and in these cases I find it preferable to write my own quick wrapper with just these two functions.

Calling BLAS directly

Declaring BLAS functions is a straightforward application of the Cython cdef extern machinery.

Getting the BLAS level 1 double inner product function is very straightforward:

cdef extern from 'cblas.h':
    double ddot 'cblas_ddot'(int N,
                             double* X, int incX,
                             double* Y, int incY) nogil

This gives a function that takes the length the vectors N, the pointers to the first element of X and Y, and their strides incX and incY.

Calling it is also very easy:

cpdef double run_blas_dot(double[::1] x,
                          double[::1] y,
                          int dim):

    # Get the pointers.
    cdef double* x_ptr = &x[0]
    cdef double* y_ptr = &y[0]

    return ddot(dim, x_ptr, 1, y_ptr, 1)

Declaring level 2 and level 3 functions is a little bit trickier as we need to take care of the various flags passed into the routines. Taking DGEMV (double matrix-vector product) as an example, we need:

cdef extern from 'cblas.h':
    ctypedef enum CBLAS_ORDER:
        CblasRowMajor
        CblasColMajor
    ctypedef enum CBLAS_TRANSPOSE:
        CblasNoTrans
        CblasTrans
        CblasConjTrans
    void dgemv 'cblas_dgemv'(CBLAS_ORDER order,
                             CBLAS_TRANSPOSE transpose,
                             int M, int N,
                             double alpha, double* A, int lda,
                             double* X, int incX,
                             double beta, double* Y, int incY) nogil

The first two ctypedefs give us the flags governing the matrix-vector product operation:

CBLAS_ORDER determines whether the matrix A uses row-major or column-major storage (C and Fortran arrays in NumPy parlance), and
CBLAS_TRANSPOSE determines whether the matrix A should be transposed for the multiplication.

The final lines gives us the actual function signature. To call it, we need:

the two parameters above,
the dimensions of the A matrix, M by N,
the scaling constants alpha and beta,
the pointers to the A matrix and X and Y vector (where the Y vector stores the result), and
the strides of the X and Y arrays.

To call it:

cpdef run_blas_dgemv(double[:, ::1] A,
                     double[::1] x,
                     double[::1] y,
                     int M,
                     int N,
                     double alpha,
                     double beta):

    cdef double* A_ptr = &A[0, 0]
    cdef double* x_ptr = &x[0]
    cdef double* y_ptr = &y[0]

    dgemv(CblasRowMajor,
          CblasNoTrans,
          M,
          N,
          alpha,
          A_ptr,
          N,
          x_ptr,
          1,
          beta,
          y_ptr,
          1)

And that’s it: good enough for quick and dirty projects.

Real-world examples

For some good examples of using Cython BLAS bindings in anger, sklearn uses the BLAS headers approach; statsmodels and gensim extract function pointers out of scipy.