Tuesday, April 12, 2016

Error handling in BLAS libraries

It is very common to use a BLAS library to perform linear algebra operations such as dense matrix times dense matrix multiplication which can be performed using the dgemm function. The advantage of BLAS is

  • it is well a defined standard.
  • and hardware vendors such as Intel supplies a tuned version.

Now at MOSEK my employeer we use the Intel MKL library that includes a BLAS implementation. It really helps us deliver good floating point performance. Indeed we use a sequential version of Intel MKL but call it from potentially many threads using Clik plus. This works well due to the well designed BLAS interface. However, there is one rotten apple in the basket and that is error handling.

Here I will summarize why the error handling in the BLAS standard is awful from my perspective.

First of all why can errors occur when you do the dgemm operation if we assume the dimensions of the matrices are correct and ignoring issues with NANs and the like. Well, in order to obtain good performance the dgemm function may allocate additional memory to store smallish matrices that fit into the cache. I.e. the library use a blocked version to improve the performance.

Oh wait that means it can run of memory and then what? The BLAS standard error handling is to print a message to stderr or something along that line.

Recall that dgemm is embedded deep inside MOSEK which might be embedded deep inside a third party program. This implies an error message printed to stderr does not make sense to the user. Also the user would NOT like us to terminate the application with a fatal error. Rather we want to know that an out of space situation happened and terminate gracefully. Or doing something to lower the space requirement. E.g. use a fewer threads.

What is the solution to this problem? The only solution offered is to replace a function named xerbla that gets called when an error happens. The idea is that the function can set a global flag indicating an error happened. This might be a reasonable solution if the program is single threaded. Now instead assume you use a single threaded dgemm (from say Intel MKL) but call it from many threads. Then first of all you have to introduce a lock (a mutex) around the global error flag leading to performance issues. Next it is hard to figure out which of all the dgemm calls that failed. Hence, you have to fail them all. What pain.

Why is the error handling so primitive in BLAS libraries. I think the reasons are:

  • BLAS is an old Fortran based standard. 
  • For many years BLAS routine would not allocate storage. Hence, dgemm would never fail unless the dimensions where wrong.
  • BLAS is proposed by academics which does not care so much about error handling. I mean if you run out of memory you just buy a bigger supercomputer and rerun your computations.
If the BLAS had been invented today it would have been designed in C most likely and then all functions would have returned an error code. I know dealing with error codes is a pain too but that would have made error reporting much easier for those who wanted to do it properly.


Here are some links with background information: