Back when I was new to the world of HPC, I was often confused about the distinction between BLAS and LAPACK. With the wealth of available memory model variants (shared, distributed), vendor implementations, and subroutine naming conventions, the number of acronyms can get out of hand quickly. The following is a little cheat sheet I refer to when I need a quick refresher.
BLAS (Basic Linear Algebra Subprograms)
BLAS is a specification for low-level linear algebra routines. The routines are split into three levels:
- Level 1: Vector operations, such as the dot product or "axpy" (y = αx + y). These perform O(n) work on vectors of length n.
- Level 2: Matrix-vector operations, such as y = αAx + βy. These perform O(n^2) work for an n×n matrix.
- Level 3: Matrix-matrix operations, such as C = αAB + βC. These perform O(n^3) work for n×n matrices.
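To make the levels concrete, here is a naive pure-Python sketch of one representative operation per level. The function names borrow from the BLAS routines DAXPY, DGEMV and DGEMM, but the signatures are simplified (no strides, transpose flags or leading dimensions, and matrices are row-major nested lists), so treat this as an illustration of what the routines compute, not of the BLAS API.

```python
# Naive sketches of one operation per BLAS level. Simplified signatures:
# this is NOT the real BLAS calling convention.

def axpy(alpha, x, y):
    """Level 1: return alpha*x + y. O(n) work on O(n) data."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(alpha, A, x, beta, y):
    """Level 2: return alpha*A*x + beta*y. O(n^2) work on O(n^2) data."""
    return [alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
            for row, y_i in zip(A, y)]

def gemm(alpha, A, B, beta, C):
    """Level 3: return alpha*A*B + beta*C. O(n^3) work on O(n^2) data."""
    n_inner = len(B)
    n_cols = len(B[0])
    return [[alpha * sum(A[i][k] * B[k][j] for k in range(n_inner)) + beta * C[i][j]
             for j in range(n_cols)]
            for i in range(len(A))]

A = [[1.0, 2.0], [3.0, 4.0]]
print(axpy(2.0, [1.0, 2.0], [10.0, 20.0]))             # [12.0, 24.0]
print(gemv(1.0, A, [1.0, 1.0], 0.0, [0.0, 0.0]))       # [3.0, 7.0]
print(gemm(1.0, A, A, 0.0, [[0.0, 0.0], [0.0, 0.0]]))  # [[7.0, 10.0], [15.0, 22.0]]
```

The docstrings note the ratio of arithmetic to data: Level 1 and Level 2 routines touch each element roughly once, while Level 3 routines do O(n) arithmetic per element, which is why tuned libraries get their largest speedups on Level 3 calls.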
Architecture-specific implementations (like MKL for Intel chips) take advantage of vectorization and other optimizations available on their respective hardware. Most vendor-specific implementations (and some open-source ones, like OpenBLAS) also support multithreaded operation for use on shared memory machines. A few BLAS implementations are listed here:
- Netlib BLAS: The official reference implementation (Fortran 77).
- Netlib CBLAS: Reference C interface to BLAS.
- Intel MKL: x86 (32- and 64-bit). Optimized for Pentium, Core, Xeon, Xeon Phi.
- OpenBLAS: Optimized successor to GotoBLAS. Supports x86 (32- and 64-bit), MIPS, ARM.
- ATLAS (Automatically Tuned Linear Algebra Software): Portable implementation that empirically tunes itself at build time, producing an optimized BLAS library for the machine it is built on.
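Tuned libraries compute the same results as the reference implementation; the difference lies in how the loops are scheduled. As a rough illustration of one such optimization, loop tiling (cache blocking), here is a pure-Python sketch of a blocked matrix multiply. The tile size `BLOCK` is a hypothetical parameter, and pure Python will not actually run faster this way; the point is only the loop structure, which optimized libraries combine with vectorized inner kernels.

```python
# Illustrative loop tiling ("cache blocking") for C = A*B.
# BLOCK is a hypothetical tile size; real libraries pick tile sizes to fit
# the target's caches and registers.
BLOCK = 2

def blocked_matmul(A, B, block=BLOCK):
    """Return A*B, computed one block x block tile at a time."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # Multiply the (ii, kk) tile of A by the (kk, jj) tile of B and
                # accumulate into the (ii, jj) tile of C. With a suitable tile
                # size, these operands stay in cache while the tile is processed.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        a_ik = A[i][k]
                        for j in range(jj, min(jj + block, p)):
                            C[i][j] += a_ik * B[k][j]
    return C

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(blocked_matmul(A, identity) == A)  # True
```

The `min(...)` bounds handle matrices whose dimensions are not a multiple of the tile size, so the 3×3 example exercises both full and partial tiles.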
LAPACK (Linear Algebra Package)
LAPACK is a library for performing higher-level linear algebra operations. It is built “on top” of BLAS, and the underlying BLAS calls are usually completely hidden from the programmer. Some of the things LAPACK can do:
- Solve systems of linear equations
- Linear least squares
- Eigenvalue problems
- Singular Value Decomposition (SVD)
- Matrix Factorizations (LU, QR, Cholesky, Schur decomposition)
Both real and complex matrices are supported, in single and double precision.
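As a sketch of what happens under the hood when you call a LAPACK driver like DGESV (which solves Ax = b by an LU factorization with partial pivoting, the DGETRF step, followed by triangular solves, the DGETRS step), here is an unblocked pure-Python version. The helper names `lu_factor` and `lu_solve` are hypothetical. Real LAPACK uses a blocked algorithm so that most of the work is done by Level 3 BLAS calls such as DGEMM; the trailing-matrix update in this sketch corresponds to a Level 2 rank-1 update instead.

```python
# Unblocked LU with partial pivoting plus triangular solves, illustrating
# the kind of work a LAPACK driver like DGESV performs. Not LAPACK's code.

def lu_factor(A):
    """Return (LU, piv): packed L (unit diagonal, below) and U factors of P*A."""
    n = len(A)
    LU = [row[:] for row in A]           # work on a copy of A
    piv = list(range(n))                 # piv[k] = original row now at row k
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda r: abs(LU[r][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[p] = LU[p], LU[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]         # multiplier, stored in L's slot
            # Row update: one axpy per row, i.e. a rank-1 update of the
            # whole trailing submatrix.
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, piv

def lu_solve(LU, piv, b):
    """Solve A x = b given the output of lu_factor."""
    n = len(LU)
    y = [b[piv[i]] for i in range(n)]    # apply the row permutation
    for i in range(n):                   # forward solve with unit-lower L
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in reversed(range(n)):         # back solve with upper U
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

A = [[2.0, 1.0], [4.0, 3.0]]
b = [3.0, 7.0]
print(lu_solve(*lu_factor(A), b))  # [1.0, 1.0]
```

Splitting factorization from solving mirrors the DGETRF/DGETRS split: once A is factored, the factors can be reused for many right-hand sides.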
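The routine naming scheme is pmmaaa, where:
- p: data type: S for single-precision real, D for double-precision real, C for single-precision complex, Z for double-precision complex
- mm: two-letter code denoting the type of matrix the algorithm expects (e.g. GE for general, SY for symmetric, PO for symmetric/Hermitian positive definite, TR for triangular)
- aaa: code of up to three letters denoting the computation the routine performs (e.g. TRF for factorize, SV for solve, EV for eigenvalue problems)
For example, DGETRF computes the LU factorization of a general double-precision real matrix.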
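BLAS uses the same prefix convention (DGEMM is a double-precision general matrix-matrix multiply). Here is a small hypothetical helper that splits a routine name into its three fields; the lookup tables cover only a few common matrix-type codes, so anything else is reported as unknown.

```python
# Hypothetical helper: split a LAPACK routine name into its pmmaaa fields.
# The tables below list only a handful of common codes.
PRECISION = {
    "S": "single-precision real",
    "D": "double-precision real",
    "C": "single-precision complex",
    "Z": "double-precision complex",
}
MATRIX_TYPE = {
    "GE": "general",
    "GB": "general band",
    "SY": "symmetric",
    "HE": "Hermitian",
    "PO": "symmetric/Hermitian positive definite",
    "TR": "triangular",
}

def decode(name):
    """Return (precision, matrix type, computation code) for a routine name."""
    name = name.upper()
    p, mm, aaa = name[0], name[1:3], name[3:]
    return (PRECISION.get(p, "unknown"), MATRIX_TYPE.get(mm, "unknown"), aaa)

print(decode("dgesv"))   # ('double-precision real', 'general', 'SV')
print(decode("ZPOTRF"))  # ('double-precision complex', 'symmetric/Hermitian positive definite', 'TRF')
```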
Like BLAS, LAPACK has architecture-specific implementations as well (Intel MKL, Apple vecLib, etc.). Most of these are multithreaded for use on shared memory machines.
For parallel distributed memory machines, ScaLAPACK is available (and is also included in some vendor libraries, such as Intel MKL).