Just a note to myself to check out this command when I get the chance. For whatever reason, despite having used Ubuntu on and off for years, I first learned of it today.

Sep 12

## BLAS and LAPACK

Back when I was new to the world of HPC, I was often confused as to what the distinction was between BLAS and LAPACK. With the wealth of available memory model variants (shared, distributed), vendor implementations, and subroutine naming conventions, the amount of acronyms can get out of hand quickly. The following is a little cheat sheet I refer to when I need a quick refresher.

**BLAS (Basic Linear Algebra Subprograms)**

BLAS is a specification for low-level linear algebra routines. The routines are split into three levels:

- Level 1: Routines that can be performed in linear time, O(n). These include *vector* operations.
- Level 2: Routines that can be performed in quadratic time, O(n^2). These include *matrix-vector* operations.
- Level 3: Routines that can be performed in cubic time, O(n^3). These include *matrix-matrix* operations.
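To make the three levels concrete, here is a minimal toy sketch using the CBLAS interface; it assumes a CBLAS implementation such as OpenBLAS is installed and linked (e.g. with -lopenblas):

```cpp
#include <cblas.h>

int main() {
    double x[3] = {1, 2, 3};
    double y[3] = {4, 5, 6};
    double A[9] = {1, 0, 0,
                   0, 1, 0,
                   0, 0, 1};  // 3x3 identity, row-major
    double B[9] = {2, 0, 0,
                   0, 2, 0,
                   0, 0, 2};
    double C[9] = {0};

    // Level 1 (vector operation, O(n)): y = 2.0*x + y
    cblas_daxpy(3, 2.0, x, 1, y, 1);

    // Level 2 (matrix-vector operation, O(n^2)): y = 1.0*A*x + 0.0*y
    cblas_dgemv(CblasRowMajor, CblasNoTrans, 3, 3, 1.0, A, 3, x, 1, 0.0, y, 1);

    // Level 3 (matrix-matrix operation, O(n^3)): C = 1.0*A*B + 0.0*C
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                3, 3, 3, 1.0, A, 3, B, 3, 0.0, C, 3);
    return 0;
}
```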

Architecture-specific implementations (Intel's MKL, for example) take advantage of vectorization and other optimizations available on their respective hardware. It should also be noted that most vendor-specific implementations (and some open source ones, like OpenBLAS) support multithreaded operation for use on shared memory machines. A few BLAS implementations are listed here:

- Netlib BLAS: The official reference implementation (Fortran 77).
- Netlib CBLAS: The reference C interface to BLAS.
- Intel MKL: x86 (32- and 64-bit). Optimized for Pentium, Core, Xeon, and Xeon Phi.
- OpenBLAS: Optimized successor to GotoBLAS. Supports x86 (32- and 64-bit), MIPS, and ARM.
- ATLAS (Automatically Tuned Linear Algebra Software): Implementation that automatically creates an optimized BLAS library for a given architecture.

**LAPACK (Linear Algebra Package)**

LAPACK is a library for performing higher-level linear algebra operations. It is built “on top” of BLAS, and the underlying BLAS routines are often completely transparent to the programmer. Some of the things LAPACK can do:

- Solve systems of linear equations
- Linear least squares
- Eigenvalue problems
- Singular Value Decomposition (SVD)
- Matrix Factorizations (LU, QR, Cholesky, Schur decomposition)

Both real and complex matrices, in single and double precision, can be used.

The function naming scheme is *pmmaaa*, where:

- *p*: precision; S for single, D for double
- *mm*: a two-letter code denoting the type of matrix expected by the algorithm
- *aaa*: a three-letter code denoting the actual algorithm performed by the subroutine
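For example, DGESV is the double precision (D) routine for a general (GE) matrix that solves (SV) a system of linear equations. As a minimal sketch of how such a routine can be called through the LAPACKE C interface (assuming LAPACKE is installed and linked, e.g. with -llapacke):

```cpp
#include <lapacke.h>
#include <iostream>

int main() {
    // Solve the 2x2 system A*x = b in place (row-major storage)
    double A[4] = {2.0, 1.0,
                   1.0, 3.0};
    double b[2] = {3.0, 5.0};
    lapack_int ipiv[2];  // pivot indices from the underlying LU factorization

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 2, 1, A, 2, ipiv, b, 1);
    if (info != 0) {
        std::cout << "dgesv failed: info = " << info << std::endl;
        return 1;
    }
    // On success, b now holds the solution x
    std::cout << "x = (" << b[0] << ", " << b[1] << ")" << std::endl;
    return 0;
}
```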

Like BLAS, architecture-specific implementations of LAPACK exist as well (Intel MKL, Apple vecLib, etc.). Most of these are multithreaded for use on shared memory machines.

For use on parallel distributed memory machines, ScaLAPACK is available (and is also included in some vendor implementations, such as Intel MKL).

Jul 22

## Euclidean Algorithm

The Euclidean Algorithm recursively calculates the greatest common divisor (GCD) of two integers. It works because GCD(a, b) = GCD(b, a mod b). Each step of the algorithm reduces the size of the numbers by about a factor of 2, so the total number of steps required is roughly log(a*b). This is much more efficient than brute-force testing of all possible divisors up to min(a, b).

```cpp
#include <iostream>

// Recursively compute GCD(a, b) via the Euclidean Algorithm
int calcGCD(const int& a, const int& b) {
    if (b == 0)
        return a;
    int aPrime = a % b;
    return calcGCD(b, aPrime);
}

int main() {
    int aa, bb;
    std::cout << "Enter two integers for GCD: ";
    std::cin >> aa >> bb;
    std::cin.ignore();
    std::cout << "Answer: " << calcGCD(aa, bb) << std::endl;
    std::cin.get();
}
```

Jul 20

## Calculating Fibonacci Numbers

The following C++ code calculates the Nth number in the Fibonacci sequence. The first routine calculates F_n recursively; however, its running time grows exponentially with n, so it takes an unrealistic amount of time to compute even moderately large values of F_n (say, n = 100). The second routine generates the sequence on the fly up to F_n and is a much quicker algorithm (orders of magnitude faster for large n).

```cpp
#include <iostream>
#include <vector>

// Naive recursion: exponential running time
int calcFibSlow(const int& n) {
    if (n <= 1)
        return n;
    else
        // Recursively calculate F_n
        return (calcFibSlow(n-1) + calcFibSlow(n-2));
}

// Bottom-up generation: linear running time
int calcFibFast(const int& n) {
    if (n == 0) return 0;
    // Create a vector of integers, n+1 entries because we want the zeroth F_n too
    std::vector<int> listN(n+1, 0);
    listN[1] = 1;
    // Generate up to F_n
    for (int ii = 2; ii <= n; ++ii) {
        listN[ii] = listN[ii-1] + listN[ii-2];
    }
    return listN[n];  // note: int overflows past F_46
}

int main() {
    int nVal;
    // Read in the desired F_n
    std::cin >> nVal;
    std::cin.ignore();
    // Output the values of both routines
    std::cout << calcFibSlow(nVal) << std::endl;
    std::cout << calcFibFast(nVal) << std::endl;
    std::cin.get();
}
```

Jul 11

## References and Pointers (C++)

Inspired by *C++ Primer*, Fifth Edition

A reference refers to (is another name for) another object. Any operation on the reference actually operates on the object to which the reference is bound.

```cpp
int ivalue = 128;
int &refValue = ivalue;  // refValue refers to (is another name for) ivalue
int &refValue2;          // error: a reference must be initialized
```

A pointer is an object itself (unlike a reference) and “points to” another object.

```cpp
int *intPtr;   // Pointer to an int
double *dPtr;  // Pointer to a double
```

A pointer can be in one of four states: 1) it points to an object; 2) it points to the location just past the end of an object; 3) it is a null pointer, bound to no object; 4) it is invalid (anything other than the preceding three states).
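A quick sketch of the four states:

```cpp
int obj = 42;
int arr[4] = {0, 1, 2, 3};

int *p1 = &obj;     // 1) points to an object
int *p2 = arr + 4;  // 2) points just past the end of an object (ok to hold, not to dereference)
int *p3 = nullptr;  // 3) null pointer: points to no object
int *p4;            // 4) uninitialized, hence invalid: accessing it is undefined behavior
```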

The address-of operator (&) is used to get the address of an object:

```cpp
int intVal = 100;
int *intPtr = &intVal;
```

The dereference operator (*) is used to access the value “pointed to” by the pointer:

```cpp
int ival = 100;
int *intPtr = &ival;
std::cout << *intPtr;  // outputs the value pointed at by intPtr
```

The following code illustrates how to pass a vector of integers by reference to a function:

```cpp
#include <iostream>
#include <vector>

using std::vector;
using std::cin;
using std::cout;

// The vector is passed by const reference: no copy is made,
// and the function cannot modify the caller's data
int MaxPairwiseProduct(const vector<int>& numbers) {
    int result = 0;
    int n = numbers.size();
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            if (numbers[i] * numbers[j] > result) {
                result = numbers[i] * numbers[j];
            }
        }
    }
    return result;
}

int main() {
    int n;
    cin >> n;
    vector<int> numbers(n);
    for (int i = 0; i < n; ++i) {
        cin >> numbers[i];
    }
    int result = MaxPairwiseProduct(numbers);
    cout << result << "\n";
    return 0;
}
```

Jun 28

## Linear Regression in C++

The following is the gradient descent algorithm implemented for linear regression using C++ and the Eigen library:

```cpp
// Gradient Descent
// Assumes Eigen objects already defined: xVec (design matrix), yVec (targets),
// pVec (parameter vector), and totalPayVec (training targets)
double numTrain = double(totalPayVec.size());
double learnParam = .0000002;
int numIter = 50000;
for (int ii = 0; ii < numIter; ii++) {
    // Iterate the parameter vector
    pVec = pVec - ((learnParam/(numTrain))*(((xVec * pVec) - yVec).transpose() * xVec).transpose());

    // Calculate and output the Cost Function value for each iteration
    MatrixXd sumCost = (((xVec * pVec) - yVec).transpose()) * ((xVec * pVec) - yVec);
    double costFuncVal = (1/(2.0 * numTrain)) * sumCost(0);
    std::cout << "Iteration: " << ii << " "
              << "Cost Function Value: " << costFuncVal << std::endl;
}
```

Jun 28

## Logistic Regression in C++

After taking Andrew Ng’s Machine Learning course, I decided to implement some of the algorithms in C++. The following code uses the Eigen template library for linear algebra. Also, I’m no C++ master, so take this code with a grain of salt!

In the course itself, a MATLAB optimization solver (fminunc) is used to learn the parameter vector. In this C++ implementation, however, I first attempted to minimize the cost function using gradient descent:

```cpp
// Gradient Descent for logistic regression
// Assumes Eigen objects already defined: xVec (design matrix), yVec (labels),
// pVec (parameter vector), and admitVec (training labels)
VectorXd onesVec = VectorXd::Ones(yVec.size());
double numTrain = double(admitVec.size());
VectorXd logVec = VectorXd::Zero((xVec * pVec).size());
VectorXd oneMinusLogVec = VectorXd::Zero((xVec * pVec).size());

double learnParam = .001;
int numIter = 1000;
for (int ii = 0; ii < numIter; ii++) {
    VectorXd sigVec = xVec * pVec;
    // Create the sigmoid and log vectors
    for (int jj = 0; jj < sigVec.size(); jj++) {
        sigVec(jj) = calcSigmoid(double(sigVec(jj)));
        logVec(jj) = log(sigVec(jj));
        oneMinusLogVec(jj) = log(onesVec(jj) - sigVec(jj));
    }
    // Iterate the parameter vector
    pVec = pVec - ((learnParam/(numTrain))*((sigVec - yVec).transpose() * xVec).transpose());

    // Calculate the Cost Function
    MatrixXd sumCost = (((-1*yVec).transpose()*logVec) - (onesVec - yVec).transpose()*oneMinusLogVec);
    double costFuncVal = (1/(numTrain)) * sumCost(0);
}
```
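Note that both this snippet and the Newton’s Method one below rely on a calcSigmoid helper that isn’t shown in the excerpts; a minimal definition, assuming the standard logistic function, would be:

```cpp
#include <cmath>

// Hypothetical definition of the calcSigmoid helper assumed above:
// maps any real z to the interval (0, 1)
double calcSigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}
```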

However, I found that for the dataset I was using, gradient descent was converging much too slowly. I decided to try out Newton’s Method:

```cpp
// Newton's Method for logistic regression (same assumed variables as above)
VectorXd onesVec = VectorXd::Ones(yVec.size());
double numTrain = double(admitVec.size());
VectorXd logVec = VectorXd::Zero((xVec * pVec).size());
VectorXd oneMinusLogVec = VectorXd::Zero((xVec * pVec).size());

int numIterNewton = 5000;
for (int ii = 0; ii < numIterNewton; ii++) {
    VectorXd sigVec = xVec * pVec;
    // Create the sigmoid and log vectors
    for (int jj = 0; jj < sigVec.size(); jj++) {
        sigVec(jj) = calcSigmoid(sigVec(jj));
        logVec(jj) = log(sigVec(jj));
        oneMinusLogVec(jj) = log(onesVec(jj) - sigVec(jj));
    }
    // Create the gradient and Hessian
    VectorXd gradJ = ((1/(numTrain))*((sigVec - yVec).transpose() * xVec).transpose());
    MatrixXd hMat = ((1/(numTrain))*((sigVec.transpose() * (onesVec-sigVec))(0)) * (xVec.transpose() * xVec));

    // Iterate the parameter vector
    pVec = pVec - hMat.inverse() * gradJ;

    // Calculate the Cost Function
    MatrixXd sumCost = (((-1*yVec).transpose()*logVec) - (onesVec - yVec).transpose()*oneMinusLogVec);
    double costFuncVal = (1/(numTrain)) * sumCost(0);
}
```

Newton’s Method converged much more quickly. It should be noted, however, that the dataset I was using had only two features (n = 2), which made computing the inverse of the Hessian relatively harmless. For larger, more complex problems, where inverting the Hessian becomes expensive, the advantage of Newton’s Method over Gradient Descent may not be so clear-cut.

Here are some other C++ numerical solvers (not written by me!) located on GitHub.

Mar 30

## Setting up Visual Studio 2012 and SFML

The detailed directions can be found here: SFML with Visual Studio

- For all configurations, add the path to the SFML headers (*<sfml-install-path>/include*) to C/C++ » General » Additional Include Directories.
- For all configurations, add the path to the SFML libraries (*<sfml-install-path>/lib*) to Linker » General » Additional Library Directories.
- Next, link your application to the SFML libraries via Linker » Input » Additional Dependencies, adding “sfml-graphics.lib”, “sfml-window.lib”, and “sfml-system.lib”, for example.
- If linking dynamically (external DLLs), copy the SFML DLLs to the executable directory.

Here is the “Hello World” test code to see if SFML is working correctly:

```cpp
#include <SFML/Graphics.hpp>

int main() {
    sf::RenderWindow window(sf::VideoMode(200, 200), "SFML works!");
    sf::CircleShape shape(100.f);
    shape.setFillColor(sf::Color::Green);

    while (window.isOpen()) {
        sf::Event event;
        while (window.pollEvent(event)) {
            if (event.type == sf::Event::Closed)
                window.close();
        }
        window.clear();
        window.draw(shape);
        window.display();
    }
    return 0;
}
```
