Sep 12

BLAS and LAPACK

Back when I was new to the world of HPC, I was often confused about the distinction between BLAS and LAPACK.  With the wealth of available memory model variants (shared, distributed), vendor implementations, and subroutine naming conventions, the number of acronyms can get out of hand quickly.  The following is a little cheat sheet I refer to when I need a quick refresher.

BLAS (Basic Linear Algebra Subprograms)

BLAS is a specification for low-level linear algebra routines.  The routines are split into three levels:

  • Level 1: Routines that run in linear time, O(n).  These are vector operations (e.g., dot products, vector scaling and addition).
  • Level 2: Routines that run in quadratic time, O(n^2).  These are matrix-vector operations (e.g., matrix-vector multiplication).
  • Level 3: Routines that run in cubic time, O(n^3).  These are matrix-matrix operations (e.g., matrix-matrix multiplication).
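
As a rough illustration of the three levels, here is a minimal sketch using the CBLAS interface (the vectors and matrices are tiny and arbitrary; link against a BLAS implementation of your choice, e.g. OpenBLAS):

#include <cblas.h>
#include <iostream>

int main()
{
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    double A[9] = {1, 0, 0,
                   0, 1, 0,
                   0, 0, 1};                  // 3x3 identity, row-major
    double B[9] = {1, 2, 3,
                   4, 5, 6,
                   7, 8, 9};
    double C[9] = {0};

    // Level 1 (vector-vector, O(n)): y = 2*x + y
    cblas_daxpy(3, 2.0, x, 1, y, 1);

    // Level 2 (matrix-vector, O(n^2)): y = 1*A*x + 0*y
    cblas_dgemv(CblasRowMajor, CblasNoTrans, 3, 3, 1.0, A, 3, x, 1, 0.0, y, 1);

    // Level 3 (matrix-matrix, O(n^3)): C = 1*A*B + 0*C
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                3, 3, 3, 1.0, A, 3, B, 3, 0.0, C, 3);

    std::cout << "C(0,0) = " << C[0] << std::endl;   // A is the identity, so C = B and this prints 1
    return 0;
}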

Architecture-specific implementations (MKL for Intel chips, for example) take advantage of vectorization and other optimizations available on their respective hardware.  It should also be noted that most vendor-specific implementations (and some open source ones, like OpenBLAS) support multithreaded operation for use on shared memory machines.  A few BLAS implementations are listed here:

  1. Netlib BLAS: The official reference implementation (Fortran 77).
  2. Netlib CBLAS: The reference C interface to BLAS.
  3. Intel MKL: x86 (32- and 64-bit).  Optimized for Pentium, Core, Xeon, and Xeon Phi.
  4. OpenBLAS: Optimized successor to GotoBLAS.  Supports x86 (32- and 64-bit), MIPS, and ARM.
  5. ATLAS (Automatically Tuned Linear Algebra Software): Automatically generates a BLAS library tuned to the architecture it is built on.

LAPACK (Linear Algebra Package)

LAPACK is a library for performing higher-level linear algebra operations.  It is built “on top” of BLAS, and the underlying BLAS calls are often completely transparent to the programmer.  Some of the things LAPACK can do:

  • Solve systems of linear equations
  • Linear least squares
  • Eigenvalue problems
  • Singular Value Decomposition (SVD)
  • Matrix Factorizations (LU, QR, Cholesky, Schur decomposition)

Routines are provided for both real and complex matrices, in single and double precision.

The function naming scheme is pmmaaa where:

  • p: precision and data type, e.g. S for single real, D for double real, C for single complex, Z for double complex
  • mm: two-letter code denoting the type of matrix the routine expects (e.g., GE for general, SY for symmetric, TR for triangular)
  • aaa: code of up to three letters denoting the algorithm performed by the subroutine (e.g., SV for solve, TRF for factorize, EV for eigenvalues)
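
For example, DGESV is the double precision (D) routine that solves (SV) a linear system with a general (GE) coefficient matrix.  A minimal sketch using the LAPACKE C interface (the 2x2 system here is arbitrary; link against a LAPACK/LAPACKE build, e.g. from Netlib or MKL):

#include <lapacke.h>
#include <iostream>

int main()
{
    // Solve the 2x2 system A*x = b (row-major storage)
    double A[4] = {3.0, 1.0,
                   1.0, 2.0};
    double b[2] = {9.0, 8.0};      // expected solution: x = (2, 3)
    lapack_int ipiv[2];            // pivot indices from the underlying LU factorization

    // n = 2 unknowns, nrhs = 1 right-hand side; b is overwritten with the solution
    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 2, 1, A, 2, ipiv, b, 1);

    if (info == 0)
        std::cout << "x = (" << b[0] << ", " << b[1] << ")" << std::endl;
    else
        std::cout << "LAPACKE_dgesv failed, info = " << info << std::endl;

    return 0;
}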

Like BLAS, architecture-specific implementations of LAPACK exist as well (Intel MKL, Apple vecLib, etc.).  Most of these are multithreaded for use on shared memory machines.

For parallel distributed memory machines, ScaLAPACK is available (it is also included in some vendor implementations, such as Intel MKL).

Jul 22

Euclidean Algorithm

The Euclidean Algorithm recursively calculates the greatest common divisor (GCD) of two integers.  It works because GCD(a, b) = GCD(b, a mod b).  Since a mod b is always less than a/2 (for b <= a), each step cuts the product of the two numbers at least in half, so the total number of steps is at most about log2(a*b).  This is far more efficient than brute-force testing every candidate divisor up to min(a, b).


#include <iostream>

int calcGCD(const int& a, const int& b)
{
    // Base case: GCD(a, 0) = a
    if (b == 0)
        return a;
    // Recurse using GCD(a, b) = GCD(b, a mod b)
    int aPrime = a % b;
    return calcGCD(b, aPrime);
}

int main()
{
    int aa, bb;
    std::cout << "Enter two integers for GCD: ";
    std::cin >> aa >> bb;
    std::cin.ignore();
    std::cout << "Answer: " << calcGCD(aa, bb) << std::endl;
    std::cin.get();
}

Jul 20

Calculating Fibonacci Numbers

The following C++ code calculates the Nth number in the Fibonacci sequence.  The first routine computes F_n by naive recursion; it runs in exponential time and takes an unrealistic amount of time for even moderately large n (say, n = 100).  The second routine builds the sequence up to F_n iteratively, which takes only linear time and is orders of magnitude faster for large n.

#include <iostream>
#include <vector>

int calcFibSlow(const int& n)
{
    if (n <= 1)
        return n;
    else
        // Recursively calculate F_n
        return (calcFibSlow(n-1) + calcFibSlow(n-2));
}

int calcFibFast(const int& n)
{
    // Handle F_0 and F_1 directly (also avoids indexing past the end for n = 0)
    if (n <= 1)
        return n;

    // Create a vector of n+1 integers so that index n holds F_n (F_0 = 0, F_1 = 1)
    // Note: int overflows beyond F_46; use a wider type for larger n
    std::vector<int> listN(n+1, 0);
    listN[1] = 1;

    // Generate the sequence up to F_n
    for (int ii = 2; ii <= n; ++ii)
    {
        listN[ii] = listN[ii-1] + listN[ii-2];
    }

    return listN[n];
}

int main()
{
    int nVal;
    // Read in the desired F_n
    std::cin >> nVal;
    std::cin.ignore();
    // Output the values of both routines
    std::cout << calcFibSlow(nVal) << std::endl;
    std::cout << calcFibFast(nVal) << std::endl;
    std::cin.get();
}

Jul 11

References and Pointers (C++)

Inspired by C++ Primer, Fifth Edition

A reference refers to (is another name for) another object. Any operation on the reference actually operates on the object to which the reference is bound.

int ivalue = 128;
int &refValue = ivalue;       // refValue refers to (is another name for) ivalue
int &refValue2;               // error: a reference must be initialized

A pointer is an object itself (unlike a reference) and “points to” another object.

int *intPtr;                 // Pointer to an int
double *dPtr;                // Pointer to a double

A pointer can be in one of four states: 1) it points to an object, 2) it points to the location just past the end of an object, 3) it is a null pointer (bound to no object), or 4) it is invalid (anything other than the three preceding states).
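
A quick sketch of the first three states (using C++11's nullptr):

int ival = 42;
int arr[4] = {0, 1, 2, 3};

int *objPtr  = &ival;        // 1) points to an object
int *pastEnd = arr + 4;      // 2) one past the end of arr (legal to hold, not to dereference)
int *nullPtr = nullptr;      // 3) a null pointer, bound to no object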

The address-of operator (&) is used to get the address of an object:

int intVal = 100;
int *intPtr = &intVal;

The dereference operator (*) is used to access the value “pointed to” by the pointer:

int ival = 100;
int *intPtr = &ival;
cout << *intPtr;             // outputs the value pointed at by intPtr

The following code illustrates how to pass a vector of integers to a function by reference (here, a reference to const, since the function only reads the vector):

#include <iostream>
#include <vector>

using std::vector;
using std::cin;
using std::cout;

long long MaxPairwiseProduct(const vector<int>& numbers) {
  // The vector is taken by reference to const: no copy is made, and the
  // function cannot modify the caller's data.
  long long result = 0;
  int n = numbers.size();
  for (int i = 0; i < n; ++i) {
    for (int j = i + 1; j < n; ++j) {
      // Multiply as long long to avoid overflowing int for large inputs
      long long product = static_cast<long long>(numbers[i]) * numbers[j];
      if (product > result) {
        result = product;
      }
    }
  }
  return result;
}

int main() {
    int n;
    cin >> n;
    vector<int> numbers(n);
    for (int i = 0; i < n; ++i) {
        cin >> numbers[i];
    }

    long long result = MaxPairwiseProduct(numbers);
    cout << result << "\n";
    return 0;
}

Jun 28

Linear Regression in C++

The following is the gradient descent algorithm implemented for linear regression using C++ and the Eigen library:

    // Gradient Descent
    double numTrain = double(totalPayVec.size());
    double learnParam = .0000002;
    int numIter = 50000;
    for (int ii = 0; ii < numIter; ii++)
    { 
        // Iterate the parameter vector
        pVec = pVec - ((learnParam/(numTrain))*(((xVec * pVec) - yVec).transpose() * xVec).transpose());
        
        // Calculate and output the Cost function value for each iteration 
        MatrixXd sumCost = (((xVec * pVec) - yVec).transpose()) * ((xVec * pVec) - yVec);
        double costFuncVal = (1/(2.0 * numTrain)) * sumCost(0);
        std::cout << "Iteration: "<< ii << " " << "Cost Function Value: " << costFuncVal << std::endl;
    }
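
The snippet above assumes that the design matrix xVec (with a leading column of ones), the target vector yVec, and the parameter vector pVec have already been built from the training data (totalPayVec) earlier in the program.  For reference, here is a minimal, self-contained sketch of that setup on synthetic data; the dimensions, values, and learning rate below are my own illustrative choices, not those of the original program:

#include <iostream>
#include <Eigen/Dense>

using Eigen::MatrixXd;
using Eigen::VectorXd;

int main()
{
    // Synthetic training set: y = 3 + 2*x with x in [0, 1]
    const int numTrain = 100;
    MatrixXd xVec(numTrain, 2);       // design matrix: column of ones, then the feature
    VectorXd yVec(numTrain);
    for (int ii = 0; ii < numTrain; ++ii)
    {
        double x = double(ii) / double(numTrain - 1);
        xVec(ii, 0) = 1.0;
        xVec(ii, 1) = x;
        yVec(ii) = 3.0 + 2.0 * x;
    }

    VectorXd pVec = VectorXd::Zero(2);    // parameter vector (theta), initialized to zero

    // Gradient descent: theta := theta - (alpha/m) * X^T * (X*theta - y)
    double learnParam = 0.5;
    int numIter = 5000;
    for (int ii = 0; ii < numIter; ++ii)
    {
        pVec = pVec - (learnParam / numTrain) * (xVec.transpose() * (xVec * pVec - yVec));
    }

    std::cout << "Learned parameters: " << pVec.transpose() << std::endl;   // ~ (3, 2)
    return 0;
}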

Jun 28

Logistic Regression in C++

After taking Andrew Ng’s Machine Learning course, I decided to implement some of the algorithms in C++.  The following code uses the Eigen template library for linear algebra.  Also, I’m no C++ master, so take this code with a grain of salt!

In the course itself, a MATLAB optimization solver (fminunc) is used to learn the parameter vector.  In this C++ implementation, however, I first attempted to minimize the cost function using gradient descent:

    VectorXd onesVec = VectorXd::Ones(yVec.size());
    double numTrain = double(admitVec.size());

    VectorXd logVec = VectorXd::Zero((xVec * pVec).size());
    VectorXd oneMinusLogVec = VectorXd::Zero((xVec * pVec).size());

    // Gradient Descent
    double learnParam = .001;
    int numIter = 1000;

    for (int ii = 0; ii < numIter; ii++)
    { 
        VectorXd sigVec = xVec * pVec;

        // Create the sigmoid and log vectors
        for(int jj = 0; jj < sigVec.size(); jj++)
        {
            sigVec(jj) = calcSigmoid(double(sigVec(jj)));    
            logVec(jj) = log(sigVec(jj));                    
            oneMinusLogVec(jj) = log(onesVec(jj) - sigVec(jj));            
        }

        // Iterate the parameter vector
        pVec = pVec - ((learnParam/(numTrain))*((sigVec - yVec).transpose() * xVec).transpose());

        // Calculate the Cost Function
        MatrixXd sumCost = (((-1*yVec).transpose()*logVec) - (onesVec - yVec).transpose()*oneMinusLogVec);
        double costFuncVal = (1/(numTrain)) * sumCost(0);
    }
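
The snippet calls a calcSigmoid helper that isn't defined above; a minimal version (the name and signature are inferred from the call site) might look like:

#include <cmath>

// Logistic (sigmoid) function: sigma(z) = 1 / (1 + exp(-z))
double calcSigmoid(double zVal)
{
    return 1.0 / (1.0 + std::exp(-zVal));
}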

However, I found that for the dataset I was using, gradient descent was converging much too slowly.  I decided to try out Newton’s Method:

    VectorXd onesVec = VectorXd::Ones(yVec.size());
    double numTrain = double(admitVec.size());

    VectorXd logVec = VectorXd::Zero((xVec * pVec).size());
    VectorXd oneMinusLogVec = VectorXd::Zero((xVec * pVec).size());

    // Newton's Method
    int numIterNewton = 5000;

    for (int ii = 0; ii < numIterNewton; ii++)
    {
        VectorXd sigVec = xVec * pVec;

        // Create the sigmoid and log vectors
        for(int jj = 0; jj < sigVec.size(); jj++)
        {
            sigVec(jj) = calcSigmoid(sigVec(jj));
            logVec(jj) = log(sigVec(jj));
            oneMinusLogVec(jj) = log(onesVec(jj) - sigVec(jj));
        }

        // Create the gradient: (1/m) * X^T * (sigma - y)
        VectorXd gradJ = ((1/(numTrain))*((sigVec - yVec).transpose() * xVec).transpose());

        // Create the Hessian: (1/m) * X^T * diag(sigma .* (1 - sigma)) * X
        VectorXd weightVec = sigVec.cwiseProduct(onesVec - sigVec);
        MatrixXd hMat = (1/(numTrain)) * (xVec.transpose() * weightVec.asDiagonal() * xVec);

        // Iterate the parameter vector
        pVec = pVec - hMat.inverse() * gradJ;

        // Calculate the Cost Function
        MatrixXd sumCost = (((-1*yVec).transpose()*logVec) - (onesVec - yVec).transpose()*oneMinusLogVec);
        double costFuncVal = (1/(numTrain)) * sumCost(0);
    }

Newton’s Method converged much more quickly.  It should be noted, however, that the dataset I was using had only two features (n = 2), which made inverting the Hessian relatively harmless.  For larger, more complex problems, the choice of Newton’s Method over gradient descent may not be so clear-cut.

Here are some other C++ numerical solvers (not written by me!) located on GitHub.

Mar 30

Setting up Visual Studio 2012 and SFML

The detailed directions can be found here: SFML with Visual Studio

  1. For all configurations, add the path to the SFML headers (<sfml-install-path>/include) to C/C++ » General » Additional Include Directories.
  2. For all configurations, add the path to the SFML libraries (<sfml-install-path>/lib) to Linker » General » Additional Library Directories.
  3. Next, link your application to the SFML libraries: under Linker » Input » Additional Dependencies, add “sfml-graphics.lib”, “sfml-window.lib” and “sfml-system.lib”, for example.
  4. If linking dynamically (external DLLs), copy the SFML DLLs to the executable’s directory.

Here is the “Hello World” test code to see if SFML is working correctly:

#include <SFML/Graphics.hpp>

int main()
{
    sf::RenderWindow window(sf::VideoMode(200, 200), "SFML works!");
    sf::CircleShape shape(100.f);
    shape.setFillColor(sf::Color::Green);

    while (window.isOpen())
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                window.close();
        }

        window.clear();
        window.draw(shape);
        window.display();
    }

    return 0;
}

Dec 10

Trauma

The semi that slammed into me, I never saw it coming. Screeching tires, a loud metallic thud, and a wave of broken glass all compressed into a single second, before darkness and silence.

Consciousness returns in a hospital bed, my vision cloudy and fading. The nebulous forms of my wife and child are beside me, holding my hand. I can’t make out their words, but I find comfort in the cadence of their voices. Life continues to fall away. It won’t be long now. I try to squeeze the hand in mine, to let them know it will be ok, but to no avail. I’m so sorry. Goodbye.

I awaken in a cloud of light. Does Heaven exist after all? If it does, I better come up with some good excuses as to why I was absent all of those Sundays. That’s strange. I can remember each and every Sunday of my entire life. The other days of the week as well, I can remember them too, in absolute detail. I don’t have to strain to locate these memories; I just… know.

No pearly gates. No angelic choir. No white robed deity.

“Hello?” I ask my surroundings, “Is anyone else here?”

An answer comes, not from without, but from within. I know where I am: I’m everywhere I choose to be. As to others, well… that question doesn’t exactly make sense: It’s just a remnant thought that lingers from my former existence as a human.

I wonder what happened to me. The answer arrives seemingly before I even finish asking myself the question. I remember it vividly now: Planar collision. The fusion of high dimensional space is one of the few things that can wreak havoc on my being. I’ll have to be more careful, next time.

With my mind torn apart, my consciousness collapsed into a mere shadow of its former self: Human. The healing process would soon begin.

Sentience, while not omnipotence, is still preferable to primal instinct: Had I wandered closer to the point of dimensional intersect, I could have found myself a lizard or a cockroach. I remember when I too closely investigated the overlapping event horizons of a binary black hole. Life as a goldfish had been simple and calm, in retrospect. That is, until I re-integrated myself with the universe. It’s much more traumatic for lower life projections than that of a human, the latter of which are much closer to me in terms of evolution than they could ever imagine.

My absolute knowledge of all things past and present means that I have fully recovered. Unfortunately, I can’t use this infinite knowledge of what has, and is currently, happening to extrapolate information about the future. If I could, perhaps I could avoid setbacks like these.

In the last fleeting moment before all traces of my human existence are absorbed into me, I remember my wife and child. They are here with me. They always have been. They always will be.

It feels good to be back.

Nov 30

Perception

The device from deep space parked itself in orbit before orienting its lens towards Los Angeles. Earth collectively sighed in relief when it was determined that the alien technology was not a weapon, but a probe. Further analysis revealed that the satellite took a total of 18 weeks’ worth of images in zeptosecond increments. The probe then processed the images, chained them together, and sped them up into a single femtosecond-long video, which was then transmitted back to the stars. Something, somewhere, had created a time-lapse of us, so that they might study us at their natural rate of perception. Otherwise, staring at immobile objects, like these humans here, would get old quick.

Nov 30

Apathetic

Four years ago, in 2012, I decided to upgrade my brain. The purchase of the Samsung Galaxy S3 meant that the totality of human knowledge was now accessible to me anytime, anywhere.

The pathway by which this data is accessed originates in my grey matter, weaves down through clunky typing fingers, and extends out to some remote server and back, before flowing up the optical nerves, to close the loop. It does exhibit significantly more lag than my biological neuron-to-neuron network; however, this is a small price to pay for omnipotence.

Engineers will solve the latency problem in time, anyways.

Fast forward to this morning, when I decided to purchase the Nexus 5x. I completed the transaction using the web browser on the S3. Did my brain just upgrade itself?

It’s not just me, however: Modern society is obsessed with the acquisition of these brain-mods. The time between new model releases is shrinking, and sales are only on the rise. What is it that we are trying to achieve by this endless cycle of self-upgrade? Why is there such a high priority on the enhancement of this distributed neural pathway?

The answer to that question is especially tough when you consider the vast majority who use their enhanced brains only to post pictures, build farms, and crush candy. We have the accumulated knowledge of our entire species as an integrated part of us now, yet we embrace only the most mundane of applications. Why, on the precipice of singularity, do we find ourselves so apathetic?
