
Jun 28

Logistic Regression in C++

After taking Andrew Ng’s Machine Learning course, I decided to implement some of the algorithms in C++.  The following code uses the Eigen template library for linear algebra.  Also, I’m no C++ expert, so take this code with a grain of salt!

In the course itself, a MATLAB optimization solver (fminunc) is used to learn the parameter vector.  In this C++ implementation, however, I first attempted to minimize the cost function with gradient descent.
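For reference, the loop below minimizes the usual cross-entropy cost for logistic regression via its gradient (here m is the number of training examples, X the design matrix, y the label vector, θ the parameters, σ the sigmoid, and α the learning rate):

    J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log \sigma\big(\theta^T x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - \sigma\big(\theta^T x^{(i)}\big)\big) \Big]

    \nabla J(\theta) = \frac{1}{m} X^T \big(\sigma(X\theta) - y\big), \qquad \theta := \theta - \alpha\, \nabla J(\theta)

In the code, xVec plays the role of X, yVec is y, pVec is θ, and learnParam is α: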

    // Assumes: xVec is the m x (n+1) design matrix (with a leading column of ones),
    // yVec holds the m binary labels, and pVec is the parameter vector (theta).
    VectorXd onesVec = VectorXd::Ones(yVec.size());
    double numTrain = double(yVec.size());

    VectorXd logVec = VectorXd::Zero((xVec * pVec).size());
    VectorXd oneMinusLogVec = VectorXd::Zero((xVec * pVec).size());

    // Gradient Descent
    double learnParam = .001;
    int numIter = 1000;

    for (int ii = 0; ii < numIter; ii++)
    { 
        VectorXd sigVec = xVec * pVec;

        // Create the sigmoid and log vectors
        for(int jj = 0; jj < sigVec.size(); jj++)
        {
            sigVec(jj) = calcSigmoid(double(sigVec(jj)));    
            logVec(jj) = log(sigVec(jj));                    
            oneMinusLogVec(jj) = log(onesVec(jj) - sigVec(jj));            
        }

        // Iterate the parameter vector
        pVec = pVec - ((learnParam/(numTrain))*((sigVec - yVec).transpose() * xVec).transpose());

        // Calculate the Cost Function
        MatrixXd sumCost = (((-1*yVec).transpose()*logVec) - (onesVec - yVec).transpose()*oneMinusLogVec);
        double costFuncVal = (1/(numTrain)) * sumCost(0);
    }
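This listing (and the Newton’s Method one below) calls a calcSigmoid helper that isn’t shown here.  Assuming it simply applies the logistic function to a double, a minimal version would be:

    #include <cmath>

    // Logistic (sigmoid) function: maps any real input into (0, 1)
    double calcSigmoid(double z)
    {
        return 1.0 / (1.0 + std::exp(-z));
    }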

However, I found that for the dataset I was using, gradient descent was converging much too slowly.  I decided to try out Newton’s Method:

    VectorXd onesVec = VectorXd::Ones(yVec.size());
    double numTrain = double(yVec.size());

    VectorXd logVec = VectorXd::Zero((xVec * pVec).size());
    VectorXd oneMinusLogVec = VectorXd::Zero((xVec * pVec).size());

    // Newton's Method
    int numIterNewton = 5000;

    for (int ii = 0; ii < numIterNewton; ii++)
    {
        VectorXd sigVec = xVec * pVec;

        // Create the sigmoid and log vectors
        for(int jj = 0; jj < sigVec.size(); jj++)
        {
            sigVec(jj) = calcSigmoid(sigVec(jj));
            logVec(jj) = log(sigVec(jj));
            oneMinusLogVec(jj) = log(onesVec(jj) - sigVec(jj));
        }

        // Create the gradient and the Hessian.
        // For the logistic cost, the Hessian is (1/m) * X^T * diag(sig .* (1 - sig)) * X.
        VectorXd gradJ = ((1/(numTrain))*((sigVec - yVec).transpose() * xVec).transpose());
        VectorXd sigWeights = sigVec.cwiseProduct(onesVec - sigVec);
        MatrixXd hMat = (1/(numTrain)) * (xVec.transpose() * sigWeights.asDiagonal() * xVec);

        // Iterate the parameter vector
        pVec = pVec - hMat.inverse() * gradJ;

        // Calculate the Cost Function
        MatrixXd sumCost = (((-1*yVec).transpose()*logVec) - (onesVec - yVec).transpose()*oneMinusLogVec);
        double costFuncVal = (1/(numTrain)) * sumCost(0);
    }
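For reference, the step applied in the loop above is the standard Newton update for the logistic cost, where the Hessian is built from the elementwise σ(1 − σ) weights:

    H(\theta) = \frac{1}{m} X^T S X, \qquad S = \mathrm{diag}\big(\sigma(X\theta) \odot (1 - \sigma(X\theta))\big), \qquad \theta := \theta - H(\theta)^{-1} \nabla J(\theta)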

Newton’s Method converged much more quickly.  It should be noted, however, that the dataset I was using had only two features (n = 2), which made the Hessian inverse computation relatively harmless.  For larger, more complex problems, where inverting the n x n Hessian costs roughly O(n^3) per step, the advantage of Newton’s Method over gradient descent may not be so clear-cut.
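One related tweak that helps as n grows: rather than forming hMat.inverse() explicitly, the Newton step can be obtained by solving the linear system H * step = gradJ with one of Eigen’s factorizations.  A sketch, reusing the hMat, gradJ, and pVec names from the loop above:

    // Solve H * step = gradJ instead of forming the explicit inverse.
    // LDLT is a reasonable choice, since the logistic Hessian is symmetric
    // positive semi-definite.
    VectorXd newtonStep = hMat.ldlt().solve(gradJ);
    pVec = pVec - newtonStep;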

Here are some other C++ numerical solvers (not written by me!) located on GitHub.
