Skynet has continued to progress. I’ve now validated my Logistic Regression code.

To the left you can see some labeled data plotted: blue points are labeled true and red points false.

This iteration of code improves the Sketchpad class significantly. The arrow keys now let you move forward and backward between the different plots, and the name of the active graph appears in the upper-left corner. Previously, the Sketchpad only drew circles, but I’ve added the ability to draw squares and Xs (took a page from Octave).
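The geometry behind the new markers is simple. Here's a minimal Python sketch of the idea (the function names are hypothetical, not the actual Sketchpad API): each marker reduces to a handful of points or line segments around a center.

```python
def square_marker(cx, cy, r):
    """Return the four corners of a square marker centered at (cx, cy)."""
    return [(cx - r, cy - r), (cx + r, cy - r),
            (cx + r, cy + r), (cx - r, cy + r)]

def x_marker(cx, cy, r):
    """Return the two diagonal line segments of an X marker."""
    return [((cx - r, cy - r), (cx + r, cy + r)),
            ((cx - r, cy + r), (cx + r, cy - r))]
```

The renderer then just connects the square's corners (closing the loop) or strokes the X's two segments, reusing whatever line-drawing primitive already draws the axes.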

Okay, so how did the logistic regression do? To the right you can see the hypothesis plotted. It comes pretty darn close: 98.7% of the data correctly labeled with a modest iteration count of 1.5k.
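For anyone following along, the core loop is just gradient descent on the sigmoid hypothesis. A minimal Python sketch of the technique (my actual code uses its own Matrix class; these names and the learning rate are illustrative assumptions):

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1500):
    """Batch gradient descent for logistic regression.

    X is a list of feature vectors, y a list of 0/1 labels.
    theta[0] is the bias term; the rest are feature weights.
    """
    theta = [0.0] * (len(X[0]) + 1)
    m = len(X)
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            h = sigmoid(theta[0] + sum(t * x for t, x in zip(theta[1:], xi)))
            err = h - yi  # same gradient form as linear regression
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta
```

Notably, the gradient has the same `(h - y) * x` shape as linear regression's; only the hypothesis swaps from a line to a sigmoid, which is why so much code can be shared between the two.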

Maybe it’s because a large portion of the code is shared, but I found that once the underlying math was solid, Logistic Regression just worked. If I crank the iteration count up 10x, 99.9% of my data is correctly labeled. At 15x, I achieve 100%.

Obviously this is an extremely simple dataset; it was generated by labeling anything to the right of a sloped line as true. For more complex datasets, I’m sure 100x iterations or more will be necessary. I definitely feel the need to crack open the Numerical Recipes book and find a more sophisticated function-minimization algorithm.
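To make the "simple dataset" claim concrete, here's roughly how such data can be generated (a hypothetical sketch, not my actual generator; the slope value and the "right of the line means below it" convention are assumptions):

```python
import random

def make_dataset(n=200, slope=1.5, seed=42):
    """Generate n random 2D points, labeling those on one side
    of the line y = slope * x as True."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        px, py = rng.uniform(-1, 1), rng.uniform(-1, 1)
        X.append((px, py))
        y.append(py < slope * px)  # assumed convention for "right of the line"
    return X, y
```

Because the classes are perfectly linearly separable by construction, logistic regression can eventually reach 100% accuracy; real-world data with overlapping classes would plateau below that no matter how many iterations you run.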

At 100k iterations, the Gradient Descent process took noticeably longer to compute for Logistic Regression than for a comparable Linear Regression counterpart. I’m guessing this has to do with the more computationally expensive Sigmoid function. At some point, I’ll go through and profile the hell out of the code to see where my hotspots are. My assumption is that my inefficient Matrix math will be the leading offender, but transcendental functions like Sigmoid or Log are likely good low-hanging fruit for optimization.
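Before a full profiling pass, a quick micro-benchmark can at least sanity-check the sigmoid-cost hypothesis. A minimal Python sketch comparing the two hypothesis functions in isolation (illustrative names and values, not the project's code):

```python
import math
import timeit

def linear_h(theta, x):
    """Linear regression hypothesis: just a multiply-add."""
    return theta[0] + theta[1] * x

def logistic_h(theta, x):
    """Logistic hypothesis: adds an exp() and a division on top."""
    return 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * x)))

t_lin = timeit.timeit(lambda: linear_h((0.5, 2.0), 1.3), number=100_000)
t_log = timeit.timeit(lambda: logistic_h((0.5, 2.0), 1.3), number=100_000)
print(f"linear: {t_lin:.4f}s  logistic: {t_log:.4f}s")
```

If the gap here is small relative to the overall slowdown, that would point the finger back at the Matrix math rather than the transcendental functions; a real profiler run is still the only way to know for sure.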