neovintage by Rimas Silkaitis

Start Using Shogun Toolbox in Ruby

Shogun Toolbox is a large, open-source machine learning toolbox. It provides a number of different machine learning models, like support vector machines (SVMs) and hidden Markov models, just to name a few. So why use Shogun Toolbox? Well, when you look at the SVM landscape, for example, you'll find many different publicly available implementations. Let's say you needed to switch your code to use OCAS instead of LibSVM: the semantics of interacting with each implementation could vary wildly. As a result, the authors of Shogun Toolbox state, "the motivation for creating a machine learning toolbox was to provide an easy, unified way for solving certain types of machine learning problems." This is great for us because it means we get access to many more machine learning models without having to find or build C extensions for Ruby that interface with the SVM implementation we want to use.


What are we trying to solve with Shogun Toolbox?

According to the Wikipedia page for Shogun Toolbox, it was developed with bioinformatics in mind. With an industry like that, you should expect lots of data, and I do mean lots. We're talking about millions and millions of data points. Ok... so it can handle a lot of data, but what else is it good for? The real sweet spot is two-class and multiclass classification and regression problems, according to the overview paper. There are some other things packaged with the toolbox, but know that if you're looking to do classification, this is the package for you.


Compiling Shogun Toolbox

The easiest way to get started is to download the source from GitHub. Once downloaded, make your way to the src directory so that you can compile the toolbox for your machine.


It's as simple as that. There's a whole host of options that you can configure when you're compiling the program, and I would encourage you to check them out as you use the toolbox. Since we're really interested in doing this for Ruby, we need to do a couple more things before we get the toolbox working. First, install the narray gem on your system with gem install narray.


Then when you configure the toolbox you'll need to specify that you also want the ruby_modular interface:


There are other interfaces for languages like Octave, Java and Python. Feel free to include them at this time. You should be good to go from this point, unless you're an RVM user.
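Once the build and install have finished, a quick smoke test can confirm that the Ruby you're running can actually see both libraries. This is only a minimal sketch: the helper name loadable? is made up for illustration, and it assumes the gem is required as narray and the compiled extension as modshogun.

```ruby
# Smoke test: can the current Ruby load the libraries we just installed?
# `loadable?` is a hypothetical helper, not part of Shogun or narray.
def loadable?(lib)
  require lib
  true
rescue LoadError
  # Raised when the library is missing from the load path, and also when
  # a compiled extension is found but cannot be linked.
  false
end

# Assumed library names: the narray gem and the compiled modshogun extension.
%w[narray modshogun].each do |lib|
  status = loadable?(lib) ? "ok" : "NOT FOUND"
  puts "#{lib}: #{status}"
end
```

If modshogun comes back as NOT FOUND even though the build succeeded, the library was most likely installed for a different Ruby than the one running the script.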


Gotchas for RVM users

Unfortunately, at this time the configuration step isn't smart enough to know that it needs to point to your current ruby instead of the system ruby. You'll see a weird error in the build process like Undefined Symbol: _rb_str_new_cstr. To get around it, you'll need to collect two pieces of information.

  • The first is where your ruby lib lives. You can issue rvm info and find the environment variable MY_RUBY_HOME.
  • The second is where your platform-specific include directory lives. You can issue ruby -e "puts $:" at the command line and that will show you all of the directories ruby searches when loading code. Make sure you've switched to the ruby you want to use! I recommend you put the compiled modshogun lib in the path that contains both site_ruby and the platform-specific folder.
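
Ruby can also report these locations itself through the standard RbConfig module, which is handy for double-checking what rvm info and $: tell you. A small sketch: the keys below are standard RbConfig entries, but treating prefix as the equivalent of MY_RUBY_HOME is my assumption, and rubyarchhdrdir is only set on Ruby 2.0 and later.

```ruby
require "rbconfig"

# Paths for the Ruby that is running this script, i.e. the one RVM has
# currently selected. All values come from the standard RbConfig module.
paths = {
  # Assumption: under RVM this should match MY_RUBY_HOME.
  "ruby home (prefix)"        => RbConfig::CONFIG["prefix"],
  "lib dir"                   => RbConfig::CONFIG["libdir"],
  # nil on Rubies older than 2.0, where this key does not exist.
  "platform-specific headers" => RbConfig::CONFIG["rubyarchhdrdir"],
  # The site_ruby directory that includes the platform-specific folder.
  "site_ruby, platform dir"   => RbConfig::CONFIG["sitearchdir"]
}

paths.each { |label, path| puts "#{label}: #{path}" }
```

Run it with the ruby you intend to build against; the output changes with every rvm use.
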
Once you have those two paths, all you need to do is slap them on the end of your configuration options:


In case you didn't notice, the ruby directory you chose to install to means modshogun will be available to all gemsets of the ruby you compiled against. If you have any ideas for namespacing modshogun to a particular gemset, let me know.


Running through some examples

Let's go through one of the classifier examples packaged with the source code, namely classifier_libsvm_minimal_modular.rb. I've added my extra notes in the comments:


Resources

If you're new to machine learning and/or classification, I would recommend checking out some other articles to get up to speed on the subject before jumping into a big piece of software like Shogun Toolbox: