Using VoltDB to Get JPMML into Production

VoltDB Technical Spotlight blog

In the previous articles, Getting Machine Learning into Production and Using VoltDB to Get H20 into Production, I discussed the broad problems you face when you try to commercialize machine learning and showed how to integrate h2o. In this article I’ll show how you can use PMML inside VoltDB.

There are two ways to use JPMML inside VoltDB.

  • Instantiated from inside a stored procedure.
  • As part of a user defined function.

JPMML is tricker to use than H20, for a number of reasons:

  1. Creating the JPMML engine can take anything up to 700 milliseconds. Given that in the VoltDB universe a millisecond is a long time we therefore need to be careful that we aren’t instantiating JPPML each time.
  2. In H20 we had a customer generated Java POJO that implemented the model. In JPMML we have to feed an XML definition into an engine. Given that a VoltDB cluster can have anything up to 30 nodes or more making sure that the right XMl is in the right place at the right time is non-trivial.
  3. If your business use case requires that you access two models H20 will give you two fundamentally different POJOS; In JPMML you’ll have two instances of the same class, with different properties because they were fed different XML.

To solve these problems I used Apache Commons Pools to create a pool of JPMML instances of different kids. You probably don’t need to do this. If you do the code is here. You will probably find VoltDBJPMMLWrangler useful, as it has a whole series of helper methods to do things like translate JPPML data types to VoltDB and vice-versa.

Integrating JPMML into VoltDB stored procedures

An example stored procedure is below. Note that to keep this as simple as possible it doesn’t actually interact with the database tables. A real world deployment would presumably only accept Primary Key information (‘id’), retrieve the record and feed it into JPMML.

package jppml.generated;

package jppml;

import java.io.File;

import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;
import org.voltdb.jpmml.JPMMLImpl;
import org.voltdb.jpmml.JPMMLImplPool;
import org.voltdb.jpmml.VoltDBJPMMLWrangler;

public class AuditTreeStoredProcNoPool extends VoltProcedure {

    final String modelName = "AuditTree.xml";
    final String modelDir = "/foo/models";

    VoltTable[] pmmlOut = null;
    JPMMLImpl i = null;
    VoltDBJPMMLWrangler w = null;

    public VoltTable[] run(int id, double age, String employment, String education, String marital, String occupation,
            double income, String gender, double deductions, double hours) {

        try {

            // Only incur the cost of creating the model if we
            // have to...
            if (w == null) {
                w = new VoltDBJPMMLWrangler(new File(modelDir + File.separator + modelName));
            }

            // Call a helpful method to give us a parameter object to fill in.
            // IRL this VoltTable could also be the result of a Volt SQL Query.
            VoltTable paramtable = w.getEmptyTable(modelName);

            // Load parameters. Note table is only suppose to have 1 row.
            paramtable.addRow(age, employment, education, marital, occupation, income, gender, deductions, hours);

            // Run model and use wrangler method to convert output to VoltDB format.
            pmmlOut = w.runModel(modelName, paramtable);

            // Return results. Note that in this example we never touched the database tables.
            return pmmlOut;

        } catch (Exception e) {

            System.err.println(e.getMessage());
            e.printStackTrace();
            return null;

        } 
    }

}

Creating a new SQL function in VoltDB that uses JPMML

VoltDB allows you to create new SQL functions from Java classes. There’s no reason it can’t call JPMML. In the example below we define such a function in Java. it uses the pool utility we mentioned earlier to gain access to a PMML engine, and then returns the first column of the first result row as a Java String.

The Java code is here.

package jppml.generated;

import org.voltdb.VoltTable;
import org.voltdb.jpmml.JPMMLImpl;
import org.voltdb.jpmml.JPMMLImplPool;
import org.voltdb.jpmml.VoltDBJPMMLWrangler;

public class AuditTreeProcedure {

    public String auditTree(double age, String employment, String education, String marital, String occupation, double income, String gender, double deductions, double hours) {

        final String modelName = "AuditTree.xml";
        VoltTable[] pmmlOut = null;
        JPMMLImpl i = null;
        VoltDBJPMMLWrangler w = null;

        try {

            i = JPMMLImpl.getInstance();
            JPMMLImplPool p = i.getPool();
            w = p.borrowObject();
            VoltTable paramtable = w.getEmptyTable(modelName);
            paramtable.addRow(age, employment, education, marital, occupation, income, gender, deductions, hours);
            pmmlOut = w.runModel(modelName, paramtable);
            return pmmlOut[0].getString(0);

        } catch (Exception e) {

           System.err.println(e.getMessage());
           e.printStackTrace();
           return null;

        } finally {

          if (i != null && w != null)
              {
              i.getPool().returnObject(w);
              }

        }
    } 

}

To make it usable we need to load it into the database and then create it as a function using SQL:

CREATE FUNCTION auditTree FROM METHOD ppml.generated.AuditTreeProcedure.auditTree;

We can then access it via SQL, either directly or by creating a wrapper procedure around it:

CREATE PROCEDURE auditId
PARTITION ON TABLE auditdata COLUMN id 
 AS
select id, age, employment, education, marital, occupation, income, gender, deductions, hours
, auditTree(age, employment, education, marital, occupation, income, gender, deductions, hours)
from auditdata
where id = ? 
order by id;
  • 184/A, Newman, Main Street Victor
  • info@examplehigh.com
  • 889 787 685 6