Using VoltDB to Get h20 into Production

Using VoltDB to Get h20 into Production

August 29, 2018

In my previous article I discussed the broad problems you face when you try to commercialize machine learning. In this one I’ll go through the specific steps you’ll need to get everything to work.

There are two ways to integrate h20 generated classes. The first, and most obvious is to reference it from inside a stored procedure:

See: https://bitbucket.org/voltdbseteam/h20-demo/src/master/srcDb/h20/AirlineDemo.java

import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;
import org.voltdb.VoltType;

import hex.genmodel.easy.RowData;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.prediction.*;

/**
 * VoltDB procedure to invoke a generated h20.ai class.
 * 
 * @author drolfe
 *
 */
public class AirlineDemo extends VoltProcedure {

    /**
     * Name of h20.ai class we're going to use.
     */
    private static final String modelClassName = "gbm_pojo_test";

    /**
     * This VoltDB procedure uses an H20.AI function to guess whether a given
     * flight will be late. To make the example as simple as possible all values
     * are passed in as parameters. In a real world deployment we would
     * obviously pass in a Primary Key and the retrieve all the other fields.
     * This example uses a US Federal Aviation Administration dataset of
     * commercial flight data.
     * 
     * Note that all VoltDB procedure calls have a method like:
     * 
     * <br>
     * <code>
     * VoltTable[] run(params...);
     * </code>
     * 
     * @param cRSDepTime
     *            Depature time
     * @param year
     *            year
     * @param month
     *            Month
     * @param dayOfMonth
     *            Day
     * @param dayOfWeek
     *            Day of week
     * @param uniqueCarrier
     *            Airline
     * @param origin
     *            Origin Airport
     * @param dest
     *            Destination Airport
     * @return pmmlOut An array of VoltTable objects containing the results.
     * @throws VoltAbortException
     */
    public VoltTable[] run(String cRSDepTime, String year, String month, String dayOfMonth, String dayOfWeek,
            String uniqueCarrier, String origin, String dest) throws VoltAbortException {

        // We need to return an array of VoltTable[]. Normally we get
        // VoltTable's by issuing SQL queries. In this case we'll be inventing
        // one based on the results of h20.
        VoltTable[] pmmlOut;

        // Trap h20 failures...
        try {

            // keep track of how long it takes to instantiate the model. VoltDB
            // procedures normally run in tenths or hundreths of a milliseconds.
            // If it takes too long to instantiate the ML engine you need to do
            // define it as a variable and then instantiate it if and only if
            // it's null.
            long startMs = System.currentTimeMillis();
            hex.genmodel.GenModel rawModel;
            rawModel = (hex.genmodel.GenModel) Class.forName(modelClassName).newInstance();
            EasyPredictModelWrapper model = new EasyPredictModelWrapper(rawModel);

            long duration = System.currentTimeMillis() - startMs;
            if (duration > 1) {
                System.out.println("ms=" + (System.currentTimeMillis() - startMs));
            }

            // Load our params into the data structure uses by H20...
            RowData row = new RowData();
            row.put("Year", year);
            row.put("Month", month);
            row.put("DayofMonth", dayOfMonth);
            row.put("DayOfWeek", dayOfWeek);
            row.put("CRSDepTime", cRSDepTime);
            row.put("UniqueCarrier", uniqueCarrier);
            row.put("Origin", origin);
            row.put("Dest", dest);

            // Run the model. As before we track the time it takes.
            startMs = System.currentTimeMillis();
            BinomialModelPrediction p = model.predictBinomial(row);

            duration = System.currentTimeMillis() - startMs;
            if (duration > 1) {
                System.out.println("ms=" + (System.currentTimeMillis() - startMs));
            }

            // In a real world deployment we'd never use System.out.println, but
            // this is a demo...
            System.out.println("Label (aka prediction) is flight departure delayed: " + p.label);

            // We now need to load the results into a VoltTable.
            pmmlOut = new VoltTable[1];

            // Our table will have one row, a LABEL column and then some other
            // columns taken from the H20 output.
            VoltTable.ColumnInfo[] cols = new VoltTable.ColumnInfo[p.classProbabilities.length + 1];
            Object[] vals = new Object[p.classProbabilities.length + 1];

            cols[0] = new VoltTable.ColumnInfo("LABEL", VoltType.STRING);
            vals[0] = p.label;

            for (int i = 0; i < p.classProbabilities.length; i++) {
                cols[i + 1] = new VoltTable.ColumnInfo("P" + (i + 1), VoltType.FLOAT);
                vals[i + 1] = p.classProbabilities[i];
            }

            // We are required to return an array every though we have only one
            // table. Create an try in position 0.
            pmmlOut[0] = new VoltTable(cols);

            // Add the row we invented.
            pmmlOut[0].addRow(vals);

        } catch (Exception e) {

            System.err.println(e.getMessage());

            // VoltAbortException undoes all the DB changes made by a stored
            // procedure call. In thus case there aren't any, but if we added
            // logic to store or change state after the call we'd need this.
            throw new VoltAbortException(e);

        }

        // Execute any SQL statements we have queued as a last step. In this
        // case it's a null-op.
        voltExecuteSQL(true);

        // Return the array we invented.
        return pmmlOut;
    }

}

The key line here is:

rawModel = (hex.genmodel.GenModel) Class.forName(modelClassName).newInstance();

Once I have a model in Java there’s nothing to stop me doing whatever I want with it in a VoltDB procedure.

The second approach is less flexible but easier to implement. VoltDB allows you to create SQL functions from methods in Java classes. There are a couple of caveats – the classes in question should not be maintaining state between method calls, and the functions can’t return complex objects.

In the example below we reuse the same ML engine we used earlier. The first step is to code the Java class:

See: https://bitbucket.org/voltdbseteam/h20-demo/src/master/srcDb/h20/AirlineDemoUDF.java

import hex.genmodel.easy.RowData;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.prediction.*;

/**
* VoltDB user defined function for H2o. Note that all of the 'intelligence' is in the method 'ademo'.
*
* @author drolfe
*
*/
public class AirlineDemoUDF {

/**
* Name of h20.ai class we're going to use.
*/
private static String modelClassName = "gbm_pojo_test";

/**
* This VoltDB User Defined Function uses an H20.AI function to guess whether a given
* flight will be late. This example uses a US Federal Aviation Administration dataset of
* commercial flight data.
*
* To make it usable in VoltDB you create it as a SQL function:
*
* 
* CREATE FUNCTION ademo FROM METHOD AirlineDemoUDF.ademo;
* 
*
* @param cRSDepTime
* Depature time
* @param year
* year
* @param month
* Month
* @param dayOfMonth
* Day
* @param dayOfWeek
* Day of week
* @param uniqueCarrier
* Airline
* @param origin
* Origin Airport
* @param dest
* Destination Airport
* @return pmmlOut An array of VoltTable objects containing the results.
* @throws VoltAbortException
*/
public String ademo(String cRSDepTime, String year, String month, String dayOfMonth, String dayOfWeek,
String uniqueCarrier, String origin, String dest) {

try {

// Instantiate model...
hex.genmodel.GenModel rawModel;
rawModel = (hex.genmodel.GenModel) Class.forName(modelClassName).newInstance();
EasyPredictModelWrapper model = new EasyPredictModelWrapper(rawModel);

// Load params into a H20 data structure...
RowData row = new RowData();
row.put("Year", year);
row.put("Month", month);
row.put("DayofMonth", dayOfMonth);
row.put("DayOfWeek", dayOfWeek);
row.put("CRSDepTime", cRSDepTime);
row.put("UniqueCarrier", uniqueCarrier);
row.put("Origin", origin);
row.put("Dest", dest);

// Call model...
BinomialModelPrediction p = model.predictBinomial(row);

// Return the most obvious result...
return (p.label);

} catch (Exception e) {

System.err.println(e.getMessage());
return null;

}

}

}

Once this has done and is in a JAR file we can load it into the database:

load classes my_h20_function.jar;

And then create a User Defined Function:

CREATE FUNCTION ademo FROM METHOD h20.AirlineDemoUDF.ademo;

To use this function we need to pass in a bunch of parameters. Rather than spend lots of time and energy writing another Java stored procedure we leverage VoltDB’s ability to create a Stored Procedure for any individual SQL statement. Note how we find all the occurrences of a flight number, call ‘ademo’ for them, and return the data:

CREATE PROCEDURE flight_hist
PARTITION ON TABLE flights COLUMN f_FlightNum
AS
SELECT f_cRSDepTime, f_year, f_month, f_dayOfMonth,
f_dayOfWeek, f_uniqueCarrier, f_origin, f_dest
,ademo(f_cRSDepTime, f_year, f_month, f_dayOfMonth,
f_dayOfWeek, f_uniqueCarrier, f_origin, f_dest ) ademo
from flights
where f_FlightNum = ?
order by f_year, f_month, f_dayOfMonth,f_cRSDepTime;
  • 184/A, Newman, Main Street Victor
  • info@examplehigh.com
  • 889 787 685 6