Using VoltDB to Get JPMML into Production - VoltDB


Using VoltDB to Get JPMML into Production

In the previous articles, Getting Machine Learning into Production and Using VoltDB to Get H2O into Production, I discussed the broad problems you face when you try to commercialize machine learning and showed how to integrate H2O. In this article I’ll show how you can use JPMML inside VoltDB.

There are two ways to use JPMML inside VoltDB.

  • Instantiated from inside a stored procedure.
  • As part of a user-defined function.

JPMML is trickier to use than H2O, for a number of reasons:

  1. Creating the JPMML engine can take up to 700 milliseconds. In the VoltDB universe a millisecond is a long time, so we need to be careful not to instantiate JPMML on every call.
  2. In H2O we had a customer-generated Java POJO that implemented the model. In JPMML we have to feed an XML definition into an engine. Given that a VoltDB cluster can have 30 nodes or more, making sure that the right XML is in the right place at the right time is non-trivial.
  3. If your business use case requires access to two models, H2O will give you two fundamentally different POJOs; in JPMML you’ll have two instances of the same class, with different properties because they were fed different XML.

To solve these problems I used Apache Commons Pool to create a pool of JPMML instances of different kinds. You probably don’t need to do this. If you do, the code is here. You will probably find VoltDBJPMMLWrangler useful, as it has a whole series of helper methods that do things like translate JPMML data types to VoltDB and vice versa.
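To make the pooling idea concrete, here is a minimal sketch of a keyed pool in plain Java. It is not the Apache Commons Pool based code referred to above, and it does not call JPMML: ModelEngine, EnginePool, borrow and giveBack are hypothetical names, and the engine is a stand-in whose constructor represents the slow step of parsing the PMML XML. The point is only that engines are kept per model name and reused rather than rebuilt.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a JPMML engine built from a PMML XML document. */
final class ModelEngine {
    /** Counts constructions, so we can see how often the expensive step runs. */
    static final AtomicInteger BUILT = new AtomicInteger();

    final String modelName;

    ModelEngine(String modelName, String pmmlXml) {
        // A real engine would parse pmmlXml here; that is the slow part.
        BUILT.incrementAndGet();
        this.modelName = modelName;
    }
}

/** Keyed pool: idle engines are held per model name and handed out again. */
final class EnginePool {
    private final Map<String, Queue<ModelEngine>> idle = new ConcurrentHashMap<>();
    private final Function<String, String> xmlLoader; // assumption: model name -> PMML XML

    EnginePool(Function<String, String> xmlLoader) {
        this.xmlLoader = xmlLoader;
    }

    /** Returns an idle engine for the model, building one only if none is idle. */
    ModelEngine borrow(String modelName) {
        ModelEngine engine = queueFor(modelName).poll();
        return engine != null
                ? engine
                : new ModelEngine(modelName, xmlLoader.apply(modelName));
    }

    /** Hands an engine back so that a later borrow can reuse it. */
    void giveBack(ModelEngine engine) {
        queueFor(engine.modelName).offer(engine);
    }

    private Queue<ModelEngine> queueFor(String modelName) {
        return idle.computeIfAbsent(modelName, name -> new ConcurrentLinkedQueue<>());
    }
}
```

A caller would borrow an engine by model name, score with it, and give it back in a finally block. Two models simply become two keys in the same pool, which is how the "same class, different XML" situation from the list above is handled in this sketch.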

Integrating JPMML into VoltDB stored procedures

An example stored procedure is below. Note that, to keep it as simple as possible, it doesn’t actually interact with the database tables. A real-world deployment would presumably accept only primary key information (‘id’), retrieve the record and feed it into JPMML.

Creating a new SQL function in VoltDB that uses JPMML

VoltDB allows you to create new SQL functions from Java classes. There’s no reason such a function can’t call JPMML. In the example below we define one in Java. It uses the pool utility we mentioned earlier to gain access to a PMML engine, and then returns the first column of the first result row as a Java String.

The Java code is here.
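For readers who only want the shape of such a function, here is a minimal sketch. It is not the linked code: ScoringEngine, JpmmlFunctionSketch, score and the two numeric parameters are hypothetical, and the engine is a hard-coded stand-in rather than a pooled JPMML instance. Boxed parameter types are used on the assumption that SQL NULLs arrive as Java nulls. What it does show is the "first column of the first result row, as a String" step.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a pooled engine: returns result rows of columns. */
interface ScoringEngine {
    List<List<Object>> evaluate(double first, double second);
}

/**
 * Sketch of a class whose method could be exposed as a SQL function.
 * Assumption: when deployed, this would be a public class in its own file.
 */
class JpmmlFunctionSketch {

    // Stand-in model: labels the input by the sum of its two fields.
    private static final ScoringEngine ENGINE = (first, second) -> {
        double total = first + second;
        String label = total > 5.0 ? "large" : "small";
        List<Object> row = Arrays.asList(label, total);
        return Collections.singletonList(row);
    };

    /** Scores the input and returns the first column of the first result row. */
    public String score(Double first, Double second) {
        if (first == null || second == null) {
            return null; // nothing to score; pass the NULL through
        }
        List<List<Object>> rows = ENGINE.evaluate(first, second);
        if (rows.isEmpty() || rows.get(0).isEmpty()) {
            return null; // the model produced no result
        }
        return String.valueOf(rows.get(0).get(0));
    }
}
```

In this sketch, new JpmmlFunctionSketch().score(3.0, 4.0) yields "large". In a real function the ENGINE field would be replaced by borrowing an engine from the pool and returning it once the result has been read.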

To make it usable we need to load the compiled class into the database and then declare it as a function in SQL, using a CREATE FUNCTION ... FROM METHOD statement.

We can then call it from SQL, either directly or by creating a wrapper procedure around it.
