Connecting Volt Active Data to Amazon AWS Kinesis Streams

July 13, 2017

Volt Active Data includes interfaces – both import and export – to many other applications and systems in the broader ecosystem of big and fast data, including the cloud. A recent integration, released in Volt Active Data v6.6, allows system builders to connect Volt Active Data, in either a data producer or a data consumer role, with Amazon Web Services (AWS) Kinesis streaming services.

Connections to a variety of producers and consumers in the big and fast data world include connectors for Hadoop, Kafka, RabbitMQ, and Elasticsearch, as well as generic interfaces such as JDBC (to connect to other database systems), HTTP, file import, and streaming file export (CSV, TSV).

Our goals are consistent: connectors must be easy to set up and use, deliver high performance, and preserve the no-compromises ACID guarantees that are at the core of Volt Active Data.

Streaming to Amazon Kinesis Firehose

For the first case, Volt Active Data is the data source with respect to Amazon Kinesis. Let’s imagine a use case where client applications load high-frequency stock trade records into a Volt Active Data cluster for validation and aggregation. Since Volt Active Data is an in-memory database, we set up time-windowing procedures that insert aging rows into streams (export tables) and delete the corresponding rows from the in-memory tables; a sketch of such a procedure follows the stream definition below.

The plumbing to transmit streaming data from Volt Active Data to Amazon is the Volt Active Data Kinesis Firehose Export Conduit (VDBKFHEC), freely available at https://github.com/VoltDB/export-kinesis.

A stream in Volt Active Data can be thought of as a virtual table. It doesn’t hold state; rather, it defines an outbound schema that can be consumed, i.e. streamed, by an external system. This enables a Volt Active Data application to easily, and transactionally, export data to another system: in this example Amazon Kinesis, but also many other systems such as Kafka, HDFS, and relational databases.

Here’s an example of a simple stream definition:

CREATE STREAM export_transactions PARTITION ON COLUMN INSTANCE_ID EXPORT TO TARGET external_consumer
(
  INSTANCE_ID         BIGINT NOT NULL,
  SEQ                 BIGINT,
  EVENT_TYPE_ID       INTEGER,
  EVENT_DATE          TIMESTAMP NOT NULL,
  EXPORT_DATE         TIMESTAMP DEFAULT NOW,
  TRANS               VARCHAR(1000)
);
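Rows reach a stream through ordinary SQL inserts, issued here from a stored procedure. What follows is a minimal sketch of the kind of time-windowing procedure described above, assuming a hypothetical in-memory table TRANSACTIONS whose columns mirror the stream; the class and table names are illustrative, not part of the actual connector:

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;
import org.voltdb.types.TimestampType;

// Sketch only: TRANSACTIONS is a hypothetical partitioned in-memory table
// with the same columns as the export_transactions stream defined above.
public class AgeOutTransactions extends VoltProcedure {

    // Copy rows older than the cutoff into the stream; the export manager
    // picks them up from there. EXPORT_DATE takes its DEFAULT NOW value.
    public final SQLStmt exportOld = new SQLStmt(
        "INSERT INTO export_transactions (instance_id, seq, event_type_id, event_date, trans) " +
        "SELECT instance_id, seq, event_type_id, event_date, trans " +
        "FROM transactions WHERE instance_id = ? AND event_date < ?");

    // Delete the same rows from the in-memory table to reclaim memory.
    public final SQLStmt deleteOld = new SQLStmt(
        "DELETE FROM transactions WHERE instance_id = ? AND event_date < ?");

    public long run(long instanceId, TimestampType cutoff) {
        voltQueueSQL(exportOld, instanceId, cutoff);
        voltQueueSQL(deleteOld, instanceId, cutoff);
        VoltTable[] results = voltExecuteSQL(true);
        // Return the number of rows aged out of memory.
        return results[1].asScalarLong();
    }
}

Because both statements execute in one transaction, a row is never deleted from memory without also having been handed to the stream.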

 

In this case, we connect the stream to Amazon’s Kinesis Firehose service. The connection is specified in the Volt Active Data “deployment” file:

<export>
  <configuration target="external_consumer" enabled="true" type="kinesis">
    <property name="region">us-east-1</property>
    <property name="stream.name">streamtest</property>
    <property name="access.key">MYKEY</property>
    <property name="secret.key">MYSECRETKEY</property>
  </configuration>
</export>

 

In Kinesis Firehose, we’ve already created the “streamtest” stream and configured its handling in the world of Amazon services. This involves creating the delivery stream, directing the data flow through Amazon S3, and then specifying that the rows of data be inserted into Amazon Redshift, a Postgres-like database for persistent storage hosted on an EC2 cluster.

In “demo” applications in AWS, the default data transfer rate to Amazon using the Kinesis Firehose service is quite limited: by default, 2,000 transactions per second, 5,000 records per second, and 5 MB of data per second. However, to support our development and testing at typical Volt Active Data transaction rates, the Amazon Kinesis team graciously provisioned much higher transfer rates, and will likewise work with business teams to support a wide range of data transfer requirements.

With that higher provisioning from Amazon, our export streaming speed exceeded 28,000 rows per second. That’s still slower than Volt Active Data’s transactional capability, but as described below, the database buffers rows to disk and sends them to the Amazon consumer at its maximum consumption rate.

Behind the scenes, the Volt Active Data streaming service buffers data on disk to match slower consumers without reducing primary database performance. In a Volt Active Data application, rows are inserted into the stream just as you would insert data into a relational table; the export stream manager then delivers those rows to the stream consumer or consumers at their maximum rate, with Volt Active Data reliability.

Streaming to Volt Active Data from Amazon Kinesis Streams

In the second case, Volt Active Data is the data consumer, ingesting rows of data from Amazon Kinesis streams and inserting them into database tables transactionally. This data can be ingested using default Volt Active Data insert transactions, or the application can supply custom business logic via a transactional stored procedure, for example to handle validation, data cleansing, aggregation, or routing.

In this example use case, Amazon Kinesis streams collect click data from numerous geographically distributed web servers. Volt Active Data consumes this stream data, validating and “sessionizing” it in stored procedures and inserting the processed rows into the appropriate tables and materialized views for aggregation and real-time analysis. Output streaming is easy to picture here too, since a downstream data warehouse may well be the next stop.

We specify the Volt Active Data consumer connection to the Amazon Kinesis stream in the “import” section of the database deployment file:

<import>
  <configuration type="kinesis" format="csv" enabled="true">
    <property name="app.name">StockApp</property>
    <property name="region">us-east-1</property>
    <property name="stream.name">StockTradeStream</property>
    <property name="procedure">stock.insert</property>
    <property name="max.read.batch.size">100</property>
    <property name="access.key">MYACESSKEY</property>
    <property name="secret.key">MYSECRETKEY</property>
    <format-property name="custom.null.string">AAPL</format-property>
  </configuration>
</import>

 

Note the “procedure” property in the deployment file. This is the stored procedure that processes the incoming data; Volt Active Data calls it once for each incoming row. The procedure can validate the data, query other tables to “sessionize” it, and insert the row into a database table. All of this happens as one transaction, atomic by definition, so if there is a problem or error, all of the processing is rolled back.
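Here is a minimal sketch of what such a procedure might look like for the stock-trade configuration above. The STOCK table, its columns, and the class name are hypothetical, and a real procedure would add its own sessionizing queries:

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.types.TimestampType;

// Sketch only: assumes a hypothetical STOCK table
// (symbol VARCHAR, trade_time TIMESTAMP, price FLOAT, quantity BIGINT).
// The Kinesis importer calls run() once per incoming record, and the
// entire body executes as a single transaction.
public class ProcessStockTrade extends VoltProcedure {

    public final SQLStmt insertTrade = new SQLStmt(
        "INSERT INTO stock (symbol, trade_time, price, quantity) VALUES (?, ?, ?, ?)");

    public long run(String symbol, TimestampType tradeTime, double price, long quantity) {
        // Validate first; aborting rolls the whole transaction back, so the
        // offending record never reaches the table.
        if (symbol == null || symbol.isEmpty() || price <= 0 || quantity <= 0) {
            throw new VoltAbortException("rejecting malformed trade record");
        }
        voltQueueSQL(insertTrade, symbol, tradeTime, price, quantity);
        voltExecuteSQL(true);
        return 0;
    }
}

Once the class is loaded with CREATE PROCEDURE FROM CLASS, its name would replace stock.insert in the “procedure” property of the deployment file.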

Volt Active Data automatically creates “default” procedures for every table, including a simple INSERT; the deployment file above relies on exactly this, since stock.insert is the auto-generated insert procedure for a STOCK table. If the goal is to get the row into a table as fast as possible, it can be done without any coding at all.
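Those default procedures are callable from any client as well. A minimal sketch, assuming the hypothetical STOCK table above and a server on localhost:

import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;

// Sketch only: invokes the auto-generated STOCK.insert procedure directly,
// the same procedure the Kinesis importer is configured to call per record.
public class DefaultInsertExample {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient();
        client.createConnection("localhost");

        // One parameter per column, in declaration order. TIMESTAMP columns
        // accept microseconds since the epoch as a long.
        client.callProcedure("STOCK.insert",
            "AAPL", System.currentTimeMillis() * 1000, 150.25, 100);

        client.close();
    }
}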

Conclusion

We’ve seen how Volt Active Data can consume (import) streaming data from Amazon Kinesis, produce (export) streaming data to Amazon Kinesis, or do both if that’s what the application design requires.

It is easy to import or stream-export data with no application coding at all. If complex processing and aggregation are required, that’s straightforward too, as the examples in this post illustrate.

Give it a try today and let us know what you think. Download Volt Active Data here. For more information on the Kinesis connectors, see the Volt Active Data documentation at https://docs.voltdb.com/; in particular, see “Importing and Exporting Live Data”.

 
