Introducing Geospatial Capabilities in Volt Active Data v6.0

Introducing Geospatial Capabilities in Volt Active Data v6.0

February 04, 2016

As part of our 6.0 release, we’ve added new data types and functions that let you work with geospatial data in your applications. If you care about where things are in the real world and how they relate to each other spatially, we have some new features that will help you out.

The New Data Types

We’ve added two new SQL data types that let you store the locations and regions that are important to your application: GEOGRAPHY and GEOGRAPHY_POINT. We’ll discuss the latter first.

The new data type GEOGRAPHY_POINT lets you store data that represents a point on the surface of the earth. This could be the location of a cell phone, a sensor, or anything, really. Here’s some DDL:

create table devices (
  id integer not null primary key,
  location geography_point
);

insert into devices values (
0,
pointFromText(‘POINT(-122.0264 36.9719)’)
);

create table devices (
id integer not null primary key,
location geography_point
);
insert into devices values (
0,
pointFromText(‘POINT(-122.0264 36.9719)’)
);
view rawgistfile1.txt hosted with ❤ by GitHub

The above SQL creates a table with a GEOGRAPHY_POINT column and then inserts a row where the device id is zero, and the location field is a point at 36.9719° N, 122.0264° W. To create a point, we use a scalar function called pointFromText. It produces a GEOGRAPHY_POINT value from string input that is formatted as “well-known text.” Well-known text (WKT) is a common convention for representing geometric data as strings.

Likewise, GEOGRAPHY is a data type that can be used to represent polygons whose vertices are points on the surface of the earth. You might wish to use this data type to represent political boundaries (such as states or counties), or any region of interest, with arbitrary shape and size.

create table regions (
  id integer not null primary key,
  name varchar(128) not null,
  border geography
);

insert into regions values (
0,
‘The Bermuda Triangle’,
polygonFromText(‘POLYGON((-64.72 32.16, -80.41 25.30, -65.82 18.40, -64.72 32.16))’)
);

create table regions (
id integer not null primary key,
name varchar(128) not null,
border geography
);
insert into regions values (
0,
‘The Bermuda Triangle’,
polygonFromText(‘POLYGON((-64.72 32.16, -80.41 25.30, -65.82 18.40, -64.72 32.16))’)
);
view rawgistfile1.txt hosted with ❤ by GitHub

The above SQL will create a table with a GEOGRAPHY column, and insert a row into it. The third column uses a function analogous to the one we used to insert a point, polygonFromText. The WKT for polygons is a little more complicated than for points. There is a detailed description in our documentation.

The New Functions

New data types for representing points and polygons aren’t terribly useful without ways to query the data in a meaningful way. That’s why we’ve also added functions. The following examples use the schema described in the previous section.

CONTAINS

The new sql function CONTAINS answers the question, “Does this polygon contain that point?” Here’s an example:

-- Which devices are within which borders?
select r.id, r.name, d.id
from devices as d
inner join regions as r
on contains(r.border, d.location);
— Which devices are within which borders?
select r.id, r.name, d.id
from devices as d
inner join regions as r
on contains(r.border, d.location);
view rawgistfile1.txt hosted with ❤ by GitHub

This query does a geospatial join – that is, relating rows in one table to rows in another, based on a geospatial relationship.

DISTANCE

The DISTANCE function answers the question, “How far is this point from that point or polygon?” Here’s an example:

-- Given the id of a particular point, which points are within 1000
-- meters of that point?
select d2.id
from devices as d1
  inner join devices as d2
  on distance(d1.location, d2.location) <= 1000
where d1.id = 0;
— Given the id of a particular point, which points are within 1000
— meters of that point?
select d2.id
from devices as d1
inner join devices as d2
on distance(d1.location, d2.location) <= 1000
where d1.id = 0;
view rawgistfile1.txt hosted with ❤ by GitHub

AREA

The AREA function tells us the size of a polygon in square meters.

-- Which regions have the highest density of devices?
-- Returns the top 5 regions and devices per m^2.
select r.id, count(*) / area(r.border) as device_density
from devices as d
  inner join regions as r
  on contains(r.border, d.location)
group by r.id
order by device_density desc
limit 5;
— Which regions have the highest density of devices?
— Returns the top 5 regions and devices per m^2.
select r.id, count(*) / area(r.border) as device_density
from devices as d
inner join regions as r
on contains(r.border, d.location)
group by r.id
order by device_density desc
limit 5;
view rawgistfile1.txt hosted with ❤ by GitHub

There are other functions described in our documentation, but these are the most important ones for answering interesting questions.

Putting It All Together

We’ve got an example of a geospatial app in the 6.0 kit. In the example, we simulate a mobile ad broker app. Consider the problem of serving particular ads to a mobile device user based on his or her location. This is a problem that requires high throughput, because there are a lot of mobile device users out there. It also requires a fast response. We want to deliver the ad while the user is still standing right in front of the store, with the device in hand, and we want to deliver it only once.

In the example, we maintain a number of bids made by businesses participating in our ad network. Each bid says how much the business will pay to serve an ad, the region (as a GEOGRAPHY column) in which the ad should be served, and a time window during which the bid is valid (such as during a sale):

CREATE TABLE bids
(
  id bigint not null
, advertiser_id bigint not null
, region geography(16384) not null
, ts_start timestamp not null
, ts_end timestamp not null
, bid_amount float not null
, primary key (id)
);
CREATE TABLE bids
(
id bigint not null
, advertiser_id bigint not null
, region geography(16384) not null
, ts_start timestamp not null
, ts_end timestamp not null
, bid_amount float not null
, primary key (id)
);
view rawgistfile1.txt hosted with ❤ by GitHub

The demo simulates a user logging into social media by invoking a stored procedure to the database. The question of interest to answer here is, “Which ad should the user be shown based on his current location?” We get an answer to this question with a geospatial query:

select id, advertiser_id, bid_amount
from bids
where current_timestamp between ts_start and ts_end
    and contains(region, ?)
  order by bid_amount desc, id
    limit 1;
select id, advertiser_id, bid_amount
from bids
where current_timestamp between ts_start and ts_end
and contains(region, ?)
order by bid_amount desc, id
limit 1;
view rawgistfile1.txt hosted with ❤ by GitHub

The parameter in this query, passed in from the client, is the current location of the user’s device. The query finds the relevant bids for the current time and the user’s location, and returns the id of the bid that was for the highest dollar amount.

For a more complete discussion of the app, see the accompanying README: https://github.com/Volt Active Data/voltdb/tree/master/examples/geospatial

Or if you’re interested in trying out the demo for yourself, download our Volt Active Data 6.0 kit and have a look.

Stay Tuned for Future Work

In upcoming versions we’ll introduce indexes on GEOGRAPHY columns that will allow very fast evaluation of CONTAINS predicates, so geospatial queries can be evaluated at the exceptional speeds to which our users have become accustomed.

  • 184/A, Newman, Main Street Victor
  • info@examplehigh.com
  • 889 787 685 6