IValue: environment friendly illustration of dynamic sorts in C++

IValue: environment friendly illustration of dynamic sorts in C++

[ad_1]

Introduction

In conventional SQL techniques, a column’s kind is set when the desk is created, and by no means modifications whereas executing a question. In case you create a desk with an integer-valued column, the values in that column will all the time be integers (or presumably NULL).

Rockset, nevertheless, is dynamically typed, which implies that we frequently do not know the kind of a price till we really execute the question. That is just like different dynamically typed programming languages, the place the identical variable might include values of various sorts at totally different deadlines:

$ python3
>>> a = 3
>>> kind(a)
<class 'int'>
>>> a="foo"
>>> kind(a)
<class 'str'>

Rockset’s kind system was initially based mostly on JSON, and has since been prolonged to assist different sorts as effectively:

  • bytes: taking a cue from Python, we distinguish between sequences of legitimate Unicode characters (string, which is internally represented as UTF-8) and sequences of arbitrary bytes (bytes)
  • date- and time-specific sorts (date, time, datetime, timestamp, microsecond_interval, month_interval)

There are different sorts that we use internally (and are by no means uncovered to our customers); additionally, the kind system is extensible, with deliberate assist for decimal (base-10 floating-point), geometry / geography sorts, and others.

Within the following instance, assortment ivtest has paperwork containing one area a, which takes a wide range of sorts:

$ rock create assortment ivtest
Assortment "ivtest" was created efficiently in workspace "commons".

$ cat /tmp/a.docs
{"a": 2}
{"a": "whats up"}
{"a": null}
{"a": {"b": 10}}
{"a": [2, "foo"]}

$ rock add ivtest /tmp/a.docs
{
 "file_name":"a.docs",
 "file_upload_id":"c5ccc261-0096-4a73-8dfe-d6db8b8d130e",
 "uploaded_at":"2019-06-05T18:12:46Z"
}

$ rock sql
> choose typeof(a), a from ivtest order by a;
+-----------+------------+
| ?typeof   | a          |
|-----------+------------|
| null_type | <null>     |
| int       | 2          |
| string    | whats up      |
| array     | [2, 'foo'] |
| object    | {'b': 10}  |
+-----------+------------+
Time: 0.014s

This submit reveals one in every of many challenges that we encountered whereas constructing a completely dynamically typed SQL database: how we manipulate values of unknown sorts in our question execution backend (written in C++), whereas approaching the efficiency of utilizing native sorts immediately.

At first, we used protocol buffers just like the definition beneath (simplified to solely present integers, floats, strings, arrays, and objects; the precise oneof that we use has just a few further fields):

message Worth {
  oneof value_union {
    int64 int_value = 1;
    double float_value = 2;
    string string_value = 3;
    ArrayValue array_value = 4;
    ObjectValue object_value = 5;
  }
}

message ArrayValue {
  repeated Worth values = 1;
}

message ObjectValue {
  repeated KeyValue kvs = 1;
}

message KeyValue {
  string key = 1;
  Worth worth = 2;
}   

However we rapidly realized that that is inefficient, each when it comes to velocity and when it comes to reminiscence utilization. First, protobuf requires a heap reminiscence allocation for each object; making a Worth that accommodates an array of 10 integers would carry out:

  • a reminiscence allocation for the top-level Worth
  • an allocation for the array_value member
  • an allocation for the record of values (ArrayValue.values, which is a RepeatedPtrField)
  • an allocation for every of the ten values within the array

for a complete of 13 reminiscence allocations.

Additionally, the ten values within the array will not be allotted contiguously in reminiscence, which causes an additional lower in efficiency because of cache locality.

It was rapidly clear that we wanted one thing higher, which we known as IValue. In comparison with the protobuf model, IValue is:

  • Extra reminiscence environment friendly: whereas not as environment friendly as utilizing native sorts immediately, IValue have to be small, and should keep away from heap allocations wherever potential. IValue is all the time 16 bytes, and doesn’t allocate heap reminiscence for integers, booleans, floating-point numbers, and quick strings.
  • Quicker: arrays of scalar IValues are allotted contiguously in reminiscence, main to higher cache locality. This isn’t as environment friendly as utilizing native sorts immediately, however it’s a important enchancment over protobuf.

Most of Rockset’s question execution engine operates on IValues (there are some elements which have specialised implementation for particular sorts, and that is an space of lively enchancment).

We would wish to share an summary of the IValue design. Be aware that IValue is optimized for Rockset’s wants and isn’t meant to be transportable — we use Linux and x86_64-specific methods, and assume a little-endian reminiscence structure.

The thought is in itself not novel; the strategies that we use date again to not less than 1993, as surveyed in “Representing Kind Info in Dynamically Typed Languages”. We determined to make IValue 128 bits as a substitute of 64, because it permits us to keep away from heap allocations in additional instances (together with all 64-bit integers); utilizing the taxonomy outlined within the paper, IValue is a double-wrapper scheme with qualifiers.

Internally, IValue is represented as a 128-bit (16-byte) worth, consisting of:

  • a 64-bit area (known as information)
  • a 48-bit area (known as pointer, because it usually, however not all the time, shops a pointer)
  • two 8-bit discriminator fields (known as tag0 and tag1)


IValue overview (1)

tag1 signifies the kind of the worth. tag0 is often a subtype, and the which means of the opposite two fields modifications relying on kind. The pointer area is commonly a pointer to another information construction, allotted on the heap, for the instances the place heap allocations cannot be prevented; as pointers are solely 48 bits on x86_64, we’re in a position to match a pointer and the 2 discriminator fields in the identical uint64_t.

We acknowledge two kinds of IValues:

tag1 has bit 7 clear (tag1 < 0x80) for all speedy values, and set (tag1 >= 0x80) for all non-immediate values. This permits us to tell apart between speedy and non-immediate values in a short time, utilizing one easy bit operation. We will then copy, hash, and examine for equality speedy values by treating them as a pair of uint64_t integers.

Scalar Varieties

The illustration for many scalar sorts is simple: tag0 is often zero, tag1 identifies the kind, pointer is often zero, and information accommodates the worth.

SQL NULL is all zeros, which is handy (memset()ing a bit of reminiscence to zero makes it NULL when interpreted as IValue):

null

Booleans have information = 0 for false and information = 1 for true, tag1 = 0x01

boolean

Integers have the worth saved in information (as int64_t) and tag1 = 0x02

integer

And so forth. The layouts for different scalar sorts (floating level, date / time, and so on) are related.

Strings

We deal with character strings and byte strings equally; the worth of tag1 is the one distinction. For the remainder of the part, we’ll solely give attention to character strings.

IValue strings are immutable, keep the string’s size explicitly, and will not be null-terminated. In step with our aim to reduce heap allocations, IValue would not use any exterior reminiscence for brief strings (lower than 16 bytes).

As an alternative, we implement the small string optimization: we retailer the string contents (padded with nulls) within the information, pointer, and tag0 fields; we retailer the string size within the tag1 area: tag1 is 0x1n, the place n is the string’s size.

An empty string has tag1 = 0x10 and all different bytes zero:

empty string (1)

And, for instance, the 11-byte string “Whats up world” has tag1 = 0x1b (word the little-endian illustration; the byte 'H' is first):

hello world (1)

Strings longer than 15 bytes are saved out-of-line: tag1 is 0x80, pointer factors to the start of the string (allotted on the heap utilizing malloc()), and information accommodates the string size. (There may be additionally the opportunity of referencing a “overseas” string, the place IValue would not personal the reminiscence however factors inside a preallocated buffer, however that’s past the scope of this submit.)

long string (1)

For instance, the 19-byte string “Rockset is superior!”:

long string (2)

Vectors

Vectors (which we name “arrays”, adopting JSON’s terminology) are equally allotted on the heap: they’re just like vectors in most programming languages (together with C++’s std::vector). tag1 is 0x82, pointer factors to the start of the vector (allotted on the heap utilizing malloc()), and information accommodates the vector’s measurement and capability (32 bits every). The vector itself is a contiguously allotted block of capability() IValues (capability() * 16 bytes); when reallocation is required, the vector grows exponentially (with an element that’s lower than 2, for the explanations described in Fb’s fbvector implementation.)

vector (3)

Hash Maps

Maps (which we name “objects”, adopting JSON’s terminology) are additionally allotted on the heap. We symbolize objects as open-addressing hash tables with quadratic probing; the scale of the desk is all the time an influence of two, which simplifies probing. We probe with triangular numbers, similar to Google’s sparsehash, which. as Knuth tells us in The Artwork of Pc Programming (quantity 3, chapter 6.4, train 20), routinely covers all slots.

Every hash desk slot is 32 bytes — two IValues, one for the important thing, one for the worth. As is often the case with open-addressing hash tables, we’d like two particular keys — one to symbolize empty slots, and one to symbolize deleted parts (tombstones). We reserve two values of tag1 for that function (0x06 and 0x05, respectively).

The pointer area factors to the start of the hash desk (a contiguous array of slots, allotted on the heap utilizing malloc().) We retailer the present measurement of the hash desk within the least-significant 32 bits of the information area. The tag0 area accommodates the variety of allotted slots (because it’s all the time an influence of two, we retailer log2(variety of slots) + 1, or zero if the desk is empty).

The capability area (most important 32 bits of information) deserves additional curiosity: it’s the variety of slots obtainable for storing consumer information. Initially, it’s the identical as the full variety of slots, however, as in all open-addressing hash tables, erasing a component from the desk marks the slot as “deleted” and renders it unusable till the subsequent rehash. So erasing a component really decreases the desk’s capability.

hashtable (1)

Efficiency

IValue provides a considerable efficiency enchancment over the outdated protobuf-based implementation:

  • creating arrays of strings is between 2x and 7x sooner (relying on the string measurement; due to the small-string optimization, IValue is considerably sooner for small strings)
  • creating arrays of integers can be 7x sooner (as a result of we now not allocate reminiscence for each particular person array ingredient)
  • iterating over giant arrays of integers is 3x sooner (as a result of the values within the array are actually allotted contiguously)

Future Work

Regardless that Rockset paperwork are allowed to include information of a number of sorts in the identical area, the scenario proven within the introduction is comparatively uncommon. In observe, a lot of the information is of the identical kind (or NULL), and, to acknowledge this, we’re extending IValue to assist homogeneous arrays.

All parts in a homogeneous array are of the identical kind (or NULL). The construction is just like the common (heterogeneous) arrays (described above), however the pointer area factors on to an array of the native kind (int64_t for an array of integers, double for an array of floating-point values, and so on). Much like techniques like Apache Arrow, we additionally keep an elective bitmap that signifies whether or not a selected worth is NULL or not.

The question execution code acknowledges the widespread case the place it produces a column of values of the identical kind, by which case it would generate a homogeneous array. We’ve environment friendly, vectorized implementations of widespread database operations on homogeneous arrays, permitting us important efficiency enhancements within the widespread case.

That is nonetheless an space of lively work, and benchmark outcomes are forthcoming.

Conclusion

We hope that you simply loved a quick look below the hood of Rockset’s engine. Sooner or later, we’ll share extra particulars about our approaches to constructing a completely dynamically typed SQL database; if you would like to present us a attempt, join an account; if you would like to assist construct this, we’re hiring!



[ad_2]

Previous Article

4 Core Tenets for Your SASE Journey

Next Article

Verizon's Seen outage, clients unable to ship texts

Write a Comment

Leave a Comment

Your email address will not be published. Required fields are marked *

Subscribe to our Newsletter

Subscribe to our email newsletter to get the latest posts delivered right to your email.
Pure inspiration, zero spam ✨