
Postgres as key/value store

From: snacktime
Date:
I'm looking for some feedback on the design I'm using for a basic key/value storage using postgres.

Just some quick background.  This design is for large-scale games that can get up to 10K writes per second or more.  The storage will be behind a distributed memory cache built on top of Akka, with a write-behind caching mechanism to cut down on the number of writes when you have many updates to the same key in a short time period, which is common for a lot of multiplayer-type games.

I have been using Couchbase, but this is an open source project, and Couchbase is basically a commercial product for all intents and purposes, which is problematic.  I will still support Couchbase, but I don't want to have to tell people that if you really want to scale, Couchbase is the only option.

The schema is that a key is a string, and the value is a string or binary.  I am actually storing protocol buffer messages, but the library gives me the ability to serialize to native protobuf or to JSON.  JSON is useful at times, especially for debugging.

This is my current schema:

CREATE TABLE entities
(
  id character varying(128) NOT NULL,
  value bytea,
  datatype smallint,
  CONSTRAINT entities_pkey PRIMARY KEY (id)
);

CREATE OR REPLACE RULE entities_merge AS
    ON INSERT TO entities
   WHERE (EXISTS ( SELECT 1
           FROM entities entities_1
          WHERE entities_1.id::text = new.id::text)) DO INSTEAD  UPDATE entities SET value = new.value, datatype = new.datatype
  WHERE entities.id::text = new.id::text;
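For reference, on current Postgres versions (pre-9.5, with no native upsert) the same merge could also be written as a writable CTE instead of a rule; a rough sketch, not what I'm actually running:

WITH upsert AS (
    UPDATE entities
       SET value = $2, datatype = $3
     WHERE id = $1
 RETURNING id
)
INSERT INTO entities (id, value, datatype)
SELECT $1, $2, $3
 WHERE NOT EXISTS (SELECT 1 FROM upsert);

Like the rule, this is not fully race-free under concurrent inserts of the same key, so a retry on unique-violation errors would still be advisable.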

Additional functionality I want is to do basic fuzzy searches by key.  Currently I'm using a left-anchored LIKE query.  This works well because keys are prefixed with a scope, a delimiter, and then the actual key for the data.  These fuzzy searches would never be used in game logic; they would be admin-only queries for doing things like obtaining a list of players.  So they should be infrequent.
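For the left-anchored LIKE to stay index-assisted under a non-C collation, a pattern-ops index on the key should do it; a sketch, using a hypothetical 'players:' scope prefix:

CREATE INDEX entities_id_prefix_idx ON entities (id varchar_pattern_ops);

SELECT id FROM entities WHERE id LIKE 'players:%';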

The scope of the query ability will not expand in the future.  I support multiple backends for the key/value storage so I'm working with the lowest common denominator.  Plus I have a different approach for data that you need to do complex queries on (regular tables and an ORM).

 Chris

Re: Postgres as key/value store

From: Gavin Flower
Date:
On 28/09/14 12:48, snacktime wrote:
> I'm looking for some feedback on the design I'm using for a basic key/value storage using postgres.
> [...]
> CREATE TABLE entities
> (
>   id character varying(128) NOT NULL,
>   value bytea,
>   datatype smallint,
>   CONSTRAINT entities_pkey PRIMARY KEY (id)
> );
> [...]
>  Chris
Note:
I suspect that the suggestions below will probably NOT improve performance, and may not necessarily be appropriate for your use case.  However, they may facilitate a wider range of queries, and might be easier to understand.

Note the comment about using 'PRIMARY KEY' in  http://www.postgresql.org/docs/9.2/static/sql-createtable.html
[...]
The primary key constraint specifies that a column or columns of a table can contain only unique (non-duplicate), nonnull values. Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT NULL, but identifying a set of columns as primary key also provides metadata about the design of the schema, as a primary key implies that other tables can rely on this set of columns as a unique identifier for rows.
[...]

My first thought was to simplify the table definition, though I think the length check on the id is best done in the software updating the database:

CREATE TABLE entities
(
    id         text PRIMARY KEY,
    value      bytea,
    datatype   smallint,
    CONSTRAINT id_too_long CHECK (length(id) <= 128)
);
Then I noticed that your id is actually a compound key, and probably would be better modelled as:

CREATE TABLE entities
(
    scope      text,
    key        text,
    value      bytea,
    datatype   smallint,
    CONSTRAINT entities_pkey PRIMARY KEY (scope, key)
);
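With the scope split out, the admin prefix searches you mentioned become plain equality on scope rather than a LIKE; e.g. (using a hypothetical 'players' scope):

SELECT key, value
  FROM entities
 WHERE scope = 'players';

The primary key index on (scope, key) can satisfy this directly.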
I suspect that making 'datatype' an 'int' would improve performance, but only by a negligible amount!


Cheers,
Gavin

Re: Postgres as key/value store

From: Thomas Kellerer
Date:
snacktime wrote on 28.09.2014 01:48:
> I'm looking for some feedback on the design I'm using for a basic key/value storage using postgres.

Are you aware of Postgres' "native" key/value store: hstore?
http://www.postgresql.org/docs/current/static/hstore.html

Or Postgres JSON support?

Especially the new JSONB type in the upcoming 9.4 release, which makes indexing JSON documents much easier/faster.

I would expect both solutions to be much faster than your entity-attribute-value design.
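As a quick illustration of the hstore flavour (table and values here are hypothetical, and note that hstore values are text, so binary protobuf would still need an encoding step):

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE entities_h (
    id    text PRIMARY KEY,
    attrs hstore
);

-- GIN indexes support containment queries on the attributes.
CREATE INDEX entities_h_attrs_idx ON entities_h USING gin (attrs);

INSERT INTO entities_h VALUES ('players:chris', 'level=>12, guild=>red');

SELECT id FROM entities_h WHERE attrs @> 'guild=>red';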

Thomas


Re: Postgres as key/value store

From: Jonathan Vanasco
Date:
On Sep 27, 2014, at 7:48 PM, snacktime wrote:

> The schema is that a key is a string, and the value is a string or binary.  I am actually storing protocol buffer messages, but the library gives me the ability to serialize to native protobuf or to json.  Json is useful at times especially for debugging.

I don't know if this will apply to you, but I received significant speed improvements on Postgres key searches by using substring indexes.

If your keys are just random hashes, this would probably work well for you.  If your keys are person-readable, it's probably not going to work as well, as the distribution of prefix characters will probably be too uniform.

But the general idea is twofold:

    1. Create an additional index on a substring expression of the key field:

       CREATE INDEX _entities_id__subst_7 ON entities (substr(id, 1, 7));

    2. Update your SELECTs to search for both the full string AND the substring:

       - WHERE id = :id
       + WHERE (id = :id) AND (substr(id, 1, 7) = substr(:id, 1, 7))

By adding in the substring query, the planner will (usually) optimize the select by doing a first pass on the substring index, then searching that limited set for the rest of the matching criteria.

On a table with 4MM+ records, introducing a substring index/query improved my searches by a few orders of magnitude.

Before trying this indexing strategy, we were actively looking to migrate this particular query service off of Postgres -- it was such a bottleneck and was not scalable.  Now there is no reason to leave in the foreseeable future.


On Sep 27, 2014, at 8:33 PM, Gavin Flower wrote:

>>  This works well because keys are left prefixed with a scope, a delimiter, and then the actual key for the data.
> Then I noticed that your id is actually a compound key, and probably would be better modelled as:

If you're able to standardize the scope out, an index of "(scope, substring(key,1,7))" might work well.

I only used (1, 7) as my key arguments because that was the optimal speed/space mix on my dataset.  Depending on yours, a shorter or longer index might be more appropriate.
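Putting the two ideas together, purely as a sketch (column and index names here are made up), the scope column plus a short substring expression index might look like:

CREATE INDEX entities_scope_key7_idx
    ON entities (scope, substr(key, 1, 7));

SELECT value
  FROM entities
 WHERE scope = 'players'
   AND substr(key, 1, 7) = substr('chris-12345', 1, 7)
   AND key = 'chris-12345';

The query repeats the indexed expression verbatim so the planner can match it against the index.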