Discussion: Postgres as key/value store
I'm looking for some feedback on the design I'm using for a basic key/value storage using postgres.
Just some quick background. This design is for large scale games that can get up to 10K writes per second or more. The storage will be behind a distributed memory cache that is built on top of Akka, and has a write-behind caching mechanism to cut down on the number of writes when you have many updates to the same key in a short time period, which is common for a lot of multiplayer type games.
I have been using Couchbase, but this is an open source project, and Couchbase is basically a commercial product for all intents and purposes, which is problematic. I will still support Couchbase, but I don't want to have to tell people that if you really want to scale, Couchbase is the only option.
The schema is that a key is a string, and the value is a string or binary. I am actually storing protocol buffer messages, but the library gives me the ability to serialize to native protobuf or to JSON. JSON is useful at times, especially for debugging.
This is my current schema:
CREATE TABLE entities
(
id character varying(128) NOT NULL,
value bytea,
datatype smallint,
CONSTRAINT entities_pkey PRIMARY KEY (id)
);
CREATE OR REPLACE RULE entities_merge AS
ON INSERT TO entities
WHERE (EXISTS ( SELECT 1
FROM entities entities_1
WHERE entities_1.id::text = new.id::text)) DO INSTEAD UPDATE entities SET value = new.value, datatype = new.datatype
WHERE entities.id::text = new.id::text;
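A rule-based merge like this rewrites qualifying INSERTs into UPDATEs, but rules can behave surprisingly (e.g. with multi-row inserts) and are not race-safe under concurrent writers. A rough alternative sketch for the same upsert using a writable CTE (available since PostgreSQL 9.1) -- the $1..$3 placeholders are illustrative, and a concurrent race can still make the INSERT fail, so callers would need a retry loop:

WITH upsert AS (
    UPDATE entities
       SET value = $2, datatype = $3
     WHERE id = $1
 RETURNING id
)
INSERT INTO entities (id, value, datatype)
SELECT $1, $2, $3
WHERE NOT EXISTS (SELECT 1 FROM upsert);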
Additional functionality I want is to do basic fuzzy searches by key. Currently I'm using a left-anchored LIKE query. This works well because keys are left prefixed with a scope, a delimiter, and then the actual key for the data. These fuzzy searches would never be used in game logic, they would be admin only queries for doing things like obtaining a list of players. So they should be infrequent.
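A left-anchored LIKE can only use a plain btree index under the "C" locale; in other locales the index needs the pattern-ops operator class. A sketch, assuming keys look like 'scope:key' with ':' as the delimiter (the delimiter and index name here are illustrative, not from the thread):

CREATE INDEX entities_id_prefix_idx ON entities (id varchar_pattern_ops);

-- admin query: list all keys in the 'players' scope
SELECT id FROM entities WHERE id LIKE 'players:%';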
The scope of the query ability will not expand in the future. I support multiple backends for the key/value storage so I'm working with the lowest common denominator. Plus I have a different approach for data that you need to do complex queries on (regular tables and an ORM).
Chris
On 28/09/14 12:48, snacktime wrote:
[...]
I suspect that the suggestions below will probably NOT improve performance, and may not necessarily be appropriate for your use case. However, they may facilitate a wider range of queries, and might be easier to understand.
Note the comment about using 'PRIMARY KEY' in http://www.postgresql.org/docs/9.2/static/sql-createtable.html
[...]
The primary key constraint specifies that a column or columns of a table can contain only unique (non-duplicate), nonnull values. Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT NULL, but identifying a set of columns as primary key also provides metadata about the design of the schema, as a primary key implies that other tables can rely on this set of columns as a unique identifier for rows.
[...]
My first thought was to simplify the table create, though I think the length check on the id is best done in the software updating the database:

CREATE TABLE entities
(
id text PRIMARY KEY,
value bytea,
datatype smallint,
CONSTRAINT id_too_long CHECK (length(id) <= 128)
);

Then I noticed that your id is actually a compound key, and probably would be better modelled as:

CREATE TABLE entities
(
scope text,
key text,
value bytea,
datatype smallint,
CONSTRAINT entities_pkey PRIMARY KEY (scope, key)
);

I suspect that making 'datatype' an 'int' would improve performance, but only by a negligible amount!
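If data already exists in the single-column layout, it could be migrated into the compound-key table along these lines -- assuming ':' is the delimiter and the new table is named entities_split (both are assumptions, not from the thread):

INSERT INTO entities_split (scope, key, value, datatype)
SELECT split_part(id, ':', 1),
       substr(id, position(':' in id) + 1),
       value,
       datatype
  FROM entities;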
Cheers,
Gavin
snacktime wrote on 28.09.2014 01:48:
> I'm looking for some feedback on the design I'm using for a basic key/value storage using postgres.

Are you aware of Postgres' "native" key/value store: hstore?
http://www.postgresql.org/docs/current/static/hstore.html

Or Postgres' JSON support? Especially the new JSONB in the upcoming 9.4 release, which makes indexing JSON documents much easier/faster.

I would expect both solutions to be much faster than your entity-attribute-value design.

Thomas
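For the JSON-serialized form of the data, a minimal jsonb sketch along the lines Thomas suggests (the table name, column name, and example attribute are hypothetical; requires the 9.4 release he mentions):

CREATE TABLE entities_json
(
id text PRIMARY KEY,
doc jsonb
);

CREATE INDEX entities_json_doc_idx ON entities_json USING gin (doc);

-- containment query, e.g. all documents with "level": 5
SELECT id FROM entities_json WHERE doc @> '{"level": 5}';

Note this only helps when storing the JSON serialization; the binary protobuf values would still need a bytea column.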
On Sep 27, 2014, at 7:48 PM, snacktime wrote:
> The schema is that a key is a string, and the value is a string or binary. I am actually storing protocol buffer messages, but the library gives me the ability to serialize to native protobuf or to json. Json is useful at times especially for debugging.

I don't know if this will apply to you, but I received significant speed improvements on Postgres key searches by using substring indexes. If your keys are just random hashes, this would probably work well for you. If your keys are person-readable, it's probably not going to work as well, as the distribution of prefix characters will probably be too uniform.

But the general idea is twofold:

1. Create an additional expression index on a prefix of the key field:

CREATE INDEX _entities_id__subst_7 ON entities(substr(id,1,7));

2. Update your SELECTs to search for both the full string AND the substring:

- WHERE id = :id
+ WHERE (id = :id) AND (substr(id,1,7) = substr(:id, 1, 7))

By adding in the substring query, the planner will (usually) optimize the select by doing a first pass on the substring index, then it searches that limited set for the rest of the matching criteria.

On a table with 4MM+ records, introducing a substring index/query improved my searches by a few orders of magnitude. Before trying this indexing strategy, we were actively looking to migrate this particular query service off of postgres -- it was such a bottleneck and was not scalable. Now there is no reason to leave in the foreseeable future.

On Sep 27, 2014, at 8:33 PM, Gavin Flower wrote:
>> This works well because keys are left prefixed with a scope, a delimiter, and then the actual key for the data.
> Then I noticed that your id is actually a compound key, and probably would be better modelled as:

If you're able to standardize the scope out, an index of "(scope, substring(key,1,7))" might work well. I only used 1,7 as my key arguments because that was an optimal speed/space mix on my dataset; depending on yours, a shorter or longer index might be more appropriate.
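Combining Gavin's compound key with the substring-index idea might look like this -- a sketch against the (scope, key) schema proposed above, with the index name and the prefix length 7 as illustrative choices:

CREATE INDEX entities_scope_keyprefix_idx
    ON entities (scope, substr(key, 1, 7));

-- lookup that lets the planner narrow by scope and key prefix first
SELECT value
  FROM entities
 WHERE scope = $1
   AND substr(key, 1, 7) = substr($2, 1, 7)
   AND key = $2;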