Re: ZSON, PostgreSQL extension for compressing JSONB

From: Aleksander Alekseev
Subject: Re: ZSON, PostgreSQL extension for compressing JSONB
Date:
Msg-id: 20161006094859.GA22564@e733.localdomain
In reply to: ZSON, PostgreSQL extension for compressing JSONB (Aleksander Alekseev <a.alekseev@postgrespro.ru>)
List: pgsql-general

Hello, Eduardo.

> Why do you use a dictionary compression and not zlib/lz4/bzip/anyother?

Internally PostgreSQL already uses an LZ77-family algorithm, PGLZ. I didn't
try to replace it, only to supplement it. PGLZ compresses every piece of
data (JSONB documents in this case) independently. What I did was remove
the redundant data that exists between documents, which PGLZ can't
compress since every single document usually uses each key and similar
strings (some sort of string tags in arrays, etc.) only once.
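To illustrate the idea (a minimal sketch, not ZSON's actual on-disk format: the dictionary, document contents, and the marker convention below are all invented for this example, and zlib stands in for PGLZ), a shared dictionary maps keys and common strings to short codes before each document is compressed independently:

```python
import json
import zlib

# Sample documents sharing the same keys and string values -- the
# cross-document redundancy that per-document compression cannot exploit.
docs = [
    {"user_name": "alice", "account_status": "active", "tags": ["premium"]},
    {"user_name": "bob", "account_status": "inactive", "tags": ["premium"]},
]

# Hypothetical shared dictionary learned from the whole collection.
dictionary = {"user_name": 0, "account_status": 1, "tags": 2, "premium": 3}

def encode(doc):
    """Replace known keys/strings with short dictionary codes."""
    def repl(v):
        if isinstance(v, dict):
            return {dictionary.get(k, k): repl(x) for k, x in v.items()}
        if isinstance(v, list):
            return [repl(x) for x in v]
        if isinstance(v, str) and v in dictionary:
            return {"#": dictionary[v]}  # marker object for a coded string
        return v
    return json.dumps(repl(doc), separators=(",", ":")).encode()

# Compare compressed sizes with and without the dictionary pass.
for doc in docs:
    raw = json.dumps(doc, separators=(",", ":")).encode()
    print(len(zlib.compress(raw)), len(zlib.compress(encode(doc))))
```

The dictionary pass shrinks each document before the generic compressor even runs, which is why it composes with PGLZ rather than replacing it.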

> Compress/Decompress speed?

By my observations PGLZ has characteristics similar to GZIP. I didn't
benchmark ZSON encoding/decoding separately from the DBMS because the end
user is interested only in TPS, which depends on IO, the number of
documents we can fit into memory, and other factors.

> As I understand, postgresql must decompress before use.

Only if you try to read document fields. For deleting a tuple, doing
vacuum, etc., there is no need to decompress the data.

> Some compressing algs (dictionary transforms where a token is word)
> allow search for tokens/words directly on compressed data transforming
> the token/word to search in dictionary entry and searching it in
> compressed data. From it, replace, substring, etc... string
> manipulations algs at word level can be implemented.

Unfortunately, I doubt that the current ZSON implementation can use these
ideas. However, I must agree that it's a very interesting field of
research. I don't think anyone has tried to do something like this in
PostgreSQL yet.
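The idea Eduardo describes can at least be sketched abstractly (everything here is hypothetical: the dictionary, the coded documents, and the `contains` helper are invented for illustration). The query word is translated to its dictionary code once, and the search then runs over the coded data without decoding it:

```python
# Assumed shared dictionary and documents stored as sequences of codes.
dictionary = {"error": 0, "warning": 1, "timeout": 2}
coded_docs = [[0, 2], [1], [2, 2, 0]]

def contains(word, coded, dct):
    """Search for a word across coded documents without decoding them."""
    code = dct.get(word)
    if code is None:
        # A word absent from the dictionary cannot occur in coded form.
        return [False] * len(coded)
    return [code in doc for doc in coded]

print(contains("timeout", coded_docs, dictionary))  # → [True, False, True]
```

Comparing small integer codes is also cheaper than comparing strings, which is part of what makes word-level operations on compressed data attractive.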

> My passion is compression, do you care if I try other algorithms? For
> that, some dict id numbers (>1024 or >1<<16 or <128 for example) say
> which compression algorithm is used or must change zson_header to store
> that information. Doing that, each document could be compressed with
> the best compressor (size or decompression speed) at idle times or at
> request.

By all means! Naturally, if you find a better encoding I would be happy
to merge the corresponding code into ZSON's repository.
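Eduardo's suggestion of reserving id ranges or a header field to select the compressor per document could be sketched like this (a hypothetical header layout, not ZSON's actual format; zlib and lzma stand in for whatever codecs would actually be plugged in):

```python
import struct
import zlib
import lzma

# Hypothetical codec table: one header byte selects the algorithm.
CODECS = {
    0: (zlib.compress, zlib.decompress),
    1: (lzma.compress, lzma.decompress),
}

def pack(data: bytes) -> bytes:
    """Compress with every codec and keep the smallest result,
    prefixed by a one-byte codec id."""
    candidates = {cid: comp(data) for cid, (comp, _) in CODECS.items()}
    cid, best = min(candidates.items(), key=lambda kv: len(kv[1]))
    return struct.pack("B", cid) + best

def unpack(blob: bytes) -> bytes:
    """Read the codec id from the header and decompress accordingly."""
    cid = blob[0]
    return CODECS[cid][1](blob[1:])

payload = b'{"key": "value"}' * 100
assert unpack(pack(payload)) == payload
```

Picking the winner per document is exactly what makes "recompress with the best algorithm at idle times" possible: the header tells the reader which decoder to use, so documents with different codecs can coexist in one table.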

> Thanks for sharing and time.

Thanks for the feedback and for sharing your thoughts!

--
Best regards,
Aleksander Alekseev
