Thread: ZSON, PostgreSQL extension for compressing JSONB


ZSON, PostgreSQL extension for compressing JSONB

From: Aleksander Alekseev
Date:
Hello.

I've just uploaded ZSON extension on GitHub:

https://github.com/afiskon/zson

ZSON learns from your typical JSONB documents and creates a dictionary
of strings that are frequently used across all documents. After that you
can use the ZSON type to compress documents using this dictionary. When
the document schema evolves and compression becomes inefficient, you can
re-learn on new documents. New documents will be compressed with the new
dictionary, while old documents will be decompressed using the old one.

In some cases ZSON can save half of your disk space and give you about
10% more TPS. Everything depends on your data and workload, though.
Memory is saved as well. For more details see README.md.
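The learn/compress cycle described above can be sketched in a few lines of Python. This is a toy model with invented names (`learn`, `ZsonCodec`), not ZSON's actual C implementation; it only illustrates the mechanism of versioned dictionaries, where each value records which dictionary encoded it.

```python
# Toy sketch (not ZSON's actual C code): learn frequent strings from
# sample documents, encode them as short fixed-width codes, and keep
# every dictionary version so old values stay readable after re-learning.
import json
from collections import Counter

def learn(docs, size=1024):
    """Collect the most frequent keys and string values across samples."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.keys())
        counts.update(v for v in doc.values() if isinstance(v, str))
    return [s for s, _ in counts.most_common(size)]

class ZsonCodec:
    """Encode with the newest dictionary; decode with the version
    recorded in the value itself."""
    def __init__(self):
        self.dicts = {}      # version -> list of common strings
        self.version = 0

    def relearn(self, docs):
        self.version += 1
        self.dicts[self.version] = learn(docs)

    def encode(self, doc):
        text = json.dumps(doc)
        for code, w in enumerate(self.dicts[self.version]):
            text = text.replace(json.dumps(w), '\x00%04x' % code)
        return (self.version, text)  # the version travels with the value

    def decode(self, blob):
        version, text = blob
        for code, w in enumerate(self.dicts[version]):
            text = text.replace('\x00%04x' % code, json.dumps(w))
        return json.loads(text)
```

Because the dictionary version is stored with each value, re-learning never forces a rewrite of existing rows, which is the property the announcement describes.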

Please don't hesitate to ask any questions. Any feedback and pull
requests are welcome too!

--
Best regards,
Aleksander Alekseev

Attachments

Re: ZSON, PostgreSQL extension for compressing JSONB

From: Alan Gano
Date:
I like this, seeing that the keys of JSON docs are replicated in every
record. It makes my old-school DBA sense start to itch.



On Fri, Sep 30, 2016 at 8:58 AM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
> [...]

Re: ZSON, PostgreSQL extension for compressing JSONB

From: Simon Riggs
Date:
On 30 September 2016 at 16:58, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:

> I've just uploaded ZSON extension on GitHub:
>
> https://github.com/afiskon/zson
>
> [...]

Very good. Oleg had mentioned that dictionary compression was being considered.

It would be useful to be able to define compression dictionaries for
many use cases.

Will you be submitting this to core?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ZSON, PostgreSQL extension for compressing JSONB

From: Oleg Bartunov
Date:


On Tue, Oct 4, 2016 at 4:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 30 September 2016 at 16:58, Aleksander Alekseev
> <a.alekseev@postgrespro.ru> wrote:
>
>> [...]
>
> Very good. Oleg had mentioned that dictionary compression was being considered.
>
> It would be useful to be able to define compression dictionaries for
> many use cases.

Dictionary compression is a different project; we'll publish it after testing.

> Will you be submitting this to core?


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: ZSON, PostgreSQL extension for compressing JSONB

From: Aleksander Alekseev
Date:
Hello, Simon.

Thanks for your interest in this project!

> Will you be submitting this to core?

I could align ZSON with the PostgreSQL code style; I only need to run
pgindent and write a few comments. Do you think the community would be
interested in adding it to /contrib/? I mean, doesn't ZSON solve a bit
too specific a problem for that?

--
Best regards,
Aleksander Alekseev

Attachments

Re: ZSON, PostgreSQL extension for compressing JSONB

From: Dorian Hoxha
Date:
@Aleksander
Nearly everyone wants lower storage usage and some kind of compression.
Can this be made to retrain automatically during ANALYZE (does that
make sense?), creating a new dictionary only if it differs from the
previous one?

On Tue, Oct 4, 2016 at 5:34 PM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
> [...]

Re: ZSON, PostgreSQL extension for compressing JSONB

From: Michael Paquier
Date:
On Wed, Oct 5, 2016 at 12:34 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> I could align ZSON with the PostgreSQL code style; I only need to run
> pgindent and write a few comments. Do you think the community would be
> interested in adding it to /contrib/? I mean, doesn't ZSON solve a bit
> too specific a problem for that?

I find the references to pglz quite troubling, particularly after
reading that this data type apparently uses its own compression logic.
--
Michael


Re: ZSON, PostgreSQL extension for compressing JSONB

From: Vitaly Burovoy
Date:
On 10/4/16, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
> On Tue, Oct 4, 2016 at 5:34 PM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
>> [...]
>
> Nearly everyone wants lower storage usage and some kind of compression.
> Can this be made to retrain automatically during ANALYZE (does that
> make sense?), creating a new dictionary only if it differs from the
> previous one?

ANALYZE does not walk through the whole table, and it does not change
its records. Creating a new dictionary for compression purposes would
require updating the rows of the original table to replace entries with
references to the dictionary.

--
Best regards,
Vitaly Burovoy


Re: ZSON, PostgreSQL extension for compressing JSONB

From: Aleksander Alekseev
Date:
> Nearly everyone wants lower storage usage and some kind of compression.
> Can this be made to retrain automatically during ANALYZE (does that
> make sense?), creating a new dictionary only if it differs from the
> previous one?

It's an interesting idea. However, I doubt it could be automated in a
one-size-fits-all manner. Some users would like to do re-learning
during ANALYZE, others during VACUUM, once a month, or perhaps using
triggers and some sort of heuristics.

That said, I see no reason not to accept pull requests with
implementations of different re-learning automation strategies. It's
just not a priority for me personally.

--
Best regards,
Aleksander Alekseev

Attachments

Re: ZSON, PostgreSQL extension for compressing JSONB

From: Aleksander Alekseev
Date:
> > I could align ZSON with the PostgreSQL code style; I only need to run
> > pgindent and write a few comments. Do you think the community would be
> > interested in adding it to /contrib/? I mean, doesn't ZSON solve a bit
> > too specific a problem for that?
>
> I find the references to pglz quite troubling, particularly after
> reading that this data type apparently uses its own compression logic.

ZSON simply replaces frequently used strings with 16-bit codes. It also
adds PGLZ_HINT_SIZE zero bytes (32 by default; it can also be set to 0)
at the beginning, to make it more likely that the data will be
compressed using PGLZ. After all, who would use ZSON for small
documents? That's all.

I hope this explains the references to PGLZ.
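PGLZ itself isn't callable outside the server, but the effect of the zero-byte hint can be illustrated with zlib as a stand-in. The names here (`with_hint`, `HINT_SIZE`) are illustrative, not ZSON's API: the point is that padding enlarges the stored value, nudging the size-threshold check toward compression, while a run of zeros compresses away to almost nothing.

```python
# Illustration (zlib standing in for PostgreSQL's PGLZ) of the
# PGLZ_HINT_SIZE trick: prepend zero bytes so the datum is big enough
# to be considered for compression; the zeros are a single long run
# and cost only a few bytes once compressed.
import zlib

HINT_SIZE = 32  # ZSON's default; 0 disables the hint

def with_hint(encoded: bytes) -> bytes:
    return b'\x00' * HINT_SIZE + encoded

payload = b'{"status": 1, "code": 200}'
padded = with_hint(payload)

# compressed cost of the 32-byte hint
overhead = len(zlib.compress(padded)) - len(zlib.compress(payload))
```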

--
Best regards,
Aleksander Alekseev

Attachments

Re: ZSON, PostgreSQL extension for compressing JSONB

From: Simon Riggs
Date:
On 4 October 2016 at 16:34, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> Hello, Simon.
>
> Thanks for you interest to this project!
>
>> Will you be submitting this to core?
>
> I could align ZSON with the PostgreSQL code style; I only need to run
> pgindent and write a few comments. Do you think the community would be
> interested in adding it to /contrib/? I mean, doesn't ZSON solve a bit
> too specific a problem for that?

CREATE COMPRESSION DICTIONARY public.simple_dict (
    WORDS = my_json_schema
);

Then use the dictionary by referencing its DictionaryId within the
datatype modifier, e.g. ZSON(357).

That way we could use this for TEXT, XML, and JSON as ZTEXT, ZXML, and
ZJSON.

So it seems like a generally useful thing to me.
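Simon's proposed SQL syntax doesn't exist yet, but the semantics he describes — a dictionary registry keyed by an id, shared across text-like types — can be sketched as a toy Python model. All names here (`DICTIONARIES`, `z_encode`, `z_decode`) are hypothetical; the id 357 is just the example from the proposal.

```python
# Toy model of the proposal: dictionaries live in a registry keyed by
# an id (the "357" in ZSON(357)); the id is stored with each value, and
# the same mechanism works for TEXT, XML, or JSON payloads alike.
DICTIONARIES = {357: ["customer_id", "shipping_address", "order_status"]}

def z_encode(dict_id: int, text: str) -> bytes:
    for code, word in enumerate(DICTIONARIES[dict_id]):
        text = text.replace(word, '\x00%04x' % code)
    # the dictionary id travels with the value, like a typmod would
    return dict_id.to_bytes(4, 'big') + text.encode()

def z_decode(blob: bytes) -> str:
    dict_id = int.from_bytes(blob[:4], 'big')
    text = blob[4:].decode()
    for code, word in enumerate(DICTIONARIES[dict_id]):
        text = text.replace('\x00%04x' % code, word)
    return text
```

Nothing in the codec depends on the payload being JSONB, which is why the same registry could back ZTEXT or ZXML.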

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ZSON, PostgreSQL extension for compressing JSONB

From: Aleksander Alekseev
Date:
> > [...]
>
> CREATE COMPRESSION DICTIONARY public.simple_dict (
>     WORDS = my_json_schema
> );
>
> Then use the dictionary by referencing the DictionaryId within the
> datatype modifier, e.g. ZSON(357)
>
> That way we could use this for TEXT, XML, and JSON as ZTEXT, ZXML, and ZJSON.
>
> So it seems like a generally useful thing to me.

Good idea!

What about an evolving schema for JSON/JSONB/XML? For instance, adding
or removing keys in new versions of the application. UPDATE
COMPRESSION DICTIONARY?

--
Best regards,
Aleksander Alekseev

Attachments

Re: ZSON, PostgreSQL extension for compressing JSONB

From: Simon Riggs
Date:
On 5 October 2016 at 16:58, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:

> What about an evolving schema for JSON/JSONB/XML? For instance, adding
> or removing keys in new versions of the application. UPDATE
> COMPRESSION DICTIONARY?

You can add to a dictionary, but not remove things. I'm not sure
that's a big issue.
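A small sketch shows why append-only updates are safe while removals are not (a toy model with invented names, not any shipped implementation): a code, once assigned, always maps to the same string, so values encoded against an older, shorter dictionary still decode correctly after new entries are appended.

```python
# Append-only dictionaries: codes are assigned once and never reused,
# so old values survive dictionary growth. Removing an entry shifts
# the codes and silently corrupts old values.
def encode(words, text):
    for code, w in enumerate(words):
        text = text.replace(w, '\x00%04x' % code)
    return text

def decode(words, text):
    for code, w in enumerate(words):
        text = text.replace('\x00%04x' % code, w)
    return text

v1 = ["alpha", "beta"]
old_value = encode(v1, "alpha and beta")

v2 = v1 + ["gamma"]           # appending is compatible ...
same = decode(v2, old_value)  # ... old values still decode correctly

broken = decode(["beta", "gamma"], old_value)  # removal shifts codes
```

This is exactly why removing entries would require either rewriting old values or versioning the dictionary.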

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ZSON, PostgreSQL extension for compressing JSONB

From: Aleksander Alekseev
Date:
Hello, Eduardo.

> Why do you use a dictionary compression and not zlib/lz4/bzip/anyother?

Internally, PostgreSQL already has an algorithm from the LZ77 family:
PGLZ. I didn't try to replace it, only to supplement it. PGLZ
compresses every piece of data (JSONB documents, in this case)
independently. What I did is remove the redundant data that exists
between documents and that PGLZ can't compress, since every single
document usually uses each key and similar strings (some sort of
string tags in arrays, etc.) only once.

> Compress/Decompress speed?

From my observations, PGLZ has characteristics similar to GZIP. I
didn't benchmark ZSON encoding/decoding separately from the DBMS,
because the end user is interested only in TPS, which depends on IO,
the number of documents that fit into memory, and other factors.

> As I understand, postgresql must decompress before use.

Only if you try to read the document's fields. For deleting a tuple,
doing vacuum, etc., there is no need to decompress the data.

> Some compressing algs (dictionary transforms where a token is word)
> allow search for tokens/words directly on compressed data transforming
> the token/word to search in dictionary entry and searching it in
> compressed data. From it, replace, substring, etc... string
> manipulations algs at word level can be implemented.

Unfortunately, I doubt that the current ZSON implementation can use
these ideas. However, I must agree that it's a very interesting field
of research. I don't think anyone has tried to do something like this
in PostgreSQL yet.
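For what it's worth, the quoted idea can be illustrated on ZSON-style word coding before any PGLZ pass (a toy sketch with invented names, not part of ZSON): a search term that is in the dictionary is translated to its code and matched directly against the encoded text, with no decompression.

```python
# Searching encoded data directly: translate the query word into its
# dictionary code, then look for the code in the encoded text.
WORDS = ["error", "warning", "timeout"]

def encode(text):
    for code, w in enumerate(WORDS):
        text = text.replace(w, '\x00%04x' % code)
    return text

def contains(encoded, word):
    if word in WORDS:
        return ('\x00%04x' % WORDS.index(word)) in encoded
    return word in encoded  # not in the dictionary: stored verbatim

log = encode("timeout while connecting, then error")
```

Once PGLZ recompresses the encoded text, token boundaries disappear, which is why the current implementation can't exploit this directly.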

> My passion is compression, do you care if I try other algorithms? For
> that, some dict id numbers (>1024 or >1<<16 or <128 for example) say
> which compression algorithm is used or must change zson_header to store
> that information. Doing that, each document could be compressed with
> the best compressor (size or decompression speed) at idle times or at
> request.

By all means! Naturally, if you find a better encoding, I'll be happy
to merge the corresponding code into ZSON's repository.

> Thanks for sharing and time.

Thanks for the feedback and for sharing your thoughts!

--
Best regards,
Aleksander Alekseev

Вложения