Re: [HACKERS] Custom compression methods

From: Konstantin Knizhnik
Subject: Re: [HACKERS] Custom compression methods
Date:
Msg-id: 03c376ed-839f-35f4-5f03-35b21b47e9a2@postgrespro.ru
In reply to: Re: [HACKERS] Custom compression methods  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses: Re: [HACKERS] Custom compression methods  (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>)
Re: [HACKERS] Custom compression methods  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers


On 23.04.2018 18:32, Alexander Korotkov wrote:
> But that is the main goal of this patch: to let somebody implement their own
> compression algorithm which best fits a particular dataset.

Hmmm... Frankly speaking, I don't believe in this "somebody".

 
>> From my point of view, the main value of this patch is that it allows replacing
>> the pglz algorithm with a more efficient one, for example zstd. On some data sets
>> zstd provides a more than 10 times better compression ratio and at the same time
>> is faster than pglz.
>
> Not exactly.  If we want to replace pglz with a more efficient algorithm, then we
> should just replace pglz with that better algorithm.  Pluggable compression
> methods are definitely not worth it just for replacing pglz with zstd.

As far as I understand, it is not possible for many reasons (portability, patents, ...) to replace pglz with zstd.
I think that even replacing pglz with zlib (which is much worse than zstd) would not be accepted by the community.
So from my point of view the main advantage of custom compression methods is to replace the built-in pglz compression with a more advanced one.
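For what it's worth, this kind of comparison is easy to reproduce outside of Postgres with the stock zlib and zstd libraries (pglz lives inside the backend, so it is left out here). A rough sketch, not a real benchmark, and the "data" is just a synthetic repetitive buffer:

/* Rough sketch: compare zlib and zstd compressed sizes for one buffer.
 * Build with: cc ratio.c -lz -lzstd   (pglz is internal to Postgres, omitted here) */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>
#include <zstd.h>

int main(void)
{
    /* Mildly repetitive sample data standing in for a table column. */
    size_t  src_size = 1024 * 1024;
    char   *src = malloc(src_size);
    for (size_t i = 0; i < src_size; i++)
        src[i] = "abcabcabd0123456789"[i % 19];

    /* zlib */
    uLongf  z_size = compressBound(src_size);
    Bytef  *z_dst = malloc(z_size);
    if (compress2(z_dst, &z_size, (const Bytef *) src, src_size, 6) != Z_OK)
        return 1;

    /* zstd */
    size_t  zs_cap = ZSTD_compressBound(src_size);
    void   *zs_dst = malloc(zs_cap);
    size_t  zs_size = ZSTD_compress(zs_dst, zs_cap, src, src_size, 3);
    if (ZSTD_isError(zs_size))
        return 1;

    printf("raw: %zu, zlib: %lu (%.1fx), zstd: %zu (%.1fx)\n",
           src_size, (unsigned long) z_size, (double) src_size / z_size,
           zs_size, (double) src_size / zs_size);

    free(src); free(z_dst); free(zs_dst);
    return 0;
}

Of course, real column data behaves differently from a synthetic buffer, which is exactly why such numbers should be measured per dataset.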


> Some blob-like datatypes might not be long enough to let generic
> compression algorithms like zlib or zstd train a dictionary.  For example,
> MySQL successfully utilizes column-level dictionaries for JSON [1].  Also,
> JSON(B) might utilize some compression which lets the user extract
> particular attributes without decompressing the whole document.

Well, I am not an expert in compression,
but I will be very surprised if somebody shows me a real example with a large enough compressed data buffer (>2kb) where some specialized algorithm provides a significantly
better compression ratio than an advanced universal compression algorithm.

Also, maybe I missed something, but the current compression API doesn't support partial extraction (extracting some particular attribute or range).
If we really need it, then it should be expressed in the custom compressor API, but I am not sure how frequently it will be needed.
Large values are split into 2kb TOAST chunks; with compression that corresponds to about 4-8kb of raw data. IMHO, storing larger JSON objects in the database is a design flaw.
And taking into account that for JSONB we also need to extract the header (so at least two chunks), the advantages of partial JSONB decompression become even less clear.
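To make the "expressed in the custom compressor API" part concrete: I would expect partial extraction to appear as an optional slice-decompression callback next to the usual compress/decompress pair. The struct and field names below are purely hypothetical, not taken from the patch; this only sketches the shape such a hook could have:

/* Purely hypothetical sketch of a compressor API with an optional slice hook;
 * none of these names come from the actual patch. */
#include <stddef.h>

typedef struct CustomCompressRoutine
{
    /* Compress src into dst; return compressed size, or 0 on failure. */
    size_t (*cm_compress) (void *dst, size_t dst_cap,
                           const void *src, size_t src_size, void *opts);

    /* Decompress the whole value. */
    size_t (*cm_decompress) (void *dst, size_t dst_cap,
                             const void *src, size_t src_size, void *opts);

    /*
     * Optional: decompress only bytes [offset, offset + length) of the
     * original value.  May be NULL, in which case callers fall back to
     * full decompression.  Note that TOAST still fetches ~2kb chunks,
     * so the win is limited for values that span only a few chunks.
     */
    size_t (*cm_decompress_slice) (void *dst, size_t dst_cap,
                                   const void *src, size_t src_size,
                                   size_t offset, size_t length, void *opts);
} CustomCompressRoutine;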



>> I do not think that assigning the default compression method through a GUC is such a bad idea.
>
> It's probably not so bad, but it's a different story.  Unrelated to this patch, I think.

Maybe. But in any case, there are several directions where compression can be used:
- custom compression algorithms
- libpq compression
- page level compression
...

and they should all somehow finally be "married" with each other.
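Coming back to the GUC idea for a moment: registering such a setting is straightforward for an extension via the standard DefineCustomStringVariable() call. Just a sketch; "my_ext.default_compression_method" is a made-up name, not something defined by the patch:

/* Sketch of an extension-provided GUC; the setting name is illustrative only. */
#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static char *default_compression_method = NULL;

void
_PG_init(void)
{
    DefineCustomStringVariable("my_ext.default_compression_method",
                               "Compression method used when none is specified.",
                               NULL,
                               &default_compression_method,
                               "pglz",          /* boot value */
                               PGC_USERSET,     /* settable per session */
                               0,
                               NULL, NULL, NULL);
}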


> I think streaming compression seems like a completely different story.
> Client-server traffic compression is not just a server feature.  It must
> also be supported at the client side.  And I really doubt it should be
> pluggable.
>
> In my opinion, you propose good things like compression of WAL
> with a better algorithm and compression of client-server traffic.
> But I think those features are unrelated to this patch and should
> be considered separately.  They are not features which should be
> added to this patch.  Regarding this patch, the points you provided
> seem more like criticism of the general idea.
>
> I think the problem of this patch is that it lacks a good example.
> It would be nice if Ildus implemented simple compression with a
> column-defined dictionary (like [1] does), and showed its efficiency
> on real-life examples, which can't be achieved with generic
> compression methods (like zlib or zstd).  That would be a good
> answer to the criticism you provide.

> Links
>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
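Regarding the column-defined dictionary suggestion above: as far as I can tell, plain zstd already covers that use case, since one can train a dictionary on a sample of column values and compress each short value against it. A minimal sketch using only the stock zstd/zdict API (nothing here is from the patch, and a real sample set must be far larger than three rows):

/* Minimal sketch: train a zstd dictionary on sample values of one column and
 * compress a short value against it.  Uses only the stock zstd/zdict API. */
#include <stdio.h>
#include <string.h>
#include <zstd.h>
#include <zdict.h>

int main(void)
{
    /* Pretend these are short JSON documents from one column. */
    const char *samples[] = {
        "{\"id\":1,\"name\":\"alice\",\"status\":\"active\"}",
        "{\"id\":2,\"name\":\"bob\",\"status\":\"active\"}",
        "{\"id\":3,\"name\":\"carol\",\"status\":\"inactive\"}",
    };
    char    buf[4096];
    size_t  sizes[3], total = 0;

    for (int i = 0; i < 3; i++)
    {
        sizes[i] = strlen(samples[i]);
        memcpy(buf + total, samples[i], sizes[i]);
        total += sizes[i];
    }

    /* Train a (tiny) dictionary on the concatenated samples. */
    char    dict[1024];
    size_t  dict_size = ZDICT_trainFromBuffer(dict, sizeof(dict), buf, sizes, 3);
    if (ZDICT_isError(dict_size))
        return 1;   /* real training needs many more samples than this toy set */

    /* Compress one short value using the shared dictionary. */
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    char       out[1024];
    size_t     out_size = ZSTD_compress_usingDict(cctx, out, sizeof(out),
                                                  samples[0], sizes[0],
                                                  dict, dict_size, 3);
    if (!ZSTD_isError(out_size))
        printf("raw %zu bytes -> %zu bytes with dictionary\n", sizes[0], out_size);

    ZSTD_freeCCtx(cctx);
    return 0;
}

So the interesting question is not whether dictionary compression can be implemented, but where the dictionary is trained, stored and invalidated on the Postgres side.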

 
Sorry, but I am really looking at this patch from a different angle,
and this is why I have some doubts about the general idea.
Postgres allows defining custom types, access methods, ...
But do you know any production system using some special data types or custom indexes which are not included in the standard Postgres distribution
or in popular extensions (like PostGIS)?

IMHO end users do not have the skills and time to create their own compression algorithms. And without knowledge of the specifics of a particular data set,
it is very hard to implement something more efficient than a universal compression library.
But if you think this is not the right place and time to discuss it, I do not insist.

But in any case, I think it would be useful to provide some more examples of custom compression API usage.
From my point of view the most useful would be integration with zstd.
But if it is possible to find some example of a data-specific compression algorithm which shows better results than universal compression,
that would be even more impressive.
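To illustrate what I mean by zstd integration: the compression part itself is a thin wrapper around two zstd calls; all the real work is on the Postgres side of the API. A sketch of such a wrapper (the function names are mine, purely illustrative, not from the patch):

/* Thin zstd wrapper a custom compression method could call into.
 * Function names are illustrative only. */
#include <stddef.h>
#include <zstd.h>

/* Returns compressed size, or 0 if compression failed or did not help. */
size_t
my_zstd_compress(void *dst, size_t dst_cap, const void *src, size_t src_size)
{
    size_t res = ZSTD_compress(dst, dst_cap, src, src_size, 3);

    if (ZSTD_isError(res) || res >= src_size)
        return 0;               /* caller stores the value uncompressed instead */
    return res;
}

/* Returns decompressed size, or 0 on corruption. */
size_t
my_zstd_decompress(void *dst, size_t dst_cap, const void *src, size_t src_size)
{
    size_t res = ZSTD_decompress(dst, dst_cap, src, src_size);

    return ZSTD_isError(res) ? 0 : res;
}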


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
