Re: pglz performance

From: Petr Jelinek
Subject: Re: pglz performance
Msg-id: 7f52464f-5058-1186-ab49-3ac0931c3413@2ndquadrant.com
In reply to: Re: pglz performance (Andrey Borodin <x4mmm@yandex-team.ru>)
List: pgsql-hackers
Hi,

On 04/08/2019 11:57, Andrey Borodin wrote:
> 
> 
>> On 2 Aug 2019, at 21:39, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>>> We have some kind of "roadmap" for an "extensible pglz". We plan to provide an implementation for the November CF.
>>
>> I don't understand why it's a good idea to improve the compression side
>> of pglz. There are plenty of other people who have spent a lot of time developing
>> better compression algorithms.
> Improving the compression side of pglz involves two different projects:
> 1. Faster compression with less code and the same compression ratio (the patch in this thread).
> 2. A better compression ratio with at least the same compression speed on uncompressed values.
> Why do I want to do a patch for 2? Because it's interesting.
> Will 1 or 2 be reviewed or committed? I have no idea.
> Will many users benefit from 1 or 2? Yes, clearly. Unless we force everyone to stop compressing with pglz.
> 

FWIW I agree.

>> Just so that we don't idly talk, what do you think about the attached?
>> It:
>> - adds a new GUC compression_algorithm with possible values of pglz (default) and lz4 (if lz4 is compiled in), requires SIGHUP
>> - adds a --with-lz4 configure option (default yes, so the configure option is actually --without-lz4) that enables lz4; it uses the system library
>> - uses the compression_algorithm for both TOAST and WAL compression (if on)
>> - supports slicing for lz4 as well (pglz was already supported)
>> - supports reading old TOAST values
>> - adds a 1-byte header to the compressed data where we currently store the algorithm kind, which leaves us with 254 more to add :) (that's extra overhead compared to the current state)
>> - changes the rawsize in TOAST header to 31 bits via bit packing
>> - uses the extra bit to differentiate between old and new format
>> - supports reading from a table which has different rows stored with different algorithms (so that the GUC itself can be freely changed)
> That's cool. I suggest defaulting to lz4 if it is available. You cannot start a cluster which has used lz4 once on non-lz4 binaries.
> Do we plan to allow a compression algorithm to be provided as an extension? Or will all algorithms be packed into that byte in core?

What I wrote does not anticipate extensions providing new compression 
methods. We'd have to somehow reserve compression ids for specific 
extensions, and that seems like a lot of extra complexity for little 
benefit. I don't see much benefit in having more than, say, 3 generic 
compressors (I could imagine adding zstd). If you are thinking about 
data-type-specific compression, then I think this is the wrong layer.

> What about an lz4 "common prefix"? System or user-defined. If lz4 is compiled in, we can even offer in-system training; just make sure that trained prefixes make their way to standbys.
> 

I definitely don't plan to work on a common prefix, but I don't see why 
it could not be added later.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/