Re: pglz performance
From | Petr Jelinek |
---|---|
Subject | Re: pglz performance |
Date | |
Msg-id | 7f52464f-5058-1186-ab49-3ac0931c3413@2ndquadrant.com |
In reply to | Re: pglz performance (Andrey Borodin <x4mmm@yandex-team.ru>) |
List | pgsql-hackers |
Hi,

On 04/08/2019 11:57, Andrey Borodin wrote:
>
>
>> On 2 Aug 2019, at 21:39, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>>> We have some kind of "roadmap" of "extensible pglz". We plan to provide implementation on Novembers CF.
>>
>> I don't understand why it's a good idea to improve the compression side
>> of pglz. There's plenty other people that spent a lot of time developing
>> better compression algorithms.
> Improving compression side of pglz has two different projects:
> 1. Faster compression with less code and same compression ratio (patch in this thread).
> 2. Better compression ratio with at least same compression speed of uncompressed values.
> Why I want to do patch for 2? Because it's interesting.
> Will 1 or 2 be reviewed or committed? I have no idea.
> Will many users benefit from 1 or 2? Yes, clearly. Unless we force everyone to stop compressing with pglz.
>

FWIW I agree.

>> Just so that we don't idly talk, what do you think about the attached?
>> It:
>> - adds new GUC compression_algorithm with possible values of pglz (default) and lz4 (if lz4 is compiled in), requires SIGHUP
>> - adds --with-lz4 configure option (default yes, so the configure option is actually --without-lz4) that enables the lz4, it's using system library
>> - uses the compression_algorithm for both TOAST and WAL compression (if on)
>> - supports slicing for lz4 as well (pglz was already supported)
>> - supports reading old TOAST values
>> - adds 1 byte header to the compressed data where we currently store the algorithm kind, that leaves us with 254 more to add :) (that's an extra overhead compared to the current state)
>> - changes the rawsize in TOAST header to 31 bits via bit packing
>> - uses the extra bit to differentiate between old and new format
>> - supports reading from table which has different rows stored with different algorithm (so that the GUC itself can be freely changed)
> That's cool.
> I suggest defaulting to lz4 if it is available. You cannot start cluster on non-lz4 binaries which used lz4 once.
> Do we plan the possibility of compression algorithm as extension? Or will all algorithms be packed into that byte in core?

What I wrote does not expect extensions providing new compression. We'd have to somehow reserve compression ids for specific extensions, and that seems like a lot of extra complexity for little benefit. I don't see much benefit in having more than, say, 3 generic compressors (I could imagine adding zstd). If you are thinking about data-type-specific compression, then I think this is the wrong layer.

> What about lz4 "common prefix"? System or user-defined. If lz4 is compiled in we can even offer in-system training, just make sure that trained prefixes will make their way to standbys.
>

I definitely don't plan to work on a common prefix, but I don't see why it could not be added later.

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
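The header change described in the patch summary (a 1-byte algorithm id, plus a TOAST raw size packed into 31 bits so that the spare top bit can mark the new format) can be illustrated with a small C sketch. This is not the attached patch: the byte layout, the little-endian encoding, and the names `header_write`, `header_read`, `CompressedHeader`, `CMP_PGLZ` and `CMP_LZ4` are all hypothetical. The sketch assumes raw sizes always stay below 2^31, so old-format headers never have the top bit set and therefore decode as pglz without needing the extra byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only, not the patch's actual format. */
#define RAWSIZE_MASK   0x7FFFFFFFu  /* low 31 bits: uncompressed (raw) size */
#define NEW_FORMAT_BIT 0x80000000u  /* top bit: set when an algorithm byte follows */

/* Hypothetical algorithm ids; one byte leaves 254 more values free. */
typedef enum { CMP_PGLZ = 0, CMP_LZ4 = 1 } CompressionId;

typedef struct {
    uint32_t rawsize;    /* uncompressed size, at most 31 bits */
    bool     new_format; /* true if the 1-byte algorithm id is present */
    uint8_t  algorithm;  /* CompressionId; always CMP_PGLZ for old data */
} CompressedHeader;

/* Encode a header into buf (at least 5 bytes); returns bytes written. */
static size_t header_write(uint8_t *buf, uint32_t rawsize, bool new_format,
                           uint8_t algorithm)
{
    assert(rawsize <= RAWSIZE_MASK);  /* size must fit in 31 bits */
    uint32_t word = rawsize | (new_format ? NEW_FORMAT_BIT : 0u);

    for (int i = 0; i < 4; i++)       /* little-endian, byte by byte */
        buf[i] = (uint8_t) (word >> (8 * i));
    if (!new_format)
        return 4;                     /* old format: size word only */
    buf[4] = algorithm;               /* new format: 1-byte algorithm id */
    return 5;
}

/* Decode a header from buf; returns bytes consumed (4 or 5). */
static size_t header_read(const uint8_t *buf, CompressedHeader *out)
{
    uint32_t word = 0;

    for (int i = 0; i < 4; i++)
        word |= (uint32_t) buf[i] << (8 * i);
    out->rawsize = word & RAWSIZE_MASK;
    out->new_format = (word & NEW_FORMAT_BIT) != 0;
    /* Old-format values carry no algorithm byte and were always pglz. */
    out->algorithm = out->new_format ? buf[4] : (uint8_t) CMP_PGLZ;
    return out->new_format ? 5 : 4;
}
```

Because the format flag lives in the size word itself, a reader can decode every row on its own, whichever algorithm wrote it. That is what lets the GUC change freely without rewriting existing data.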