Re: pglz performance

From Petr Jelinek
Subject Re: pglz performance
Date
Msg-id d8576096-76ba-487d-515b-44fdedba8bb5@2ndquadrant.com
In response to Re: pglz performance  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: pglz performance  (Andrey Borodin <x4mmm@yandex-team.ru>)
Re: pglz performance  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: pglz performance  (Andres Freund <andres@anarazel.de>)
Re: pglz performance  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
Hi,

On 02/08/2019 21:48, Tomas Vondra wrote:
> On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:
> 
>>
>>> Another question is whether we'd actually want to include the code in
>>> core directly, or use system libraries (and if some packagers might
>>> decide to disable that, for whatever reason).
>>
>> I'd personally say we should have an included version, and a
>> --with-system-... flag that uses the system one.
>>
> 
> OK. I'd say to require a system library, but that's a minor detail.
> 

Same here.

Just so that we are not only talking in the abstract, what do you think
about the attached?
It:
- adds a new GUC, compression_algorithm, with possible values pglz
(default) and lz4 (if lz4 is compiled in); changing it requires SIGHUP
- adds a --with-lz4 configure option (default yes, so the configure option
is actually --without-lz4) that enables lz4, using the system library
- uses compression_algorithm for both TOAST and WAL compression (if on)
- supports slicing for lz4 as well (it was already supported for pglz)
- supports reading old TOAST values
- adds a 1-byte header to the compressed data, where we currently store the
algorithm kind; that leaves us with 254 more to add :) (it is extra
overhead compared to the current state)
- changes the rawsize in the TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between the old and new formats
- supports reading from a table whose rows are stored with
different algorithms (so that the GUC itself can be freely changed)
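
The header layout described in the list above could be sketched roughly as follows. This is only an illustration, not the code from the attached patch: all names are hypothetical, and it assumes that the top bit of the 32-bit size word is the format flag and that the algorithm byte is the first byte of the compressed payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the patch's actual identifiers and bit choice may differ. */
#define RAWSIZE_MASK   0x7FFFFFFFu	/* low 31 bits: uncompressed size */
#define NEW_FORMAT_BIT 0x80000000u	/* assumed: top bit marks the new format */

typedef enum
{
	ALGO_PGLZ = 0,
	ALGO_LZ4 = 1
} CompressionAlgo;

/* Pack a 31-bit raw size and the format flag into one 32-bit word. */
static inline uint32_t
pack_rawsize(uint32_t rawsize, bool new_format)
{
	assert(rawsize <= RAWSIZE_MASK);	/* size must fit in 31 bits */
	return rawsize | (new_format ? NEW_FORMAT_BIT : 0u);
}

/* Extract the raw size, ignoring the format flag. */
static inline uint32_t
unpack_rawsize(uint32_t word)
{
	return word & RAWSIZE_MASK;
}

/* True if the value was written in the new format. */
static inline bool
is_new_format(uint32_t word)
{
	return (word & NEW_FORMAT_BIT) != 0;
}

/*
 * New-format values name their algorithm in the first payload byte;
 * old-format values have no such byte and are always pglz.
 */
static inline CompressionAlgo
payload_algo(uint32_t word, const uint8_t *payload)
{
	return is_new_format(word) ? (CompressionAlgo) payload[0] : ALGO_PGLZ;
}
```

Because every value carries its own format flag and algorithm byte, a reader never needs to consult the GUC, which is what would allow rows written under different settings to coexist in one table.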

Simple docs and a TAP test are included.

I did some basic performance testing (it's not really my thing though,
so I would appreciate it if somebody did more).
I get about a 7x performance improvement on data load with lz4 compared
to pglz on my dataset, but strangely only a tiny decompression
improvement. Perhaps more importantly, I also ran before-patch and
after-patch tests with pglz, and the performance difference with my
data set was <1%.

Note that this just links against lz4; it does not add lz4 to the
PostgreSQL code base.

The issues I know of:
- the pg_decompress function really ought to throw an error in the default
branch, but that file is also used in the front-end, so I am not sure how
to do that
- the TAP test probably does not work with all possible configurations
(but that's why it needs to be enabled via PG_TEST_EXTRA, as for example
ssl is)
- we don't really have any automated test for reading the old TOAST format;
I have no idea how to do that
- I expect my changes to configure.in are not the greatest, as I have
pretty much zero experience with autoconf
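
For the first issue, one possible direction (an assumption on my part, not something the attached patch does) is to have the shared code report an unknown algorithm through its return value and leave the actual error reporting to each caller. A minimal sketch with hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes and algorithm ids, for illustration only. */
typedef enum
{
	DECOMP_OK = 0,
	DECOMP_UNKNOWN_ALGO = -1
} DecompStatus;

enum
{
	ALGO_ID_PGLZ = 0,
	ALGO_ID_LZ4 = 1
};

/*
 * Resolve the algorithm named by the header byte.  The default branch
 * reports failure through the return value instead of raising an error,
 * so the function has no dependency on back-end error machinery.
 */
static DecompStatus
lookup_algo(uint8_t header_byte, const char **name)
{
	switch (header_byte)
	{
		case ALGO_ID_PGLZ:
			*name = "pglz";
			return DECOMP_OK;
		case ALGO_ID_LZ4:
			*name = "lz4";
			return DECOMP_OK;
		default:
			*name = NULL;
			return DECOMP_UNKNOWN_ALGO;
	}
}
```

A back-end caller could then turn DECOMP_UNKNOWN_ALGO into a regular error, while a front-end caller could print a message and exit; the shared file itself would not need to know which environment it is built for.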

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
