Re: Different compression methods for FPI

From Michael Paquier
Subject Re: Different compression methods for FPI
Date
Msg-id YMgS3DxyLj7Dqx7S@paquier.xyz
In response to Re: Different compression methods for FPI  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Different compression methods for FPI  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Mon, Jun 14, 2021 at 08:42:08PM -0500, Justin Pryzby wrote:
> On Tue, Jun 15, 2021 at 09:50:41AM +0900, Michael Paquier wrote:
>> +       {"wal_compression_method", PGC_SIGHUP, WAL_SETTINGS,
>> +           gettext_noop("Set the method used to compress full page images in the WAL."),
>> +           NULL
>> +       },
>> +       &wal_compression_method,
>> +       WAL_COMPRESSION_PGLZ, wal_compression_options,
>> +       NULL, NULL, NULL
>> Any reason to not make that user-settable?  If you merge that with
>> wal_compression, that's not an issue.

Hmm, yeah.  This can be read as suggesting PGC_USERSET.  With the second
part of my sentence, I think that I implied using PGC_SUSET to be
consistent with wal_compression, but I don't recall my mood from one
month ago :)  Sorry for the confusion.

> I don't see how restricting it to superusers would mitigate the hazard at all:
> If the local admin enables wal compression, then every user's data will be
> compressed, and the degree of compression indicates a bit about their data,
> no matter whether it's pglz or lz4.

I would vote for having some consistency with wal_compression.
Perhaps we could even revisit c2e5f4d, but I'd rather not do that.

>> The compression level may be better if specified with a different
>> GUC.  That's less parsing to have within the GUC machinery.
>
> I'm not sure about that - then there's an interdependency between GUCs.
> If zlib range is 1..9, and zstd is -50..10, then you may have to set the
> compression level first, to avoid an error.  I believe there's a previous
> discussion about inter-dependent GUCs, and maybe a commit fixing a problem they
> caused.  But I cannot find it offhand.

You cannot do cross-checks for GUCs in their assign hooks or even rely
on the order of those parameters, but you can do that in some backend
code paths.  A recent example is the discussion that led to 79dfa8a,
for the GUCs controlling the min/max SSL protocols allowed.
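
To illustrate the idea of checking at use time rather than in an assign
hook, here is a minimal standalone sketch.  Every name in it is
hypothetical and nothing here is taken from the patch or from guc.c; the
ranges are only the example figures quoted above (zlib 1..9, zstd
-50..10).  The two settings are stored independently, and the
combination is validated only in the code path about to compress.

```c
#include <stdbool.h>

/* Hypothetical method ids, for illustration only. */
typedef enum
{
    SKETCH_METHOD_ZLIB,
    SKETCH_METHOD_ZSTD
} SketchMethod;

/* Stand-ins for two settings that are assigned independently. */
SketchMethod sketch_method = SKETCH_METHOD_ZLIB;
int          sketch_level = 6;

/* Example ranges quoted upthread: zlib 1..9, zstd -50..10. */
bool
sketch_level_is_valid(SketchMethod method, int level)
{
    switch (method)
    {
        case SKETCH_METHOD_ZLIB:
            return level >= 1 && level <= 9;
        case SKETCH_METHOD_ZSTD:
            return level >= -50 && level <= 10;
    }
    return false;               /* unknown method id */
}

/*
 * Meant to be called from the code path that is about to compress,
 * once both settings have their final values.  It only reports the
 * result; what to do on failure is left to the caller.
 */
bool
sketch_check_before_compress(void)
{
    return sketch_level_is_valid(sketch_method, sketch_level);
}
```

With this shape, the order in which the two values get set does not
matter, as nothing is checked until both are read together.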

>> seems to me that if we can get the same amount of compression and CPU
>> usage just by tweaking the compression level, there is no need to
>> support more than one extra compression algorithm, easing the life of
>> packagers and users.
>
> I don't think it eases it for packagers, since I anticipate the initial patch
> would support {none/pglz/lz4/zlib}.  I anticipate that zstd may not be in pg15.

Yes, without zstd we have all the infra to track the dependencies.

> The goal of the patch is to give options, and the overhead of adding both zlib
> and lz4 is low.  zlib gives good compression at some CPU cost and may be
> preferable for (some) DWs, and lz4 is almost certainly better (than pglz) for
> OLTPs.

Anything will be better than pglz.  I am rather confident in that.

What I am wondering is whether we need to eat more bits than necessary
in the WAL record format, because we will end up supporting it until
the end of time.  Twenty years from now we may have a better solution
than what has been integrated, and we may not care much about 1 extra
byte per WAL record at that point, or perhaps we will.  From what I
hear here, there are many cases that we may care about, depending on
how much CPU one is ready to pay in order to get more compression.
There is no magic solution that is both cheap in CPU and gives a very
good compression ratio, or we could just go with that.  So it seems to
me that there is still an argument for adding only one new compression
method with a good range of levels, able to support the range of cases
we'd care about:
- High compression ratio but high CPU cost.
- Low compression ratio but low CPU cost.

So we could also take a decision here based on the range of
(compression,CPU) an algorithm is able to cover.
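
As a rough picture of the cost in bits, here is a small sketch of a
method id packed into a one-byte flags field.  The layout, the names
and the bit positions are all assumptions made for this illustration;
this is not the actual WAL record format nor what the patch does.  With
2 bits reserved there is room for 4 ids, and a fifth method would need
another bit or a format change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-byte flags layout, for illustration only. */
#define SKETCH_FLAG_A        0x01   /* some existing flag bit */
#define SKETCH_FLAG_B        0x02   /* some existing flag bit */
#define SKETCH_METHOD_SHIFT  2
#define SKETCH_METHOD_BITS   2      /* 2 bits give room for 4 ids */
#define SKETCH_METHOD_MASK \
    (((1u << SKETCH_METHOD_BITS) - 1u) << SKETCH_METHOD_SHIFT)

/* Store a method id in the reserved bits, keeping the other flags. */
uint8_t
sketch_set_method(uint8_t flags, unsigned method_id)
{
    /* An id that does not fit would silently corrupt other bits. */
    assert(method_id < (1u << SKETCH_METHOD_BITS));
    flags &= (uint8_t) ~SKETCH_METHOD_MASK;
    return (uint8_t) (flags | (method_id << SKETCH_METHOD_SHIFT));
}

/* Read the method id back from the flags byte. */
unsigned
sketch_get_method(uint8_t flags)
{
    return (flags & SKETCH_METHOD_MASK) >> SKETCH_METHOD_SHIFT;
}
```

A single method with a wide range of levels needs only one id here,
since the level does not have to be stored to decompress a page.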
--
Michael
