Re: Commitfest 2021-11 Patch Triage - Part 2

From Daniil Zakhlystov
Subject Re: Commitfest 2021-11 Patch Triage - Part 2
Date
Msg-id AD1EEE7F-EC21-49D5-BF28-05F47899665E@yandex-team.ru
In reply to Re: Commitfest 2021-11 Patch Triage - Part 2  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Hi! It’s been a while since the original patch release. Let me provide a brief
overview of the current patch status.

The initial approach was to use streaming compression to compress all
outgoing bytes and decompress all incoming bytes. However, after a long
discussion in the thread, that approach was changed.

The current implementation allows compressing only specific message types,
using different compression algorithms for different message types, and
configuring the allowed compression methods and levels on both the server and
client sides, via a GUC setting and the connection string respectively.

Also, the current implementation combines (when possible) multiple protocol
messages into a single CompressedData message for a better compression ratio.
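
To illustrate the idea of combining messages, here is a minimal Python sketch, not the patch's actual code. It assumes the usual v3 framing (one type byte, then an int32 length that counts itself, then the payload). The `b"z"` type byte for CompressedData and the helper names are hypothetical, and one-shot zlib stands in for whatever streaming compressor the patch really uses.

```python
import struct
import zlib

COMPRESSED_DATA = b"z"  # hypothetical type byte, chosen for illustration only


def frame(msg_type: bytes, payload: bytes) -> bytes:
    """Frame a message: 1-byte type, int32 length (counting itself), payload."""
    return msg_type + struct.pack("!I", len(payload) + 4) + payload


def pack_compressed(messages: list) -> bytes:
    """Concatenate already-framed messages and wrap them in one CompressedData."""
    return frame(COMPRESSED_DATA, zlib.compress(b"".join(messages)))


def unpack_compressed(message: bytes) -> list:
    """Reverse pack_compressed: decompress, then split on the length fields."""
    (length,) = struct.unpack("!I", message[1:5])
    raw = zlib.decompress(message[5:1 + length])
    out, pos = [], 0
    while pos < len(raw):
        (inner_len,) = struct.unpack("!I", raw[pos + 1:pos + 5])
        out.append(raw[pos:pos + 1 + inner_len])
        pos += 1 + inner_len
    return out
```

Compressing many small DataRow messages together lets the compressor exploit redundancy across rows, which is presumably where the better ratio comes from; compressing each short message on its own would pay the per-stream overhead every time.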

>
> On 16 Nov 2021, at 01:23, Robert Haas <robertmhaas@gmail.com> wrote:
>
> To me, this feels like an attempt to move the goalposts far enough to
> kill the project. Sure, in a perfect world, that would be nice. But,
> we don't do it anywhere else. If you try to store a JPEG into a bytea
> column, we'll try to compress it just like we would any other data,
> and it may not work out. If you then take a pg_basebackup of the
> database using -Z, there's no attempt made to avoid the overhead of
> CPU overhead of compressing those TOAST table pages that contain
> already-compressed data and not the others. And it's easy to
> understand why that's the case: when you insert data into the
> database, there's no way for the database to magically know whether
> that data has been previously compressed by some means, and if so, how
> effectively. And when you back up a database, the backup doesn't know
> which relfilenodes contain TOAST tables or which pages of those
> relfilenodes contain that is already pre-compressed. In both cases,
> your options are either (1) shut off compression yourself or (2) hope
> that the compressor doesn't waste too much effort on it.
>
> I think the same approach ought to be completely acceptable here. I
> don't even really understand how we could do anything else. printtup()
> just gets datums, and it has no idea whether or how they are toasted.
> It calls the type output functions which don't know that data is being
> prepared for transmission to the client as opposed to some other
> hypothetical way you could call that function, nor do they know what
> compression method the client wants. It does not seem at all
> straightforward to teach them that ... and even if they did, what
> then? It's not like every column value is sent as a separate packet;
> the whole row is a single protocol message, and some columns may be
> compressed and others uncompressed. Trying to guess what to do about
> that seems to boil down to a sheer guess. Unless you try to compress
> that mixture of compressed and uncompressed values - and it's
> moderately uncommon for every column of a table to be even be
> toastable - you aren't going to know how well it will compress. You
> could easily waste more CPU cycles trying to guess than you would have
> spent just doing what the user asked for.
>

Agreed. From my POV, it is OK to use the protocol message type and length to decide
whether a message should be compressed. Also, this can be optimized later without the need
to change the protocol.
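
As a rough illustration of such a rule (a sketch only, not what the patch does), the decision could look at nothing but the type byte and the length. The set of types and the threshold below are assumptions picked for the example; `D` and `d` are the DataRow and CopyData type bytes.

```python
# Assumption: both the type set and the threshold are illustrative values.
COMPRESSIBLE_TYPES = {b"D", b"d"}  # DataRow, CopyData
MIN_COMPRESS_LEN = 60  # hypothetical cutoff below which compression is skipped


def should_compress(msg_type: bytes, length: int) -> bool:
    """Decide from the message type and length alone, without inspecting the payload."""
    return msg_type in COMPRESSIBLE_TYPES and length >= MIN_COMPRESS_LEN
```

Because the rule is local to the sender, it could be tuned later without touching the wire format, as long as the receiver can tell compressed messages from uncompressed ones.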

Regarding the LZ4 support patch, it still needs some minor polishing. Basically, it only adds
LZ4 algorithm support and does not change anything fundamentally. So I would appreciate someone
reviewing the current patch version.

The original thread is quite long, so I guess it is hard to catch up with the current patch status.
I can start a new one with a detailed summary if that would help.

Thanks,

Daniil Zakhlystov



