Re: Commitfest 2021-11 Patch Triage - Part 2

From: Robert Haas
Subject: Re: Commitfest 2021-11 Patch Triage - Part 2
Date:
Msg-id: CA+Tgmob7tqjUDcy9ZhPw=kczB=Gj-WxU4G2B0ARcUmJnC_rT2g@mail.gmail.com
In reply to: Re: Commitfest 2021-11 Patch Triage - Part 2  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Commitfest 2021-11 Patch Triage - Part 2  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Mon, Nov 15, 2021 at 2:51 PM Stephen Frost <sfrost@snowman.net> wrote:
> I get that just compressing the entire stream is simpler and easier and
> such, but it's surely cheaper and more efficient to not decompress and
> then recompress data that's already compressed.  Finding a way to pass
> through data that's already compressed when stored as-is while also
> supporting compression of everything else (in a sensible way- wouldn't
> make sense to just compress each attribute independently since a 4 byte
> integer isn't going to get smaller with compression) definitely
> complicates the overall idea but perhaps would be possible to do.

To me, this feels like an attempt to move the goalposts far enough to
kill the project. Sure, in a perfect world, that would be nice. But,
we don't do it anywhere else. If you try to store a JPEG into a bytea
column, we'll try to compress it just like we would any other data,
and it may not work out. If you then take a pg_basebackup of the
database using -Z, there's no attempt made to avoid the CPU overhead
of compressing those TOAST table pages that contain
already-compressed data and not the others. And it's easy to
understand why that's the case: when you insert data into the
database, there's no way for the database to magically know whether
that data has been previously compressed by some means, and if so, how
effectively. And when you back up a database, the backup doesn't know
which relfilenodes contain TOAST tables or which pages of those
relfilenodes contain data that is already pre-compressed. In both cases,
your options are either (1) shut off compression yourself or (2) hope
that the compressor doesn't waste too much effort on it.
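To make the point concrete, here's a rough sketch (Python's zlib standing in for whatever stream compressor the server or backup tool might use; random bytes simulating already-compressed data like a JPEG): the pre-compressed payload doesn't shrink further, yet the compressor still burns cycles on it, while ordinary data compresses fine.

```python
import os
import zlib

# Simulate already-compressed data (e.g. a JPEG stored in a bytea column)
# with incompressible random bytes.
jpeg_like = os.urandom(100_000)

# Ordinary, highly compressible text data.
text_like = b"the quick brown fox jumps over the lazy dog " * 2000

for label, payload in [("random (pre-compressed)", jpeg_like),
                       ("repetitive text", text_like)]:
    compressed = zlib.compress(payload)
    print(f"{label}: {len(payload)} -> {len(compressed)} bytes")
```

The random payload comes out essentially the same size (plus a little framing overhead), exactly the "wasted effort" case; the text shrinks dramatically. Neither zlib nor pg_basebackup can tell the two apart without actually doing the work.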

I think the same approach ought to be completely acceptable here. I
don't even really understand how we could do anything else. printtup()
just gets datums, and it has no idea whether or how they are toasted.
It calls the type output functions which don't know that data is being
prepared for transmission to the client as opposed to some other
hypothetical way you could call that function, nor do they know what
compression method the client wants. It does not seem at all
straightforward to teach them that ... and even if they did, what
then? It's not like every column value is sent as a separate packet;
the whole row is a single protocol message, and some columns may be
compressed and others uncompressed. Deciding what to do about that
mixture amounts to sheer guesswork. Unless you actually try
compressing that mix of compressed and uncompressed values - and it's
moderately uncommon for every column of a table to even be
toastable - you aren't going to know how well it will compress. You
could easily waste more CPU cycles trying to guess than you would have
spent just doing what the user asked for.
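A quick sketch of the mixed-row case (again using Python's zlib as a stand-in, with a hypothetical two-column row: one pre-compressed value simulated by random bytes, one plain-text value, serialized into a single message): compressing the whole message still pays off, because the compressible half shrinks while the incompressible half passes through with little expansion.

```python
import os
import zlib

# A hypothetical row: one column holding pre-compressed data (random
# bytes) and one holding plain text, sent as a single protocol message.
compressed_col = os.urandom(50_000)
text_col = b"some readable tuple data " * 2000  # 50,000 bytes
message = compressed_col + text_col

whole = zlib.compress(message)
print(f"message: {len(message)} -> {len(whole)} bytes")
```

The output message is roughly half the input size: the text column compresses away almost entirely, and the random column costs only a small constant overhead. Just compressing the stream, as the user asked, does the right thing without any per-column guessing.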

-- 
Robert Haas
EDB: http://www.enterprisedb.com


