Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

Поиск
Список
Период
Сортировка
От Chapman Flack
Тема Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
Дата
Msg-id 5A92F0C0.7070706@anastigmatix.net
обсуждение исходный текст
Ответ на Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
Список pgsql-hackers
On 07/17/17 11:29, Michael Paquier wrote:
> On Thu, Jul 6, 2017 at 3:48 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> On 07/03/2017 06:30 PM, Chapman Flack wrote:
>>> Although it's moot in the straightforward approach of re-zeroing in
>>> the loop, it would still help my understanding of the system to know
>>> if there is some subtlety that would have broken what I proposed
>>> earlier, which was an extra flag to AdvanceXLInsertBuffer() that
>>> ...
>>
>> Yeah, I suppose that would work, too.
> 
> FWIW, I would rather see any optimization done in
> AdvanceXLInsertBuffer() instead of seeing a second memset re-zeroing
> the WAL page header after its data has been initialized by
> AdvanceXLInsertBuffer() once.

Here is a patch implementing the simpler approach Heikki suggested
(though my original leaning had been to wrench on AdvanceXLInsertBuffer
as Michael suggests). The sheer simplicity of the smaller change
eventually won me over, unless there's a strong objection.

As noted before, the cost of the extra small MemSet is proportional
to the number of unused pages in the segment, and that is an indication
of how busy the system *isn't*. I don't have a time benchmark of the
patch's impact; if I should, what would be a good methodology?

Before the change, what some common compression tools can achieve on
a mostly empty (select pg_switch_wal()) segment on my hardware are:

gzip  27052 in 0.145s
xz     5852 in 0.678s
lzip   5747 in 1.254s
bzip2  1445 in 0.261s

bzip2 is already the clear leader (I don't have lz4 handy to include in
the comparison) at around 1/20th size gzip can achieve, and that's before
this patch. After:

gzip 16393 in 0.143s
xz    2640 in 0.520s
lzip  2535 in 1.198s
bzip2  147 in 0.238s

The patch gives gzip almost an extra factor of two, and about the same
for xz and lzip, and bzip2 gains nearly another order of magnitude.

I considered adding a regression test for unfilled-segment compressibility,
but wasn't sure where it would most appropriately go. I assume a TAP test
would be the way to do it.

-Chap

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amirouche Boubekki
Дата:
Сообщение: neon: a functional database, git for structured data
Следующее
От: Lætitia Avrot
Дата:
Сообщение: VACUUM FULL name is very confusing to some people (or to most nonexpert people)