Обсуждение: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

Поиск
Список
Период
Сортировка

Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Anastasia Lubennikova
Дата:
Hi,

I want to propose a bunch of patches which allow to reduce WAL traffic
generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
and RUM, we can now log index pages of other access methods only once
in the end of indexbuild process. Implementation is based on generic_xlog.

Not only it decreases the amount of WAL generated, but also completely
eliminates WAL overhead in case of error during index build.

I also attached sql scripts which I used to measure xlog size.
They show that pg_wal_lsn_diff for patched version is from 3 to 5 times 
smaller.


Not sure if regression tests are needed, since it is just an optimization.
But I do not mind to add them if someone feels that it is necessary.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Вложения

Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
David Steele
Дата:
Hi Anastasia,

On 2/28/18 11:03 AM, Anastasia Lubennikova wrote:
> I want to propose a bunch of patches which allow to reduce WAL traffic
> generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
> and RUM, we can now log index pages of other access methods only once
> in the end of indexbuild process. Implementation is based on generic_xlog.
> 
> Not only it decreases the amount of WAL generated, but also completely
> eliminates WAL overhead in case of error during index build.

This seems to be a worthwhile patch, but it was submitted to the 2018-03
CF at the last moment with no prior discussion or review as far as I can
tell. It appears to be non-trivial and therefore not a good fit for the
last CF for PG11.

I have moved it to the 2018-09 CF.

Regards,
-- 
-David
david@pgmasters.net


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Heikki Linnakangas
Дата:
On 28/02/18 18:03, Anastasia Lubennikova wrote:
> Hi,
> 
> I want to propose a bunch of patches which allow to reduce WAL traffic
> generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
> and RUM, we can now log index pages of other access methods only once
> in the end of indexbuild process.

Makes sense. Is there any scenario where the current method is better? 
In theory, doing another pass through the whole index, to create the WAL 
records, requires more I/O. But in practice, it seems that the reduction 
in WAL is almost certainly a win.

> Implementation is based on generic_xlog.

Why? I think we should just add a log_relation() function in 
xloginsert.c directly, alongside log_newpage_buffer().

This makes the assumption that all the pages in these indexes used the 
standard page layout. I think that's a valid assumption, but needs at 
least a comment on that. And perhaps an Assert, to check that 
pd_lower/upper look sane.

As a further optimization, would it be a win to WAL-log multiple pages 
in each record?

This leaves the XLOG_*_CREATE_INDEX WAL record types unused, BTW.

- Heikki


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

От
Alexander Korotkov
Дата:
On Tue, Jul 10, 2018 at 10:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 28/02/18 18:03, Anastasia Lubennikova wrote:
> > I want to propose a bunch of patches which allow to reduce WAL traffic
> > generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
> > and RUM, we can now log index pages of other access methods only once
> > in the end of indexbuild process.
>
> Makes sense. Is there any scenario where the current method is better?
> In theory, doing another pass through the whole index, to create the WAL
> records, requires more I/O. But in practice, it seems that the reduction
> in WAL is almost certainly a win.

It's frequently advised to move WAL to the separate storage device.
Then, it's not hard to imagine how current method can be faster, when
storage device containing index is much more loaded than storage
device containing WAL.  But despite being faster it would still
consume way more overall I/O resources.  I'm not sure if we should
provide some option to switch between WAL methods (or heuristics)...

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Andrey Lepikhov
Дата:
With the consent of Anastasia I will improving this patch further.
Attachment contains next version of the patch set.

11.07.2018 00:03, Heikki Linnakangas пишет:
> On 28/02/18 18:03, Anastasia Lubennikova wrote:

>> Implementation is based on generic_xlog.
> 
> Why? I think we should just add a log_relation() function in 
> xloginsert.c directly, alongside log_newpage_buffer().
> 
I have some arguments to stay this functionality at generic_xlog module:
1. xloginsert.c functions work on low level of abstraction, use buffers 
and pages.
2. Code size using generic_xlog service functions looks more compact and 
safe.
> This makes the assumption that all the pages in these indexes used the 
> standard page layout. I think that's a valid assumption, but needs at 
> least a comment on that. And perhaps an Assert, to check that 
> pd_lower/upper look sane.
Done
> 
> As a further optimization, would it be a win to WAL-log multiple pages 
> in each record?
In this version of the patch we use simple optimization: pack 
XLR_NORMAL_MAX_BLOCK_ID blocks pieces into each WAL-record.
> 
> This leaves the XLOG_*_CREATE_INDEX WAL record types unused, BTW.
> 
Done
> - Heikki
> 

Benchmarks:
-----------

Test: pgbench -f gin-WAL-test.sql -t 5:
---------------------------------------
master:
Latency average: 27696.299 ms
WAL size: 2.66 GB

patched
Latency average: 22812.103 ms
WAL size: 1.23 GB


Test: pgbench -f gist-WAL-test.sql -t 5:
----------------------------------------
master:
Latency average: 19928.284 ms
WAL size: 1.25 GB

patched
Latency average: 18175.064 ms
WAL size: 0.63 GB


Test: pgbench -f spgist-WAL-test.sql -t 5:
------------------------------------------
master:
Latency average: 11529.384 ms
WAL size: 1.07 GB

patched
Latency average: 9330.828 ms
WAL size: 0.6 GB

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Вложения

Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

От
Dmitry Dolgov
Дата:
> On Tue, Jul 31, 2018 at 6:36 AM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
>
> With the consent of Anastasia I will improving this patch further.
> Attachment contains next version of the patch set.

Thank you. I played a bit with this patch, and can confirm visible WAL size
reduction (it's rather obvious, but still). Although, probably due to lack of
my knowledge here, I wonder how does it work when an index is build
concurrently?

> Benchmarks:
> -----------
>
> Test: pgbench -f gin-WAL-test.sql -t 5:
> ---------------------------------------
> master:
> Latency average: 27696.299 ms
> WAL size: 2.66 GB

Does it makes sense to measure latency of the entire script, since it contains
also some preparation work? Of course it still shows the difference between
patched version and master, but probably in a more noisy way.

I'm moving this patch to the next CF.


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Andrey Lepikhov
Дата:

On 30.11.2018 15:10, Dmitry Dolgov wrote:
> Thank you. I played a bit with this patch, and can confirm visible WAL size
> reduction (it's rather obvious, but still). Although, probably due to lack of
> my knowledge here, I wonder how does it work when an index is build
> concurrently?

Routine generate_xlog_for_rel() works only at phase 2 of concurrent 
index building.
At the phase 3, during validate_index() execution we use aminsert() -> 
PlaceToPage() mechanism as before the patch.
In the concurrent mode, I do not expect any problems, caused by the patch.

> 
>> Benchmarks:
>> -----------
>>
>> Test: pgbench -f gin-WAL-test.sql -t 5:
>> ---------------------------------------
>> master:
>> Latency average: 27696.299 ms
>> WAL size: 2.66 GB
> 
> Does it makes sense to measure latency of the entire script, since it contains
> also some preparation work? Of course it still shows the difference between
> patched version and master, but probably in a more noisy way.

Ok. It is used only for demonstration.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Michael Paquier
Дата:
On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
> Ok. It is used only for demonstration.

The latest patch set needs a rebase, so moved to next CF, waiting on
author as this got no reviews.
--
Michael

Вложения

Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Andrey Lepikhov
Дата:
On 04.02.2019 10:04, Michael Paquier wrote:
> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>> Ok. It is used only for demonstration.
>
> The latest patch set needs a rebase, so moved to next CF, waiting on
> author as this got no reviews.
The new version in attachment.
> --
> Michael
>

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Вложения

Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Andrey Lepikhov
Дата:
The patchset had a problem with all-zero pages, has appeared at index
build stage: the generic_log_relation() routine sends all pages into the
WAL. So  lsn field at all-zero page was initialized and the
PageIsVerified() routine detects it as a bad page.
The solution may be:
1. To improve index build algorithms and eliminate the possibility of
not used pages appearing.
2. To mark each page as 'dirty' right after initialization. In this case
we will got 'empty' page instead of the all-zeroed.
3. Do not write into the WAL all-zero pages.

In the patchset (see attachment) I used approach No.3.

On 04.02.2019 10:04, Michael Paquier wrote:
> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>> Ok. It is used only for demonstration.
>
> The latest patch set needs a rebase, so moved to next CF, waiting on
> author as this got no reviews.
> --
> Michael
>

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Вложения

Re: Re: Reduce amount of WAL generated by CREATE INDEX for gist, ginand sp-gist

От
David Steele
Дата:
On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
> The patchset had a problem with all-zero pages, has appeared at index
> build stage: the generic_log_relation() routine sends all pages into the
> WAL. So  lsn field at all-zero page was initialized and the
> PageIsVerified() routine detects it as a bad page.
> The solution may be:
> 1. To improve index build algorithms and eliminate the possibility of
> not used pages appearing.
> 2. To mark each page as 'dirty' right after initialization. In this case
> we will got 'empty' page instead of the all-zeroed.
> 3. Do not write into the WAL all-zero pages.
> 
> In the patchset (see attachment) I used approach No.3.
> 
> On 04.02.2019 10:04, Michael Paquier wrote:
>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>> Ok. It is used only for demonstration.
>>
>> The latest patch set needs a rebase, so moved to next CF, waiting on
>> author as this got no reviews.

The patch no longer applies so marked Waiting on Author.

Alexander, Heikki, are either of you planning to review the patch in 
this CF?

Regards,
-- 
-David
david@pgmasters.net


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Heikki Linnakangas
Дата:
On 25/03/2019 09:57, David Steele wrote:
> On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
>> The patchset had a problem with all-zero pages, has appeared at index
>> build stage: the generic_log_relation() routine sends all pages into the
>> WAL. So  lsn field at all-zero page was initialized and the
>> PageIsVerified() routine detects it as a bad page.
>> The solution may be:
>> 1. To improve index build algorithms and eliminate the possibility of
>> not used pages appearing.
>> 2. To mark each page as 'dirty' right after initialization. In this case
>> we will got 'empty' page instead of the all-zeroed.
>> 3. Do not write into the WAL all-zero pages.

Hmm. When do we create all-zero pages during index build? That seems 
pretty surprising.

>> On 04.02.2019 10:04, Michael Paquier wrote:
>>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>>> Ok. It is used only for demonstration.
>>>
>>> The latest patch set needs a rebase, so moved to next CF, waiting on
>>> author as this got no reviews.
> 
> The patch no longer applies so marked Waiting on Author.
> 
> Alexander, Heikki, are either of you planning to review the patch in
> this CF?

I had another quick look.

I still think using the "generic xlog AM" for this is a wrong level of 
abstraction, and we should use the XLOG_FPI records for this directly. 
We can extend XLOG_FPI so that it can store multiple pages in a single 
record, if it doesn't already handle it.

Another counter-point to using the generic xlog record is that you're 
currently doing unnecessary two memcpy's of all pages in the index, in 
GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.

I guess the generic_log_relation() function can stay where it is, but it 
should use XLogRegisterBuffer() and XLogInsert() directly.

- Heikki


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Andrey Lepikhov
Дата:

On 25/03/2019 15:21, Heikki Linnakangas wrote:
> On 25/03/2019 09:57, David Steele wrote:
>> On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
>>> The patchset had a problem with all-zero pages, has appeared at index
>>> build stage: the generic_log_relation() routine sends all pages into the
>>> WAL. So  lsn field at all-zero page was initialized and the
>>> PageIsVerified() routine detects it as a bad page.
>>> The solution may be:
>>> 1. To improve index build algorithms and eliminate the possibility of
>>> not used pages appearing.
>>> 2. To mark each page as 'dirty' right after initialization. In this case
>>> we will got 'empty' page instead of the all-zeroed.
>>> 3. Do not write into the WAL all-zero pages.
> 
> Hmm. When do we create all-zero pages during index build? That seems 
> pretty surprising.
GIST uses buffered pages. During GIST build it is possible (very rarely) 
what no one index tuple was written to the block page before new block 
was allocated. And the page has become an all-zero page.
You can't have problems in the current GIST code, because it writes into 
the WAL only changed pages.
But the idea of the patch is traversing blocks of index relation 
one-by-one after end of index building process and write this blocks to 
the WAL. In this case we will see all-zeroed pages and need to check it.
> 
>>> On 04.02.2019 10:04, Michael Paquier wrote:
>>>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>>>> Ok. It is used only for demonstration.
>>>>
>>>> The latest patch set needs a rebase, so moved to next CF, waiting on
>>>> author as this got no reviews.
>>
>> The patch no longer applies so marked Waiting on Author.
>>
>> Alexander, Heikki, are either of you planning to review the patch in
>> this CF?
> 
> I had another quick look.
> 
> I still think using the "generic xlog AM" for this is a wrong level of 
> abstraction, and we should use the XLOG_FPI records for this directly. 
> We can extend XLOG_FPI so that it can store multiple pages in a single 
> record, if it doesn't already handle it.
> 
> Another counter-point to using the generic xlog record is that you're 
> currently doing unnecessary two memcpy's of all pages in the index, in 
> GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.
> 
> I guess the generic_log_relation() function can stay where it is, but it 
> should use XLogRegisterBuffer() and XLogInsert() directly.
Ok. This patch waited feedback for a long time. I will check the GIST 
code changes from previous review and will try to use your advice.
> 
> - Heikki

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Heikki Linnakangas
Дата:
On 26/03/2019 11:29, Andrey Lepikhov wrote:
> On 25/03/2019 15:21, Heikki Linnakangas wrote:
>> Hmm. When do we create all-zero pages during index build? That seems
>> pretty surprising.
>
> GIST uses buffered pages. During GIST build it is possible (very rarely)
> what no one index tuple was written to the block page before new block
> was allocated. And the page has become an all-zero page.
> You can't have problems in the current GIST code, because it writes into
> the WAL only changed pages.

Looking at the code, I don't see how that could happen. The only place 
where the GiST index file is extended is in gistNewBuffer(), and all 
callers of that initialize the page immediately after the call. What am 
I missing?

- Heikki


Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Andrey Lepikhov
Дата:

On 26/03/2019 15:59, Heikki Linnakangas wrote:
> On 26/03/2019 11:29, Andrey Lepikhov wrote:
>> On 25/03/2019 15:21, Heikki Linnakangas wrote:
>>> Hmm. When do we create all-zero pages during index build? That seems
>>> pretty surprising.
>>
>> GIST uses buffered pages. During GIST build it is possible (very rarely)
>> what no one index tuple was written to the block page before new block
>> was allocated. And the page has become an all-zero page.
>> You can't have problems in the current GIST code, because it writes into
>> the WAL only changed pages.
> 
> Looking at the code, I don't see how that could happen. The only place 
> where the GiST index file is extended is in gistNewBuffer(), and all 
> callers of that initialize the page immediately after the call. What am 
> I missing?
Sorry, This issue was found in SP-GiST AM. You can show it:
1. Apply v2 version of the patch set (see attachment).
2. In the generic_log_relation() routine set logging on PageIsNew(buf)
3. Run script t1.sql (in attachment).

This problem can be resolved by calling MarkBufferDirty() after 
SpGistInitBuffer() in the allocNewBuffer() routine. But in this case we 
will write to the WAL more pages than necessary.
To avoid it in the patch '0001-Relation-into-WAL-function' I do not 
write new pages to the WAL.

Attached patch set is not final version. It is needed for demonstration 
of 'all-zero pages' issue only. The sentence for the direct use of 
XLOG_FPI records will be considered in v3.

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Вложения

Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Andrey Lepikhov
Дата:

On 25/03/2019 15:21, Heikki Linnakangas wrote:
> I had another quick look.
> 
> I still think using the "generic xlog AM" for this is a wrong level of 
> abstraction, and we should use the XLOG_FPI records for this directly. 
> We can extend XLOG_FPI so that it can store multiple pages in a single 
> record, if it doesn't already handle it.
> 
> Another counter-point to using the generic xlog record is that you're 
> currently doing unnecessary two memcpy's of all pages in the index, in 
> GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.
> 
> I guess the generic_log_relation() function can stay where it is, but it 
> should use XLogRegisterBuffer() and XLogInsert() directly.

Patch set v.3 uses XLOG_FPI records directly.

As a benchmark I use the script (test.sql in attachment) which show WAL 
size increment during index build. In the table below you can see the 
influence of the patch on WAL growth.

Results
=======
AM    | master | patch |
GIN    | 347 MB | 66 MB |
GiST    | 157 MB | 43 MB |
SP-GiST | 119 MB | 38 MB |

-- 
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Вложения

Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin andsp-gist

От
Heikki Linnakangas
Дата:
On 02/04/2019 08:58, Andrey Lepikhov wrote:
> On 25/03/2019 15:21, Heikki Linnakangas wrote:
>> I had another quick look.
>>
>> I still think using the "generic xlog AM" for this is a wrong level of
>> abstraction, and we should use the XLOG_FPI records for this directly.
>> We can extend XLOG_FPI so that it can store multiple pages in a single
>> record, if it doesn't already handle it.
>>
>> Another counter-point to using the generic xlog record is that you're
>> currently doing unnecessary two memcpy's of all pages in the index, in
>> GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.
>>
>> I guess the generic_log_relation() function can stay where it is, but it
>> should use XLogRegisterBuffer() and XLogInsert() directly.
> 
> Patch set v.3 uses XLOG_FPI records directly.

Thanks! Committed, with some changes:

* I moved the log_relation() function to xlog.c, so that it sits beside 
log_newpage_*() functions. I renamed it to log_newpage_range(), and 
changed the argument so that the caller provides the beginning and end 
block ranges. I added a 'page_std' flag, instead of just assuming that 
all pages use the standard page layout. All of the callers pass 
page_std=true at the moment, but seems better to be explicit about it.

I made those changes because I felt that the function was too narrowly 
tailored for the current callers. The assumption about standard page 
layout, and also the fact that it only logged the main fork. It's more 
flexible now, for any future AMs that might not be exactly like that. It 
feels like it's at the same level of abstraction now as the other 
log_newpage_*() functions. Even if we never need the flexibility, I 
think making the 'page_std' and 'forknum' arguments explicit is good, to 
draw attention to those details, for anyone calling the function.

* I fixed the REDO code. It was trivially broken, it only restored the 
first page in each FPI WAL record.

* Using "fake" unlogged LSNs for GiST index build seemed fishy. I could 
not convince myself that it was safe in all corner cases. In a recently 
initdb'd cluster, it's theoretically possible that the fake LSN counter 
overtakes the real LSN value, and that could lead to strange behavior. 
For example, how would the buffer manager behave, if there was a dirty 
page in the buffer cache with an LSN value that's greater than the 
current WAL flush pointer? I think you'd get "ERROR: xlog flush request 
%X/%X is not satisfied --- flushed only to %X/%X".

I changed that so that we use LSN 1 for all pages during index build. 
That's smaller than any real or fake LSN. Or actually, the fake LSN 
counter used to start at 1 - I bumped that up to 1000, so now it's 
safely smaller. Could've used 0 instead, but there's an assertion in 
gist scan code that didn't like that.

> As a benchmark I use the script (test.sql in attachment) which show WAL
> size increment during index build. In the table below you can see the
> influence of the patch on WAL growth.
> 
> Results
> =======
> AM    | master | patch |
> GIN    | 347 MB | 66 MB |
> GiST    | 157 MB | 43 MB |
> SP-GiST | 119 MB | 38 MB |

Nice!

- Heikki