Re: GiST VACUUM

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: GiST VACUUM
Дата
Msg-id 738ecf7d-abc1-ed64-6af7-09eb6d9f3d2f@iki.fi
обсуждение исходный текст
Ответ на Re: GiST VACUUM  (Andrey Borodin <x4mmm@yandex-team.ru>)
Ответы Re: GiST VACUUM  (Andrey Borodin <x4mmm@yandex-team.ru>)
Re: GiST VACUUM  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
On 21/03/2019 18:06, Andrey Borodin wrote:
>> 21 марта 2019 г., в 2:30, Heikki Linnakangas <hlinnaka@iki.fi>
>> написал(а): one remaining issue that needs to be fixed:
>> 
>> During Hot Standby, the B-tree code writes a WAL reord, when a
>> deleted page is recycled, to prevent the deletion from being
>> replayed too early in the hot standby. See _bt_getbuf() and
>> btree_xlog_reuse_page(). I think we need to do something similar in
>> GiST.
>> 
>> I'll try fixing that tomorrow, unless you beat me to it. Making
>> the changes is pretty straightforward, but it's a bit cumbersome to
>> test.
> 
> I've tried to deal with it and stuck...

So, I came up with the attached. I basically copy-pasted the page-reuse 
WAL-logging stuff from nbtree.

When I started testing this, I quickly noticed that empty pages were not 
being deleted nearly as much as I expected. I tracked it to this check 
in gistdeletepage:

> +       if (GistFollowRight(leafPage)
> +               || GistPageGetNSN(parentPage) > GistPageGetNSN(leafPage))
> +       {
> +               /* Don't mess with a concurrent page split. */
> +               return false;
> +       }

That NSN test was bogus. It prevented the leaf page from being reused, 
if the parent page was *ever* split after the leaf page was created. I 
don't see any reason to check the NSN here. The NSN is normally used to 
detect if a (leaf) page has been concurrently split, when you descend 
the tree. We don't need to care about that here; as long as the 
FOLLOW_RIGHT flag is not set, the page has a downlink, and if we can 
find the downlink and the page is empty, we can delete it.

After removing that bogus NSN check, page reuse become much more 
effective. I've been testing this by running this test script repeatedly:

----------
/*
create sequence gist_test_seq;
create table gist_point_tbl(id int4, p point);
create index gist_pointidx on gist_point_tbl using gist(p);
*/

insert into gist_point_tbl (id, p)
    select nextval('gist_test_seq'), point(nextval('gist_test_seq'), 
1000 + g) from generate_series(1, 10000) g;

delete from gist_point_tbl where id < currval('gist_test_seq') - 20000;
vacuum gist_point_tbl;

select pg_table_size('gist_point_tbl'), pg_indexes_size('gist_point_tbl');
----------

It inserts a bunch of rows, deletes a bunch of older rows, and vacuums. 
The interesting thing here is that the key values keep "moving", so that 
new tuples are added to different places than where old ones are 
removed. That's the case where page reuse is needed.

Before this patch, the index bloated very quickly. With the patch, it 
still bloats, because we still don't delete internal pages. But it's a 
small fraction of the bloat you got before.

Attached is the latest patch version, to be applied on top of the 
IntegerSet patch.

> I think we should make B-tree WAL record for this shared, remove
> BlockNumber and other unused stuff, leaving just xid and db oid. And
> reuse this record for B-tree, GiST and GIN (yeah, it is not checking
> for that conflict).
Good point. For now, I didn't try to generalize this, but perhaps we 
should.

> Though, I'm not sure it is important for GIN. Scariest thing that can
> happen: it will return same tid twice. But it is doing bitmap scan,
> you cannot return same bit twice...

Hmm. Could it return a completely unrelated tuple? We don't always 
recheck the original index quals in a bitmap index scan, IIRC. Also, a 
search might get confused if it's descending down a posting tree, and 
lands on a different kind of a page, altogether.

Alexander, you added the mechanism to GIN recently, to prevent pages 
from being reused too early (commit 52ac6cd2d0). Do we need something 
like B-tree's REUSE_PAGE records in GIN, too, to prevent the same bug 
from happening in hot standby?


PS. for Gist, we could almost use the LSN / NSN mechanism to detect the 
case that a deleted page is reused: Add a new field to the GiST page 
header, to store a new "deleteNSN" field. When a page is deleted, the 
deleted page's deleteNSN is set to the LSN of the deletion record. When 
the page is reused, the deleteNSN field is kept unchanged. When you 
follow a downlink during search, if you see that the page's deleteNSN > 
parent's LSN, you know that it was concurrently deleted and recycled, 
and should be ignored. That would allow reusing deleted pages 
immediately. Unfortunately that would require adding a new field to the 
gist page header/footer, which requires upgrade work :-(. Maybe one day, 
we'll bite the bullet. Something to keep in mind, if we have to change 
the page format anyway, for some reason.

- Heikki

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrey Borodin
Дата:
Сообщение: Re: GiST VACUUM
Следующее
От: David Steele
Дата:
Сообщение: Re: Re: ToDo: show size of partitioned table