Re: [HACKERS] GUC for cleanup indexes threshold.

From: Claudio Freire
Subject: Re: [HACKERS] GUC for cleanup indexes threshold.
Date:
Msg-id: CAGTBQpYOc-8wBdSr-fbGuOix7ayLz=HRPpXD-H=q8+GxrdjNAQ@mail.gmail.com
In response to: Re: [HACKERS] GUC for cleanup indexes threshold.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List: pgsql-hackers
On Fri, Sep 22, 2017 at 4:46 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I apologize in advance for possible silliness.
>
> At Thu, 21 Sep 2017 13:54:01 -0300, Claudio Freire <klaussfreire@gmail.com> wrote in
<CAGTBQpYvgdqxVaiyui=BKrzw7ZZfTQi9KECUL4-Lkc2ThqX8QQ@mail.gmail.com>
>> On Tue, Sep 19, 2017 at 8:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> > On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> >> Maybe this is looking at the problem from the wrong direction.
>> >>
>> >> Why can't the page be added to the FSM immediately and the check be
>> >> done at runtime when looking for a reusable page?
>> >>
>> >> Index FSMs currently store only 0 or 255, couldn't they store 128 for
>> >> half-recyclable pages and make the caller re-check reusability before
>> >> using it?
>> >
>> > No, because it's impossible for them to know whether the page
>> > that their index scan just landed on was recycled just a second ago,
>> > or was like this since before their xact began/snapshot was acquired.
>> >
>> > For your reference, this RecentGlobalXmin interlock stuff is what
>> > Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
>> > Nodes". Seems pretty hard to do it any other way.
>>
>> I don't see the difference between a vacuum run and distributed
>> maintenance at _bt_getbuf time. In fact, the code seems to be in
>> place already.
>
> Pages that RecentGlobalXmin prohibits from being registered as "free"
> cannot be grabbed by _bt_getbuf, since such a page is linked from
> nowhere and the FSM doesn't offer it as "free".

Yes, but suppose vacuum did add them to the FSM in the first round,
but with a special marker that differentiates them from immediately
recyclable ones.
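As a toy illustration of that first-round marking (all names here are stand-ins, not the real API; if I recall correctly, vacuum's actual entry point is RecordFreeIndexPage() in indexfsm.c, which today knows only one notion of "free"):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical FSM categories. Today's index FSMs store only 0 (in use)
 * or 255 (free); the proposal adds a middle value for pages that are
 * deleted but whose deletion may still be visible to some running
 * transaction, so they must be rechecked before reuse. */
enum {
    FSM_IN_USE      = 0,
    FSM_MAYBE_FREE  = 128,  /* recheck with _bt_page_recyclable() */
    FSM_SURELY_FREE = 255   /* safe to recycle right away */
};

/* Toy FSM: one category byte per block. */
static uint8_t fsm[1024];

/* What vacuum's first pass would do: record a just-deleted page
 * immediately, with the category depending on whether its deletion
 * XID is already older than RecentGlobalXmin. */
static void record_deleted_page(uint32_t blkno, int safe_now)
{
    fsm[blkno] = safe_now ? FSM_SURELY_FREE : FSM_MAYBE_FREE;
}
```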

>> _bt_page_recyclable seems to prevent old transactions from treating
>> those pages as recyclable already, and the description of the
>> technique in 2.5 doesn't seem to preclude doing the drain while doing
>> other operations. In fact, Lehman even considers the possibility of
>> multiple concurrent garbage collectors.
>
> _bt_page_recyclable prevents a vacuum scan from discarding pages
> that might be looked at by any active transaction, and the "drain"
> itself is a technique to prevent freeing still-active pages, so a
> scan using the "drain" technique can be freely executed
> concurrently with other transactions. The paper might allow
> concurrent GCs (or vacuums), but our nbtree code says that no
> concurrent vacuum is assumed. Er... here it is.
>
> nbtpage.c:1589: _bt_unlink_halfdead_page
> | * right.  This search could fail if either the sibling or the target page
> | * was deleted by someone else meanwhile; if so, give up.  (Right now,
> | * that should never happen, since page deletion is only done in VACUUM
> | * and there shouldn't be multiple VACUUMs concurrently on the same
> | * table.)

Ok, yes, but we're not talking about halfdead pages, but deleted pages
that haven't been recycled yet.

>> It's only a matter of making the page visible in the FSM in a way that
>> can be efficiently skipped if we want to go directly to a page that
>> actually CAN be recycled to avoid looping forever looking for a
>> recyclable page in _bt_getbuf. In fact, that's pretty much Lehman's
>
> Mmm. What _bt_getbuf does is recheck the page given by the FSM as a
> "free page". If the FSM gives no more pages, it just tries to extend
> the index relation. Or am I misreading you?

On non-index FSMs, you can request a page that has at least N free bytes.

Index FSMs always mark pages as fully empty or fully full, with no
in-between, but suppose we used that capability of the data structure to
mark "maybe recyclable" pages with 50% free space, and "surely recyclable"
pages with 100% free space.

Then _bt_getbuf could request a 50%-free page a few times, check whether
each is recyclable (i.e. check _bt_page_recyclable), essentially doing a
microvacuum on that page, and if it cannot find a recyclable page, then
try again with the 100%-free ones.
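A rough standalone sketch of that search order (again, all names are stand-ins, not the real nbtree API; the XID comparison is a toy stand-in for what _bt_page_recyclable does against RecentGlobalXmin):

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS 8
#define INVALID_BLOCK UINT32_MAX

/* Toy FSM category byte per block: 128 = maybe recyclable,
 * 255 = surely recyclable, 0 = in use. */
static uint8_t fsm[NBLOCKS];

/* Stand-in for _bt_page_recyclable(): the real check compares the
 * deleted page's XID against RecentGlobalXmin. */
static uint32_t deletion_xid[NBLOCKS];
static uint32_t recent_global_xmin = 100;

static int page_recyclable(uint32_t blkno)
{
    return deletion_xid[blkno] < recent_global_xmin;
}

/* Proposed search order: probe "maybe recyclable" pages a bounded
 * number of times, rechecking each; if none pans out, fall back to
 * "surely recyclable" pages; failing that, the caller would extend
 * the relation (not shown). */
static uint32_t get_free_page(void)
{
    int tries = 3;  /* bounded, so we never loop forever */

    for (uint32_t b = 0; b < NBLOCKS && tries > 0; b++) {
        if (fsm[b] == 128) {
            tries--;
            if (page_recyclable(b)) {
                fsm[b] = 0;     /* claim the page */
                return b;
            }
        }
    }
    for (uint32_t b = 0; b < NBLOCKS; b++) {
        if (fsm[b] == 255) {
            fsm[b] = 0;
            return b;
        }
    }
    return INVALID_BLOCK;
}
```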

The code is almost there; the only thing missing is the distinction
between "maybe recyclable" and "surely recyclable" pages in the index FSM.

Take this with a grain of salt; I'm not an expert on that code. But it
seems feasible to me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
