Re: [HACKERS] Write Ahead Logging for Hash Indexes

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [HACKERS] Write Ahead Logging for Hash Indexes
Дата
Msg-id CAA4eK1JCk-abtsVMMP8xqGZktUH73ZmLaZ6b_+-oCRtRkdqPrQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Write Ahead Logging for Hash Indexes  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Write Ahead Logging for Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [HACKERS] Write Ahead Logging for Hash Indexes  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Thu, Feb 16, 2017 at 7:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 13, 2017 at 10:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> As discussed, attached are refactoring patches and a patch to enable
>> WAL for the hash index on top of them.
>
> Thanks.  I think that the refactoring patches shouldn't add
> START_CRIT_SECTION() and END_CRIT_SECTION() calls; let's leave that
> for the final patch.  Nor should their comments reference the idea of
> writing WAL; that should also be for the final patch.
>

Okay, changed as per suggestion.

> PageGetFreeSpaceForMulTups doesn't seem like a great name.
> PageGetFreeSpaceForMultipleTuples?

Okay, changed as per suggestion.

>  Or maybe just use
> PageGetExactFreeSpace and then do the account in the caller.  I'm not
> sure it's really worth having a function just to subtract a multiple
> of sizeof(ItemIdData), and it would actually be more efficient to have
> the caller take care of this, since you wouldn't need to keep
> recalculating the value for every iteration of the loop.
>

I have tried this one, but I think even if track outside we need
something like PageGetFreeSpaceForMultipleTuples for the cases when we
have to try next write page/'s where data (set of index tuples) can be
accommodated.

> I think we ought to just rip (as a preliminary patch) out the support
> for freeing an overflow page in the middle of a bucket chain.  That
> code is impossible to hit, I believe, and instead of complicating the
> WAL machinery to cater to a nonexistent case, maybe we ought to just
> get rid of it.
>

I also could not think of a case where we need to care about the page
in the middle of bucket chain as per it's current usage.  In
particular, are you worried about the code related to nextblock number
in _hash_freeovflpage()?   Surely, we can remove it, but what
complexity are you referring to here?   There is some additional
book-keeping for primary bucket page, but there is nothing much about
maintaining a backward link.  One more thing to note is that this is
an exposed API, so to avoid getting it used in some wrong way, we
might want to make it static.

> +                               /*
> +                                * We need to release and if required
> reacquire the lock on
> +                                * rbuf to ensure that standby
> shouldn't see an intermediate
> +                                * state of it.
> +                                */
>
> How does releasing and reacquiring the lock on the master affect
> whether the standby can see an intermediate state?

I am talking about the intermediate state of standby with respect to
the master.  We will release the lock on standby after replaying the
WAL for moving tuples from read page to write page whereas master will
hold that for the much longer period (till we move all the tuples from
read page and free that page).  I understand that there is no
correctness issue here, but was trying to make operations on master
and standby similar and another thing is that it will help us in
obeying the coding rule of releasing the lock on buffers after writing
WAL record.  I see no harm in maintaining the existing coding pattern,
however, if you think that is not important in this case, then I can
change the code to avoid releasing the lock on read page.

>  That sounds
> totally wrong to me (and also doesn't belong in a refactoring patch
> anyway since we're not emitting any WAL here).
>

Agreed that even if we want to do such a change, then also it should
be part of WAL patch.  So for now, I have reverted that change.

> -       Assert(PageIsNew(page));
>
> This worries me a bit.  What's going on here?
>

This is related to below change.

+ /*
+ * Initialise the freed overflow page, here we can't complete zeroed the
+ *
page as WAL replay routines expect pages to be initialized. See
+ * explanation of RBM_NORMAL
mode atop XLogReadBufferExtended.
+ */
+ _hash_pageinit(ovflpage, BufferGetPageSize
(ovflbuf));

Basically, we need this routine to initialize freed overflow pages.
Also, if you see the similar function in btree (_bt_pageinit(Page
page, Size size)), it doesn't have any such Assertion.

Attached are refactoring patches.  WAL patch needs some changes based
on above comments, so will post it later.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] Parallel Append implementation
Следующее
От: Thom Brown
Дата:
Сообщение: [HACKERS] Partitioning vs ON CONFLICT