Re: [HACKERS] Possibly too stringent Assert() in b-tree code

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [HACKERS] Possibly too stringent Assert() in b-tree code
Дата
Msg-id 20180626.184748.29200427.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Possibly too stringent Assert() in b-tree code  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [HACKERS] Possibly too stringent Assert() in b-tree code  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hello. I came from the following mail.


https://www.postgresql.org/message-id/flat/0F97FA9ABBDBE54F91744A9B37151A5116E4DD@g01jpexmbkw24#8be5cc9b7c3cf454a82e561d84f4118f

At Mon, 26 Sep 2016 09:12:04 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1K5YyDmndko0zzW6WNCN_DGFVHa6DCYcyuvkBWTH5+nUQ@mail.gmail.com>
> On Mon, Sep 26, 2016 at 8:54 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Amit Kapila <amit.kapila16@gmail.com> writes:
> >>> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>> This is clearly an oversight in Simon's patch fafa374f2, which introduced
> >>>> this code without any consideration for the possibility that the page
> >>>> doesn't have a valid special area. ...
> >>>> but I'm not very clear on whether this is a safe fix, mainly because
> >>>> I don't understand what the reuse WAL record really accomplishes.
> >>>> Maybe we need to instead generate a reuse record with some special
> >>>> transaction ID indicating worst-case assumptions?
> >>
> >>> I think it is basically to ensure that the queries on standby should
> >>> not be considering the page being recycled as valid and if there is
> >>> any pending query to which this page is visible, cancel it.  If this
> >>> understanding is correct, then I think the fix you are proposing is
> >>> correct.
> >>
> >> After thinking about it some more, I do not understand why we are emitting
> >> WAL here at all in *any* case, either for a new page or for a deleted one
> >> that we're about to recycle.  Why wouldn't the appropriate actions have
> >> been taken when the page was deleted, or even before that when its last
> >> tuple was deleted?
> >>
> >
> > It seems to me that we do take actions for conflict resolution during
> > the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
> > which we emit in vacuum), but not sure if that is sufficient.
> > Consider a case where the new transaction is started on standby after
> >
> 
> Here by new transaction, I intend to say some newer snapshot with
> valid MyPgXact->xmin.

I agree to the diagnosis. So the WAL record is not necessary if
it is a new page since no one cannot be grabbing it.

_bt_page_recyclable() has two callers. _bt_getbuf and
btvacuumpage. A comment in btvacuumpage is saying that.

>      * _bt_checkpage(), which will barf on an all-zero page. We want to
>      * recycle all-zero pages, not fail.  Also, we want to use a nondefault

So it is right thing for _bt_page_recyclable() to return true for
zeroed pages and it is consistent with the comment in
_bt_page_recyclable().

At the problematic point, a new page is safe to recycle but
there's no need to issue WAL record since it is not recycled with
the meaning that _bt_log_resuse_page() thinking.

> _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedXid)
> {
>   xl_btree_reuse_page xlrec_reuse;
> 
>   /*
>    * Note that we don't register the buffer with the record, because this
>    * operation doesn't modify the page. This record only exists to provide a
>    * conflict point for Hot Standby.
>    */

I think we have all requred testimony from exiting code and
comments. FWIW my conclustion is that Tom's solution upthread is
right thing to do and needs addition in comment.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index a24e64156a..b39634cafd 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -782,9 +782,12 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
                     /*
                      * If we are generating WAL for Hot Standby then create a
                      * WAL record that will allow us to conflict with queries
-                     * running on standby.
+                     * running on standby. No need to do that if the page is
+                     * all-zoeroed since no one cannot be interested in the
+                     * page on standbys. See _bt_page_recyclable().
                      */
-                    if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+                    if (XLogStandbyInfoActive() && RelationNeedsWAL(rel) &&
+                        !PageIsNew())
                     {
                         BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: We still claim "cannot begin/end transactions in PL/pgSQL"
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: ssl_library parameter