Re: 7.4Beta1 hang?

From Tom Lane
Subject Re: 7.4Beta1 hang?
Date
Msg-id 22788.1060545006@sss.pgh.pa.us
In reply to 7.4Beta1 hang?  (Robert Creager <Robert_Creager@LogicalChaos.org>)
List pgsql-hackers
>> What is the index temp_obs_i_obs_id_index on, exactly?  Is it a serial
>> column or some such?

> Yup:

Okay, that explains it then.  In a serial column's index, essentially
all splits will be on the rightmost page of the tree.  This means that
when _bt_split tries to get a new free page, it will almost always be
holding lock on the most recently acquired free page (since that was
the righthand side of the previous split).  That's the factor that makes
the coincidence likely.  A vacuum running concurrently with a page split
may mistakenly place a just-used page back into FSM (if the page gets
used between the time vacuum examines it and the time vacuum finishes
and loads its results into FSM).  So if that happens, and said page is
the first to be returned by FSM for the next split, you lose.
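
One way out of this is to take the lock on an FSM-reported page only if it is immediately available. The following is a minimal standalone sketch of that idea, not PostgreSQL code: DemoPage, demo_try_lock, demo_get_free_page and the other demo_* names are hypothetical, and a C11 atomic_flag stands in for the buffer content lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_N_PAGES      8
#define DEMO_INVALID_PAGE (-1)

/* Hypothetical stand-in for an index page: a lock flag plus a "free" marker. */
typedef struct
{
    atomic_flag locked;      /* set while some holder owns the page lock */
    bool        recyclable;  /* true if the page really is free for reuse */
} DemoPage;

static DemoPage demo_pages[DEMO_N_PAGES];

/* Put every page into the unlocked, not-free state. */
static void
demo_init(void)
{
    for (size_t i = 0; i < DEMO_N_PAGES; i++)
    {
        atomic_flag_clear(&demo_pages[i].locked);
        demo_pages[i].recyclable = false;
    }
}

/* Take the page lock only if nobody holds it; this never waits. */
static bool
demo_try_lock(DemoPage *page)
{
    return !atomic_flag_test_and_set(&page->locked);
}

static void
demo_unlock(DemoPage *page)
{
    atomic_flag_clear(&page->locked);
}

/*
 * Walk a list of possibly stale free-page hints and return the first page
 * that can be locked without waiting and is still recyclable.  The returned
 * page is left locked.  A page that is already locked, even by our own
 * caller, is simply skipped instead of being waited for.
 */
static int
demo_get_free_page(const int *hints, size_t n_hints)
{
    for (size_t i = 0; i < n_hints; i++)
    {
        int         blkno = hints[i];

        assert(blkno >= 0 && blkno < DEMO_N_PAGES);
        DemoPage   *page = &demo_pages[blkno];

        if (!demo_try_lock(page))
            continue;           /* busy, so assume it is not free */

        if (page->recyclable)
        {
            page->recyclable = false;   /* claim it; the lock stays held */
            return blkno;
        }
        demo_unlock(page);      /* stale hint: page is in use after all */
    }
    return DEMO_INVALID_PAGE;   /* no usable hint; caller must look elsewhere */
}
```

With an unconditional lock, a caller that still holds page 3 and is handed the stale hint 3 would wait on itself forever. In the sketch, calling demo_get_free_page with the hints {3, 5} while page 3 is held skips page 3 and returns page 5, provided page 5 is marked recyclable.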

I've committed the attached fix.
        regards, tom lane

*** src/backend/access/nbtree/nbtpage.c.orig    Fri Aug  8 17:47:01 2003
--- src/backend/access/nbtree/nbtpage.c    Sun Aug 10 15:32:16 2003
***************
*** 409,414 ****
--- 409,430 ----
           * that the page is still free.  (For example, an already-free
           * page could have been re-used between the time the last VACUUM
           * scanned it and the time the VACUUM made its FSM updates.)
+          *
+          * In fact, it's worse than that: we can't even assume that it's
+          * safe to take a lock on the reported page.  If somebody else
+          * has a lock on it, or even worse our own caller does, we could
+          * deadlock.  (The own-caller scenario is actually not improbable.
+          * Consider an index on a serial or timestamp column.  Nearly all
+          * splits will be at the rightmost page, so it's entirely likely
+          * that _bt_split will call us while holding a lock on the page most
+          * recently acquired from FSM.  A VACUUM running concurrently with
+          * the previous split could well have placed that page back in FSM.)
+          *
+          * To get around that, we ask for only a conditional lock on the
+          * reported page.  If we fail, then someone else is using the page,
+          * and we may reasonably assume it's not free.  (If we happen to be
+          * wrong, the worst consequence is the page will be lost to use till
+          * the next VACUUM, which is no big problem.)
           */
          for (;;)
          {
***************
*** 416,431 ****
              if (blkno == InvalidBlockNumber)
                  break;
              buf = ReadBuffer(rel, blkno);
!             LockBuffer(buf, access);
!             page = BufferGetPage(buf);
!             if (_bt_page_recyclable(page))
              {
!                 /* Okay to use page.  Re-initialize and return it */
!                 _bt_pageinit(page, BufferGetPageSize(buf));
!                 return buf;
              }
-             elog(DEBUG2, "FSM returned nonrecyclable page");
-             _bt_relbuf(rel, buf);
          }

          /*
--- 432,455 ----
              if (blkno == InvalidBlockNumber)
                  break;
              buf = ReadBuffer(rel, blkno);
!             if (ConditionalLockBuffer(buf))
              {
!                 page = BufferGetPage(buf);
!                 if (_bt_page_recyclable(page))
!                 {
!                     /* Okay to use page.  Re-initialize and return it */
!                     _bt_pageinit(page, BufferGetPageSize(buf));
!                     return buf;
!                 }
!                 elog(DEBUG2, "FSM returned nonrecyclable page");
!                 _bt_relbuf(rel, buf);
!             }
!             else
!             {
!                 elog(DEBUG2, "FSM returned nonlockable page");
!                 /* couldn't get lock, so just drop pin */
!                 ReleaseBuffer(buf);
              }
          }

          /*
*** src/backend/storage/buffer/bufmgr.c.orig    Sun Aug  3 23:00:55 2003
--- src/backend/storage/buffer/bufmgr.c    Sun Aug 10 15:17:28 2003
***************
*** 1937,1942 ****
--- 1937,1973 ----
  }

  /*
+  * Acquire the cntx_lock for the buffer, but only if we don't have to wait.
+  *
+  * This assumes the caller wants BUFFER_LOCK_EXCLUSIVE mode.
+  */
+ bool
+ ConditionalLockBuffer(Buffer buffer)
+ {
+     BufferDesc *buf;
+ 
+     Assert(BufferIsValid(buffer));
+     if (BufferIsLocal(buffer))
+         return true;            /* act as though we got it */
+ 
+     buf = &(BufferDescriptors[buffer - 1]);
+ 
+     if (LWLockConditionalAcquire(buf->cntx_lock, LW_EXCLUSIVE))
+     {
+         /*
+          * This is not the best place to set cntxDirty flag (eg indices do
+          * not always change buffer they lock in excl mode). But please
+          * remember that it's critical to set cntxDirty *before* logging
+          * changes with XLogInsert() - see comments in BufferSync().
+          */
+         buf->cntxDirty = true;
+ 
+         return true;
+     }
+     return false;
+ }
+ 
+ /*
   * LockBufferForCleanup - lock a buffer in preparation for deleting items
   *
   * Items may be deleted from a disk page only when the caller (a) holds an
*** src/include/storage/bufmgr.h.orig    Sun Aug  3 23:01:42 2003
--- src/include/storage/bufmgr.h    Sun Aug 10 15:12:06 2003
***************
*** 180,185 ****
--- 180,186 ----

  extern void UnlockBuffers(void);
  extern void LockBuffer(Buffer buffer, int mode);
+ extern bool ConditionalLockBuffer(Buffer buffer);
  extern void LockBufferForCleanup(Buffer buffer);

  extern void AbortBufferIO(void);

