Race condition between hot standby and restoring a FPW

From Heikki Linnakangas
Subject Race condition between hot standby and restoring a FPW
Date
Msg-id 546354E7.1050902@vmware.com
Responses Re: Race condition between hot standby and restoring a FPW  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Race condition between hot standby and restoring a FPW  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
There's a race condition between a backend running queries in hot 
standby mode, and restoring a full-page image from a WAL record. It's 
present in all supported versions.

RestoreBackupBlockContents does this:

>     buffer = XLogReadBufferExtended(bkpb.node, bkpb.fork, bkpb.block,
>                                     RBM_ZERO);
>     Assert(BufferIsValid(buffer));
>     if (get_cleanup_lock)
>         LockBufferForCleanup(buffer);
>     else
>         LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

If the page is not in the buffer cache yet, and a backend reads and 
locks the buffer after the above XLogReadBufferExtended call has zeroed 
it, but before the startup process has locked it, the backend sees an 
empty page.
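
To make the interleaving concrete, here is a small single-threaded toy 
model of it. This is only a sketch, not PostgreSQL code: ToyBuffer and 
all toy_* functions are hypothetical names, the page is 8 bytes, and the 
"lock" is a plain flag. It replays the three steps in the problematic 
order: zero without a lock, backend locks and looks, redo locks and 
restores the image.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

/* Toy stand-in for a shared buffer; NOT PostgreSQL's buffer descriptor. */
typedef struct ToyBuffer
{
    char page[TOY_PAGE_SIZE];
    bool locked;                /* models the buffer content lock */
} ToyBuffer;

/* Models the RBM_ZERO read as described above: zero the page and
 * return it without holding any lock. */
static void
toy_read_zero(ToyBuffer *buf)
{
    memset(buf->page, 0, TOY_PAGE_SIZE);
}

static void
toy_lock(ToyBuffer *buf)
{
    assert(!buf->locked);       /* the toy model never blocks */
    buf->locked = true;
}

static void
toy_unlock(ToyBuffer *buf)
{
    assert(buf->locked);
    buf->locked = false;
}

/* True if every byte of the page is zero. */
static bool
toy_page_is_empty(const ToyBuffer *buf)
{
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
        if (buf->page[i] != 0)
            return false;
    return true;
}

/* Replays the bad interleaving; returns true if the backend observed
 * an all-zeros page. */
bool
toy_race_backend_sees_empty(void)
{
    ToyBuffer   buf = {.page = "OLDDATA", .locked = false};
    const char  image[TOY_PAGE_SIZE] = "FPIDATA";
    bool        saw_empty;

    /* Redo, step 1: zero the page, lock not yet taken. */
    toy_read_zero(&buf);

    /* Hot standby backend gets in between the two redo steps. */
    toy_lock(&buf);
    saw_empty = toy_page_is_empty(&buf);
    toy_unlock(&buf);

    /* Redo, step 2: lock the buffer and restore the full-page image. */
    toy_lock(&buf);
    memcpy(buf.page, image, TOY_PAGE_SIZE);
    toy_unlock(&buf);

    return saw_empty;
}
```

In this model toy_race_backend_sees_empty() returns true: the backend 
took the lock correctly, but the page was already zeroed and not yet 
restored.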

The principle of fixing that is straightforward: the zeroed page should 
not be visible to others, even momentarily. Unfortunately, I think 
that's going to require an API change to ReadBufferExtended(RBM_ZERO) :-(.

I can think of two ways to fix this:

1. Have ReadBufferExtended lock the page in RBM_ZERO mode, before 
returning it. That makes the API inconsistent, as the function would 
sometimes lock the page, and sometimes not.

2. When ReadBufferExtended doesn't find the page in cache, it returns 
the buffer in !BM_VALID state (i.e. still in I/O in-progress state). 
Require the caller to call a second function, after locking the page, to 
finish the I/O.
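
The two options can be sketched in a toy model like this. Again, this is 
not PostgreSQL code and not a proposed patch: SketchBuffer and the 
sketch_* functions are hypothetical names, the "valid" flag only models 
BM_VALID, and a reader that would have to wait (on the lock or on the 
in-progress I/O) is modeled as simply not being able to see the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8

/* Toy buffer; NOT PostgreSQL's buffer descriptor. */
typedef struct SketchBuffer
{
    char page[SKETCH_PAGE_SIZE];
    bool locked;                /* exclusive content lock held */
    bool valid;                 /* models BM_VALID */
} SketchBuffer;

/* Option 1: the zero-mode read takes the lock before zeroing and
 * returns with the lock still held. */
static void
sketch_read_zero_locked(SketchBuffer *buf)
{
    assert(!buf->locked);
    buf->locked = true;
    memset(buf->page, 0, SKETCH_PAGE_SIZE);
    buf->valid = true;
}

/* Option 2, first call: return the buffer while it is still !valid. */
static void
sketch_read_zero_invalid(SketchBuffer *buf)
{
    memset(buf->page, 0, SKETCH_PAGE_SIZE);
    buf->valid = false;
}

/* Option 2, second call: the caller, now holding the lock, finishes
 * the I/O. */
static void
sketch_finish_io(SketchBuffer *buf)
{
    assert(buf->locked);
    buf->valid = true;
}

/* A concurrent reader gets at the contents only if the buffer is valid
 * and not exclusively locked; otherwise it would have to wait. */
static bool
sketch_reader_can_see_page(const SketchBuffer *buf)
{
    return buf->valid && !buf->locked;
}

/* Returns true if no reader could see the page before it was restored. */
bool
sketch_option1_hides_zeroed_page(void)
{
    SketchBuffer buf = {.page = {0}, .locked = false, .valid = false};
    bool        exposed = false;

    sketch_read_zero_locked(&buf);
    exposed = exposed || sketch_reader_can_see_page(&buf);
    memcpy(buf.page, "FPIDATA", SKETCH_PAGE_SIZE);
    buf.locked = false;

    return !exposed && sketch_reader_can_see_page(&buf) && buf.page[0] == 'F';
}

bool
sketch_option2_hides_zeroed_page(void)
{
    SketchBuffer buf = {.page = {0}, .locked = false, .valid = false};
    bool        exposed = false;

    sketch_read_zero_invalid(&buf);
    exposed = exposed || sketch_reader_can_see_page(&buf);  /* !valid */
    buf.locked = true;
    sketch_finish_io(&buf);
    exposed = exposed || sketch_reader_can_see_page(&buf);  /* locked */
    memcpy(buf.page, "FPIDATA", SKETCH_PAGE_SIZE);
    buf.locked = false;

    return !exposed && sketch_reader_can_see_page(&buf) && buf.page[0] == 'F';
}
```

In both variants there is no point at which the page is zeroed, valid 
and unlocked at the same time. The difference is only in who takes the 
lock: the read function itself in option 1, the caller in option 2.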

Anyone have a better idea?

- Heikki


