Re: Improving replay of XLOG_BTREE_VACUUM records

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Improving replay of XLOG_BTREE_VACUUM records
Дата
Msg-id 20150726124656.GC5143@awork2.anarazel.de
обсуждение исходный текст
Ответ на Re: Improving replay of XLOG_BTREE_VACUUM records  (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы Re: Improving replay of XLOG_BTREE_VACUUM records  (Michael Paquier <michael.paquier@gmail.com>)
Re: Improving replay of XLOG_BTREE_VACUUM records  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers
On 2015-07-24 09:53:49 +0300, Heikki Linnakangas wrote:
> On 05/02/2015 02:10 AM, Jim Nasby wrote:
> >This looks like a good way to address this until the more significant
> >work can be done.
> >
> >I'm not a fan of "RBM_ZERO_NO_BM_VALID"; how about RBM_ZERO_BM_INVALID?
> >or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on
> >the code; I see the logic to NO_BM_VALID...
> >
> >+ * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set
> >+ * BM_VALID bit before returning buffer so that noone could pin it.
> >
> >It would be better to explain why we want that mode. How about:
> >
> >RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
> >BM_VALID before returning the buffer. This is done to ensure that no one
> >can pin the buffer without actually reading the buffer contents in. This
> >is necessary while replying XLOG_BTREE_VACUUM records in hot standby.
>
> That's a rather strange mode. The BM_VALID flag is internal to the buffer
> manager - if you don't know how the buffer manager works, you cannot
> understand that description. I couldn't quite understand what that means
> myself. What can or can you not do with a buffer that's not marked as
> BM_VALID? I'd suggest a mode like this instead:
>
> In RBM_IF_IN_CACHE mode, the page is returned if it is in the buffer cache
> already. If it's not, it is not read from disk, and InvalidBuffer is
> returned instead.

To me it sounds like this shouldn't go through the full ReadBuffer()
rigamarole. That code is already complex enough, and here it's really
not needed. I think it'll be much easier to review - and actually faster
in many cases to simply have something like

bool
BufferInCache(Relation, ForkNumber, BlockNumber)
{   /* XXX: setup tag, hash, partition */
   LWLockAcquire(newPartitionLock, LW_SHARED);   buf_id = BufTableLookup(&newTag, newHash);
LWLockRelease(newPartitionLock);  return buf_id != -1;
 
}

and then fall back to the normal ReadBuffer() when it's in cache.


Andres



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: extend pgbench expressions with functions
Следующее
От: Andreas Seltenreich
Дата:
Сообщение: Failing assertions in indxpath.c, placeholder.c and brin_minmax.c