Another nasty cache problem

From: Tom Lane
Subject: Another nasty cache problem
Msg-id: 22885.949246873@sss.pgh.pa.us
Replies: Re: [HACKERS] Another nasty cache problem (Bruce Momjian <pgman@candle.pha.pa.us>)
         Re: [HACKERS] Another nasty cache problem (Peter Eisentraut <e99re41@DoCS.UU.SE>)
         Re: [HACKERS] Another nasty cache problem (Patrick Welche <prlw1@newn.cam.ac.uk>)
List: pgsql-hackers

I'm down to the point where the parallel tests mostly work with a small
SI buffer --- but they do still sometimes fail.  I've realized that
there is a whole class of bugs along the following lines:

There are plenty of routines that do two or more SearchSysCacheTuple
calls to get the information they need.  As the code stands, it is
unsafe to continue accessing the tuple returned by SearchSysCacheTuple
after making a second such call, because the second call could possibly
cause an SI cache reset message to be processed, thereby flushing the
contents of the caches.

heap_open and CommandCounterIncrement are other routines that could
cause cache entries to be dropped.

This is a very insidious kind of bug because the probability of
occurrence is very low (at normal SI buffer size a reset is unlikely,
and even if it happens, you won't observe a failure unless the
pfree'd tuple is actually overwritten before you're done with it).
So we cannot hope to catch these things by testing.

I am not sure what to do about it.  One solution path is to make
all the potential trouble spots do SearchSysCacheTupleCopy and then
pfree the copied tuple when done.  However, that adds a nontrivial
amount of overhead, and it'd be awfully easy to miss some trouble
spots or to introduce new ones in the future.

Another possibility is to introduce some sort of notion of a reference
count, and to make the standard usage pattern be

	tuple = SearchSysCacheTuple(...);
	... use tuple ...
	ReleaseSysCacheTuple(tuple);

The idea here is that a tuple with positive refcount would not be
deleted during a cache reset, but would simply be removed from its
cache, and then finally deleted when released (or during elog
recovery).

This might allow us to get rid of SearchSysCacheTupleCopy, too,
since the refcount should be just as good as palloc'ing one's own
copy for most purposes.

I haven't looked at the callers of SearchSysCacheTuple to see whether
this would be a practical change to make.  I was wondering if anyone
had any comments or better ideas...
        regards, tom lane

