Re: "ERROR: latch already owned" on gharial

From: Andres Freund
Subject: Re: "ERROR: latch already owned" on gharial
Msg-id: 20240208214114.cpkib3tnfypjcjau@awork3.anarazel.de
In response to: Re: "ERROR: latch already owned" on gharial  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses: Re: "ERROR: latch already owned" on gharial  (Soumyadeep Chakraborty <soumyadeep2007@gmail.com>)
List: pgsql-hackers

Hi,

On 2024-02-08 14:57:47 +0200, Heikki Linnakangas wrote:
> On 08/02/2024 04:08, Soumyadeep Chakraborty wrote:
> > A possible ordering of events:
> > 
> > (1) DisownLatch() is called by pid Y during ProcKill() and the write for
> > latch->owner_pid = 0 is NOT yet flushed to shmem.
> > 
> > (2) The PGPROC object for pid Y is returned to the free list.
> > 
> > (3) Pid X sees the same PGPROC object on the free list and grabs it.
> > 
> > (4) Pid X does sanity check inside OwnLatch during InitProcess and
> > still sees the
> > old value of latch->owner_pid = Y (and not = 0), and trips the ERROR.
> > 
> > The above sequence of operations should apply to PG HEAD as well.
> > 
> > Suggestion:
> > 
> > Should we do a pg_memory_barrier() at the end of DisownLatch(), like in
> > ResetLatch(), like the one introduced in [3]? This would ensure that the write
> > latch->owner_pid = 0; is flushed to shmem. The attached patch does this.
> 
> Hmm, there is a pair of SpinLockAcquire() and SpinLockRelease() in
> ProcKill(), before step 3 can happen.

Right.  I wonder if the issue instead could be something similar to what was
fixed in 8fb13dd6ab5b and more generally in 97550c0711972a. If two procs go
through proc_exit() for the same process, you can get all kinds of weird
mixed up resource ownership.  The bug fixed in 8fb13dd6ab5b wouldn't apply,
but it's pretty easy to introduce similar bugs in other places, so it seems
quite plausible that Greenplum might have done so.  We also did have more
proc_exit()s in signal handlers in older branches, so it might just be an
issue that also was present before.
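
To make the "mixed up resource ownership" scenario concrete, here is a toy
model of how running the exit path twice for one slot could end in a "latch
already owned" failure. It is a sketch, not PostgreSQL source: ToyProc,
ToyFreeList and every toy_* function are hypothetical names invented for this
illustration, and the real ProcKill()/OwnLatch() code does considerably more
(including taking a spinlock around the free list).

```c
/*
 * Toy model only: all names here are hypothetical and do not exist in
 * PostgreSQL. Single-threaded on purpose; the point is the bookkeeping
 * error, not a memory-ordering race.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FREE 8

typedef struct ToyProc
{
    int         latch_owner_pid;    /* 0 means the latch is unowned */
} ToyProc;

typedef struct ToyFreeList
{
    ToyProc    *slots[TOY_MAX_FREE];
    int         count;
} ToyFreeList;

/* Claim the latch; returns false where real code would raise the ERROR. */
static bool
toy_own_latch(ToyProc *proc, int pid)
{
    if (proc->latch_owner_pid != 0)
        return false;           /* "latch already owned" */
    proc->latch_owner_pid = pid;
    return true;
}

static void
toy_disown_latch(ToyProc *proc)
{
    proc->latch_owner_pid = 0;
}

/* Exit path: release the latch, return the slot. No guard against reuse. */
static void
toy_proc_kill(ToyFreeList *fl, ToyProc *proc)
{
    toy_disown_latch(proc);
    assert(fl->count < TOY_MAX_FREE);
    fl->slots[fl->count++] = proc;
}

/* Startup path: take a slot off the free list and claim its latch. */
static ToyProc *
toy_init_process(ToyFreeList *fl, int pid, bool *ok)
{
    ToyProc    *proc;

    assert(fl->count > 0);
    proc = fl->slots[--fl->count];
    *ok = toy_own_latch(proc, pid);
    return proc;
}

/* Returns true if a double exit makes the second new process fail. */
static bool
toy_double_exit_trips_error(void)
{
    ToyProc     proc = {0};
    ToyFreeList fl = {{NULL}, 0};
    bool        ok_x;
    bool        ok_z;

    (void) toy_own_latch(&proc, 100);   /* pid Y owns the slot */
    toy_proc_kill(&fl, &proc);          /* first exit */
    toy_proc_kill(&fl, &proc);          /* second exit: slot queued twice */

    (void) toy_init_process(&fl, 200, &ok_x);   /* pid X gets the slot */
    (void) toy_init_process(&fl, 300, &ok_z);   /* pid Z gets the same slot */

    return ok_x && !ok_z;
}
```

In this model the second exit puts the same slot on the free list twice, so
two later processes are handed the same slot and the second one finds the
latch already owned by the first. No missing memory barrier is needed to get
there, which is why this kind of bug would not be fixed by adding one.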

Greetings,

Andres Freund


