Re: backends stuck in "startup"

From: Tom Lane
Subject: Re: backends stuck in "startup"
Message-ID: 14525.1511397830@sss.pgh.pa.us
In response to: Re: backends stuck in "startup"  (Justin Pryzby <pryzby@telsasoft.com>)
Responses: Re: backends stuck in "startup"  (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-general

Justin Pryzby <pryzby@telsasoft.com> writes:
> For starters, I found that PID 27427 has:

> (gdb) p proc->lwWaiting
> $1 = 0 '\000'
> (gdb) p proc->lwWaitMode
> $2 = 1 '\001'

To confirm, this is LWLockAcquire's "proc", equal to MyProc?
If so, and if LWLockAcquire is blocked at PGSemaphoreLock,
that sure seems like a smoking gun.
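
To spell out why that combination would be suspicious, here is a toy
model of the wait/wake handshake. It is only a sketch under stated
assumptions, not PostgreSQL's code: it assumes the waiter sleeps on its
semaphore until lwWaiting has been cleared, and that the waker clears
the flag and then posts the semaphore. The names Proc, wait_for_lock,
and wake are hypothetical, and a timeout stands in for "blocked forever".

```python
import threading


class Proc:
    """Toy stand-in for a backend's proc entry (only the fields discussed here)."""

    def __init__(self):
        self.lwWaiting = False
        self.sem = threading.Semaphore(0)


def wait_for_lock(proc, timeout):
    """Waiter side: sleep on the semaphore until lwWaiting has been cleared.

    Returns True if woken properly, False if the sleep timed out,
    which models a backend that stays blocked on its semaphore.
    """
    while True:
        if not proc.sem.acquire(timeout=timeout):
            return False          # nobody posted the semaphore
        if not proc.lwWaiting:
            return True           # genuine wakeup
        # Flag still set: treat as a stray wakeup and go back to sleep.


def wake(proc, post_semaphore=True):
    """Waker side: clear the flag, then post the semaphore.

    post_semaphore=False models a wakeup whose semaphore post is lost.
    """
    proc.lwWaiting = False
    if post_semaphore:
        proc.sem.release()


# Normal handshake: the waiter is released.
ok = Proc()
ok.lwWaiting = True
wake(ok)
print("normal wakeup:", wait_for_lock(ok, timeout=0.05))

# Lost post: the flag is already clear, yet the waiter never gets out.
stuck = Proc()
stuck.lwWaiting = True
wake(stuck, post_semaphore=False)
print("lost wakeup:", wait_for_lock(stuck, timeout=0.05),
      "lwWaiting =", stuck.lwWaiting)
```

In this model the second case ends with lwWaiting already false while
the waiter is still asleep on the semaphore, which is the state the gdb
output above would correspond to if the process really is blocked there.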

> Note: I've compiled locally PG 10.1 with PREFERRED_SEMAPHORES=SYSV to keep the
> service up (and to the degree that serves to verify that avoids the issue,
> great).

Good idea, I was going to suggest that.  It will be very interesting
to see if that makes the problem go away.

> Would you suggest how I can maximize the likelihood/speed of triggering that?
> Five years ago, with a report of similar symptoms, you said "You need to hack
> pgbench to suppress the single initialization connection it normally likes to
> make, else the test degenerates to the one-incoming-connection case"
> https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us

I don't think that case was related at all.

My theory suggests that any contended use of an LWLock is at risk,
in which case just running pgbench with about as many sessions as
you have in the live server ought to be able to trigger it.  However,
that doesn't really account for your having observed the problem
only during session startup, so there may be some other factor
involved.  I wonder if it only happens during the first wait for
an LWLock ... and if so, how could that be?
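
A pgbench run along those lines could be assembled roughly as follows.
This is a sketch, not a tested recipe for reproducing the hang: the
helper pgbench_args, the database name "bench", and the session count of
100 are placeholders. Adding -C (reconnect for every transaction) and -S
(select-only) is an assumption on top of the suggestion above, intended
to exercise session startup as much as possible.

```python
import shlex


def pgbench_args(dbname, sessions, threads=4, seconds=600, reconnect=True):
    """Build a pgbench command line with `sessions` concurrent clients.

    The target database must already have been initialized with
    `pgbench -i <dbname>`.
    """
    args = [
        "pgbench",
        "-n",                  # skip the vacuum pgbench runs before the test
        "-S",                  # built-in select-only script
        "-c", str(sessions),   # number of concurrent client sessions
        "-j", str(threads),    # number of pgbench worker threads
        "-T", str(seconds),    # run time in seconds
    ]
    if reconnect:
        args.append("-C")      # open a new connection for each transaction
    args.append(dbname)
    return args


# Placeholder values: match `sessions` to the live server's session count.
cmd = pgbench_args("bench", sessions=100)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be
passed to subprocess.run(cmd) to launch the run directly.
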
        regards, tom lane

