Re: Windows buildfarm members vs. new async-notify isolation test

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Windows buildfarm members vs. new async-notify isolation test
Дата
Msg-id 4412.1575748586@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Windows buildfarm members vs. new async-notify isolation test  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Windows buildfarm members vs. new async-notify isolation test  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Windows buildfarm members vs. new async-notify isolation test  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> On Sat, Dec 7, 2019 at 5:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> A possible theory as to what's happening is that the kernel scheduler
>>> is discriminating against listener2's signal management thread(s)
>>> and not running them until everything else goes idle for a moment.

>> If we have to believe that theory then why the other similar test is
>> not showing the problem.

> There are fewer processes involved in that case, so I don't think
> it disproves the theory that this is a scheduler glitch.

So, just idly looking at the code in src/backend/port/win32/signal.c
and src/port/kill.c, I have to wonder why we have this baroque-looking
design of using *two* signal management threads.  And, if I'm
reading it right, we create an entire new pipe object and an entire
new instance of the second thread for each incoming signal.  Plus, the
signal senders use CallNamedPipe (hence, underneath, TransactNamedPipe)
which means they in effect wait for the recipient's signal-handling
thread to ack receipt of the signal.  Maybe there's a good reason for
all this but it sure seems like a lot of wasted cycles from here.

I have to wonder why we don't have a single named pipe that lasts as
long as the recipient process does, and a signal sender just writes
one byte to it, and considers the signal delivered if it is able to
do that.  The "message" semantics seem like overkill for that.

I dug around in the contemporaneous archives and could only find
https://www.postgresql.org/message-id/303E00EBDD07B943924382E153890E5434AA47%40cuthbert.rcsinc.local
which describes the existing approach but fails to explain why we
should do it like that.

This might or might not have much to do with the immediate problem,
but I can't help wondering if there's some race-condition-ish behavior
in there that's contributing to what we're seeing.  We already had to
fix a couple of race conditions from doing it like this, cf commits
2e371183e, 04a4413c2, f27a4696f.  Perhaps 0ea1f2a3a is relevant
as well.

            regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: psql small improvement patch
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: ssl passphrase callback