Re: Windows buildfarm members vs. new async-notify isolation test

From Amit Kapila
Subject Re: Windows buildfarm members vs. new async-notify isolation test
Date
Msg-id CAA4eK1KpRMRJG0krbiL8sUA9wZTVwvoHejEkJK2sVH2idG-rSQ@mail.gmail.com
In reply to Re: Windows buildfarm members vs. new async-notify isolation test  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Windows buildfarm members vs. new async-notify isolation test  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sun, Dec 8, 2019 at 1:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> So, just idly looking at the code in src/backend/port/win32/signal.c
> and src/port/kill.c, I have to wonder why we have this baroque-looking
> design of using *two* signal management threads.  And, if I'm
> reading it right, we create an entire new pipe object and an entire
> new instance of the second thread for each incoming signal.  Plus, the
> signal senders use CallNamedPipe (hence, underneath, TransactNamedPipe)
> which means they in effect wait for the recipient's signal-handling
> thread to ack receipt of the signal.  Maybe there's a good reason for
> all this but it sure seems like a lot of wasted cycles from here.
>
> I have to wonder why we don't have a single named pipe that lasts as
> long as the recipient process does, and a signal sender just writes
> one byte to it, and considers the signal delivered if it is able to
> do that.  The "message" semantics seem like overkill for that.
>
> I dug around in the contemporaneous archives and could only find
> https://www.postgresql.org/message-id/303E00EBDD07B943924382E153890E5434AA47%40cuthbert.rcsinc.local
> which describes the existing approach but fails to explain why we
> should do it like that.
>
> This might or might not have much to do with the immediate problem,
> but I can't help wondering if there's some race-condition-ish behavior
> in there that's contributing to what we're seeing.
>

On the receiving side, the work we do after the 'notify' is finished
(or before CallNamedPipe gets control back) is as follows:

pg_signal_dispatch_thread()
{
..
FlushFileBuffers(pipe);
DisconnectNamedPipe(pipe);
CloseHandle(pipe);

pg_queue_signal(sigNum);
}

Most of these are system calls, which makes me think they might be
slow enough on some Windows versions to lead to such a race
condition.

Now, coming back to the other theory, that the scheduler is not able
to schedule these signal management threads: if that were the case,
I think the notify could not have finished, because CallNamedPipe
returns only when the dispatch thread writes back to the pipe.  Now,
if the scheduler somehow kicks this thread out right after it writes
back to the pipe, it is possible that we see such behavior; however,
I am not sure we can do anything about that.
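
To make that window concrete, here is a minimal, portable sketch of the
ordering.  It is only a model under stated assumptions, not the code in
signal.c: it calls no Windows API, and the names DispatchStep,
ModelState, run_dispatch_steps and in_race_window are hypothetical.  It
assumes the write-back to the pipe is the step that lets the sender's
CallNamedPipe return, and that pg_queue_signal is the last step, as in
the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Steps the dispatch thread performs after reading the signal number,
 * in the order described above.  Hypothetical names, for illustration.
 */
typedef enum
{
    STEP_WRITE_ACK,     /* write back on the pipe; sender may now return */
    STEP_FLUSH,         /* FlushFileBuffers(pipe) */
    STEP_DISCONNECT,    /* DisconnectNamedPipe(pipe) */
    STEP_CLOSE,         /* CloseHandle(pipe) */
    STEP_QUEUE_SIGNAL,  /* pg_queue_signal(sigNum) */
    STEP_COUNT
} DispatchStep;

typedef struct
{
    bool sender_released;   /* sender's CallNamedPipe has returned */
    bool signal_queued;     /* recipient has recorded the signal */
} ModelState;

/*
 * Model the dispatch thread being descheduled after it has completed
 * the first steps_completed steps, and report the resulting state.
 */
ModelState
run_dispatch_steps(int steps_completed)
{
    ModelState st = {false, false};

    assert(steps_completed >= 0);
    for (int i = 0; i < steps_completed && i < STEP_COUNT; i++)
    {
        if (i == STEP_WRITE_ACK)
            st.sender_released = true;
        if (i == STEP_QUEUE_SIGNAL)
            st.signal_queued = true;
    }
    return st;
}

/*
 * True when the sender already considers the signal delivered but the
 * recipient has not queued it yet.
 */
bool
in_race_window(int steps_completed)
{
    ModelState st = run_dispatch_steps(steps_completed);

    return st.sender_released && !st.signal_queued;
}
```

In this model the window is open whenever the thread is preempted after
the write-back but before pg_queue_signal, that is, for any value of
steps_completed from 1 through 4.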

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


