Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

From: Thomas Munro
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Date:
Msg-id: CA+hUKG+G5DUNJfdE-qusq5pcj6omYTuWmmFuxCvs=q1jNjkKKA@mail.gmail.com
In reply to: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
List: pgsql-hackers
On Tue, Jan 25, 2022 at 3:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I vote for reverting in release branches only.  I'll propose a better
> > WES patch set for master that hopefully also covers async append etc
> > (which I was already planning to do before we knew about this Windows
> > problem).  More soon.
>
> WFM, but we'll have to remember to revert this in v15 if we don't
> have a solid fix by then.

Phew, after a couple of days of very slow compile/test cycles on
Windows exploring a couple of different ideas, I finally have
something new.  First let me recap the three main ideas in this
thread:

1.  It sounds like no one really loves the WSAPoll() kludge, even
though it apparently works for simple cases.  It's not totally clear
that it really works in enough cases, for one thing.  It doesn't allow
for a socket to be in two WESes at the same time, and I'm not sure I
want to bank on Winsock's WSAPoll() being guaranteed to report POLLHUP
when half closed (as mentioned, no other OS does AFAIK).

2.  The long-lived-WaitEventSets-everywhere concept was initially
appealing to me and solves the walreceiver problem (when combined
with a sticky seen_fd_close flag), and I've managed to get that
working correctly across libpq reconnects.  As mentioned, I also have
some toy patches along those lines for the equivalent but more complex
problem in postgres_fdw, because I've been studying how to make
parallel append generate a tidy stream of epoll_wait()/kevent() calls,
instead of a quadratic explosion of setup/teardown spam.  I'll write
some more about those patches and hopefully propose them soon anyway,
but on reflection I don't really want that Unix efficiency problem to
be tangled up with this Windows correctness problem.  That'd require a
programming rule that I don't want to burden us with forever: you'd
*never* be able to use a socket in more than one WaitEventSet, and
WaitLatchOrSocket() would have to be removed.

3.  The real solution to this problem is to recognise that we just
have the event objects in the wrong place.  WaitEventSets shouldn't
own them: they need to be 1:1 with sockets, or Winsock will eat
events.  Likewise for the flag you need for edge->level conversion, or
*we'll* eat events.  Having now tried that, it's starting to feel like
the best way forward, even though my initial prototype (see attached)
is maybe a tad cumbersome with bookkeeping.  I believe it means that
all existing coding patterns *should* now be safe (not yet confirmed
by testing), and we're free to put sockets in multiple WESes even at
the same time if the need arises.

The basic question is: how should a socket user find the associated
event handle and flags?  Some answers:

1.  "pgsocket" could become a pointer to a heap-allocated wrapper
object containing { socket, event, flags } on Windows, or something
like that, but that seems a bit invasive and tangled up with public
APIs like libpq, which put me off trying that.  I'm willing to explore
it if people object to my other idea.

2.  "pgsocket" could stay unchanged, but we could have a parallel
array with extra socket state, indexed by file descriptor.  We could
use new socket()/close() libpq events so that libpq's sockets could be
registered in this scheme without libpq itself having to know anything
about that.  That worked pretty nicely when I developed it on my
FreeBSD box, but on Windows I soon learned that SOCKET is really yet
another name for HANDLE, so it's not a dense number space anchored at
0 like Unix file descriptors.  The array could be prohibitively big.

3.  I tried the same as #2 but with a hash table, and ran into another
small problem when putting it all together: we probably don't want to
longjump out of libpq callbacks on allocation failure.  So, I modified
simplehash to add a no-OOM behaviour.  That's the POC patch set I'm
attaching for show-and-tell.  Some notes and TODOs in the commit
messages and comments.

Thoughts?

Attachments
