Re: Why is src/test/modules/commit_ts/t/002_standby.pl flaky?

From | Thomas Munro
---|---
Subject | Re: Why is src/test/modules/commit_ts/t/002_standby.pl flaky?
Date |
Msg-id | CA+hUKG+G5DUNJfdE-qusq5pcj6omYTuWmmFuxCvs=q1jNjkKKA@mail.gmail.com
In reply to | Re: Why is src/test/modules/commit_ts/t/002_standby.pl flaky? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: Why is src/test/modules/commit_ts/t/002_standby.pl flaky? (Andres Freund <andres@anarazel.de>)
List | pgsql-hackers
On Tue, Jan 25, 2022 at 3:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I vote for reverting in release branches only. I'll propose a better
> > WES patch set for master that hopefully also covers async append etc
> > (which I was already planning to do before we knew about this Windows
> > problem). More soon.
>
> WFM, but we'll have to remember to revert this in v15 if we don't
> have a solid fix by then.

Phew, after a couple of days of very slow compile/test cycles on Windows exploring a couple of different ideas, I finally have something new. First let me recap the three main ideas in this thread:

1. It sounds like no one really loves the WSAPoll() kludge, even though it apparently works for simple cases. It's not totally clear that it really works in enough cases, for one thing. It doesn't allow for a socket to be in two WESes at the same time, and I'm not sure I want to bank on Winsock's WSAPoll() being guaranteed to report POLLHUP when half closed (as mentioned, no other OS does AFAIK).

2. The long-lived-WaitEventSets-everywhere concept was initially appealing to me and solves the walreceiver problem (when combined with a sticky seen_fd_close flag), and I've managed to get that working correctly across libpq reconnects. As mentioned, I also have some toy patches along those lines for the equivalent but more complex problem in postgres_fdw, because I've been studying how to make parallel append generate a tidy stream of epoll_wait()/kevent() calls, instead of a quadratic explosion of setup/teardown spam. I'll write some more about those patches and hopefully propose them soon anyway, but on reflection I don't really want that Unix efficiency problem to be tangled up with this Windows correctness problem. That'd require a programming rule that I don't want to burden us with forever: you'd *never* be able to use a socket in more than one WaitEventSet, and WaitLatchOrSocket() would have to be removed.

3.
The real solution to this problem is to recognise that we just have the event objects in the wrong place. WaitEventSets shouldn't own them: they need to be 1:1 with sockets, or Winsock will eat events. Likewise for the flag you need for edge->level conversion, or *we'll* eat events. Having now tried that, it's starting to feel like the best way forward, even though my initial prototype (see attached) is maybe a tad cumbersome with bookkeeping. I believe it means that all existing coding patterns *should* now be safe (not yet confirmed by testing), and we're free to put sockets in multiple WESes even at the same time if the need arises.

The basic question is: how should a socket user find the associated event handle and flags? Some answers:

1. "pgsocket" could become a pointer to a heap-allocated wrapper object containing { socket, event, flags } on Windows, or something like that, but that seems a bit invasive and tangled up with public APIs like libpq, which put me off trying that. I'm willing to explore it if people object to my other idea.

2. "pgsocket" could stay unchanged, but we could have a parallel array with extra socket state, indexed by file descriptor. We could use new socket()/close() libpq events so that libpq's sockets could be registered in this scheme without libpq itself having to know anything about that. That worked pretty nicely when I developed it on my FreeBSD box, but on Windows I soon learned that SOCKET is really yet another name for HANDLE, so it's not a dense number space anchored at 0 like Unix file descriptors. The array could be prohibitively big.

3. I tried the same as #2 but with a hash table, and ran into another small problem when putting it all together: we probably don't want to longjump out of libpq callbacks on allocation failure. So, I modified simplehash to add a no-OOM behaviour. That's the POC patch set I'm attaching for show-and-tell. Some notes and TODOs in the commit messages and comments.

Thoughts?
Attachments