Re: postgres_fdw test timeouts

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: postgres_fdw test timeouts
Дата
Msg-id CA+hUKGLKZhs8Y-r1MxTDdiDvtbgaAGnCL=br5i4LDKGYd+e9YA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: postgres_fdw test timeouts  (Alexander Lakhin <exclusion@gmail.com>)
Ответы Re: postgres_fdw test timeouts  (Nathan Bossart <nathandbossart@gmail.com>)
Список pgsql-hackers
On Thu, Dec 7, 2023 at 10:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> I think, I've found out what's going on here.
> The culprit is WSAEnumNetworkEvents() assisted by non-trivial logic of
> ExecAppendAsyncEventWait().
> For the case noccurred > 1, ExecAppendAsyncEventWait() performs a loop,
> where ExecAsyncNotify() is called for the first AsyncRequest, but the
> second one also processed inside, through a recursive call to
> ExecAppendAsyncEventWait():
>   -> ExecAsyncNotify -> produce_tuple_asynchronously
> -> ExecScan -> ExecInterpExpr -> ExecSetParamPlan -> ExecProcNodeFirst
> -> ExecAgg -> agg_retrieve_direct -> ExecProcNodeInstr -> ExecAppend
> -> ExecAppendAsyncEventWait.
> Here we get into the first loop and call ExecAsyncConfigureWait() for the
> second AsyncRequest (because we haven't reset it's callback_pending yet),
> and it leads to creating another WaitEvent for the second socket inside
> postgresForeignAsyncConfigureWait():
>      AddWaitEventToSet(set, WL_SOCKET_READABLE, PQsocket(fsstate->conn), ...

Oh, wow.  Nice detective work!  Thank you for figuring that out.

> So it looks like we have the same issue with multiple event handles
> associated with a single socket here.
> And v2-0003-Redesign-Windows-socket-event-management.patch from [1]
> "surprisingly" helps in this case as well (I could not see a failure for
> 100 iterations of 10 tests in parallel).

Yeah, this makes perfect sense.

So, commit 04a09ee is not guilty.  But as the saying goes, "no good
deed goes unpunished", and work on our Windows port seems to be
especially prone to backfiring when kludges combine...

Now we have the question of whether to go forwards (commit the "socket
table" thing), or backwards (revert 04a09ee for now to clear the CI
failures).  I don't love the hidden complexity of the socket table and
am not in a hurry to commit it, but I don't currently see another
way... on the other hand we have other CI flapping due to that problem
too so reverting 04a09ee would be sweeping problems under the carpet.
I still need to process your feedback/discoveries on that other thread
and it may take a few weeks for me to get to it.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dave Cramer
Дата:
Сообщение: Re: errors building on windows using meson
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [CAUTION!! freemail] Re: Partial aggregates pushdown