Re: tests against running server occasionally fail, postgres_fdw & tenk1

From: Andres Freund
Subject: Re: tests against running server occasionally fail, postgres_fdw & tenk1
Date:
Msg-id: 20230207032946.nxow3yqlfaefsp2d@awork3.anarazel.de
In response to: Re: tests against running server occasionally fail, postgres_fdw & tenk1  (Andres Freund <andres@anarazel.de>)
Responses: Re: tests against running server occasionally fail, postgres_fdw & tenk1  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

On 2023-02-06 17:53:00 -0800, Andres Freund wrote:
> WRT the fdw_retry_check: I wonder if we should increase the log level of
> a) pgfdw_inval_callback deciding to disconnect
> b) ReceiveSharedInvalidMessages() deciding to reset
>
> to DEBUG1, at least temporarily?
>
> Alternatively we could add a
> SET log_min_messages=debug4;
> ...
> RESET log_min_messages;
>
> around the postgres_fdw connection re-establishment test?
>
>
> One thing nudging me towards the more global approach is that I have the vague
> feelign that there's a wider issue around hitting more sinval resets than we
> should - and it'd right now be very hard to know about that.
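
For concreteness, the tweak amounts to nothing more than bumping the existing
debug messages, roughly like this (files and original levels from memory, so
treat it as a sketch rather than a patch):

/*
 * Sketch only: raise the two existing debug-level messages to LOG.  IIRC
 * they live in contrib/postgres_fdw/connection.c (pgfdw_inval_callback)
 * and src/backend/storage/ipc/sinval.c (ReceiveSharedInvalidMessages).
 */
elog(LOG, "discarding connection %p", entry->conn); /* was a DEBUG level */
elog(LOG, "cache state reset");                     /* was a DEBUG level */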

Luckily it proved to be easy enough to reproduce on a private branch, by
setting the test to repeat a couple times.

I set the aforementioned log messages to LOG. And indeed:

2023-02-07 02:54:18.773 UTC [10800][client backend] [pg_regress/postgres_fdw][7/10526:0] LOG:  cache state reset
2023-02-07 02:54:18.773 UTC [10800][client backend] [pg_regress/postgres_fdw][7/10526:0] LOG:  discarding connection 0x802251f00

That was preceded by another log message less than 200 ms earlier:
2023-02-07 02:54:18.588 UTC [10800][client backend] [pg_regress/postgres_fdw][7/10523:55242] STATEMENT:  ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check');

The log statements in between are from isolation/reindex-concurrently-toast
and pg_regress/indexing.

So the problem is indeed that we somehow quickly overflow the sinval queue. I
guess we need a bit more logging around the size of the sinval queue and its
"fullness"?


I'm a bit surprised that MAXNUMMESSAGES is hardcoded to 4096.  It's not
particularly surprising that it gets overflowed quickly?


There's something off. Isolationtester's control connection emits *loads* of
invalidation messages:
2023-02-06 19:29:06.430 PST [2125297][client backend][6/0:121864][isolation/receipt-report/control connection] LOG:  previously emitted 7662 messages, 21 this time
2023-02-06 19:29:06.566 PST [2125297][client backend][6/0:121873][isolation/receipt-report/control connection] LOG:  previously emitted 8155 messages, 99 this time
2023-02-06 19:29:06.655 PST [2125297][client backend][6/0:121881][isolation/receipt-report/control connection] LOG:  previously emitted 8621 messages, 99 this time
2023-02-06 19:29:06.772 PST [2125297][client backend][6/0:121892][isolation/receipt-report/control connection] LOG:  previously emitted 9208 messages, 85 this time
2023-02-06 19:29:06.867 PST [2125297][client backend][6/0:121900][isolation/receipt-report/control connection] LOG:  previously emitted 9674 messages, 85 this time

and this happens with lots of other tests.
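
For reference, the "previously emitted ... this time" lines come from quick
ad-hoc instrumentation; a minimal sketch of that kind of counter (names
invented here) looks roughly like:

/*
 * Sketch of ad-hoc instrumentation, names invented: keep a per-backend
 * running total of outgoing invalidation messages and log it whenever a
 * batch goes out (e.g. from SendSharedInvalidMessages()).
 */
static uint64 inval_msgs_sent_total = 0;

static void
ReportInvalMessagesSent(int nmsgs)
{
	elog(LOG, "previously emitted " UINT64_FORMAT " messages, %d this time",
		 inval_msgs_sent_total, nmsgs);
	inval_msgs_sent_total += nmsgs;
}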

Greetings,

Andres Freund


PS: The reindex-concurrently-toast output seems to indicate something is
broken in the test... There are lots of references to non-existent tables in
the expected file, without that immediately making sense.


