Re: Test failures of 100_bugs.pl

From Amit Kapila
Subject Re: Test failures of 100_bugs.pl
Date
Msg-id CAA4eK1Jz876y92377VdL854XzbTAVuTG5FDEYim7sMZynk4fSA@mail.gmail.com
In response to Test failures of 100_bugs.pl  (Andres Freund <andres@anarazel.de>)
Responses RE: Test failures of 100_bugs.pl  ("Yu Shi (Fujitsu)" <shiy.fnst@fujitsu.com>)
List pgsql-hackers
On Tue, Jan 24, 2023 at 8:53 AM Andres Freund <andres@anarazel.de> wrote:
>
> cfbot, the buildfarm and locally I have seen 100_bugs.pl fail
> occasionally. Just rarely enough that I never got around to looking into it
> for real.
>
...
>
> We see t2 added to the publication:
> 2023-01-24 00:57:30.099 UTC [73654][client backend] [100_bugs.pl][7/5:0] LOG:  statement: ALTER PUBLICATION testpub ADD TABLE t2
>
> And that *then* "t" was synchronized:
> 2023-01-24 00:57:30.102 UTC [73640][logical replication worker] LOG:  logical replication table synchronization worker for subscription "testsub", table "t" has finished
>
> and then that the refresh was issued:
> 2023-01-24 00:57:30.128 UTC [73657][client backend] [100_bugs.pl][5/10:0] LOG:  statement: ALTER SUBSCRIPTION testsub REFRESH PUBLICATION
>
> And we see a walsender starting and the query to get the new tables being executed:
> 2023-01-24 00:57:30.139 UTC [73660][walsender] [testsub][6/8:0] LOG:  statement: SELECT DISTINCT t.schemaname, t.tablename
>         , t.attnames
>         FROM pg_catalog.pg_publication_tables t
>          WHERE t.pubname IN ('testpub')
>
>
> And that's it, the rest of the time is just polling.
>
>
> Perhaps wait_for_subscription_sync() should dump the set of rel states to make
> something like this more debuggable?
>
>
> The fact that the synchronization for t finished just before the refresh makes
> me wonder if a wakeup or a cache invalidation got lost?
>

From the logs, the only conclusion one can draw is a lost invalidation,
because the nap time of the apply worker is 1s, so it should have
processed the invalidation during the time we were polling. Also, the
rel should have been added to pg_subscription_rel, because the test is
still polling for rels to be in the 'ready' or 'done' state.

I think we can do three things to debug: (a) as you suggest, dump the
rel states in wait_for_subscription_sync; (b) add a DEBUG log message in
invalidate_syncing_table_states() to confirm that the invalidation has
been processed; (c) print the rel states and relids from
table_states_not_ready in process_syncing_tables_for_apply() to see
whether t2 ever appeared in that list.
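
For (a), a rough sketch of what such a dump could look like. This is
only an illustration in Python: the real wait_for_subscription_sync is
part of the Perl TAP test framework, and the query text and the helper
names (DUMP_QUERY, format_rel_states, all_synced) are assumptions made
up here, not existing code. The state codes are the ones stored in
pg_subscription_rel.srsubstate.

```python
# Sketch only: the real helper is Perl; every name below is hypothetical.

# Query to run on the subscriber. srrelid and srsubstate are columns of
# the pg_subscription_rel catalog.
DUMP_QUERY = (
    "SELECT srrelid::regclass, srsubstate "
    "FROM pg_catalog.pg_subscription_rel ORDER BY 1"
)

# State codes as stored in pg_subscription_rel.srsubstate.
STATE_NAMES = {
    "i": "init",
    "d": "datasync",
    "f": "finishedcopy",
    "s": "syncdone",
    "r": "ready",
}

# States the test waits for ('ready' or 'done').
SYNCED = {"r", "s"}


def format_rel_states(rows):
    """Render (relname, state_code) rows as one diagnostic line per table."""
    lines = []
    for relname, code in rows:
        name = STATE_NAMES.get(code, "unknown")
        marker = "" if code in SYNCED else "  <-- not synced"
        lines.append(f"{relname}: {code} ({name}){marker}")
    return "\n".join(lines)


def all_synced(rows):
    """Return True when every listed relation is in a synced state."""
    return all(code in SYNCED for _, code in rows)
```

Feeding it the rows returned by DUMP_QUERY on each failed poll (or only
on timeout) would show whether t2 is present in pg_subscription_rel and
in which state it is stuck.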

-- 
With Regards,
Amit Kapila.


