On Fri, Mar 18, 2022 at 12:45 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hmm, this seems to have failed on wrasse [1], due to a timeout when
> waiting for tablesync to complete:
>
> 2022-03-17 17:39:28.247 CET [19962:1] LOG: logical replication table
> synchronization worker for subscription "sub2", table "tab1" has started
> 2022-03-17 17:39:28.258 CET [19964:1] LOG: logical replication table
> synchronization worker for subscription "sub2", table "tab4" has started
>
> In the previous runs this completed pretty much immediately (less than a
> second), but this time the workers got stuck, so the script keeps
> looping on the $synced_query. There's nothing in the log, so either it's
> some sort of lock wait or infinite loop.
>
> However, this fails in 013_partition.pl, which was not modified in this
> commit. And there have been multiple successful runs since it was
> modified (in c91f71b9dc). So it's not clear if this is a pre-existing
> issue and we just happened to hit it now, or maybe it's introduced by
> either c91f71b9dc or 5a07966225. But neither of these commits touched
> tablesync at all, so I'm puzzled how it could happen.
>
I have shared my analysis on the -hackers thread [1]. See if that helps.
[1] -
https://www.postgresql.org/message-id/CAA4eK1LpBFU49Ohbnk%3Ddv_v9YP%2BKqh1%2BSf8i%2B%2B_s-QhD1Gy4Qw%40mail.gmail.com
--
With Regards,
Amit Kapila.