Обсуждение: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`

Поиск
Список
Период
Сортировка

[PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`

От
Грем Снорт
Дата:
Hello!
I've found a simple problem in one of subscription tests (`src/test/subscription/t/009_matviews.pl`).
When running this test, it completes successfully (All tests successful. Result: PASS) but there is an error in subscriber's logs:
```
logical replication apply worker[1865731] ERROR:  logical replication target relation "public.test1" does not exist
```

How do I run the test:
```
# configure
./configure --enable-debug --enable-cassert --enable-tap-tests --with-icu --prefix=$PG_PREFIX
# make ...
cd src/test/subscription
make installcheck PG_TEST_INITDB_EXTRA_OPTS="--locale=C" PROVE_TESTS="t/009_matviews.pl"
```

The main purpose of this test is to check materialized views behavior, which are not supported by logical replication, but logical decoding does produce change information for them, so we need to make sure they are properly ignored.

The problem I found is about incorrect setup for logical replication: publisher performs INSERT to a table that has not yet been created on subscriber, and this causes an error `target relation does not exist`.

According to docs: https://www.postgresql.org/docs/17/logical-replication-subscription.html :
> The schema definitions are not replicated, and the published tables must exist on the subscriber. Only regular tables may be the target of replication. For example, you can't replicate to a view.

Also, the sequence of actions in subscription test does not match the correct example in documentation (https://www.postgresql.org/docs/17/logical-replication-subscription.html#LOGICAL-REPLICATION-SUBSCRIPTION-EXAMPLES).

Therefore, I would like to propose a patch with trivial fix for logical replication in subscription test `t/009_matviews.pl`.

The patch is in attachments to this letter.
The patch was created via: `git format-patch master --base master`.

Вложения

Re: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`

От
Michael Paquier
Дата:
On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote:
> I've found a simple problem in one of subscription tests
> (`src/test/subscription/t/009_matviews.pl`).

(Added a couple of folks in CC.)

Hmm, something else is going on here, and I am not sure what yet (a
bisect is annoying as the test depends on a timeout for failure
detection, see below for more ranting).

The backend change coupled with this test comes from bc1adc651b8e,
first introduced in v11.  At the top of REL_11_STABLE, which is the
first branch where the test has been introduced, if I update
pgoutput.c and remove the is_publishable_relation() call in
pgoutput_change() to undo the fix, then the test is able to hang as it
is designed.

Now, if I do the same thing on HEAD, removing the check, then the test
passes!  Something else is going on here: the test is not checking
what it has been written for.  Applying your patch does not change
this state.

As far as I can see, the test is broken since v17.  Up to v16, the
test would hang once the fix in pgoutput.c is reverted.  In v17 and
newer versions, it does not.

While something specific to v17 is to blame here, I am also going to
complain about the way this test is writen and designed to fail: a
failing scenario should be deterministic, and should check some state
in the cluster to validate something, be it a lookup at some relation,
some catalogs or some server logs.  009_matviews.pl does nothing like
that: a failure is a test hanging with the failure detected by a
timeout.  From my perspective, this is a poor design choice, and one
reason why nobody has noticed the regression I'm just finding in v17
after looking more closely as an effect of your patch.

Amit, Kurada-san or Sawada-san, does something ring a bell?  There
have been many changes in the logical replication code since v17, and
it sounds like an issue introduced by one of these recent changes, but
I have to admit that I am not seeing anything obvious (that's not
dcd4454590e7, checked it).

Up to v16, the test loops with the following failure popping in the
subscriber logs:
2025-10-10 11:24:15.884 JST [25148] ERROR:  logical replication target
relation "public.testmv1" does not exist
2025-10-10 11:24:15.884 JST [25148] CONTEXT:  processing remote data
for replication origin "pg_16391" during message type "INSERT" in
transaction 733, finished at 0/14BBE08

From v17, the subscriber logs just accepts things, without the worker
complaining about a matview:
2025-10-10 11:27:10.020 JST [32467] LOG:  logical replication table
synchronization worker for subscription "mysub", table "test1" has
started
2025-10-10 11:27:10.041 JST [32467] LOG:  logical replication table
synchronization worker for subscription "mysub", table "test1" has
finished
2025-10-10 11:27:10.120 JST [32443] LOG:  received fast shutdown request

I am attempting a bisect, as well, perhaps I'll be able to catch
something...
--
Michael

Вложения
On Thu, Oct 9, 2025 at 6:07 PM Грем Снорт <grem.snoort@gmail.com> wrote:
>
> Hello!
> I've found a simple problem in one of subscription tests (`src/test/subscription/t/009_matviews.pl`).
> When running this test, it completes successfully (All tests successful. Result: PASS) but there is an error in
subscriber'slogs: 
> ```
> logical replication apply worker[1865731] ERROR:  logical replication target relation "public.test1" does not exist
> ```
>

This is a harmless ERROR as in the next try, the apply for insert will
be successful. But for neatness, we can use your change.

--
With Regards,
Amit Kapila.



RE: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael,

> On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote:
> > I've found a simple problem in one of subscription tests
> > (`src/test/subscription/t/009_matviews.pl`).
> 
> (Added a couple of folks in CC.)

Thanks for adding.

> The backend change coupled with this test comes from bc1adc651b8e,
> first introduced in v11.  At the top of REL_11_STABLE, which is the
> first branch where the test has been introduced, if I update
> pgoutput.c and remove the is_publishable_relation() call in
> pgoutput_change() to undo the fix, then the test is able to hang as it
> is designed.
> 
> Now, if I do the same thing on HEAD, removing the check, then the test
> passes!  Something else is going on here: the test is not checking
> what it has been written for.  Applying your patch does not change
> this state.

I found that b4da732 [1] changes the behavior. It modifies how we create the
materialized view.

Since b4da732, CREATE MATERIALIZED VIEW ... AS ... behaves similarly with REFRESH
MATERIALIZED VIEW command.
Refresh command initially creates a transient table, insert tuples then swapping
the relfilenumber between the mateview and transient one.
Thus, from the WAL's perspective, INSERT happens to the temporary one, so it won't
be replicated to the subscriber. See comments atop RefreshMatViewByOid().

So...I think it is not caused by changes for logical replication.

[1]: https://github.com/postgres/postgres/commit/b4da732

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`

От
Ashutosh Bapat
Дата:
On Fri, Oct 10, 2025 at 1:20 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Michael,
>
> > On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote:
> > > I've found a simple problem in one of subscription tests
> > > (`src/test/subscription/t/009_matviews.pl`).
> >
> > (Added a couple of folks in CC.)
>
> Thanks for adding.
>
> > The backend change coupled with this test comes from bc1adc651b8e,
> > first introduced in v11.  At the top of REL_11_STABLE, which is the
> > first branch where the test has been introduced, if I update
> > pgoutput.c and remove the is_publishable_relation() call in
> > pgoutput_change() to undo the fix, then the test is able to hang as it
> > is designed.
> >
> > Now, if I do the same thing on HEAD, removing the check, then the test
> > passes!  Something else is going on here: the test is not checking
> > what it has been written for.  Applying your patch does not change
> > this state.
>
> I found that b4da732 [1] changes the behavior. It modifies how we create the
> materialized view.
>
> Since b4da732, CREATE MATERIALIZED VIEW ... AS ... behaves similarly with REFRESH
> MATERIALIZED VIEW command.
> Refresh command initially creates a transient table, insert tuples then swapping
> the relfilenumber between the mateview and transient one.
> Thus, from the WAL's perspective, INSERT happens to the temporary one, so it won't
> be replicated to the subscriber. See comments atop RefreshMatViewByOid().
>
> So...I think it is not caused by changes for logical replication.

Thanks for the investigation. Makes sense.

I agree with Michael that the test could be written better, rather
than testing for timeout. AFAIU, the test is testing that the changes
to materialized view are not sent downstream; are properly filtered
out. I think the test should create MV and make sure that it doesn't
get populated through the logical replication but when refreshed
explicitly on the subscriber. While checking for that we should
perform usual wait for LSN. However, after your explanation above, it
seems that MVs should be considered similar to unlogged tables or
temporary tables after that change. So we could just merge this test
into subscription/100_bugs.pl.

--
Best Wishes,
Ashutosh Bapat



On Fri, Oct 10, 2025 at 1:19 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Michael,
>
> > On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote:
> > > I've found a simple problem in one of subscription tests
> > > (`src/test/subscription/t/009_matviews.pl`).
> >
> > (Added a couple of folks in CC.)
>
> Thanks for adding.
>
> > The backend change coupled with this test comes from bc1adc651b8e,
> > first introduced in v11.  At the top of REL_11_STABLE, which is the
> > first branch where the test has been introduced, if I update
> > pgoutput.c and remove the is_publishable_relation() call in
> > pgoutput_change() to undo the fix, then the test is able to hang as it
> > is designed.
> >
> > Now, if I do the same thing on HEAD, removing the check, then the test
> > passes!  Something else is going on here: the test is not checking
> > what it has been written for.  Applying your patch does not change
> > this state.
>
> I found that b4da732 [1] changes the behavior. It modifies how we create the
> materialized view.
>
> Since b4da732, CREATE MATERIALIZED VIEW ... AS ... behaves similarly with REFRESH
> MATERIALIZED VIEW command.
> Refresh command initially creates a transient table, insert tuples then swapping
> the relfilenumber between the mateview and transient one.
> Thus, from the WAL's perspective, INSERT happens to the temporary one, so it won't
> be replicated to the subscriber. See comments atop RefreshMatViewByOid().
>
> So...I think it is not caused by changes for logical replication.
>

Thanks for the analysis. I think we should go with the simple patch
proposed in this thread to improve this test and that too only for
HEAD. The other larger improvements could be discussed separately
though I feel those are not required at this stage as this code is
battle tested now. If we do any improvement in this area then it is
worth considering additional test cycles for this case.

--
With Regards,
Amit Kapila.



RE: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`

От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, Ashutosh,

> Thanks for the analysis. I think we should go with the simple patch
> proposed in this thread to improve this test and that too only for
> HEAD. 

To confirm, are you suggesting something like attached? It initially creates MVs
on both instances, then ensure REFRESH MATERIALIZED VIEW won't be replicated.
It also fixed the issue reported by Грем.

I'm not sure it could move to 100_bugs.pl. If we do that, 009 test can be removed or
empty?

Best regards,
Hayato Kuroda
FUJITSU LIMITED 


Вложения
On Sun, Oct 12, 2025 at 11:29 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit, Ashutosh,
>
> > Thanks for the analysis. I think we should go with the simple patch
> > proposed in this thread to improve this test and that too only for
> > HEAD.
>
> To confirm, are you suggesting something like attached? It initially creates MVs
> on both instances, then ensure REFRESH MATERIALIZED VIEW won't be replicated.
> It also fixed the issue reported by Грем.
>

What difference REFRESH will make instead of CREATE? I think it is
okay to just patch what is posted by the reporter in the first email
[1] in this thread.

[1]: https://www.postgresql.org/message-id/CANV9Qw5HD7%3DFp4nox2O7DoVctHoabRRVW9Soo4A%3DQipqW5B%3DTg%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`

От
Ashutosh Bapat
Дата:
On Mon, Oct 13, 2025 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Oct 12, 2025 at 11:29 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Amit, Ashutosh,
> >
> > > Thanks for the analysis. I think we should go with the simple patch
> > > proposed in this thread to improve this test and that too only for
> > > HEAD.
> >
> > To confirm, are you suggesting something like attached? It initially creates MVs
> > on both instances, then ensure REFRESH MATERIALIZED VIEW won't be replicated.
> > It also fixed the issue reported by Грем.
> >
>
> What difference REFRESH will make instead of CREATE? I think it is
> okay to just patch what is posted by the reporter in the first email
> [1] in this thread.
>
> [1]: https://www.postgresql.org/message-id/CANV9Qw5HD7%3DFp4nox2O7DoVctHoabRRVW9Soo4A%3DQipqW5B%3DTg%40mail.gmail.com

For now this makes sense.

We could avoid running a full test, and save time and resources, if we
piggy back MV vs logical replication testing on 100_bugs.pl. If we do
that, it should be master-only and can be a separate discussion.

--
Best Wishes,
Ashutosh Bapat



Re: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`

От
Michael Paquier
Дата:
On Mon, Oct 13, 2025 at 06:12:07PM +0530, Ashutosh Bapat wrote:
> For now this makes sense.

The arguments and the patches I am seeing do not really make sense
here.

> We could avoid running a full test, and save time and resources, if we
> piggy back MV vs logical replication testing on 100_bugs.pl. If we do
> that, it should be master-only and can be a separate discussion.

Moving the test would be one thing, I'd be OK to do that only on HEAD
and not bother about the back-branches.  Still, that's a separate
discussion.

As far as I can see, this test is not able to check what it wants to
check since v17.  Hence, doing something only on HEAD, if that's
really what you are implying here (I understand that you are not
implying that but I am not entirely sure, either), is not sufficient.

I also fail to understand how Kuroda-san's patch helps in solving
everything at hand: more is required.  If apply the patch posted at
[1] on HEAD, the test passes.  If I revert the change introduced by
bc1adc651b8e in pgoutput.c with the patch posted at [1], the test
passes as well, but I would expect that the test *fails* in some way.

I did not take the time to dive into the details here, but should
there be at least some pattern detected in the TAP test?  Or do we
actually need to patch pgoutput.c at all now and remove the check
based on is_publishable_relation().  Or is there any meaning in
spending cycles for such a test now if we cannot detect a difference
in behavior?

In any case, none of the proposals posted on this thread seem
sufficient to me: 009_matviews.pl is not useful if it does not choke
in some way if we reintroduce a problem equivalent to what
bc1adc651b8e has done.  If it is not useful anymore and if pgoutput.c
can be simplified, there may be a point in considering these options.

Good find about b4da732fd64e, by the way, Kuroda-san.  That's the
commit that has changed the way 009_matviews.pl behaves.  If I revert
b4da732fd64e, 009_matviews.pl would fail on timeout, as expected based
on the way the test is currently written.

[1]:
https://www.postgresql.org/message-id/OSCPR01MB149660172D9F21AA75365DC9BF5EDA@OSCPR01MB14966.jpnprd01.prod.outlook.com
--
Michael

Вложения