Обсуждение: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`
Hello!
I've found a simple problem in one of subscription tests (`src/test/subscription/t/009_matviews.pl`).```
logical replication apply worker[1865731] ERROR: logical replication target relation "public.test1" does not exist
```
logical replication apply worker[1865731] ERROR: logical replication target relation "public.test1" does not exist
```
How do I run the test:
```
# configure
./configure --enable-debug --enable-cassert --enable-tap-tests --with-icu --prefix=$PG_PREFIX
./configure --enable-debug --enable-cassert --enable-tap-tests --with-icu --prefix=$PG_PREFIX
# make ...
cd src/test/subscription
make installcheck PG_TEST_INITDB_EXTRA_OPTS="--locale=C" PROVE_TESTS="t/009_matviews.pl"
```
make installcheck PG_TEST_INITDB_EXTRA_OPTS="--locale=C" PROVE_TESTS="t/009_matviews.pl"
```
The main purpose of this test is to check materialized views behavior, which are not supported by logical replication, but logical decoding does produce change information for them, so we need to make sure they are properly ignored.
The problem I found is about incorrect setup for logical replication: publisher performs INSERT to a table that has not yet been created on subscriber, and this causes an error `target relation does not exist`.
According to docs: https://www.postgresql.org/docs/17/logical-replication-subscription.html :
> The schema definitions are not replicated, and the published tables must exist on the subscriber. Only regular tables may be the target of replication. For example, you can't replicate to a view.
> The schema definitions are not replicated, and the published tables must exist on the subscriber. Only regular tables may be the target of replication. For example, you can't replicate to a view.
Also, the sequence of actions in subscription test does not match the correct example in documentation (https://www.postgresql.org/docs/17/logical-replication-subscription.html#LOGICAL-REPLICATION-SUBSCRIPTION-EXAMPLES).
Therefore, I would like to propose a patch with trivial fix for logical replication in subscription test `t/009_matviews.pl`.
The patch is in attachments to this letter.
The patch was created via: `
git format-patch master --base master
`.Вложения
Re: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`
От
Michael Paquier
Дата:
On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote: > I've found a simple problem in one of subscription tests > (`src/test/subscription/t/009_matviews.pl`). (Added a couple of folks in CC.) Hmm, something else is going on here, and I am not sure what yet (a bisect is annoying as the test depends on a timeout for failure detection, see below for more ranting). The backend change coupled with this test comes from bc1adc651b8e, first introduced in v11. At the top of REL_11_STABLE, which is the first branch where the test has been introduced, if I update pgoutput.c and remove the is_publishable_relation() call in pgoutput_change() to undo the fix, then the test is able to hang as it is designed. Now, if I do the same thing on HEAD, removing the check, then the test passes! Something else is going on here: the test is not checking what it has been written for. Applying your patch does not change this state. As far as I can see, the test is broken since v17. Up to v16, the test would hang once the fix in pgoutput.c is reverted. In v17 and newer versions, it does not. While something specific to v17 is to blame here, I am also going to complain about the way this test is writen and designed to fail: a failing scenario should be deterministic, and should check some state in the cluster to validate something, be it a lookup at some relation, some catalogs or some server logs. 009_matviews.pl does nothing like that: a failure is a test hanging with the failure detected by a timeout. From my perspective, this is a poor design choice, and one reason why nobody has noticed the regression I'm just finding in v17 after looking more closely as an effect of your patch. Amit, Kurada-san or Sawada-san, does something ring a bell? There have been many changes in the logical replication code since v17, and it sounds like an issue introduced by one of these recent changes, but I have to admit that I am not seeing anything obvious (that's not dcd4454590e7, checked it). Up to v16, the test loops with the following failure popping in the subscriber logs: 2025-10-10 11:24:15.884 JST [25148] ERROR: logical replication target relation "public.testmv1" does not exist 2025-10-10 11:24:15.884 JST [25148] CONTEXT: processing remote data for replication origin "pg_16391" during message type "INSERT" in transaction 733, finished at 0/14BBE08 From v17, the subscriber logs just accepts things, without the worker complaining about a matview: 2025-10-10 11:27:10.020 JST [32467] LOG: logical replication table synchronization worker for subscription "mysub", table "test1" has started 2025-10-10 11:27:10.041 JST [32467] LOG: logical replication table synchronization worker for subscription "mysub", table "test1" has finished 2025-10-10 11:27:10.120 JST [32443] LOG: received fast shutdown request I am attempting a bisect, as well, perhaps I'll be able to catch something... -- Michael
Вложения
On Thu, Oct 9, 2025 at 6:07 PM Грем Снорт <grem.snoort@gmail.com> wrote: > > Hello! > I've found a simple problem in one of subscription tests (`src/test/subscription/t/009_matviews.pl`). > When running this test, it completes successfully (All tests successful. Result: PASS) but there is an error in subscriber'slogs: > ``` > logical replication apply worker[1865731] ERROR: logical replication target relation "public.test1" does not exist > ``` > This is a harmless ERROR as in the next try, the apply for insert will be successful. But for neatness, we can use your change. -- With Regards, Amit Kapila.
RE: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael, > On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote: > > I've found a simple problem in one of subscription tests > > (`src/test/subscription/t/009_matviews.pl`). > > (Added a couple of folks in CC.) Thanks for adding. > The backend change coupled with this test comes from bc1adc651b8e, > first introduced in v11. At the top of REL_11_STABLE, which is the > first branch where the test has been introduced, if I update > pgoutput.c and remove the is_publishable_relation() call in > pgoutput_change() to undo the fix, then the test is able to hang as it > is designed. > > Now, if I do the same thing on HEAD, removing the check, then the test > passes! Something else is going on here: the test is not checking > what it has been written for. Applying your patch does not change > this state. I found that b4da732 [1] changes the behavior. It modifies how we create the materialized view. Since b4da732, CREATE MATERIALIZED VIEW ... AS ... behaves similarly with REFRESH MATERIALIZED VIEW command. Refresh command initially creates a transient table, insert tuples then swapping the relfilenumber between the mateview and transient one. Thus, from the WAL's perspective, INSERT happens to the temporary one, so it won't be replicated to the subscriber. See comments atop RefreshMatViewByOid(). So...I think it is not caused by changes for logical replication. [1]: https://github.com/postgres/postgres/commit/b4da732 Best regards, Hayato Kuroda FUJITSU LIMITED
Re: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`
От
Ashutosh Bapat
Дата:
On Fri, Oct 10, 2025 at 1:20 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Michael, > > > On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote: > > > I've found a simple problem in one of subscription tests > > > (`src/test/subscription/t/009_matviews.pl`). > > > > (Added a couple of folks in CC.) > > Thanks for adding. > > > The backend change coupled with this test comes from bc1adc651b8e, > > first introduced in v11. At the top of REL_11_STABLE, which is the > > first branch where the test has been introduced, if I update > > pgoutput.c and remove the is_publishable_relation() call in > > pgoutput_change() to undo the fix, then the test is able to hang as it > > is designed. > > > > Now, if I do the same thing on HEAD, removing the check, then the test > > passes! Something else is going on here: the test is not checking > > what it has been written for. Applying your patch does not change > > this state. > > I found that b4da732 [1] changes the behavior. It modifies how we create the > materialized view. > > Since b4da732, CREATE MATERIALIZED VIEW ... AS ... behaves similarly with REFRESH > MATERIALIZED VIEW command. > Refresh command initially creates a transient table, insert tuples then swapping > the relfilenumber between the mateview and transient one. > Thus, from the WAL's perspective, INSERT happens to the temporary one, so it won't > be replicated to the subscriber. See comments atop RefreshMatViewByOid(). > > So...I think it is not caused by changes for logical replication. Thanks for the investigation. Makes sense. I agree with Michael that the test could be written better, rather than testing for timeout. AFAIU, the test is testing that the changes to materialized view are not sent downstream; are properly filtered out. I think the test should create MV and make sure that it doesn't get populated through the logical replication but when refreshed explicitly on the subscriber. While checking for that we should perform usual wait for LSN. However, after your explanation above, it seems that MVs should be considered similar to unlogged tables or temporary tables after that change. So we could just merge this test into subscription/100_bugs.pl. -- Best Wishes, Ashutosh Bapat
On Fri, Oct 10, 2025 at 1:19 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Michael, > > > On Thu, Oct 09, 2025 at 03:37:25PM +0300, Грем Снорт wrote: > > > I've found a simple problem in one of subscription tests > > > (`src/test/subscription/t/009_matviews.pl`). > > > > (Added a couple of folks in CC.) > > Thanks for adding. > > > The backend change coupled with this test comes from bc1adc651b8e, > > first introduced in v11. At the top of REL_11_STABLE, which is the > > first branch where the test has been introduced, if I update > > pgoutput.c and remove the is_publishable_relation() call in > > pgoutput_change() to undo the fix, then the test is able to hang as it > > is designed. > > > > Now, if I do the same thing on HEAD, removing the check, then the test > > passes! Something else is going on here: the test is not checking > > what it has been written for. Applying your patch does not change > > this state. > > I found that b4da732 [1] changes the behavior. It modifies how we create the > materialized view. > > Since b4da732, CREATE MATERIALIZED VIEW ... AS ... behaves similarly with REFRESH > MATERIALIZED VIEW command. > Refresh command initially creates a transient table, insert tuples then swapping > the relfilenumber between the mateview and transient one. > Thus, from the WAL's perspective, INSERT happens to the temporary one, so it won't > be replicated to the subscriber. See comments atop RefreshMatViewByOid(). > > So...I think it is not caused by changes for logical replication. > Thanks for the analysis. I think we should go with the simple patch proposed in this thread to improve this test and that too only for HEAD. The other larger improvements could be discussed separately though I feel those are not required at this stage as this code is battle tested now. If we do any improvement in this area then it is worth considering additional test cycles for this case. -- With Regards, Amit Kapila.
RE: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, Ashutosh, > Thanks for the analysis. I think we should go with the simple patch > proposed in this thread to improve this test and that too only for > HEAD. To confirm, are you suggesting something like attached? It initially creates MVs on both instances, then ensure REFRESH MATERIALIZED VIEW won't be replicated. It also fixed the issue reported by Грем. I'm not sure it could move to 100_bugs.pl. If we do that, 009 test can be removed or empty? Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
On Sun, Oct 12, 2025 at 11:29 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Amit, Ashutosh, > > > Thanks for the analysis. I think we should go with the simple patch > > proposed in this thread to improve this test and that too only for > > HEAD. > > To confirm, are you suggesting something like attached? It initially creates MVs > on both instances, then ensure REFRESH MATERIALIZED VIEW won't be replicated. > It also fixed the issue reported by Грем. > What difference REFRESH will make instead of CREATE? I think it is okay to just patch what is posted by the reporter in the first email [1] in this thread. [1]: https://www.postgresql.org/message-id/CANV9Qw5HD7%3DFp4nox2O7DoVctHoabRRVW9Soo4A%3DQipqW5B%3DTg%40mail.gmail.com -- With Regards, Amit Kapila.
Re: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`
От
Ashutosh Bapat
Дата:
On Mon, Oct 13, 2025 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Oct 12, 2025 at 11:29 AM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Dear Amit, Ashutosh, > > > > > Thanks for the analysis. I think we should go with the simple patch > > > proposed in this thread to improve this test and that too only for > > > HEAD. > > > > To confirm, are you suggesting something like attached? It initially creates MVs > > on both instances, then ensure REFRESH MATERIALIZED VIEW won't be replicated. > > It also fixed the issue reported by Грем. > > > > What difference REFRESH will make instead of CREATE? I think it is > okay to just patch what is posted by the reporter in the first email > [1] in this thread. > > [1]: https://www.postgresql.org/message-id/CANV9Qw5HD7%3DFp4nox2O7DoVctHoabRRVW9Soo4A%3DQipqW5B%3DTg%40mail.gmail.com For now this makes sense. We could avoid running a full test, and save time and resources, if we piggy back MV vs logical replication testing on 100_bugs.pl. If we do that, it should be master-only and can be a separate discussion. -- Best Wishes, Ashutosh Bapat
Re: [PATCH TEST] Fix logical replication setup in subscription test `t/009_matviews.pl`
От
Michael Paquier
Дата:
On Mon, Oct 13, 2025 at 06:12:07PM +0530, Ashutosh Bapat wrote: > For now this makes sense. The arguments and the patches I am seeing do not really make sense here. > We could avoid running a full test, and save time and resources, if we > piggy back MV vs logical replication testing on 100_bugs.pl. If we do > that, it should be master-only and can be a separate discussion. Moving the test would be one thing, I'd be OK to do that only on HEAD and not bother about the back-branches. Still, that's a separate discussion. As far as I can see, this test is not able to check what it wants to check since v17. Hence, doing something only on HEAD, if that's really what you are implying here (I understand that you are not implying that but I am not entirely sure, either), is not sufficient. I also fail to understand how Kuroda-san's patch helps in solving everything at hand: more is required. If apply the patch posted at [1] on HEAD, the test passes. If I revert the change introduced by bc1adc651b8e in pgoutput.c with the patch posted at [1], the test passes as well, but I would expect that the test *fails* in some way. I did not take the time to dive into the details here, but should there be at least some pattern detected in the TAP test? Or do we actually need to patch pgoutput.c at all now and remove the check based on is_publishable_relation(). Or is there any meaning in spending cycles for such a test now if we cannot detect a difference in behavior? In any case, none of the proposals posted on this thread seem sufficient to me: 009_matviews.pl is not useful if it does not choke in some way if we reintroduce a problem equivalent to what bc1adc651b8e has done. If it is not useful anymore and if pgoutput.c can be simplified, there may be a point in considering these options. Good find about b4da732fd64e, by the way, Kuroda-san. That's the commit that has changed the way 009_matviews.pl behaves. If I revert b4da732fd64e, 009_matviews.pl would fail on timeout, as expected based on the way the test is currently written. [1]: https://www.postgresql.org/message-id/OSCPR01MB149660172D9F21AA75365DC9BF5EDA@OSCPR01MB14966.jpnprd01.prod.outlook.com -- Michael