Обсуждение: pg_publication_tables: return NULL attnames when no column list is specified
Hi,
When a table is added to a publication without a column list:
ALTER PUBLICATION my_pub ADD TABLE my_table;
Any new column added to my_table will automatically be replicated. The
underlying catalog (pg_publication_rel.prattrs) stores NULL for this case.
When a table is added with an explicit column list:
ALTER PUBLICATION my_pub ADD TABLE my_table (id, name, status);
New columns will NOT be replicated until the publication is explicitly altered.
The catalog stores an int2vector of the specified attribute numbers.
The problem is that pg_get_publication_tables() in pg_publication.c (the set-returning
function backing the pg_publication_tables view) erases this distinction.
When prattrs is NULL, it opens the table, iterates all eligible attributes,
and builds a synthetic int2vector of all current columns. The view then shows
the same attnames output for both cases.
| Scenario | prattrs in catalog | attrs from SRF | attnames in view |
|--------------------------|--------------------|---------------------|------------------|
| column list (a, b) | {1, 2 } | {1,2} | {a, b} |
| No column list | NULL | {1,2,3,...} (synth) | {a, b, c, ...} |
| FOR ALL TABLES/IN SCHEMA | no catalog row | {1,2,3,...} (synth) | {a, b, c, ...} |
This is a problem for operations: schema migrations that add columns to published
tables may or may not replicate depending on how the publication was originally
defined, and the only way to check is querying pg_publication_rel directly.
Additionally, tablesync.c has a workaround heuristic that tries to reverse
this synthesis by comparing array_length(gpt.attrs, 1) against c.relnatts.
This heuristic is buggy: relnatts counts all user attributes including dropped
columns, but the synthesized list excludes dropped columns and conditionally
excludes generated columns. So for any table with a dropped column, the
heuristic incorrectly treats "no column list" as if an explicit list were
specified.
The problematic code is the result of commits that incrementally
built on each other, with later ones introducing workarounds for side effects
of earlier ones:
1. fd0b9dcebda (06/2022) Amit Kapila (author: Hou Zhijie)
Prohibit combining publications with different column lists
Added the subscriber-side check that prevents combining publications with
different column lists for the same table. The subscriber runs
SELECT DISTINCT gpt.attrs and expects at most one row. To handle the fact
that pg_get_publication_tables() returned NULL for no-column-list
publications (which would always be distinct from any explicit list), this
commit introduced the relnatts heuristic in tablesync.c:
CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)
THEN NULL ELSE gpt.attrs END
The idea was: if the SRF returns a synthesized list whose length equals
relnatts, it must be the "all columns" case, so collapse it back to NULL.
But at this point the SRF didn't synthesize anything yet, so the heuristic was anticipating the next commit.
2. b7ae0395369 (01/2023) Amit Kapila (author: Shi yu)
Ignore dropped and generated columns from the column list
This is the commit that introduced the synthesis loop. It "solved" a problem
where a table in two publications (one with a column list naming all columns,
one without a column list) was erroring with "cannot use different column lists"
because one returned NULL and the other returned an int2vector. The fix was
to synthesize a column list when none was specified, filtering out dropped
and generated columns, so that both publications would produce identical
int2vectors and DISTINCT would collapse them to one row.
But this synthesis made it impossible to distinguish "all columns" from
"explicit list of all columns" in the view. And the relnatts heuristic from
fd0b9dcebda - which was supposed to reverse the synthesis - was broken from
the start because relnatts includes dropped columns while the synthesized list
excludes them.
The synthesis in b7ae0395369 tried to make "no column list" and "explicit list
of all columns" look identical. But they have genuinely different
semantics:
- No column list (NULL): all current and future columns are replicated.
ALTER TABLE ADD COLUMN automatically replicates the new column.
- Explicit full list: only the named columns are replicated. New columns
are NOT replicated until the publication is explicitly altered.
By making them indistinguishable, the synthesis hid a real conflict from users
who had a table in two publications with different column semantics on the
same subscription. I am proposing a fix that restores the distinction and correctly
(IMO) surfaces this conflict.
The fix is to stop synthesizing the full column list in pg_get_publication_tables().
When prattrs is NULL in the catalog, let attrs remain NULL in the SRF output.
Remove the buggy CASE WHEN heuristic in tablesync.c since it is no longer
needed.
There is one scenario where there is a change for users: one pub no list + one pub explicit all columns. Anyone with that specific configuration will see a new error on the
When a table is added to a publication without a column list:
ALTER PUBLICATION my_pub ADD TABLE my_table;
Any new column added to my_table will automatically be replicated. The
underlying catalog (pg_publication_rel.prattrs) stores NULL for this case.
When a table is added with an explicit column list:
ALTER PUBLICATION my_pub ADD TABLE my_table (id, name, status);
New columns will NOT be replicated until the publication is explicitly altered.
The catalog stores an int2vector of the specified attribute numbers.
The problem is that pg_get_publication_tables() in pg_publication.c (the set-returning
function backing the pg_publication_tables view) erases this distinction.
When prattrs is NULL, it opens the table, iterates all eligible attributes,
and builds a synthetic int2vector of all current columns. The view then shows
the same attnames output for both cases.
| Scenario | prattrs in catalog | attrs from SRF | attnames in view |
|--------------------------|--------------------|---------------------|------------------|
| column list (a, b) | {1, 2 } | {1,2} | {a, b} |
| No column list | NULL | {1,2,3,...} (synth) | {a, b, c, ...} |
| FOR ALL TABLES/IN SCHEMA | no catalog row | {1,2,3,...} (synth) | {a, b, c, ...} |
This is a problem for operations: schema migrations that add columns to published
tables may or may not replicate depending on how the publication was originally
defined, and the only way to check is querying pg_publication_rel directly.
Additionally, tablesync.c has a workaround heuristic that tries to reverse
this synthesis by comparing array_length(gpt.attrs, 1) against c.relnatts.
This heuristic is buggy: relnatts counts all user attributes including dropped
columns, but the synthesized list excludes dropped columns and conditionally
excludes generated columns. So for any table with a dropped column, the
heuristic incorrectly treats "no column list" as if an explicit list were
specified.
The problematic code is the result of commits that incrementally
built on each other, with later ones introducing workarounds for side effects
of earlier ones:
1. fd0b9dcebda (06/2022) Amit Kapila (author: Hou Zhijie)
Prohibit combining publications with different column lists
Added the subscriber-side check that prevents combining publications with
different column lists for the same table. The subscriber runs
SELECT DISTINCT gpt.attrs and expects at most one row. To handle the fact
that pg_get_publication_tables() returned NULL for no-column-list
publications (which would always be distinct from any explicit list), this
commit introduced the relnatts heuristic in tablesync.c:
CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)
THEN NULL ELSE gpt.attrs END
The idea was: if the SRF returns a synthesized list whose length equals
relnatts, it must be the "all columns" case, so collapse it back to NULL.
But at this point the SRF didn't synthesize anything yet, so the heuristic was anticipating the next commit.
2. b7ae0395369 (01/2023) Amit Kapila (author: Shi yu)
Ignore dropped and generated columns from the column list
This is the commit that introduced the synthesis loop. It "solved" a problem
where a table in two publications (one with a column list naming all columns,
one without a column list) was erroring with "cannot use different column lists"
because one returned NULL and the other returned an int2vector. The fix was
to synthesize a column list when none was specified, filtering out dropped
and generated columns, so that both publications would produce identical
int2vectors and DISTINCT would collapse them to one row.
But this synthesis made it impossible to distinguish "all columns" from
"explicit list of all columns" in the view. And the relnatts heuristic from
fd0b9dcebda - which was supposed to reverse the synthesis - was broken from
the start because relnatts includes dropped columns while the synthesized list
excludes them.
The synthesis in b7ae0395369 tried to make "no column list" and "explicit list
of all columns" look identical. But they have genuinely different
semantics:
- No column list (NULL): all current and future columns are replicated.
ALTER TABLE ADD COLUMN automatically replicates the new column.
- Explicit full list: only the named columns are replicated. New columns
are NOT replicated until the publication is explicitly altered.
By making them indistinguishable, the synthesis hid a real conflict from users
who had a table in two publications with different column semantics on the
same subscription. I am proposing a fix that restores the distinction and correctly
(IMO) surfaces this conflict.
The fix is to stop synthesizing the full column list in pg_get_publication_tables().
When prattrs is NULL in the catalog, let attrs remain NULL in the SRF output.
Remove the buggy CASE WHEN heuristic in tablesync.c since it is no longer
needed.
There is one scenario where there is a change for users: one pub no list + one pub explicit all columns. Anyone with that specific configuration will see a new error on the
next tablesync or subscription refresh after upgrading. The fix for those users is to
either remove the explicit column list from the second publication (making both "all
columns"), or keep the difference and use separate subscriptions.
AFAICT this is safe except for the change in behavior I describe above. psql and pg_dump query pg_publication_rel directly and not the affected view. New subscriber
talking to old publisher still works (old pub synthesizes a list, new sub handles it).
Old subscriber talking to new publisher also works (the old CASE WHEN evaluates
array_length(NULL, 1) which returns NULL, falling through to the ELSE branch
returning NULL).
Attached is a patch with the fix, including updates to documentation, 7 updated
regression tests where existing expected outputs where attnames changes from
{a} to NULL, and added new test cases that verify both cases are distinguishable in
the same query.
Roberto Mello
Snowflake
Вложения
Re: pg_publication_tables: return NULL attnames when no column list is specified
От
Roberto Mello
Дата:
On Wed, Mar 25, 2026 at 5:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
FYI - the patch failed to apply for me because of publication.out issues.
There appear to be some missing trailing spaces in the result table headers
Apologies.
Generated a v2 patch. Applied cleanly and tests passed.
Roberto Mello
Snowflake
Вложения
Re: pg_publication_tables: return NULL attnames when no column list is specified
От
Álvaro Herrera
Дата:
On 2026-Mar-25, Roberto Mello wrote: > The problem is that pg_get_publication_tables() in pg_publication.c > (the set-returning function backing the pg_publication_tables view) > erases this distinction. When prattrs is NULL, it opens the table, > iterates all eligible attributes, and builds a synthetic int2vector of > all current columns. The view then shows the same attnames output for > both cases. I agree that this is wrong. This distinction was explicitly discussed when the column-list feature was developed. I don't think we can backpatch the fix though, out of fear that we would break something for existing users; but we should definitely fix it for pg19. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "But static content is just dynamic content that isn't moving!" http://smylers.hates-software.com/2007/08/15/fe244d0c.html
Re: pg_publication_tables: return NULL attnames when no column list is specified
От
Ajin Cherian
Дата:
On Thu, Mar 26, 2026 at 11:49 AM Roberto Mello <roberto.mello@gmail.com> wrote:
>
> On Wed, Mar 25, 2026 at 5:32 PM Peter Smith <smithpb2250@gmail.com> wrote:
>>
>> FYI - the patch failed to apply for me because of publication.out issues.
>>
>> There appear to be some missing trailing spaces in the result table headers
>
>
> Apologies.
>
> Generated a v2 patch. Applied cleanly and tests passed.
Hello,
Tested the patch and the patch fixes the bug as described. My tests below:
postgres=# CREATE TABLE my_table (id int PRIMARY KEY, name text, status text);
CREATE TABLE
postgres=# ALTER TABLE my_table ADD COLUMN old_col text;
ALTER TABLE
postgres=# ALTER TABLE my_table DROP COLUMN old_col;
ALTER TABLE
postgres=# CREATE PUBLICATION pub_no_list FOR TABLE my_table;
CREATE PUBLICATION pub_explicit_all FOR TABLE my_table (id, name, status);
CREATE PUBLICATION
CREATE PUBLICATION
=========================
On head - without patch:
=========================
postgres=# SELECT pubname, attnames
FROM pg_publication_tables
WHERE tablename = 'my_table'
ORDER BY pubname;
pubname | attnames
------------------+------------------
pub_explicit_all | {id,name,status}
pub_no_list | {id,name,status}
(2 rows)
postgres=# SELECT p.pubname, pr.prattrs
FROM pg_publication p
LEFT JOIN pg_publication_rel pr
ON pr.prpubid = p.oid
JOIN pg_class c ON c.oid = pr.prrelid
WHERE c.relname = 'my_table'
ORDER BY p.pubname;
pubname | prattrs
------------------+---------
pub_explicit_all | 1 2 3
pub_no_list |
(2 rows)
postgres=# SELECT
p.pubname,
pr.prattrs AS raw_catalog_attrs,
gpt.attnames AS view_attnames,
c.relnatts,
array_length(gpt.attnames, 1) AS synth_len,
-- Replicating the buggy CASE WHEN heuristic from tablesync.c
CASE WHEN array_length(gpt.attnames, 1) = c.relnatts
THEN NULL
ELSE gpt.attnames
END AS heuristic_result
FROM pg_publication_tables gpt
JOIN pg_class c ON c.relname = gpt.tablename
JOIN pg_publication p ON p.pubname = gpt.pubname
LEFT JOIN pg_publication_rel pr
ON pr.prrelid = c.oid
AND pr.prpubid = p.oid
WHERE gpt.tablename = 'my_table'
ORDER BY p.pubname;
pubname | raw_catalog_attrs | view_attnames | relnatts |
synth_len | heuristic_result
------------------+-------------------+------------------+----------+-----------+------------------
pub_explicit_all | 1 2 3 | {id,name,status} | 4 |
3 | {id,name,status}
pub_no_list | | {id,name,status} | 4 |
3 | {id,name,status}
(2 rows)
---------------------------------------------------------------------
============
After patch:
============
postgres=# SELECT pubname, attnames
FROM pg_publication_tables
WHERE tablename = 'my_table'
ORDER BY pubname;
pubname | attnames
------------------+------------------
pub_explicit_all | {id,name,status}
pub_no_list |
(2 rows)
postgres=# SELECT p.pubname, pr.prattrs
FROM pg_publication p
LEFT JOIN pg_publication_rel pr ON pr.prpubid = p.oid
JOIN pg_class c ON c.oid = pr.prrelid
WHERE c.relname = 'my_table'
ORDER BY p.pubname;
pubname | prattrs
------------------+---------
pub_explicit_all | 1 2 3
pub_no_list |
(2 rows)
postgres=# SELECT
p.pubname,
pr.prattrs AS raw_catalog_attrs,
gpt.attnames AS view_attnames,
c.relnatts,
array_length(gpt.attnames, 1) AS synth_len,
-- Replicating the buggy CASE WHEN heuristic from tablesync.c
CASE WHEN array_length(gpt.attnames, 1) = c.relnatts
THEN NULL
ELSE gpt.attnames
END AS heuristic_result
FROM pg_publication_tables gpt
JOIN pg_class c ON c.relname = gpt.tablename
JOIN pg_publication p ON p.pubname = gpt.pubname
LEFT JOIN pg_publication_rel pr
ON pr.prrelid = c.oid
AND pr.prpubid = p.oid
WHERE gpt.tablename = 'my_table'
ORDER BY p.pubname;
pubname | raw_catalog_attrs | view_attnames | relnatts |
synth_len | heuristic_result
------------------+-------------------+------------------+----------+-----------+------------------
pub_explicit_all | 1 2 3 | {id,name,status} | 4 |
3 | {id,name,status}
pub_no_list | | | 4 |
|
(2 rows)
One observation from reviewing the patch: the test suite covers the
partial column list and dropped column cases well, but is missing a
test for the two-publication conflict scenario (one pub with no list +
one pub with an explicit list of all columns on the same table). That
is the breaking change called out in the original report and probably
deserves its own regression test with a comment explaining the
expected behavior change for users in that configuration.
Patch LGTM otherwise.
regards,
Ajin Cherian
Fujitsu Australia
Re: pg_publication_tables: return NULL attnames when no column list is specified
От
Roberto Mello
Дата:
On Thu, Mar 26, 2026 at 10:07 PM Ajin Cherian <itsajin@gmail.com> wrote:
Tested the patch and the patch fixes the bug as described. My tests below:
Thank you for the review.
<snip>
One observation from reviewing the patch: the test suite covers the
partial column list and dropped column cases well, but is missing a
test for the two-publication conflict scenario (one pub with no list +
one pub with an explicit list of all columns on the same table). That
is the breaking change called out in the original report and probably
deserves its own regression test with a comment explaining the
expected behavior change for users in that configuration.
Patch LGTM otherwise.
Good point. I know I worked on that, so I must have accidentally dropped it from the patch.
Will prepare a revision and submit.
Thanks again.
Roberto Mello
Snowflake
P.S.: I forgot to acknowledge and thank Greg Mullane for the initial review on the patch, so doing that now.
Re: pg_publication_tables: return NULL attnames when no column list is specified
От
Roberto Mello
Дата:
On Fri, Mar 27, 2026 at 11:27 AM Roberto Mello <roberto.mello@gmail.com> wrote:
On Thu, Mar 26, 2026 at 10:07 PM Ajin Cherian <itsajin@gmail.com> wrote:One observation from reviewing the patch: the test suite covers the
partial column list and dropped column cases well, but is missing a
test for the two-publication conflict scenario (one pub with no list +
one pub with an explicit list of all columns on the same table). That
is the breaking change called out in the original report and probably
deserves its own regression test with a comment explaining the
expected behavior change for users in that configuration.
Patch LGTM otherwise.Good point. I know I worked on that, so I must have accidentally dropped it from the patch.Will prepare a revision and submit.
Here's v3 of the patch with the missing test.
Roberto Mello
Snowflake