Обсуждение: Logical Replication - Should Destination Table Columns Be Defined With Default Value
Logical Replication - Should Destination Table Columns Be Defined With Default Value
Hi Experts,
I'm using logical replication and have the following open issue:
If the table on the publisher side has column with default value, should the table on the destination (subscriber) be defined with default value as well?
My intuition was not to define it with default value since it gets its values from publisher, but then I face the following issue that default value given to existing data when a column is added is not copied to the destination side.
****on source/publisher side ****
create table table1(id int primary key, a int);
insert into table1 select 1,1;
alter table table1 add column b int not null default 1;
select * from table1
output id,a, b:
1 1 1
**** on destination/subscriber side ***
create table table1(id int primary key, a int);
select * from table1
alter table table1 add column b int; -- I purposely defined it without default value
ALTER SUBSCRIPTION aaa REFRESH PUBLICATION
select * from table1
output id,a, b:
1 1 null
Why the (default) value given to column b for existing row is not synced to the subscriber. Of course, for new rows the default value is copied to subscriber table.
Was this done on purpose, that default value for new column is not copied for existing data? Does this mean that on destination side we must also define the table with default value?
If instead of the default on the publisher side I do the following it works and the value is copied to the subscriber.
do $$
begin
alter table table1 add column b int;
update table1 set b = 1;
ALTER TABLE table1 ALTER COLUMN b SET NOT NULL;
end $$;
Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value
Was this done on purpose, that default value for new column is not copied for existing data? Does this mean that on destination side we must also define the table with default value?
RE: Logical Replication - Should Destination Table Columns Be Defined With Default Value
Thanks David for the reply.
I also thought that default value on the subscriber side are immaterial. However, with the case I showed without having default value on subscriber side it get null when the following occurs:
- Table was created with two columns on publisher and subscriber side
- Data inserted into this table
- A third column is added to table with default value on publisher side, but without default value on subscriber side
- The default value column has value for existing rows on publisher, but null on the subscriber side.
- Doing refresh publication etc. does not help and the column on subscriber side remains with nulls
Your input is most welcome
From: David G. Johnston [mailto:david.g.johnston@gmail.com]
Sent: Tuesday, November 9, 2021 5:55 PM
To: Avi Weinberg <AviW@gilat.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value
On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg <AviW@gilat.com> wrote:
Was this done on purpose, that default value for new column is not copied for existing data? Does this mean that on destination side we must also define the table with default value?
Logical replication has two modes, initial synchronization and ongoing change push. The ongoing change push sends entire rows, when they change, from the publisher to subscriber.
The initial sync happens once, when the subscriber initially subscribes to the publication.
As entire rows are getting sent, defaults on the subscriber are immaterial so far as the published rows are concerned.
If you run a command on the publisher that causes every row to change then of course every row will be published with those new values to the subscriber.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg <AviW@gilat.com> wrote: >> Was this done on purpose, that default value for new column is not copied >> for existing data? Does this mean that on destination side we must also >> define the table with default value? > If you run a command on the publisher that causes every row to change then > of course every row will be published with those new values to the > subscriber. I think the important point here is that the ALTER ADD COLUMN operation *didn't* cause a physical update of every row on the publisher, thus nothing got sent to the subscriber. This is sort of annoying, because it is making what ought to be a purely internal optimization user-visible. Depending on the details of the column default (is it null, is it a constant) and which PG version you are talking about, there may or may not be a rewrite of the source table, and here we see that that is semantically visible to subscribers. I don't want to give up the aforesaid optimization --- it's a feature that you can do common forms of ALTER ADD COLUMN in O(1) time. But maybe we ought to document the implications for logical replication better. regards, tom lane
> On Nov 9, 2021, at 8:02 AM, Avi Weinberg <AviW@gilat.com> wrote: > > • A third column is added to table with default value on publisher side, but without default value on subscriber side > • The default value column has value for existing rows on publisher, but null on the subscriber side. See https://www.postgresql.org/docs/14/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN The important part is this TIP: From PostgreSQL 11, adding a column with a constant default value no longer means that each row of the table needs to beupdated when the ALTER TABLE statement is executed. Instead, the default value will be returned the next time the row isaccessed, and applied when the table is rewritten, making the ALTER TABLE very fast even on large tables. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value
> On Nov 9, 2021, at 1:24 PM, Michael Lewis <mlewis@entrata.com> wrote: > > Is there any advantage to not defining the default on the replica? If it is not a static value and the publishing databasewill trigger row updates, I could see waiting to set the default until after the table re-write is done, but otherwisethere doesn't seem to be any benefit to skipping column defaults on subscribers. That's a schema design decision. I can't really make recommendations in the abstract. As a general rule, I would expectthat the DDL executed on the publisher (such as the ALTER TABLE..ADD COLUMN..DEFAULT..) would be the same as that executedon the subscriber, unless there is a particular reason to want different behavior on the subscriber. Wanting different behavior is not unreasonable. For example, if the subscriber exists merely to archive data from the publisher,the subscriber might not bother creating indexes over that data, even if such indexes exist on the publisher. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company