Обсуждение: Logical Replication - Should Destination Table Columns Be Defined With Default Value

Поиск
Список
Период
Сортировка

Logical Replication - Should Destination Table Columns Be Defined With Default Value

От
Avi Weinberg
Дата:

Hi Experts,

 

I'm using logical replication and have the following open issue:

If the table on the publisher side has column with default value, should the table on the destination (subscriber) be defined with default value as well?

 

My intuition was not to define it with default value since it gets its values from publisher, but then I face the following issue that default value given to existing data when a column is added is not copied to the destination side. 

 

****on source/publisher side ****

 

create table table1(id int primary key, a int);

insert into table1 select 1,1;

alter table table1 add column b int not null default 1;

select * from table1

output id,a, b:

1              1              1

 

 

 

**** on destination/subscriber side ***

create table table1(id int primary key, a int);

select * from table1

alter table table1 add column b int;  -- I purposely defined it without default value

ALTER SUBSCRIPTION aaa REFRESH PUBLICATION

select * from table1

output id,a, b:

1              1              null

Why the (default) value given to column b for existing row is not synced to the subscriber.  Of course, for new rows the default value is copied to subscriber table.

 

Was this done on purpose, that default value for new column is not copied for existing data?  Does this mean that on destination side we must also define the table with default value?

 

 

If instead of the default on the publisher side I do the following it works and the value is copied to the subscriber.

 

do $$

begin

                alter table table1 add column b int;

                update table1 set b = 1;

    ALTER TABLE table1 ALTER COLUMN b SET NOT NULL;

end $$;

 

 

IMPORTANT - This email and any attachments is intended for the above named addressee(s), and may contain information which is confidential or privileged. If you are not the intended recipient, please inform the sender immediately and delete this email: you should not copy or use this e-mail for any purpose nor disclose its contents to any person.

Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

От
"David G. Johnston"
Дата:
On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg <AviW@gilat.com> wrote:

Was this done on purpose, that default value for new column is not copied for existing data?  Does this mean that on destination side we must also define the table with default value? 


Logical replication has two modes, initial synchronization and ongoing change push.  The ongoing change push sends entire rows, when they change, from the publisher to subscriber.

The initial sync happens once, when the subscriber initially subscribes to the publication.

As entire rows are getting sent, defaults on the subscriber are immaterial so far as the published rows are concerned.

If you run a command on the publisher that causes every row to change then of course every row will be published with those new values to the subscriber.

David J.

RE: Logical Replication - Should Destination Table Columns Be Defined With Default Value

От
Avi Weinberg
Дата:

Thanks David for the reply.

 

I also thought that default value on the subscriber side are immaterial.  However, with the case I showed without having default value on subscriber side it get null when the following occurs:

  1. Table was created with two columns on publisher and subscriber side
  2. Data inserted into this table
  3. A third column is added to table with default value on publisher side, but without default value on subscriber side
  4. The default value column has value for existing rows on publisher, but null on the subscriber side. 
  5. Doing refresh publication etc. does not help and the column on subscriber side remains with nulls

 

 

Your input is most welcome

 

 

From: David G. Johnston [mailto:david.g.johnston@gmail.com]
Sent: Tuesday, November 9, 2021 5:55 PM
To: Avi Weinberg <AviW@gilat.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

 

On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg <AviW@gilat.com> wrote:

Was this done on purpose, that default value for new column is not copied for existing data?  Does this mean that on destination side we must also define the table with default value? 

 

Logical replication has two modes, initial synchronization and ongoing change push.  The ongoing change push sends entire rows, when they change, from the publisher to subscriber.

 

The initial sync happens once, when the subscriber initially subscribes to the publication.

 

As entire rows are getting sent, defaults on the subscriber are immaterial so far as the published rows are concerned.

 

If you run a command on the publisher that causes every row to change then of course every row will be published with those new values to the subscriber.

 

David J.

 

IMPORTANT - This email and any attachments is intended for the above named addressee(s), and may contain information which is confidential or privileged. If you are not the intended recipient, please inform the sender immediately and delete this email: you should not copy or use this e-mail for any purpose nor disclose its contents to any person.

Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg <AviW@gilat.com> wrote:
>> Was this done on purpose, that default value for new column is not copied
>> for existing data?  Does this mean that on destination side we must also
>> define the table with default value?

> If you run a command on the publisher that causes every row to change then
> of course every row will be published with those new values to the
> subscriber.

I think the important point here is that the ALTER ADD COLUMN operation
*didn't* cause a physical update of every row on the publisher, thus
nothing got sent to the subscriber.

This is sort of annoying, because it is making what ought to be a purely
internal optimization user-visible.  Depending on the details of the
column default (is it null, is it a constant) and which PG version you
are talking about, there may or may not be a rewrite of the source table,
and here we see that that is semantically visible to subscribers.

I don't want to give up the aforesaid optimization --- it's a feature that
you can do common forms of ALTER ADD COLUMN in O(1) time.  But maybe we
ought to document the implications for logical replication better.

            regards, tom lane



Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

От
Mark Dilger
Дата:

> On Nov 9, 2021, at 8:02 AM, Avi Weinberg <AviW@gilat.com> wrote:
>
>     • A third column is added to table with default value on publisher side, but without default value on subscriber
side
>     • The default value column has value for existing rows on publisher, but null on the subscriber side.

See https://www.postgresql.org/docs/14/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN

The important part is this TIP:

From PostgreSQL 11, adding a column with a constant default value no longer means that each row of the table needs to
beupdated when the ALTER TABLE statement is executed. Instead, the default value will be returned the next time the row
isaccessed, and applied when the table is rewritten, making the ALTER TABLE very fast even on large tables. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

От
Michael Lewis
Дата:
Is there any advantage to not defining the default on the replica? If it is not a static value and the publishing database will trigger row updates, I could see waiting to set the default until after the table re-write is done, but otherwise there doesn't seem to be any benefit to skipping column defaults on subscribers.

Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

От
Mark Dilger
Дата:

> On Nov 9, 2021, at 1:24 PM, Michael Lewis <mlewis@entrata.com> wrote:
>
> Is there any advantage to not defining the default on the replica? If it is not a static value and the publishing
databasewill trigger row updates, I could see waiting to set the default until after the table re-write is done, but
otherwisethere doesn't seem to be any benefit to skipping column defaults on subscribers. 

That's a schema design decision.  I can't really make recommendations in the abstract.  As a general rule, I would
expectthat the DDL executed on the publisher (such as the ALTER TABLE..ADD COLUMN..DEFAULT..) would be the same as that
executedon the subscriber, unless there is a particular reason to want different behavior on the subscriber. 

Wanting different behavior is not unreasonable.  For example, if the subscriber exists merely to archive data from the
publisher,the subscriber might not bother creating indexes over that data, even if such indexes exist on the publisher.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company