Обсуждение: BUG #18027: Logical replication taking forever
The following bug has been logged on the website: Bug reference: 18027 Logged by: Andres Martin del Campo Campos Email address: andres@invisible.email PostgreSQL version: 13.2 Operating system: PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled b Description: I'm trying to use logical replication to replicate a table of 69GB it's been a week and it hasn't synced but the table says (in the database where I'm replicating this table) is now 400GB and I'm running out of space. What's wrong, first I thought it was because of all the WAL logs but I'm now assuming it's something else. Anyone has experienced something like this? You help is much appreciated.
On Tue, Jul 18, 2023 at 11:08 AM PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 18027 > Logged by: Andres Martin del Campo Campos > Email address: andres@invisible.email > PostgreSQL version: 13.2 > Operating system: PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled b > Description: > > I'm trying to use logical replication to replicate a table of 69GB it's been > a week and it hasn't synced but the table says (in the database where I'm > replicating this table) is now 400GB and I'm running out of space. What's > wrong, first I thought it was because of all the WAL logs but I'm now > assuming it's something else. > This sounds a bit unusual. Have you verified the size of tables on both publisher and subscriber? Can you once show the values of pg_subscription_rel for this table? Are there any operations happening on this table either on publisher or subscriber? -- With Regards, Amit Kapila.
Thank you so much for replying, I'm really struggling here.

I ran out of space and I thought I could re-start it to see if it worked but again it's been more than a day and it isn't showing.
I ran this query and it shows that the table is still copying: 
SELECT s.subname AS subscription_name,
c.relnamespace::regnamespace::text as table_schema,
c.relname as table_name,
rel.srsublsn,
case rel.srsubstate
when 'i' then 'initialized'
when 'd' then 'copying'
when 's' then 'synchronized'
when 'r' then 'ready'
end as state
FROM pg_catalog.pg_subscription s
JOIN pg_catalog.pg_subscription_rel rel ON rel.srsubid = s.oid
JOIN pg_catalog.pg_class c on c.oid = rel.srrelid;

If I check the pg_stat_activity on the publication I also see the backend_type = 'walsender'

In the publication the table size is 70G as you can see in the screenshot below 
but in the subscription, it's already 124G and still copying 😭  any suggestions?

Thanks again for your support, Amit 
On Tue, Jul 18, 2023 at 10:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jul 18, 2023 at 11:08 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 18027
> Logged by: Andres Martin del Campo Campos
> Email address: andres@invisible.email
> PostgreSQL version: 13.2
> Operating system: PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled b
> Description:
>
> I'm trying to use logical replication to replicate a table of 69GB it's been
> a week and it hasn't synced but the table says (in the database where I'm
> replicating this table) is now 400GB and I'm running out of space. What's
> wrong, first I thought it was because of all the WAL logs but I'm now
> assuming it's something else.
>
This sounds a bit unusual. Have you verified the size of tables on
both publisher and subscriber? Can you once show the values of
pg_subscription_rel for this table? Are there any operations happening
on this table either on publisher or subscriber?
--
With Regards,
Amit Kapila.
Вложения
On Wed, 19 Jul 2023 at 14:06, Andres Martin del Campo Campos <andres@invisible.email> wrote: > > Thank you so much for replying, I'm really struggling here. > > I ran out of space and I thought I could re-start it to see if it worked but again it's been more than a day and it isn'tshowing. > Can you run this query few times on the publisher and see if bytes_processed and tuples_processed are getting increased(this might give us some hint if it stuck or it is progressing slowly): SELECT * FROM pg_stat_progress_copy; If it is not progressing, is there anything unusual in the log file, some warning or error messages? if so could you post those log contents too. Also if it is stuck, how many tuples are copied and how much is the total tuple count? Regards, Vignesh
Seems like I don't have that table
 

There are no errors in the logs but I only see dead tuples and no live tuples
On Wed, Jul 19, 2023 at 5:07 AM vignesh C <vignesh21@gmail.com> wrote:
On Wed, 19 Jul 2023 at 14:06, Andres Martin del Campo Campos
<andres@invisible.email> wrote:
>
> Thank you so much for replying, I'm really struggling here.
>
> I ran out of space and I thought I could re-start it to see if it worked but again it's been more than a day and it isn't showing.
>
Can you run this query few times on the publisher and see if
bytes_processed and tuples_processed are getting increased(this might
give us some hint if it stuck or it is progressing slowly):
SELECT * FROM pg_stat_progress_copy;
If it is not progressing, is there anything unusual in the log file,
some warning or error messages? if so could you post those log
contents too.
Also if it is stuck, how many tuples are copied and how much is the
total tuple count?
Regards,
Vignesh
Вложения
On Thu, 20 Jul 2023 at 22:46, Andres Martin del Campo Campos <andres@invisible.email> wrote: > > Seems like I don't have that table > > > > > > There are no errors in the logs but I only see dead tuples and no live tuples Sorry my bad, this is available only from PG14, it is not available in PG13. Regards, Vignesh
On Thu, Jul 20, 2023 at 10:46 PM Andres Martin del Campo Campos <andres@invisible.email> wrote:
Seems like I don't have that tableThere are no errors in the logs but I only see dead tuples and no live tuples
oh, can you show us the dead and live tuple count on both publisher and subscriber? Ideally, COPY command should only copy the recent data based on the snapshot. It shouldn't copy the old/dead rows. One possibility I could think of is that due to some reason, if there is a failure during the initial sync process, it will ROLLBACK the whole copy and restart it again. So, that way one can see the table is growing with dead tuples and the copy is never finished especially if such an error occurs repeatedly. If that happens, you must see some error in the subscriber-side logs. Can you ensure in some way that such a phenomenon is not happening in your case?
--
With Regards,
Amit Kapila.
Вложения
Thank you Amit!  

Here's the publisher:

Here's the subscriber:

I don't see any errors in the logs but if you have time I would love to schedule a quick meeting with you and compensate you for your time of course.
Let me know if you are available
On Fri, Jul 21, 2023 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jul 20, 2023 at 10:46 PM Andres Martin del Campo Campos <andres@invisible.email> wrote:Seems like I don't have that tableThere are no errors in the logs but I only see dead tuples and no live tuplesoh, can you show us the dead and live tuple count on both publisher and subscriber? Ideally, COPY command should only copy the recent data based on the snapshot. It shouldn't copy the old/dead rows. One possibility I could think of is that due to some reason, if there is a failure during the initial sync process, it will ROLLBACK the whole copy and restart it again. So, that way one can see the table is growing with dead tuples and the copy is never finished especially if such an error occurs repeatedly. If that happens, you must see some error in the subscriber-side logs. Can you ensure in some way that such a phenomenon is not happening in your case?--With Regards,Amit Kapila.

