Обсуждение: Long running backup preventing auto vacuum

Поиск
Список
Период
Сортировка

Long running backup preventing auto vacuum

От
Nikhil Shetty
Дата:
Hi,

PostgreSQL version - 11.7

We are using wal-g for backup/restore. We had a duration of high WAL generation due to application usage due to which archive backup was delayed. 

Backup was started through cron schedule and it didn't complete because it waits for the last WAL to be archived. Now, I know that long running transactions will prevent vacuuming of dead tuples but in this case we were able to run manual vacuums successfully but auto-vacuum never starts.

As soon as we terminated the long running backup session, the auto-vacuum started kicking in.

Wanted to understand why a backup operation is blocking the auto-vacuum?

Wal-g uses non-exclusive backups and waits for the last wal to be archived.

Thanks,
Nikhil

Re: Long running backup preventing auto vacuum

От
Andrey Borodin
Дата:
Hi!

> On 30 Jun 2022, at 14:23, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
>
> Hi,
>
> PostgreSQL version - 11.7
>
> We are using wal-g for backup/restore. We had a duration of high WAL generation due to application usage due to which
archivebackup was delayed.  
>
> Backup was started through cron schedule and it didn't complete because it waits for the last WAL to be archived.
Now,I know that long running transactions will prevent vacuuming of dead tuples but in this case we were able to run
manualvacuums successfully but auto-vacuum never starts. 
>
> As soon as we terminated the long running backup session, the auto-vacuum started kicking in.
>
> Wanted to understand why a backup operation is blocking the auto-vacuum?
>
> Wal-g uses non-exclusive backups and waits for the last wal to be archived.


As far as I know, WAL-G should not prevent running autovacuums on tables. WAL-G can be run against replication standby,
wherevacuum is not possible at all, so there must be no conflicts... 

The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a
statement_timeout.I think this is superflous and unneeded. I think this tx can be safely removed. But anyway, this tx
doesnot have a xid (otherwise it could not run on Standby). 

BTW which version of WAL-G do you use?

Best regards, Andrey Borodin.

[0]
https://github.com/wal-g/wal-g/blob/8b9b3ca06efb44be4724208dc6f6d11836bd34d9/internal/databases/postgres/queryRunner.go#L243


Re: Long running backup preventing auto vacuum

От
Nikhil Shetty
Дата:
Hi Andrey,

./wal-g7 --version

wal-g version v1.1.2-rc 6af461f 2021.12.14_08:19:15 PostgreSQL


The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout
Do you think this will prevent the autovacuum from kicking in ? 

I think this tx can be safely removed
How can I remove this? I can test whether this is the problem 

Thanks,
Nikhil

On Thu, Jun 30, 2022 at 5:18 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi!

> On 30 Jun 2022, at 14:23, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
>
> Hi,
>
> PostgreSQL version - 11.7
>
> We are using wal-g for backup/restore. We had a duration of high WAL generation due to application usage due to which archive backup was delayed.
>
> Backup was started through cron schedule and it didn't complete because it waits for the last WAL to be archived. Now, I know that long running transactions will prevent vacuuming of dead tuples but in this case we were able to run manual vacuums successfully but auto-vacuum never starts.
>
> As soon as we terminated the long running backup session, the auto-vacuum started kicking in.
>
> Wanted to understand why a backup operation is blocking the auto-vacuum?
>
> Wal-g uses non-exclusive backups and waits for the last wal to be archived.


As far as I know, WAL-G should not prevent running autovacuums on tables. WAL-G can be run against replication standby, where vacuum is not possible at all, so there must be no conflicts...

The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout. I think this is superflous and unneeded. I think this tx can be safely removed. But anyway, this tx does not have a xid (otherwise it could not run on Standby).

BTW which version of WAL-G do you use?

Best regards, Andrey Borodin.

[0] https://github.com/wal-g/wal-g/blob/8b9b3ca06efb44be4724208dc6f6d11836bd34d9/internal/databases/postgres/queryRunner.go#L243

Re: Long running backup preventing auto vacuum

От
Nikhil Shetty
Дата:
Hi Andrey,

One more thing to add is that, vacuum will not run for catalog tables because of long running pg_stop_backup query

Query:

datname        |  pid  | query_start                   | state  |     runtime     |                            query      

----------------------+-------+---------+------------------+-------------+-----------------+-------------------------------+-----------------------------

 postgres      | 15088 | 2022-07-01 05:56:22.729101+00 | active | 02:12:56.255367 | SELECT labelfile, spcmapfile, lsn FROM pg_stop_backup(false)


Example Vacuum:


<dbname>=# vacuum verbose analyze pg_database;

INFO:  vacuuming "pg_catalog.pg_database"

INFO:  "pg_database": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages

DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 996487475

There were 0 unused item pointers.

Skipped 0 pages due to buffer pins, 0 frozen pages.

0 pages are entirely empty.

CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

INFO:  analyzing "pg_catalog.pg_database"

INFO:  "pg_database": scanned 1 of 1 pages, containing 5 live rows and 0 dead rows; 5 rows in sample, 5 estimated total rows

VACUUM



Autovacuum from logs of previous issue (timestamp does not match because this is taken from last week logs):


022-06-26 00:00:21 UTC [32632]: [2-1] user=,db=,app=,client= HINT:  Close open transactions soon to avoid wraparound problems.

You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

2022-06-26 00:00:21 UTC [32632]: [3-1] user=,db=,app=,client= LOG:  automatic aggressive vacuum of table "postgres.pg_catalog.pg_statistic": index scans: 0

pages: 0 removed, 40 remain, 0 skipped due to pins, 0 skipped frozen

tuples: 0 removed, 1064 remain, 577 are dead but not yet removable, oldest xmin: 4200201112

buffer usage: 57 hits, 43 misses, 0 dirtied

avg read rate: 203.846 MB/s, avg write rate: 0.000 MB/s

system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

2022-06-26 00:00:21 UTC [32632]: [4-1] user=,db=,app=,client= WARNING:  oldest xmin is far in the past

2022-06-26 00:00:21 UTC [32632]: [5-1] user=,db=,app=,client= HINT:  Close open transactions soon to avoid wraparound problems.


Both xmin point to the long running pg_stop_backup. Not quite sure, but would this cause autovacuum to not kick-in on application tables?


Thanks,

Nikhil


On Fri, Jul 1, 2022 at 1:11 PM Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi Andrey,

./wal-g7 --version

wal-g version v1.1.2-rc 6af461f 2021.12.14_08:19:15 PostgreSQL


The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout
Do you think this will prevent the autovacuum from kicking in ? 

I think this tx can be safely removed
How can I remove this? I can test whether this is the problem 

Thanks,
Nikhil

On Thu, Jun 30, 2022 at 5:18 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi!

> On 30 Jun 2022, at 14:23, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
>
> Hi,
>
> PostgreSQL version - 11.7
>
> We are using wal-g for backup/restore. We had a duration of high WAL generation due to application usage due to which archive backup was delayed.
>
> Backup was started through cron schedule and it didn't complete because it waits for the last WAL to be archived. Now, I know that long running transactions will prevent vacuuming of dead tuples but in this case we were able to run manual vacuums successfully but auto-vacuum never starts.
>
> As soon as we terminated the long running backup session, the auto-vacuum started kicking in.
>
> Wanted to understand why a backup operation is blocking the auto-vacuum?
>
> Wal-g uses non-exclusive backups and waits for the last wal to be archived.


As far as I know, WAL-G should not prevent running autovacuums on tables. WAL-G can be run against replication standby, where vacuum is not possible at all, so there must be no conflicts...

The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout. I think this is superflous and unneeded. I think this tx can be safely removed. But anyway, this tx does not have a xid (otherwise it could not run on Standby).

BTW which version of WAL-G do you use?

Best regards, Andrey Borodin.

[0] https://github.com/wal-g/wal-g/blob/8b9b3ca06efb44be4724208dc6f6d11836bd34d9/internal/databases/postgres/queryRunner.go#L243

Re: Long running backup preventing auto vacuum

От
Andrey Borodin
Дата:

> 1 июля 2022 г., в 12:41, Nikhil Shetty <nikhil.dba04@gmail.com> написал(а):
>
> I think this tx can be safely removed
> How can I remove this? I can test whether this is the problem

I'v thought more about it and don't see any sense in removing tx from WAL-G code.
The very same virtual transaction would be started implicitly by "select pg_stop_backup()" SQL call. It will obtain a
snapshotholding global xmin. 

I don't think we can make anything here on WAL-G side, this is how Postgres backup API works.

But I'd recommend focusing on eliminating lag of the WAL archivation. Why did you have so many unarchived WALs? Maybe
itworth some monitoring etc. 

Thanks!

Best regards, Andrey Borodin.