Обсуждение: Long running backup preventing auto vacuum
Hi! > On 30 Jun 2022, at 14:23, Nikhil Shetty <nikhil.dba04@gmail.com> wrote: > > Hi, > > PostgreSQL version - 11.7 > > We are using wal-g for backup/restore. We had a duration of high WAL generation due to application usage due to which archivebackup was delayed. > > Backup was started through cron schedule and it didn't complete because it waits for the last WAL to be archived. Now,I know that long running transactions will prevent vacuuming of dead tuples but in this case we were able to run manualvacuums successfully but auto-vacuum never starts. > > As soon as we terminated the long running backup session, the auto-vacuum started kicking in. > > Wanted to understand why a backup operation is blocking the auto-vacuum? > > Wal-g uses non-exclusive backups and waits for the last wal to be archived. As far as I know, WAL-G should not prevent running autovacuums on tables. WAL-G can be run against replication standby, wherevacuum is not possible at all, so there must be no conflicts... The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout.I think this is superflous and unneeded. I think this tx can be safely removed. But anyway, this tx doesnot have a xid (otherwise it could not run on Standby). BTW which version of WAL-G do you use? Best regards, Andrey Borodin. [0] https://github.com/wal-g/wal-g/blob/8b9b3ca06efb44be4724208dc6f6d11836bd34d9/internal/databases/postgres/queryRunner.go#L243
./wal-g7 --version
wal-g version v1.1.2-rc 6af461f 2021.12.14_08:19:15 PostgreSQL
The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout
I think this tx can be safely removed
Hi!
> On 30 Jun 2022, at 14:23, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
>
> Hi,
>
> PostgreSQL version - 11.7
>
> We are using wal-g for backup/restore. We had a duration of high WAL generation due to application usage due to which archive backup was delayed.
>
> Backup was started through cron schedule and it didn't complete because it waits for the last WAL to be archived. Now, I know that long running transactions will prevent vacuuming of dead tuples but in this case we were able to run manual vacuums successfully but auto-vacuum never starts.
>
> As soon as we terminated the long running backup session, the auto-vacuum started kicking in.
>
> Wanted to understand why a backup operation is blocking the auto-vacuum?
>
> Wal-g uses non-exclusive backups and waits for the last wal to be archived.
As far as I know, WAL-G should not prevent running autovacuums on tables. WAL-G can be run against replication standby, where vacuum is not possible at all, so there must be no conflicts...
The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout. I think this is superflous and unneeded. I think this tx can be safely removed. But anyway, this tx does not have a xid (otherwise it could not run on Standby).
BTW which version of WAL-G do you use?
Best regards, Andrey Borodin.
[0] https://github.com/wal-g/wal-g/blob/8b9b3ca06efb44be4724208dc6f6d11836bd34d9/internal/databases/postgres/queryRunner.go#L243
datname | pid | query_start | state | runtime | query
----------------------+-------+---------+------------------+-------------+-----------------+-------------------------------+-----------------------------
postgres | 15088 | 2022-07-01 05:56:22.729101+00 | active | 02:12:56.255367 | SELECT labelfile, spcmapfile, lsn FROM pg_stop_backup(false)
Example Vacuum:
<dbname>=# vacuum verbose analyze pg_database;
INFO: vacuuming "pg_catalog.pg_database"
INFO: "pg_database": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 996487475
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: analyzing "pg_catalog.pg_database"
INFO: "pg_database": scanned 1 of 1 pages, containing 5 live rows and 0 dead rows; 5 rows in sample, 5 estimated total rows
VACUUM
Autovacuum from logs of previous issue (timestamp does not match because this is taken from last week logs):
022-06-26 00:00:21 UTC [32632]: [2-1] user=,db=,app=,client= HINT: Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
2022-06-26 00:00:21 UTC [32632]: [3-1] user=,db=,app=,client= LOG: automatic aggressive vacuum of table "postgres.pg_catalog.pg_statistic": index scans: 0
pages: 0 removed, 40 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 1064 remain, 577 are dead but not yet removable, oldest xmin: 4200201112
buffer usage: 57 hits, 43 misses, 0 dirtied
avg read rate: 203.846 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2022-06-26 00:00:21 UTC [32632]: [4-1] user=,db=,app=,client= WARNING: oldest xmin is far in the past
2022-06-26 00:00:21 UTC [32632]: [5-1] user=,db=,app=,client= HINT: Close open transactions soon to avoid wraparound problems.
Both xmin point to the long running pg_stop_backup. Not quite sure, but would this cause autovacuum to not kick-in on application tables?
Thanks,
Nikhil
Hi Andrey,./wal-g7 --version
wal-g version v1.1.2-rc 6af461f 2021.12.14_08:19:15 PostgreSQL
The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeoutDo you think this will prevent the autovacuum from kicking in ?I think this tx can be safely removedHow can I remove this? I can test whether this is the problemThanks,NikhilOn Thu, Jun 30, 2022 at 5:18 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:Hi!
> On 30 Jun 2022, at 14:23, Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
>
> Hi,
>
> PostgreSQL version - 11.7
>
> We are using wal-g for backup/restore. We had a duration of high WAL generation due to application usage due to which archive backup was delayed.
>
> Backup was started through cron schedule and it didn't complete because it waits for the last WAL to be archived. Now, I know that long running transactions will prevent vacuuming of dead tuples but in this case we were able to run manual vacuums successfully but auto-vacuum never starts.
>
> As soon as we terminated the long running backup session, the auto-vacuum started kicking in.
>
> Wanted to understand why a backup operation is blocking the auto-vacuum?
>
> Wal-g uses non-exclusive backups and waits for the last wal to be archived.
As far as I know, WAL-G should not prevent running autovacuums on tables. WAL-G can be run against replication standby, where vacuum is not possible at all, so there must be no conflicts...
The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout. I think this is superflous and unneeded. I think this tx can be safely removed. But anyway, this tx does not have a xid (otherwise it could not run on Standby).
BTW which version of WAL-G do you use?
Best regards, Andrey Borodin.
[0] https://github.com/wal-g/wal-g/blob/8b9b3ca06efb44be4724208dc6f6d11836bd34d9/internal/databases/postgres/queryRunner.go#L243
> 1 июля 2022 г., в 12:41, Nikhil Shetty <nikhil.dba04@gmail.com> написал(а): > > I think this tx can be safely removed > How can I remove this? I can test whether this is the problem I'v thought more about it and don't see any sense in removing tx from WAL-G code. The very same virtual transaction would be started implicitly by "select pg_stop_backup()" SQL call. It will obtain a snapshotholding global xmin. I don't think we can make anything here on WAL-G side, this is how Postgres backup API works. But I'd recommend focusing on eliminating lag of the WAL archivation. Why did you have so many unarchived WALs? Maybe itworth some monitoring etc. Thanks! Best regards, Andrey Borodin.