Обсуждение: [ADMIN] WAL segement issues on both master and slave server

Поиск
Список
Период
Сортировка

[ADMIN] WAL segement issues on both master and slave server

От
Chris Kim
Дата:
Hi there,

I am running into an issue with the number of files that reside on the pg_xlog directory of my compliance database server (This one is the master server in our master-slave setup). Sometime earlier this year, I modified the location of the PITR directory and that caused an issue with WAL segments not being sent to the correct location and crashing the DB. I went ahead and fixed that up so that it points to the correct location but since then the number of files on the pg_xlog directory went up from around 898 to 1025. I didn't have a chance to look in to this issue until now so my question is do you know if there is an easy way to clean up some of these files in the pg_xlog directory safely? I believe that there might be some orphaned files there and would like to clean those up.

Also, on the Standby, the pg_xlog directory appears like it is growing on a daily basis. The WAL files are being cleaned up but I don't believe at a fast enough rate. This directory is approximately over 650GB in size and I would like to revisit if any of the parameters will need to be changed in the postgresql.conf file since it's almost 5 years since I last touched this.

Let me know if you need more details to clarify.

Thanks.

Re: [ADMIN] WAL segement issues on both master and slave server

От
Payal Singh
Дата:
On Thu, Oct 19, 2017 at 1:25 PM, Chris Kim <chrisk@propaas.com> wrote:
Hi there,

I am running into an issue with the number of files that reside on the pg_xlog directory of my compliance database server (This one is the master server in our master-slave setup). Sometime earlier this year, I modified the location of the PITR directory and that caused an issue with WAL segments not being sent to the correct location and crashing the DB. I went ahead and fixed that up so that it points to the correct location but since then the number of files on the pg_xlog directory went up from around 898 to 1025. I didn't have a chance to look in to this issue until now so my question is do you know if there is an easy way to clean up some of these files in the pg_xlog directory safely? I believe that there might be some orphaned files there and would like to clean those up.

How is the replication being done? Is the replica in sync with master? Check for lag on replica and replication byte lag on master, and if they are in sync, an `ls -l | less` in wal directory should show you which older files are being kept. Do check in both master and replica postgres and archive logs for any ERROR or FATAL messages before you remove any files though. As an extra precaution, you can just move the older files to another location where postgres can't access it, and if something breaks, you can move them back. If all looks good after moving, you can delete the files you moved. 

Would highly recommend having a monitor in place to track # of WALs in the WAL directory and alerting if too high. 
 

Also, on the Standby, the pg_xlog directory appears like it is growing on a daily basis. The WAL files are being cleaned up but I don't believe at a fast enough rate. This directory is approximately over 650GB in size and I would like to revisit if any of the parameters will need to be changed in the postgresql.conf file since it's almost 5 years since I last touched this.

Let me know if you need more details to clarify.

Thanks.


Again, this might be a sign that replication is lagging. If your cleanup command is correct and related logs have nothing suspicious, checking the replication lag would be a good first step to determine the cause. 

 
Thanks,
Payal Singh,
Database Administrator,
OmniTI Computer Consulting Inc.
Phone: 240.646.0770 x 253

Re: [ADMIN] WAL segement issues on both master and slave server

От
Laurenz Albe
Дата:
Chris Kim wrote:
> I am running into an issue with the number of files that reside on the
> pg_xlog directory of my compliance database server (This one is the
> master server in our master-slave setup). Sometime earlier this year,
> I modified the location of the PITR directory and that caused an issue
> with WAL segments not being sent to the correct location and crashing
> the DB. I went ahead and fixed that up so that it points to the correct
> location but since then the number of files on the pg_xlog directory
> went up from around 898 to 1025. I didn't have a chance to look in
> to this issue until now so my question is do you know if there is
> an easy way to clean up some of these files in the pg_xlog directory
> safely? I believe that there might be some orphaned files there and
> would like to clean those up.

You should never remove files manually from pg_xlog.

Look at "pg_stat_archiver" to see what's going on with archiving.
Is it behind schedule?

There are several settings that can cause pg_xlog to grow:
- very high wal_keep_segments
- very high checkpoint_segments

You probably have an old version of PostgreSQL if you didn't
touch the configuration in 5 years, but if not, you should also
look if there are active replication slots that keep WAL around.

> Also, on the Standby, the pg_xlog directory appears like it is growing
> on a daily basis. The WAL files are being cleaned up but I don't
> believe at a fast enough rate. This directory is approximately
> over 650GB in size and I would like to revisit if any of the parameters
> will need to be changed in the postgresql.conf file since it's almost
> 5 years since I last touched this.

Look at the "pg_stat_replication" view in the primary to see how
replication is doing.

Yours,
Laurenz Albe


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin