Обсуждение: Too many WAL(s) despite low transaction

Поиск
Список
Период
Сортировка

Too many WAL(s) despite low transaction

От
Selva manickaraja
Дата:
Dear All,

We just migrated our system from Informix/GeneroDB to PostgreSQL 9.03. We are monitoring the situation very closely.

Everything seems to be OK except our concern on the build up of WAL files. You see, we cut off from the previous database around 3.30AM. After that we directed our applications to point to PostgreSQL. We already had a Standby machine A up and running as a Secondary for log shipping and replication. We also setup another machine B just to receive the WAL(s) and store it for us to backup later for recovery purposes. As there are no PostgreSQL running in that machine no pg_archive_cleanup is in effect. Our plan is to backup the WAL(s) every 12 hours and then purge them all. But at the rate the logs are building up at machine B, we would possibly ran out of space very soon.

But this is the point to note. There is very little transaction taking place and the checkpoint_timeout is set to 35minutes. Despite that the WAL logs are getting shipped every minute.

Any clue as to what could be the problem?

Thank you.

Regards,

Selvam


Re: Too many WAL(s) despite low transaction

От
Stephen Frost
Дата:
* Selva manickaraja (mavles78@gmail.com) wrote:
> Any clue as to what could be the problem?

Can you provide your postgresql.conf?  Little hard to guess at possible
mistakes there without it..

Is it exactly once a minute?

    Stephen

Вложения

Re: Too many WAL(s) despite low transaction

От
Selva manickaraja
Дата:
Hi,

I'm attaching the file here. As of now it is 874 files within 3 hours.

Kindly assist.

Thanking in advance.

Regards,

Selvam


On Fri, Apr 1, 2011 at 8:42 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Selva manickaraja (mavles78@gmail.com) wrote:
> Any clue as to what could be the problem?

Can you provide your postgresql.conf?  Little hard to guess at possible
mistakes there without it..

Is it exactly once a minute?

       Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAk2VH4AACgkQrzgMPqB3kihY6ACgkIV9FdfUsT61eKWhzZD0W5SB
vp0AnRXjpsRZ/YZgVcByRppUvldAzBtE
=2gv8
-----END PGP SIGNATURE-----


Вложения

Re: Too many WAL(s) despite low transaction

От
Stephen Frost
Дата:
* Selva manickaraja (mavles78@gmail.com) wrote:
> I'm attaching the file here. As of now it is 874 files within 3 hours.

You have archive_timeout set to '5'.  That means "rotate a log file
every 5 seconds".  You probably want to increase that.

    Thanks

        Stephen

Вложения

Re: Too many WAL(s) despite low transaction

От
Selva manickaraja
Дата:
If our check_timeout is 30 minutes, what would be an acceptable time limit for archive_timeout?

Also since bulk loading/migration of large amount of data was done earlier, do I need to run vacuum etc.

I will wait for your reply.

Thank you.

Regards,

Selvam

On Fri, Apr 1, 2011 at 9:03 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Selva manickaraja (mavles78@gmail.com) wrote:
> I'm attaching the file here. As of now it is 874 files within 3 hours.

You have archive_timeout set to '5'.  That means "rotate a log file
every 5 seconds".  You probably want to increase that.

       Thanks

               Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAk2VJGgACgkQrzgMPqB3kigM7QCfZ0qTMlLfCpksmAXogWo80ezn
QkQAn1VT2I+y+eMeaoIFEMdeEsnN+uN5
=lynD
-----END PGP SIGNATURE-----


Re: Too many WAL(s) despite low transaction

От
Stephen Frost
Дата:
Selva,

* Selva manickaraja (mavles78@gmail.com) wrote:
> If our check_timeout is 30 minutes, what would be an acceptable time limit
> for archive_timeout?

They're two different things.  Checkpoints are about getting data
flushed out to the data files (so they're not just in the WALs),
archive_timeout is about how often WAL segments should be forcibly
archived (so that the archive server doesn't end up missing data on
low-write systems).

Typically, I'd pick archive_timeout of around 5m or 10m, depending on
how much time you don't mind losing.  I'd also compress the WALs (on a
low-write system, they're going to have very little data in them).
There's also a utility out there, iirc, which will truncate WALs to
remove empty space.

> Also since bulk loading/migration of large amount of data was done earlier,
> do I need to run vacuum etc.

Erm, you should be running autovacuum..

    Stephen

Вложения

Re: Too many WAL(s) despite low transaction

От
Selva manickaraja
Дата:
Since the production database is running,  I plan to do now is this

1. Set archive_timeout = 20m (Does the change require db restart to take effect?)
2. Set  autovacuum=on and track_count=on (Does the change require db restart to take effect?)
    Does that mean we are running autovacuum?
3. Run VACUUM FREEZE ANALYZE since bulk loading was done earlier. (Can this be done while the db is active and on production?)

All 3 steps is to lower the WAL files that are being shipped out.

Is this a workable action to achieve the result required?

Please assist.

Thank you.

Regards,

Selvam

Re: Too many WAL(s) despite low transaction

От
Stephen Frost
Дата:
* Selva manickaraja (mavles78@gmail.com) wrote:
> 1. Set archive_timeout = 20m (Does the change require db restart to take
> effect?)

I *think* it can be changed with just a reload, but I'm not 100% sure.
Check your logs after doing the reload, it'll complain if it isn't able
to change that parameter on reload.

20m sounds reasonable, still would recommend compressing the WALs if
they're likely to be less than full (less than 16M of data in 20m).

> 2. Set  autovacuum=on and track_count=on (Does the change require db restart
> to take effect?)
>     Does that mean we are running autovacuum?

This is the default, so unless you changed the default, yes, it's
already running.

> 3. Run VACUUM FREEZE ANALYZE since bulk loading was done earlier. (Can this
> be done while the db is active and on production?)

Yes, you can freeze records while the DB is running (erm, I don't know
that you can run it w/o the DB running..).  I don't know that I'd jump
to running it right away though, unless you know that you need it...?

> All 3 steps is to lower the WAL files that are being shipped out.

Uhh, the only option that's going to affect that is the first one..

> Is this a workable action to achieve the result required?

You probably just need to change archive_timeout and reload the
database.  Well, you also need to go read the documentation, but that's
beside the point.

> Please assist.

Uhm, pretty sure I have been?

    Thanks,

        Stephen

Вложения

Re: Too many WAL(s) despite low transaction

От
Selva manickaraja
Дата:
Where you mentioned "after the reload" I suppose you meant restart right?

About compressing you mentioned iirc, but how do I use it? are there any examples. I read about pg_compress before. Is that same?

The configuration file shows that autovacuum=on and track_count=on to be commented out. That means that it is not running right? If that's the case, just uncommenting it now should get it working right?

OK, I'm going to hold on to the VACUUM FREEZE ANALYZE for time being.

Yes Stephen you have been extraordinarily helpful.

I will wait for your reply.


On Fri, Apr 1, 2011 at 10:22 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Selva manickaraja (mavles78@gmail.com) wrote:
> 1. Set archive_timeout = 20m (Does the change require db restart to take
> effect?)

I *think* it can be changed with just a reload, but I'm not 100% sure.
Check your logs after doing the reload, it'll complain if it isn't able
to change that parameter on reload.

20m sounds reasonable, still would recommend compressing the WALs if
they're likely to be less than full (less than 16M of data in 20m).

> 2. Set  autovacuum=on and track_count=on (Does the change require db restart
> to take effect?)
>     Does that mean we are running autovacuum?

This is the default, so unless you changed the default, yes, it's
already running.

> 3. Run VACUUM FREEZE ANALYZE since bulk loading was done earlier. (Can this
> be done while the db is active and on production?)

Yes, you can freeze records while the DB is running (erm, I don't know
that you can run it w/o the DB running..).  I don't know that I'd jump
to running it right away though, unless you know that you need it...?

> All 3 steps is to lower the WAL files that are being shipped out.

Uhh, the only option that's going to affect that is the first one..

> Is this a workable action to achieve the result required?

You probably just need to change archive_timeout and reload the
database.  Well, you also need to go read the documentation, but that's
beside the point.

> Please assist.

Uhm, pretty sure I have been?

       Thanks,

               Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAk2VNtEACgkQrzgMPqB3kig1ugCeMqz9PWDozSYpfVsJh4SxzitJ
EKAAmQFPiVurdCDNxW5YEKE4JICHHUFq
=mJol
-----END PGP SIGNATURE-----


Re: Too many WAL(s) despite low transaction

От
Stephen Frost
Дата:
* Selva manickaraja (mavles78@gmail.com) wrote:
> Where you mentioned "after the reload" I suppose you meant restart right?

I'm not sure offhand if it requires a reload or a restart, that's why I
suggested doing a reload than then checking the logs to see if a restart
is required.

> About compressing you mentioned iirc, but how do I use it? are there any
> examples. I read about pg_compress before. Is that same?

No, I meant "use gzip".

> The configuration file shows that autovacuum=on and track_count=on to be
> commented out. That means that it is not running right? If that's the case,
> just uncommenting it now should get it working right?

Commented out means that the default value is used, which is on for both
of those.  That means that autovacuum should already be running.  Is
there some reason you think it isn't?

    Thanks,

        Stephen

Вложения

Re: Too many WAL(s) despite low transaction

От
Selva manickaraja
Дата:


On Fri, Apr 1, 2011 at 10:35 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Selva manickaraja (mavles78@gmail.com) wrote:
> Where you mentioned "after the reload" I suppose you meant restart right?

I'm not sure offhand if it requires a reload or a restart, that's why I
suggested doing a reload than then checking the logs to see if a restart
is required.
I have put it in. Log file does not complaint, neither do I see the WAL(s) reducing. So I'm waiting for Friday lunch time in 1 hour to restart the db.
 

> About compressing you mentioned iirc, but how do I use it? are there any
> examples. I read about pg_compress before. Is that same?

 
No, I meant "use gzip".
OK, I saw a sample in the  PostgreSQL site, I will try that on the development machine first.
 
> The configuration file shows that autovacuum=on and track_count=on to be
> commented out. That means that it is not running right? If that's the case,
> just uncommenting it now should get it working right?

Commented out means that the default value is used, which is on for both
of those.  That means that autovacuum should already be running.  Is
there some reason you think it isn't?
I see. Cause I was thinking that all these surge of WAL was due to the fact that autovacuum was not running. Is there a utility to check if autovacuum is running or not?

Once the db is restarted, I will post the results to you to let you know how well the WAL(s) are fairing.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAk2VOgsACgkQrzgMPqB3kignEgCdFE+Ij+EbX+zC/rUtugZrG1nA
sHoAoIZlmfjTlONs0fPA//Rz6g0HRoVn
=D+LS
-----END PGP SIGNATURE-----


Re: Too many WAL(s) despite low transaction

От
"Kevin Grittner"
Дата:
Selva manickaraja <mavles78@gmail.com> wrote:

> Since the production database is running,  I plan to do now is
> this
>
> 1. Set archive_timeout = 20m (Does the change require db restart
> to take effect?)

You'd better make that '20min'.  Reload should suffice.

> 2. Set  autovacuum=on and track_count=on (Does the change require
> db restart to take effect?)

Reload should suffice.

>     Does that mean we are running autovacuum?

Autovacuum is always there to prevent transaction ID wraparound, but
you really should be using it with at least the default
aggressiveness in almost all environments.

> 3. Run VACUUM FREEZE ANALYZE since bulk loading was done earlier.
> (Can this be done while the db is active and on production?)

It can be done while the server is running, but be prepared for a
massive WAL volume and a likely-noticeable performance impact.  We
usually do our bulk loads with archiving off and run VACUUM FREEZE
ANALYZE before turning archiving on and letting the users in.
Otherwise, since everything in a bulk load will have the same xmin,
or very close to that, autovacuum can kick in and tank performance
and glut the WAL streaming for a long time at some unpredictable
moment -- although that moment is most likely to be in the midst of
high transaction volume.

> Is this a workable action to achieve the result required?

Those are all sound like good ideas, given your situation.

As mentioned elsewhere on the thread, WAL file compression can help
quite a bit.  That's especially true if you filter the WAL through
pg_clearxlogtail or pg_lesslog before compression.  The issue there
is that WAL files are reused without clearing the tail of the file,
so if you don't use one of these programs before compression a file
which is switched out based on time, before it's full, will probably
take just as much space as a full file.  With pg_clearxlogtail an
"empty" WAL file will gzip to 16KB (three orders of magnitude better
than uncompressed).  I haven't used pg_lesslog but it might do even
better.

-Kevin

Re: Too many WAL(s) despite low transaction

От
"Benjamin Krajmalnik"
Дата:

You can check the running processes and grep aotovacuum.

You should see a launcher process which is always running.

If you use pgAdmin you can also look at the table statistics from the GUI and see when was the last time autovacuum or autoanalyze ran on each table.

 

 

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Selva manickaraja
Sent: Thursday, March 31, 2011 9:25 PM
To: Stephen Frost
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Too many WAL(s) despite low transaction

 

 

On Fri, Apr 1, 2011 at 10:35 AM, Stephen Frost <sfrost@snowman.net> wrote:

* Selva manickaraja (mavles78@gmail.com) wrote:

> Where you mentioned "after the reload" I suppose you meant restart right?

I'm not sure offhand if it requires a reload or a restart, that's why I
suggested doing a reload than then checking the logs to see if a restart
is required.

I have put it in. Log file does not complaint, neither do I see the WAL(s) reducing. So I'm waiting for Friday lunch time in 1 hour to restart the db.
 


> About compressing you mentioned iirc, but how do I use it? are there any
> examples. I read about pg_compress before. Is that same?

 

No, I meant "use gzip".

OK, I saw a sample in the  PostgreSQL site, I will try that on the development machine first.

 

> The configuration file shows that autovacuum=on and track_count=on to be
> commented out. That means that it is not running right? If that's the case,
> just uncommenting it now should get it working right?

Commented out means that the default value is used, which is on for both
of those.  That means that autovacuum should already be running.  Is
there some reason you think it isn't?

I see. Cause I was thinking that all these surge of WAL was due to the fact that autovacuum was not running. Is there a utility to check if autovacuum is running or not?


Once the db is restarted, I will post the results to you to let you know how well the WAL(s) are fairing.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAk2VOgsACgkQrzgMPqB3kignEgCdFE+Ij+EbX+zC/rUtugZrG1nA
sHoAoIZlmfjTlONs0fPA//Rz6g0HRoVn
=D+LS
-----END PGP SIGNATURE-----

 

Re: Too many WAL(s) despite low transaction

От
Selva manickaraja
Дата:
This is what we did and monitored the situation for one day.

  1. We did vacuum/freeze/analyze first. Did not see any changes. WAL(s) were still building up. So as recommended in the thread, we switched of archiving and reran vacuum.
  2. Set archive_timeout to 20min.
  3. Deleted all old WAL(s) and started the db.
After about 10 minutes, the database started settling down on the WAL generation.

There are points note and lessons to learn.

  1. Changing archive_timeout did not have any effect without restarting the db.I'm not sure why, perhaps due to not running the vacuum perhaps. But to rule out this, we will attempt this tomorrow to see if it takes effect without restating the db.
  2. When the db was in development just changing the checkpoint_timeout was sufficient to set the interval between WAL(s) that were shipped out.
Next actions to do:

  1. Implement pg_compress, or pg_lesslog, pg_clearxlogtail
  2. Check on how well autovacuum is running and how much to tune it.
Regards,

Selvam


Re: Too many WAL(s) despite low transaction

От
Stephen Frost
Дата:
* Selva manickaraja (mavles78@gmail.com) wrote:
> Next actions to do:
>
>    1. Implement pg_compress, or pg_lesslog, pg_clearxlogtail
>    2. Check on how well autovacuum is running and how much to tune it.

Have you tested that you can do a restore using the base backup and
WALs?  Have you made sure the database is consistent/correct after doing
such a restore?  If you change to using pg_compress or anything else,
you should be sure to repeat that testing to make sure everything works
as expected.

    Thanks,

        Stephen

Вложения

Re: Too many WAL(s) despite low transaction

От
Selva manickaraja
Дата:
We have let the production DB to run for about one week and monitor the situation. As days passed the DB began to settle down and the log shipping was more consistent. So as suggested by Stephen, we did a tape restore of the base backup and wal-backup from the previous backup done. We have scheduled a backup of base+wal(s) every 12 hours.

So this was the steps taken to test whether our backup is good or not.

  1. Our primary MachineA is replicated to MachineB for pass one week. And WAL(s) are also shipped to Machine C from where all backups are done.
  2. Backup (base+wal) restored from tape to MachineD.
  3. Use a home-written scripts to unzip and untar the base and wal to the appropriate directories.
  4. Database started in MachineD as a secondary to MachineA.
  5. Database in MachineD started and attempted to replicate from MachineA. But unable to replicate because state of base is inconsistent with MachineA.
  6. We pushed the un-backedup WAL(s) from MachineC to the archive directory in MachineD.
  7. It was a great sight to witness as and when the WAL(S) arrive in MachineC it attempts to restore and replicate with MachineA.
  8. Finally after the last WAL was restored the MachineD was in-synch as MachineA in full replication mode.
We suppose our backup(s) can be restored if this worked. Comments are welcomed.

Our next plan of action is

1. To do a PITR from these backups.
2. Implement pg_compress, or pg_lesslog, pg_clearxlogtail to WAL(S) to reduce the size of WALs.

There are other DB issues that I would like to highlight but that it's better to begin another thread for everyone's clarity.

Thank you.

Regards,

Selvam


On Sun, Apr 3, 2011 at 6:52 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Selva manickaraja (mavles78@gmail.com) wrote:
> Next actions to do:
>
>    1. Implement pg_compress, or pg_lesslog, pg_clearxlogtail
>    2. Check on how well autovacuum is running and how much to tune it.

Have you tested that you can do a restore using the base backup and
WALs?  Have you made sure the database is consistent/correct after doing
such a restore?  If you change to using pg_compress or anything else,
you should be sure to repeat that testing to make sure everything works
as expected.

       Thanks,

               Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAk2YUYMACgkQrzgMPqB3kijQfgCeJ9F4HPrpNdRePlR4SwJ2A89W
ikUAoIM9cM23J43vvX/wuW7Iq1UqQsac
=JnLy
-----END PGP SIGNATURE-----