Обсуждение: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

Поиск

Список

Период

Сортировка

How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

От

Arctic Toucan

Дата:

05 сентября 2008 г., 15:28:37

I have a fairly large database(approx. 1.5TB) that is backed up by a warm standby database using log shipping(PITR). This setup had been running for a couple of months when I ran into a problem on the primary DB and had to failover to the standby DB. This worked as expected.

Shortly thereafter(Sometime over the long weekend of course), Postgres shutdown the database to avoid XID wraparound data loss. I presume there were warnings in the log about running out of XIDs, but nobody noticed in time and given what transpired after that I don't think it would have mattered if they had.

As per the documentation, I started the DB in single user mode and attempted to do a full database vacuum. After this ran for about 12hours the pg_xlog directory ran out of disk space. I'm not sure I understand why anything is written to pg_xlog as part of the vacuum process, perhaps someone can enlighten me.

I next started looking at the age(refrozenxid) of the tables in my DB, and was surprised to see that over 4000 of the 5000 tables in this DB had an age over 2Billion. So thats 4000 tables representing over a terabyte of data that need to be vacuumed! I am now vacuuming those tables one at a time, which is taking a long time(This is a scripted process). So there is no way I could have vacuumed the tables quickly enough even given a warning of impending XID wraparound.

Looking through the support mailing lists(Bugs) I see some discussion about the frozenxid updates on the master not being propogated to the slave through the WAL logs, and comments from Tom, Alvaro and Heikki suggesting that they were looking into a solution for PG 8.3 and needed a way around the problem in PG 8.2.

I am currently running PG 8.2.4 on FreeBSD.

So my questions are:

1) What is the recommended way to either solve or get around this problem in PG 8.2.4?
2) Is this "problem" fixed in some more current version of Postgres? I didn't see any mention of it in release notes up to PG 8.3.3?
3) Does this mean that if you are trying to use a warm standby DB with PITR, you need to make a new base backup of your primary DB every 1.5billion transactions, or there abouts, to avoid the problem. If so, I think this should be documented in the "Caveats" section of "Continuous Archiving and Point-in-time-recovery(PITR)" section of the manual.

Regards...

Mark Sherwood

Use Windows Live Messenger to send messages to your buddies on their mobile phones Find out more on our PC to Mobile website

Re: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

От

Tom Lane

Дата:

05 сентября 2008 г., 16:52:25

Arctic Toucan <arctic_toucan@hotmail.com> writes:
> I next started looking at the age(refrozenxid) of the tables in my DB, and was surprised to see that over 4000 of the
5000tables in this DB had an age over 2Billion. So thats 4000 tables representing over a terabyte of data that need to
bevacuumed! I am now vacuuming those tables one at a time, which is taking a long time(This is a scripted process). So
thereis no way I could have vacuumed the tables quickly enough even given a warning of impending XID wraparound. 

> Looking through the support mailing lists(Bugs) I see some discussion about the frozenxid  updates on the master not
beingpropogated to the slave through the WAL logs, and comments from Tom, Alvaro and Heikki suggesting that they were
lookinginto a solution for PG 8.3 and needed a way around the problem in PG 8.2.  

Hmm ... that did get fixed in 8.2
http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php
so I'm a bit confused about what happened here.  What were your
autovacuum settings on the old master?  Particularly
autovacuum_freeze_max_age?

            regards, tom lane

Re: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

От

Arctic Toucan

Дата:

05 сентября 2008 г., 18:09:59

OK, I am glad to hear that what I thought might be the problem is fixed in PG 8.2. I feel much better about that :-)

That must mean there was something unusual about this particular DB. I have several other "cloned" DB's configured the same way(They haven't failed over yet), and checking the age of frozenXIDs on them show that they are fine. The primary DB box in my problem setup is completely inaccessible, so I have no way to check what the FrozenXID values were there. One thing I could try is forcing a failover on one of the clones and see if it exhibits the same behaviour. From what you say I'm guessing the answer is that they will be OK.

My standard settings in the config file are:
autovacuum_freeze_max_age = 2,000,000,000
vacuum_freeze_min_age = 100,000,000

This particular DB, has two tables that are quite heavily updated(approx 1,000,000 rows/hour), with the rest of the DB load being around 8,000,000 row inserts/hour.

Previously, when I was watching pg_activity on this DB, the autovacuum process seemed to be always vacuuming one or the other of those heavily updated tables(typically for an hour or more at a time).

Could another explanation for what happened be that the tables being inserted to, were not getting their XIDs updated quickly enough by the autovacuum process on the primary DB because it was spending all its time on those two tables? (ie. the XID wraparound problem would have ocurred even without the failover) .

If thats the case, then once I get this standby DB vacuumed and back up, I will set up a cron job to look at how the max_age of the relation XIDs changes over time.

Thanks,

Mark Sherwood

> To: arctic_toucan@hotmail.com
> CC: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?
> Date: Fri, 5 Sep 2008 12:52:17 -0400
> From: tgl@sss.pgh.pa.us
>
> Arctic Toucan <arctic_toucan@hotmail.com> writes:
> > I next started looking at the age(refrozenxid) of the tables in my DB, and was surprised to see that over 4000 of the 5000 tables in this DB had an age over 2Billion. So thats 4000 tables representing over a terabyte of data that need to be vacuumed! I am now vacuuming those tables one at a time, which is taking a long time(This is a scripted process). So there is no way I could have vacuumed the tables quickly enough even given a warning of impending XID wraparound.
>
> > Looking through the support mailing lists(Bugs) I see some discussion about the frozenxid updates on the master not being propogated to the slave through the WAL logs, and comments from Tom, Alvaro and Heikki suggesting that they were looking into a solution for PG 8.3 and needed a way around the problem in PG 8.2.
>
> Hmm ... that did get fixed in 8.2
> http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php
> so I'm a bit confused about what happened here. What were your
> autovacuum settings on the old master? Particularly
> autovacuum_freeze_max_age?
>
> regards, tom lane

Get your information fix on your phone. With MSN Mobile you get regular news, sports and finance updates. Try it today!

Re: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

От

Tom Lane

Дата:

05 сентября 2008 г., 18:46:05

Arctic Toucan <arctic_toucan@hotmail.com> writes:
> My standard settings in the config file are:
> autovacuum_freeze_max_age = 2,000,000,000
> vacuum_freeze_min_age = 100,000,000

Ah, well, there's the issue.  It could be expected that no tuple
freezing would happen before autovacuum_freeze_max_age.  I'd take a zero
off that value I think --- you're not leaving yourself a lot of daylight
between freeze_max_age and wraparound.  You could get away with that
setting in a smaller database that didn't take so long to vacuum, but
in a bigger one you want to freeze a bit more aggressively IMHO.

            regards, tom lane

Re: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

От

Arctic Toucan

Дата:

05 сентября 2008 г., 20:07:29

If I recall correctly, the reason we originally set autovacuum_freeze_max_age = 2,000,000,000 was that most of the data in the DB is a rolling window(partitioned tables) and so by having the max age so high most of the tables in the DB would never need to be vacuumed at all thus allowing the autovacuum to concentrate on the high update rate of the two tables of concern.

It seems that we may have gotten to the point where our rolling window exceeds 2billion transactions and that approach is no longer viable so we need to change our max_age as you suggest.

Thanks, as usual, for the help.

Regards...

Mark

Upgrade to Hotmail Plus and share more photos with bigger attachments. Click here to find out how Click here to find out how

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

Re: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

Re: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

Re: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?

Re: How can I avoid Frozenxid wraparound on failover to a standby(PITR) database?