large table vacuum issues

From:
"Ed L."
Date:
We need some advice on how to handle some large table autovacuum
issues.  One of our 8.1.2 autovacuums is launching a DB-wide
vacuum on our 270GB database to prevent xid wrap-around, but is
getting hung up and/or bogged down for hours on a 40GB table and
taking the server performance down with it, apparently due to an
IO bottleneck.  The autovac child process becomes completely
unresponsive to SIGTERM/SIGINT; only a SIGKILL and a restart with
the autovac daemon disabled gets us back to adequate
performance for now.

We are discussing how to partition the table (difficult due to
existing foreign keys in other tables), and archiving/clearing
data.

Are there any other tricks to get it past this large table for
the time being and still get the xid wraparound fix?

TIA.

Ed

Re: large table vacuum issues

From:
"Scott Marlowe"
Date:
On Jan 4, 2008 6:38 PM, Ed L. <pgsql@bluepolka.net> wrote:
> We need some advice on how to handle some large table autovacuum
> issues.  One of our 8.1.2

First of all, update your 8.1 install to 8.1.10.  Failing to keep up
with bug fixes is negligent.  Who knows, you might be getting bitten
by a bug that was fixed between 8.1.2 and 8.1.10.

> autovacuums is launching a DB-wide
> vacuum on our 270GB database to prevent xid wrap-around, but is
> getting hung-up and/or bogged down for hours on a 40gb table and
> taking the server performance down with it, apparently due to an
> IO bottleneck.

Have you tried adjusting the

#vacuum_cost_delay = 0                  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 0-10000 credits

settings to something that makes vacuum less intrusive?  That might be
the easiest fix.

> Are there any other tricks to get it past this large table for
> the time being and still get the xid wraparound fix?

The other trick would be to do a dump/restore of your whole DB,
which can often be quicker than vacuuming it if it has a lot of dead
tuples in it.
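For example, a minimal sketch of that approach (the database and file
names are placeholders; you'd need a maintenance window and enough disk
space for the dump):

# dump the database in custom format with the 8.1 client tools
pg_dump -Fc -f mydb.dump mydb

# recreate and restore into a fresh database; the restored copy
# starts out compact, with no dead tuples
createdb mydb_new
pg_restore -d mydb_new mydb.dump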

Re: large table vacuum issues

From:
"Ed L."
Date:
On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
>
> Have you tried adjusting the
>
> #vacuum_cost_delay = 0                  # 0-1000 milliseconds
> #vacuum_cost_page_hit = 1               # 0-10000 credits
> #vacuum_cost_page_miss = 10             # 0-10000 credits
> #vacuum_cost_page_dirty = 20            # 0-10000 credits
> #vacuum_cost_limit = 200                # 0-10000 credits
>
> settings to something so as to make vacuum less intrusive?
> might be the easiest fix.

Any particular suggested changes for these parameters?

Ed

Re: large table vacuum issues

From:
"Ed L."
Date:
On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
> On Jan 4, 2008 6:38 PM, Ed L. <pgsql@bluepolka.net> wrote:
> > We need some advice on how to handle some large table
> > autovacuum issues.  One of our 8.1.2
>
> First of all, update your 8.1 install to 8.1.10.  Failing to
> keep up with bug fixes is negligent.  who knows, you might be
> getting bitten by a bug that was fixed between 8.1.2 and
> 8.1.10

Could be.  But like you said, who knows.  In some environments,
downtime for upgrading costs money (and more), too, sometimes
even enough to make it "negligent" to take downtime to keep up
with bug fixes (and of course, the new bugs) which may or may
not be a factor at hand.  While the time required to restart a
DB may be negligible, there are often upstream/downstream
dependencies that greatly expand the actual downtime for the
customer.  How much would downtime need to cost before you
thought it negligent to upgrade immediately?  It's a tradeoff,
not well-supported by simple pronouncements, one the customer
and provider are best qualified to make.

Ed

Re: large table vacuum issues

From:
"Scott Marlowe"
Date:
On Jan 4, 2008 7:41 PM, Ed L. <pgsql@bluepolka.net> wrote:
> On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
> > On Jan 4, 2008 6:38 PM, Ed L. <pgsql@bluepolka.net> wrote:
> > > We need some advice on how to handle some large table
> > > autovacuum issues.  One of our 8.1.2
> >
> > First of all, update your 8.1 install to 8.1.10.  Failing to
> > keep up with bug fixes is negligent.  who knows, you might be
> > getting bitten by a bug that was fixed between 8.1.2 and
> > 8.1.10
>
> Could be.  But like you said, who knows.  In some environments,
> downtime for upgrading costs money (and more), too, sometimes
> even enough to make it "negligent" to take downtime to keep up
> with bug fixes (and of course, the new bugs) which may or may
> not be a factor at hand.  While the time required to restart a
> DB may be neglible, there are often upstream/downstream
> dependencies that greatly expand the actual downtime for the
> customer.  How much would downtime need to cost before you
> thought it negligent to upgrade immediately?  It's a tradeoff,
> not well-supported by simple pronouncements, one the customer
> and provider are best qualified to make.

And how much would downtime cost you if your database got corrupted by
those bugs and you had to restore from backups?  You can use something
like Slony and a two-server setup to reduce the downtime to mere
seconds.  I know about downtime and its costs; I work at an airline
reservation company.  Even we have scheduled maintenance windows,
albeit few and far between.  I find it hard to believe you've had none
since 8.1.2 came out.

Re: large table vacuum issues

From:
"Joshua D. Drake"
Date:
Ed L. wrote:
> On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
>> On Jan 4, 2008 6:38 PM, Ed L. <pgsql@bluepolka.net> wrote:
>>> We need some advice on how to handle some large table
>>> autovacuum issues.  One of our 8.1.2
>> First of all, update your 8.1 install to 8.1.10.  Failing to
>> keep up with bug fixes is negligent.  who knows, you might be
>> getting bitten by a bug that was fixed between 8.1.2 and
>> 8.1.10
>
> Could be.  But like you said, who knows.  In some environments,
> downtime for upgrading costs money (and more), too, sometimes
> even enough to make it "negligent" to take downtime to keep up
> with bug fixes (and of course, the new bugs) which may or may
> not be a factor at hand.  While the time required to restart a
> DB may be neglible, there are often upstream/downstream
> dependencies that greatly expand the actual downtime for the
> customer.  How much would downtime need to cost before you
> thought it negligent to upgrade immediately?  It's a tradeoff,
> not well-supported by simple pronouncements, one the customer
> and provider are best qualified to make.

You make a valid argument above but you forget a couple of minor points.

How much money does it cost when your customer:

* gets sued for a breach of security because they couldn't afford a 30
minute downtime at 3am? (I assume 30 minutes only because you do need to
shut down external services.)

* loses all their data because of a corner-case function they are
running that causes pages to become corrupt?

Just curious...

Sincerely,

Joshua D. Drake



Re: large table vacuum issues

From:
"Scott Marlowe"
Date:
On Jan 4, 2008 7:29 PM, Ed L. <pgsql@bluepolka.net> wrote:
> On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
> >
> > Have you tried adjusting the
> >
> > #vacuum_cost_delay = 0                  # 0-1000 milliseconds
> > #vacuum_cost_page_hit = 1               # 0-10000 credits
> > #vacuum_cost_page_miss = 10             # 0-10000 credits
> > #vacuum_cost_page_dirty = 20            # 0-10000 credits
> > #vacuum_cost_limit = 200                # 0-10000 credits
> >
> > settings to something so as to make vacuum less intrusive?
> > might be the easiest fix.
>
> Any particular suggested changes for these parameters?

Well, it really depends on your I/O subsystem, but a good start would
be to read this section of the manual:

http://www.postgresql.org/docs/8.2/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST

I'd start by setting the delay to 10 or 20 and seeing whether vacuuming
has a low enough impact to allow it to run in the background, even during
peak hours.
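For instance, a minimal sketch of what that might look like in
postgresql.conf (the numbers are only a starting point to tune against
your own I/O subsystem):

vacuum_cost_delay = 20          # sleep 20 ms each time the cost limit is reached
vacuum_cost_limit = 200         # default; lower it to make vacuum sleep more often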

Keep an eye on vmstat / iostat output while vacuum is running to see
if you're flooding your I/O or not.
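Something along these lines, run from another terminal while the vacuum
is going (the flags shown are the common Linux ones; adjust for your
platform):

vmstat 5         # watch the "wa" (I/O wait) column
iostat -x 5      # watch %util and await on the database volumes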

Note that there's a whole other set of vars for autovacuum (at least
in 8.2; I don't know about 8.1) that you can set so that autovacuum
runs with greater or lesser priority than regular vacuums.
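In 8.2 those look roughly like this (commented-out defaults; -1 means
"fall back to the plain vacuum_cost_* settings above"):

#autovacuum_vacuum_cost_delay = -1      # -1 uses vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1      # -1 uses vacuum_cost_limit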

Re: large table vacuum issues

From:
Bill Moran
Date:
"Ed L." <pgsql@bluepolka.net> wrote:
>
> On Friday 04 January 2008 6:21 pm, Scott Marlowe wrote:
> > On Jan 4, 2008 6:38 PM, Ed L. <pgsql@bluepolka.net> wrote:
> > > We need some advice on how to handle some large table
> > > autovacuum issues.  One of our 8.1.2
> >
> > First of all, update your 8.1 install to 8.1.10.  Failing to
> > keep up with bug fixes is negligent.  who knows, you might be
> > getting bitten by a bug that was fixed between 8.1.2 and
> > 8.1.10
>
> Could be.  But like you said, who knows.  In some environments,
> downtime for upgrading costs money (and more), too, sometimes
> even enough to make it "negligent" to take downtime to keep up
> with bug fixes (and of course, the new bugs) which may or may
> not be a factor at hand.

Upgrades along the 8.1.x branch take something on the order of
5 minutes (if you're meticulous and serialize the process).

If you haven't set yourself up so you can schedule 5 minutes of
downtime once a month or so, then the negligence occurred much
earlier than at the failure to upgrade.

> While the time required to restart a
> DB may be neglible, there are often upstream/downstream
> dependencies that greatly expand the actual downtime for the
> customer.

Like what?  The point of the double-dot branch is that upgrades
don't affect dependencies.

> How much would downtime need to cost before you
> thought it negligent to upgrade immediately?  It's a tradeoff,
> not well-supported by simple pronouncements, one the customer
> and provider are best qualified to make.

Not really.  Unscheduled downtime is _always_ more expensive than
scheduled downtime.  Scheduled downtime isn't going to put you in
breach of contract if you've got an uptime guarantee.

If you're really in a situation where you need 100% uptime, then
you're still negligent for not having something like Slony to allow
you to switch production to another server so you can alternate
maintenance between the two.

This is something along the lines of the RAID 5 argument: no matter how
you argue it, it's a bad idea.  If you claim you can't afford to buy more
hardware, then you made a mistake in pricing out your product to your client.

--
Bill Moran
http://www.potentialtech.com

Re: large table vacuum issues

From:
"Usama Dar"
Date:


On Jan 5, 2008 5:38 AM, Ed L. <pgsql@bluepolka.net> wrote:
> We need some advice on how to handle some large table autovacuum
> issues.  One of our 8.1.2 autovacuums is launching a DB-wide
> vacuum on our 270GB database to prevent xid wrap-around, but is
> getting hung-up and/or bogged down for hours on a 40gb table and
> taking the server performance down with it, apparently due to an
> IO bottleneck.  The autovac child process becomes completely
> unresponsive to SIGTERM/SIGINT; only a sigkill restart with
> disabling the autovac daemon gets us back to adequate
> performance for now.

Looks like you haven't been vacuuming for a while, have you?  It seems
autovacuum was disabled but was invoked forcefully to avoid wraparound.
If the wraparound does happen, you will lose data.  When autovacuum is
processing a table, it shouldn't take more time than a normal vacuum
would.

What might really help you is a temporary increase in maintenance_work_mem.
What's your current setting?  How much RAM do you have?  If you can afford
more memory, increase it to a significantly higher value to help speed up
the vacuum process.  I understand it might impact some other system
activity, but you need a vacuum, and fast, before you lose all data.
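One way to do that without touching postgresql.conf is to raise it only
in the session that runs a manual vacuum of the problem table (the table
name and the 512MB figure below are just placeholders; on 8.1 the value
is given in kilobytes):

-- give this session's VACUUM more memory for tracking dead tuples
SET maintenance_work_mem = 524288;   -- 512MB, expressed in KB
-- vacuum just the big table, reporting progress as it goes
VACUUM VERBOSE big_table;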

You need to get rid of the dead rows first and then keep a healthy
vacuuming schedule, either a daily cron job or autovacuum, set up to suit
your workload.  Prevention is of course better than cure. ;)

 Thanks,
--
Usama Munir Dar http://www.linkedin.com/in/usamadar
Consultant Architect
Cell:+92 321 5020666
Skype: usamadar