Discussion: Vacuum problems


Vacuum problems

From:
"Scot Kreienkamp"
Date:

Hi everyone…

 

I have a database that is currently about 25 gigs on my primary DB server running Postgres 8.2.9, and two others that are less than 1 gig apiece.  The DB server is a quad proc, quad core, 64 gigs of memory, 5 drive RAID5 array, so it has plenty of horsepower.  Until about three weeks ago I was running a nightly vacuum analyze and a vacuum full analyze once per week. 

 

This is what I was running for the vacuum full command:

vacuumdb -a -e -f -z -v  -U postgres

 

The nightly vacuums have been working flawlessly, but about three weeks ago the vacuum full started failing.  It was taking about 5-10 minutes normally, but all of a sudden it started hitting the command timeout that I have set, which is at 60 minutes.  I thought that it may be a corrupt table or a large amount of content had been deleted from a database, so I built a script to loop through each database and run a vacuum full analyze on each table individually thinking I would find my problem table.  The script finished in 5 minutes! 

 

This is what I’m running on each table now in my script:

vacuumdb -d $DB -t $TABLE -e -f -z -v  -U postgres
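The loop itself is roughly the following (a simplified sketch of the script described above; the catalog queries and connection options are assumptions, adjust to your environment):

```shell
#!/bin/bash
# Loop over every connectable, non-template database and run a
# vacuum full analyze on each user table individually, mirroring
# the per-table vacuumdb invocation above.
for DB in $(psql -U postgres -At -c \
    "SELECT datname FROM pg_database
     WHERE datallowconn AND NOT datistemplate"); do
    for TABLE in $(psql -U postgres -d "$DB" -At -c \
        "SELECT schemaname || '.' || tablename FROM pg_tables
         WHERE schemaname NOT IN ('pg_catalog', 'information_schema')"); do
        vacuumdb -d "$DB" -t "$TABLE" -e -f -z -v -U postgres
    done
done
```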

 

As I understand it, the "vacuumdb -a" command basically does the same thing as my script.  So why is it timing out while my script finishes in 5 minutes or less?  Is the "vacuumdb -a" command doing something that I'm not? 

 

Now that the holiday season is past, I will be upgrading to the latest 8.2 release as soon as possible, but I need to get this figured out first, unless it's a bug that the upgrade fixes.

 

I’d appreciate any advice or explanations you guys can send my way. 

 

 

Thanks,

 

Scot Kreienkamp

La-Z-Boy Inc.

skreien@la-z-boy.com

734-242-1444 ext 6379

 

Re: Vacuum problems

From:
"Grzegorz Jaśkiewicz"
Date:
well, upgrade to 8.2.11 - even though the change log doesn't say
anything about vacuum there.
Secondly, try turning up the logging verbosity and see if postgresql
actually complains about anything; otherwise it is pretty much a
blind guess.

Re: Vacuum problems

From:
"Grzegorz Jaśkiewicz"
Date:
btw, is that on Windows? (looking at la-z-boy.com, it uses .net).

Re: Vacuum problems

From:
"Scot Kreienkamp"
Date:
Nope, on RHEL5.  This is the backend of an application that's not available or visible to the public.

Thanks,

Scot Kreienkamp
La-Z-Boy Inc.
skreien@la-z-boy.com
734-242-1444 ext 6379

Re: Vacuum problems

From:
"Scott Marlowe"
Date:
On Mon, Jan 5, 2009 at 7:46 AM, Scot Kreienkamp <SKreien@la-z-boy.com> wrote:
> Hi everyone…
>
>
>
> I have a database that is currently about 25 gigs on my primary DB server
> running Postgres 8.2.9, and two others that are less than 1 gig apiece.  The
> DB server is a quad proc, quad core, 64 gigs of memory, 5 drive RAID5 array,
> so it has plenty of horsepower.  Until about three weeks ago I was running a
> nightly vacuum analyze and a vacuum full analyze once per week.

Did you have a compelling reason for running vacuum full?  It's
generally discouraged unless you've got a usage pattern that demands
it.  If you are running vacuum full you likely have bloated indexes,
so you might need to reindex the db as well.

> This is what I was running for the vacuum full command:
>
> vacuumdb -a -e -f -z -v  -U postgres
>
>
>
> The nightly vacuums have been working flawlessly, but about three weeks ago
> the vacuum full started failing.  It was taking about 5-10 minutes normally,
> but all of a sudden it started hitting the command timeout that I have set,
> which is at 60 minutes.

Since I assume vacuum is running under the superuser account you can try this:

alter user postgres set statement_timeout=0;

To give it all the time it needs to finish.
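Since statement_timeout is settable per session, it can also be lifted for just the vacuum connection without touching the role's default, for example (a sketch; relies on libpq's PGOPTIONS passing server settings at connection time):

```shell
# Lift statement_timeout only for this one vacuumdb run; the role
# default and every other connection remain unchanged.
PGOPTIONS="-c statement_timeout=0" vacuumdb -a -e -f -z -v -U postgres
```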


>  I thought that it may be a corrupt table or a large
> amount of content had been deleted from a database, so I built a script to
> loop through each database and run a vacuum full analyze on each table
> individually thinking I would find my problem table.  The script finished in
> 5 minutes!

It might be that the previous vacuum full cleaned up enough stuff that
the next one ran faster.  But again, vacuum full is usually a bad idea
as regular maintenance.

Re: Vacuum problems

From:
"Scot Kreienkamp"
Date:
Scott,

Would the "alter user postgres set statement_timeout=0;" be a permanent
change?  I ask because our application is using that for its login to
the database.  (No lectures please, I inherited the system that way.  I
already read the riot act to our developers about that.)  If so I'll
have to set it back after the vacuum is done.

FYI, when I inherited the system it was doing nightly vacuum fulls.  It
was that way for several months.  If that causes bloated indexes, then
that's fairly likely a problem I have.  Sounds like I should quit
running vacuum fulls altogether except maybe once or twice per year.

I'll try adding a reindex for tonight's vacuum run.

Thanks,

Scot Kreienkamp
La-Z-Boy Inc.
skreien@la-z-boy.com


Re: Vacuum problems

From:
Tom Lane
Date:
"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:
> The nightly vacuums have been working flawlessly, but about three weeks
> ago the vacuum full started failing.  It was taking about 5-10 minutes
> normally, but all of a sudden it started hitting the command timeout
> that I have set, which is at 60 minutes.  I thought that it may be a
> corrupt table or a large amount of content had been deleted from a
> database, so I built a script to loop through each database and run a
> vacuum full analyze on each table individually thinking I would find my
> problem table.  The script finished in 5 minutes!

Maybe the vacuum was blocking waiting for someone's lock on a table?
What else is running in parallel with your nightly vacuum script?

            regards, tom lane

Re: Vacuum problems

From:
"Scot Kreienkamp"
Date:
Nothing.  I ran a query to see what other queries were running and what
other locks existed at the time.  No queries running, no locks.  Nothing
runs at that time of night except the dump, which finishes before the
vacuum job starts.

Thanks,

Scot Kreienkamp
La-Z-Boy Inc.
skreien@la-z-boy.com


Re: Vacuum problems

From:
"Scott Marlowe"
Date:
On Mon, Jan 5, 2009 at 10:15 AM, Scot Kreienkamp <SKreien@la-z-boy.com> wrote:
> Scott,
>
> Would the "alter user postgres set statement_timeout=0;" be a permanent
> change?  I ask because our application is using that for its login to
> the database.  (No lectures please, I inherited the system that way.  I
> already read the riot act to our developers about that.)  If so I'll
> have to set it back after the vacuum is done.

Then you can just issue a "set statement_timeout=0" before you run
vacuum / vacuum full.

The update versus insert ratio isn't as important as how many rows are
updated out of the total between each run of vacuum analyze.  Vacuum
full is definitely NOT a regular, recommended practice.  I don't think
the docs really say it is.  But a few other people have seemed to get
the same idea from the docs, so there must be some gray area I'm not
seeing when I read them.  Given the usage pattern you described
earlier, I'd say vacuum full is definitely NOT called for, but regular
vacuum should be plenty.

The best thing to do is to examine how many dead tuples you've got to
keep track of, and if that number keeps rising, figure out whether fsm
pages needs to be bumped up and/or autovacuum needs more aggressive
settings.  Note that autovacuum is somewhat handcuffed on pg versions
before 8.3 because it was single threaded: one really big table could
keep the vacuum thread busy while other, more frequently updated
tables got bloated in the meantime.

Use vacuum verbose to get an idea of how many dead tuples there are in
the database, and see if they rise to a plateau or just keep rising.
For most usage patterns with autovacuum enabled, you'll see a steady
rise to about 10-20% dead tuples, then it should level off.
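One way to track this over time is to capture the verbose output and keep just the summary lines for later comparison, e.g. (a sketch; the grep pattern and log path are illustrative assumptions):

```shell
# Run an analyze-only verbose vacuum and append the per-table row
# version counts to a log, so dead-tuple growth can be compared
# from day to day.
vacuumdb -a -z -v -U postgres 2>&1 \
    | grep -E 'row versions|free space map' \
    >> /var/log/pg_vacuum_summary.log
```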

> FYI, when I inherited the system it was doing nightly vacuum fulls.  It
> was that way for several months.  If that causes bloated indexes, then
> that's fairly likely a problem I have.  Sounds like I should quit
> running vacuum fulls altogether except maybe once or twice per year.

A lot of times, a pgsql installation doing nightly fulls is a sign of
someone who started out with an old version that only supported full
vacuum and is applying the faulty knowledge gained there to a newer
version that likely doesn't need it.

If you do find one table that really needs full vacuums because of its
usage pattern, it's best to cron up a single regular vacuum to run
more often on it, or make autovacuum more aggressive, or, failing those
two, to run a nightly vacuum full / cluster / reindex for just that
one relation.

Usually cluster is a better choice, as it doesn't bloat indexes and
puts the table into index order (on the index you clustered on).
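On 8.2 the syntax names the index first (a sketch; the database, table, and index names here are placeholders):

```shell
# CLUSTER rewrites the table in the order of the named index and
# rebuilds all of its indexes as a side effect; like VACUUM FULL it
# takes an exclusive lock, so schedule it off-hours.
psql -U postgres -d mydb -c 'CLUSTER my_table_pkey ON my_table'
```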

Re: Vacuum problems

From:
"Scott Marlowe"
Date:
Another strategy to look at is to make autovacuum more aggressive by
putting entries for that one table into the pg_autovacuum table.  I
know that table exists in 8.3 but I don't remember if it exists in
older versions or how it works there.  But if you have it available
you can make that one relation more aggressively autovacuumed without
affecting the other relations' vacuum scheduling.  On 8.3 you can set
individual cost delays and all the other settings you'd need.  If 8.2
doesn't support this, look at migrating for that functionality
alone.
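For what it's worth, the pg_autovacuum catalog does exist on 8.2 as well.  An entry might look like this (a hedged sketch: the column layout is taken from the 8.2 docs and should be verified with \d pg_autovacuum, the table name and thresholds are made-up examples, and -1 means "fall back to the global setting"):

```shell
# Make autovacuum more aggressive for one hot table by inserting a
# per-table override row into pg_autovacuum.
psql -U postgres -d mydb -c "
INSERT INTO pg_autovacuum
    (vacrelid, enabled,
     vac_base_thresh, vac_scale_factor,
     anl_base_thresh, anl_scale_factor,
     vac_cost_delay, vac_cost_limit,
     freeze_min_age, freeze_max_age)
VALUES ('public.my_hot_table'::regclass, true,
        200, 0.05,   -- vacuum after 200 rows + 5% of the table change
        100, 0.02,   -- analyze even sooner
        -1, -1, -1, -1)"
```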

Re: Vacuum problems

From:
"Scot Kreienkamp"
Date:
Thanks for the advice Scott.  I've taken out the vacuum fulls entirely.
I've now got a nightly vacuum analyze as well as reindex.  I'll probably
drop both to every other night.

BTW, the database shrunk by 2 gigs just from reindexing last night.  I
expect I'll see a performance gain from actually doing reindexing since
this database has never been reindexed since it was put in production 6
months ago.

I've got about 12 tables that get caught by the autoanalyze and about 6
that get caught by autovacuum on a daily basis.  I'm not sure how often
the autovacuum and autoanalyze runs on those tables.  I probably need to
up the logging to find out.  I'm not worried about making it more
aggressive yet.

One other problem though... my database has a "-" in the name... when I
try to run:

psql -U postgres -d rms-prod -c "REINDEX SYSTEM rms-prod"

I get this:

ERROR:  syntax error at or near "-"
LINE 1: REINDEX SYSTEM rms-prod

The user tables run fine.  Should I reindex the system tables also?  If
so, how do I get around the dash in the db name?

Thanks,

Scot Kreienkamp
La-Z-Boy Inc.
skreien@la-z-boy.com
734-242-1444 ext 6379


Re: Vacuum problems

From:
"Grzegorz Jaśkiewicz"
Date:
You don't have to reindex too often - it takes an exclusive lock on
the whole table, just like vacuum full.  Just do it every few months,
depending on db growth.

Re: Vacuum problems

From:
Adrian Klaver
Date:

----- "Scot Kreienkamp" <SKreien@la-z-boy.com> wrote:

> One other problem though... my database has a "-" in the name... when
> I
> try to run:
>
> psql -U postgres -d rms-prod -c "REINDEX SYSTEM rms-prod"

You need to quote the db name:
psql -U postgres -d rms-prod -c 'REINDEX SYSTEM "rms-prod"'

>
> I get this:
>
> ERROR:  syntax error at or near "-"
> LINE 1: REINDEX SYSTEM rms-prod
>
> The user tables run fine.  Should I reindex the system tables also?
> If
> so, how do I get around the dash in the db name?

Adrian Klaver
aklaver@comcast.net

Re: Vacuum problems

From:
"Scot Kreienkamp"
Date:
Yep... dummy me.  That works.  I tried that before with the reindexdb
command, which doesn't work.  I didn't try it with the psql command.

Thanks,

Scot Kreienkamp
La-Z-Boy Inc.
skreien@la-z-boy.com


Re: Vacuum problems

From:
"Scott Marlowe"
Date:
On Tue, Jan 6, 2009 at 9:05 AM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> you don't have to reindex too often - it locks exclusively whole
> table, just like vacuum full. Just do it every few months, depending
> on db growth.

While I don't wholly disagree with periodic reindexing, I do recommend
keeping track of bloat.  It's easy enough to have an alarm that goes
off if any index gets over 50% dead space, then go look at the
database.  Or go in every week and see what vacuum verbose looks like.
Even if you just email yourself a copy of the last 10 or 15 lines
every morning to see how bloated the db is in general, you'll catch
most problems before they become serious.  One or two rogue updates
without where clauses on medium to large sized tables can blow you
right out of the water.  Best to have some way to keep track of them.
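The email-yourself idea can be as simple as a cron entry like this (a sketch; the recipient address and the availability of mail(1) on the server are assumptions):

```shell
#!/bin/bash
# Nightly bloat check: run an analyze-only verbose vacuum and mail
# the last 15 lines of output, which include the free-space-map
# summary, to the DBA.
vacuumdb -a -z -v -U postgres 2>&1 | tail -15 \
    | mail -s "nightly vacuum summary for $(hostname)" dba@example.com
```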

Re: Vacuum problems

From:
Craig Ringer
Date:
Scott Marlowe wrote:

> While I don't wholly disagree with periodic reindexing, I do recommend
> that one keeps track of bloat.  It's easy enough to have an alarm that
> goes off if any index gets over 50% dead space, then go look at the
> database.

Reading this list, I've noticed that:

- Many admins don't understand vacuum vs vacuum full at all,
  and are unaware of the probable need for a reindex after
  vacuum full. They're often landing up with very bloated indexes
  from excessive use of vacuum full, or very bloated tables due
  to insufficient fsm space / infrequent vacuuming.

- It's hard to spot table and (especially) index bloat. Pg doesn't
  warn about bloated tables or indexes in any way that people seem to
  notice, nor are there any built-in views or functions that might help
  the admin identify problem tables and indexes.

- Most people have a lot of trouble understanding where and how
  their storage is being used.

I'm wondering if it might be a good idea to adopt one of the handy views
people have written for getting table/index bloat information as a
standard part of Pg (probably as an SQL function rather than a view) so
people can just "SELECT pg_bloat()" to get a useful summary of
table/index status.

The other thing I wonder about is having EXPLAIN and EXPLAIN ANALYZE
report information on the amount of free space that a table seq scan or
an index scan is having to traverse. That might bring problems to the
attention of people who're investigating query performance issues
without being aware that the underlying issue is actually bloated
indexes, not bad plans.

--
Craig Ringer