Обсуждение: vacuum taking an unusually long time
Vacuum operations on several tables are taking much longer than they previously were.
We currently have 3 autovacuum processes that have been running more than 3 days each.
The tables are large (between 40 and 90GB each).
Postgresql version is 8.3.1
maintenance_work_mem is 512MB (on a 32GB server).
Any ideas what would make vacuum take so long?
What can I do to speed things up?
thanks,
- Mason
On Mon, Jul 14, 2008 at 3:08 PM, Mason Hale <masonhale@gmail.com> wrote: > Vacuum operations on several tables are taking much longer than they > previously were. > We currently have 3 autovacuum processes that have been running more than 3 > days each. > The tables are large (between 40 and 90GB each). > Postgresql version is 8.3.1 > maintenance_work_mem is 512MB (on a 32GB server). > Any ideas what would make vacuum take so long? > What can I do to speed things up? Have you adjusted your vacuum / autovacuum cost parameters up? that will certainly slow down vacuums.
None of these values have changed recently.
The values are:
vacuum_cost_delay = 10ms
vacuum_cost_limit = 200
Are there any other values I should be looking at?
The longest running vacuum has been running more than 6 days at this point.
Thanks,
Mason
On Mon, Jul 14, 2008 at 4:39 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Have you adjusted your vacuum / autovacuum cost parameters up? thatOn Mon, Jul 14, 2008 at 3:08 PM, Mason Hale <masonhale@gmail.com> wrote:
> Vacuum operations on several tables are taking much longer than they
> previously were.
> We currently have 3 autovacuum processes that have been running more than 3
> days each.
> The tables are large (between 40 and 90GB each).
> Postgresql version is 8.3.1
> maintenance_work_mem is 512MB (on a 32GB server).
> Any ideas what would make vacuum take so long?
> What can I do to speed things up?
will certainly slow down vacuums.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
"Mason Hale" <masonhale@gmail.com> writes: > The longest running vacuum has been running more than 6 days at this point. Is it actually *doing* anything, or is it just blocked waiting for someone else? strace or local equivalent would be the most definitive way to check. regards, tom lane
Here's some of the strace output:
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
semop(9895945, 0x7fff1321db70, 1) = 0
read(72, "\233\7\0\0H\207f2\1\0\1\0`\0\0 \0 \4 \0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "!\5\0\0\370\277\371\247\1\0\1\0\214\0\330\23\0 \4 \0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0h]\333\217\1\0\1\0X\0\310\v\0 \4 \0\0\0\0\0\0\0\0X\235H\5"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0(\260\333\217\1\0\1\0\\\0\200\v\0 \4 \0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "}\4\0\0p\305\333\217\1\0\1\0\\\0p\v\0 \4 \0\0\0\0\370\235\20\4\0\0\0\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "!\5\0\0\10\330\371\247\1\0\1\0P\0\240\10\0 \4 \0\0\0\0\0\0\0\0\360\234 \6"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\233\7\0\0\340\212f2\1\0\1\0t\0(\35\0 \4 \0\0\0\0\0\0\0\0(\235\250\5"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\24\10\0\0\230\333\372\207\1\0\1\0`\0\330\n\0 \4 \0\0\0\0X\220@\4\7\0\1\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
semop(9830407, 0x7fff1321db90, 1) = 0
read(72, "\362\n\0\0H\316,r\1\0\1\0\204\0P\r\0 \4 _\17L\21x\235\20\5\5\0\1\0"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 80000}) = 0 (Timeout)
read(72, "\177\5\0\0\300\317A\276\1\0\1\0p\0\210\1\0 \4 \0\0\0\0\26\0\1\0\30\235\310\5"..., 8192) = 8192
All those "select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)" commands sure look suspicious.
Any thoughts?
Mason
On Tue, Jul 15, 2008 at 10:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Mason Hale" <masonhale@gmail.com> writes:Is it actually *doing* anything, or is it just blocked waiting for
> The longest running vacuum has been running more than 6 days at this point.
someone else? strace or local equivalent would be the most definitive
way to check.
regards, tom lane
On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com> wrote:
Here's some of the strace output:select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
If I read the 'select(2)' man page correctly, it appears this process is waiting indefinitely for a NULL file descriptor.
That looks pretty stuck to me.
Should I kill this autovacuum process via pg_cancel_backend?
Or is there a better way to "unstick" it?
Any ideas what may have caused it to get into this condition?
- Mason
On Jul 16, 2008, at 4:40 PM, Mason Hale wrote: > > > On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com> > wrote: > Here's some of the strace output: > > select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout) > > If I read the 'select(2)' man page correctly, it appears this > process is waiting indefinitely for a NULL file descriptor. > That looks pretty stuck to me. No, it's just sleeping. For 10 milliseconds or so. > > Should I kill this autovacuum process via pg_cancel_backend? > Or is there a better way to "unstick" it? > > Any ideas what may have caused it to get into this condition? > It looks like "read a page, sleep for 80 milliseconds, repeat". I'd look at your settings for autovacuum_vacuum_cost_limit / autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum. Cheers, Steve
No, it's just sleeping. For 10 milliseconds or so.On Wed, Jul 16, 2008 at 10:26 AM, Mason Hale <masonhale@gmail.com> wrote:
Here's some of the strace output:
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
If I read the 'select(2)' man page correctly, it appears this process is waiting indefinitely for a NULL file descriptor.
That looks pretty stuck to me.
Ah, thanks.
It looks like "read a page, sleep for 80 milliseconds, repeat".Should I kill this autovacuum process via pg_cancel_backend?
Or is there a better way to "unstick" it?
Any ideas what may have caused it to get into this condition?
I'd look at your settings for autovacuum_vacuum_cost_limit / autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum.
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_cost_delay = 20ms
'select * from pg_autovacuum;' returns zero rows
I killed the one autovacuum process already that I thought was hung (via pg_cancel_backend).
I can see one other autovacuum process running. The other vacuum process was started manually.
Should I be concerned that pg_autovacuum is empty when I have an autovacuum process running?
Mason
Cheers,
Steve
--Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
"Mason Hale" <masonhale@gmail.com> writes: >> It looks like "read a page, sleep for 80 milliseconds, repeat". That's what it looks like to me too. >> I'd look at your settings for autovacuum_vacuum_cost_limit / >> autovacuum_vacuum_cost_delay, and maybe the contents of pg_autovacuum. > autovacuum_vacuum_cost_limit = -1 > autovacuum_vacuum_cost_delay = 20ms That process is *clearly* not using those vacuum cost parameters --- it's evidently using a delay of 80ms and some completely over-aggressive cost settings that're making it sleep for each single page read. So you need to find out where those whacked-out values are coming from. pg_autovacuum might be a likely source. Or maybe you just forgot a SIGHUP after a recent change to postgresql.conf? regards, tom lane