Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time

Поиск
Список
Период
Сортировка
От Lonni J Friedman
Тема Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time
Дата
Msg-id CAP=oouF7P_zThUrcQPxVNYrhLByZ0T-TVdQa5iUKXN3BGyegoQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
On Tue, Nov 22, 2011 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
>> Thanks for your prompt reply.  I was pretty sure that I was using the
>> default, but just to confirm, I just ran:
>> 'SHOW vacuum_cost_delay;'
>
> What about autovacuum_vacuum_cost_delay?  The selects seem to be
> delaying for 32msec, which is not the default for anything.

20ms is what it returns.  I grepped for '32' in postgresql.conf  and
other than the '5432' port, nothing was returned.

I'm using the official postgres RPMs that shipped with
Fedora15-x86_64, so unless something weird got compiled in, I have no
idea where that 32ms is coming from.

>
>> Is there no way to safely kill off this autovacuum process that
>> doesn't involve shutting down & restarting the database?
>
> Sending it a SIGINT ought to be safe enough, though I don't think that
> is necessarily advisable, because the next autovacuum will probably take
> just as long.  Killing this one will mainly mean you've wasted (much of)
> the work it did so far.

ok, then I guess I'll wait longer.

>
> Before getting hasty I'd suggest identifying what table (or index) it's
> working on --- lsof on the process to see what FD 95 is connected to
> would be the first step.  I'm thinking it's an index since the seek
> addresses don't seem to be consecutive.  And it might be worth watching
> the seek addresses for awhile to see if you can prove that it's looping
> --- if it is, that might be an indication of a corrupt index.  If it
> isn't, but is just steadily working through the index, you'd be best
> advised to have patience.

I suspect you're right.  I just ran strace against that PID again, and
now all the lseek & read FD's are referrring to a different number
(115), so that means its moved onto something new since I looked a few
hours ago?

Anyway, I think this is what you were referring to:
/proc/30188/fd/115 ->   /var/lib/pgsql/data/base/64793/72633.10

How do I correlate that file to an actual database object?

thanks

В списке pgsql-general по дате отправления:

Предыдущее
От: Ondrej Ivanič
Дата:
Сообщение: Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time
Следующее
От: Tom Lane
Дата:
Сообщение: Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time