Re: Autovacuum not keeping up. (PG 9.2.9)

Поиск
Список
Период
Сортировка
От jesper@krogh.cc
Тема Re: Autovacuum not keeping up. (PG 9.2.9)
Дата
Msg-id ea5b920abfccddd0707920001041d0af.squirrel@shrek.krogh.cc
обсуждение исходный текст
Ответ на Re: Autovacuum not keeping up. (PG 9.2.9)  (jesper@krogh.cc)
Список pgsql-admin
>> jesper@krogh.cc wrote:
>>> Hi.
>>>
>>> I have a large  database with a message queue table, that has high
>>> activity. The database supports 1-300 client connection concurrently,
>>>  having transactions open in up to 30 minutes each.
>>>
>>> Recently I am seeing autuvacuum being issued, but it takes
>>> ages to get through the message queue table, with strace showing
>>> waiting
>>> for semop's for 10's to 100's of seconds.
>>
>> Do you have data on how relfrozenxid advances for that table?
>
> Not really, how would you normally pick that out?
> 2014-08-01 10:49:39.171 db=# select relname, age(relfrozenxid),
> relfrozenxid  FROM pg_class WHERE relkind = 'r' and relname = 'job';
>  relname |    age    | relfrozenxid
> ---------+-----------+--------------
>  job     | 111893622 |    796259097
> (1 row)
>
> Time: 1.913 ms
>
> This shouldn't qualify for a freeze vacuum, should it?
>
>> Vacuuming needs to grab a "cleanup lock" on each page it's going to
>> vacuum, which is a special kind of lock that requires that no other
>> process is even looking at the page at that moment (we call this "to
>> have the page pinned"), which is even weaker than having a shared lock
>> on the page.  If traffic to some pages is high, it might be difficult
>> for vacuum to acquire this.
>
> Based on strace output this looks excactly like whats going on, there are
> some activity, then it pauses for some time then continues.

Very "scientifically" i have captured this strace output of the autovacuum
pid. This is a 10 minute trace (600s), total waiting time for
semop(2850903 in those minutes are not less than 485s, (last semop not
counted in).

jk@srv01:~$ sudo timelimit -t 600 -T 10 strace -T -p 36839 2>&1  | grep -v
0.000
Process 36839 attached - interrupt to quit
semop(2850903, {{11, -1, 0}}, 1)        = 0 <14.403021>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001100>
write(367, "\\D\1\0\30p\325\37\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\340\237
\0\200\206 \0"..., 8192) = 8192 <0.002371>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <38.871837>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <17.111822>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <22.438373>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001095>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <32.521132>
semop(2261061, {{8, 1, 0}}, 1)          = 0 <0.001601>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <10.740716>
semop(1179684, {{9, 1, 0}}, 1)          = 0 <0.001900>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001106>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <26.610242>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <21.911487>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001060>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.001694>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <128.996370>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001390>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <20.553280>
read(367, "\\D\1\0H\2107\357\1\0\0\0|\3`\22\360\37\4 \0\0\0\0\340\237
\0\320\237 \0"..., 8192) = 8192 <0.001670>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001357>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.001046>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.001352>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <44.311165>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <79.246222>
read(367, "]D\1\0\10\"\3-\1\0\0\0|\3`\22\360\37\4 \0\0\0\0\340\237
\0\320\237 \0"..., 8192) = 8192 <0.001124>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001093>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001098>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <8.440753>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.001390>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <27.041010>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.001679>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.004053>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <9.158556>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.001015>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001113>
write(367, "]D\1\0\360#\177x\1\0\0\0\344\4\300\f\360\37\4 \0\0\0\0\340\237
\0\300\214 \0"..., 8192) = 8192 <0.001474>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.002450>
semop(655380, {{7, 1, 0}}, 1)           = 0 <0.002245>
read(367, "]D\1\0\270\311\247\221\1\0\0\0D\5@\v\360\37\4 \0\0\0\0\340\237
\0@\213 \0"..., 8192) = 8192 <0.001339>
semop(2850903, {{11, -1, 0}}, 1)        = 0 <0.002029>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001078>
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout) <0.001079>
semop(2850903, {{11, -1, 0}}, 1timelimit: sending warning signal 15
 <unfinished ...>
Process 36839 detached
jk@srv01:~$


Is there any clue in that the semaphorenumber it waits on is excactly the
same all the time? (the <> to the right are timings in seconds.

--
Jesper



В списке pgsql-admin по дате отправления:

Предыдущее
От: jesper@krogh.cc
Дата:
Сообщение: Re: Autovacuum not keeping up. (PG 9.2.9)
Следующее
От: Rural Hunter
Дата:
Сообщение: [Help needed] Update query hangs forever on high load