Re: Autovacuum not keeping up. (PG 9.2.9)
От | jesper@krogh.cc |
---|---|
Тема | Re: Autovacuum not keeping up. (PG 9.2.9) |
Дата | |
Msg-id | ea5b920abfccddd0707920001041d0af.squirrel@shrek.krogh.cc обсуждение исходный текст |
Ответ на | Re: Autovacuum not keeping up. (PG 9.2.9) (jesper@krogh.cc) |
Список | pgsql-admin |
>> jesper@krogh.cc wrote: >>> Hi. >>> >>> I have a large database with a message queue table, that has high >>> activity. The database supports 1-300 client connection concurrently, >>> having transactions open in up to 30 minutes each. >>> >>> Recently I am seeing autuvacuum being issued, but it takes >>> ages to get through the message queue table, with strace showing >>> waiting >>> for semop's for 10's to 100's of seconds. >> >> Do you have data on how relfrozenxid advances for that table? > > Not really, how would you normally pick that out? > 2014-08-01 10:49:39.171 db=# select relname, age(relfrozenxid), > relfrozenxid FROM pg_class WHERE relkind = 'r' and relname = 'job'; > relname | age | relfrozenxid > ---------+-----------+-------------- > job | 111893622 | 796259097 > (1 row) > > Time: 1.913 ms > > This shouldn't qualify for a freeze vacuum, should it? > >> Vacuuming needs to grab a "cleanup lock" on each page it's going to >> vacuum, which is a special kind of lock that requires that no other >> process is even looking at the page at that moment (we call this "to >> have the page pinned"), which is even weaker than having a shared lock >> on the page. If traffic to some pages is high, it might be difficult >> for vacuum to acquire this. > > Based on strace output this looks excactly like whats going on, there are > some activity, then it pauses for some time then continues. Very "scientifically" i have captured this strace output of the autovacuum pid. This is a 10 minute trace (600s), total waiting time for semop(2850903 in those minutes are not less than 485s, (last semop not counted in). jk@srv01:~$ sudo timelimit -t 600 -T 10 strace -T -p 36839 2>&1 | grep -v 0.000 Process 36839 attached - interrupt to quit semop(2850903, {{11, -1, 0}}, 1) = 0 <14.403021> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001100> write(367, "\\D\1\0\30p\325\37\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\340\237 \0\200\206 \0"..., 8192) = 8192 <0.002371> semop(2850903, {{11, -1, 0}}, 1) = 0 <38.871837> semop(2850903, {{11, -1, 0}}, 1) = 0 <17.111822> semop(2850903, {{11, -1, 0}}, 1) = 0 <22.438373> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001095> semop(2850903, {{11, -1, 0}}, 1) = 0 <32.521132> semop(2261061, {{8, 1, 0}}, 1) = 0 <0.001601> semop(2850903, {{11, -1, 0}}, 1) = 0 <10.740716> semop(1179684, {{9, 1, 0}}, 1) = 0 <0.001900> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001106> semop(2850903, {{11, -1, 0}}, 1) = 0 <26.610242> semop(2850903, {{11, -1, 0}}, 1) = 0 <21.911487> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001060> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001694> semop(2850903, {{11, -1, 0}}, 1) = 0 <128.996370> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001390> semop(2850903, {{11, -1, 0}}, 1) = 0 <20.553280> read(367, "\\D\1\0H\2107\357\1\0\0\0|\3`\22\360\37\4 \0\0\0\0\340\237 \0\320\237 \0"..., 8192) = 8192 <0.001670> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001357> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001046> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001352> semop(2850903, {{11, -1, 0}}, 1) = 0 <44.311165> semop(2850903, {{11, -1, 0}}, 1) = 0 <79.246222> read(367, "]D\1\0\10\"\3-\1\0\0\0|\3`\22\360\37\4 \0\0\0\0\340\237 \0\320\237 \0"..., 8192) = 8192 <0.001124> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001093> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001098> semop(2850903, {{11, -1, 0}}, 1) = 0 <8.440753> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001390> semop(2850903, {{11, -1, 0}}, 1) = 0 <27.041010> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001679> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.004053> semop(2850903, {{11, -1, 0}}, 1) = 0 <9.158556> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001015> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001113> write(367, "]D\1\0\360#\177x\1\0\0\0\344\4\300\f\360\37\4 \0\0\0\0\340\237 \0\300\214 \0"..., 8192) = 8192 <0.001474> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.002450> semop(655380, {{7, 1, 0}}, 1) = 0 <0.002245> read(367, "]D\1\0\270\311\247\221\1\0\0\0D\5@\v\360\37\4 \0\0\0\0\340\237 \0@\213 \0"..., 8192) = 8192 <0.001339> semop(2850903, {{11, -1, 0}}, 1) = 0 <0.002029> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001078> select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001079> semop(2850903, {{11, -1, 0}}, 1timelimit: sending warning signal 15 <unfinished ...> Process 36839 detached jk@srv01:~$ Is there any clue in that the semaphorenumber it waits on is excactly the same all the time? (the <> to the right are timings in seconds. -- Jesper
В списке pgsql-admin по дате отправления: