Re: Hung Vacuum in 8.3
From | Mark Kirkwood
Subject | Re: Hung Vacuum in 8.3
Date |
Msg-id | 4D759F22.7020809@catalyst.net.nz
In reply to | Re: Hung Vacuum in 8.3 (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List | pgsql-bugs
On 08/03/11 13:03, Mark Kirkwood wrote:
> On 08/03/11 12:55, Mark Kirkwood wrote:
>> On 23/02/11 10:18, Mark Kirkwood wrote:
>>> On 23/02/11 00:26, Greg Stark wrote:
>>>> It's also possible there's a bug of course. If someone was using that
>>>> buffer and somehow failed to notify the vacuum that they were done it
>>>> would wait for a very long time (forever?). However if vacuum
>>>> eventually continued when the query was canceled then it seems likely
>>>> it was working as intended.
>>>
>>> Greg, thanks for clarifying this.
>>>
>>> Unfortunately this time around I canceled the vacuum and then the
>>> query. However *next* time I'll get rid of the query 1st and see
>>> what happens.
>>
>> I have another one here:
>>
>> postgres=# select datname, procpid, query_start, current_query
>>            from pg_stat_activity
>>            where now() - query_start > '2 hours'::interval
>>            order by query_start;
>>  datname | procpid |          query_start          |  current_query
>> ---------+---------+-------------------------------+-----------------
>>  stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
>>
>> postgres=# select datname, procpid, query_start, current_query
>>            from pg_stat_activity
>>            where now() - backend_start > '2 hours'::interval
>>            order by query_start;
>>  datname | procpid |          query_start          |  current_query
>> ---------+---------+-------------------------------+-----------------
>>  stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
>>  :
>>  stuff   |   30018 | 2011-03-08 11:57:03.214177+13 | <IDLE>
>>  stuff   |   11390 | 2011-03-08 12:27:20.390122+13 | <IDLE>
>>  stuff   |   10178 | 2011-03-08 12:28:59.328265+13 | <IDLE>
>>  stuff   |    5100 | 2011-03-08 12:30:10.318204+13 | <IDLE>
>>  stuff   |   10255 | 2011-03-08 12:32:21.520814+13 | <IDLE>
>>  stuff   |   29993 | 2011-03-08 12:32:26.562835+13 | <IDLE>
>>  stuff   |   10204 | 2011-03-08 12:40:57.059958+13 | <IDLE>
>>  stuff   |   10206 | 2011-03-08 12:41:11.708924+13 | <IDLE>
>> (9 rows)
>>
>> It looks to me like whatever query buffer pins the vacuum *was*
>> waiting on should be well and truly over by now. The strace for the
>> vacuum looks the same as before:
>>
>> $ strace -p 23957
>> Process 23957 attached - interrupt to quit
>> semop(39747613, 0x7fff53d3dec0, 1
>>
>> It certainly looks like the vacuum is not able to resume after
>> setting itself to sleep for some reason.
>
> Immediately after I sent this, I saw the vacuum unlock itself and
> continue... so there *are* some queries still blocking it. I'm seeing
> some highish (1-2 minute) lock waits for a certain table:
>
> postgres=# select datname, procpid, query_start, current_query
>            from pg_stat_activity
>            where now() - query_start > '1 minute'
>            and procpid in (select distinct pid from pg_locks
>                            where not granted)
>            order by query_start;
>  datname | procpid |          query_start          |        current_query
> ---------+---------+-------------------------------+------------------------------
>  stuff   |     390 | 2011-03-08 12:57:01.752234+13 | UPDATE content.twitter_queue
>          :                                           SET error = $1
>          :                                           WHERE twitter_queue_id = $2
> (30 or so)
>
> which are probably not helping.

Make that definitely not helping:

stuff=# select relname, last_vacuum from pg_stat_all_tables
        where last_vacuum is not null
        and last_vacuum > '2011-03-08 06:03:06'::timestamp
        order by last_vacuum limit 3;
    relname     |          last_vacuum
----------------+-------------------------------
 headline_alert | 2011-03-08 06:03:06.182866+13
 stock_static   | 2011-03-08 06:03:06.610739+13
 twitter_queue  | 2011-03-08 12:58:03.02451+13
(3 rows)

So we were trying to vacuum 'twitter_queue' (a small 20MB table) for
approx 6 hrs. Note that the vacuum itself was not waiting on any locks
according to pg_locks, so this looks to be the buffer pinning issue.
Clearly we need to amend the application design that results in this
quite long (1-2 minute!) wait for the lock on this table.
regards

Mark

P.S: for the interested, the process(es) that the above UPDATEs are
waiting for are sitting '<IDLE> in transaction' - i.e. the application
is making some *non-database* blocking call whilst holding a lock on the
'twitter_queue' table :-/
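[For readers hitting the same symptom: the pattern above - blocked UPDATEs whose
blockers are sitting '<IDLE> in transaction' - can be spotted directly. A minimal
sketch of such a diagnostic query, using the 8.3-era pg_stat_activity columns
(procpid, current_query; later releases renamed these to pid and query/state);
the table name 'twitter_queue' is just the one from this thread:

-- List sessions that are '<IDLE> in transaction' while holding a lock
-- on twitter_queue, along with the lock mode they hold.
SELECT a.procpid,
       a.query_start,
       l.mode,
       l.granted
FROM   pg_locks l
JOIN   pg_class c          ON c.oid = l.relation
JOIN   pg_stat_activity a  ON a.procpid = l.pid
WHERE  c.relname = 'twitter_queue'
  AND  a.current_query = '<IDLE> in transaction'
ORDER BY a.query_start;

Any rows returned are candidates for the application-side fix Mark describes:
the backend holds its lock until the application finishes its non-database
work and commits.]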