Re: Hung Vacuum in 8.3

Поиск
Список
Период
Сортировка
От Mark Kirkwood
Тема Re: Hung Vacuum in 8.3
Дата
Msg-id 4D759F22.7020809@catalyst.net.nz
обсуждение исходный текст
Ответ на Re: Hung Vacuum in 8.3  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Список pgsql-bugs
On 08/03/11 13:03, Mark Kirkwood wrote:
> On 08/03/11 12:55, Mark Kirkwood wrote:
>> On 23/02/11 10:18, Mark Kirkwood wrote:
>>> On 23/02/11 00:26, Greg Stark wrote:
>>>>
>>>> It's also possible there's a bug of course. If someone was using that
>>>> buffer and somehow failed to notify the vacuum that they were done it
>>>> would wait for a very long time (forever?). However if vacuum
>>>> eventually continued when the query was canceled then it seems likely
>>>> it was working as intended.
>>>>
>>>
>>> Greg, thanks for clarifying this.
>>>
>>> Unfortunately this time around I canceled the vacuum and then the
>>> query. However *next* time I'll get rid of the query 1st and see
>>> what happens.
>>>
>>
>> I have another one here:
>>
>> postgres=# select datname,procpid, query_start, current_query
>>                   from pg_stat_activity
>>                   where now() - query_start > '2 hours'::interval
>> order by query_start;
>>  datname | procpid |          query_start          |  current_query
>> ---------+---------+-------------------------------+-----------------
>>  stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
>>
>> postgres=# select datname,procpid, query_start, current_query
>>                   from pg_stat_activity
>>                   where now() - backend_start > '2 hours'::interval
>> order by query_start;
>>  datname | procpid |          query_start          |  current_query
>> ---------+---------+-------------------------------+-----------------
>>  stuff   |   23957 | 2011-03-08 06:02:02.659159+13 | VACUUM ANALYZE;
>>                                                    :
>>  stuff   |   30018 | 2011-03-08 11:57:03.214177+13 | <IDLE>
>>  stuff   |   11390 | 2011-03-08 12:27:20.390122+13 | <IDLE>
>>  stuff   |   10178 | 2011-03-08 12:28:59.328265+13 | <IDLE>
>>  stuff   |    5100 | 2011-03-08 12:30:10.318204+13 | <IDLE>
>>  stuff   |   10255 | 2011-03-08 12:32:21.520814+13 | <IDLE>
>>  stuff   |   29993 | 2011-03-08 12:32:26.562835+13 | <IDLE>
>>  stuff   |   10204 | 2011-03-08 12:40:57.059958+13 | <IDLE>
>>  stuff   |   10206 | 2011-03-08 12:41:11.708924+13 | <IDLE>
>> (9 rows)
>>
>>
>> It looks to me like whatever query buffer pins the vacuum *was*
>> waiting on should be well and truly over by now. The strace for the
>> vacuum looks the same as before:
>>
>> $ strace -p 23957
>> Process 23957 attached - interrupt to quit
>> semop(39747613, 0x7fff53d3dec0, 1
>>
>>
>> It certainly looks like the vacuum is not able to resume after
>> setting itself to sleep for some reason.
>>
>>
>>
>
> Immediately after I sent this, I saw the vacuum unlock itself and
> continue... so there *are* some queries still blocking it. I'm seeing
> some highish (1 -2 minute) lock waits for a certain table:
>
> postgres=# select datname,procpid, query_start, current_query
>            from pg_stat_activity where now() - query_start > '1 minute'
>            and procpid in (select distinct pid from pg_locks where not
> granted) order by query_start;
>  datname | procpid |          query_start          |        current_query
> ---------+---------+-------------------------------+------------------------------
>
>  stuff   |     390 | 2011-03-08 12:57:01.752234+13 | UPDATE
> content.twitter_queue
>                                                    : SET error = $1
>                                                    : WHERE
> twitter_queue_id = $2
> (30 or so)
>
> which are probably not helping.
>
>

Make that definitely not helping:

stuff=# select relname, last_vacuum
         from pg_stat_all_tables
         where last_vacuum is not null
         and last_vacuum>  '2011-03-08 06:03:06'::timestamp
         order by last_vacuum limit 3;
     relname     |          last_vacuum
----------------+-------------------------------
  headline_alert | 2011-03-08 06:03:06.182866+13
  stock_static   | 2011-03-08 06:03:06.610739+13
  twitter_queue  | 2011-03-08 12:58:03.02451+13
(3 rows)



So we were trying to vacuum 'twitter_queue' (a small 20MB table) for
approx 6 hrs.

Note that the vacuum itself was not waiting in any locks according to
pg_locks, so looks to be the pinning issue. Clearly we need to amend the
application design that results in this quite long (1-2 minute)! wait
for the lock on this table.

regards

Mark

P.s: for the interested, the process(es) that the above UPDATEs are
waiting for are sitting '<IDLE> in transaction' - i.e application end is
doing some *non database* blocking call whilst holding a lock on the
'twitter_queue' table :-/

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Greg Stark
Дата:
Сообщение: Re: Hung Vacuum in 8.3
Следующее
От: "YAMAMOTO Takashi"
Дата:
Сообщение: BUG #5918: SummarizeOldestCommittedSxact assertion failure