Re: BUG #11264: Auto vacuum wraparound job blocking everything

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: BUG #11264: Auto vacuum wraparound job blocking everything
Дата
Msg-id 20140827152636.GC7046@eldon.alvh.no-ip.org
обсуждение исходный текст
Ответ на BUG #11264: Auto vacuum wraparound job blocking everything  (dbhandary@switchfly.com)
Ответы Re: BUG #11264: Auto vacuum wraparound job blocking everything  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: BUG #11264: Auto vacuum wraparound job blocking everything  (Dinesh Bhandary <dbhandary@switchfly.com>)
Список pgsql-bugs
dbhandary@switchfly.com wrote:

> We recently upgraded to postgressql 9.3.5  from postgres 9.1 using
> pg_upgrade. We are running streaming replication. DB was functioning without
> any issues till today where we discovered an auto vacuum wraparound job on
> one of our tables which was blocking everything. There were no waiting jobs
> when we queried pg_stat_activity, but incoming request would just hang, so
> database was unusable.

Can you please paste

select oid::regclass, relfrozenxid, relminmxid from pg_class where relname = 'room_contract_service_code';
select datname, datfrozenxid, datminmxid from pg_database where datname = 'jetblue';

> (gdb) bt
> #0  0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
> #1  0x000000000075ad7a in pg_usleep ()
> #2  0x00000000004ae2d4 in GetMultiXactIdMembers ()
> #3  0x000000000047f6cc in heap_prepare_freeze_tuple ()
> #4  0x000000000057d036 in lazy_vacuum_rel ()

This stall in GetMultiXactIdMembers should only occur when somebody is
writing the multixact immediately following.  It certainly should not
sleep for long.  I'm not sure what's happening here but this is probably
where the problem is.  While waiting, it holds the buffer content lock
in exclusive mode.

Can you install debug symbols and pageinspect?  I'm curious why this is
stalling.  If the tuple has a FOR SHARE or FOR UPDATE marking from
before the upgrade, it shouldn't be inquiring the multixact members at
all.  I'm curious about

Latest checkpoint's NextMultiXactId:  23431
                                                                                           
Latest checkpoint's NextMultiOffset:  5678
                                                                                           
Latest checkpoint's oldestMultiXid:   1
                                                                                           
Latest checkpoint's oldestMulti's DB: 16423
                                                                                           

It says the oldest multi is 1, so the database should not have any
values between 1 and 23431 that correspond to pg_upgrade'd multixact
values ... so what is going on here?  Unless the recent mucking with
pg_upgrade to handle multixact's got something wrong.


> 10.33.11.11 |                 |       40735 | 2014-08-25 10:38:02.944083+00
> | 2014-08-25 10:38:02.964562+00 | 2014-08-25 10:38:22.805859+00 | 2014-08-25
> 10:38:22.805861+00 | f       | active | COPY
> settings.room_contract_service_code (room_contract_service_code,
> room_contract_service_description
> , room_contract_service_code_id, supplierid, inactive) TO stdout;
>
>  (gdb) bt
> #0  0x00007fb9740f5187 in semop () from /lib64/libc.so.6
> #1  0x0000000000609397 in PGSemaphoreLock ()
> #2  0x000000000064e871 in LWLockAcquire ()
> #3  0x000000000047e1a9 in ?? ()
> #4  0x000000000047e66a in ?? ()
> #5  0x000000000047f0a6 in heap_getnext ()
> #6  0x00000000005343e6 in ?? ()
> #7  0x000000000053521a in DoCopy ()
> #8  0x000000000066112d in standard_ProcessUtility ()
> #9  0x00007fb96c4df261 in ?? () from /usr/pgsql-9.3/lib/pg_stat_statements.so

This one is stalling on the buffer content lock trying to read the
table to dump it.

> SELECTS:
>
> query            |
>                  |         SELECT room_contract_service_code,
> room_contract_service_description, room_contract_service_code_id
>                  |         FROM room_contract_service_code
>                  |         WHERE room_contract_service_code.supplierid = $1
>                  |                 AND NOT inactive
>                  |         ORDER by room_contract_service_description
> (gdb) bt
> #0  0x00007fb9740f5187 in semop () from /lib64/libc.so.6
> #1  0x0000000000609397 in PGSemaphoreLock ()
> #2  0x000000000064e871 in LWLockAcquire ()
> #3  0x000000000048d05d in index_fetch_heap ()
> #4  0x000000000048d22e in index_getnext ()

And this one is stalling in the same buffer lock most likely.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Matheus de Oliveira
Дата:
Сообщение: Re: BUG #11271: Out of memory while reading tuples.
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: BUG #11264: Auto vacuum wraparound job blocking everything