Re: dsa_allocate() faliure

From: Jakub Glapa
Subject: Re: dsa_allocate() faliure
Msg-id: CAJk1zg2kgnAbWAuM3oG20EN_Fvin2Z5OtWJHTR711S2jbNwQQA@mail.gmail.com
In response to: Re: dsa_allocate() faliure  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: dsa_allocate() faliure  (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers
Hi, just a small update. 
I've configured the OS to take crash dumps on Ubuntu 16.04 as follows (maybe somebody will find it helpful):
I've added LimitCORE=infinity to the [Service] section of /lib/systemd/system/postgresql@.service
I've reloaded the service config with sudo systemctl daemon-reload
I've changed the core pattern with: echo '/var/lib/postgresql/core.%p.sig%s.%ts' | sudo tee /proc/sys/kernel/core_pattern
I tested it with kill -ABRT <pid of backend> and it behaved correctly: a crash dump was written.
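To double-check that the settings above actually reached a running backend, something like the following could be used. It is only a sketch: the helper name core_soft_limit is made up for illustration, and it assumes the usual Linux layout of /proc/<pid>/limits (limit name, soft limit, hard limit, units).

```shell
# Hypothetical helper, not part of PostgreSQL or systemd.
# Prints the soft "Max core file size" limit from a /proc/<pid>/limits file.
# The limit name is four words, so the soft limit is the fifth field.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}
```

Running core_soft_limit /proc/<pid of backend>/limits should print "unlimited" once LimitCORE=infinity is in effect for that process, and cat /proc/sys/kernel/core_pattern shows the pattern the kernel is currently using.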

Over the last few days of monitoring, no segfault occurred, but the dsa_allocate error did.
I'm starting to doubt whether the segfault I found in dmesg was actually related.

I've grepped the postgres log for dsa_allocate:
Why do the messages occur sometimes as FATAL and sometimes as ERROR?

2018-11-29 07:59:06 CET::@:[20584]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 09:04:13 CET::@:[27341]: FATAL:  dsa_allocate could not find 13 free pages
2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR:  dsa_allocate could not find 13 free pages
2018-11-30 09:28:38 CET::@:[30215]: FATAL:  dsa_allocate could not find 4 free pages
2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR:  dsa_allocate could not find 4 free pages
2018-11-30 16:37:16 CET::@:[14385]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET::@:[14375]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15359]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15363]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR:  dsa_allocate could not find 7 free pages
2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL:  dsa_allocate could not find 7 free pages
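
For anyone wanting to summarize such log lines, a rough sketch of a tally by severity and page count follows. The function name tally_dsa_errors is made up for illustration, and it assumes the message format shown above, with the page count as the third field from the end of the line.

```shell
# Hypothetical helper: counts dsa_allocate failures per severity and page count.
# Reads the named log files, or standard input when no file is given.
# Output lines have the form "<severity> <pages> <occurrences>", sorted.
tally_dsa_errors() {
    awk '
        /(FATAL|ERROR): +dsa_allocate could not find [0-9]+ free pages/ {
            sev = ($0 ~ /FATAL:/) ? "FATAL" : "ERROR"
            # "... could not find N free pages": N is the 3rd field from the end
            count[sev " " $(NF-2)]++
        }
        END { for (k in count) print k, count[k] }
    ' "$@" | sort
}
```

It could be run as tally_dsa_errors <path to postgres log>, or fed from grep through a pipe.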


There are quite a few errors from today, but I was launching the problematic query in parallel from 2-3 sessions.
Sometimes it failed, sometimes not.
I couldn't find any pattern.
The workload on this db is not really constant, but rather bursty.

--
regards,
Jakub Glapa


On Tue, Nov 27, 2018 at 9:03 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hmm.  I will see if I can come up with a many-partition torture test
> reproducer for this.

No luck.  I suppose one theory that could link both failure modes
would be a buffer overrun, where in the non-shared case it trashes a
pointer that is later dereferenced, and in the shared case it writes
past the end of allocated 4KB pages and corrupts the intrusive btree
that lives in spare pages to track available space.

--
Thomas Munro
http://www.enterprisedb.com
