Re: Occasional 9.6.10 PMChildFlags fatal error, possibly due to >2 parallel gathers

From: Chris Snook
Subject: Re: Occasional 9.6.10 PMChildFlags fatal error, possibly due to >2 parallel gathers
Date:
Msg-id CAONUJSNGCLW1GSXh18raY6bvqBiDkqfLyKpy-QeSvZTx3SrHhA@mail.gmail.com
In reply to: Re: Occasional 9.6.10 PMChildFlags fatal error, possibly due to >2 parallel gathers (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-bugs
There was an idle psql session running in screen, invoked as sudo -u postgres psql. Salt also routinely runs a batch of configuration assertion checks as the postgres user, but those are not login sessions either, and they have been running sub-hourly for over a year without incident. Backups run from a replica; this failure happened on the primary, and not close in time to a backup run. Because we're using stock Debian Stretch packages, that user is a system user (UID 110, GID 114), so systemd's IPC cleanup, which exempts system users, wouldn't apply in this case anyway.
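The system-user argument can be checked directly by comparing the account's UID against the system-user ceiling. A minimal sketch, assuming Debian's default SYS_UID_MAX of 999 (from /etc/login.defs), which is what systemd is typically built with; other distributions or custom builds may use a different ceiling, and the helper names here are hypothetical:

```python
import pwd

# Assumption: Debian's default SYS_UID_MAX of 999. systemd fixes its own
# system-user ceiling at build time, so verify this for your packages.
SYS_UID_MAX = 999


def is_system_uid(uid: int, sys_uid_max: int = SYS_UID_MAX) -> bool:
    """Return True if uid is in the system-user range (0..sys_uid_max)."""
    return 0 <= uid <= sys_uid_max


def account_is_system_user(name: str = "postgres",
                           sys_uid_max: int = SYS_UID_MAX) -> bool:
    """Look up a local account by name and report whether it is a system user."""
    return is_system_uid(pwd.getpwnam(name).pw_uid, sys_uid_max)


if __name__ == "__main__":
    try:
        print("postgres is a system user:", account_is_system_user("postgres"))
    except KeyError:
        # pwd.getpwnam raises KeyError when the account does not exist.
        print("no local 'postgres' account")
```

With the UID reported above (110), is_system_uid returns True under the 999 assumption, consistent with the account being exempt.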

If we can figure out how to reproduce it reliably outside of production, I'll turn all the logging options up to 11 so we can tell whether the shared memory error immediately follows the fatal error in the same process, or is just a cleanup race as everything shuts down. We haven't had a recurrence since setting max_parallel_workers_per_gather to 2. However, after the two failures that were 63 minutes apart, we also went several hours with it still set to 10 without a recurrence, so that doesn't mean much.

- Chris

On Tue, Feb 12, 2019 at 9:55 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Wed, Feb 13, 2019 at 3:41 PM Chris Snook <csnook@cloudflare.com> wrote:
> For more context, I got these tightly packed around the first crash, with the first and last messages repeated hundreds of times:
>
> FATAL:  sorry, too many clients already
> FATAL:  sorry, too many clients already
> FATAL:  sorry, too many clients already
> FATAL:  no free slots in PMChildFlags array
> WARNING:  could not remove shared memory segment "/PostgreSQL.1407760088": No such file or directory
> FATAL:  semop(id=2293786) failed: Invalid argument
> FATAL:  semop(id=2293786) failed: Invalid argument
> FATAL:  semctl(2064403, 7, SETVAL, 0) failed: Invalid argument
> FATAL:  semop(id=2621476) failed: Invalid argument
> FATAL:  semop(id=2621476) failed: Invalid argument
> FATAL:  semctl(2293786, 1, SETVAL, 0) failed: Invalid argument
> FATAL:  semctl(2621476, 10, SETVAL, 0) failed: Invalid argument
> WARNING:  could not remove shared memory segment "/PostgreSQL.1621779631": No such file or directory

Any chance you created a cronjob that runs as user "postgres" (or
whatever user the PostgreSQL cluster runs as), or logged in as that
user manually for some reason?  Systemd likes to blow away global IPC
resources associated with users when they log out.

https://www.postgresql.org/docs/11/kernel-resources.html#SYSTEMD-REMOVEIPC
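The behavior in question is controlled by the RemoveIPC setting in the [Login] section of logind.conf; per the linked documentation, system users are exempt, and the usual remedies are setting RemoveIPC=no or running the cluster as a system user. A minimal sketch for checking the effective setting, assuming stock systemd's default of "yes" when the key is absent (some distributions ship a different default) and ignoring drop-in files under logind.conf.d; the helper name is hypothetical:

```python
import configparser


def remove_ipc_enabled(conf_text: str) -> bool:
    """Return the effective RemoveIPC value from logind.conf text.

    Assumes stock systemd's default ("yes") when the section or key is
    missing or commented out. Drop-in files are not considered here.
    """
    # interpolation=None: treat '%' literally; strict=False: tolerate
    # repeated keys, with the last assignment winning.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # getboolean accepts yes/no, true/false, on/off and 1/0.
    return parser.getboolean("Login", "RemoveIPC", fallback=True)


if __name__ == "__main__":
    try:
        with open("/etc/systemd/logind.conf") as fh:
            text = fh.read()
    except FileNotFoundError:
        text = ""
    print("RemoveIPC in effect:", remove_ipc_enabled(text))
```

A commented-out line such as #RemoveIPC=yes is treated as unset, so the sketch falls back to the assumed default rather than reading the comment.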

--
Thomas Munro
http://www.enterprisedb.com
