[HACKERS] possible self-deadlock window after bad ProcessStartupPacket

From	Jimmy Yih
Subject	[HACKERS] possible self-deadlock window after bad ProcessStartupPacket
Date	February 3, 2017, 02:18:22
Msg-id	CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP=VPekigA@mail.gmail.com
Replies	Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket (Robert Haas <robertmhaas@gmail.com>); Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket (Andres Freund <andres@anarazel.de>)
List	pgsql-hackers

Hello,

There may be a very small window for a double exit() self-deadlock when ProcessStartupPacket returns a status other than STATUS_OK in a forked backend process. The process enters proc_exit, and a very well-timed SIGQUIT then invokes startup_die, which calls proc_exit a second time. If the timing is right, the two exit() calls can deadlock, because exit() is not re-entrant. The deadlock seems extremely hard to hit, but I believe the opportunity is there.

Using gdb, I was able to create the window and get this stacktrace:

#0  startup_die (postgres_signal_arg=0) at postmaster.c:5090
#1  <signal handler called>
#2  proc_exit_prepare (code=0) at ipc.c:158
#3  0x00000000007c4135 in proc_exit (code=0) at ipc.c:102
#4  0x000000000076b736 in BackendInitialize (port=0x2c13740) at postmaster.c:4207
#5  0x000000000076b190 in BackendStartup (port=0x2c13740) at postmaster.c:3979
#6  0x0000000000767ad3 in ServerLoop () at postmaster.c:1722
#7  0x00000000007671df in PostmasterMain (argc=3, argv=0x2bebad0) at postmaster.c:1330
#8  0x00000000006b5df6 in main (argc=3, argv=0x2bebad0) at main.c:228

I got the stacktrace by doing the following:

1. Attach gdb to the postmaster, run set follow-fork-mode child, set a breakpoint at postmaster.c:4206 (right after ProcessStartupPacket), and continue.
2. In another terminal, open a psql session, which should cause gdb to follow the forked child.
3. In the gdb session, set status to 1 (set var status=1) and step into proc_exit().
4. In another terminal, send SIGQUIT to the child process with kill -s QUIT <child pid>, or run pg_ctl stop -m immediate.
5. In the gdb session, step so that the signal is delivered to startup_die, then run bt.

This was discovered while hacking on Greenplum Database (currently based on Postgres 8.3), where we recently started hitting the self-deadlock intermittently in our testing environment.

Here's the pull request discussion:
https://github.com/greenplum-db/gpdb/pull/1662

In that pull request, we fix the issue by checking for proc_exit_inprogress. Is there a reason why startup_die should not check for proc_exit_inprogress?
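The guard being asked about can be sketched in isolation. This is a minimal, standalone illustration and not the actual PostgreSQL or Greenplum code: exit_in_progress stands in for proc_exit_inprogress, and sketch_begin_exit, sketch_startup_die and demo_signal_during_exit are hypothetical names invented for the example. The exit routine here only records that it was entered, so the effect of the check can be observed without terminating the process.

```c
#include <assert.h>
#include <signal.h>

/* Stand-in for proc_exit_inprogress: set once exit processing has begun. */
static volatile sig_atomic_t exit_in_progress = 0;

/* Demo-only counter: how many times exit processing was entered. */
static volatile sig_atomic_t exit_entries = 0;

/* Hypothetical stand-in for proc_exit(); real code would end with exit(). */
static void sketch_begin_exit(void)
{
    exit_in_progress = 1;
    exit_entries++;
}

/* Hypothetical SIGQUIT handler with the proposed guard. */
static void sketch_startup_die(int signo)
{
    (void) signo;
    if (exit_in_progress)
        return;             /* already exiting: do not start a second exit */
    sketch_begin_exit();
}

/* Simulates SIGQUIT arriving after exit processing has already started.
 * Returns the number of times exit processing was entered. */
static int demo_signal_during_exit(void)
{
    void (*prev)(int) = signal(SIGQUIT, sketch_startup_die);
    assert(prev != SIG_ERR);
    (void) prev;

    sketch_begin_exit();    /* the "bad startup packet" exit path */
    raise(SIGQUIT);         /* handler runs before raise() returns */
    return (int) exit_entries;
}
```

Calling demo_signal_during_exit() once reports a single entry into exit processing, because the handler returns early when the flag is already set. Without the check, the handler would enter the exit path a second time, which is the window described above.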

In the above pull request, Heikki also mentions that a similar scenario can happen during palloc(). That resembles a deadlock in a malloc() call that we saw in Greenplum a couple of years back, which we fixed by changing exit() to _exit() in quickdie. That fix could possibly apply to the latest Postgres as well.
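The difference between exit() and _exit() in a signal handler can be shown with a small standalone sketch. It is not the Greenplum patch: sketch_quickdie, sketch_cleanup and demo_quickdie_skips_cleanup are hypothetical names, and the exit status 2 is chosen only for the demonstration. A child process registers an atexit() callback, then receives SIGQUIT. Because the handler leaves through _exit(), which POSIX lists as async-signal-safe, the callback never runs.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Demo-only: descriptor the atexit callback writes to if it ever runs. */
static int cleanup_fd = -1;

/* Stands in for cleanup work that exit() would trigger. */
static void sketch_cleanup(void)
{
    if (cleanup_fd >= 0)
        (void) write(cleanup_fd, "x", 1);
}

/* Hypothetical quickdie-style handler: leave immediately via _exit(). */
static void sketch_quickdie(int signo)
{
    (void) signo;
    _exit(2);   /* skips atexit callbacks and stdio flushing */
}

/* Returns 1 if the child exited with status 2 and the cleanup callback
 * did not run, 0 otherwise. */
static int demo_quickdie_skips_cleanup(void)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    (void) rc;

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        close(fds[0]);
        cleanup_fd = fds[1];
        atexit(sketch_cleanup);
        signal(SIGQUIT, sketch_quickdie);
        raise(SIGQUIT);     /* handler calls _exit(2) */
        _exit(99);          /* reached only if the handler did not run */
    }

    close(fds[1]);
    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);
    (void) waited;

    char buf;
    ssize_t n = read(fds[0], &buf, 1);  /* 0 means EOF: nothing was written */
    close(fds[0]);
    return WIFEXITED(status) && WEXITSTATUS(status) == 2 && n == 0;
}
```

The trade-off is that _exit() also skips any cleanup the process might have wanted, so it fits only a handler whose purpose is to leave as fast as possible.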

Regards,

Jimmy
