Re: On-demand running query plans using auto_explain and signals

Поиск

Список

Период

Сортировка

От	Pavel Stehule
Тема	Re: On-demand running query plans using auto_explain and signals
Дата	14 сентября 2015 г. 17:27:59
Msg-id	CAFj8pRA_yTPDtZcaf=kLE-PD-AZg19s8_OKFd5AwLt+EcgA68g@mail.gmail.com обсуждение исходный текст
Ответ на	Re: On-demand running query plans using auto_explain and signals ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Ответы	Re: On-demand running query plans using auto_explain and signals
Список	pgsql-hackers

Дерево обсуждения

2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:

On Mon, Sep 14, 2015 at 3:09 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Now the backend that has been signaled on the second call to
pg_cmdstatus (it can be either some other backend, or the backend B
again) will not find an unprocessed slot, thus it will not try to
attach/detach the queue and the backend A will block forever.

This requires a really bad timing and the user should be able to
interrupt the querying backend A still.

I think we can't rely on the low probability that this won't happen, and we should not rely on people interrupting the backend. Being able to detect the situation and fail gracefully should be possible.

It may be possible to introduce some lock-less protocol preventing such situations, but it's not there at the moment. If you believe it's possible, you need to explain and "prove" that it's actually safe.

Otherwise we may need to introduce some basic locking - for example we may introduce a LWLock for each slot, and lock it with dontWait=true (and skip it if we couldn't lock it). This should prevent most scenarios where one corrupted slot blocks many processes.

OK, I will revisit this part then.

I have a radical proposal to remove the need for locking: make the CmdStatusSlot struct consist of a mere dsm_handle and move all the required metadata like sender_pid, request_type, etc. into the shared memory segment itself.

If we allow the only the requesting process to update the slot (that is the handle value itself) this removes the need for locking between sender and receiver.

The sender will walk through the slots looking for a non-zero dsm handle (according to dsm_create() implementation 0 is considered an invalid handle), and if it finds a valid one, it will attach and look inside, to check if it's destined for this process ID. At first that might sound strange, but I would expect 99% of the time that the only valid slot would be for the process that has been just signaled.

The sender process will then calculate the response message, update the result_code in the shared memory segment and finally send the message through the queue. If the receiver has since detached we get a detached result code and bail out.

Clearing the slot after receiving the message should be the requesting process' responsibility. This way the receiver only writes to the slot and the sender only reads from it.

By the way, is it safe to assume atomic read/writes of dsm_handle (uint32)? I would be surprised if not.

I don't see any reason why it should not to work - only few processes will wait for data - so lost attach/detach shm operations will not be too much.

Pavel

--
Alex

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: On-demand running query plans using auto_explain and signals