Re: Make query cancellation keys longer

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Make query cancellation keys longer
Дата
Msg-id 27d936fd-3e1b-422e-8a9b-d0543a22b3dc@iki.fi
обсуждение исходный текст
Ответ на Re: Make query cancellation keys longer  (Jelte Fennema-Nio <postgres@jeltef.nl>)
Список pgsql-hackers
On 04/07/2024 15:20, Jelte Fennema-Nio wrote:
> On Thu, 4 Jul 2024 at 12:32, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> We currently don't do any locking on the ProcSignal array. For query
>> cancellations, that's good because a query cancel packet is processed
>> without having a PGPROC entry, so we cannot take LWLocks. We could use
>> spinlocks though. In this patch, I used memory barriers to ensure that
>> we load/store the pid and the cancellation key in a sensible order, so
>> that you cannot e.g. send a cancellation signal to a backend that's just
>> starting up and hasn't advertised its cancellation key in ProcSignal
>> yet. But I think this might be simpler and less error-prone by just
>> adding a spinlock to each ProcSignal slot. That would also fix the
>> existing race condition where we might set the pss_signalFlags flag for
>> a slot, when the process concurrently terminates and the slot is reused
>> for a different process. Because of that, we currently have this caveat:
>> "... all the signals are such that no harm is done if they're mistakenly
>> fired". With a spinlock, we could eliminate that race.
> 
> I think a spinlock would make this thing a whole concurrency stuff a
> lot easier to reason about.
> 
> +   slot->pss_cancel_key_valid = false;
> +   slot->pss_cancel_key = 0;
> 
> If no spinlock is added, I think these accesses should still be made
> atomic writes. Otherwise data-races on those fields are still
> possible, resulting in undefined behaviour. The memory barriers you
> added don't prevent that afaict. With atomic operations there are
> still race conditions, but no data-races.
> 
> Actually it seems like that same argument applies to the already
> existing reading/writing of pss_pid: it's written/read using
> non-atomic operations so data-races can occur and thus undefined
> behaviour too.

Ok, here's a version with spinlocks.

I went back and forth on what exactly is protected by the spinlock. I 
kept the "volatile sig_atomic_t" type for pss_signalFlags, so that it 
can still be safely read without holding the spinlock in 
CheckProcSignal, but all the functions that set the flags now hold the 
spinlock. That removes the race condition that you might set the flag 
for wrong slot, if the target backend exits and the slot is recycled. 
The race condition was harmless and there were comments to note it, but 
it doesn't occur anymore with the spinlock.

(Note: Thomas's "Interrupts vs signals" patch will remove ps_signalFlags 
altogether. I'm looking forward to that.)

> -    volatile pid_t pss_pid;
> +    pid_t        pss_pid;
> 
> Why remove the volatile modifier?

Because I introduced a memory barrier to ensure the reads/writes of 
pss_pid become visible to other processes in right order. That makes the 
'volatile' unnecessary IIUC. With the spinlock, the point is moot however.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Inconsistency between try_mergejoin_path and create_mergejoin_plan
Следующее
От: Andreas Karlsson
Дата:
Сообщение: Re: Convert sepgsql tests to TAP