Re: [HACKERS] Race conditions with WAL sender PID lookups

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: [HACKERS] Race conditions with WAL sender PID lookups
Дата
Msg-id 20170628225214.iefjdj2y3ulga2ek@alvherre.pgsql
обсуждение исходный текст
Ответ на Re: [HACKERS] Race conditions with WAL sender PID lookups  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: [HACKERS] Race conditions with WAL sender PID lookups  (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] Race conditions with WAL sender PID lookups  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
Michael Paquier wrote:

> On Thu, Jun 8, 2017 at 3:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > I think if you're going to fix it so that we take spinlocks on
> > MyWalSnd in a bunch of places that we didn't take them before, it
> > would make sense to fix all the places where we're accessing those
> > fields without a spinlock instead of only some of them, unless there
> > are good reasons why we only need it in some cases and not others, in
> > which case I think the patch should add comments to each place where
> > the lock was not taken explaining why it's OK.  It didn't take me and
> > my copy of vi very long to find instances that you didn't change.
> 
> I take that you are referring to the two lookups in
> WalSndWaitForWal(), one in exec_replication_command(), one in
> WalSndLoop() and one in WalSndDone(). First let's remember that all
> the fields protected by the spinlock are only updated in a WAL sender
> process, so reading them directly is safe in this context: we know
> that no other processes will write them. And all those functions are
> called only in the context of a WAL sender process. So the copies of
> MyWalSnd that you are referring to here don't need a spinlock. Is
> there something I am missing?

I assume you mean "reading them unlocked" when you write "reading them
directly".  If so, I agree with that rationale; I verified that it
applies to members state, sentPtr, write, sent, flush, writeLag,
sentLag, flushLag.  I wonder if there's a maintainability danger: as
soon as we add a write to those members in a process other than the
walsender itself, we have a bug.  I don't yet have an opinion on the
severity of that problem.

Other struct members such as needreload are written by other processes,
so they should be always accessed with mutex held.  Regarding pid, it
seems easiest if we use the rule that it must also be always accessed
with spinlock held.  I propose updating the comment on WalSnd to this:

/** Each walsender has a WalSnd struct in shared memory.** This struct is protected by 'mutex', with two exceptions;
oneis* sync_standby_priority as noted below.  The other exception is that some* members are only written by the
walsenderprocess itself, and thus that* process is free to read those members without holding spinlock.  pid and*
needreloadalways require the spinlock to be held for all accesses.*/
 


> Actually, perhaps it would make sense to add some Assert(am_walsender)
> in this code?

I don't think that's needed, since MyWalSnd will only be valid in a
walsender.

I think I'm done with the walsender half of this patch; I still need to
review the walreceiver part.  I will report back tomorrow 19:00 CLT.

Currently, I'm considering not to backpatch any of this.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Fast promotion not used when doing a recovery_targetPITR restore?
Следующее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create().