Обсуждение: Re: pgsql: Fix and document lock handling for in-memory replicationslot da

Поиск
Список
Период
Сортировка

Re: pgsql: Fix and document lock handling for in-memory replicationslot da

От
Andres Freund
Дата:
Hi,

On 2018-06-10 10:45:04 +0000, Michael Paquier wrote:
> Fix and document lock handling for in-memory replication slot data
>
> While debugging issues on HEAD for the new slot forwarding feature of
> Postgres 11, some monitoring of the code surrounding in-memory slot data
> has proved that the lock handling may cause inconsistent data to be read
> by read-only callers of slot functions, particularly
> pg_get_replication_slots() which fetches data for the system view
> pg_replication_slots, or modules looking directly at slot information.
>
> The code paths involved in those problems concern logical decoding
> initialization (down to 9.4) and WAL reservation for slots (new as of
> 10).
>
> A set of comments documenting all the lock handlings, particularly the
> dependency with LW locks for slots and the in_use flag as well as the
> internal mutex lock is added, based on a suggested by Simon Riggs.
>
> Some of the fixed code exists down to 9.4 where WAL decoding has been
> introduced, but as those race conditions are really unlikely going to
> happen as those concern code paths for slot and decoding creation, just
> fix the problem on HEAD.

You can't do things like:

            /* start at current insert position */
+           SpinLockAcquire(&slot->mutex);
            slot->data.restart_lsn = GetXLogInsertRecPtr();
+           SpinLockRelease(&slot->mutex);

a) we don't call external functions with a spinlock held. As a
   rule. It's too hard to se what happens in that other function / too
   likely to change independently.

b) we most certainly don't do it if the other function also acquires a
   spinlock:
XLogRecPtr
GetXLogInsertRecPtr(void)
{
    XLogCtlInsert *Insert = &XLogCtl->Insert;
    uint64        current_bytepos;

    SpinLockAcquire(&Insert->insertpos_lck);
    current_bytepos = Insert->CurrBytePos;
    SpinLockRelease(&Insert->insertpos_lck);

    return XLogBytePosToRecPtr(current_bytepos);
}

   Nesting spinlock means that you'd need to be very careful about
   whether all lockers use the same order. And be ok with the system
   stalled until the PANIC if it failed.

Same is true for the codepaths calling GetRedoRecPtr().


I don't object to the general idea of adding locking - although the
benefit are nearly guaranteed to be cosmetic - but this has the
potential to make things worse.

- Andres


Re: pgsql: Fix and document lock handling for in-memory replicationslot da

От
Michael Paquier
Дата:
On Mon, Jun 11, 2018 at 09:49:52AM -0700, Andres Freund wrote:
> Same is true for the codepaths calling GetRedoRecPtr().

You are right.  I'll fix in a minute.  A first commit's stress make
things really harder to get right...

> I don't object to the general idea of adding locking - although the
> benefit are nearly guaranteed to be cosmetic - but this has the
> potential to make things worse.

Thanks.
--
Michael

Вложения

Re: pgsql: Fix and document lock handling for in-memory replicationslot da

От
Michael Paquier
Дата:
On Mon, Jun 11, 2018 at 09:49:52AM -0700, Andres Freund wrote:
> Same is true for the codepaths calling GetRedoRecPtr().

You are right.  I'll fix in a minute.  A first commit's stress make
things really harder to get right...

> I don't object to the general idea of adding locking - although the
> benefit are nearly guaranteed to be cosmetic - but this has the
> potential to make things worse.

Thanks.
--
Michael

Вложения