Re: [HACKERS] Logical Replication WIP

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: [HACKERS] Logical Replication WIP
Дата
Msg-id 2c1c26c4-8ad0-3c4c-2cf3-0e52c58c80b8@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Logical Replication WIP  (Petr Jelinek <petr@2ndquadrant.com>)
Список pgsql-hackers
On 13/12/16 01:33, Andres Freund wrote:
> HJi,
> 
> On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
>> On 12/8/16 4:10 PM, Petr Jelinek wrote:
>>> On 08/12/16 20:16, Peter Eisentraut wrote:
>>>> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>>>>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>>>>>> I think that the removal of changes to ReplicationSlotAcquire() that you
>>>>>> did will result in making it impossible to reacquire temporary slot once
>>>>>> you switched to different one in the session as the if (active_pid != 0)
>>>>>> will always be true for temp slot.
>>>>>
>>>>> I see.  I suppose it's difficult to get a test case for this.
>>>>
>>>> I created a test case, saw the error of my ways, and added your code
>>>> back in.  Patch attached.
>>>>
>>>
>>> Hi,
>>>
>>> I am happy with this version, thanks for moving it forward.
>>
>> committed
> 
> Hm.
> 
>  /*
> + * Cleanup all temporary slots created in current session.
> + */
> +void
> +ReplicationSlotCleanup()
> 
> I'd rather see a (void) there. The prototype has it, but still.
> 
> 
> +
> +    /*
> +     * No need for locking as we are only interested in slots active in
> +     * current process and those are not touched by other processes.
> 
> I'm a bit suspicious of this claim.  Without a memory barrier you could
> actually look at outdated versions of active_pid. In practice there's
> enough full memory barriers in the slot creation code that it's
> guaranteed to not be the same pid from before a wraparound though.
> 
> I think that doing iterations of slots without
> ReplicationSlotControlLock makes things more fragile, because suddenly
> assumptions that previously held aren't true anymore.   E.g. factually
>     /*
>      * The slot is definitely gone.  Lock out concurrent scans of the array
>      * long enough to kill it.  It's OK to clear the active flag here without
>      * grabbing the mutex because nobody else can be scanning the array here,
>      * and nobody can be attached to this slot and thus access it without
>      * scanning the array.
>      */
> is now simply not true anymore.  It's probably not harmfully broken, but
> at least you've changed the locking protocol without adapting comments.
> 

Well it's protected by being called only by ReplicationSlotCleanup() and
ReplicationSlotDropAcquired(). The comment could be improved though, yes.

Holding the ReplicationSlotControlLock in the scan is somewhat
problematic because ReplicationSlotDropPtr tryes to use it as well (and
in exclusive mode), so we'd have to do exclusive lock in
ReplicationSlotCleanup() which I don't really like much.

> 
>  /*
> - * Permanently drop the currently acquired replication slot which will be
> - * released by the point this function returns.
> + * Permanently drop the currently acquired replication slot.
>   */
>  static void
>  ReplicationSlotDropAcquired(void)
> 
> Isn't that actually removing interesting information? Yes, the comment's
> been moved to ReplicationSlotDropPtr(), but that routine is an internal
> one...
> 

ReplicationSlotDropAcquired() is internal one as well.

> 
> @@ -810,6 +810,9 @@ ProcKill(int code, Datum arg)
>      if (MyReplicationSlot != NULL)
>          ReplicationSlotRelease();
> 
> +    /* Also cleanup all the temporary slots. */
> +    ReplicationSlotCleanup();
> +
> 
> So we now have exactly this code in several places. Why does a
> generically named Cleanup routine not also deal with a currently
> acquired slot? Right now it'd be more appropriately named
> ReplicationSlotDropTemporary() or such.
> 

It definitely could release MyReplicationSlot as well.

> 
> @@ -1427,13 +1427,14 @@ pg_replication_slots| SELECT l.slot_name,
>      l.slot_type,
>      l.datoid,
>      d.datname AS database,
> +    l.temporary,
>      l.active,
>      l.active_pid,
>      l.xmin,
>      l.catalog_xmin,
>      l.restart_lsn,
>      l.confirmed_flush_lsn
> -   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin,
restart_lsn,confirmed_flush_lsn)
 
> +   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin,
catalog_xmin,restart_lsn, confirmed_flush_lsn)
 
>       LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
>  pg_roles| SELECT pg_authid.rolname,
>      pg_authid.rolsuper,
> 
> If we start to expose this, shouldn't we expose the persistency instead
> (i.e. persistent/ephemeral/temporary)?
> 

Not sure how much is that useful given that ephemeral is transient state
only present during slot creation.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: [HACKERS] PATCH: two slab-like memory allocators
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: [HACKERS] exposing wait events for non-backends (was: Trackingwait event for latches)