Re: Changeset Extraction v7.1

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Changeset Extraction v7.1
Дата
Msg-id 20140123233209.GJ7182@awork2.anarazel.de
обсуждение исходный текст
Ответ на Re: Changeset Extraction v7.1  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Changeset Extraction v7.1  (Andres Freund <andres@2ndquadrant.com>)
Re: Changeset Extraction v7.1  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2014-01-23 16:04:10 -0500, Robert Haas wrote:
> Patch 0001:
> 
> +                                errmsg("could not find free replication slot"),
> 
> Suggest: all replication slots are in use

That sounds better indeed.

> +               elog(ERROR, "cannot aquire a slot while another slot
> has been acquired");
> 
> Suggest: this backend has already acquired a replication slot
> 
> Or demote it to Assert().  I'm not really sure why this needs to be
> checked in non-assert builds.

Hm. Fine with me, not sure why I went with an elog(). Maybe because I
thought output plugin authors could have the idea of using another slot
while inside one?

>  I also wonder if we should use the
> terminology "attach" instead of "acquire"; that pairs more naturally
> with "release".  Then the message, if we want more than an assert,
> might be "this backend is already attached to a replication slot".

I went with Acquire/Release because our locking code does so, and it
seemed sensible to be consistent. I don't have strong feelings about it.

> +       if (slot == NULL)
> +       {
> +               LWLockRelease(ReplicationSlotCtlLock);
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_UNDEFINED_OBJECT),
> +                                errmsg("could not find replication
> slot \"%s\"", name)));
> +       }
> 
> The error will release the LWLock anyway; I'd get rid of the manual
> LWLockRelease, and the braces.  Similarly in ReplicationSlotDrop.

Unfortunately not. Inside the walsender there's currently no
LWLockReleaseAll() for ERRORs since commands aren't run inside a
transaction command...

But maybe I should have fixed this by adding the release to
WalSndErrorCleanup() instead? That'd still leave the problematic case
that currently we try to delete a replication slot inside a CATCH when
we fail while initializing the rest of logical replication... But I
guess adding it would be a good idea independent of that.

We could also do a StartTransactionCommand() but I'd rather not, that
currently prevents code in that vicinity from doing anything it
shouldn't via various Assert()s in existing code.

> +       /* acquire spinlock so we can test and set ->active safely */
> +       SpinLockAcquire(&slot->mutex);
> +
> +       if (slot->active)
> +       {
> +               SpinLockRelease(&slot->mutex);
> +               LWLockRelease(ReplicationSlotCtlLock);
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_OBJECT_IN_USE),
> +                                errmsg("slot \"%s\" already active", name)));
> +       }
> +
> +       /* we definitely have the slot, no errors possible anymore */
> +       slot->active = true;
> +       MyReplicationSlot = slot;
> +       SpinLockRelease(&slot->mutex);
> 
> This doesn't need the LWLockRelease either.  It does need the
> SpinLockRelease, but as I think I noted previously, the right way to
> write this is probably: SpinLockAcquire(&slot->mutex); was_active =
> slot->active; slot->active = true; SpinLockRelease(&slot->mutex); if
> (was_active) ereport(). MyReplicatonSlot = slot.

That's not really simpler tho? But if you prefer I can go that way.

> ReplicationSlotsComputeRequiredXmin still acquires ProcArrayLock, and
> the comment "Provide interlock against concurrent recomputations"
> doesn't seem adequate to me.  I guess the idea here is that we regard
> ProcArrayLock as protecting ReplicationSlotCtl->catalog_xmin and
> ReplicationSlotCtl->data_xmin, but if that's the idea then we only
> need to hold the lock during the time when we actually update those
> values, not the loop where we compute them.

There's a comment someplace else to that end, but yes, that's
essentially the idea. I decided to take it during the whole
recomputation because we also take ProcArrayLock when creating a new
decoding slot and initially setting ->catalog_xmin. That's not strictly required
but seemed simpler that way, and the code shouldn't be very hot.
The code that initially computes the starting value for catalog_xmin
when creating a new decoding slot has to take ProcArrayLock to be safe,
that's why I though it'd be convenient to always use it for those
values.

In all other cases where we modify *_xmin we're only increasing it which
doesn't need a lock (HS feedback never has taken one, and
GetSnapshotData() modifies ->xmin while holding a shared lock), the only
potential danger is a slight delay in increasing the overall value.

> Also, if that's the
> design, maybe they should be part of PROC_HDR *ProcGlobal rather than
> here.  It seems weird to have some of the values protected by
> ProcArrayLock live in a completely different data structure managed
> almost entirely by some other part of the system.

Don't we already have cases of that? I seem to remember so. If you
prefer having them there, I am certainly fine with doing that. This way
they aren't allocated if slots are disabled but it's just two
TransactionIds.

> It's pretty evident that what's currently patch #4 (only peg the xmin
> horizon for catalog tables during logical decoding) needs to become
> patch #1, because it doesn't make any sense to apply this before we do
> that.

Well, the slot code and the the slot support for streaming rep are
independent from and don't use it. So they easily can come before it.

I previously had argued for committing that patch together with the main
changeset extraction commit but you, understandably so!, wanted to have
it separately for review.

> [ discussion about crash safety of slots and their use of PANIC ]
> Broadly, what you're trying to accomplish here is to have something
> that is crash-safe but without relying on WAL, so that it can work on
> standbys.  If making things crash-safe without WAL were easy to do, we
> probably wouldn't have WAL at all, so it stands to reason that there
> are going to be some difficulties here.

My big problem here is that you're asking this code to have *higher*
guarantees than WAL ever had and currently has, not equivalent
guarantees. Even though the likelihood of hitting problems is a least a
magnitude or two smaller as we are dealing with minimal amounts of data.
All the situations that seem halfway workable in the nearby thread about
PANIC in XLogInsert() you reference are rough ideas that reduce the
likelihood of PANICs, not remove them.

I am fine with reworking things so that the first operation of several
doesn't PANIC because we still can clearly ERROR out in that case. That
should press the likelihood of problems into the utterly irrelevant
area. E.g. ERROR for the rename(oldpath, newpath) and then start a
critical section for the fsync et al.

> Calling a slot "old" or "new" looks liable to cause problems.  Maybe
> change those names to contain a character not allowed in a slot name,
> if we're going to keep doing it that way.

Hm. Fair point. slotname.old, slotname.new sounds better.

> I wonder if it wouldn't be better to get rid of the subdirectories for
> the individual slots, and just have a file pg_replslot/$SLOTNAME, or
> not.  I know there are later patches that need subdirectories for
> their own private data, but they could just create
> pg_replslot/$SLOTNAME.dir and put whatever in it they like, without
> really implicating this code that much.

I wondered about making them plain files as well but given the need for
a directory independent from this I don't really see the advantage,
we'll need to handle them anyway during cleanup.

> Patch 0004:
> 
> I'm not very confident that PROC_IN_LOGICAL_DECODING is the right way
> to go here.  It seems to me that excluding the xmins of backends with
> slots from globalxmin consideration so that we can fold the same xmin
> in by some other mechanism is kind of strange.

It's essentially copying what PROC_IN_VACUUM already does, that's where
I got the idea from.

> How about letting the xmins of such backends affect the computation as normal, and then
> having one extra xmin that gets folded in that represents the minima
> of the xmin of unconnected slots?

That's how I had it in the beginning but it turned out that has
noticeable performance/space impact. Surprising isn't it? The reason is
that we'll intermittently use normal snapshots to look at the catalog
during decoding and they will install a xmin the current proc. So, while
that snapshot is active GetSnapshotData() will return an older xmin
preventing HOT pruning from being as efficient.

I think we *really* need to make heap_page_prune() more efficient CPU
wise someday not too far away.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mark Kirkwood
Дата:
Сообщение: Re: Why do we let autovacuum give up?
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Hard limit on WAL space used (because PANIC sucks)