Re: WIP: Failover Slots

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: WIP: Failover Slots
Дата
Msg-id CAMsr+YGoaAVhOC2ja+pZbHs3Zux5W4G5JtDMZ-XbYDTRmM-0YA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP: Failover Slots  (Oleksii Kliukin <alexk@hintbits.com>)
Ответы Re: WIP: Failover Slots  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On 22 February 2016 at 23:39, Oleksii Kliukin <alexk@hintbits.com> wrote:
 
What it’s doing is calling pg_basebackup first to initialize the replica, and that actually failed with:

_basebackup: unexpected termination of replication stream: ERROR:  requested WAL segment 000000010000000000000000 has already been removed

The segment name definitely looks bogus to me.

The actual command causing the failure was an attempt to clone the replica using pg_basebackup, turning on xlog streaming:

pg_basebackup --pgdata data/postgres1 --xlog-method=stream --dbname="host=localhost port=5432  user=replicator”

I checked the same command against the  git master without the patches applied and could not reproduce this problem there.

That's a bug. In testing whether we need to return a lower LSN for minimum WAL for BASE_BACKUP it failed to properly test for InvalidXLogRecPtr . Good catch.
 
On the code level, I have no comments on 0001, it’s well documented and I have no questions about the approach, although I might be not too knowledgable to judge the specifics of the implementation.

The first patch is the most important IMO, and the one I think needs the most thought since it's ... well, timelines aren't simple.
 
slots.c:294
elog(LOG, "persistency is %i", (int)slot->data.persistency);

Should be changed to DEBUG?

That's an escapee log statement I thought I'd already rebased out. Well spotted, fixed.
 

slot.c:468
Why did you drop “void" as a parameter type of ReplicationSlotDropAcquired?

That's an editing error on my part that I'll reverse. Since the prototype declares (void) it doesn't matter, but it's a pointless change. Fixed.
 
walsender.c: 1509 at PhysicalConfirmReceivedLocation

I’ve noticed a comment stating that we don’t need to call ReplicationSlotSave(), but that pre-dated the WAL-logging of replication slot changes. Don’t we need to call it now, the same way it’s done for the logical slots in logical.c:at LogicalConfirmReceivedLocation?

No, it's safe here. All we must ensure is that a slot is advanced on the replica when it's advanced on the master. For physical slots even that's a weak requirement, we just have to stop them from falling *too* far behind and causing too much xlog retention. For logical slots we should ensure we advance the slot on the replica before any vacuum activity that might remove catalog tuples still needed by that slot gets replayed. Basically the difference is that logical slots keep track of the catalog xmin too, so they have (slightly) stricter requirements.

This patch doesn't touch either of those functions except for renaming ReplicationSlotsComputeRequiredLSN to ReplicationSlotsUpdateRequiredLSN . Which, by the way, I really don't like doing, but I couldn't figure out a name to give the function that computes-and-returns the required LSN that wouldn't be even more confusing in the face of having a ReplicationSlotsComputeRequiredLSN function as well. Ideas welcome.

Updated patch 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Support for N synchronous standby servers - take 2
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: WIP: Failover Slots