Re: postgres hot-standby questions.

Поиск
Список
Период
Сортировка
От Graeme B. Bell
Тема Re: postgres hot-standby questions.
Дата
Msg-id C3258C19-AAF1-42A6-93DD-C9057C5C05C0@skogoglandskap.no
обсуждение исходный текст
Ответ на Re: postgres hot-standby questions.  (Scott Ribe <scott_ribe@elevated-dev.com>)
Ответы Re: postgres hot-standby questions.  (Scott Ribe <scott_ribe@elevated-dev.com>)
Список pgsql-admin
On 26 Mar 2015, at 17:48, Scott Ribe <scott_ribe@elevated-dev.com> wrote:

> On Mar 26, 2015, at 10:40 AM, Graeme B. Bell <grb@skogoglandskap.no> wrote:
>>
>> I could be missing something obvious, but how does this stop e.g. checkpoints, autovacuum etc from occuring and
generatingnew wal that the standby might like? It seems like stopping services accessing the DB is likely to cause a
lengthyautovacuum to begin, for example.  
>
> OK, fair point. But do you care?

Yes, because I think it will prevent me from doing the fast 'swap back' which I mentioned in my previous emails, which
isthe entire point of this. I don't want to have to do an rsync --checksum on terabytes of data after a 10 minute
periodof downtime to be certain I can safely bring my normal master back into play, if I can trade that away for e.g. 1
secondof synchronous shutdown time. 

I want to be very certain that postgres hasn't produced one last bit of WAL that the standby hasn't caught, because if
ithas, I won't be able to directly apply the (promoted) standby's new WAL entries over the top of it. I also don't want
tostart playing guessing games about whether I can delete a WAL segment or not because that might lead to srious
accidentalbrain damage of the DB. It won't be till 9.5 is well tested that I consider that kind of stuff.  

> I care that all committed transactions are on the replica. I don't care if they're still in WAL or written to the
backingfiles. Actually, think about that question, you don't either--WAL gets copied to the replica, then moved to the
backingfiles. Checkpoints don't affect replication as far as I know. 

Let's say the standby drops off the network some time before shutdown. A checkpoint or autovacuum might generate a
smallchange/entry in WAL (I don't know this for sure regarding autovacuum; this is a worst case assumption). That will
preventthe master from being suitable for use when I come to switch back again.  The same problem occurs with any
normaltransaction that occurs in that timeframe too, on servers where it's harder to take down all the services that
mightconnect. 

I would like my master server to be able to wait or warn me that the standby isn't complete, during shutdown. Maybe the
logicalreplication slots service can help with that.  

Anyway this may seem a bit theoretical so here's a practical example that I know will cause this effect:

Let's imagine someone follows your advice but is already running a PITR archive with archive_timeout. The recommended
timeoutis 1 minute. Every minute their server generates a new WAL segment.  

They start turning off services. Once everything is turned off, they shutdown the master assuming that the standby is
upto date since there has been no server activity. However, depending on the time during the minute that they do it at
(anddepending on e.g. the network speed - 16MB is a big chunk of data for a WAN), there's a pretty good chance that the
serverhas a new WAL file and perhaps also modifications to the raw files on the disk, that make it unsuitable as a
ready-to-gostandby for the newly promoted master.  

And that's in the best case, where there are no connections to the server ongoing or being attempted.

Graeme Bell




p.s. Here's a troubling situation (unrelated to my original question, but with some analogous problems)

1. Let's assume you run a standby and a PITR recovery backup.

2. Power interruption knocks out half the building. Network switches are offline, master is still commiting WAL to disk
andto its PITR area. It cannot reach the standby. 

3. Master goes down.

4. Standby is promoted manually or automatically since it is still reachable.

5. PITR and master come back online. Standby copies it's version of events into the PITR area. However, this version of
eventsruns parallel to the WAL the master managed to commit.  

6. What happens when you try to recover from the WAL in the PITR?


- If we clean the PITR to make the standby able to start writing its new timeline there, we lose some transactions.
- If we don't clean the PITR, we have 2 branches of history in our PITR archive that both claim to follow the last WAL
entrythat was duplicated to the standby. 




В списке pgsql-admin по дате отправления:

Предыдущее
От: "Graeme B. Bell"
Дата:
Сообщение: Re: postgres hot-standby questions.
Следующее
От: Scott Ribe
Дата:
Сообщение: Re: postgres hot-standby questions.