Re: Keepalive for max_standby_delay

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Keepalive for max_standby_delay
Дата
Msg-id AANLkTiljJpCHPLhJqcAJvYevskZ5RBeqt-JqBAchX2AL@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Keepalive for max_standby_delay  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Keepalive for max_standby_delay  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, Jun 2, 2010 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Indeed, but nothing we do can prevent that, if the slave is just plain
> slower than the master.  You have to assume that the slave is capable of
> keeping up in the absence of query-caused delays, or you're hosed.

I was assuming the walreceiver only requests more wal in relatively
small chunks and only when replay has caught up and needs more data. I
haven't actually read this code so if that assumption is wrong then
I'm off-base. But if my assumption is right then it's not merely the
master running faster than the slave that can cause you to fall
arbitrarily far behind. The "receive time" is delayed by how long the
slave waited on a previous lock, so if we wait for a few minutes, then
proceed and find we need more wal data we'll read data from the master
that it could have generated those few minutes ago.


> The sticky point is that once in a blue moon you do have a conflicting
> query sitting on a buffer lock for a long time, or even more likely a
> series of queries keeping the WAL replay process from obtaining buffer
> cleanup lock.

So I think this isn't necessarily such a blue moon event. As I
understand it, all it would take is a single long-running report and a
vacuum or HOT cleanup occurring on the master. If I want to set
max_standby_delay to 60min to allow reports to run for up to an hour
then any random HOT cleanup on the master will propagate to the slave
and cause a WAL stall until the transactions which have that xid in
their snapshot end.

Even the buffer locks are could potentially be blocked for a long time
if you happen to run the right kind of query (say a nested loop with
the buffer in question on the outside and a large table on the
inside). That's a rarer event though; is that what you were thinking
of?


--
greg


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: recovery getting interrupted is not so unusual as it used to be
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Keepalive for max_standby_delay