Re: Standalone synchronous master

From Heikki Linnakangas
Subject Re: Standalone synchronous master
Msg-id 52CD1564.6060504@vmware.com
In reply to Standalone synchronous master  (Rajeev rastogi <rajeev.rastogi@huawei.com>)
Responses Re: Standalone synchronous master  (Andres Freund <andres@2ndquadrant.com>)
Re: Standalone synchronous master  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 11/13/2013 03:09 PM, Rajeev rastogi wrote:
> This patch implements the following TODO item:
>
> Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period
> This would require some type of command to be executed to alert administrators of this change.
> http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php
>
> This patch implementation is in the same line as it was given in the earlier thread.
> Some Of the additional important changes are:
>
> 1.       Have added two GUC variable to take commands from user to be executed
>
> a.       Master_to_standalone_cmd: To be executed before master switches to standalone mode.
>
> b.      Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.
>
> 2.       Master mode switch will happen only if the corresponding command executed successfully.
>
> 3.       Taken care of replication timeout to decide whether synchronous standby has gone down. i.e. only after expiry of wal_sender_timeout, the master will switch from sync mode to standalone mode.
>
> Please provide your opinion or any other expectation out of this patch.

I'm going to say right off the bat that I think the whole notion of 
automatically disabling synchronous replication when the standby goes down 
is completely bonkers. If you don't need the strong guarantee that your 
transaction is safe in at least two servers before it's acknowledged to 
the client, there's no point enabling synchronous replication in the 
first place. If you do need it, then you shouldn't fall back to a 
degraded mode, at least not automatically. It's an idea that keeps 
coming back, but I have not heard a convincing argument why it makes 
sense. It's been discussed many times before, most recently in that 
thread you linked to.

Now that I got that out of the way, I concur that some sort of hooks or 
commands that fire when a standby goes down or comes back up make 
sense, for monitoring purposes. I don't much like this particular 
design. If you just want to write a log entry when all the standbys are 
disconnected, running a shell command seems like an awkward interface. 
It's OK for raising an alarm, but there are many other situations where 
you might want to raise alarms, so I'd rather have us implement some 
sort of a generic trap system, instead of adding this one particular 
extra config option. What do people usually use to monitor replication?

There are two things we're trying to solve here: raising an alarm when 
something interesting happens, and changing the configuration to 
temporarily disable synchronous replication. What would be a good API to 
disable synchronous replication? Editing the config file and SIGHUPing 
is not very nice. There's been talk of an ALTER command to change the 
config, but I'm not sure that's a very good API either. Perhaps expose 
the sync_master_in_standalone_mode variable you have in your patch to 
new SQL-callable functions. Something like:

pg_disable_synchronous_replication()
pg_enable_synchronous_replication()

I'm not sure where that state would be stored. Should it persist across 
restarts? And you probably should get some sort of warning in the log 
when synchronous replication is disabled.
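
To make the shape of that interface concrete, here is a toy model in Python. It is only a sketch: the class and method names are made up for illustration, the flag merely stands in for sync_master_in_standalone_mode, and nothing here is an existing PostgreSQL API. It shows the two calls flipping a flag and logging a warning when synchronous replication is turned off.

```python
import logging

logger = logging.getLogger("syncrep_sketch")  # hypothetical logger name


class SyncRepState:
    """Toy stand-in for the sync_master_in_standalone_mode flag (assumption)."""

    def __init__(self):
        # False: commits wait for the synchronous standby.
        self.standalone = False

    def disable_synchronous_replication(self):
        # Log only on an actual state change, so repeated calls stay quiet.
        if not self.standalone:
            self.standalone = True
            logger.warning("synchronous replication disabled; "
                           "commits no longer wait for a standby")
        return self.standalone

    def enable_synchronous_replication(self):
        if self.standalone:
            self.standalone = False
            logger.info("synchronous replication re-enabled")
        return self.standalone
```

The flag in this sketch lives only in memory, so it is lost on restart; whether the real state should survive a restart is exactly the open question above.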

In summary, more work is required to design a good 
user/admin/programming interface. Let's hear a solid proposal for that, 
before writing patches.

BTW, calling an external command with system(), while holding 
SyncRepLock in exclusive mode, seems like a bad idea. For starters, 
holding a lock will prevent a new WAL sender from starting up and 
becoming a synchronous standby, and the external command might take a 
long time to return.
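
One way to avoid that, sketched in Python rather than backend C: check the state under the lock, release the lock, run the command with a timeout, then re-take the lock and recheck before switching. Everything here is an assumption for illustration (the lock is a plain threading.Lock standing in for SyncRepLock, and the state dict, function name, and timeout are invented); it is not how the patch or the backend does it.

```python
import subprocess
import sys
import threading

sync_rep_lock = threading.Lock()  # stand-in for SyncRepLock (assumption)
state = {"standalone": False, "standby_connected": False}  # hypothetical state


def switch_to_standalone(cmd, timeout=5.0):
    """Run cmd outside the lock; switch modes only if it succeeds."""
    # Take the lock only long enough to look at the current state.
    with sync_rep_lock:
        if state["standalone"]:
            return True

    # Lock released: a new standby may connect while the command runs.
    try:
        ok = subprocess.run(cmd, timeout=timeout).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        ok = False
    if not ok:
        return False

    # Re-take the lock and recheck, since things may have changed meanwhile.
    with sync_rep_lock:
        if state["standby_connected"]:
            return False  # a standby came back; stay synchronous
        state["standalone"] = True
    return True
```

For example, switch_to_standalone([sys.executable, "-c", "pass"]) runs a trivial command and flips the flag only after it exits with status 0. The timeout bounds how long a slow command can delay the switch, and the lock is never held while the command runs.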

- Heikki


