How to Qualifying or quantify risk of loss in asynchronous replication

Поиск
Список
Период
Сортировка
От otheus uibk
Тема How to Qualifying or quantify risk of loss in asynchronous replication
Дата
Msg-id CALbQNd1G_np279G60HBqZkdJo=3hE+yDRYSYoZSc16gFnCujbg@mail.gmail.com
обсуждение исходный текст
Ответы Re: How to Qualifying or quantify risk of loss in asynchronous replication  (Thomas Munro <thomas.munro@enterprisedb.com>)
Список pgsql-general
I've been working with PG 9.1.8 for two years now, mainly asynchronous replication. Recently, an IT admin of another group contested that the PG's asynchronous replication can result in loss of data in a 1-node failure. After re-readinG the documentation, I cannot determine to what extent this is true.

Back in 2008, Robert Haas made this post http://postgresql.nabble.com/Sync-Rep-First-Thoughts-on-Code-tp1998339p1998433.html in which he delineates between different levels of replication. 1-safe is guaranteed with PG WALs. Other possibilities include group-safe, both group safe and 1-safe, 2-safe. 

How do we qualify PG when WALs are written (and archived) on the master, and streaming replication to a hot standby, albeit asynchronous, is used? Is it Group-safe?

My understanding is "Strictly speaking, No". 

But what precisely is the algorithm and timing involved with streaming WALs? 

Is it:
  * client issues COMMIT
  * master receives commit
  * master processes transaction internally
  * master creates WAL record
  | master appends WAL to local WAL file, flushes disk
  | master sends WAL record to all streaming clients
  * master sends OK to client
  * master applies WAL

So is this correct? Is it correct to say: PG async guarantees that the WAL is *sent* to the receivers, but not that they are received, before the client receives acknowledgement?

Can we make a case stronger than that? Assuming the T0 is round trip time between master and client, and T1 is round trip time between master and slave, as long as T1 <= T0, and provided both Slave and Master do not fail, the system is Group-safe?




--

В списке pgsql-general по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing
Следующее
От: John Lumby
Дата:
Сообщение: Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing