Re: Support for N synchronous standby servers - take 2

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: Support for N synchronous standby servers - take 2
Дата
Msg-id 20151214.182738.130827803.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Support for N synchronous standby servers - take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: Support for N synchronous standby servers - take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
Thank you for the new patch.

At Wed, 9 Dec 2015 20:59:20 +0530, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDcn1fToCcYRqpU6fMY1xnpDdAKDTcbhW1R9M1mPM0kZg@mail.gmail.com>
> On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I agree with #3 way and the s_s_name format you suggested.
> > I think that It's extensible and is tolerable for future changes.
> > I'm going to implement the patch based on this idea if other hackers
> > agree with this design.
> 
> Please find the attached draft patch which supports multi sync replication.
> This patch adds a GUC parameter synchronous_replication_method, which
> represent the method of synchronous replication.
> 
> [Design of replication method]
> synchronous_replication_method has two values; 'priority' and
> '1-priority' for now.
> We can expand the kind of its value (e.g, 'quorum', 'json' etc) in the future.
> 
> * s_r_method = '1-priority'
> This method is for backward compatibility, so the syntax of s_s_names
> is same as today.
> The behavior is same as well.
> 
> * s_r_method = 'priority'
> This method is for multiple synchronous replication using priority method.
> The syntax of s_s_names is,
>    <number of sync standbys>, <standby name> [, ...]

Is there anyone opposed to this?

> For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
> node3' means that the master waits for  acknowledge from at least 2
> lowest priority servers.
> If 4 standbys(node1 - node4) are available, the master server waits
> acknowledge from 'node1' and 'node2.
> The each status of wal senders are;
> 
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
> application_name | sync_state
> ------------------+------------
> node1            | sync
> node2            | sync
> node3            | potential
> node4            | async
> (4 rows)
> 
> After 'node2' crashed, the master will wait for acknowledge from
> 'node1' and 'node3'.
> The each status of wal senders are;
> 
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
> application_name | sync_state
> ------------------+------------
> node1            | sync
> node3            | sync
> node4            | async
> (3 rows)
> 
> [Changing replication method]
> When we want to change the replication method, we have to change the
> s_r_method  at first, and then do pg_reload_conf().
> After changing replication method, we can change the s_s_names.

Mmm. I should be able to be changed at once, because s_r_method
and s_s_names contradict each other during the intermediate
state.

> [Expanding replication method]
> If we want to expand new replication method additionally, we need to
> implement two functions for each replication method:
> * int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
>   This function obtains the list of standbys considered as synchronous
> at that time, and return its length.
> * bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>   This function obtains LSNs(write, flush) considered as synced.
> 
> Also, this patch debug code is remain yet, you can debug this behavior
> using by enable DEBUG_REPLICATION macro.
> 
> Please give me feedbacks.

I haven't looked into this fully (sorry) but I'm concerned about
several points.


- I feel that some function names looks too long. For example SyncRepGetSynchronousStandbysOnePriority occupies more
thanthe half of a line. (However, the replication code alrady has many long function names..)
 

- The comment below of SyncRepGetSynchronousStandbyOnePriority, >       /* Find lowest priority standby */
 The code where the comment is for is doing the correct thing. Howerver, the comment is confusing. A lower priority
*value*means a higher priority.
 

- SyncRepGetSynchronousStandbys checks all if()s even when the first one matches. Use switch or "else if" there if you
theyare exclusive each other.
 

- Do you intende the DEBUG_REPLICATION code in SyncRepGetSynchronousStandbys*() to be the final shape?  The same code
blockswhich can work for both method should be in their common caller but SyncRepGetSyncLsns*() are headache. Although
itmight need more refactoring, I'm sorry but I don't see a desirable shape for now.
 
 By the way, palloc(20)/free() in such short term looks ineffective.

- SyncRepGetSyncLsnsPriority
 For the comment "/* Find lowest XLogRecPtr of both write and flush from sync_nodes */", LSN is compared as early or
lateso the comment would be better to be something like "Keep/Collect the earliest write and flush LSNs among
prioritizedstandbys".
 
 And what is more important, this block handles write and flush LSN jumbled and it reults in missing the earliest(=
mostdelayed) LSN for certain cases. The following is an example.
 
  Standby 1:  write LSN = 10, flush LSN = 5  Standby 2:  write LSN = 8 , flush LSN = 6
 For this case, finally we get tmp_write = 10 and tmp_flush = 5 from the current code, where tmp_write has wrong value
sinceLSN = 10 has *not* been written yet on standby 2. (the names "tmp_*" don't seem appropriate here)
 


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Marco Nenciarini
Дата:
Сообщение: Re: Uninterruptible slow geo_ops.c
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: [PoC] Asynchronous execution again (which is not parallel)