Обсуждение: Issues in Replication Progress Tracking

Поиск
Список
Период
Сортировка

Issues in Replication Progress Tracking

От
Amit Kapila
Дата:
While reading the commit- 5aa23504 for Replication Progress
Tracking, I came across few issues which I would like to share.

1. catalogs.sgml
+     <row>
+      <entry><structfield>local_lsn</structfield></entry>
+      
<entry><type>pg_lsn</type></entry>
+      <entry></entry>
+      <entry>This node's LSN that at
+      which 
<literal>remote_lsn</literal> has been replicated. Used to
+      flush commit records before persisting data 
to disk when using
+      asynchronous commits.</entry>

I think part of the sentence is not very clear.
"This node's LSN *that at which* remote_lsn has been
replicated."

Reference link in docs

2.
closing ')' is missing.

Check below line:

Using the replication origin infrastructure a session can be marked
as replaying from a remote node (using 
the pg_replication_origin_session_setup() function.


3.
+      <row id="pg-replication-origin-session-progress">
+       <entry>
+        <indexterm>
+         
<primary>pg_replication_origin_session_progress</primary>
+        </indexterm>
+        
<literal><function>pg_replication_origin_progress(<parameter>flush</parameter> <type>bool</type>)
</function></literal>
+       </entry>

Specified function name is wrong
/pg_replication_origin_progress/pg_replication_origin_session_progress

Refer below page in docs:


4.
xloginsert.c
+/* Should te in-progress insertion log the origin */
+static bool include_origin = false;
+

typo /te

5.
origin.c

* * To create and drop replication origins an exclusive lock on
 *   pg_replication_slot is required for the 
duration. That allows us to
 *   safely and conflict free assign new origins using a dirty snapshot.

a. Isn't in above comment, we need to use pg_replication_origin
   instead of pg_replication_slot?
b. "..safely and conflict free assign..", I understand this part
of comment, but not sure if this is the best way to write it.

6.
origin.c
* setting up replorigin_sesssion_origin_lsn and replorigin_sesssion_origin that ensures we

line length greater than 80 and looks slightly odd.

7.
origin.c
replorigin_create()
{
..
if (tuple == NULL)
ereport(ERROR,
(errcode
(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("no free replication oid could be 
found")));
}

I think here error message should be
"no free replication origin oid could be found"

8.
origin.c
replorigin_drop()
{
..
tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
simple_heap_delete(rel, &tuple-
>t_self);
..
}

Isn't it better to have check for a valid tuple after SearchSysCache1()?
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed

9.
origin.c
ReplicationOriginShmemSize(void)
{
..
* XXX: max_replication_slots is arguablethe wrong
..
}

a. *arguablethe*, space is required and it better to use arguably
b. One thing that in favour of using a separate/new guc for
   ReplicationState is that even if the user has configured
   max_replication_slots for some other usage (other than
   tracking Replication Origin) of ReplicationSlots, even then we
   will end up allocating shared memory which will never be used,
   OTOH as the memory will not be huge, so we can even ignore it.
   

10.
origin.c
CheckPointReplicationOrigin()
{
..
if (unlink(tmppath) < 0 && errno != ENOENT)
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not remove file \"%s\": %m",
path)));
}

In ereport, tmppath should be used instead of path.

11.
origin.c
* local_commit needs to be a local LSN of the commit so that we can make sure
 * uppon a checkpoint that 
enough WAL has been persisted to disk.

/uppon/upon

12.
In funcions replorigin_advance() and replorigin_session_setup(),
different ways (free_state and free_slot) are used. Isn't it better
to use same way?

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

14.
pg_replication_origin_session_reset()
{
..
/* FIXME */
replorigin_sesssion_origin = InvalidRepOriginId;
..
}

What needs be fixed in FIXME is not clear?

15.
Below changes in pg_resetxlog.c seems redundant.
--------------------- src/bin/pg_resetxlog/pg_resetxlog.c ---------------------
index a0805d8..4a22575 100644
@@ -56,6 +56,8 @@
 #include "common/restricted_token.h"
 #include "storage/large_object.h"
 #include "pg_getopt.h"
+#include "replication/logical.h"
+#include "replication/origin.h"
 
 
 static ControlFileData ControlFile; /* pg_control values */
@@ -1091,6 +1093,7 @@ WriteEmptyXLOG(void)
  record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint);
  record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
  record->xl_rmid = RM_XLOG_ID;
+
  recptr += SizeOfXLogRecord;
  *(recptr++) = XLR_BLOCK_ID_DATA_SHORT;
  *(recptr++) = sizeof(CheckPoint);

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Issues in Replication Progress Tracking

От
Andres Freund
Дата:
Hi,

Thanks for looking through this!

On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:
> 5.
> origin.c
> 
> * * To create and drop replication origins an exclusive lock on
>  *   pg_replication_slot is required for the
> duration. That allows us to
>  *   safely and conflict free assign new origins using a dirty snapshot.

> b. "..safely and conflict free assign..", I understand this part
> of comment, but not sure if this is the best way to write it.

Hm, don't see a problem with that part.

> 8.
> origin.c
> replorigin_drop()
> {
> ..
> tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
> simple_heap_delete(rel, &tuple-
> >t_self);
> ..
> }
> 
> Isn't it better to have check for a valid tuple after SearchSysCache1()?
> if (!HeapTupleIsValid(tuple))
> elog(ERROR, "cache lookup failed

Sounds good to me.

> 9.
> origin.c
> ReplicationOriginShmemSize(void)
> {
> ..
> * XXX: max_replication_slots is arguablethe wrong
> ..
> }

> b. One thing that in favour of using a separate/new guc for
>    ReplicationState is that even if the user has configured
>    max_replication_slots for some other usage (other than
>    tracking Replication Origin) of ReplicationSlots, even then we
>    will end up allocating shared memory which will never be used,
>    OTOH as the memory will not be huge, so we can even ignore it.

I don't think it matters much for now, as you say it's only a small
amount of memory.

> 12.
> In funcions replorigin_advance() and replorigin_session_setup(),
> different ways (free_state and free_slot) are used. Isn't it better
> to use same way?

Phew, I don't really care.

> 13.
> In function replorigin_session_setup() and or
> replorigin_session_advance(), don't we need to WAL log the
> use of Replication state?

No, the point is that the replication progress is persisted via an extra
data block in the commit record. That's important for both performance
and correctness, because otherwise it gets hard to tie a transaction
made during replay with the update to the progress. Unless you use 2PC
which isn't really an alternative.

Greetings,

Andres Freund



Re: Issues in Replication Progress Tracking

От
Amit Kapila
Дата:
On Thu, May 21, 2015 at 12:42 AM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:
>
> > 13.
> > In function replorigin_session_setup() and or
> > replorigin_session_advance(), don't we need to WAL log the
> > use of Replication state?
>
> No, the point is that the replication progress is persisted via an extra
> data block in the commit record. That's important for both performance
> and correctness, because otherwise it gets hard to tie a transaction
> made during replay with the update to the progress. Unless you use 2PC
> which isn't really an alternative.
>

Okay, but what triggered this question was the difference of those functions
as compare to when user call function pg_replication_origin_advance().
pg_replication_origin_advance() will WAL log the information during that
function call itself (via replorigin_advance()).  So even if the transaction
issuing pg_replication_origin_advance() function will abort, it will still update
the Replication State, why so?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Issues in Replication Progress Tracking

От
Andres Freund
Дата:
On 2015-05-21 09:40:58 +0530, Amit Kapila wrote:
> On Thu, May 21, 2015 at 12:42 AM, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:
> >
> > > 13.
> > > In function replorigin_session_setup() and or
> > > replorigin_session_advance(), don't we need to WAL log the
> > > use of Replication state?
> >
> > No, the point is that the replication progress is persisted via an extra
> > data block in the commit record. That's important for both performance
> > and correctness, because otherwise it gets hard to tie a transaction
> > made during replay with the update to the progress. Unless you use 2PC
> > which isn't really an alternative.
> >
> 
> Okay, but what triggered this question was the difference of those functions
> as compare to when user call function pg_replication_origin_advance().
> pg_replication_origin_advance() will WAL log the information during that
> function call itself (via replorigin_advance()).  So even if the transaction
> issuing pg_replication_origin_advance() function will abort, it will still
> update
> the Replication State, why so?

I don't see a problem here. pg_replication_origin_advance() is for
setting up the initial position/update the position upon configuration
changes. It'd be a fair amount of infrastructure to make it tie into
transactions - without a point to it afaics?

(Just to be clear, I plan to address all the points I've not commented
upon)

Greetings,

Andres Freund



Re: Issues in Replication Progress Tracking

От
Amit Kapila
Дата:
On Fri, May 22, 2015 at 11:20 PM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2015-05-21 09:40:58 +0530, Amit Kapila wrote:
> > On Thu, May 21, 2015 at 12:42 AM, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:
> > >
> > > > 13.
> > > > In function replorigin_session_setup() and or
> > > > replorigin_session_advance(), don't we need to WAL log the
> > > > use of Replication state?
> > >
> > > No, the point is that the replication progress is persisted via an extra
> > > data block in the commit record. That's important for both performance
> > > and correctness, because otherwise it gets hard to tie a transaction
> > > made during replay with the update to the progress. Unless you use 2PC
> > > which isn't really an alternative.
> > >
> >
> > Okay, but what triggered this question was the difference of those functions
> > as compare to when user call function pg_replication_origin_advance().
> > pg_replication_origin_advance() will WAL log the information during that
> > function call itself (via replorigin_advance()).  So even if the transaction
> > issuing pg_replication_origin_advance() function will abort, it will still
> > update
> > the Replication State, why so?
>
> I don't see a problem here. pg_replication_origin_advance() is for
> setting up the initial position/update the position upon configuration
> changes. 

Okay, I am not aware how exactly these API's will be used for replication
but let me try to clarify what I have in mind related to this API usage.

Can we use pg_replication_origin_advance()  for node where Replay has
to happen, if Yes, then Let us say user of pg_replication_origin_advance()
API set the lsn position to X for the node N1 on which replay has to
happen, so now replay will proceed from X + 1 even though the
information related to X is not persisted, so now it could so happen
X will get written after the replay of X + 1 which might lead to problem
after crash recovery?

> It'd be a fair amount of infrastructure to make it tie into
> transactions - without a point to it afaics?
>

Agreed, that if there is no valid use case then we should keep it
as it is.

> (Just to be clear, I plan to address all the points I've not commented
> upon)
>

Thanks.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Issues in Replication Progress Tracking

От
Andres Freund
Дата:
On May 22, 2015 10:23:06 PM PDT, Amit Kapila <amit.kapila16@gmail.com> wrote:
>On Fri, May 22, 2015 at 11:20 PM, Andres Freund <andres@anarazel.de>
>wrote:
>>
>> On 2015-05-21 09:40:58 +0530, Amit Kapila wrote:
>> > On Thu, May 21, 2015 at 12:42 AM, Andres Freund
><andres@anarazel.de>
>wrote:
>> > >
>> > > On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:
>> > >
>> > > > 13.
>> > > > In function replorigin_session_setup() and or
>> > > > replorigin_session_advance(), don't we need to WAL log the
>> > > > use of Replication state?
>> > >
>> > > No, the point is that the replication progress is persisted via
>an
>extra
>> > > data block in the commit record. That's important for both
>performance
>> > > and correctness, because otherwise it gets hard to tie a
>transaction
>> > > made during replay with the update to the progress. Unless you
>use 2PC
>> > > which isn't really an alternative.
>> > >
>> >
>> > Okay, but what triggered this question was the difference of those
>functions
>> > as compare to when user call function
>pg_replication_origin_advance().
>> > pg_replication_origin_advance() will WAL log the information during
>that
>> > function call itself (via replorigin_advance()).  So even if the
>transaction
>> > issuing pg_replication_origin_advance() function will abort, it
>will
>still
>> > update
>> > the Replication State, why so?
>>
>> I don't see a problem here. pg_replication_origin_advance() is for
>> setting up the initial position/update the position upon
>configuration
>> changes.
>
>Okay, I am not aware how exactly these API's will be used for
>replication
>but let me try to clarify what I have in mind related to this API
>usage.
>
>Can we use pg_replication_origin_advance()  for node where Replay has
>to happen, if Yes, then Let us say user of
>pg_replication_origin_advance()
>API set the lsn position to X for the node N1 on which replay has to
>happen, so now replay will proceed from X + 1 even though the
>information related to X is not persisted, so now it could so happen
>X will get written after the replay of X + 1 which might lead to
>problem
>after crash recovery?

Then you'd use the session variant - which does tie into transactions. Is there a reason that's be unsuitable for you?

Andres


--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.



Re: Issues in Replication Progress Tracking

От
Amit Kapila
Дата:
On Sat, May 23, 2015 at 11:19 AM, Andres Freund <andres@anarazel.de> wrote:
>
> On May 22, 2015 10:23:06 PM PDT, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >On Fri, May 22, 2015 at 11:20 PM, Andres Freund <andres@anarazel.de>
> >wrote:
> >>
> >> On 2015-05-21 09:40:58 +0530, Amit Kapila wrote:
> >> > On Thu, May 21, 2015 at 12:42 AM, Andres Freund
> ><andres@anarazel.de>
> >wrote:
> >> > >
> >> > > On 2015-05-20 19:27:05 +0530, Amit Kapila wrote:
> >> > >
> >> > > > 13.
> >> > > > In function replorigin_session_setup() and or
> >> > > > replorigin_session_advance(), don't we need to WAL log the
> >> > > > use of Replication state?
> >> > >
> >> > > No, the point is that the replication progress is persisted via
> >an
> >extra
> >> > > data block in the commit record. That's important for both
> >performance
> >> > > and correctness, because otherwise it gets hard to tie a
> >transaction
> >> > > made during replay with the update to the progress. Unless you
> >use 2PC
> >> > > which isn't really an alternative.
> >> > >
> >> >
> >> > Okay, but what triggered this question was the difference of those
> >functions
> >> > as compare to when user call function
> >pg_replication_origin_advance().
> >> > pg_replication_origin_advance() will WAL log the information during
> >that
> >> > function call itself (via replorigin_advance()).  So even if the
> >transaction
> >> > issuing pg_replication_origin_advance() function will abort, it
> >will
> >still
> >> > update
> >> > the Replication State, why so?
> >>
> >> I don't see a problem here. pg_replication_origin_advance() is for
> >> setting up the initial position/update the position upon
> >configuration
> >> changes.
> >
> >Okay, I am not aware how exactly these API's will be used for
> >replication
> >but let me try to clarify what I have in mind related to this API
> >usage.
> >
> >Can we use pg_replication_origin_advance()  for node where Replay has
> >to happen, if Yes, then Let us say user of
> >pg_replication_origin_advance()
> >API set the lsn position to X for the node N1 on which replay has to
> >happen, so now replay will proceed from X + 1 even though the
> >information related to X is not persisted, so now it could so happen
> >X will get written after the replay of X + 1 which might lead to
> >problem
> >after crash recovery?
>
> Then you'd use the session variant - which does tie into transactions. Is there a reason that's be unsuitable for you?
>

I don't know because I have not thought about how one can
use these API's in various ways in design of the Replication
system, but I think ideally we should prohibit the use of API
in circumstances where it could lead to problem or if that is
difficult or not possible or not worth, then at least we can mention
the same in docs.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com