Re: Error "initial slot snapshot too large" in create replication slot

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Error "initial slot snapshot too large" in create replication slot
Дата
Msg-id 20211011.195937.2061830189397708214.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Error "initial slot snapshot too large" in create replication slot  (Dilip Kumar <dilipbalaut@gmail.com>)
Ответы Re: Error "initial slot snapshot too large" in create replication slot  (Dilip Kumar <dilipbalaut@gmail.com>)
Список pgsql-hackers
At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> While creating an "export snapshot" I don't see any protection why the
> number of xids in the snapshot can not cross the
> "GetMaxSnapshotXidCount()"?.
> 
> Basically, while converting the HISTORIC snapshot to the MVCC snapshot
> in "SnapBuildInitialSnapshot()", we add all the xids between
> snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
> commit were not recorded).  The problem is that we add both topxids as
> well as the subxids into the same array and expect that the "xid"
> count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
> an issue but I am not sure what is the fix for this, some options are

It seems to me that it is a compromise between the restriction of the
legitimate snapshot and snapshots created by snapbuild.  If the xids
overflow, the resulting snapshot may lose a siginificant xid, i.e, a
top-level xid.

> a) Don't limit the xid count in the exported snapshot and dynamically
> resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
> GetMaxSnapshotSubxidCount().  But in option b) there would still be a
> problem that how do we handle the overflowed subtransaction?

I'm afraid that we shouldn't expand the size limits.  If I understand
it correctly, we only need the top-level xids in the exported snapshot
and reorder buffer knows whether a xid is a top-level or not after
establishing a consistent snapshot.

The attached diff tries to make SnapBuildInitialSnapshot exclude
subtransaction from generated snapshots.  It seems working fine for
you repro. (Of course, I'm not confident that it is the correct thing,
though..)

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ *        Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+    ReorderBufferTXN *txn;
+
+    txn = ReorderBufferTXNByXid(rb, xid, false,
+                                NULL, InvalidXLogRecPtr, false);
+
+    /* a known subtxn? */
+    if (txn && rbtxn_is_known_subxact(txn))
+        return true;
+
+    return false;
+}
+
+
 /*
  * ---------------------------------------
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..12d283f4de 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -591,12 +591,18 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
         if (test == NULL)
         {
-            if (newxcnt >= GetMaxSnapshotXidCount())
-                ereport(ERROR,
-                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-                         errmsg("initial slot snapshot too large")));
+            bool issubxact =
+                ReorderBufferXidIsKnownSubXact(builder->reorder, xid);
 
-            newxip[newxcnt++] = xid;
+            if (!issubxact)
+            {                
+                if (newxcnt >= GetMaxSnapshotXidCount())
+                    ereport(ERROR,
+                            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+                             errmsg("initial slot snapshot too large")));
+                
+                newxip[newxcnt++] = xid;
+            }
         }
 
         TransactionIdAdvance(xid);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void        ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
 bool        ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
 bool        ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
 
+bool        ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
 bool        ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
                                              XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
                                              TimestampTz prepare_time,

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Error "initial slot snapshot too large" in create replication slot