Обсуждение: Error "initial slot snapshot too large" in create replication slot

Поиск
Список
Период
Сортировка

Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
While creating an "export snapshot" I don't see any protection why the
number of xids in the snapshot can not cross the
"GetMaxSnapshotXidCount()"?.

Basically, while converting the HISTORIC snapshot to the MVCC snapshot
in "SnapBuildInitialSnapshot()", we add all the xids between
snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
commit were not recorded).  The problem is that we add both topxids as
well as the subxids into the same array and expect that the "xid"
count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
an issue but I am not sure what is the fix for this, some options are
a) Don't limit the xid count in the exported snapshot and dynamically
resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount().  But in option b) there would still be a
problem that how do we handle the overflowed subtransaction?

I have locally, reproduced the issue,

1. Configuration
max_connections= 5
autovacuum = off
max_worker_processes = 0

2.Then from pgbench I have run the attached script (test.sql) from 5 clients.
./pgbench -i postgres
./pgbench -c4 -j4 -T 3000 -f test1.sql -P1 postgres

3. Concurrently, create replication slot,
[dilipkumar@localhost bin]$ ./psql "dbname=postgres replication=database"
postgres[7367]=#
postgres[6463]=# CREATE_REPLICATION_SLOT "slot" LOGICAL "test_decoding";
ERROR:  40001: initial slot snapshot too large
LOCATION:  SnapBuildInitialSnapshot, snapbuild.c:597
postgres[6463]=# CREATE_REPLICATION_SLOT "slot" LOGICAL "test_decoding";
ERROR:  XX000: clearing exported snapshot in wrong transaction state
LOCATION:  SnapBuildClearExportedSnapshot, snapbuild.c:690

I could reproduce this issue, at least once in 8-10 attempts of
creating the replication slot.

Note:  After that issue, I have noticed one more issue "clearing
exported snapshot in wrong transaction state", that is because the
"ExportInProgress" is not cleared on the transaction abort, for this,
a simple fix is we can clear this state on the transaction abort,
maybe I will raise this as a separate issue?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> While creating an "export snapshot" I don't see any protection why the
> number of xids in the snapshot can not cross the
> "GetMaxSnapshotXidCount()"?.
> 
> Basically, while converting the HISTORIC snapshot to the MVCC snapshot
> in "SnapBuildInitialSnapshot()", we add all the xids between
> snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
> commit were not recorded).  The problem is that we add both topxids as
> well as the subxids into the same array and expect that the "xid"
> count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
> an issue but I am not sure what is the fix for this, some options are

It seems to me that it is a compromise between the restriction of the
legitimate snapshot and snapshots created by snapbuild.  If the xids
overflow, the resulting snapshot may lose a siginificant xid, i.e, a
top-level xid.

> a) Don't limit the xid count in the exported snapshot and dynamically
> resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
> GetMaxSnapshotSubxidCount().  But in option b) there would still be a
> problem that how do we handle the overflowed subtransaction?

I'm afraid that we shouldn't expand the size limits.  If I understand
it correctly, we only need the top-level xids in the exported snapshot
and reorder buffer knows whether a xid is a top-level or not after
establishing a consistent snapshot.

The attached diff tries to make SnapBuildInitialSnapshot exclude
subtransaction from generated snapshots.  It seems working fine for
you repro. (Of course, I'm not confident that it is the correct thing,
though..)

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ *        Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+    ReorderBufferTXN *txn;
+
+    txn = ReorderBufferTXNByXid(rb, xid, false,
+                                NULL, InvalidXLogRecPtr, false);
+
+    /* a known subtxn? */
+    if (txn && rbtxn_is_known_subxact(txn))
+        return true;
+
+    return false;
+}
+
+
 /*
  * ---------------------------------------
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..12d283f4de 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -591,12 +591,18 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
         if (test == NULL)
         {
-            if (newxcnt >= GetMaxSnapshotXidCount())
-                ereport(ERROR,
-                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-                         errmsg("initial slot snapshot too large")));
+            bool issubxact =
+                ReorderBufferXidIsKnownSubXact(builder->reorder, xid);
 
-            newxip[newxcnt++] = xid;
+            if (!issubxact)
+            {                
+                if (newxcnt >= GetMaxSnapshotXidCount())
+                    ereport(ERROR,
+                            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+                             errmsg("initial slot snapshot too large")));
+                
+                newxip[newxcnt++] = xid;
+            }
         }
 
         TransactionIdAdvance(xid);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void        ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
 bool        ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
 bool        ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
 
+bool        ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
 bool        ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
                                              XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
                                              TimestampTz prepare_time,

Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Mon, Oct 11, 2021 at 4:29 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > While creating an "export snapshot" I don't see any protection why the
> > number of xids in the snapshot can not cross the
> > "GetMaxSnapshotXidCount()"?.
> >
> > Basically, while converting the HISTORIC snapshot to the MVCC snapshot
> > in "SnapBuildInitialSnapshot()", we add all the xids between
> > snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
> > commit were not recorded).  The problem is that we add both topxids as
> > well as the subxids into the same array and expect that the "xid"
> > count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
> > an issue but I am not sure what is the fix for this, some options are
>
> It seems to me that it is a compromise between the restriction of the
> legitimate snapshot and snapshots created by snapbuild.  If the xids
> overflow, the resulting snapshot may lose a siginificant xid, i.e, a
> top-level xid.
>
> > a) Don't limit the xid count in the exported snapshot and dynamically
> > resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
> > GetMaxSnapshotSubxidCount().  But in option b) there would still be a
> > problem that how do we handle the overflowed subtransaction?
>
> I'm afraid that we shouldn't expand the size limits.  If I understand
> it correctly, we only need the top-level xids in the exported snapshot

But since we are using this as an MVCC snapshot, if we don't have the
subxid and if we also don't mark the "suboverflowed" flag then IMHO
the sub-transaction visibility might be wrong, Am I missing something?

> and reorder buffer knows whether a xid is a top-level or not after
> establishing a consistent snapshot.
>
> The attached diff tries to make SnapBuildInitialSnapshot exclude
> subtransaction from generated snapshots.  It seems working fine for
> you repro. (Of course, I'm not confident that it is the correct thing,
> though..)
>
> What do you think about this?

If your statement that we only need top-xids in the exported snapshot,
is true then this fix looks fine to me.   If not then we might need to
add the sub-xids in the snapshot->subxip array and if it crosses the
limit of GetMaxSnapshotSubxidCount(), then we can mark "suboverflowed"
flag.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Mon, 11 Oct 2021 16:48:10 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Mon, Oct 11, 2021 at 4:29 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Mon, 11 Oct 2021 11:49:41 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > > While creating an "export snapshot" I don't see any protection why the
> > > number of xids in the snapshot can not cross the
> > > "GetMaxSnapshotXidCount()"?.
> > >
> > > Basically, while converting the HISTORIC snapshot to the MVCC snapshot
> > > in "SnapBuildInitialSnapshot()", we add all the xids between
> > > snap->xmin to snap->xmax to the MVCC snap->xip array (xids for which
> > > commit were not recorded).  The problem is that we add both topxids as
> > > well as the subxids into the same array and expect that the "xid"
> > > count does not cross the "GetMaxSnapshotXidCount()".  So it seems like
> > > an issue but I am not sure what is the fix for this, some options are
> >
> > It seems to me that it is a compromise between the restriction of the
> > legitimate snapshot and snapshots created by snapbuild.  If the xids
> > overflow, the resulting snapshot may lose a siginificant xid, i.e, a
> > top-level xid.
> >
> > > a) Don't limit the xid count in the exported snapshot and dynamically
> > > resize the array b) Increase the limit to GetMaxSnapshotXidCount() +
> > > GetMaxSnapshotSubxidCount().  But in option b) there would still be a
> > > problem that how do we handle the overflowed subtransaction?
> >
> > I'm afraid that we shouldn't expand the size limits.  If I understand
> > it correctly, we only need the top-level xids in the exported snapshot
> 
> But since we are using this as an MVCC snapshot, if we don't have the
> subxid and if we also don't mark the "suboverflowed" flag then IMHO
> the sub-transaction visibility might be wrong, Am I missing something?

Sorry I should have set suboverflowed in the generated snapshot.
However, we can store subxid list as well when the snapshot (or
running_xact) is not overflown. These (should) works the same way.

On physical standby, snapshot is created just filling up only subxip
with all top and sub xids (procrray.c:2400).  It would be better we do
the same thing here.

> > and reorder buffer knows whether a xid is a top-level or not after
> > establishing a consistent snapshot.
> >
> > The attached diff tries to make SnapBuildInitialSnapshot exclude
> > subtransaction from generated snapshots.  It seems working fine for
> > you repro. (Of course, I'm not confident that it is the correct thing,
> > though..)
> >
> > What do you think about this?
> 
> If your statement that we only need top-xids in the exported snapshot,
> is true then this fix looks fine to me.   If not then we might need to
> add the sub-xids in the snapshot->subxip array and if it crosses the
> limit of GetMaxSnapshotSubxidCount(), then we can mark "suboverflowed"
> flag.

So I came up with the attached version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 374a10aa6819224ca6af548100aa34e6c772a2c3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 12 Oct 2021 13:53:27 +0900
Subject: [PATCH] Allow overflowed snapshot when creating logical replication
 slot

Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and fails for certain circumstances. Instead, create a
"takenDuringRecovery" snapshot instead, which stores all XIDs in
subxip array.  Addition to that, allow to create an overflowed
snapshot by adding to reorder buffer a function to tell whether an XID
is a top-level or not.
---
 .../replication/logical/reorderbuffer.c       | 20 ++++++++
 src/backend/replication/logical/snapbuild.c   | 50 ++++++++++++++-----
 src/include/replication/reorderbuffer.h       |  1 +
 3 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ *        Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+    ReorderBufferTXN *txn;
+
+    txn = ReorderBufferTXNByXid(rb, xid, false,
+                                NULL, InvalidXLogRecPtr, false);
+
+    /* a known subtxn? */
+    if (txn && rbtxn_is_known_subxact(txn))
+        return true;
+
+    return false;
+}
+
+
 /*
  * ---------------------------------------
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..d422315717 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 {
     Snapshot    snap;
     TransactionId xid;
-    TransactionId *newxip;
-    int            newxcnt = 0;
+    TransactionId *newsubxip;
+    int            newsubxcnt;
+    bool        overflowed = false;
 
     Assert(!FirstSnapshotSet);
     Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
     MyProc->xmin = snap->xmin;
 
-    /* allocate in transaction context */
-    newxip = (TransactionId *)
-        palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+    /*
+     * Allocate in transaction context. We use only subxip to store xids. See
+     * GetSnapshotData for details.
+     */
+    newsubxip = (TransactionId *)
+        palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
 
     /*
      * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
      * classical snapshot by marking all non-committed transactions as
      * in-progress. This can be expensive.
      */
+retry:
+    newsubxcnt = 0;
+
     for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
     {
         void       *test;
@@ -591,12 +598,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
         if (test == NULL)
         {
-            if (newxcnt >= GetMaxSnapshotXidCount())
-                ereport(ERROR,
-                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-                         errmsg("initial slot snapshot too large")));
+            bool add = true;
 
-            newxip[newxcnt++] = xid;
+            /* exlude subxids when subxip is overflowed */
+            if (overflowed &&
+                ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+                add = false;
+
+            if (add)
+            {
+                if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+                {
+                    if (overflowed)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+                                 errmsg("initial slot snapshot too large")));
+
+                    /* retry excluding subxids */
+                    overflowed = true;
+                    goto retry;
+                }
+
+                newsubxip[newsubxcnt++] = xid;
+            }
         }
 
         TransactionIdAdvance(xid);
@@ -604,8 +628,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
     /* adjust remaining snapshot fields as needed */
     snap->snapshot_type = SNAPSHOT_MVCC;
-    snap->xcnt = newxcnt;
-    snap->xip = newxip;
+    snap->xcnt = 0;
+    snap->subxcnt = newsubxcnt;
+    snap->subxip = newsubxip;
+    snap->suboverflowed = overflowed;
 
     return snap;
 }
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void        ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
 bool        ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
 bool        ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
 
+bool        ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
 bool        ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
                                              XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
                                              TimestampTz prepare_time,
-- 
2.27.0


Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> So I came up with the attached version.

Sorry, it was losing a piece of change.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 424f405b4c9d41471eae1edd48cdbb6a6d47217e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 12 Oct 2021 13:53:27 +0900
Subject: [PATCH v2] Allow overflowed snapshot when creating logical
 replication slot

Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and fails for certain circumstances. Instead, create a
"takenDuringRecovery" snapshot instead, which stores all XIDs in
subxip array.  Addition to that, allow to create an overflowed
snapshot by adding to reorder buffer a function to tell whether an XID
is a top-level or not.
---
 .../replication/logical/reorderbuffer.c       | 20 +++++++
 src/backend/replication/logical/snapbuild.c   | 56 +++++++++++++++----
 src/include/replication/reorderbuffer.h       |  1 +
 3 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..4e452cce7c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ *        Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+    ReorderBufferTXN *txn;
+
+    txn = ReorderBufferTXNByXid(rb, xid, false,
+                                NULL, InvalidXLogRecPtr, false);
+
+    /* a known subtxn? */
+    if (txn && rbtxn_is_known_subxact(txn))
+        return true;
+
+    return false;
+}
+
+
 /*
  * ---------------------------------------
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..46c691df20 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 {
     Snapshot    snap;
     TransactionId xid;
-    TransactionId *newxip;
-    int            newxcnt = 0;
+    TransactionId *newsubxip;
+    int            newsubxcnt;
+    bool        overflowed = false;
 
     Assert(!FirstSnapshotSet);
     Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
     MyProc->xmin = snap->xmin;
 
-    /* allocate in transaction context */
-    newxip = (TransactionId *)
-        palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+    /*
+     * Allocate in transaction context. We use only subxip to store xids. See
+     * GetSnapshotData for details.
+     */
+    newsubxip = (TransactionId *)
+        palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
 
     /*
      * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
      * classical snapshot by marking all non-committed transactions as
      * in-progress. This can be expensive.
      */
+retry:
+    newsubxcnt = 0;
+
     for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
     {
         void       *test;
@@ -591,12 +598,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
         if (test == NULL)
         {
-            if (newxcnt >= GetMaxSnapshotXidCount())
-                ereport(ERROR,
-                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-                         errmsg("initial slot snapshot too large")));
+            bool add = true;
 
-            newxip[newxcnt++] = xid;
+            /* exlude subxids when subxip is overflowed */
+            if (overflowed &&
+                ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+                add = false;
+
+            if (add)
+            {
+                if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+                {
+                    if (overflowed)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+                                 errmsg("initial slot snapshot too large")));
+
+                    /* retry excluding subxids */
+                    overflowed = true;
+                    goto retry;
+                }
+
+                newsubxip[newsubxcnt++] = xid;
+            }
         }
 
         TransactionIdAdvance(xid);
@@ -604,8 +628,16 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
     /* adjust remaining snapshot fields as needed */
     snap->snapshot_type = SNAPSHOT_MVCC;
-    snap->xcnt = newxcnt;
-    snap->xip = newxip;
+    snap->xcnt = 0;
+    snap->subxcnt = newsubxcnt;
+    snap->subxip = newsubxip;
+    snap->suboverflowed = overflowed;
+
+    /*
+     * Although this snapshot is taken actually not during recovery, all XIDs
+     * are stored in subxip even if it is not overflowed.
+     */
+    snap->takenDuringRecovery = true;
 
     return snap;
 }
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff75f7..e5fa1051d7 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void        ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
 bool        ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
 bool        ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
 
+bool        ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
 bool        ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
                                              XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
                                              TimestampTz prepare_time,
-- 
2.27.0


Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > So I came up with the attached version.
>
> Sorry, it was losing a piece of change.

Yeah, at a high level this is on the idea I have in mind, I will do a
detailed review in a day or two.  Thanks for working on this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > > So I came up with the attached version.
> >
> > Sorry, it was losing a piece of change.
>
> Yeah, at a high level this is on the idea I have in mind, I will do a
> detailed review in a day or two.  Thanks for working on this.

While doing the detailed review, I think there are a couple of
problems with the patch,  the main problem of storing all the xid in
the snap->subxip is that once we mark the snapshot overflown then the
XidInMVCCSnapshot, will not search the subxip array, instead it will
fetch the topXid and search in the snap->xip array.  Another issue is
that the total xids could be GetMaxSnapshotSubxidCount()
+GetMaxSnapshotXidCount().

I think what we should be doing is that if the xid is know subxid then
add in the snap->subxip array otherwise in snap->xip array.  So
snap->xip array size will be GetMaxSnapshotXidCount() whereas the
snap->subxip array size will be GetMaxSnapshotSubxidCount().  And if
the array size is full then we can stop adding the subxids to the
array.

What is your thought on this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Tue, Oct 19, 2021 at 2:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > > > So I came up with the attached version.
> > >
> > > Sorry, it was losing a piece of change.
> >
> > Yeah, at a high level this is on the idea I have in mind, I will do a
> > detailed review in a day or two.  Thanks for working on this.
>
> While doing the detailed review, I think there are a couple of
> problems with the patch,  the main problem of storing all the xid in
> the snap->subxip is that once we mark the snapshot overflown then the
> XidInMVCCSnapshot, will not search the subxip array, instead it will
> fetch the topXid and search in the snap->xip array.

I missed that you are marking the snapshot as takenDuringRecovery so
your fix looks fine.

Another issue is
> that the total xids could be GetMaxSnapshotSubxidCount()
> +GetMaxSnapshotXidCount().
>

I have fixed this, the updated patch is attached.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Error "initial slot snapshot too large" in create replication slot

От
Julien Rouhaud
Дата:
Hi,

On Tue, Nov 02, 2021 at 04:40:39PM +0530, Dilip Kumar wrote:
> 
> I have fixed this, the updated patch is attached.

The cfbot reports that this patch doesn't compile:
https://cirrus-ci.com/task/5642000073490432?logs=build

[03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’:
[03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first use in this function); did you mean
‘newsubxip’?
[03:01:24.477]   587 |  newsubxcnt = 0;
[03:01:24.477]       |  ^~~~~~~~~~
[03:01:24.477]       |  newsubxip
[03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is reported only once for each function it appears
in
[03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’ [-Wunused-variable]
[03:01:24.477]   535 |  int   maxxidcnt;
[03:01:24.477]       |        ^~~~~~~~~

Could you send a new version?  In the meantime I will switch the patch to
Waiting on Author.



Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Wed, Jan 12, 2022 at 4:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

On Tue, Nov 02, 2021 at 04:40:39PM +0530, Dilip Kumar wrote:
>
> I have fixed this, the updated patch is attached.

The cfbot reports that this patch doesn't compile:
https://cirrus-ci.com/task/5642000073490432?logs=build

[03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’:
[03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first use in this function); did you mean ‘newsubxip’?
[03:01:24.477]   587 |  newsubxcnt = 0;
[03:01:24.477]       |  ^~~~~~~~~~
[03:01:24.477]       |  newsubxip
[03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is reported only once for each function it appears in
[03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’ [-Wunused-variable]
[03:01:24.477]   535 |  int   maxxidcnt;
[03:01:24.477]       |        ^~~~~~~~~

Could you send a new version?  In the meantime I will switch the patch to
Waiting on Author.

Thanks for notifying, I will work on this and send the update patch soon.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Tue, 2 Nov 2021 16:40:39 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Tue, Oct 19, 2021 at 2:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
> > > <horikyota.ntt@gmail.com> wrote:
> > > >
> > > > At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > > > > So I came up with the attached version.
> > > >
> > > > Sorry, it was losing a piece of change.
> > >
> > > Yeah, at a high level this is on the idea I have in mind, I will do a
> > > detailed review in a day or two.  Thanks for working on this.
> >
> > While doing the detailed review, I think there are a couple of
> > problems with the patch,  the main problem of storing all the xid in
> > the snap->subxip is that once we mark the snapshot overflown then the
> > XidInMVCCSnapshot, will not search the subxip array, instead it will
> > fetch the topXid and search in the snap->xip array.
> 
> I missed that you are marking the snapshot as takenDuringRecovery so
> your fix looks fine.
> 
> Another issue is
> > that the total xids could be GetMaxSnapshotSubxidCount()
> > +GetMaxSnapshotXidCount().
> >
> 
> I have fixed this, the updated patch is attached.

Mmm. The size of the array cannot be larger than the numbers the
*Connt() functions return.  Thus we cannot attach the oversized array
to ->subxip.  (I don't recall clearly but that would lead to assertion
failure somewhere..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Wed, Jan 12, 2022 at 4:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > The cfbot reports that this patch doesn't compile:
> > https://cirrus-ci.com/task/5642000073490432?logs=build
> >
> > [03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’:
> > [03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first
> > use in this function); did you mean ‘newsubxip’?
> > [03:01:24.477]   587 |  newsubxcnt = 0;
> > [03:01:24.477]       |  ^~~~~~~~~~
> > [03:01:24.477]       |  newsubxip
> > [03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is
> > reported only once for each function it appears in
> > [03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’
> > [-Wunused-variable]
> > [03:01:24.477]   535 |  int   maxxidcnt;
> > [03:01:24.477]       |        ^~~~~~~~~
> >
> > Could you send a new version?  In the meantime I will switch the patch to
> > Waiting on Author.
> >
> 
> Thanks for notifying, I will work on this and send the update patch soon.

me> Mmm. The size of the array cannot be larger than the numbers the
me> *Connt() functions return.  Thus we cannot attach the oversized array
me> to ->subxip.  (I don't recall clearly but that would lead to assertion
me> failure somewhere..)

Then, I fixed the v3 error and post v4.

To recap:

SnapBUildInitialSnapshot tries to store XIDS of both top and sub
transactions into snapshot->xip array but the array is easily
overflowed and CREATE_REPLICATOIN_SLOT command ends with an error.

To fix this, this patch is doing the following things.

- Use subxip array instead of xip array to allow us have larger array
  for xids.  So the snapshot is marked as takenDuringRecovery, which
  is a kind of abuse but largely reduces the chance of getting
  "initial slot snapshot too large" error.

- Still if subxip is overflowed, retry with excluding subtransactions
  then set suboverflowed.  This causes XidInMVCCSnapshot (finally)
  scans over subxip array for targetted top-level xid.

We could take another way: make a !takenDuringRecovery snapshot by
using xip instead of subxip. It is cleaner but it has far larger
chance of needing to retry.

(renamed the patch since it represented a part of the patch)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 1e3ec40d70c2e7f8482632333001fe52527ef17a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 31 Jan 2022 15:03:33 +0900
Subject: [PATCH v4] Avoid an error while creating logical replication slot

Snapshot can hold top XIDs up to the number of server processes but
SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to
there and easily fails with "initial slot snapshot too large" error.

Instead, create a "takenDuringRecovery" snapshot, which stores all
XIDs in subxip array. This alone largely reduces the chance of getting
the error. But to eliminate the chance of the error to occur, allow to
create an overflowed snapshot by adding to reorder buffer a function
to tell whether an XID is a top-level or not.

Author: Dilip Kumar and Kyotaro Horiguchi
---
 .../replication/logical/reorderbuffer.c       | 20 +++++++
 src/backend/replication/logical/snapbuild.c   | 54 ++++++++++++++-----
 src/include/replication/reorderbuffer.h       |  1 +
 3 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 19b2ba2000..429e6296ab 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3326,6 +3326,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 }
 
 
+/*
+ * ReorderBufferXidIsKnownSubXact
+ *        Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+    ReorderBufferTXN *txn;
+
+    txn = ReorderBufferTXNByXid(rb, xid, false,
+                                NULL, InvalidXLogRecPtr, false);
+
+    /* a known subtxn? */
+    if (txn && rbtxn_is_known_subxact(txn))
+        return true;
+
+    return false;
+}
+
+
 /*
  * ---------------------------------------
  * Disk serialization support
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 83fca8a77d..21f30fa1d3 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 {
     Snapshot    snap;
     TransactionId xid;
-    TransactionId *newxip;
-    int            newxcnt = 0;
+    TransactionId *newsubxip;
+    int            newsubxcnt;
+    bool        overflowed = false;
 
     Assert(!FirstSnapshotSet);
     Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,12 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
     MyProc->xmin = snap->xmin;
 
-    /* allocate in transaction context */
-    newxip = (TransactionId *)
-        palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+    /*
+     * Allocate in transaction context. We use only subxip to store xids. See
+     * below and GetSnapshotData for details.
+     */
+    newsubxip = (TransactionId *)
+        palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
 
     /*
      * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +582,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
      * classical snapshot by marking all non-committed transactions as
      * in-progress. This can be expensive.
      */
+retry:
+    newsubxcnt = 0;
+
     for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
     {
         void       *test;
@@ -591,12 +598,28 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
         if (test == NULL)
         {
-            if (newxcnt >= GetMaxSnapshotXidCount())
-                ereport(ERROR,
-                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-                         errmsg("initial slot snapshot too large")));
+            /*
+             * If subtransactions fit the subxid array, we fill the array and
+             * call it a day. Otherwise we store only top-level transactions
+             * into the same subxip array.
+             */
+            if (!overflowed ||
+                !ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+            {
+                if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+                {
+                    if (overflowed)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+                                 errmsg("initial slot snapshot too large")));
 
-            newxip[newxcnt++] = xid;
+                    /* retry excluding subxids */
+                    overflowed = true;
+                    goto retry;
+                }
+
+                newsubxip[newsubxcnt++] = xid;
+            }
         }
 
         TransactionIdAdvance(xid);
@@ -604,8 +627,15 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
     /* adjust remaining snapshot fields as needed */
     snap->snapshot_type = SNAPSHOT_MVCC;
-    snap->xcnt = newxcnt;
-    snap->xip = newxip;
+    snap->xcnt = 0;
+    snap->subxcnt = newsubxcnt;
+    snap->subxip = newsubxip;
+    snap->suboverflowed = overflowed;
+    /*
+     * Although this snapshot is taken actually not during recovery, all XIDs
+     * are stored in subxip even if it is not overflowed.
+     */
+    snap->takenDuringRecovery = true;
 
     return snap;
 }
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index aa0a73382f..c05cf3078c 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -669,6 +669,7 @@ void        ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLog
 bool        ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
 bool        ReorderBufferXidHasBaseSnapshot(ReorderBuffer *, TransactionId xid);
 
+bool        ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid);
 bool        ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
                                              XLogRecPtr prepare_lsn, XLogRecPtr end_lsn,
                                              TimestampTz prepare_time,
-- 
2.27.0


Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> SnapBUildInitialSnapshot tries to store XIDS of both top and sub
> transactions into snapshot->xip array but the array is easily
> overflowed and CREATE_REPLICATOIN_SLOT command ends with an error.
>
> To fix this, this patch is doing the following things.
>
> - Use subxip array instead of xip array to allow us have larger array
>   for xids.  So the snapshot is marked as takenDuringRecovery, which
>   is a kind of abuse but largely reduces the chance of getting
>   "initial slot snapshot too large" error.
>
> - Still if subxip is overflowed, retry with excluding subtransactions
>   then set suboverflowed.  This causes XidInMVCCSnapshot (finally)
>   scans over subxip array for targetted top-level xid.
>
> We could take another way: make a !takenDuringRecovery snapshot by
> using xip instead of subxip. It is cleaner but it has far larger
> chance of needing to retry.
>
> (renamed the patch since it represented a part of the patch)
>

Thanks for the updated version. I will look into it this week.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
>
> me> Mmm. The size of the array cannot be larger than the numbers the
> me> *Connt() functions return.  Thus we cannot attach the oversized array
> me> to ->subxip.  (I don't recall clearly but that would lead to assertion
> me> failure somewhere..)
>
> Then, I fixed the v3 error and post v4.

Yeah you are right, SetTransactionSnapshot() has that assertion.
Anyway after looking again it appears that
GetMaxSnapshotSubxidCount is the correct size because this is
PGPROC_MAX_CACHED_SUBXIDS +1, i.e. it considers top transactions as
well so we don't need to add them separately.

>
> SnapBUildInitialSnapshot tries to store XIDS of both top and sub
> transactions into snapshot->xip array but the array is easily
> overflowed and CREATE_REPLICATOIN_SLOT command ends with an error.
>
> To fix this, this patch is doing the following things.
>
> - Use subxip array instead of xip array to allow us have larger array
>   for xids.  So the snapshot is marked as takenDuringRecovery, which
>   is a kind of abuse but largely reduces the chance of getting
>   "initial slot snapshot too large" error.

Right. I think the patch looks fine to me.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Yura Sokolov
Дата:
В Пн, 07/02/2022 в 13:52 +0530, Dilip Kumar пишет:
> On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > 
> > me> Mmm. The size of the array cannot be larger than the numbers the
> > me> *Connt() functions return.  Thus we cannot attach the oversized array
> > me> to ->subxip.  (I don't recall clearly but that would lead to assertion
> > me> failure somewhere..)
> > 
> > Then, I fixed the v3 error and post v4.
> 
> Yeah you are right, SetTransactionSnapshot() has that assertion.
> Anyway after looking again it appears that
> GetMaxSnapshotSubxidCount is the correct size because this is
> PGPROC_MAX_CACHED_SUBXIDS +1, i.e. it considers top transactions as
> well so we don't need to add them separately.
> 
> > SnapBUildInitialSnapshot tries to store XIDS of both top and sub
> > transactions into snapshot->xip array but the array is easily
> > overflowed and CREATE_REPLICATOIN_SLOT command ends with an error.
> > 
> > To fix this, this patch is doing the following things.
> > 
> > - Use subxip array instead of xip array to allow us have larger array
> >   for xids.  So the snapshot is marked as takenDuringRecovery, which
> >   is a kind of abuse but largely reduces the chance of getting
> >   "initial slot snapshot too large" error.
> 
> Right. I think the patch looks fine to me.
> 

Good day.

I've looked to the patch. Personally I'd prefer dynamically resize
xip array. But I think there is issue with upgrade if replica source
is upgraded before destination, right?

Concerning patch, I think more comments should be written about new
usage case for `takenDuringRecovery`. May be this field should be renamed
at all?

And there are checks for `takenDuringRecovery` in `heapgetpage` and
`heapam_scan_sample_next_tuple`. Are this checks affected by the change?
Neither the preceding discussion nor commit message answer me.

-------

regards

Yura Sokolov
Postgres Professional
y.sokolov@postgrespro.ru
funny.falcon@gmail.com




Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Sun, 13 Feb 2022 17:35:38 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> В Пн, 07/02/2022 в 13:52 +0530, Dilip Kumar пишет:
> > Right. I think the patch looks fine to me.
> >
>
> Good day.
>
> I've looked to the patch. Personally I'd prefer dynamically resize
> xip array. But I think there is issue with upgrade if replica source
> is upgraded before destination, right?

I don't think it is relevant.  I think we don't convey snapshot via
streaming replication.  But I think that expanding xip or subxip is
wrong, since it is tied with ProcArray structure. (Even though we
abuse the arrays in some situations, like this).

> Concerning patch, I think more comments should be written about new
> usage case for `takenDuringRecovery`. May be this field should be renamed
> at all?

I don't feel the need to rename it so much. It just signals that "this
snapshot is in the form for recovery".  The more significant reason is
that I don't come up better name:p

And the comment is slightly modified and gets a pointer to detailed
comment.

+     * Although this snapshot is not acutally taken during recovery, all XIDs
+     * are stored in subxip. See GetSnapshotData for the details of that form
+     * of snapshot.


> And there are checks for `takenDuringRecovery` in `heapgetpage` and
> `heapam_scan_sample_next_tuple`. Are this checks affected by the change?
> Neither the preceding discussion nor commit message answer me.

The snapshot works correctly, but for the heapgetpage case, it foces
all_visible to be false.  That unnecessarily prevents visibility check
from skipping.

An annoying thing in SnapBuildInitialSnapshot is that we don't know
the number of xids before looping over the xid range, and we don't
want to bother sorting top-level xids and subxids unless we have to do
so.

Is it better that we hassle in SnapBuildInitialSnapshot to create a
!takenDuringRecovery snapshot?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Fri, 01 Apr 2022 14:44:30 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Sun, 13 Feb 2022 17:35:38 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in 
> > And there are checks for `takenDuringRecovery` in `heapgetpage` and
> > `heapam_scan_sample_next_tuple`. Are this checks affected by the change?
> > Neither the preceding discussion nor commit message answer me.
> 
> The snapshot works correctly, but for the heapgetpage case, it foces
> all_visible to be false.  That unnecessarily prevents visibility check
> from skipping.
> 
> An annoying thing in SnapBuildInitialSnapshot is that we don't know
> the number of xids before looping over the xid range, and we don't
> want to bother sorting top-level xids and subxids unless we have to do
> so.
> 
> Is it better that we hassle in SnapBuildInitialSnapshot to create a
> !takenDuringRecovery snapshot?

So this is that. v5 creates a regular snapshot.

By the way, is there any chance this could be committed to 15?
Otherwise I'll immediately move this to the next CF.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Error "initial slot snapshot too large" in create replication slot

От
Jacob Champion
Дата:
On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> So this is that. v5 creates a regular snapshot.

This patch will need a quick rebase over 905c020bef9, which added
`extern` to several missing locations.

Thanks,
--Jacob



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Tue, 5 Jul 2022 11:32:42 -0700, Jacob Champion <jchampion@timescale.com> wrote in 
> On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > So this is that. v5 creates a regular snapshot.
> 
> This patch will need a quick rebase over 905c020bef9, which added
> `extern` to several missing locations.

Thanks! Just rebased.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Error "initial slot snapshot too large" in create replication slot

От
Robert Haas
Дата:
On Mon, Jul 18, 2022 at 10:55 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> Thanks! Just rebased.

Hi,

I mentioned this patch to Andres in conversation, and he expressed a
concern that there might be no guarantee that we retain enough CLOG to
look up XIDs. Presumably this wouldn't be an issue when the snapshot
doesn't get marked suboverflowed, but what if it does?

Adding Andres in the hopes that he may comment further.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Andres Freund
Дата:
Hi,

Thanks for working on this!


I think this should include a test that fails without this change and succeeds
with it...


On 2022-07-19 11:55:06 +0900, Kyotaro Horiguchi wrote:
> From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> Date: Tue, 19 Jul 2022 11:50:29 +0900
> Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT

This sees a tad misleading - the previous snapshot wasn't borken, right?


> +/*
> + * ReorderBufferXidIsKnownSubXact
> + *        Returns true if the xid is a known subtransaction.
> + */
> +bool
> +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
> +{
> +    ReorderBufferTXN *txn;
> +
> +    txn = ReorderBufferTXNByXid(rb, xid, false,
> +                                NULL, InvalidXLogRecPtr, false);
> +
> +    /* a known subtxn? */
> +    if (txn && rbtxn_is_known_subxact(txn))
> +        return true;
> +
> +    return false;
> +}

The comments here just seem to restate the code....


It's not obvious to me that it's the right design (or even correct) to ask
reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
why would reorderbuffer even be guaranteed to know about all these subxids?


> @@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
>
>      MyProc->xmin = snap->xmin;
>
> -    /* allocate in transaction context */
> +    /*
> +     * Allocate in transaction context.
> +     *
> +     * We could use only subxip to store all xids (takenduringrecovery
> +     * snapshot) but that causes useless visibility checks later so we hasle to
> +     * create a normal snapshot.
> +     */

I can't really parse this comment at this point, and I seriously doubt I could
later on.


> @@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
>
>          if (test == NULL)
>          {
> -            if (newxcnt >= GetMaxSnapshotXidCount())
> -                ereport(ERROR,
> -                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> -                         errmsg("initial slot snapshot too large")));
> -
> -            newxip[newxcnt++] = xid;
> +            /* Store the xid to the appropriate xid array */
> +            if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
> +            {
> +                if (!overflowed)
> +                {
> +                    if (newsubxcnt >= GetMaxSnapshotSubxidCount())
> +                        overflowed = true;
> +                    else
> +                        newsubxip[newsubxcnt++] = xid;
> +                }
> +            }
> +            else
> +            {
> +                if (newxcnt >= GetMaxSnapshotXidCount())
> +                    elog(ERROR,
> +                         "too many transactions while creating snapshot");
> +                newxip[newxcnt++] = xid;
> +            }
>          }

Hm, this is starting to be pretty deeply nested...


I wonder if a better fix here wouldn't be to allow importing a snapshot with a
larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
need to be in a transactional snapshot anyway, which is copied anyway?

Greetings,

Andres Freund



Re: Error "initial slot snapshot too large" in create replication slot

От
Andres Freund
Дата:
Hi,

On 2022-09-09 13:19:14 -0400, Robert Haas wrote:
> I mentioned this patch to Andres in conversation, and he expressed a
> concern that there might be no guarantee that we retain enough CLOG to
> look up XIDs.

I was concerned we wouldn't keep enough subtrans, rather than clog. But I
think we're ok, because we need to have an appropriate ->xmin for exporting /
importing the snapshot.

Greetings,

Andres Freund



Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote:

>
> It's not obvious to me that it's the right design (or even correct) to ask
> reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
> why would reorderbuffer even be guaranteed to know about all these subxids?

Yeah, you are right, the reorderbuffer will only know about the
transaction for which changes got added to the reorder buffer.  So
this seems not to be the right design idea.

>
> I wonder if a better fix here wouldn't be to allow importing a snapshot with a
> larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
> need to be in a transactional snapshot anyway, which is copied anyway?

Yeah when I first found this issue, I thought that should be the
solution.  But later it went in a different direction.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
Thanks for raizing this up, Robert and the comment, Andres.

At Tue, 13 Sep 2022 07:00:42 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote:
> 
> >
> > It's not obvious to me that it's the right design (or even correct) to ask
> > reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
> > why would reorderbuffer even be guaranteed to know about all these subxids?
> 
> Yeah, you are right, the reorderbuffer will only know about the
> transaction for which changes got added to the reorder buffer.  So
> this seems not to be the right design idea.

That function is called after the SnapBuild reaches
SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects
other than that state. That is, IIUC the top-sub relationship of all
the currently running transactions is fully known to reorder buffer.
We need a comment about that.

> > I wonder if a better fix here wouldn't be to allow importing a snapshot with a
> > larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
> > need to be in a transactional snapshot anyway, which is copied anyway?
> 
> Yeah when I first found this issue, I thought that should be the
> solution.  But later it went in a different direction.

I think that that is the best solution if rbtxn_is_known_subxzact() is
known to be unreliable at the time.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Thanks for raizing this up, Robert and the comment, Andres.
>
> At Tue, 13 Sep 2022 07:00:42 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > On Tue, Sep 13, 2022 at 3:22 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > >
> > > It's not obvious to me that it's the right design (or even correct) to ask
> > > reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
> > > why would reorderbuffer even be guaranteed to know about all these subxids?
> >
> > Yeah, you are right, the reorderbuffer will only know about the
> > transaction for which changes got added to the reorder buffer.  So
> > this seems not to be the right design idea.
>
> That function is called after the SnapBuild reaches
> SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects
> other than that state. That is, IIUC the top-sub relationship of all
> the currently running transactions is fully known to reorder buffer.
> We need a comment about that.

I don't think this assumption is true, any xid started after switching
to the SNAPBUILD_FULL_SNAPSHOT and before switching to the
SNAPBUILD_CONSISTENT, might still be in progress so we can not
identify whether they are subxact or not from reorder buffer.

refer to this comment:
/*
* c) transition from FULL_SNAPSHOT to CONSISTENT.
*
* In FULL_SNAPSHOT state (see d) ), and this xl_running_xacts'
* oldestRunningXid is >= than nextXid from when we switched to
* FULL_SNAPSHOT. This means all transactions that are currently in
* progress have a catalog snapshot, and all their changes have been
* collected. Switch to CONSISTENT.
*/

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> Thanks for working on this!
> 
> 
> I think this should include a test that fails without this change and succeeds
> with it...
> 
> 
> On 2022-07-19 11:55:06 +0900, Kyotaro Horiguchi wrote:
> > From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001
> > From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > Date: Tue, 19 Jul 2022 11:50:29 +0900
> > Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT
> 
> This sees a tad misleading - the previous snapshot wasn't borken, right?

I saw it kind of broken that ->xip contains sub transactions.  But I
didn't meant it's broken by "correct". Is "proper" suitable there?


> > +/*
> > + * ReorderBufferXidIsKnownSubXact
> > + *        Returns true if the xid is a known subtransaction.
> > + */
> > +bool
> > +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
> > +{
> > +    ReorderBufferTXN *txn;
> > +
> > +    txn = ReorderBufferTXNByXid(rb, xid, false,
> > +                                NULL, InvalidXLogRecPtr, false);
> > +
> > +    /* a known subtxn? */
> > +    if (txn && rbtxn_is_known_subxact(txn))
> > +        return true;
> > +
> > +    return false;
> > +}
> 
> The comments here just seem to restate the code....

Yeah, it is pulled from the existing code but result looks like so..

> It's not obvious to me that it's the right design (or even correct) to ask
> reorderbuffer about an xid being a subxid. Maybe I'm missing something, but
> why would reorderbuffer even be guaranteed to know about all these subxids?

I think you're missing that the code is visited only after the reorder
buffer's state becomes SNAPBUILD_CONSISTENT. I think
rbtxn_is_known_subxact() is reliable at that stage.

> > @@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
> >
> >      MyProc->xmin = snap->xmin;
> >
> > -    /* allocate in transaction context */
> > +    /*
> > +     * Allocate in transaction context.
> > +     *
> > +     * We could use only subxip to store all xids (takenduringrecovery
> > +     * snapshot) but that causes useless visibility checks later so we hasle to
> > +     * create a normal snapshot.
> > +     */
> 
> I can't really parse this comment at this point, and I seriously doubt I could
> later on.

Mmm. The "takenduringrecovery" is relly perplexing (it has been
somehow lower-cased..), but after removing the parenthesized part, it
looks like this. And it had a misspelling but I removed that word.  Is
this still unreadable?

We could use only subxip to store all xids but that causes useless
visibility checks later so we create a normal snapshot.


> > @@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
> >
> >          if (test == NULL)
> >          {
> > -            if (newxcnt >= GetMaxSnapshotXidCount())
> > -                ereport(ERROR,
> > -                        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> > -                         errmsg("initial slot snapshot too large")));
> > -
> > -            newxip[newxcnt++] = xid;
> > +            /* Store the xid to the appropriate xid array */
> > +            if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
> > +            {
> > +                if (!overflowed)
> > +                {
> > +                    if (newsubxcnt >= GetMaxSnapshotSubxidCount())
> > +                        overflowed = true;
> > +                    else
> > +                        newsubxip[newsubxcnt++] = xid;
> > +                }
> > +            }
> > +            else
> > +            {
> > +                if (newxcnt >= GetMaxSnapshotXidCount())
> > +                    elog(ERROR,
> > +                         "too many transactions while creating snapshot");
> > +                newxip[newxcnt++] = xid;
> > +            }
> >          }
> 
> Hm, this is starting to be pretty deeply nested...

Yeah, at least one if() is removable.

> I wonder if a better fix here wouldn't be to allow importing a snapshot with a
> larger ->xid array. Yes, we can't do that in CurrentSnapshotData, but IIRC we
> need to be in a transactional snapshot anyway, which is copied anyway?

The other reason for oversized xip array is it causes visibility check
when it is used. AFAICS XidInMVCCSnapshot has additional path for
takenDuringRecovery snapshots that contains a linear search (currently
it is replaced by pg_lfind32()). This saves us from doing this for
that snapshot.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Tue, 13 Sep 2022 12:08:18 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > That function is called after the SnapBuild reaches
> > SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects
> > other than that state. That is, IIUC the top-sub relationship of all
> > the currently running transactions is fully known to reorder buffer.
> > We need a comment about that.
> 
> I don't think this assumption is true, any xid started after switching
> to the SNAPBUILD_FULL_SNAPSHOT and before switching to the
> SNAPBUILD_CONSISTENT, might still be in progress so we can not
> identify whether they are subxact or not from reorder buffer.

Yeah, I misunderstood that the relationship is recorded earlier
(how?).  Thus it is not reliable in the first place.

I agree that the best way is oversized xip. 


By the way, I feel that "is >= than" is redundant or plain wrong..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Tue, 13 Sep 2022 15:45:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in 
> > This sees a tad misleading - the previous snapshot wasn't borken, right?
> 
> I saw it kind of broken that ->xip contains sub transactions.  But I
> didn't meant it's broken by "correct". Is "proper" suitable there?

No. It's not broken if it is takenDuringRecovery.  So this flag can be
used to notify that xip can be oversized.

I realized that rbtxn_is_known_subxact is not reliable. I'm
redirecting to oversized xip. Pleas wait for a while.

regareds.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Tue, 13 Sep 2022 16:10:59 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Tue, 13 Sep 2022 12:08:18 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> > On Tue, Sep 13, 2022 at 11:52 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > That function is called after the SnapBuild reaches
> > > SNAPBUILD_CONSISTENT state ,or SnapBuildInitialSnapshot() rejects
> > > other than that state. That is, IIUC the top-sub relationship of all
> > > the currently running transactions is fully known to reorder buffer.
> > > We need a comment about that.
> > 
> > I don't think this assumption is true, any xid started after switching
> > to the SNAPBUILD_FULL_SNAPSHOT and before switching to the
> > SNAPBUILD_CONSISTENT, might still be in progress so we can not
> > identify whether they are subxact or not from reorder buffer.
> 
> Yeah, I misunderstood that the relationship is recorded earlier
> (how?).  Thus it is not reliable in the first place.
> 
> I agree that the best way is oversized xip. 
> 
> 
> By the way, I feel that "is >= than" is redundant or plain wrong..

By the way GetSnapshotData() does this:

>        snapshot->subxip = (TransactionId *)
>            malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
...
>    if (!snapshot->takenDuringRecovery)
...
>    else
>    {
>        subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin,
>                                                  xmax);

It is possible that the subxip is overrun. We need to expand the array
somehow. Or assign the array of the size (GetMaxSnapshotXidCount() +
GetMaxSnapshotSubxidCount()) for takenDuringRecovery snapshots.

(I feel deja vu..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
Sigh..

At Tue, 13 Sep 2022 16:30:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> It is possible that the subxip is overrun. We need to expand the array
> somehow. Or assign the array of the size (GetMaxSnapshotXidCount() +
> GetMaxSnapshotSubxidCount()) for takenDuringRecovery snapshots.

And I found that this is already done.  What we should is doing the
same thing in snapbuild.

Sorry for the noise..

regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Tue, 13 Sep 2022 16:15:34 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Tue, 13 Sep 2022 15:45:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > At Mon, 12 Sep 2022 14:51:56 -0700, Andres Freund <andres@anarazel.de> wrote in 
> > > This sees a tad misleading - the previous snapshot wasn't borken, right?
> > 
> > I saw it kind of broken that ->xip contains sub transactions.  But I
> > didn't meant it's broken by "correct". Is "proper" suitable there?
> 
> No. It's not broken if it is takenDuringRecovery.  So this flag can be
> used to notify that xip can be oversized.
> 
> I realized that rbtxn_is_known_subxact is not reliable. I'm
> redirecting to oversized xip. Pleas wait for a while.

However, the reader of saved snapshots (ImportSnapshot) has the
restriction that

>    if (xcnt < 0 || xcnt > GetMaxSnapshotXidCount())
>        ereport(ERROR,

and

>        if (xcnt < 0 || xcnt > GetMaxSnapshotSubxidCount())
>            ereport(ERROR,
 (this xid is subxcnt)

And ExportSnapshot repalces oversized subxip with overflowed.

So even when GetSnapshotData() returns a snapshot with oversized
subxip, it is saved as just "overflowed" on exporting. I don't think
this is the expected behavior since such (no xip and overflowed)
snapshot no longer works.

On the other hand, it seems to me that snapbuild doesn't like
takenDuringRecovery snapshots.

So snapshot needs additional flag signals that xip is oversized and
all xid are holded there. And also need to let takenDuringRecovery
suggest subxip is oversized.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Michael Paquier
Дата:
On Tue, Sep 13, 2022 at 05:31:05PM +0900, Kyotaro Horiguchi wrote:
> And ExportSnapshot repalces oversized subxip with overflowed.
>
> So even when GetSnapshotData() returns a snapshot with oversized
> subxip, it is saved as just "overflowed" on exporting. I don't think
> this is the expected behavior since such (no xip and overflowed)
> snapshot no longer works.
>
> On the other hand, it seems to me that snapbuild doesn't like
> takenDuringRecovery snapshots.
>
> So snapshot needs additional flag signals that xip is oversized and
> all xid are holded there. And also need to let takenDuringRecovery
> suggest subxip is oversized.

The discussion seems to have stopped here.  As this is classified as a
bug fix, I have moved this patch to next CF, waiting on author for the
moment.
--
Michael

Вложения

Re: Error "initial slot snapshot too large" in create replication slot

От
Andres Freund
Дата:
Hi,

On 2022-10-12 14:10:15 +0900, Michael Paquier wrote:
> The discussion seems to have stopped here.  As this is classified as a
> bug fix, I have moved this patch to next CF, waiting on author for the
> moment.

FWIW, I view this more as lifting a limitation. I wouldn't want to
backpatch the change.

Greetings,

Andres Freund



Re: Error "initial slot snapshot too large" in create replication slot

От
"Gregory Stark (as CFM)"
Дата:
On Wed, 12 Oct 2022 at 01:10, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Sep 13, 2022 at 05:31:05PM +0900, Kyotaro Horiguchi wrote:
> > And ExportSnapshot repalces oversized subxip with overflowed.
> >
> > So even when GetSnapshotData() returns a snapshot with oversized
> > subxip, it is saved as just "overflowed" on exporting. I don't think
> > this is the expected behavior since such (no xip and overflowed)
> > snapshot no longer works.
> >
> > On the other hand, it seems to me that snapbuild doesn't like
> > takenDuringRecovery snapshots.
> >
> > So snapshot needs additional flag signals that xip is oversized and
> > all xid are holded there. And also need to let takenDuringRecovery
> > suggest subxip is oversized.
>
> The discussion seems to have stopped here.  As this is classified as a
> bug fix, I have moved this patch to next CF, waiting on author for the
> moment.

Kyotoro Horiguchi, any chance you'll be able to work on this for this
commitfest? If so shout (or anyone else is planning to push it over
the line.... Andres?) otherwise I'll move it on to the next release.


-- 
Gregory Stark
As Commitfest Manager



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in 
> On Wed, 12 Oct 2022 at 01:10, Michael Paquier <michael@paquier.xyz> wrote:
> > The discussion seems to have stopped here.  As this is classified as a
> > bug fix, I have moved this patch to next CF, waiting on author for the
> > moment.
> 
> Kyotoro Horiguchi, any chance you'll be able to work on this for this
> commitfest? If so shout (or anyone else is planning to push it over
> the line.... Andres?) otherwise I'll move it on to the next release.

Ugg. sorry for being lazy.  I have lost track of the conversation. I'm
currently working on this and will come back soon with a new version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Wed, 22 Mar 2023 14:27:40 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in 
> > Kyotoro Horiguchi, any chance you'll be able to work on this for this
> > commitfest? If so shout (or anyone else is planning to push it over
> > the line.... Andres?) otherwise I'll move it on to the next release.
> 
> Ugg. sorry for being lazy.  I have lost track of the conversation. I'm
> currently working on this and will come back soon with a new version.

I relized that attempting to make SnapshotData.xip expansible was
making procarray.c and snapmgr.c too complicated. The main reason is
that SnapShotData is allocated in various ways, like on the stack,
using palloc including xip/subxip arrays, with palloc then allocating
xip/subxip arrays separately, or statically allocated and then having
xip/subxip arrays malloc'ed later. This variety was making the
expansion logic a mess.

So I went back to square one and decided to use subxip as an extension
for the xip array instead.

Like the comment added in the function SnapBuildInitialSnapshot
mentions, I don't think we can reliably identify top-level XIDs. So,
this patch just increases the allowed number of XIDs by using the
subxip array.

(The title of the patch was changed accordingly.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Error "initial slot snapshot too large" in create replication slot

От
Dilip Kumar
Дата:
On Thu, Mar 23, 2023 at 10:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 22 Mar 2023 14:27:40 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" <stark.cfm@gmail.com> wrote in
> > > Kyotoro Horiguchi, any chance you'll be able to work on this for this
> > > commitfest? If so shout (or anyone else is planning to push it over
> > > the line.... Andres?) otherwise I'll move it on to the next release.
> >
> > Ugg. sorry for being lazy.  I have lost track of the conversation. I'm
> > currently working on this and will come back soon with a new version.
>
> I relized that attempting to make SnapshotData.xip expansible was
> making procarray.c and snapmgr.c too complicated. The main reason is
> that SnapShotData is allocated in various ways, like on the stack,
> using palloc including xip/subxip arrays, with palloc then allocating
> xip/subxip arrays separately, or statically allocated and then having
> xip/subxip arrays malloc'ed later. This variety was making the
> expansion logic a mess.
>
> So I went back to square one and decided to use subxip as an extension
> for the xip array instead.
>
> Like the comment added in the function SnapBuildInitialSnapshot
> mentions, I don't think we can reliably identify top-level XIDs. So,
> this patch just increases the allowed number of XIDs by using the
> subxip array.

Thanks for working on this, your idea looks fine but my only worry is
that in the future if someone tries to change the logic of
XidInMVCCSnapshot() then they must be aware that the snap->xip array
and snap->subxip array no long distinguishes the top xids and subxids.
I agree with the current logic if we are not marking sub-overflow then
there is no issue, so can we document this in the SnapshotData
structure?

Also, there are some typos in the patch
/idetify/identify
/carete/create
/Aallocate/Allocate

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
Thanks for looking this!

At Thu, 23 Mar 2023 14:15:12 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Thu, Mar 23, 2023 at 10:53 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> Thanks for working on this, your idea looks fine but my only worry is
> that in the future if someone tries to change the logic of
> XidInMVCCSnapshot() then they must be aware that the snap->xip array
> and snap->subxip array no long distinguishes the top xids and subxids.

Yeah, I had the same thought when I was working on the posted version.

> I agree with the current logic if we are not marking sub-overflow then
> there is no issue, so can we document this in the SnapshotData
> structure?

(I found that it was alrady mentioned...)

In a unpublished version (what I referred to as "a mess"), I added a
flag called "topsub_mixed" to SnapshotData, indicating that XIDs of
top and sub transactions are stored in xip and subxip arrays in a
mixed manner.  However, I eventually removed it since it could only be
used for sanity checks related to suboverflowed.

I inserted the following sentense in the middle of the comments for
xip and subxip.

> In the case of !suboverflowed, there's a situation where this
> contains both top and sub-transaction IDs in a mixed manner.

And added similar a similar sentense to a comment of
XidInMVCCSnapshot.

While doning this, I realized that we could simplify and optimize XID
search code by combining the two XID arrays. If !suboverflowed, the
array stored all active XIDs of both top and
sub-transactions. Otherwise it only stores active top XIDs and maybe
XIDs of some sub-transactions. If many subXIDs are stored when
overflowed, there might lead to some degradation but I think the win
we gain from searching just one XID array in most cases outweighs
that. (I didn't do this (of course) in this version.)

> Also, there are some typos in the patch
> /idetify/identify
> /carete/create
> /Aallocate/Allocate

Oops! Thanks for pointing out them.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Error "initial slot snapshot too large" in create replication slot

От
Robert Haas
Дата:
On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> [ new patch ]

Well, I guess nobody is too excited about fixing this, because it's
been another 10 months with no discussion. Andres doesn't even seem to
think this is as much a bug as it is a limitation, for all that it's
filed in the CF application under bug fixes. I kind of wonder if we
should just close this entry in the CF, but I'll hold off on that for
now.

  /*
  * For normal MVCC snapshot this contains the all xact IDs that are in
  * progress, unless the snapshot was taken during recovery in which case
- * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
- * it contains *committed* transactions between xmin and xmax.
+ * it's empty. In the case of !suboverflowed, there's a situation where
+ * this contains both top and sub-transaction IDs in a mixed manner.  For
+ * historic MVCC snapshots, the meaning is inverted, i.e.  it contains
+ * *committed* transactions between xmin and xmax.
  *
  * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
  */

I have to say that I don't like this at all. It's bad enough that we
already use the xip/subxip arrays in two different ways depending on
the situation. Increasing that to three different ways seems painful.
How is anyone supposed to keep track of how the array is being used at
which point in the code?

If we are going to do that, I suspect it needs comment updates in more
places than what the patch does currently. For instance:

+ if (newxcnt < xiplen)
+ newxip[newxcnt++] = xid;
+ else
+ newsubxip[newsubxcnt++] = xid;

Just imagine coming across this code in 5 or 10 years and finding that
it had no comment explaining anything. Yikes!

Aside from the details of the patch, and perhaps more seriously, I'm
not really clear that we have consensus on an approach. A few
different proposals seem to have been floated, and it doesn't seem
like anybody hates anybody else's idea completely, but it doesn't
really seem like everyone agrees on what to do, either.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
vignesh C
Дата:
On Sat, 6 Jan 2024 at 01:47, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > [ new patch ]
>
> Well, I guess nobody is too excited about fixing this, because it's
> been another 10 months with no discussion. Andres doesn't even seem to
> think this is as much a bug as it is a limitation, for all that it's
> filed in the CF application under bug fixes. I kind of wonder if we
> should just close this entry in the CF, but I'll hold off on that for
> now.

I have changed the status of the patch to "Waiting on Author" as we
don't have a concrete patch with an accepted design which is in a
reviewable shape. We can think if we want to pursue this patch further
or probably close this in the current commitfest and start it again
when someone wants to work on this more actively.

Regards,
Vignesh



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
Thank you for the comments. This will help move the discussion
forward.

At Fri, 5 Jan 2024 15:17:11 -0500, Robert Haas <robertmhaas@gmail.com> wrote in 
> On Thu, Mar 23, 2023 at 11:02 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > [ new patch ]
> 
> Well, I guess nobody is too excited about fixing this, because it's
> been another 10 months with no discussion. Andres doesn't even seem to
> think this is as much a bug as it is a limitation, for all that it's
> filed in the CF application under bug fixes. I kind of wonder if we
> should just close this entry in the CF, but I'll hold off on that for
> now.

Perhaps you are correct. Ultimately, this issue is largely
theoretical, and I don't believe anyone would be inconvenienced by
imposing this contraint.

>   * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
>   */
> 
> I have to say that I don't like this at all. It's bad enough that we
> already use the xip/subxip arrays in two different ways depending on
> the situation. Increasing that to three different ways seems painful.
> How is anyone supposed to keep track of how the array is being used at
> which point in the code?

I understand. So, summirizing the current state briefly, I believe it
as follows:

a. snapbuild lacks a means to differentiate between top and sub xids
  during snapshot building.

b. Abusing takenDuringRecovery could lead to potential issues, so it
  has been rejected.

c. Dynamic sizing of xip is likely to have a significant impact on
  performance (as mentioned in the comments of GetSnapshotData).

d. (new!) Adding a third storing method is not favored.

As a method to satisfy these prerequisites, I think one direction
could be to split takenDuringRecovery into flags indicating the
storage method and creation timing. I present this as a last-ditch
effort. It's a rough proposal, and further validation should be
necessary. If this direction is also not acceptable, I'll proceed with
removing the CF entry.

> If we are going to do that, I suspect it needs comment updates in more
> places than what the patch does currently. For instance:
> 
> + if (newxcnt < xiplen)
> + newxip[newxcnt++] = xid;
> + else
> + newsubxip[newsubxcnt++] = xid;
> 
> Just imagine coming across this code in 5 or 10 years and finding that
> it had no comment explaining anything. Yikes!

^^;

> Aside from the details of the patch, and perhaps more seriously, I'm
> not really clear that we have consensus on an approach. A few
> different proposals seem to have been floated, and it doesn't seem
> like anybody hates anybody else's idea completely, but it doesn't
> really seem like everyone agrees on what to do, either.

I don't fully agree with that.It's not so much that I dislike other
proposals, but rather that we haven't been able to find a definitive
solution that stands out.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Error "initial slot snapshot too large" in create replication slot

От
Robert Haas
Дата:
On Thu, Jan 11, 2024 at 9:47 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> I understand. So, summirizing the current state briefly, I believe it
> as follows:
>
> a. snapbuild lacks a means to differentiate between top and sub xids
>   during snapshot building.
>
> b. Abusing takenDuringRecovery could lead to potential issues, so it
>   has been rejected.
>
> c. Dynamic sizing of xip is likely to have a significant impact on
>   performance (as mentioned in the comments of GetSnapshotData).
>
> d. (new!) Adding a third storing method is not favored.
>
> As a method to satisfy these prerequisites, I think one direction
> could be to split takenDuringRecovery into flags indicating the
> storage method and creation timing. I present this as a last-ditch
> effort. It's a rough proposal, and further validation should be
> necessary. If this direction is also not acceptable, I'll proceed with
> removing the CF entry.

I think that the idea of evolving takenDuringRecovery could
potentially work for this problem and maybe for some other things as
well. I remember studying that flag before and coming to the
conclusion that it had some pretty specific and surprising semantics
that weren't obvious from the name -- I don't remember the details --
and so I think improving it could be useful. Also, I think that
multiple storage methods could be more palatable if there were a clear
way to distinguish which one was in use and good comments explaining
the distinction in relevant places.

However, I wonder whether this whole area is in need of a bigger
rethink. There seem to be a number of situations in which the split
into xip and subxip arrays is not very convenient, and also some
situations where it's quite important. Sometimes we want to record
what's committed, and sometimes what isn't. It's all a bit messy and
inconsistent. The necessity of limiting snapshot size is annoying,
too. I have no real idea what can be done about all of this, but what
strikes me is that the current system has grown up incrementally: we
started with a data structure designed for the original use case, and
now by gradually adding new use cases things have gotten complicated.
If we were designing things over from scratch, maybe we'd do it
differently and end up with something less messy. And maybe someone
can imagine a redesign that is likewise less messy.

But on the other hand, maybe not. Perhaps we can't really do any
better than what we have. Then the question becomes whether this case
is important enough to justify additional code complexity. I don't
think I've personally seen users run into this problem so I have no
special reason to think that it's important, but if it's causing
issues for other people then maybe it is.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Error "initial slot snapshot too large" in create replication slot

От
Kyotaro Horiguchi
Дата:
At Fri, 12 Jan 2024 11:28:09 -0500, Robert Haas <robertmhaas@gmail.com> wrote in 
> However, I wonder whether this whole area is in need of a bigger
> rethink. There seem to be a number of situations in which the split
> into xip and subxip arrays is not very convenient, and also some
> situations where it's quite important. Sometimes we want to record
> what's committed, and sometimes what isn't. It's all a bit messy and
> inconsistent. The necessity of limiting snapshot size is annoying,
> too. I have no real idea what can be done about all of this, but what
> strikes me is that the current system has grown up incrementally: we
> started with a data structure designed for the original use case, and
> now by gradually adding new use cases things have gotten complicated.
> If we were designing things over from scratch, maybe we'd do it
> differently and end up with something less messy. And maybe someone
> can imagine a redesign that is likewise less messy.
> 
> But on the other hand, maybe not. Perhaps we can't really do any
> better than what we have. Then the question becomes whether this case
> is important enough to justify additional code complexity. I don't
> think I've personally seen users run into this problem so I have no
> special reason to think that it's important, but if it's causing
> issues for other people then maybe it is.

Thank you for the deep insights. I have understood your points. As I
can't think of any further simple modifications on this line, I will
withdraw this CF entry. At the moment, I also lack a fundamental,
comprehensive solution, but should if I or anyone else come up with
such a solution in the future, I believe it would worth a separate
discussion.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Error "initial slot snapshot too large" in create replication slot

От
Robert Haas
Дата:
On Mon, Jan 15, 2024 at 9:18 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> Thank you for the deep insights. I have understood your points. As I
> can't think of any further simple modifications on this line, I will
> withdraw this CF entry. At the moment, I also lack a fundamental,
> comprehensive solution, but should if I or anyone else come up with
> such a solution in the future, I believe it would worth a separate
> discussion.

I completely agree.

--
Robert Haas
EDB: http://www.enterprisedb.com