Обсуждение: Re: [PERFORM] Very slow (2 tuples/second) sequential scan after bulk insert; speed returns to ~500 tuples/second after commit

Поиск
Список
Период
Сортировка
(moved to pgsql-patches, as there's a patch attached)

Tom Lane wrote:
> Getting rid of the linked-list representation would be a win in a couple
> of ways --- we'd not need the bogus "list of XIDs" support in pg_list.h,
> and xactGetCommittedChildren would go away.  OTOH AtSubCommit_childXids
> would get more expensive.

I couldn't let this case go, so I wrote a patch. I replaced the linked
list with an array that's enlarged at AtSubCommit_childXids when necessary.

I couldn't measure any performance hit from the extra work of enlarging
and memcpying the array. I tried two test cases:

1. Insert one row per subtransaction, and commit the subtransaction
after each insert. This is like the OP's case.

         printf("CREATE TABLE foo (id int4);\n");
         printf("BEGIN;\n");
         for(i=1; i <= 100000; i++)
         {
                 printf("SAVEPOINT sp%d;\n", i);
                 printf("INSERT INTO foo VALUES (1);\n");
                 printf("RELEASE SAVEPOINT sp%d;\n", i);
         }
         printf("COMMIT;\n");

2. Insert one row per subtransaction, but leave the subtransaction in
not-committed state

         printf("CREATE TABLE foo (id int4, t text);\n");
         printf("BEGIN;\n");
         for(i=1; i <= 100000; i++)
         {
                 printf("SAVEPOINT sp%d;\n", i);
                 printf("INSERT INTO foo VALUES (1, 'f');\n");
         }
         printf("COMMIT;\n");

Test case 1 is not bad, because we just keep appending to the parent
array one at a time. Test case 2 might become slower, as the number of
subtransactions increases, as at the commit of each subtransaction you
need to enlarge the parent array and copy all the already-committed
childxids to it. However, you hit the limit on amount of shared mem
required for holding the per-xid locks before that becomes a problem,
and the performance becomes dominated by other things anyway (notably
LockReassignCurrentOwner).

I initially thought that using a single palloc'd array to hold all the
XIDs would introduce a new limit on the number committed
subtransactions, thanks to MaxAllocSize, but that's not the case.
Without patch, we actually allocate an array like that anyway in
xactGetCommittedChildren.

Elsewhere in our codebase where we use arrays that are enlarged as
needed, we keep track of the "allocated" size and the "used" size of the
array separately, and only call repalloc when the array fills up, and
repalloc a larger than necessary array when it does. I chose to just
call repalloc every time instead, as repalloc is smart enough to fall
out quickly if the chunk the allocation was made in is already larger
than the new size. There might be some gain avoiding the repeated
repalloc calls, but I doubt it's worth the code complexity, and calling
repalloc with a larger than necessary size can actually force it to
unnecessarily allocate a new, larger chunk instead of reusing the old
one. Thoughts on that?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.258
diff -c -r1.258 xact.c
*** src/backend/access/transam/xact.c    4 Mar 2008 19:54:06 -0000    1.258
--- src/backend/access/transam/xact.c    11 Mar 2008 12:31:28 -0000
***************
*** 130,136 ****
      int            gucNestLevel;    /* GUC context nesting depth */
      MemoryContext curTransactionContext;        /* my xact-lifetime context */
      ResourceOwner curTransactionOwner;    /* my query resources */
!     List       *childXids;        /* subcommitted child XIDs */
      Oid            prevUser;        /* previous CurrentUserId setting */
      bool        prevSecDefCxt;    /* previous SecurityDefinerContext setting */
      bool        prevXactReadOnly;        /* entry-time xact r/o state */
--- 130,137 ----
      int            gucNestLevel;    /* GUC context nesting depth */
      MemoryContext curTransactionContext;        /* my xact-lifetime context */
      ResourceOwner curTransactionOwner;    /* my query resources */
!     TransactionId    *childXids;    /* subcommitted child XIDs, in XID order */
!     int            nChildXids;        /* # of subcommitted child XIDs */
      Oid            prevUser;        /* previous CurrentUserId setting */
      bool        prevSecDefCxt;    /* previous SecurityDefinerContext setting */
      bool        prevXactReadOnly;        /* entry-time xact r/o state */
***************
*** 156,162 ****
      0,                            /* GUC context nesting depth */
      NULL,                        /* cur transaction context */
      NULL,                        /* cur transaction resource owner */
!     NIL,                        /* subcommitted child Xids */
      InvalidOid,                    /* previous CurrentUserId setting */
      false,                        /* previous SecurityDefinerContext setting */
      false,                        /* entry-time xact r/o state */
--- 157,164 ----
      0,                            /* GUC context nesting depth */
      NULL,                        /* cur transaction context */
      NULL,                        /* cur transaction resource owner */
!     NULL,                        /* subcommitted child Xids */
!     0,                            /* # of subcommitted child Xids */
      InvalidOid,                    /* previous CurrentUserId setting */
      false,                        /* previous SecurityDefinerContext setting */
      false,                        /* entry-time xact r/o state */
***************
*** 559,565 ****
       */
      for (s = CurrentTransactionState; s != NULL; s = s->parent)
      {
!         ListCell   *cell;

          if (s->state == TRANS_ABORT)
              continue;
--- 561,567 ----
       */
      for (s = CurrentTransactionState; s != NULL; s = s->parent)
      {
!         int low, high;

          if (s->state == TRANS_ABORT)
              continue;
***************
*** 567,576 ****
              continue;            /* it can't have any child XIDs either */
          if (TransactionIdEquals(xid, s->transactionId))
              return true;
!         foreach(cell, s->childXids)
          {
!             if (TransactionIdEquals(xid, lfirst_xid(cell)))
                  return true;
          }
      }

--- 569,594 ----
              continue;            /* it can't have any child XIDs either */
          if (TransactionIdEquals(xid, s->transactionId))
              return true;
!
!         /*
!          * Return true for any committed child of this subtransaction. As
!          * the childXids array is ordered, we can binary search.
!          */
!         low = 0;
!         high = s->nChildXids - 1;
!         while (low <= high)
          {
!             int                middle;
!             TransactionId    probe;
!
!             middle = low + (high - low) / 2;
!             probe = s->childXids[middle];
!             if (TransactionIdEquals(probe, xid))
                  return true;
+             else if (TransactionIdPrecedes(probe, xid))
+                 low = middle + 1;
+             else
+                 high = middle - 1;
          }
      }

***************
*** 1068,1073 ****
--- 1086,1093 ----
  {
      TransactionState s = CurrentTransactionState;
      MemoryContext old_cxt;
+     TransactionId *new_childXids;
+     TransactionId new_nChildXids;

      Assert(s->parent != NULL);

***************
*** 1078,1098 ****
       */
      old_cxt = MemoryContextSwitchTo(TopTransactionContext);

!     s->parent->childXids = lappend_xid(s->parent->childXids,
!                                        s->transactionId);

!     if (s->childXids != NIL)
!     {
!         s->parent->childXids = list_concat(s->parent->childXids,
!                                            s->childXids);

!         /*
!          * list_concat doesn't free the list header for the second list; do so
!          * here to avoid memory leakage (kluge)
!          */
          pfree(s->childXids);
-         s->childXids = NIL;
-     }

      MemoryContextSwitchTo(old_cxt);
  }
--- 1098,1132 ----
       */
      old_cxt = MemoryContextSwitchTo(TopTransactionContext);

!     /* Enlarge the parent array so that it can hold all the new xids */
!     new_nChildXids = s->parent->nChildXids + s->nChildXids + 1;

!     if (s->parent->childXids == NULL)
!         new_childXids = palloc((new_nChildXids + 10) * sizeof(TransactionId));
!     else
!         new_childXids = repalloc(s->parent->childXids,
!                                  new_nChildXids * sizeof(TransactionId));

!     /*
!      * Copy all my XIDs to parent's array.
!      *
!      * Note: We rely on the fact that the XID of a child always follows that
!      * of its parent's. By copying the XID of this subtransaction before the
!      * XIDs of its children, we ensure that the array stays ordered. Likewise,
!      * all XIDs already in the array belong to our parent, or subtransactions
!      * started and committed before us, so their XIDs must precede ours.
!      */
!     new_childXids[s->parent->nChildXids] = s->transactionId;
!
!     if (s->nChildXids > 0)
!         memcpy(&new_childXids[s->parent->nChildXids + 1],
!                s->childXids, s->nChildXids * sizeof(TransactionId));
!
!     s->parent->childXids  = new_childXids;
!     s->parent->nChildXids = new_nChildXids;
!
!     if (s->childXids != NULL)
          pfree(s->childXids);

      MemoryContextSwitchTo(old_cxt);
  }
***************
*** 1336,1343 ****
       * AtSubCommit_childXids).    This means we'd better free the list
       * explicitly at abort to avoid leakage.
       */
!     list_free(s->childXids);
!     s->childXids = NIL;
  }

  /* ----------------------------------------------------------------
--- 1370,1378 ----
       * AtSubCommit_childXids).    This means we'd better free the list
       * explicitly at abort to avoid leakage.
       */
!     if (s->childXids != NULL)
!         pfree(s->childXids);
!     s->nChildXids = 0;
  }

  /* ----------------------------------------------------------------
***************
*** 1506,1512 ****
       */
      s->nestingLevel = 1;
      s->gucNestLevel = 1;
!     s->childXids = NIL;
      GetUserIdAndContext(&s->prevUser, &s->prevSecDefCxt);
      /* SecurityDefinerContext should never be set outside a transaction */
      Assert(!s->prevSecDefCxt);
--- 1541,1548 ----
       */
      s->nestingLevel = 1;
      s->gucNestLevel = 1;
!     s->nChildXids = 0;
!     s->childXids = NULL;
      GetUserIdAndContext(&s->prevUser, &s->prevSecDefCxt);
      /* SecurityDefinerContext should never be set outside a transaction */
      Assert(!s->prevSecDefCxt);
***************
*** 1702,1708 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with commit processing, set current transaction state back to
--- 1738,1745 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->nChildXids = 0;
!     s->childXids = NULL;

      /*
       * done with commit processing, set current transaction state back to
***************
*** 1931,1937 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with 1st phase commit processing, set current transaction state
--- 1968,1975 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->nChildXids = 0;
!     s->childXids = NULL;

      /*
       * done with 1st phase commit processing, set current transaction state
***************
*** 2101,2107 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with abort processing, set current transaction state back to
--- 2139,2146 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->nChildXids = 0;
!     s->childXids = NULL;

      /*
       * done with abort processing, set current transaction state back to
***************
*** 4051,4056 ****
--- 4090,4108 ----
  static void
  ShowTransactionStateRec(TransactionState s)
  {
+     StringInfoData buf;
+
+     initStringInfo(&buf);
+
+     if (s->nChildXids > 0)
+     {
+         int i;
+
+         appendStringInfo(&buf, "%u", s->childXids[0]);
+         for (i = 1; i < s->nChildXids; i++)
+             appendStringInfo(&buf, " %u", s->childXids[i]);
+     }
+
      if (s->parent)
          ShowTransactionStateRec(s->parent);

***************
*** 4064,4071 ****
                               (unsigned int) s->subTransactionId,
                               (unsigned int) currentCommandId,
                               currentCommandIdUsed ? " (used)" : "",
!                              s->nestingLevel,
!                              nodeToString(s->childXids))));
  }

  /*
--- 4116,4122 ----
                               (unsigned int) s->subTransactionId,
                               (unsigned int) currentCommandId,
                               currentCommandIdUsed ? " (used)" : "",
!                              s->nestingLevel, buf.data)));
  }

  /*
***************
*** 4154,4162 ****
      TransactionState s = CurrentTransactionState;
      int            nchildren;
      TransactionId *children;
-     ListCell   *p;

!     nchildren = list_length(s->childXids);
      if (nchildren == 0)
      {
          *ptr = NULL;
--- 4205,4212 ----
      TransactionState s = CurrentTransactionState;
      int            nchildren;
      TransactionId *children;

!     nchildren = s->nChildXids;
      if (nchildren == 0)
      {
          *ptr = NULL;
***************
*** 4166,4177 ****
      children = (TransactionId *) palloc(nchildren * sizeof(TransactionId));
      *ptr = children;

!     foreach(p, s->childXids)
!     {
!         TransactionId child = lfirst_xid(p);
!
!         *children++ = child;
!     }

      return nchildren;
  }
--- 4216,4222 ----
      children = (TransactionId *) palloc(nchildren * sizeof(TransactionId));
      *ptr = children;

!     memcpy(children, s->childXids, nchildren * sizeof(TransactionId));

      return nchildren;
  }
Index: src/include/nodes/pg_list.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/nodes/pg_list.h,v
retrieving revision 1.57
diff -c -r1.57 pg_list.h
*** src/include/nodes/pg_list.h    1 Jan 2008 19:45:58 -0000    1.57
--- src/include/nodes/pg_list.h    11 Mar 2008 11:56:21 -0000
***************
*** 26,36 ****
   * (At the moment, ints and Oids are the same size, but they may not
   * always be so; try to be careful to maintain the distinction.)
   *
-  * There is also limited support for lists of TransactionIds; since these
-  * are used in only one or two places, we don't provide a full implementation,
-  * but map them onto Oid lists.  This effectively assumes that TransactionId
-  * is no wider than Oid and both are unsigned types.
-  *
   *
   * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
--- 26,31 ----
***************
*** 160,171 ****
  #define list_make4_oid(x1,x2,x3,x4) lcons_oid(x1, list_make3_oid(x2, x3, x4))

  /*
-  * Limited support for lists of TransactionIds, mapped onto lists of Oids
-  */
- #define lfirst_xid(lc)                ((TransactionId) lfirst_oid(lc))
- #define lappend_xid(list, datum)    lappend_oid(list, (Oid) (datum))
-
- /*
   * foreach -
   *      a convenience macro which loops through the list
   */
--- 155,160 ----

On Tue, Mar 11, 2008 at 6:04 PM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
> (moved to pgsql-patches, as there's a patch attached)
>
>
>  I couldn't let this case go, so I wrote a patch. I replaced the linked
>  list with an array that's enlarged at AtSubCommit_childXids when necessary.
>

We can replace the array of xids with an array of flags where i'th flag is
set to true if the corresponding subtransaction is committed. This would
make lookup O(1) operation. Of course, the downside is when the array
is sparse. Or we can switch to flag based representation once the array size
grows beyond a limit.

Thanks,
Pavan


--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Heikki Linnakangas wrote:

> I couldn't let this case go, so I wrote a patch. I replaced the linked
> list with an array that's enlarged at AtSubCommit_childXids when
> necessary.

Do you still need to palloc the return value from
xactGetCommittedChildren?  Perhaps you can save the palloc/memcpy/pfree
and just return the pointer to the array already in memory?

Not that it'll any much of a performance impact, but just for
cleanliness :-)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> I couldn't let this case go, so I wrote a patch. I replaced the linked
>> list with an array that's enlarged at AtSubCommit_childXids when
>> necessary.
>
> Do you still need to palloc the return value from
> xactGetCommittedChildren?  Perhaps you can save the palloc/memcpy/pfree
> and just return the pointer to the array already in memory?

Yeah, good point. The callers just need to be modified not to pfree it.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> I initially thought that using a single palloc'd array to hold all the
> XIDs would introduce a new limit on the number committed
> subtransactions, thanks to MaxAllocSize, but that's not the case.
> Without patch, we actually allocate an array like that anyway in
> xactGetCommittedChildren.

Right.

> Elsewhere in our codebase where we use arrays that are enlarged as
> needed, we keep track of the "allocated" size and the "used" size of the
> array separately, and only call repalloc when the array fills up, and
> repalloc a larger than necessary array when it does. I chose to just
> call repalloc every time instead, as repalloc is smart enough to fall
> out quickly if the chunk the allocation was made in is already larger
> than the new size. There might be some gain avoiding the repeated
> repalloc calls, but I doubt it's worth the code complexity, and calling
> repalloc with a larger than necessary size can actually force it to
> unnecessarily allocate a new, larger chunk instead of reusing the old
> one. Thoughts on that?

Seems like a pretty bad idea to me, as the behavior you're counting on
only applies to chunks up to 8K or thereabouts.  In a situation where
you are subcommitting lots of XIDs one at a time, this is likely to have
quite awful behavior (or at least, you're at the mercy of the local
malloc library as to how bad it is).  I'd go with the same
double-it-each-time-needed approach we use elsewhere.

            regards, tom lane

Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Elsewhere in our codebase where we use arrays that are enlarged as
>> needed, we keep track of the "allocated" size and the "used" size of the
>> array separately, and only call repalloc when the array fills up, and
>> repalloc a larger than necessary array when it does. I chose to just
>> call repalloc every time instead, as repalloc is smart enough to fall
>> out quickly if the chunk the allocation was made in is already larger
>> than the new size. There might be some gain avoiding the repeated
>> repalloc calls, but I doubt it's worth the code complexity, and calling
>> repalloc with a larger than necessary size can actually force it to
>> unnecessarily allocate a new, larger chunk instead of reusing the old
>> one. Thoughts on that?
>
> Seems like a pretty bad idea to me, as the behavior you're counting on
> only applies to chunks up to 8K or thereabouts.

Oh, you're right. Though I'm sure libc realloc has all kinds of smarts
as well, it does seem better to not rely too much on that.

> In a situation where
> you are subcommitting lots of XIDs one at a time, this is likely to have
> quite awful behavior (or at least, you're at the mercy of the local
> malloc library as to how bad it is).  I'd go with the same
> double-it-each-time-needed approach we use elsewhere.

Yep, patch attached. I also changed xactGetCommittedChildren to return
the original array instead of copying it, as Alvaro suggested.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.39
diff -c -r1.39 twophase.c
*** src/backend/access/transam/twophase.c    1 Jan 2008 19:45:48 -0000    1.39
--- src/backend/access/transam/twophase.c    12 Mar 2008 12:45:13 -0000
***************
*** 827,833 ****
          save_state_data(children, hdr.nsubxacts * sizeof(TransactionId));
          /* While we have the child-xact data, stuff it in the gxact too */
          GXactLoadSubxactData(gxact, hdr.nsubxacts, children);
-         pfree(children);
      }
      if (hdr.ncommitrels > 0)
      {
--- 827,832 ----
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.258
diff -c -r1.258 xact.c
*** src/backend/access/transam/xact.c    4 Mar 2008 19:54:06 -0000    1.258
--- src/backend/access/transam/xact.c    12 Mar 2008 13:41:10 -0000
***************
*** 130,136 ****
      int            gucNestLevel;    /* GUC context nesting depth */
      MemoryContext curTransactionContext;        /* my xact-lifetime context */
      ResourceOwner curTransactionOwner;    /* my query resources */
!     List       *childXids;        /* subcommitted child XIDs */
      Oid            prevUser;        /* previous CurrentUserId setting */
      bool        prevSecDefCxt;    /* previous SecurityDefinerContext setting */
      bool        prevXactReadOnly;        /* entry-time xact r/o state */
--- 130,138 ----
      int            gucNestLevel;    /* GUC context nesting depth */
      MemoryContext curTransactionContext;        /* my xact-lifetime context */
      ResourceOwner curTransactionOwner;    /* my query resources */
!     TransactionId    *childXids;    /* subcommitted child XIDs, in XID order */
!     int            nChildXids;        /* # of subcommitted child XIDs */
!     int            maxChildXids;    /* allocated size of childXids */
      Oid            prevUser;        /* previous CurrentUserId setting */
      bool        prevSecDefCxt;    /* previous SecurityDefinerContext setting */
      bool        prevXactReadOnly;        /* entry-time xact r/o state */
***************
*** 156,162 ****
      0,                            /* GUC context nesting depth */
      NULL,                        /* cur transaction context */
      NULL,                        /* cur transaction resource owner */
!     NIL,                        /* subcommitted child Xids */
      InvalidOid,                    /* previous CurrentUserId setting */
      false,                        /* previous SecurityDefinerContext setting */
      false,                        /* entry-time xact r/o state */
--- 158,166 ----
      0,                            /* GUC context nesting depth */
      NULL,                        /* cur transaction context */
      NULL,                        /* cur transaction resource owner */
!     NULL,                        /* subcommitted child Xids */
!     0,                            /* # of subcommitted child Xids */
!     0,                            /* allocated size of childXids */
      InvalidOid,                    /* previous CurrentUserId setting */
      false,                        /* previous SecurityDefinerContext setting */
      false,                        /* entry-time xact r/o state */
***************
*** 559,565 ****
       */
      for (s = CurrentTransactionState; s != NULL; s = s->parent)
      {
!         ListCell   *cell;

          if (s->state == TRANS_ABORT)
              continue;
--- 563,569 ----
       */
      for (s = CurrentTransactionState; s != NULL; s = s->parent)
      {
!         int low, high;

          if (s->state == TRANS_ABORT)
              continue;
***************
*** 567,576 ****
              continue;            /* it can't have any child XIDs either */
          if (TransactionIdEquals(xid, s->transactionId))
              return true;
!         foreach(cell, s->childXids)
          {
!             if (TransactionIdEquals(xid, lfirst_xid(cell)))
                  return true;
          }
      }

--- 571,596 ----
              continue;            /* it can't have any child XIDs either */
          if (TransactionIdEquals(xid, s->transactionId))
              return true;
!
!         /*
!          * Return true for any committed child of this subtransaction. As
!          * the childXids array is ordered, we can binary search.
!          */
!         low = 0;
!         high = s->nChildXids - 1;
!         while (low <= high)
          {
!             int                middle;
!             TransactionId    probe;
!
!             middle = low + (high - low) / 2;
!             probe = s->childXids[middle];
!             if (TransactionIdEquals(probe, xid))
                  return true;
+             else if (TransactionIdPrecedes(probe, xid))
+                 low = middle + 1;
+             else
+                 high = middle - 1;
          }
      }

***************
*** 985,992 ****
      /* Clean up local data */
      if (rels)
          pfree(rels);
-     if (children)
-         pfree(children);

      return latestXid;
  }
--- 1005,1010 ----
***************
*** 1067,1100 ****
  AtSubCommit_childXids(void)
  {
      TransactionState s = CurrentTransactionState;
!     MemoryContext old_cxt;

      Assert(s->parent != NULL);

      /*
!      * We keep the child-XID lists in TopTransactionContext; this avoids
!      * setting up child-transaction contexts for what might be just a few
!      * bytes of grandchild XIDs.
       */
!     old_cxt = MemoryContextSwitchTo(TopTransactionContext);

!     s->parent->childXids = lappend_xid(s->parent->childXids,
!                                        s->transactionId);
!
!     if (s->childXids != NIL)
      {
!         s->parent->childXids = list_concat(s->parent->childXids,
!                                            s->childXids);

          /*
!          * list_concat doesn't free the list header for the second list; do so
!          * here to avoid memory leakage (kluge)
           */
!         pfree(s->childXids);
!         s->childXids = NIL;
      }

!     MemoryContextSwitchTo(old_cxt);
  }

  /*
--- 1085,1155 ----
  AtSubCommit_childXids(void)
  {
      TransactionState s = CurrentTransactionState;
!     TransactionId new_nChildXids;

      Assert(s->parent != NULL);

      /*
!      * The parent childXids array will need to hold all my XID and all my
!      * childXids, in addition to XIDs already there.
       */
!     new_nChildXids = s->parent->nChildXids + s->nChildXids + 1;

!     /* Allocate or enlarge the parent array if necessary */
!     if (s->parent->maxChildXids < new_nChildXids)
      {
!         int                new_maxChildXids;
!         TransactionId  *new_childXids;

          /*
!          * Make it 2x what's needed right now, to avoid having to enlarge it
!          * repeatedly. But we can't go above MaxAllocSize.
           */
!         new_maxChildXids = Min(new_nChildXids * 2,
!                                MaxAllocSize / sizeof(TransactionId));
!
!         if (new_maxChildXids < new_nChildXids)
!             ereport(ERROR,
!                     (errcode(ERRCODE_OUT_OF_MEMORY),
!                      errmsg("maximum number of committed subtransactions (%d) reached",
!                             new_maxChildXids)));
!
!         /*
!          * We keep the child-XID lists in TopTransactionContext; this avoids
!          * setting up child-transaction contexts for what might be just a few
!          * bytes of grandchild XIDs.
!          */
!         if (s->parent->childXids == NULL)
!             new_childXids =
!                 MemoryContextAlloc(TopTransactionContext,
!                                    new_maxChildXids * sizeof(TransactionId));
!         else
!             new_childXids = repalloc(s->parent->childXids,
!                                      new_maxChildXids * sizeof(TransactionId));
!
!         s->parent->childXids  = new_childXids;
!         s->parent->maxChildXids = new_maxChildXids;
      }

!     /*
!      * Copy all my XIDs to parent's array.
!      *
!      * Note: We rely on the fact that the XID of a child always follows that
!      * of its parent's. By copying the XID of this subtransaction before the
!      * XIDs of its children, we ensure that the array stays ordered. Likewise,
!      * all XIDs already in the array belong to our parent, or subtransactions
!      * started and committed before us, so their XIDs must precede ours.
!      */
!     s->parent->childXids[s->parent->nChildXids] = s->transactionId;
!
!     if (s->nChildXids > 0)
!         memcpy(&s->parent->childXids[s->parent->nChildXids + 1],
!                s->childXids, s->nChildXids * sizeof(TransactionId));
!
!     s->parent->nChildXids = new_nChildXids;
!
!     if (s->childXids != NULL)
!         pfree(s->childXids);
  }

  /*
***************
*** 1259,1266 ****
      /* And clean up local data */
      if (rels)
          pfree(rels);
-     if (children)
-         pfree(children);

      return latestXid;
  }
--- 1314,1319 ----
***************
*** 1336,1343 ****
       * AtSubCommit_childXids).    This means we'd better free the list
       * explicitly at abort to avoid leakage.
       */
!     list_free(s->childXids);
!     s->childXids = NIL;
  }

  /* ----------------------------------------------------------------
--- 1389,1399 ----
       * AtSubCommit_childXids).    This means we'd better free the list
       * explicitly at abort to avoid leakage.
       */
!     if (s->childXids != NULL)
!         pfree(s->childXids);
!     s->childXids = NULL;
!     s->nChildXids = 0;
!     s->maxChildXids = 0;
  }

  /* ----------------------------------------------------------------
***************
*** 1506,1512 ****
       */
      s->nestingLevel = 1;
      s->gucNestLevel = 1;
!     s->childXids = NIL;
      GetUserIdAndContext(&s->prevUser, &s->prevSecDefCxt);
      /* SecurityDefinerContext should never be set outside a transaction */
      Assert(!s->prevSecDefCxt);
--- 1562,1570 ----
       */
      s->nestingLevel = 1;
      s->gucNestLevel = 1;
!     s->childXids = NULL;
!     s->nChildXids = 0;
!     s->maxChildXids = 0;
      GetUserIdAndContext(&s->prevUser, &s->prevSecDefCxt);
      /* SecurityDefinerContext should never be set outside a transaction */
      Assert(!s->prevSecDefCxt);
***************
*** 1702,1708 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with commit processing, set current transaction state back to
--- 1760,1768 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NULL;
!     s->nChildXids = 0;
!     s->maxChildXids = 0;

      /*
       * done with commit processing, set current transaction state back to
***************
*** 1931,1937 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with 1st phase commit processing, set current transaction state
--- 1991,1999 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NULL;
!     s->nChildXids = 0;
!     s->maxChildXids = 0;

      /*
       * done with 1st phase commit processing, set current transaction state
***************
*** 2101,2107 ****
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NIL;

      /*
       * done with abort processing, set current transaction state back to
--- 2163,2171 ----
      s->subTransactionId = InvalidSubTransactionId;
      s->nestingLevel = 0;
      s->gucNestLevel = 0;
!     s->childXids = NULL;
!     s->nChildXids = 0;
!     s->maxChildXids = 0;

      /*
       * done with abort processing, set current transaction state back to
***************
*** 4051,4056 ****
--- 4115,4133 ----
  static void
  ShowTransactionStateRec(TransactionState s)
  {
+     StringInfoData buf;
+
+     initStringInfo(&buf);
+
+     if (s->nChildXids > 0)
+     {
+         int i;
+
+         appendStringInfo(&buf, "%u", s->childXids[0]);
+         for (i = 1; i < s->nChildXids; i++)
+             appendStringInfo(&buf, " %u", s->childXids[i]);
+     }
+
      if (s->parent)
          ShowTransactionStateRec(s->parent);

***************
*** 4064,4071 ****
                               (unsigned int) s->subTransactionId,
                               (unsigned int) currentCommandId,
                               currentCommandIdUsed ? " (used)" : "",
!                              s->nestingLevel,
!                              nodeToString(s->childXids))));
  }

  /*
--- 4141,4147 ----
                               (unsigned int) s->subTransactionId,
                               (unsigned int) currentCommandId,
                               currentCommandIdUsed ? " (used)" : "",
!                              s->nestingLevel, buf.data)));
  }

  /*
***************
*** 4144,4179 ****
   * xactGetCommittedChildren
   *
   * Gets the list of committed children of the current transaction.    The return
!  * value is the number of child transactions.  *children is set to point to a
!  * palloc'd array of TransactionIds.  If there are no subxacts, *children is
!  * set to NULL.
   */
  int
  xactGetCommittedChildren(TransactionId **ptr)
  {
      TransactionState s = CurrentTransactionState;
-     int            nchildren;
-     TransactionId *children;
-     ListCell   *p;

!     nchildren = list_length(s->childXids);
!     if (nchildren == 0)
!     {
          *ptr = NULL;
!         return 0;
!     }
!
!     children = (TransactionId *) palloc(nchildren * sizeof(TransactionId));
!     *ptr = children;
!
!     foreach(p, s->childXids)
!     {
!         TransactionId child = lfirst_xid(p);
!
!         *children++ = child;
!     }

!     return nchildren;
  }

  /*
--- 4220,4241 ----
   * xactGetCommittedChildren
   *
   * Gets the list of committed children of the current transaction.    The return
!  * value is the number of child transactions.  *children is set to point to an
!  * array of TransactionIds. The array is allocated in TopTransactioncontext;
!  * the caller should not try to pfree() it. If there are no subxacts,
!  * *children is set to NULL.
   */
  int
  xactGetCommittedChildren(TransactionId **ptr)
  {
      TransactionState s = CurrentTransactionState;

!     if (s->nChildXids == 0)
          *ptr = NULL;
!     else
!         *ptr = s->childXids;

!     return s->nChildXids;
  }

  /*
Index: src/include/nodes/pg_list.h
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/nodes/pg_list.h,v
retrieving revision 1.57
diff -c -r1.57 pg_list.h
*** src/include/nodes/pg_list.h    1 Jan 2008 19:45:58 -0000    1.57
--- src/include/nodes/pg_list.h    12 Mar 2008 10:32:25 -0000
***************
*** 26,36 ****
   * (At the moment, ints and Oids are the same size, but they may not
   * always be so; try to be careful to maintain the distinction.)
   *
-  * There is also limited support for lists of TransactionIds; since these
-  * are used in only one or two places, we don't provide a full implementation,
-  * but map them onto Oid lists.  This effectively assumes that TransactionId
-  * is no wider than Oid and both are unsigned types.
-  *
   *
   * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
--- 26,31 ----
***************
*** 160,171 ****
  #define list_make4_oid(x1,x2,x3,x4) lcons_oid(x1, list_make3_oid(x2, x3, x4))

  /*
-  * Limited support for lists of TransactionIds, mapped onto lists of Oids
-  */
- #define lfirst_xid(lc)                ((TransactionId) lfirst_oid(lc))
- #define lappend_xid(list, datum)    lappend_oid(list, (Oid) (datum))
-
- /*
   * foreach -
   *      a convenience macro which loops through the list
   */
--- 155,160 ----

On Wed, Mar 12, 2008 at 7:13 PM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
>
>
>  Yep, patch attached. I also changed xactGetCommittedChildren to return
>  the original array instead of copying it, as Alvaro suggested.
>

Any comments on the flag based approach I suggested earlier ? Am I
missing some normal scenario where it won't work well ?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On Wed, Mar 12, 2008 at 7:13 PM, Heikki Linnakangas
> <heikki@enterprisedb.com> wrote:
>> Yep, patch attached. I also changed xactGetCommittedChildren to return
>> the original array instead of copying it, as Alvaro suggested.

> Any comments on the flag based approach I suggested earlier ?

I didn't like it; it seemed overly complicated (consider dealing with
XID wraparound), and it would have problems with a slow transaction
generating a sparse set of subtransaction XIDs.  I think getting rid of
the linear search will be enough to fix the performance problem.

            regards, tom lane

On Wed, Mar 12, 2008 at 9:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
>  I didn't like it; it seemed overly complicated (consider dealing with
>  XID wraparound),

We are talking about subtransactions here. I don't think we support
subtransaction wrap-around, do we ?

> and it would have problems with a slow transaction
>  generating a sparse set of subtransaction XIDs.

I agree thats the worst case. But is that common ? Thats what I
was thinking when I proposed the alternate solution. I thought that can
happen only if most of the subtransactions abort, which again I thought
is not a normal case. But frankly I am not sure if my assumption is correct.

> I think getting rid of
>  the linear search will be enough to fix the performance problem.
>

I wonder if a skewed binary search would help more ? For example,
if we know that the range of xids stored in the array is 1 to 1000 and
if we are searching for a number closer to 1000, we can break the
array into <large,small> parts instead of equal parts and then
search.

Well, may be I making simple things complicated ;-)

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Pavan Deolasee wrote:
> On Wed, Mar 12, 2008 at 9:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>  I didn't like it; it seemed overly complicated (consider dealing with
>>  XID wraparound),
>
> We are talking about subtransactions here. I don't think we support
> subtransaction wrap-around, do we ?

Imagine that you start a transaction just before transaction
wrap-around, so that the top level XID is 2^31-10. Then you start 20
subtransactions. What XIDs will they get? Now how would you map those to
a bitmap?

It's certainly possible, you could index the bitmap by the index from
top transaction XID for example. But it does get a bit complicated.

>> and it would have problems with a slow transaction
>>  generating a sparse set of subtransaction XIDs.
>
> I agree thats the worst case. But is that common ? Thats what I
> was thinking when I proposed the alternate solution. I thought that can
> happen only if most of the subtransactions abort, which again I thought
> is not a normal case. But frankly I am not sure if my assumption is correct.

It's not that common to have hundreds of thousands of subtransactions to
begin with..

>> I think getting rid of
>>  the linear search will be enough to fix the performance problem.
>
> I wonder if a skewed binary search would help more ? For example,
> if we know that the range of xids stored in the array is 1 to 1000 and
> if we are searching for a number closer to 1000, we can break the
> array into <large,small> parts instead of equal parts and then
> search.

Possibly, but I doubt it's worth the trouble. The simple binary search
solved the performance problem well enough. In the test case of the OP,
with 300000 subtransactions, with the patch, there was no longer any
measurable difference whether you ran the "SELECT COUNT(*)" in the same
transaction as the INSERTs or after a COMMIT.

> Well, may be I making simple things complicated ;-)

I think so :-).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On Wed, Mar 12, 2008 at 9:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> and it would have problems with a slow transaction
>> generating a sparse set of subtransaction XIDs.

> I agree thats the worst case. But is that common ? Thats what I
> was thinking when I proposed the alternate solution. I thought that can
> happen only if most of the subtransactions abort, which again I thought
> is not a normal case.

No, I was thinking of the case where other sessions are chewing up XIDs
while the lots-of-subtransactions transaction runs.  If it's slow
enough, there could be very large gaps between the XIDs it acquires for
its subtransactions.  So you'd have a situation where the exact same
transaction processing might or might not run out of memory depending
on what else happened meanwhile.  Not a very pleasant property.

            regards, tom lane

Pavan Deolasee wrote:
> Wait. Subtransaction ids are local to a transaction and always start from 1.
> See this:
>
>     /*
>      * reinitialize within-transaction counters
>      */
>     s->subTransactionId = TopSubTransactionId;
>     currentSubTransactionId = TopSubTransactionId;

No, we're not talking about SubTransactionIds. We're talking about real
XIDs assigned to subtransactions. Subtransactions are assigned regular
XIDs as well, just like top-level transactions.

I can see where you were coming from with you suggestion now :-).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

On Wed, Mar 12, 2008 at 11:03 PM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
>  Subtransactions are assigned regular
>  XIDs as well, just like top-level transactions.
>

Ah, got it now. I never noticed this before.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

On Wed, Mar 12, 2008 at 10:44 PM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:

>
>  Imagine that you start a transaction just before transaction
>  wrap-around, so that the top level XID is 2^31-10. Then you start 20
>  subtransactions. What XIDs will they get? Now how would you map those to
>  a bitmap?
>

Wait. Subtransaction ids are local to a transaction and always start from 1.
See this:

    /*
     * reinitialize within-transaction counters
     */
    s->subTransactionId = TopSubTransactionId;
    currentSubTransactionId = TopSubTransactionId;


>
>  It's not that common to have hundreds of thousands of subtransactions to
>  begin with..

True. But thats the case we are trying to solve here :-)


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Yep, patch attached. I also changed xactGetCommittedChildren to return
> the original array instead of copying it, as Alvaro suggested.

Applied with minor corrections (mostly comment fixes, but there were
a couple of real mistakes).

            regards, tom lane

This has been applied by Tom.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Tom Lane wrote:
> > "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> >> Elsewhere in our codebase where we use arrays that are enlarged as
> >> needed, we keep track of the "allocated" size and the "used" size of the
> >> array separately, and only call repalloc when the array fills up, and
> >> repalloc a larger than necessary array when it does. I chose to just
> >> call repalloc every time instead, as repalloc is smart enough to fall
> >> out quickly if the chunk the allocation was made in is already larger
> >> than the new size. There might be some gain avoiding the repeated
> >> repalloc calls, but I doubt it's worth the code complexity, and calling
> >> repalloc with a larger than necessary size can actually force it to
> >> unnecessarily allocate a new, larger chunk instead of reusing the old
> >> one. Thoughts on that?
> >
> > Seems like a pretty bad idea to me, as the behavior you're counting on
> > only applies to chunks up to 8K or thereabouts.
>
> Oh, you're right. Though I'm sure libc realloc has all kinds of smarts
> as well, it does seem better to not rely too much on that.
>
> > In a situation where
> > you are subcommitting lots of XIDs one at a time, this is likely to have
> > quite awful behavior (or at least, you're at the mercy of the local
> > malloc library as to how bad it is).  I'd go with the same
> > double-it-each-time-needed approach we use elsewhere.
>
> Yep, patch attached. I also changed xactGetCommittedChildren to return
> the original array instead of copying it, as Alvaro suggested.
>
> --
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com


>
> --
> Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-patches

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +