Re: Why percent_rank is so slower than rank?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Why percent_rank is so slower than rank?
Дата
Msg-id 7839.1291933137@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Why percent_rank is so slower than rank?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Why percent_rank is so slower than rank?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: Why percent_rank is so slower than rank?  (Kenneth Marshall <ktm@rice.edu>)
Re: Why percent_rank is so slower than rank?  (Hitoshi Harada <umi.tanuki@gmail.com>)
Список pgsql-hackers
I wrote:
> We're throwing away one tuple at a time as we advance forward through
> the tuplestore, and moving 100000+ tuple pointers each time.  Ugh.
> This code was all right when written, because (IIRC) the mergejoin
> case was actually the only caller.  But it's not all right for
> WindowAgg's less-predictable usage patterns.

> I thought for a bit about changing things around so that the first-used
> tuple slot isn't necessarily state->memtuples[0], but just like the
> comment says, that complicates a lot of other logic.  And there isn't
> any easy place to reclaim the wasted slots later.

> What seems like the best bet is to put in a heuristic to make
> tuplestore_trim simply not do anything until nremove reaches some
> reasonably large amount, perhaps 10% of the number of stored tuples.
> This wastes up to 10% of the alloted memory, but that seems tolerable.

On reflection I think just not doing anything isn't a very good idea.
The problem with that is that a mis-coded caller could try to fetch
tuples that it had already told the tuplestore could be trimmed away;
and this would work, most of the time, until you got unlucky and the
trim operation had actually deleted them.  I think it's pretty important
for bug-catching purposes that the tuplestore enforce that those tuples
are not available anymore.

Hence the attached patch, which combines the two ideas by recycling
tuples immediately but not sliding the pointer array until a reasonable
amount of movement has occurred.  This fixes the complained-of
performance problem AFAICT.

I'm not sure whether or not to back-patch this into 9.0 and 8.4.  The
code in tuplestore.c hasn't changed at all since 8.4, so there's not
much risk of cross-version bugs, but if I did miss anything we could
be shipping a buggy version next week.  Thoughts?

            regards, tom lane

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index 9bbaba43771f495fdf24e9f2afd545b69a22ecbd..8c8139c897679892e0d4ad13e69ae8d814484206 100644
*** a/src/backend/utils/sort/tuplestore.c
--- b/src/backend/utils/sort/tuplestore.c
*************** struct Tuplestorestate
*** 145,152 ****
--- 145,159 ----
      /*
       * This array holds pointers to tuples in memory if we are in state INMEM.
       * In states WRITEFILE and READFILE it's not used.
+      *
+      * When memtupdeleted > 0, the first memtupdeleted pointers are already
+      * released due to a tuplestore_trim() operation, but we haven't expended
+      * the effort to slide the remaining pointers down.  These unused pointers
+      * are set to NULL to catch any invalid accesses.  Note that memtupcount
+      * includes the deleted pointers.
       */
      void      **memtuples;        /* array of pointers to palloc'd tuples */
+     int            memtupdeleted;    /* the first N slots are currently unused */
      int            memtupcount;    /* number of tuples currently present */
      int            memtupsize;        /* allocated length of memtuples array */

*************** tuplestore_begin_common(int eflags, bool
*** 252,257 ****
--- 259,265 ----
      state->context = CurrentMemoryContext;
      state->resowner = CurrentResourceOwner;

+     state->memtupdeleted = 0;
      state->memtupcount = 0;
      state->memtupsize = 1024;    /* initial guess */
      state->memtuples = (void **) palloc(state->memtupsize * sizeof(void *));
*************** tuplestore_clear(Tuplestorestate *state)
*** 401,407 ****
      state->myfile = NULL;
      if (state->memtuples)
      {
!         for (i = 0; i < state->memtupcount; i++)
          {
              FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
              pfree(state->memtuples[i]);
--- 409,415 ----
      state->myfile = NULL;
      if (state->memtuples)
      {
!         for (i = state->memtupdeleted; i < state->memtupcount; i++)
          {
              FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
              pfree(state->memtuples[i]);
*************** tuplestore_clear(Tuplestorestate *state)
*** 409,414 ****
--- 417,423 ----
      }
      state->status = TSS_INMEM;
      state->truncated = false;
+     state->memtupdeleted = 0;
      state->memtupcount = 0;
      readptr = state->readptrs;
      for (i = 0; i < state->readptrcount; readptr++, i++)
*************** tuplestore_end(Tuplestorestate *state)
*** 432,438 ****
          BufFileClose(state->myfile);
      if (state->memtuples)
      {
!         for (i = 0; i < state->memtupcount; i++)
              pfree(state->memtuples[i]);
          pfree(state->memtuples);
      }
--- 441,447 ----
          BufFileClose(state->myfile);
      if (state->memtuples)
      {
!         for (i = state->memtupdeleted; i < state->memtupcount; i++)
              pfree(state->memtuples[i]);
          pfree(state->memtuples);
      }
*************** tuplestore_gettuple(Tuplestorestate *sta
*** 774,787 ****
                  }
                  else
                  {
!                     if (readptr->current <= 0)
                      {
                          Assert(!state->truncated);
                          return NULL;
                      }
                      readptr->current--; /* last returned tuple */
                  }
!                 if (readptr->current <= 0)
                  {
                      Assert(!state->truncated);
                      return NULL;
--- 783,796 ----
                  }
                  else
                  {
!                     if (readptr->current <= state->memtupdeleted)
                      {
                          Assert(!state->truncated);
                          return NULL;
                      }
                      readptr->current--; /* last returned tuple */
                  }
!                 if (readptr->current <= state->memtupdeleted)
                  {
                      Assert(!state->truncated);
                      return NULL;
*************** dumptuples(Tuplestorestate *state)
*** 969,975 ****
  {
      int            i;

!     for (i = 0;; i++)
      {
          TSReadPointer *readptr = state->readptrs;
          int            j;
--- 978,984 ----
  {
      int            i;

!     for (i = state->memtupdeleted;; i++)
      {
          TSReadPointer *readptr = state->readptrs;
          int            j;
*************** dumptuples(Tuplestorestate *state)
*** 984,989 ****
--- 993,999 ----
              break;
          WRITETUP(state, state->memtuples[i]);
      }
+     state->memtupdeleted = 0;
      state->memtupcount = 0;
  }

*************** tuplestore_trim(Tuplestorestate *state)
*** 1153,1176 ****
      nremove = oldest - 1;
      if (nremove <= 0)
          return;                    /* nothing to do */
      Assert(nremove <= state->memtupcount);

      /* Release no-longer-needed tuples */
!     for (i = 0; i < nremove; i++)
      {
          FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
          pfree(state->memtuples[i]);
      }

      /*
!      * Slide the array down and readjust pointers.    This may look pretty
!      * stupid, but we expect that there will usually not be very many
!      * tuple-pointers to move, so this isn't that expensive; and it keeps a
!      * lot of other logic simple.
       *
!      * In fact, in the current usage for merge joins, it's demonstrable that
!      * there will always be exactly one non-removed tuple; so optimize that
!      * case.
       */
      if (nremove + 1 == state->memtupcount)
          state->memtuples[0] = state->memtuples[nremove];
--- 1163,1198 ----
      nremove = oldest - 1;
      if (nremove <= 0)
          return;                    /* nothing to do */
+
+     Assert(nremove >= state->memtupdeleted);
      Assert(nremove <= state->memtupcount);

      /* Release no-longer-needed tuples */
!     for (i = state->memtupdeleted; i < nremove; i++)
      {
          FREEMEM(state, GetMemoryChunkSpace(state->memtuples[i]));
          pfree(state->memtuples[i]);
+         state->memtuples[i] = NULL;
      }
+     state->memtupdeleted = nremove;
+
+     /* mark tuplestore as truncated (used for Assert crosschecks only) */
+     state->truncated = true;

      /*
!      * If nremove is less than 1/8th memtupcount, just stop here, leaving the
!      * "deleted" slots as NULL.  This prevents us from expending O(N^2) time
!      * repeatedly memmove-ing a large pointer array.  The worst case space
!      * wastage is pretty small, since it's just pointers and not whole tuples.
!      */
!     if (nremove < state->memtupcount / 8)
!         return;
!
!     /*
!      * Slide the array down and readjust pointers.
       *
!      * In mergejoin's current usage, it's demonstrable that there will always
!      * be exactly one non-removed tuple; so optimize that case.
       */
      if (nremove + 1 == state->memtupcount)
          state->memtuples[0] = state->memtuples[nremove];
*************** tuplestore_trim(Tuplestorestate *state)
*** 1178,1192 ****
          memmove(state->memtuples, state->memtuples + nremove,
                  (state->memtupcount - nremove) * sizeof(void *));

      state->memtupcount -= nremove;
      for (i = 0; i < state->readptrcount; i++)
      {
          if (!state->readptrs[i].eof_reached)
              state->readptrs[i].current -= nremove;
      }
-
-     /* mark tuplestore as truncated (used for Assert crosschecks only) */
-     state->truncated = true;
  }

  /*
--- 1200,1212 ----
          memmove(state->memtuples, state->memtuples + nremove,
                  (state->memtupcount - nremove) * sizeof(void *));

+     state->memtupdeleted = 0;
      state->memtupcount -= nremove;
      for (i = 0; i < state->readptrcount; i++)
      {
          if (!state->readptrs[i].eof_reached)
              state->readptrs[i].current -= nremove;
      }
  }

  /*

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "David E. Wheeler"
Дата:
Сообщение: Re: Extensions, patch v16
Следующее
От: "Kevin Grittner"
Дата:
Сообщение: Re: Why percent_rank is so slower than rank?