Another try at reducing repeated detoast work for PostGIS

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Another try at reducing repeated detoast work for PostGIS
Дата
Msg-id 4201.1250530640@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Another try at reducing repeated detoast work for PostGIS  (Jeff Davis <pgsql@j-davis.com>)
Re: Another try at reducing repeated detoast work for PostGIS  (Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk>)
Список pgsql-hackers
There was recently another go-round on the postgis-devel list about
the same problem Mark Cave-Ayland complained about last year:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php
Basically, what is happening is a nestloop join where the inner
indexscan gets a comparison argument from the outer table, and that
argument is toasted.  Every single call of an index support function
detoasts the argument again :-(, leading to upwards of 90% of the
runtime being spent in detoasting.

I made a proposal to fix it
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00709.php
but that failed due to excessive ambition --- the cost/benefit/risk
tradeoffs just weren't good enough.

Thinking about it again, it seems to me that a much narrower patch
could solve the specific forms of the problem that the PostGIS folk
are seeing.  Instead of trying to have a general-purpose method of
preventing repeat de-toasting, we could just prevent it for inner
indexscans by having ExecIndexEvalRuntimeKeys() detoast anything it's
passing to the index AM.  The attached patch accomplishes this with
a net addition of about three statements.  (It looks bigger, because
I had to move a hunk of code to have the datatype info available when
needed.)  Paul Ramsey reports that this fixes the problem for him:
http://postgis.refractions.net/pipermail/postgis-devel/2009-August/006659.html

The only downside I can see offhand is that it will detoast short-header
values that might not actually need to be detoasted.  But we don't have
any very good way to know whether a datatype's index support functions
use PG_DETOAST_DATUM or PG_DETOAST_DATUM_PACKED.  In the former case we
do need to detoast short-header values.  The extra overhead in this case
amounts to only one palloc and a fairly short memcpy, which should be
pretty negligible in comparison to the other setup costs of an
indexscan, so I'm not very concerned about it.

Comments?  Better ideas?

            regards, tom lane

Index: src/backend/executor/nodeIndexscan.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeIndexscan.c,v
retrieving revision 1.133
diff -c -r1.133 nodeIndexscan.c
*** src/backend/executor/nodeIndexscan.c    18 Jul 2009 19:15:41 -0000    1.133
--- src/backend/executor/nodeIndexscan.c    15 Aug 2009 04:28:55 -0000
***************
*** 237,244 ****
--- 237,247 ----
  ExecIndexEvalRuntimeKeys(ExprContext *econtext,
                           IndexRuntimeKeyInfo *runtimeKeys, int numRuntimeKeys)
  {
+     MemoryContext oldContext;
      int            j;

+     oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
      for (j = 0; j < numRuntimeKeys; j++)
      {
          ScanKey        scan_key = runtimeKeys[j].scan_key;
***************
*** 256,273 ****
           * econtext->ecxt_per_tuple_memory.  We assume that the outer tuple
           * will stay put throughout our scan.  If this is wrong, we could copy
           * the result into our context explicitly, but I think that's not
!          * necessary...
           */
!         scanvalue = ExecEvalExprSwitchContext(key_expr,
!                                               econtext,
!                                               &isNull,
!                                               NULL);
!         scan_key->sk_argument = scanvalue;
          if (isNull)
              scan_key->sk_flags |= SK_ISNULL;
          else
              scan_key->sk_flags &= ~SK_ISNULL;
      }
  }

  /*
--- 259,290 ----
           * econtext->ecxt_per_tuple_memory.  We assume that the outer tuple
           * will stay put throughout our scan.  If this is wrong, we could copy
           * the result into our context explicitly, but I think that's not
!          * necessary.
!          *
!          * It's also entirely possible that the result of the eval is a
!          * toasted value.  In this case we should forcibly detoast it,
!          * to avoid repeat detoastings each time the value is examined
!          * by an index support function.
           */
!         scanvalue = ExecEvalExpr(key_expr,
!                                  econtext,
!                                  &isNull,
!                                  NULL);
          if (isNull)
+         {
+             scan_key->sk_argument = scanvalue;
              scan_key->sk_flags |= SK_ISNULL;
+         }
          else
+         {
+             if (runtimeKeys[j].key_toastable)
+                 scanvalue = PointerGetDatum(PG_DETOAST_DATUM(scanvalue));
+             scan_key->sk_argument = scanvalue;
              scan_key->sk_flags &= ~SK_ISNULL;
+         }
      }
+
+     MemoryContextSwitchTo(oldContext);
  }

  /*
***************
*** 795,800 ****
--- 812,819 ----
                  runtime_keys[n_runtime_keys].scan_key = this_scan_key;
                  runtime_keys[n_runtime_keys].key_expr =
                      ExecInitExpr(rightop, planstate);
+                 runtime_keys[n_runtime_keys].key_toastable =
+                     TypeIsToastable(op_righttype);
                  n_runtime_keys++;
                  scanvalue = (Datum) 0;
              }
***************
*** 844,849 ****
--- 863,893 ----
                  varattno = ((Var *) leftop)->varattno;

                  /*
+                  * We have to look up the operator's associated btree support
+                  * function
+                  */
+                 opno = lfirst_oid(opnos_cell);
+                 opnos_cell = lnext(opnos_cell);
+
+                 if (index->rd_rel->relam != BTREE_AM_OID ||
+                     varattno < 1 || varattno > index->rd_index->indnatts)
+                     elog(ERROR, "bogus RowCompare index qualification");
+                 opfamily = index->rd_opfamily[varattno - 1];
+
+                 get_op_opfamily_properties(opno, opfamily,
+                                            &op_strategy,
+                                            &op_lefttype,
+                                            &op_righttype);
+
+                 if (op_strategy != rc->rctype)
+                     elog(ERROR, "RowCompare index qualification contains wrong operator");
+
+                 opfuncid = get_opfamily_proc(opfamily,
+                                              op_lefttype,
+                                              op_righttype,
+                                              BTORDER_PROC);
+
+                 /*
                   * rightop is the constant or variable comparison value
                   */
                  rightop = (Expr *) lfirst(rargs_cell);
***************
*** 867,902 ****
                      runtime_keys[n_runtime_keys].scan_key = this_sub_key;
                      runtime_keys[n_runtime_keys].key_expr =
                          ExecInitExpr(rightop, planstate);
                      n_runtime_keys++;
                      scanvalue = (Datum) 0;
                  }

                  /*
-                  * We have to look up the operator's associated btree support
-                  * function
-                  */
-                 opno = lfirst_oid(opnos_cell);
-                 opnos_cell = lnext(opnos_cell);
-
-                 if (index->rd_rel->relam != BTREE_AM_OID ||
-                     varattno < 1 || varattno > index->rd_index->indnatts)
-                     elog(ERROR, "bogus RowCompare index qualification");
-                 opfamily = index->rd_opfamily[varattno - 1];
-
-                 get_op_opfamily_properties(opno, opfamily,
-                                            &op_strategy,
-                                            &op_lefttype,
-                                            &op_righttype);
-
-                 if (op_strategy != rc->rctype)
-                     elog(ERROR, "RowCompare index qualification contains wrong operator");
-
-                 opfuncid = get_opfamily_proc(opfamily,
-                                              op_lefttype,
-                                              op_righttype,
-                                              BTORDER_PROC);
-
-                 /*
                   * initialize the subsidiary scan key's fields appropriately
                   */
                  ScanKeyEntryInitialize(this_sub_key,
--- 911,923 ----
                      runtime_keys[n_runtime_keys].scan_key = this_sub_key;
                      runtime_keys[n_runtime_keys].key_expr =
                          ExecInitExpr(rightop, planstate);
+                     runtime_keys[n_runtime_keys].key_toastable =
+                         TypeIsToastable(op_righttype);
                      n_runtime_keys++;
                      scanvalue = (Datum) 0;
                  }

                  /*
                   * initialize the subsidiary scan key's fields appropriately
                   */
                  ScanKeyEntryInitialize(this_sub_key,
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.206
diff -c -r1.206 execnodes.h
*** src/include/nodes/execnodes.h    6 Aug 2009 20:44:31 -0000    1.206
--- src/include/nodes/execnodes.h    15 Aug 2009 04:28:55 -0000
***************
*** 1084,1089 ****
--- 1084,1090 ----
  {
      ScanKey        scan_key;        /* scankey to put value into */
      ExprState  *key_expr;        /* expr to evaluate to get value */
+     bool        key_toastable;    /* is expr's result a toastable datatype? */
  } IndexRuntimeKeyInfo;

  typedef struct

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: CommitFest 2009-07: Remaining Patches Moved To CommitFest 2009-09
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Alpha 1 release notes