Обсуждение: Composite Datums containing toasted fields are a bad idea(?)

Поиск
Список
Период
Сортировка

Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Way way back in commit ae93e5fd6e8a7e2321e87d23165d9d7660cde598,
we established a coding rule that it was okay for composite Datums
to contain external (out-of-line) toasted fields, as long as such
toasting didn't go more than one level deep in any tuple.  This meant
that heap_form_tuple had to go through nontrivial pushups to maintain
that invariant: each composite field has to be inspected to see if any
of its component fields are external datums.  Surprisingly, no one has
complained about the cost of the lookups that are required to see
whether fields are composite in the first place.

However, in view of today's bug report from Jan Pecek, I'm wondering
if we shouldn't rethink this.  Jan pointed out that the array code was
failing to prevent composites-with-external-fields from getting into
arrays, and after a bit of looking around I'm afraid there are more such
bugs elsewhere.  One example is in the planner's evaluate_expr(), which
supposes that just PG_DETOAST_DATUM() is enough to make a value safe for
long-term storage in a plan tree.  Range types are making the same sort
of assumption as arrays (hm, can you have a range over a composite type?
Probably, considering we have sort operators for composites).  And there
are places in the index AMs that seem to assume likewise, which is an
issue for AMs in which an indexed value could be composite.

I think we might be better off to get rid of toast_flatten_tuple_attribute
and instead insist that composite Datums never contain any external toast
pointers in the first place.  That is, places that call heap_form_tuple
to make a composite Datum (rather than a tuple that's due to be stored
to disk) would be responsible for detoasting any fields with external
values first.  We could make a wrapper routine for heap_form_tuple to
take care of this rather than duplicating the code in lots of places.

From a performance standpoint this is probably a small win.  In cases
where a composite Datum is formed and ultimately saved to disk, it should
be a win, since we'd have to detoast those fields anyway, and we can avoid
the overhead of an extra disassembly and reassembly of the composite
value.  If the composite value is never sent to disk, it's a bit harder
to tell: we lose if the specific field value is never extracted from the
composite, but on the other hand we win if it's extracted more than once.
In any case, adding the extra syscache lookups involved in doing
toast_flatten_tuple_attribute in lots more places isn't appealing.

From a code correctness standpoint, the question really is whether we can
find all the places that build composite datums more easily than we can
find all the places that ought to be calling toast_flatten_tuple_attribute
and aren't.  I have to admit I'm not sure about that.  There seem to be
basically two places to fix in the main executor (ExecEvalRow and
ExecEvalFieldStore), and roughly half a dozen calls of heap_form_tuple in
the various PLs, but I'm not sure I've not missed some cases.

One thing we could do to try to flush out missed cases is to remove
heap_form_tuple's setting of the tuple-Datum header fields, pushing
that functionality into the new wrapper routine.  Then, any un-updated
code would generate clearly invalid composite datums, rather than only
failing in infrequent corner cases.

Another issue is what about third-party code.  There seems to be risk
either way, but it would accrue to completely different code depending
on which way we try to fix this.

Thoughts?
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Merlin Moncure
Дата:
On Fri, Mar 28, 2014 at 3:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Way way back in commit ae93e5fd6e8a7e2321e87d23165d9d7660cde598,
> we established a coding rule that it was okay for composite Datums
> to contain external (out-of-line) toasted fields, as long as such
> toasting didn't go more than one level deep in any tuple.  This meant
> that heap_form_tuple had to go through nontrivial pushups to maintain
> that invariant: each composite field has to be inspected to see if any
> of its component fields are external datums.  Surprisingly, no one has
> complained about the cost of the lookups that are required to see
> whether fields are composite in the first place.
>
> However, in view of today's bug report from Jan Pecek, I'm wondering
> if we shouldn't rethink this.  Jan pointed out that the array code was
> failing to prevent composites-with-external-fields from getting into
> arrays, and after a bit of looking around I'm afraid there are more such
> bugs elsewhere.  One example is in the planner's evaluate_expr(), which
> supposes that just PG_DETOAST_DATUM() is enough to make a value safe for
> long-term storage in a plan tree.  Range types are making the same sort
> of assumption as arrays (hm, can you have a range over a composite type?
> Probably, considering we have sort operators for composites).  And there
> are places in the index AMs that seem to assume likewise, which is an
> issue for AMs in which an indexed value could be composite.
>
> I think we might be better off to get rid of toast_flatten_tuple_attribute
> and instead insist that composite Datums never contain any external toast
> pointers in the first place.  That is, places that call heap_form_tuple
> to make a composite Datum (rather than a tuple that's due to be stored
> to disk) would be responsible for detoasting any fields with external
> values first.  We could make a wrapper routine for heap_form_tuple to
> take care of this rather than duplicating the code in lots of places.
>
> From a performance standpoint this is probably a small win.  In cases
> where a composite Datum is formed and ultimately saved to disk, it should
> be a win, since we'd have to detoast those fields anyway, and we can avoid
> the overhead of an extra disassembly and reassembly of the composite
> value.  If the composite value is never sent to disk, it's a bit harder
> to tell: we lose if the specific field value is never extracted from the
> composite, but on the other hand we win if it's extracted more than once.
> In any case, adding the extra syscache lookups involved in doing
> toast_flatten_tuple_attribute in lots more places isn't appealing.
>
> From a code correctness standpoint, the question really is whether we can
> find all the places that build composite datums more easily than we can
> find all the places that ought to be calling toast_flatten_tuple_attribute
> and aren't.  I have to admit I'm not sure about that.  There seem to be
> basically two places to fix in the main executor (ExecEvalRow and
> ExecEvalFieldStore), and roughly half a dozen calls of heap_form_tuple in
> the various PLs, but I'm not sure I've not missed some cases.
>
> One thing we could do to try to flush out missed cases is to remove
> heap_form_tuple's setting of the tuple-Datum header fields, pushing
> that functionality into the new wrapper routine.  Then, any un-updated
> code would generate clearly invalid composite datums, rather than only
> failing in infrequent corner cases.
>
> Another issue is what about third-party code.  There seems to be risk
> either way, but it would accrue to completely different code depending
> on which way we try to fix this.
>
> Thoughts?

Trying to follow along here.  Am I correct in saying that if you make
these changes any SQL level functionality (say, creating a composite
type containing a large array) that used to work will continue to
work?

merlin



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Merlin Moncure <mmoncure@gmail.com> writes:
> Trying to follow along here.  Am I correct in saying that if you make
> these changes any SQL level functionality (say, creating a composite
> type containing a large array) that used to work will continue to
> work?

This wouldn't change any SQL-level functionality ... as long as we don't
introduce new bugs :-(
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Noah Misch
Дата:
Interesting bug.

On Fri, Mar 28, 2014 at 04:34:41PM -0400, Tom Lane wrote:
> I think we might be better off to get rid of toast_flatten_tuple_attribute
> and instead insist that composite Datums never contain any external toast
> pointers in the first place.  That is, places that call heap_form_tuple
> to make a composite Datum (rather than a tuple that's due to be stored
> to disk) would be responsible for detoasting any fields with external
> values first.  We could make a wrapper routine for heap_form_tuple to
> take care of this rather than duplicating the code in lots of places.
> 
> >From a performance standpoint this is probably a small win.  In cases
> where a composite Datum is formed and ultimately saved to disk, it should
> be a win, since we'd have to detoast those fields anyway, and we can avoid
> the overhead of an extra disassembly and reassembly of the composite
> value.  If the composite value is never sent to disk, it's a bit harder
> to tell: we lose if the specific field value is never extracted from the
> composite, but on the other hand we win if it's extracted more than once.

Performance is the dominant issue here; the hacker-centric considerations you
outlined vanish next to it.  Adding a speculative detoast can regress by a
million-fold the performance of a query that passes around, without ever
dereferencing, toast pointers to max-size values.  Passing around a record
without consulting all fields is a credible thing to do, so I'd scarcely
consider taking the performance risk of more-aggressively detoasting every
composite.  That other cases win is little comfort.  Today's PostgreSQL
applications either suffer little enough to care or have already taken
measures to avoid duplicate detoasts.  Past discussions have examined general
ways to avoid repetitive detoast traffic; we'll have something good when it
can do that without imposing open-ended penalties on another usage pattern.

Making the array construction functions use toast_flatten_tuple_attribute()
carries a related performance risk, more elusive yet just as costly when
encountered.  That much risk seems tolerable for 9.4, though.  I won't worry
about performance regressions for range types; using a range to marshal huge
values you'll never detoast is a stretch.

> In any case, adding the extra syscache lookups involved in doing
> toast_flatten_tuple_attribute in lots more places isn't appealing.

True; alas.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> Interesting bug.
> On Fri, Mar 28, 2014 at 04:34:41PM -0400, Tom Lane wrote:
>> I think we might be better off to get rid of toast_flatten_tuple_attribute
>> and instead insist that composite Datums never contain any external toast
>> pointers in the first place.

> Performance is the dominant issue here; the hacker-centric considerations you
> outlined vanish next to it.

I'm not exactly convinced by that line of argument, especially when it's
entirely unsupported by any evidence as to which implementation approach
will actually perform worse.

However, I have done some investigation into what it'd take to keep the
current approach and teach the array code to detoast composite-type array
elements properly.  The attached draft patch only fixes arrays; ranges
need work too, and I'm not sure what else might need adjustment.  But this
patch does fix the test case provided by Jan Pecek.

The main problem with this patch, as I see it, is that it'll introduce
extra syscache lookup overhead even when there are no toasted fields
anywhere.  I've not really tried to quantify how much, since that would
require first agreeing on a benchmark case --- anybody have a thought
about what that ought to be?  But in at least some of these functions,
we'll be paying a cache lookup or two per array element, which doesn't
sound promising.

This could be alleviated by caching the lookup results over longer
intervals, but I don't see any way to do that without invasive changes
to the APIs of a number of exported array functions, which doesn't seem
like a good thing for a bug fix that we need to back-patch.  I think the
back-branch fix would have to be pretty much what you see here, even if
we then make changes to reduce the costs in HEAD.

Comments?

            regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..48b09b8 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*************** heap_form_tuple(TupleDesc tupleDescripto
*** 649,674 ****
       * Check for nulls and embedded tuples; expand any toasted attributes in
       * embedded tuples.  This preserves the invariant that toasting can only
       * go one level deep.
-      *
-      * We can skip calling toast_flatten_tuple_attribute() if the attribute
-      * couldn't possibly be of composite type.  All composite datums are
-      * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
-      * if an attribute is already toasted, it must have been sent to disk
-      * already and so cannot contain toasted attributes.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
              hasnull = true;
!         else if (att[i]->attlen == -1 &&
!                  att[i]->attalign == 'd' &&
!                  att[i]->attndims == 0 &&
!                  !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
!         {
!             values[i] = toast_flatten_tuple_attribute(values[i],
!                                                       att[i]->atttypid,
!                                                       att[i]->atttypmod);
!         }
      }

      /*
--- 649,661 ----
       * Check for nulls and embedded tuples; expand any toasted attributes in
       * embedded tuples.  This preserves the invariant that toasting can only
       * go one level deep.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
              hasnull = true;
!         else
!             values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
      }

      /*
*************** heap_form_minimal_tuple(TupleDesc tupleD
*** 1403,1428 ****
       * Check for nulls and embedded tuples; expand any toasted attributes in
       * embedded tuples.  This preserves the invariant that toasting can only
       * go one level deep.
-      *
-      * We can skip calling toast_flatten_tuple_attribute() if the attribute
-      * couldn't possibly be of composite type.  All composite datums are
-      * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
-      * if an attribute is already toasted, it must have been sent to disk
-      * already and so cannot contain toasted attributes.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
              hasnull = true;
!         else if (att[i]->attlen == -1 &&
!                  att[i]->attalign == 'd' &&
!                  att[i]->attndims == 0 &&
!                  !VARATT_IS_EXTENDED(values[i]))
!         {
!             values[i] = toast_flatten_tuple_attribute(values[i],
!                                                       att[i]->atttypid,
!                                                       att[i]->atttypmod);
!         }
      }

      /*
--- 1390,1402 ----
       * Check for nulls and embedded tuples; expand any toasted attributes in
       * embedded tuples.  This preserves the invariant that toasting can only
       * go one level deep.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
              hasnull = true;
!         else
!             values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
      }

      /*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 9a821d3..2d1a922 100644
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
*************** toast_insert_or_update(Relation rel, Hea
*** 991,996 ****
--- 991,999 ----
   *
   *    "Flatten" a tuple to contain no out-of-line toasted fields.
   *    (This does not eliminate compressed or short-header datums.)
+  *
+  *    Note: we expect the caller already checked HeapTupleHasExternal(tup),
+  *    so there is no need for a short-circuit path.
   * ----------
   */
  HeapTuple
*************** toast_flatten_tuple(HeapTuple tup, Tuple
*** 1068,1097 ****


  /* ----------
!  * toast_flatten_tuple_attribute -
   *
!  *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
!  *    This must be invoked on any potentially-composite field that is to be
!  *    inserted into a tuple.    Doing this preserves the invariant that toasting
!  *    goes only one level deep in a tuple.
   *
!  *    Note that flattening does not mean expansion of short-header varlenas,
!  *    so in one sense toasting is allowed within composite datums.
   * ----------
   */
  Datum
! toast_flatten_tuple_attribute(Datum value,
!                               Oid typeId, int32 typeMod)
  {
-     TupleDesc    tupleDesc;
      HeapTupleHeader olddata;
      HeapTupleHeader new_data;
      int32        new_header_len;
      int32        new_data_len;
      int32        new_tuple_len;
      HeapTupleData tmptup;
!     Form_pg_attribute *att;
!     int            numAttrs;
      int            i;
      bool        need_change = false;
      bool        has_nulls = false;
--- 1071,1110 ----


  /* ----------
!  * toast_flatten_tuple_datum -
   *
!  *    "Flatten" a composite Datum to contain no toasted fields.
!  *    This must be invoked on any composite value that is to be inserted into
!  *    a tuple, array, range, etc.  Doing this preserves the invariant that
!  *    toasting goes only one level deep in a tuple.
   *
!  *    Unlike toast_flatten_tuple, this does expand compressed fields.  That's
!  *    not necessary for correctness, but is a policy decision based on the
!  *    assumption that compression will be more effective if applied to the
!  *    whole tuple not individual fields.
!  *
!  *    On the other hand, in-line short-header varlena fields are left alone.
!  *    If we "untoasted" them here, they'd just get changed back to short-header
!  *    format anyway within heap_fill_tuple.
!  *
!  *    Note: generally speaking, callers should skip applying this to Datums
!  *    that are toasted overall, even just to the extent of being in short-header
!  *    format.  That implies that the Datum has previously been stored within
!  *    some tuple, array, range, etc, and therefore it should already not contain
!  *    any out-of-line fields.
   * ----------
   */
  Datum
! toast_flatten_tuple_datum(Datum value, TupleDesc tupleDesc)
  {
      HeapTupleHeader olddata;
      HeapTupleHeader new_data;
      int32        new_header_len;
      int32        new_data_len;
      int32        new_tuple_len;
      HeapTupleData tmptup;
!     Form_pg_attribute *att = tupleDesc->attrs;
!     int            numAttrs = tupleDesc->natts;
      int            i;
      bool        need_change = false;
      bool        has_nulls = false;
*************** toast_flatten_tuple_attribute(Datum valu
*** 1100,1120 ****
      bool        toast_free[MaxTupleAttributeNumber];

      /*
-      * See if it's a composite type, and get the tupdesc if so.
-      */
-     tupleDesc = lookup_rowtype_tupdesc_noerror(typeId, typeMod, true);
-     if (tupleDesc == NULL)
-         return value;            /* not a composite type */
-
-     att = tupleDesc->attrs;
-     numAttrs = tupleDesc->natts;
-
-     /*
       * Break down the tuple into fields.
       */
      olddata = DatumGetHeapTupleHeader(value);
!     Assert(typeId == HeapTupleHeaderGetTypeId(olddata));
!     Assert(typeMod == HeapTupleHeaderGetTypMod(olddata));
      /* Build a temporary HeapTuple control structure */
      tmptup.t_len = HeapTupleHeaderGetDatumLength(olddata);
      ItemPointerSetInvalid(&(tmptup.t_self));
--- 1113,1128 ----
      bool        toast_free[MaxTupleAttributeNumber];

      /*
       * Break down the tuple into fields.
+      *
+      * Note: if the supplied datum is toasted overall, DatumGetHeapTupleHeader
+      * will detoast it, and then we'll leak the detoasted copy.  This is not
+      * worth fixing because callers shouldn't call us on toasted datums
+      * anyway.
       */
      olddata = DatumGetHeapTupleHeader(value);
!     Assert(HeapTupleHeaderGetTypeId(olddata) == tupleDesc->tdtypeid);
!     Assert(HeapTupleHeaderGetTypMod(olddata) == tupleDesc->tdtypmod);
      /* Build a temporary HeapTuple control structure */
      tmptup.t_len = HeapTupleHeaderGetDatumLength(olddata);
      ItemPointerSetInvalid(&(tmptup.t_self));
*************** toast_flatten_tuple_attribute(Datum valu
*** 1150,1162 ****
      }

      /*
!      * If nothing to untoast, just return the original tuple.
       */
      if (!need_change)
-     {
-         ReleaseTupleDesc(tupleDesc);
          return value;
-     }

      /*
       * Calculate the new size of the tuple.
--- 1158,1167 ----
      }

      /*
!      * If nothing to untoast, just return the original datum.
       */
      if (!need_change)
          return value;

      /*
       * Calculate the new size of the tuple.
*************** toast_flatten_tuple_attribute(Datum valu
*** 1202,1214 ****
      for (i = 0; i < numAttrs; i++)
          if (toast_free[i])
              pfree(DatumGetPointer(toast_values[i]));
-     ReleaseTupleDesc(tupleDesc);

      return PointerGetDatum(new_data);
  }


  /* ----------
   * toast_compress_datum -
   *
   *    Create a compressed version of a varlena datum
--- 1207,1269 ----
      for (i = 0; i < numAttrs; i++)
          if (toast_free[i])
              pfree(DatumGetPointer(toast_values[i]));

      return PointerGetDatum(new_data);
  }


  /* ----------
+  * toast_flatten_tuple_attribute -
+  *
+  *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
+  *    This is a convenience routine for doing toast_flatten_tuple_datum() on
+  *    tuple fields.
+  * ----------
+  */
+ Datum
+ toast_flatten_tuple_attribute(Datum value, Form_pg_attribute att)
+ {
+     TupleDesc    tupleDesc;
+
+     /*
+      * Exit quickly if the attribute couldn't possibly be of composite type.
+      * All composite datums are varlena and have alignment 'd'; furthermore
+      * they aren't arrays.  This heuristic is sufficient to eliminate all
+      * but a few non-composite types.
+      */
+     if (att->attlen != -1 ||
+         att->attalign != 'd' ||
+         att->attndims != 0)
+         return value;
+
+     /*
+      * Also, if the datum is already toasted, it must have already been stored
+      * within some larger tuple (or array, range, etc) and so it doesn't need
+      * to be flattened again.  (It is not our job here to force it to be
+      * uncompressed, so don't worry about that.)
+      */
+     if (VARATT_IS_EXTENDED(DatumGetPointer(value)))
+         return value;
+
+     /*
+      * See if it's a composite type, and get the tupdesc if so.
+      */
+     tupleDesc = lookup_rowtype_tupdesc_noerror(att->atttypid,
+                                                att->atttypmod,
+                                                true);
+     if (tupleDesc == NULL)
+         return value;            /* not a composite type */
+
+     /* OK, flatten it */
+     value = toast_flatten_tuple_datum(value, tupleDesc);
+
+     ReleaseTupleDesc(tupleDesc);
+
+     return value;
+ }
+
+
+ /* ----------
   * toast_compress_datum -
   *
   *    Create a compressed version of a varlena datum
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 91df184..530031f 100644
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
***************
*** 17,22 ****
--- 17,24 ----
  #include <ctype.h>

  #include "access/htup_details.h"
+ #include "access/tuptoaster.h"
+ #include "catalog/pg_type.h"
  #include "funcapi.h"
  #include "libpq/pqformat.h"
  #include "utils/array.h"
*************** typedef struct ArrayIteratorData
*** 74,79 ****
--- 76,120 ----
      int            current_item;    /* the item # we're at in the array */
  }    ArrayIteratorData;

+ /* Support for array element detoasting */
+ typedef struct ArrayFlattenState
+ {
+     bool        typiscomposite; /* element type is composite? */
+     TupleDesc    tupdesc;        /* element type's tupledesc, if known */
+ } ArrayFlattenState;
+
+ /* Call this on each non-NULL value about to be inserted into an array */
+ #define FLATTEN_ARRAY_ELEMENT(val, typlen, afstate) \
+     do { \
+         if ((typlen) == -1) \
+         { \
+             if ((afstate).typiscomposite) \
+                 val = flatten_composite_element(val, &(afstate)); \
+             else \
+                 val = PointerGetDatum(PG_DETOAST_DATUM(val)); \
+         } \
+     } while (0)
+
+ /* Variant that guarantees to make a copy */
+ #define FLATTEN_ARRAY_ELEMENT_COPY(val, typlen, typbyval, afstate) \
+     do { \
+         if ((typlen) == -1) \
+         { \
+             if ((afstate).typiscomposite) \
+             { \
+                 Datum _orig_val = (val); \
+                 val = flatten_composite_element(val, &(afstate)); \
+                 if (val == _orig_val) \
+                     val = datumCopy(val, typbyval, typlen); \
+             } \
+             else \
+                 val = PointerGetDatum(PG_DETOAST_DATUM_COPY(val)); \
+         } \
+         else \
+             val = datumCopy(val, typbyval, typlen); \
+     } while (0)
+
+ /* Local functions */
  static bool array_isspace(char ch);
  static int    ArrayCount(const char *str, int *dim, char typdelim);
  static void ReadArrayStr(char *arrayStr, const char *origStr,
*************** static void CopyArrayEls(ArrayType *arra
*** 92,97 ****
--- 133,143 ----
               Datum *values, bool *nulls, int nitems,
               int typlen, bool typbyval, char typalign,
               bool freedata);
+ static int    array_cmp(FunctionCallInfo fcinfo);
+ static void init_flatten_support(ArrayFlattenState *afstate, Oid elmtype,
+                      int16 typlen, char typalign);
+ static Datum flatten_composite_element(Datum value, ArrayFlattenState *afstate);
+ static void end_flatten_support(ArrayFlattenState *afstate);
  static bool array_get_isnull(const bits8 *nullbitmap, int offset);
  static void array_set_isnull(bits8 *nullbitmap, int offset, bool isNull);
  static Datum ArrayCast(char *value, bool byval, int len);
*************** static void array_insert_slice(ArrayType
*** 119,125 ****
                     int ndim, int *dim, int *lb,
                     int *st, int *endp,
                     int typlen, bool typbyval, char typalign);
- static int    array_cmp(FunctionCallInfo fcinfo);
  static ArrayType *create_array_envelope(int ndims, int *dimv, int *lbv, int nbytes,
                        Oid elmtype, int dataoffset);
  static ArrayType *array_fill_internal(ArrayType *dims, ArrayType *lbs,
--- 165,170 ----
*************** ReadArrayStr(char *arrayStr,
*** 864,870 ****
              hasnull = true;
          else
          {
!             /* let's just make sure data is not toasted */
              if (typlen == -1)
                  values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              totbytes = att_addlength_datum(totbytes, typlen, values[i]);
--- 909,922 ----
              hasnull = true;
          else
          {
!             /*
!              * Let's just make sure data is not toasted.  It seems quite
!              * unlikely that the input function could have produced a toasted
!              * value at all, and certainly it shouldn't have produced
!              * something containing nested toasted values, so we don't bother
!              * with FLATTEN_ARRAY_ELEMENT().  But a PG_DETOAST_DATUM() call is
!              * reasonably cheap when it's a no-op.
!              */
              if (typlen == -1)
                  values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              totbytes = att_addlength_datum(totbytes, typlen, values[i]);
*************** ReadArrayBinary(StringInfo buf,
*** 1466,1472 ****
              hasnull = true;
          else
          {
!             /* let's just make sure data is not toasted */
              if (typlen == -1)
                  values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              totbytes = att_addlength_datum(totbytes, typlen, values[i]);
--- 1518,1531 ----
              hasnull = true;
          else
          {
!             /*
!              * Let's just make sure data is not toasted.  It seems quite
!              * unlikely that the receive function could have produced a
!              * toasted value at all, and certainly it shouldn't have produced
!              * something containing nested toasted values, so we don't bother
!              * with FLATTEN_ARRAY_ELEMENT().  But a PG_DETOAST_DATUM() call is
!              * reasonably cheap when it's a no-op.
!              */
              if (typlen == -1)
                  values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              totbytes = att_addlength_datum(totbytes, typlen, values[i]);
*************** array_set(ArrayType *array,
*** 2060,2065 ****
--- 2119,2125 ----
            char elmalign)
  {
      ArrayType  *newarray;
+     Oid            elmtype;
      int            i,
                  ndim,
                  dim[MAXDIM],
*************** array_set(ArrayType *array,
*** 2114,2128 ****
                  (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
                   errmsg("wrong number of array subscripts")));

-     /* make sure item to be inserted is not toasted */
-     if (elmlen == -1 && !isNull)
-         dataValue = PointerGetDatum(PG_DETOAST_DATUM(dataValue));
-
      /* detoast input array if necessary */
      array = DatumGetArrayTypeP(PointerGetDatum(array));

      ndim = ARR_NDIM(array);

      /*
       * if number of dims is zero, i.e. an empty array, create an array with
       * nSubscripts dimensions, and set the lower bounds to the supplied
--- 2174,2195 ----
                  (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
                   errmsg("wrong number of array subscripts")));

      /* detoast input array if necessary */
      array = DatumGetArrayTypeP(PointerGetDatum(array));

+     elmtype = ARR_ELEMTYPE(array);
      ndim = ARR_NDIM(array);

+     /* make sure item to be inserted is not toasted */
+     if (elmlen == -1 && !isNull)
+     {
+         ArrayFlattenState afstate;
+
+         init_flatten_support(&afstate, elmtype, elmlen, elmalign);
+         FLATTEN_ARRAY_ELEMENT(dataValue, elmlen, afstate);
+         end_flatten_support(&afstate);
+     }
+
      /*
       * if number of dims is zero, i.e. an empty array, create an array with
       * nSubscripts dimensions, and set the lower bounds to the supplied
*************** array_set(ArrayType *array,
*** 2130,2137 ****
       */
      if (ndim == 0)
      {
-         Oid            elmtype = ARR_ELEMTYPE(array);
-
          for (i = 0; i < nSubscripts; i++)
          {
              dim[i] = 1;
--- 2197,2202 ----
*************** array_map(FunctionCallInfo fcinfo, Oid i
*** 2688,2693 ****
--- 2753,2759 ----
      int            bitmask;
      ArrayMetaState *inp_extra;
      ArrayMetaState *ret_extra;
+     ArrayFlattenState afstate;

      /* Get input array */
      if (fcinfo->nargs < 1)
*************** array_map(FunctionCallInfo fcinfo, Oid i
*** 2741,2746 ****
--- 2807,2814 ----
      typbyval = ret_extra->typbyval;
      typalign = ret_extra->typalign;

+     init_flatten_support(&afstate, retType, typlen, typalign);
+
      /* Allocate temporary arrays for new values */
      values = (Datum *) palloc(nitems * sizeof(Datum));
      nulls = (bool *) palloc(nitems * sizeof(bool));
*************** array_map(FunctionCallInfo fcinfo, Oid i
*** 2800,2807 ****
          else
          {
              /* Ensure data is not toasted */
!             if (typlen == -1)
!                 values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              /* Update total result size */
              nbytes = att_addlength_datum(nbytes, typlen, values[i]);
              nbytes = att_align_nominal(nbytes, typalign);
--- 2868,2874 ----
          else
          {
              /* Ensure data is not toasted */
!             FLATTEN_ARRAY_ELEMENT(values[i], typlen, afstate);
              /* Update total result size */
              nbytes = att_addlength_datum(nbytes, typlen, values[i]);
              nbytes = att_align_nominal(nbytes, typalign);
*************** construct_md_array(Datum *elems,
*** 2921,2926 ****
--- 2988,2994 ----
      int32        dataoffset;
      int            i;
      int            nelems;
+     ArrayFlattenState afstate;

      if (ndims < 0)                /* we do allow zero-dimension arrays */
          ereport(ERROR,
*************** construct_md_array(Datum *elems,
*** 2938,2943 ****
--- 3006,3013 ----

      nelems = ArrayGetNItems(ndims, dims);

+     init_flatten_support(&afstate, elmtype, elmlen, elmalign);
+
      /* compute required space */
      nbytes = 0;
      hasnulls = false;
*************** construct_md_array(Datum *elems,
*** 2949,2956 ****
              continue;
          }
          /* make sure data is not toasted */
!         if (elmlen == -1)
!             elems[i] = PointerGetDatum(PG_DETOAST_DATUM(elems[i]));
          nbytes = att_addlength_datum(nbytes, elmlen, elems[i]);
          nbytes = att_align_nominal(nbytes, elmalign);
          /* check for overflow of total request */
--- 3019,3026 ----
              continue;
          }
          /* make sure data is not toasted */
!         FLATTEN_ARRAY_ELEMENT(elems[i], elmlen, afstate);
!         /* update total result size */
          nbytes = att_addlength_datum(nbytes, elmlen, elems[i]);
          nbytes = att_align_nominal(nbytes, elmalign);
          /* check for overflow of total request */
*************** construct_md_array(Datum *elems,
*** 2961,2966 ****
--- 3031,3038 ----
                              (int) MaxAllocSize)));
      }

+     end_flatten_support(&afstate);
+
      /* Allocate and initialize result array */
      if (hasnulls)
      {
*************** array_free_iterator(ArrayIterator iterat
*** 4070,4075 ****
--- 4142,4276 ----
  /***************************************************************************/

  /*
+  * Array element detoasting ("flattening") support
+  *
+  * It is critical that no external TOAST pointers appear within an array,
+  * either directly as an array element or within a non-scalar element type,
+  * so we must detoast any value that is about to be inserted into an array.
+  *
+  * Since we allow composite-type datums to contain toasted fields (at the
+  * first level; nested toasted values aren't allowed), we have to work extra
+  * hard to get rid of possible toasting when the array element type is
+  * composite.  We assume that other composites such as ranges contain no
+  * external pointers, though.
+  *
+  * In addition to the no-external-pointers requirement, we make a policy
+  * decision that array elements not be compressed.    We expect that compression
+  * would be better applied to the whole array rather than individual elements.
+  *
+  * The current implementation of FLATTEN_ARRAY_ELEMENT() also forces varlena
+  * elements to not be in short-header format, but this is an implementation
+  * artifact that should probably be removed.
+  */
+
+ /*
+  * Prepare for calls to FLATTEN_ARRAY_ELEMENT()
+  *
+  * Currently, we only have typlen/typbyval/typalign information cached for
+  * the array element type.    To minimize unnecessary syscache lookups, this
+  * function uses the knowledge that a composite type must have typlen -1
+  * (varlena) and alignment 'd'.  There are only a few non-composite, non-array
+  * types for which this is true, so though imperfect this heuristic should
+  * work well enough.
+  */
+ static void
+ init_flatten_support(ArrayFlattenState *afstate, Oid elmtype,
+                      int16 typlen, char typalign)
+ {
+     if (typlen == -1 && typalign == 'd')
+     {
+         /* Possibly composite */
+         if (elmtype == RECORDOID)
+         {
+             /* We can't identify the specific record type yet */
+             afstate->typiscomposite = true;
+             afstate->tupdesc = NULL;
+         }
+         else if (type_is_rowtype(elmtype))
+         {
+             /* Named composite type, we can look it up right away */
+             afstate->typiscomposite = true;
+             afstate->tupdesc = lookup_rowtype_tupdesc(elmtype, -1);
+         }
+         else
+         {
+             /* It's not composite */
+             afstate->typiscomposite = false;
+             afstate->tupdesc = NULL;
+         }
+     }
+     else
+     {
+         /* Element type cannot be composite */
+         afstate->typiscomposite = false;
+         afstate->tupdesc = NULL;
+     }
+ }
+
+ /*
+  * Detoast ("flatten") a composite value
+  *
+  * This is the out-of-line portion of the FLATTEN_ARRAY_ELEMENT macro.
+  * We assume we have a non-null, known-composite value to work on.
+  * Note: for some logic paths, it's important that this does not
+  * copy the Datum unnecessarily.
+  */
+ static Datum
+ flatten_composite_element(Datum value, ArrayFlattenState *afstate)
+ {
+     TupleDesc    tupleDesc;
+     HeapTupleHeader tupdata;
+
+     /*
+      * If the value is itself toasted (even just to the extent of having a
+      * short varlena header), then it must have been incorporated into some
+      * larger composite structure, so it should not contain any embedded toast
+      * pointers.  We need only detoast it overall and we're done.
+      */
+     if (VARATT_IS_EXTENDED(DatumGetPointer(value)))
+         return PointerGetDatum(PG_DETOAST_DATUM(value));
+
+     /*
+      * When working with an array of RECORD, it's possible that each array
+      * element is of a different record subtype, so we have to be prepared to
+      * look up the appropriate tupdesc here.  But we cache the tupdesc across
+      * calls as much as we can.
+      */
+     tupleDesc = afstate->tupdesc;
+     tupdata = DatumGetHeapTupleHeader(value);
+     if (HeapTupleHeaderGetTypeId(tupdata) == RECORDOID)
+     {
+         int32        typmod = HeapTupleHeaderGetTypMod(tupdata);
+
+         if (tupleDesc && tupleDesc->tdtypmod != typmod)
+         {
+             ReleaseTupleDesc(tupleDesc);
+             tupleDesc = NULL;
+         }
+         if (tupleDesc == NULL)
+         {
+             tupleDesc = lookup_rowtype_tupdesc(RECORDOID, typmod);
+             afstate->tupdesc = tupleDesc;
+         }
+     }
+
+     /* tuptoaster.c does the actual work */
+     return toast_flatten_tuple_datum(value, tupleDesc);
+ }
+
+ /*
+  * Clean up after calls to FLATTEN_ARRAY_ELEMENT()
+  */
+ static void
+ end_flatten_support(ArrayFlattenState *afstate)
+ {
+     /* We have to release any refcount we may hold on a typcache tupdesc */
+     if (afstate->tupdesc)
+         ReleaseTupleDesc(afstate->tupdesc);
+ }
+
+
+ /*
   * Check whether a specific array element is NULL
   *
   * nullbitmap: pointer to array's null bitmap (NULL if none)
*************** accumArrayResult(ArrayBuildState *astate
*** 4617,4628 ****
       * because construct_md_array can detoast the array elements later.
       * However, we must not let construct_md_array modify the ArrayBuildState
       * because that would mean array_agg_finalfn damages its input, which is
!      * verboten.  Also, this way frequently saves one copying step.)
       */
      if (!disnull && !astate->typbyval)
      {
          if (astate->typlen == -1)
!             dvalue = PointerGetDatum(PG_DETOAST_DATUM_COPY(dvalue));
          else
              dvalue = datumCopy(dvalue, astate->typbyval, astate->typlen);
      }
--- 4818,4837 ----
       * because construct_md_array can detoast the array elements later.
       * However, we must not let construct_md_array modify the ArrayBuildState
       * because that would mean array_agg_finalfn damages its input, which is
!      * verboten.  Also, this way frequently saves one copying step.  It's
!      * annoying however that we might be doing a syscache lookup for each item
!      * when the element type is composite.)
       */
      if (!disnull && !astate->typbyval)
      {
          if (astate->typlen == -1)
!         {
!             ArrayFlattenState afstate;
!
!             init_flatten_support(&afstate, element_type, -1, astate->typalign);
!             FLATTEN_ARRAY_ELEMENT_COPY(dvalue, -1, false, afstate);
!             end_flatten_support(&afstate);
!         }
          else
              dvalue = datumCopy(dvalue, astate->typbyval, astate->typlen);
      }
*************** array_fill_internal(ArrayType *dims, Arr
*** 5037,5043 ****

          /* make sure data is not toasted */
          if (elmlen == -1)
!             value = PointerGetDatum(PG_DETOAST_DATUM(value));

          nbytes = att_addlength_datum(0, elmlen, value);
          nbytes = att_align_nominal(nbytes, elmalign);
--- 5246,5258 ----

          /* make sure data is not toasted */
          if (elmlen == -1)
!         {
!             ArrayFlattenState afstate;
!
!             init_flatten_support(&afstate, elmtype, elmlen, elmalign);
!             FLATTEN_ARRAY_ELEMENT(value, elmlen, afstate);
!             end_flatten_support(&afstate);
!         }

          nbytes = att_addlength_datum(0, elmlen, value);
          nbytes = att_align_nominal(nbytes, elmalign);
*************** array_replace_internal(ArrayType *array,
*** 5277,5286 ****
       */
      if (typlen == -1)
      {
          if (!search_isnull)
!             search = PointerGetDatum(PG_DETOAST_DATUM(search));
          if (!replace_isnull)
!             replace = PointerGetDatum(PG_DETOAST_DATUM(replace));
      }

      /* Prepare to apply the comparison operator */
--- 5492,5505 ----
       */
      if (typlen == -1)
      {
+         ArrayFlattenState afstate;
+
+         init_flatten_support(&afstate, element_type, typlen, typalign);
          if (!search_isnull)
!             FLATTEN_ARRAY_ELEMENT(search, typlen, afstate);
          if (!replace_isnull)
!             FLATTEN_ARRAY_ELEMENT(replace, typlen, afstate);
!         end_flatten_support(&afstate);
      }

      /* Prepare to apply the comparison operator */
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 296d016..5cdefad 100644
*** a/src/include/access/tuptoaster.h
--- b/src/include/access/tuptoaster.h
*************** extern struct varlena *heap_tuple_untoas
*** 184,199 ****
  extern HeapTuple toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc);

  /* ----------
   * toast_flatten_tuple_attribute -
   *
   *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
!  *    This must be invoked on any potentially-composite field that is to be
!  *    inserted into a tuple.    Doing this preserves the invariant that toasting
!  *    goes only one level deep in a tuple.
   * ----------
   */
! extern Datum toast_flatten_tuple_attribute(Datum value,
!                               Oid typeId, int32 typeMod);

  /* ----------
   * toast_compress_datum -
--- 184,208 ----
  extern HeapTuple toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc);

  /* ----------
+  * toast_flatten_tuple_datum -
+  *
+  *    "Flatten" a composite Datum to contain no toasted fields.
+  *    This must be invoked on any composite value that is to be inserted into
+  *    a tuple, array, range, etc.  Doing this preserves the invariant that
+  *    toasting goes only one level deep in a tuple.
+  * ----------
+  */
+ extern Datum toast_flatten_tuple_datum(Datum value, TupleDesc tupleDesc);
+
+ /* ----------
   * toast_flatten_tuple_attribute -
   *
   *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
!  *    This is a convenience routine for doing toast_flatten_tuple_datum() on
!  *    tuple fields.
   * ----------
   */
! extern Datum toast_flatten_tuple_attribute(Datum value, Form_pg_attribute att);

  /* ----------
   * toast_compress_datum -

Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
I wrote:
> The main problem with this patch, as I see it, is that it'll introduce
> extra syscache lookup overhead even when there are no toasted fields
> anywhere.  I've not really tried to quantify how much, since that would
> require first agreeing on a benchmark case --- anybody have a thought
> about what that ought to be?  But in at least some of these functions,
> we'll be paying a cache lookup or two per array element, which doesn't
> sound promising.

I created a couple of test cases that I think are close to worst-case for
accumArrayResult() and array_set(), which are the two functions that seem
most worth worrying about.  The accumArrayResult test case is

create table manycomplex as
select row(random(),random())::complex as c
from generate_series(1,1000000);
vacuum analyze manycomplex;

then time:

select array_dims(array_agg(c)) from manycomplex;

On HEAD, this takes about 295-300 msec on my machine in a non-cassert
build.  With the patch I sent previously, the time goes to 495-500 msec.
Ouch.

The other test case is

create or replace function timearrayset(n int) returns void language plpgsql as
$$
declare a complex[];
begin
a := array[row(1,2), row(3,4), row(5,6)];
for i in 1..$1 loop
   a[2] := row(5,6)::complex;
end loop;
end;
$$;

select timearrayset(1000000);

This goes from circa 1250 ms in HEAD to around 1680 ms.

Some poking around with oprofile says that much of the added time is
coming from added syscache and typcache lookups, as I suspected.

I did some further work on the patch to make it possible to cache
the element tuple descriptor across calls for these two functions.
With the attached patch, the first test case comes down to about 335 ms
and the second to about 1475 ms (plpgsql is still leaving some extra
cache lookups on the table).  More could be done with some further API
changes, but I think this is about as much as is safe to back-patch.

At least in the accumArrayResult test case, it would be possible to
bring the runtime back to very close to HEAD if we were willing to
abandon the principle that compressed fields within a tuple should
always get decompressed when the tuple goes into a larger structure.
That would allow flatten_composite_element to quit early if the
tuple doesn't have the HEAP_HASEXTERNAL infomask bit set.  I'm not
sure that this would be a good tradeoff however: if we fail to remove nested
compression, we could end up paying for that with double compression.
On the other hand, it's unclear that that case would come up often,
while the overhead of disassembling the tuple is definitely going to
happen every time we assign to an array-of-composites element.  (Also,
there is no question here of regression relative to previous releases,
since the existing code fails to prevent nested compression.)

Comments?

            regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..48b09b8 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*************** heap_form_tuple(TupleDesc tupleDescripto
*** 649,674 ****
       * Check for nulls and embedded tuples; expand any toasted attributes in
       * embedded tuples.  This preserves the invariant that toasting can only
       * go one level deep.
-      *
-      * We can skip calling toast_flatten_tuple_attribute() if the attribute
-      * couldn't possibly be of composite type.  All composite datums are
-      * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
-      * if an attribute is already toasted, it must have been sent to disk
-      * already and so cannot contain toasted attributes.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
              hasnull = true;
!         else if (att[i]->attlen == -1 &&
!                  att[i]->attalign == 'd' &&
!                  att[i]->attndims == 0 &&
!                  !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
!         {
!             values[i] = toast_flatten_tuple_attribute(values[i],
!                                                       att[i]->atttypid,
!                                                       att[i]->atttypmod);
!         }
      }

      /*
--- 649,661 ----
       * Check for nulls and embedded tuples; expand any toasted attributes in
       * embedded tuples.  This preserves the invariant that toasting can only
       * go one level deep.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
              hasnull = true;
!         else
!             values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
      }

      /*
*************** heap_form_minimal_tuple(TupleDesc tupleD
*** 1403,1428 ****
       * Check for nulls and embedded tuples; expand any toasted attributes in
       * embedded tuples.  This preserves the invariant that toasting can only
       * go one level deep.
-      *
-      * We can skip calling toast_flatten_tuple_attribute() if the attribute
-      * couldn't possibly be of composite type.  All composite datums are
-      * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
-      * if an attribute is already toasted, it must have been sent to disk
-      * already and so cannot contain toasted attributes.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
              hasnull = true;
!         else if (att[i]->attlen == -1 &&
!                  att[i]->attalign == 'd' &&
!                  att[i]->attndims == 0 &&
!                  !VARATT_IS_EXTENDED(values[i]))
!         {
!             values[i] = toast_flatten_tuple_attribute(values[i],
!                                                       att[i]->atttypid,
!                                                       att[i]->atttypmod);
!         }
      }

      /*
--- 1390,1402 ----
       * Check for nulls and embedded tuples; expand any toasted attributes in
       * embedded tuples.  This preserves the invariant that toasting can only
       * go one level deep.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
              hasnull = true;
!         else
!             values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
      }

      /*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 9a821d3..2d1a922 100644
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
*************** toast_insert_or_update(Relation rel, Hea
*** 991,996 ****
--- 991,999 ----
   *
   *    "Flatten" a tuple to contain no out-of-line toasted fields.
   *    (This does not eliminate compressed or short-header datums.)
+  *
+  *    Note: we expect the caller already checked HeapTupleHasExternal(tup),
+  *    so there is no need for a short-circuit path.
   * ----------
   */
  HeapTuple
*************** toast_flatten_tuple(HeapTuple tup, Tuple
*** 1068,1097 ****


  /* ----------
!  * toast_flatten_tuple_attribute -
   *
!  *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
!  *    This must be invoked on any potentially-composite field that is to be
!  *    inserted into a tuple.    Doing this preserves the invariant that toasting
!  *    goes only one level deep in a tuple.
   *
!  *    Note that flattening does not mean expansion of short-header varlenas,
!  *    so in one sense toasting is allowed within composite datums.
   * ----------
   */
  Datum
! toast_flatten_tuple_attribute(Datum value,
!                               Oid typeId, int32 typeMod)
  {
-     TupleDesc    tupleDesc;
      HeapTupleHeader olddata;
      HeapTupleHeader new_data;
      int32        new_header_len;
      int32        new_data_len;
      int32        new_tuple_len;
      HeapTupleData tmptup;
!     Form_pg_attribute *att;
!     int            numAttrs;
      int            i;
      bool        need_change = false;
      bool        has_nulls = false;
--- 1071,1110 ----


  /* ----------
!  * toast_flatten_tuple_datum -
   *
!  *    "Flatten" a composite Datum to contain no toasted fields.
!  *    This must be invoked on any composite value that is to be inserted into
!  *    a tuple, array, range, etc.  Doing this preserves the invariant that
!  *    toasting goes only one level deep in a tuple.
   *
!  *    Unlike toast_flatten_tuple, this does expand compressed fields.  That's
!  *    not necessary for correctness, but is a policy decision based on the
!  *    assumption that compression will be more effective if applied to the
!  *    whole tuple not individual fields.
!  *
!  *    On the other hand, in-line short-header varlena fields are left alone.
!  *    If we "untoasted" them here, they'd just get changed back to short-header
!  *    format anyway within heap_fill_tuple.
!  *
!  *    Note: generally speaking, callers should skip applying this to Datums
!  *    that are toasted overall, even just to the extent of being in short-header
!  *    format.  That implies that the Datum has previously been stored within
!  *    some tuple, array, range, etc, and therefore it should already not contain
!  *    any out-of-line fields.
   * ----------
   */
  Datum
! toast_flatten_tuple_datum(Datum value, TupleDesc tupleDesc)
  {
      HeapTupleHeader olddata;
      HeapTupleHeader new_data;
      int32        new_header_len;
      int32        new_data_len;
      int32        new_tuple_len;
      HeapTupleData tmptup;
!     Form_pg_attribute *att = tupleDesc->attrs;
!     int            numAttrs = tupleDesc->natts;
      int            i;
      bool        need_change = false;
      bool        has_nulls = false;
*************** toast_flatten_tuple_attribute(Datum valu
*** 1100,1120 ****
      bool        toast_free[MaxTupleAttributeNumber];

      /*
-      * See if it's a composite type, and get the tupdesc if so.
-      */
-     tupleDesc = lookup_rowtype_tupdesc_noerror(typeId, typeMod, true);
-     if (tupleDesc == NULL)
-         return value;            /* not a composite type */
-
-     att = tupleDesc->attrs;
-     numAttrs = tupleDesc->natts;
-
-     /*
       * Break down the tuple into fields.
       */
      olddata = DatumGetHeapTupleHeader(value);
!     Assert(typeId == HeapTupleHeaderGetTypeId(olddata));
!     Assert(typeMod == HeapTupleHeaderGetTypMod(olddata));
      /* Build a temporary HeapTuple control structure */
      tmptup.t_len = HeapTupleHeaderGetDatumLength(olddata);
      ItemPointerSetInvalid(&(tmptup.t_self));
--- 1113,1128 ----
      bool        toast_free[MaxTupleAttributeNumber];

      /*
       * Break down the tuple into fields.
+      *
+      * Note: if the supplied datum is toasted overall, DatumGetHeapTupleHeader
+      * will detoast it, and then we'll leak the detoasted copy.  This is not
+      * worth fixing because callers shouldn't call us on toasted datums
+      * anyway.
       */
      olddata = DatumGetHeapTupleHeader(value);
!     Assert(HeapTupleHeaderGetTypeId(olddata) == tupleDesc->tdtypeid);
!     Assert(HeapTupleHeaderGetTypMod(olddata) == tupleDesc->tdtypmod);
      /* Build a temporary HeapTuple control structure */
      tmptup.t_len = HeapTupleHeaderGetDatumLength(olddata);
      ItemPointerSetInvalid(&(tmptup.t_self));
*************** toast_flatten_tuple_attribute(Datum valu
*** 1150,1162 ****
      }

      /*
!      * If nothing to untoast, just return the original tuple.
       */
      if (!need_change)
-     {
-         ReleaseTupleDesc(tupleDesc);
          return value;
-     }

      /*
       * Calculate the new size of the tuple.
--- 1158,1167 ----
      }

      /*
!      * If nothing to untoast, just return the original datum.
       */
      if (!need_change)
          return value;

      /*
       * Calculate the new size of the tuple.
*************** toast_flatten_tuple_attribute(Datum valu
*** 1202,1214 ****
      for (i = 0; i < numAttrs; i++)
          if (toast_free[i])
              pfree(DatumGetPointer(toast_values[i]));
-     ReleaseTupleDesc(tupleDesc);

      return PointerGetDatum(new_data);
  }


  /* ----------
   * toast_compress_datum -
   *
   *    Create a compressed version of a varlena datum
--- 1207,1269 ----
      for (i = 0; i < numAttrs; i++)
          if (toast_free[i])
              pfree(DatumGetPointer(toast_values[i]));

      return PointerGetDatum(new_data);
  }


  /* ----------
+  * toast_flatten_tuple_attribute -
+  *
+  *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
+  *    This is a convenience routine for doing toast_flatten_tuple_datum() on
+  *    tuple fields.
+  * ----------
+  */
+ Datum
+ toast_flatten_tuple_attribute(Datum value, Form_pg_attribute att)
+ {
+     TupleDesc    tupleDesc;
+
+     /*
+      * Exit quickly if the attribute couldn't possibly be of composite type.
+      * All composite datums are varlena and have alignment 'd'; furthermore
+      * they aren't arrays.  This heuristic is sufficient to eliminate all
+      * but a few non-composite types.
+      */
+     if (att->attlen != -1 ||
+         att->attalign != 'd' ||
+         att->attndims != 0)
+         return value;
+
+     /*
+      * Also, if the datum is already toasted, it must have already been stored
+      * within some larger tuple (or array, range, etc) and so it doesn't need
+      * to be flattened again.  (It is not our job here to force it to be
+      * uncompressed, so don't worry about that.)
+      */
+     if (VARATT_IS_EXTENDED(DatumGetPointer(value)))
+         return value;
+
+     /*
+      * See if it's a composite type, and get the tupdesc if so.
+      */
+     tupleDesc = lookup_rowtype_tupdesc_noerror(att->atttypid,
+                                                att->atttypmod,
+                                                true);
+     if (tupleDesc == NULL)
+         return value;            /* not a composite type */
+
+     /* OK, flatten it */
+     value = toast_flatten_tuple_datum(value, tupleDesc);
+
+     ReleaseTupleDesc(tupleDesc);
+
+     return value;
+ }
+
+
+ /* ----------
   * toast_compress_datum -
   *
   *    Create a compressed version of a varlena datum
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 0eba025..d84c051 100644
*** a/src/backend/executor/execQual.c
--- b/src/backend/executor/execQual.c
*************** ExecEvalArrayRef(ArrayRefExprState *asta
*** 449,462 ****
          }

          if (lIndex == NULL)
!             resultArray = array_set(array_source, i,
!                                     upper.indx,
!                                     sourceData,
!                                     eisnull,
!                                     astate->refattrlength,
!                                     astate->refelemlength,
!                                     astate->refelembyval,
!                                     astate->refelemalign);
          else
              resultArray = array_set_slice(array_source, i,
                                            upper.indx, lower.indx,
--- 449,492 ----
          }

          if (lIndex == NULL)
!         {
!             /*
!              * If element type is composite, look up the tupledesc.  This is
!              * not because array_set_element couldn't do so, but because we
!              * need to install the ShutdownTupleDescRef callback to clean up
!              * the tupledesc reference later, and we don't have another easy
!              * way to remember if that's been done already.
!              */
!             if (astate->refiscomposite && !eisnull)
!             {
!                 HeapTupleHeader tuple;
!                 Oid            tupType;
!                 int32        tupTypmod;
!
!                 tuple = DatumGetHeapTupleHeader(sourceData);
!
!                 tupType = HeapTupleHeaderGetTypeId(tuple);
!                 tupTypmod = HeapTupleHeaderGetTypMod(tuple);
!
!                 (void) get_cached_rowtype(tupType, tupTypmod,
!                                           &astate->refelemtupdesc,
!                                           econtext);
!
!                 /* If we detoasted sourceData overall, pass that on */
!                 sourceData = PointerGetDatum(tuple);
!             }
!
!             resultArray = array_set_element(array_source, i,
!                                             upper.indx,
!                                             sourceData,
!                                             eisnull,
!                                             astate->refattrlength,
!                                             astate->refelemlength,
!                                             astate->refelembyval,
!                                             astate->refelemalign,
!                                             astate->refiscomposite,
!                                             &astate->refelemtupdesc);
!         }
          else
              resultArray = array_set_slice(array_source, i,
                                            upper.indx, lower.indx,
*************** ExecInitExpr(Expr *node, PlanState *pare
*** 4510,4519 ****
                                                      parent);
                  /* do one-time catalog lookups for type info */
                  astate->refattrlength = get_typlen(aref->refarraytype);
!                 get_typlenbyvalalign(aref->refelemtype,
!                                      &astate->refelemlength,
!                                      &astate->refelembyval,
!                                      &astate->refelemalign);
                  state = (ExprState *) astate;
              }
              break;
--- 4540,4551 ----
                                                      parent);
                  /* do one-time catalog lookups for type info */
                  astate->refattrlength = get_typlen(aref->refarraytype);
!                 get_typlenbyvalaligncomp(aref->refelemtype,
!                                          &astate->refelemlength,
!                                          &astate->refelembyval,
!                                          &astate->refelemalign,
!                                          &astate->refiscomposite);
!                 astate->refelemtupdesc = NULL;
                  state = (ExprState *) astate;
              }
              break;
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 91df184..c52a5de 100644
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
***************
*** 17,22 ****
--- 17,23 ----
  #include <ctype.h>

  #include "access/htup_details.h"
+ #include "access/tuptoaster.h"
  #include "funcapi.h"
  #include "libpq/pqformat.h"
  #include "utils/array.h"
*************** typedef struct ArrayIteratorData
*** 74,79 ****
--- 75,130 ----
      int            current_item;    /* the item # we're at in the array */
  }    ArrayIteratorData;

+ /* Support for array element detoasting */
+
+ /* Initialize state needed by FLATTEN_ARRAY_ELEMENT */
+ #define INIT_FLATTEN(tupdesc) \
+     do { \
+         tupdesc = NULL; \
+     } while (0)
+
+ /* Call this on each non-NULL value about to be inserted into an array */
+ #define FLATTEN_ARRAY_ELEMENT(val, typlen, typbyval, typiscomposite, tupdesc) \
+     do { \
+         if ((typlen) == -1) \
+         { \
+             if (typiscomposite) \
+                 val = flatten_composite_element(val, &(tupdesc)); \
+             else \
+                 val = PointerGetDatum(PG_DETOAST_DATUM(val)); \
+         } \
+     } while (0)
+
+ /* Variant that guarantees to make a copy */
+ #define FLATTEN_ARRAY_ELEMENT_COPY(val, typlen, typbyval, typiscomposite, tupdesc) \
+     do { \
+         if ((typlen) == -1) \
+         { \
+             if (typiscomposite) \
+             { \
+                 Datum _orig_val = (val); \
+                 val = flatten_composite_element(val, &(tupdesc)); \
+                 if (val == _orig_val) \
+                     val = datumCopy(val, typbyval, typlen); \
+             } \
+             else \
+                 val = PointerGetDatum(PG_DETOAST_DATUM_COPY(val)); \
+         } \
+         else \
+             val = datumCopy(val, typbyval, typlen); \
+     } while (0)
+
+ /* Clean up after some FLATTEN_ARRAY_ELEMENT calls */
+ #define END_FLATTEN(tupdesc) \
+     do { \
+         if (tupdesc) \
+         { \
+             ReleaseTupleDesc(tupdesc); \
+             tupdesc = NULL; \
+         } \
+     } while (0)
+
+ /* Local functions */
  static bool array_isspace(char ch);
  static int    ArrayCount(const char *str, int *dim, char typdelim);
  static void ReadArrayStr(char *arrayStr, const char *origStr,
*************** static void CopyArrayEls(ArrayType *arra
*** 92,97 ****
--- 143,151 ----
               Datum *values, bool *nulls, int nitems,
               int typlen, bool typbyval, char typalign,
               bool freedata);
+ static int    array_cmp(FunctionCallInfo fcinfo);
+ static bool element_type_is_composite(Oid elmtype, int16 typlen, char typalign);
+ static Datum flatten_composite_element(Datum value, TupleDesc *tupdescptr);
  static bool array_get_isnull(const bits8 *nullbitmap, int offset);
  static void array_set_isnull(bits8 *nullbitmap, int offset, bool isNull);
  static Datum ArrayCast(char *value, bool byval, int len);
*************** static void array_insert_slice(ArrayType
*** 119,125 ****
                     int ndim, int *dim, int *lb,
                     int *st, int *endp,
                     int typlen, bool typbyval, char typalign);
- static int    array_cmp(FunctionCallInfo fcinfo);
  static ArrayType *create_array_envelope(int ndims, int *dimv, int *lbv, int nbytes,
                        Oid elmtype, int dataoffset);
  static ArrayType *array_fill_internal(ArrayType *dims, ArrayType *lbs,
--- 173,178 ----
*************** ReadArrayStr(char *arrayStr,
*** 864,870 ****
              hasnull = true;
          else
          {
!             /* let's just make sure data is not toasted */
              if (typlen == -1)
                  values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              totbytes = att_addlength_datum(totbytes, typlen, values[i]);
--- 917,930 ----
              hasnull = true;
          else
          {
!             /*
!              * Let's just make sure data is not toasted.  It seems quite
!              * unlikely that the input function could have produced a toasted
!              * value at all, and certainly it shouldn't have produced
!              * something containing nested toasted values, so we don't bother
!              * with FLATTEN_ARRAY_ELEMENT().  But a PG_DETOAST_DATUM() call is
!              * reasonably cheap when it's a no-op.
!              */
              if (typlen == -1)
                  values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              totbytes = att_addlength_datum(totbytes, typlen, values[i]);
*************** ReadArrayBinary(StringInfo buf,
*** 1466,1472 ****
              hasnull = true;
          else
          {
!             /* let's just make sure data is not toasted */
              if (typlen == -1)
                  values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              totbytes = att_addlength_datum(totbytes, typlen, values[i]);
--- 1526,1539 ----
              hasnull = true;
          else
          {
!             /*
!              * Let's just make sure data is not toasted.  It seems quite
!              * unlikely that the receive function could have produced a
!              * toasted value at all, and certainly it shouldn't have produced
!              * something containing nested toasted values, so we don't bother
!              * with FLATTEN_ARRAY_ELEMENT().  But a PG_DETOAST_DATUM() call is
!              * reasonably cheap when it's a no-op.
!              */
              if (typlen == -1)
                  values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              totbytes = att_addlength_datum(totbytes, typlen, values[i]);
*************** array_get_slice(ArrayType *array,
*** 2019,2025 ****
  }

  /*
!  * array_set :
   *          This routine sets the value of an array element (specified by
   *          a subscript array) to a new value specified by "dataValue".
   *
--- 2086,2092 ----
  }

  /*
!  * array_set_element :
   *          This routine sets the value of an array element (specified by
   *          a subscript array) to a new value specified by "dataValue".
   *
*************** array_get_slice(ArrayType *array,
*** 2035,2045 ****
--- 2102,2117 ----
   *    elmlen: pg_type.typlen for the array's element type
   *    elmbyval: pg_type.typbyval for the array's element type
   *    elmalign: pg_type.typalign for the array's element type
+  *    elmiscomposite: true if element type is composite
+  *    element_tupdesc: pointer to where to cache tupledesc, if composite
   *
   * Result:
   *          A new array is returned, just like the old except for the one
   *          modified entry.  The original array object is not changed.
   *
+  * Caller must arrange for any refcount on *element_tupdesc to be released
+  * when no longer useful.
+  *
   * For one-dimensional arrays only, we allow the array to be extended
   * by assigning to a position outside the existing subscript range; any
   * positions between the existing elements and the new one are set to NULLs.
*************** array_get_slice(ArrayType *array,
*** 2049,2063 ****
   * rather than returning a NULL as the fetch operations do.
   */
  ArrayType *
! array_set(ArrayType *array,
!           int nSubscripts,
!           int *indx,
!           Datum dataValue,
!           bool isNull,
!           int arraytyplen,
!           int elmlen,
!           bool elmbyval,
!           char elmalign)
  {
      ArrayType  *newarray;
      int            i,
--- 2121,2137 ----
   * rather than returning a NULL as the fetch operations do.
   */
  ArrayType *
! array_set_element(ArrayType *array,
!                   int nSubscripts,
!                   int *indx,
!                   Datum dataValue,
!                   bool isNull,
!                   int arraytyplen,
!                   int elmlen,
!                   bool elmbyval,
!                   char elmalign,
!                   bool elmiscomposite,
!                   TupleDesc *element_tupdesc)
  {
      ArrayType  *newarray;
      int            i,
*************** array_set(ArrayType *array,
*** 2116,2122 ****

      /* make sure item to be inserted is not toasted */
      if (elmlen == -1 && !isNull)
!         dataValue = PointerGetDatum(PG_DETOAST_DATUM(dataValue));

      /* detoast input array if necessary */
      array = DatumGetArrayTypeP(PointerGetDatum(array));
--- 2190,2201 ----

      /* make sure item to be inserted is not toasted */
      if (elmlen == -1 && !isNull)
!     {
!         FLATTEN_ARRAY_ELEMENT(dataValue,
!                               elmlen, elmbyval,
!                               elmiscomposite,
!                               *element_tupdesc);
!     }

      /* detoast input array if necessary */
      array = DatumGetArrayTypeP(PointerGetDatum(array));
*************** array_set(ArrayType *array,
*** 2306,2311 ****
--- 2385,2446 ----
  }

  /*
+  * array_set :
+  *        Backwards-compatibility function that doesn't require caller
+  *        to supply info about composite element types.
+  *
+  * Identical to array_set_element except elmiscomposite is computed internally
+  * and there is no chance to cache element tupledesc across calls.
+  */
+ ArrayType *
+ array_set(ArrayType *array,
+           int nSubscripts,
+           int *indx,
+           Datum dataValue,
+           bool isNull,
+           int arraytyplen,
+           int elmlen,
+           bool elmbyval,
+           char elmalign)
+ {
+     ArrayType  *newarray;
+     bool        elmiscomposite;
+     TupleDesc    elmtupdesc;
+
+     if (arraytyplen < 0)
+     {
+         Oid            elmtype;
+
+         /* detoast input array if necessary */
+         array = DatumGetArrayTypeP(PointerGetDatum(array));
+         /* now we can get the element type safely */
+         elmtype = ARR_ELEMTYPE(array);
+
+         elmiscomposite = element_type_is_composite(elmtype, elmlen, elmalign);
+     }
+     else
+         elmiscomposite = false; /* not possible for fixed-size array */
+
+     INIT_FLATTEN(elmtupdesc);
+
+     newarray = array_set_element(array,
+                                  nSubscripts,
+                                  indx,
+                                  dataValue,
+                                  isNull,
+                                  arraytyplen,
+                                  elmlen,
+                                  elmbyval,
+                                  elmalign,
+                                  elmiscomposite,
+                                  &elmtupdesc);
+
+     END_FLATTEN(elmtupdesc);
+
+     return newarray;
+ }
+
+ /*
   * array_set_slice :
   *          This routine sets the value of a range of array locations (specified
   *          by upper and lower subscript values) to new values passed as
*************** array_map(FunctionCallInfo fcinfo, Oid i
*** 2683,2688 ****
--- 2818,2825 ----
      int            typlen;
      bool        typbyval;
      char        typalign;
+     bool        typiscomposite;
+     TupleDesc    typtupdesc;
      char       *s;
      bits8       *bitmap;
      int            bitmask;
*************** array_map(FunctionCallInfo fcinfo, Oid i
*** 2741,2746 ****
--- 2878,2886 ----
      typbyval = ret_extra->typbyval;
      typalign = ret_extra->typalign;

+     typiscomposite = element_type_is_composite(retType, typlen, typalign);
+     INIT_FLATTEN(typtupdesc);
+
      /* Allocate temporary arrays for new values */
      values = (Datum *) palloc(nitems * sizeof(Datum));
      nulls = (bool *) palloc(nitems * sizeof(bool));
*************** array_map(FunctionCallInfo fcinfo, Oid i
*** 2800,2807 ****
          else
          {
              /* Ensure data is not toasted */
!             if (typlen == -1)
!                 values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
              /* Update total result size */
              nbytes = att_addlength_datum(nbytes, typlen, values[i]);
              nbytes = att_align_nominal(nbytes, typalign);
--- 2940,2948 ----
          else
          {
              /* Ensure data is not toasted */
!             FLATTEN_ARRAY_ELEMENT(values[i],
!                                   typlen, typbyval,
!                                   typiscomposite, typtupdesc);
              /* Update total result size */
              nbytes = att_addlength_datum(nbytes, typlen, values[i]);
              nbytes = att_align_nominal(nbytes, typalign);
*************** array_map(FunctionCallInfo fcinfo, Oid i
*** 2825,2830 ****
--- 2966,2973 ----
          }
      }

+     END_FLATTEN(typtupdesc);
+
      /* Allocate and initialize the result array */
      if (hasnulls)
      {
*************** array_map(FunctionCallInfo fcinfo, Oid i
*** 2868,2874 ****
   * A palloc'd 1-D array object is constructed and returned.  Note that
   * elem values will be copied into the object even if pass-by-ref type.
   *
!  * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype.  However, the caller is
   * in a better position to cache this info across multiple uses, or even
   * to hard-wire values if the element type is hard-wired.
--- 3011,3017 ----
   * A palloc'd 1-D array object is constructed and returned.  Note that
   * elem values will be copied into the object even if pass-by-ref type.
   *
!  * NOTE: it would be cleaner to look up the elmlen/elmbyval/elmalign info
   * from the system catalogs, given the elmtype.  However, the caller is
   * in a better position to cache this info across multiple uses, or even
   * to hard-wire values if the element type is hard-wired.
*************** construct_array(Datum *elems, int nelems
*** 2902,2908 ****
   * A palloc'd ndims-D array object is constructed and returned.  Note that
   * elem values will be copied into the object even if pass-by-ref type.
   *
!  * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype.  However, the caller is
   * in a better position to cache this info across multiple uses, or even
   * to hard-wire values if the element type is hard-wired.
--- 3045,3051 ----
   * A palloc'd ndims-D array object is constructed and returned.  Note that
   * elem values will be copied into the object even if pass-by-ref type.
   *
!  * NOTE: it would be cleaner to look up the elmlen/elmbyval/elmalign info
   * from the system catalogs, given the elmtype.  However, the caller is
   * in a better position to cache this info across multiple uses, or even
   * to hard-wire values if the element type is hard-wired.
*************** construct_md_array(Datum *elems,
*** 2921,2926 ****
--- 3064,3071 ----
      int32        dataoffset;
      int            i;
      int            nelems;
+     bool        elmiscomposite;
+     TupleDesc    elmtupdesc;

      if (ndims < 0)                /* we do allow zero-dimension arrays */
          ereport(ERROR,
*************** construct_md_array(Datum *elems,
*** 2938,2943 ****
--- 3083,3091 ----

      nelems = ArrayGetNItems(ndims, dims);

+     elmiscomposite = element_type_is_composite(elmtype, elmlen, elmalign);
+     INIT_FLATTEN(elmtupdesc);
+
      /* compute required space */
      nbytes = 0;
      hasnulls = false;
*************** construct_md_array(Datum *elems,
*** 2949,2956 ****
              continue;
          }
          /* make sure data is not toasted */
!         if (elmlen == -1)
!             elems[i] = PointerGetDatum(PG_DETOAST_DATUM(elems[i]));
          nbytes = att_addlength_datum(nbytes, elmlen, elems[i]);
          nbytes = att_align_nominal(nbytes, elmalign);
          /* check for overflow of total request */
--- 3097,3106 ----
              continue;
          }
          /* make sure data is not toasted */
!         FLATTEN_ARRAY_ELEMENT(elems[i],
!                               elmlen, elmbyval,
!                               elmiscomposite, elmtupdesc);
!         /* update total result size */
          nbytes = att_addlength_datum(nbytes, elmlen, elems[i]);
          nbytes = att_align_nominal(nbytes, elmalign);
          /* check for overflow of total request */
*************** construct_md_array(Datum *elems,
*** 2961,2966 ****
--- 3111,3118 ----
                              (int) MaxAllocSize)));
      }

+     END_FLATTEN(elmtupdesc);
+
      /* Allocate and initialize result array */
      if (hasnulls)
      {
*************** construct_empty_array(Oid elmtype)
*** 3020,3026 ****
   * If array elements are pass-by-ref data type, the returned Datums will
   * be pointers into the array object.
   *
!  * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info
   * from the system catalogs, given the elmtype.  However, in most current
   * uses the type is hard-wired into the caller and so we can save a lookup
   * cycle by hard-wiring the type info as well.
--- 3172,3178 ----
   * If array elements are pass-by-ref data type, the returned Datums will
   * be pointers into the array object.
   *
!  * NOTE: it would be cleaner to look up the elmlen/elmbyval/elmalign info
   * from the system catalogs, given the elmtype.  However, in most current
   * uses the type is hard-wired into the caller and so we can save a lookup
   * cycle by hard-wiring the type info as well.
*************** array_free_iterator(ArrayIterator iterat
*** 4070,4075 ****
--- 4222,4328 ----
  /***************************************************************************/

  /*
+  * Array element detoasting ("flattening") support
+  *
+  * It is critical that no external TOAST pointers appear within an array,
+  * either directly as an array element or within a non-scalar element type,
+  * so we must detoast any value that is about to be inserted into an array.
+  *
+  * Since we allow composite-type datums to contain toasted fields (at the
+  * first level; nested toasted values aren't allowed), we have to work extra
+  * hard to get rid of possible toasting when the array element type is
+  * composite.  We assume that other composites such as ranges contain no
+  * external pointers, though.
+  *
+  * In addition to the no-external-pointers requirement, we make a policy
+  * decision that array elements not be compressed.    We expect that compression
+  * would be better applied to the whole array rather than individual elements.
+  *
+  * The current implementation of FLATTEN_ARRAY_ELEMENT() also forces varlena
+  * elements to not be in short-header format, but this is an implementation
+  * artifact that should probably be removed.
+  */
+
+ /*
+  * Check if array element type is composite.
+  *
+  * Currently, we only have typlen/typbyval/typalign information cached for
+  * the array element type.    To minimize unnecessary syscache lookups, this
+  * function uses the knowledge that a composite type must have typlen -1
+  * (varlena) and alignment 'd'.  There are only a few non-composite, non-array
+  * types for which this is true, so though imperfect this heuristic should
+  * work well enough.
+  *
+  * XXX ideally this would go away in favor of always getting typiscomposite
+  * info in the same syscache lookup that we get typlen etc with.  That will
+  * require additional API changes though, so it doesn't seem back-patchable.
+  */
+ static bool
+ element_type_is_composite(Oid elmtype, int16 typlen, char typalign)
+ {
+     if (typlen == -1 && typalign == 'd' && type_is_rowtype(elmtype))
+         return true;
+     else
+         return false;
+ }
+
+ /*
+  * Detoast ("flatten") a composite value
+  *
+  * This is the out-of-line portion of the FLATTEN_ARRAY_ELEMENT macro.
+  * We assume we have a non-null, known-composite value to work on.
+  * The tuple descriptor for the type can be cached at *tupdescptr.
+  *
+  * Note: for some logic paths, it's important that this does not
+  * copy the Datum if re-invoked on a previously flattened Datum.
+  */
+ static Datum
+ flatten_composite_element(Datum value, TupleDesc *tupdescptr)
+ {
+     HeapTupleHeader tupdata;
+     Oid            typeid;
+     int32        typmod;
+     TupleDesc    tupleDesc;
+
+     /*
+      * If the value is itself toasted (even just to the extent of having a
+      * short varlena header), then it must have been incorporated into some
+      * larger composite structure, so it should not contain any embedded toast
+      * pointers.  We need only detoast it overall and we're done.
+      */
+     if (VARATT_IS_EXTENDED(DatumGetPointer(value)))
+         return PointerGetDatum(PG_DETOAST_DATUM(value));
+
+     /*
+      * When working with an array of RECORD, it's possible that each array
+      * element is of a different record subtype, so we have to be prepared to
+      * look up the appropriate tupdesc here.  But we cache the tupdesc across
+      * calls as much as we can.  For an array of a named composite type, we
+      * should only have to do one lookup, but it seems prudent to check the
+      * type every time anyway.
+      */
+     tupdata = DatumGetHeapTupleHeader(value);
+     typeid = HeapTupleHeaderGetTypeId(tupdata);
+     typmod = HeapTupleHeaderGetTypMod(tupdata);
+
+     tupleDesc = *tupdescptr;
+     if (tupleDesc &&
+         (tupleDesc->tdtypeid != typeid || tupleDesc->tdtypmod != typmod))
+     {
+         ReleaseTupleDesc(tupleDesc);
+         tupleDesc = NULL;
+     }
+     if (tupleDesc == NULL)
+     {
+         tupleDesc = lookup_rowtype_tupdesc(typeid, typmod);
+         *tupdescptr = tupleDesc;
+     }
+
+     /* tuptoaster.c does the actual work */
+     return toast_flatten_tuple_datum(value, tupleDesc);
+ }
+
+ /*
   * Check whether a specific array element is NULL
   *
   * nullbitmap: pointer to array's null bitmap (NULL if none)
*************** accumArrayResult(ArrayBuildState *astate
*** 4591,4600 ****
          astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
          astate->nelems = 0;
          astate->element_type = element_type;
!         get_typlenbyvalalign(element_type,
!                              &astate->typlen,
!                              &astate->typbyval,
!                              &astate->typalign);
      }
      else
      {
--- 4844,4855 ----
          astate->dnulls = (bool *) palloc(astate->alen * sizeof(bool));
          astate->nelems = 0;
          astate->element_type = element_type;
!         get_typlenbyvalaligncomp(element_type,
!                                  &astate->typlen,
!                                  &astate->typbyval,
!                                  &astate->typalign,
!                                  &astate->typiscomposite);
!         INIT_FLATTEN(astate->element_tupdesc);
      }
      else
      {
*************** accumArrayResult(ArrayBuildState *astate
*** 4621,4630 ****
       */
      if (!disnull && !astate->typbyval)
      {
!         if (astate->typlen == -1)
!             dvalue = PointerGetDatum(PG_DETOAST_DATUM_COPY(dvalue));
!         else
!             dvalue = datumCopy(dvalue, astate->typbyval, astate->typlen);
      }

      astate->dvalues[astate->nelems] = dvalue;
--- 4876,4885 ----
       */
      if (!disnull && !astate->typbyval)
      {
!         FLATTEN_ARRAY_ELEMENT_COPY(dvalue,
!                                    astate->typlen, astate->typbyval,
!                                    astate->typiscomposite,
!                                    astate->element_tupdesc);
      }

      astate->dvalues[astate->nelems] = dvalue;
*************** makeMdArrayResult(ArrayBuildState *astat
*** 4691,4697 ****

      MemoryContextSwitchTo(oldcontext);

!     /* Clean up all the junk */
      if (release)
          MemoryContextDelete(astate->mcontext);

--- 4946,4959 ----

      MemoryContextSwitchTo(oldcontext);

!     /*
!      * Clean up all the junk.  We must release any tupdesc refcount held for
!      * composite-flattening purposes even if !release, since we can't be sure
!      * we'll get called again.  flatten_composite_element can re-acquire the
!      * refcount if needed.
!      */
!     END_FLATTEN(astate->element_tupdesc);
!
      if (release)
          MemoryContextDelete(astate->mcontext);

*************** array_fill_internal(ArrayType *dims, Arr
*** 5037,5043 ****

          /* make sure data is not toasted */
          if (elmlen == -1)
!             value = PointerGetDatum(PG_DETOAST_DATUM(value));

          nbytes = att_addlength_datum(0, elmlen, value);
          nbytes = att_align_nominal(nbytes, elmalign);
--- 5299,5315 ----

          /* make sure data is not toasted */
          if (elmlen == -1)
!         {
!             bool        elmiscomposite;
!             TupleDesc    elmtupdesc;
!
!             elmiscomposite = element_type_is_composite(elmtype, elmlen, elmalign);
!             INIT_FLATTEN(elmtupdesc);
!             FLATTEN_ARRAY_ELEMENT(value,
!                                   elmlen, elmbyval,
!                                   elmiscomposite, elmtupdesc);
!             END_FLATTEN(elmtupdesc);
!         }

          nbytes = att_addlength_datum(0, elmlen, value);
          nbytes = att_align_nominal(nbytes, elmalign);
*************** array_replace_internal(ArrayType *array,
*** 5277,5286 ****
       */
      if (typlen == -1)
      {
          if (!search_isnull)
!             search = PointerGetDatum(PG_DETOAST_DATUM(search));
          if (!replace_isnull)
!             replace = PointerGetDatum(PG_DETOAST_DATUM(replace));
      }

      /* Prepare to apply the comparison operator */
--- 5549,5569 ----
       */
      if (typlen == -1)
      {
+         bool        typiscomposite;
+         TupleDesc    typtupdesc;
+
+         typiscomposite = element_type_is_composite(element_type,
+                                                    typlen, typalign);
+         INIT_FLATTEN(typtupdesc);
          if (!search_isnull)
!             FLATTEN_ARRAY_ELEMENT(search,
!                                   typlen, typbyval,
!                                   typiscomposite, typtupdesc);
          if (!replace_isnull)
!             FLATTEN_ARRAY_ELEMENT(replace,
!                                   typlen, typbyval,
!                                   typiscomposite, typtupdesc);
!         END_FLATTEN(typtupdesc);
      }

      /* Prepare to apply the comparison operator */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index a4ce716..278709d 100644
*** a/src/backend/utils/cache/lsyscache.c
--- b/src/backend/utils/cache/lsyscache.c
*************** get_typlenbyvalalign(Oid typid, int16 *t
*** 1922,1927 ****
--- 1922,1953 ----
  }

  /*
+  * get_typlenbyvalaligncomp
+  *
+  *        A four-fer: given the type OID, return typlen, typbyval, typalign,
+  *        and typiscomposite.
+  */
+ void
+ get_typlenbyvalaligncomp(Oid typid, int16 *typlen, bool *typbyval,
+                          char *typalign, bool *typiscomposite)
+ {
+     HeapTuple    tp;
+     Form_pg_type typtup;
+
+     tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+     if (!HeapTupleIsValid(tp))
+         elog(ERROR, "cache lookup failed for type %u", typid);
+     typtup = (Form_pg_type) GETSTRUCT(tp);
+     *typlen = typtup->typlen;
+     *typbyval = typtup->typbyval;
+     *typalign = typtup->typalign;
+     /* this computation must match type_is_rowtype(): */
+     *typiscomposite = (typid == RECORDOID ||
+                        typtup->typtype == TYPTYPE_COMPOSITE);
+     ReleaseSysCache(tp);
+ }
+
+ /*
   * getTypeIOParam
   *        Given a pg_type row, select the type OID to pass to I/O functions
   *
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 296d016..5cdefad 100644
*** a/src/include/access/tuptoaster.h
--- b/src/include/access/tuptoaster.h
*************** extern struct varlena *heap_tuple_untoas
*** 184,199 ****
  extern HeapTuple toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc);

  /* ----------
   * toast_flatten_tuple_attribute -
   *
   *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
!  *    This must be invoked on any potentially-composite field that is to be
!  *    inserted into a tuple.    Doing this preserves the invariant that toasting
!  *    goes only one level deep in a tuple.
   * ----------
   */
! extern Datum toast_flatten_tuple_attribute(Datum value,
!                               Oid typeId, int32 typeMod);

  /* ----------
   * toast_compress_datum -
--- 184,208 ----
  extern HeapTuple toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc);

  /* ----------
+  * toast_flatten_tuple_datum -
+  *
+  *    "Flatten" a composite Datum to contain no toasted fields.
+  *    This must be invoked on any composite value that is to be inserted into
+  *    a tuple, array, range, etc.  Doing this preserves the invariant that
+  *    toasting goes only one level deep in a tuple.
+  * ----------
+  */
+ extern Datum toast_flatten_tuple_datum(Datum value, TupleDesc tupleDesc);
+
+ /* ----------
   * toast_flatten_tuple_attribute -
   *
   *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
!  *    This is a convenience routine for doing toast_flatten_tuple_datum() on
!  *    tuple fields.
   * ----------
   */
! extern Datum toast_flatten_tuple_attribute(Datum value, Form_pg_attribute att);

  /* ----------
   * toast_compress_datum -
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6c94e8a..7309022 100644
*** a/src/include/nodes/execnodes.h
--- b/src/include/nodes/execnodes.h
*************** typedef struct ArrayRefExprState
*** 624,629 ****
--- 624,631 ----
      int16        refelemlength;    /* typlen of the array element type */
      bool        refelembyval;    /* is the element type pass-by-value? */
      char        refelemalign;    /* typalign of the element type */
+     bool        refiscomposite; /* is the element type composite? */
+     TupleDesc    refelemtupdesc; /* may be set if refiscomposite */
  } ArrayRefExprState;

  /* ----------------
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 9bbfaae..63307c1 100644
*** a/src/include/utils/array.h
--- b/src/include/utils/array.h
*************** typedef struct ArrayBuildState
*** 88,93 ****
--- 88,96 ----
      int16        typlen;            /* needed info about datatype */
      bool        typbyval;
      char        typalign;
+     bool        typiscomposite;
+     /* use struct pointer to avoid including tupdesc.h here */
+     struct tupleDesc *element_tupdesc;    /* may be set if typiscomposite */
  } ArrayBuildState;

  /*
*************** extern Datum array_replace(PG_FUNCTION_A
*** 218,223 ****
--- 221,230 ----
  extern Datum array_ref(ArrayType *array, int nSubscripts, int *indx,
            int arraytyplen, int elmlen, bool elmbyval, char elmalign,
            bool *isNull);
+ extern ArrayType *array_set_element(ArrayType *array, int nSubscripts, int *indx,
+                   Datum dataValue, bool isNull,
+                   int arraytyplen, int elmlen, bool elmbyval, char elmalign,
+                   bool elmiscomposite, struct tupleDesc **element_tupdesc);
  extern ArrayType *array_set(ArrayType *array, int nSubscripts, int *indx,
            Datum dataValue, bool isNull,
            int arraytyplen, int elmlen, bool elmbyval, char elmalign);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f46460a..30fb0a0 100644
*** a/src/include/utils/lsyscache.h
--- b/src/include/utils/lsyscache.h
*************** extern bool get_typbyval(Oid typid);
*** 109,114 ****
--- 109,116 ----
  extern void get_typlenbyval(Oid typid, int16 *typlen, bool *typbyval);
  extern void get_typlenbyvalalign(Oid typid, int16 *typlen, bool *typbyval,
                       char *typalign);
+ extern void get_typlenbyvalaligncomp(Oid typid, int16 *typlen, bool *typbyval,
+                          char *typalign, bool *typiscomposite);
  extern Oid    getTypeIOParam(HeapTuple typeTuple);
  extern void get_type_io_data(Oid typid,
                   IOFuncSelector which_func,
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 3749fac..27d6b65 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*************** exec_assign_value(PLpgSQL_execstate *est
*** 4246,4251 ****
--- 4246,4252 ----
                  ArrayType  *oldarrayval;
                  ArrayType  *newarrayval;
                  SPITupleTable *save_eval_tuptable;
+                 TupleDesc    elemtyptupdesc = NULL;
                  MemoryContext oldcontext;

                  /*
*************** exec_assign_value(PLpgSQL_execstate *est
*** 4297,4302 ****
--- 4298,4304 ----
                      int16        elemtyplen;
                      bool        elemtypbyval;
                      char        elemtypalign;
+                     bool        elemtypiscomposite;

                      /* If target is domain over array, reduce to base type */
                      arraytypoid = getBaseTypeAndTypmod(parenttypoid,
*************** exec_assign_value(PLpgSQL_execstate *est
*** 4312,4321 ****
                      /* Collect needed data about the types */
                      arraytyplen = get_typlen(arraytypoid);

!                     get_typlenbyvalalign(elemtypoid,
!                                          &elemtyplen,
!                                          &elemtypbyval,
!                                          &elemtypalign);

                      /* Now safe to update the cached data */
                      arrayelem->parenttypoid = parenttypoid;
--- 4314,4324 ----
                      /* Collect needed data about the types */
                      arraytyplen = get_typlen(arraytypoid);

!                     get_typlenbyvalaligncomp(elemtypoid,
!                                              &elemtyplen,
!                                              &elemtypbyval,
!                                              &elemtypalign,
!                                              &elemtypiscomposite);

                      /* Now safe to update the cached data */
                      arrayelem->parenttypoid = parenttypoid;
*************** exec_assign_value(PLpgSQL_execstate *est
*** 4327,4332 ****
--- 4330,4336 ----
                      arrayelem->elemtyplen = elemtyplen;
                      arrayelem->elemtypbyval = elemtypbyval;
                      arrayelem->elemtypalign = elemtypalign;
+                     arrayelem->elemtypiscomposite = elemtypiscomposite;
                  }

                  /*
*************** exec_assign_value(PLpgSQL_execstate *est
*** 4395,4409 ****
                  /*
                   * Build the modified array value.
                   */
!                 newarrayval = array_set(oldarrayval,
!                                         nsubscripts,
!                                         subscriptvals,
!                                         coerced_value,
!                                         *isNull,
!                                         arrayelem->arraytyplen,
!                                         arrayelem->elemtyplen,
!                                         arrayelem->elemtypbyval,
!                                         arrayelem->elemtypalign);

                  MemoryContextSwitchTo(oldcontext);

--- 4399,4423 ----
                  /*
                   * Build the modified array value.
                   */
!                 newarrayval = array_set_element(oldarrayval,
!                                                 nsubscripts,
!                                                 subscriptvals,
!                                                 coerced_value,
!                                                 *isNull,
!                                                 arrayelem->arraytyplen,
!                                                 arrayelem->elemtyplen,
!                                                 arrayelem->elemtypbyval,
!                                                 arrayelem->elemtypalign,
!                                                 arrayelem->elemtypiscomposite,
!                                                 &elemtyptupdesc);
!
!                 /*
!                  * Unfortunately, we have no easy way to cache element type
!                  * tupledescs across calls, because there's no clear place
!                  * to release them again.  Perhaps this can be improved later.
!                  */
!                 if (elemtyptupdesc)
!                     ReleaseTupleDesc(elemtyptupdesc);

                  MemoryContextSwitchTo(oldcontext);

diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index b4d1498..fc28e07 100644
*** a/src/pl/plpgsql/src/plpgsql.h
--- b/src/pl/plpgsql/src/plpgsql.h
*************** typedef struct
*** 321,326 ****
--- 321,327 ----
      int16        elemtyplen;        /* typlen of element type */
      bool        elemtypbyval;    /* element type is pass-by-value? */
      char        elemtypalign;    /* typalign of element type */
+     bool        elemtypiscomposite;    /* element type is composite? */
  } PLpgSQL_arrayelem;



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Noah Misch
Дата:
I lack time to give this a solid review, but here's a preliminary reaction:

On Sun, Apr 20, 2014 at 03:38:23PM -0400, Tom Lane wrote:
> On HEAD, this takes about 295-300 msec on my machine in a non-cassert
> build.  With the patch I sent previously, the time goes to 495-500 msec.

> This goes from circa 1250 ms in HEAD to around 1680 ms.
> 
> Some poking around with oprofile says that much of the added time is
> coming from added syscache and typcache lookups, as I suspected.
> 
> I did some further work on the patch to make it possible to cache
> the element tuple descriptor across calls for these two functions.
> With the attached patch, the first test case comes down to about 335 ms
> and the second to about 1475 ms (plpgsql is still leaving some extra
> cache lookups on the table).  More could be done with some further API
> changes, but I think this is about as much as is safe to back-patch.

That particular performance drop seems acceptable.  As I hinted upthread, the
large performance risk arises for test cases that do not visit a toast table
today but will do so after the change.

> At least in the accumArrayResult test case, it would be possible to
> bring the runtime back to very close to HEAD if we were willing to
> abandon the principle that compressed fields within a tuple should
> always get decompressed when the tuple goes into a larger structure.
> That would allow flatten_composite_element to quit early if the
> tuple doesn't have the HEAP_HASEXTERNAL infomask bit set.  I'm not
> sure that this would be a good tradeoff however: if we fail to remove nested
> compression, we could end up paying for that with double compression.
> On the other hand, it's unclear that that case would come up often,
> while the overhead of disassembling the tuple is definitely going to
> happen every time we assign to an array-of-composites element.  (Also,
> there is no question here of regression relative to previous releases,
> since the existing code fails to prevent nested compression.)

I wonder how it would work out to instead delay this new detoast effort until
toast_insert_or_update().  That timing has the major advantage of not adding
any toast table visits beyond those actually needed to avoid improperly
writing an embedded toast pointer to disk.  It would give us the flexibility
to detoast array elements more lazily than we do today, though I don't propose
any immediate change there.  To reduce syscache traffic, make the TupleDesc
record whether any column requires recursive detoast work.  Perhaps also use
the TupleDesc as a place to cache the recursive-detoast syscache lookups for
tables that do need them.  Thoughts?

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> I wonder how it would work out to instead delay this new detoast effort until
> toast_insert_or_update().

That would require toast_insert_or_update() to know about every container
datatype.  I doubt it could lead to an extensible or maintainable
solution.

I'm actually planning to set this patch on the shelf for a bit and go
investigate the other alternative, ie, not generating composite Datums
containing toast pointers in the first place.  We now know that the idea
that we aren't going to take performance hits *somewhere* is an illusion,
and I still suspect that the other way is going to lead to a smaller and
cleaner patch.  The main performance downside for plpgsql might be
addressable by making sure that plpgsql record variables fall on the "heap
tuple" rather than the "composite Datum" side of the line.  I'm also quite
concerned about correctness: I don't have a lot of confidence that this
patch has closed every loophole with respect to arrays, and it hasn't even
touched ranges or any of the related one-off bugs that I believe exist.
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
Hi,

On 2014-04-20 15:38:23 -0400, Tom Lane wrote:
> Some poking around with oprofile says that much of the added time is
> coming from added syscache and typcache lookups, as I suspected.
> 
> I did some further work on the patch to make it possible to cache
> the element tuple descriptor across calls for these two functions.
> With the attached patch, the first test case comes down to about 335 ms
> and the second to about 1475 ms (plpgsql is still leaving some extra
> cache lookups on the table).  More could be done with some further API
> changes, but I think this is about as much as is safe to back-patch.

I unfortunately haven't followed this in detail, but shouldn't it be
relatively easily to make this even cheaper by checking for
HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
composite's columns, right?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> I unfortunately haven't followed this in detail, but shouldn't it be
> relatively easily to make this even cheaper by checking for
> HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
> composite's columns, right?

That's the point I made further down: we could do that if we were willing
to abandon the principle that nested fields shouldn't be compressed.
It's not very clear what it'd cost us to give that up.  (Too bad we didn't
define a HEAP_HASCOMPRESSED flag bit ...)
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
On 2014-04-21 11:30:57 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > I unfortunately haven't followed this in detail, but shouldn't it be
> > relatively easily to make this even cheaper by checking for
> > HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
> > composite's columns, right?
> 
> That's the point I made further down:

Oh, sorry. I started reading this thread from the end just now.

> we could do that if we were willing
> to abandon the principle that nested fields shouldn't be compressed.
> It's not very clear what it'd cost us to give that up.

I don't think the cost of that would be all that high. As you argue,
without that trick the cost of iterating over all columns will be paid
all the time, whereas double compression will take effect much less
often. And might even end up being beneficial.
The risk of significant performance regressions due to backpatching
seems significantly less likely if we pay heed to HASEXTERNAL.

> (Too bad we didn't define a HEAP_HASCOMPRESSED flag bit ...)

And too bad that infomask bits are so scarce :(. We really need to
reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Merlin Moncure
Дата:
On Mon, Apr 21, 2014 at 10:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-04-21 11:30:57 -0400, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>> > I unfortunately haven't followed this in detail, but shouldn't it be
>> > relatively easily to make this even cheaper by checking for
>> > HEAP_HASEXTERNAL?  If it's not set we don't need to iterate over the
>> > composite's columns, right?
>>
>> That's the point I made further down:
>
> Oh, sorry. I started reading this thread from the end just now.
>
>> we could do that if we were willing
>> to abandon the principle that nested fields shouldn't be compressed.
>> It's not very clear what it'd cost us to give that up.
>
> I don't think the cost of that would be all that high.

I think it's pretty reasonable too.

> And too bad that infomask bits are so scarce :(. We really need to
> reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.

The only consequence of that is losing support for in-place update for
pre-9.0 (of which the only supported version is 8.4).  I figure it's
also pretty reasonable to drop support for IPU for out of support
versions for new versions going forward.  That would recover the bits
and yield some nice cleanups in tqual.c.

8.4 is scheduled to go out of support in July, so if you agree with my
reasoning 9.4 would be a candidate for not honoring 8.4 upgrade
support.

merlin



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Alvaro Herrera
Дата:
Merlin Moncure wrote:
> On Mon, Apr 21, 2014 at 10:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

> > And too bad that infomask bits are so scarce :(. We really need to
> > reclaim HEAP_MOVED_OFF and HEAP_MOVED_IN.
> 
> The only consequence of that is losing support for in-place update for
> pre-9.0 (of which the only supported version is 8.4).  I figure it's
> also pretty reasonable to drop support for IPU for out of support
> versions for new versions going forward.  That would recover the bits
> and yield some nice cleanups in tqual.c.

There's no way to be sure that the bits have been removed from tables
that were upgraded from a 8.4 server to a 9.0 server and from there to a
newer server.  We'd need some way to catalogue which tables have been
cleaned prior to an upgrade to a release that no longer accepts those
bits; pg_upgrade would then say "table xyz needs to be vacuumed before
the upgrade" in check mode, or something like that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Noah Misch
Дата:
On Mon, Apr 21, 2014 at 10:57:34AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I wonder how it would work out to instead delay this new detoast effort until
> > toast_insert_or_update().
> 
> That would require toast_insert_or_update() to know about every container
> datatype.  I doubt it could lead to an extensible or maintainable
> solution.

If that's its worst drawback, it's excellent.

> I'm actually planning to set this patch on the shelf for a bit and go
> investigate the other alternative, ie, not generating composite Datums
> containing toast pointers in the first place.  We now know that the idea
> that we aren't going to take performance hits *somewhere* is an illusion,
> and I still suspect that the other way is going to lead to a smaller and
> cleaner patch.  The main performance downside for plpgsql might be
> addressable by making sure that plpgsql record variables fall on the "heap
> tuple" rather than the "composite Datum" side of the line.  I'm also quite
> concerned about correctness: I don't have a lot of confidence that this
> patch has closed every loophole with respect to arrays, and it hasn't even
> touched ranges or any of the related one-off bugs that I believe exist.

I maintain that the potential slowdown is too great to consider adopting that
for the sake of a cleaner patch.  Your last message examined a 67% performance
regression.  The strategy you're outlining now can slow a query by 1,000,000%.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Mon, Apr 21, 2014 at 10:57:34AM -0400, Tom Lane wrote:
>> I'm actually planning to set this patch on the shelf for a bit and go
>> investigate the other alternative, ie, not generating composite Datums
>> containing toast pointers in the first place.

> I maintain that the potential slowdown is too great to consider adopting that
> for the sake of a cleaner patch.  Your last message examined a 67% performance
> regression.  The strategy you're outlining now can slow a query by 1,000,000%.

[ shrug... ]  It could also speed up a query by similar factors.  I see
no good reason to suppose that it would be a net loss overall.  I agree
that it might change performance characteristics in a way that we'd
ideally not do in the back branches.  But the fact remains that we've
got a bad bug to fix, and absent a reasonably trustworthy functional fix,
arguing about performance characteristics is a waste of breath.  I can
make it arbitrarily fast if it's not required to give the right answer.
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
I wrote:
> I'm actually planning to set this patch on the shelf for a bit and go
> investigate the other alternative, ie, not generating composite Datums
> containing toast pointers in the first place.

Here's a draft patch along those lines.  It turned out to be best to
leave heap_form_tuple() alone and instead put the dirty work into
HeapTupleGetDatum().  That has some consequences worth discussing:

* The patch changes HeapTupleGetDatum from a simple inline macro into
a function call.  This means that third-party extensions will not get
protection against creation of toast-pointer-containing composite Datums
until they recompile.  I'm not sure that this is a big deal, though.
After looking through the existing core and contrib code, it seems that
nearly all users of HeapTupleGetDatum are building their tuples from
locally-sourced data that could not possibly be toasted anyway.  (This is
true a-priori for users of BuildTupleFromCStrings, for example, and seems
to be true of all uses of HeapTupleGetDatum in SQL-callable functions.)
So it's entirely possible that there would be no live bug anywhere even
without recompiles.

* If we were sufficiently worried about the previous point, we could
get some protection against unfixed extension code by not removing
toast_flatten_tuple_attribute() in the back branches, only in HEAD.
I'm doubtful that it's worth the cycles though.

* Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
might happen if the caller changes CurrentMemoryContext between
heap_form_tuple and HeapTupleGetDatum.  I could only find two places
that did that, though, both in plpgsql.  I thought about having
HeapTupleGetDatum try to identify the context the passed tuple had been
allocated in, but the problem with that is the passed tuple isn't
necessarily heap-allocated at all --- in fact, one of the two problematic
places in plpgsql passes a pointer to a stack-local variable, so we'd
actually crash if we tried to apply GetMemoryChunkContext() to it.
Of course we can (and I did) change plpgsql, but the question is whether
any third-party code has copied either coding pattern.  On balance it
seems best to just use palloc; that's unlikely to cause anything worse
than a memory leak.

* I was quite pleased with the size of the patch: under 100 net new lines,
half of that being a new regression test case.  And it's worth emphasizing
that this is a *complete* fix, modulo the question of third-party code
recompiles.  The patch I showed a few days ago added several times that
much just to fix arrays of composites, and I wasn't too confident that it
fixed every case even for arrays.

* The cases where an "extra" detoast would be incurred are pretty narrow:
basically, whole-row-Var references, ROW() constructs, and the outputs of
functions returning tuples.  As discussed earlier, whether this would cost
anything or save anything would depend on the number of subsequent uses of
the composite Datum's previously-toasted fields.  But I'm thinking that
the amount of application code that would actually be impacted in either
direction is probably pretty small.  Moreover, we aren't adding any
noticeable overhead in cases where no detoasting occurs, unlike the
situation with the previous patch.  (In fact, we're saving some overhead
by getting rid of syscache lookups in toast_flatten_tuple_attribute.)

So, despite Noah's misgivings, I'm thinking this is the way to proceed.

Comments?

            regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..c64ede9 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*************** heap_copytuple_with_tuple(HeapTuple src,
*** 617,622 ****
--- 617,657 ----
      memcpy((char *) dest->t_data, (char *) src->t_data, src->t_len);
  }

+ /* ----------------
+  *        heap_copy_tuple_as_datum
+  *
+  *        copy a tuple as a composite-type Datum
+  * ----------------
+  */
+ Datum
+ heap_copy_tuple_as_datum(HeapTuple tuple, TupleDesc tupleDesc)
+ {
+     HeapTupleHeader td;
+
+     /*
+      * If the tuple contains any external TOAST pointers, we have to inline
+      * those fields to meet the conventions for composite-type Datums.
+      */
+     if (HeapTupleHasExternal(tuple))
+         return toast_flatten_tuple_to_datum(tuple->t_data,
+                                             tuple->t_len,
+                                             tupleDesc);
+
+     /*
+      * Fast path for easy case: just make a palloc'd copy and insert the
+      * correct composite-Datum header fields (since those may not be set if
+      * the given tuple came from disk, rather than from heap_form_tuple).
+      */
+     td = (HeapTupleHeader) palloc(tuple->t_len);
+     memcpy((char *) td, (char *) tuple->t_data, tuple->t_len);
+
+     HeapTupleHeaderSetDatumLength(td, tuple->t_len);
+     HeapTupleHeaderSetTypeId(td, tupleDesc->tdtypeid);
+     HeapTupleHeaderSetTypMod(td, tupleDesc->tdtypmod);
+
+     return PointerGetDatum(td);
+ }
+
  /*
   * heap_form_tuple
   *        construct a tuple from the given values[] and isnull[] arrays,
*************** heap_form_tuple(TupleDesc tupleDescripto
*** 635,641 ****
                  data_len;
      int            hoff;
      bool        hasnull = false;
-     Form_pg_attribute *att = tupleDescriptor->attrs;
      int            numberOfAttributes = tupleDescriptor->natts;
      int            i;

--- 670,675 ----
*************** heap_form_tuple(TupleDesc tupleDescripto
*** 646,673 ****
                          numberOfAttributes, MaxTupleAttributeNumber)));

      /*
!      * Check for nulls and embedded tuples; expand any toasted attributes in
!      * embedded tuples.  This preserves the invariant that toasting can only
!      * go one level deep.
!      *
!      * We can skip calling toast_flatten_tuple_attribute() if the attribute
!      * couldn't possibly be of composite type.  All composite datums are
!      * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
!      * if an attribute is already toasted, it must have been sent to disk
!      * already and so cannot contain toasted attributes.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
-             hasnull = true;
-         else if (att[i]->attlen == -1 &&
-                  att[i]->attalign == 'd' &&
-                  att[i]->attndims == 0 &&
-                  !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
          {
!             values[i] = toast_flatten_tuple_attribute(values[i],
!                                                       att[i]->atttypid,
!                                                       att[i]->atttypmod);
          }
      }

--- 680,693 ----
                          numberOfAttributes, MaxTupleAttributeNumber)));

      /*
!      * Check for nulls
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
          {
!             hasnull = true;
!             break;
          }
      }

*************** heap_form_tuple(TupleDesc tupleDescripto
*** 697,703 ****

      /*
       * And fill in the information.  Note we fill the Datum fields even though
!      * this tuple may never become a Datum.
       */
      tuple->t_len = len;
      ItemPointerSetInvalid(&(tuple->t_self));
--- 717,724 ----

      /*
       * And fill in the information.  Note we fill the Datum fields even though
!      * this tuple may never become a Datum.  This lets HeapTupleHeaderGetDatum
!      * identify the tuple type if needed.
       */
      tuple->t_len = len;
      ItemPointerSetInvalid(&(tuple->t_self));
*************** heap_form_minimal_tuple(TupleDesc tupleD
*** 1389,1395 ****
                  data_len;
      int            hoff;
      bool        hasnull = false;
-     Form_pg_attribute *att = tupleDescriptor->attrs;
      int            numberOfAttributes = tupleDescriptor->natts;
      int            i;

--- 1410,1415 ----
*************** heap_form_minimal_tuple(TupleDesc tupleD
*** 1400,1427 ****
                          numberOfAttributes, MaxTupleAttributeNumber)));

      /*
!      * Check for nulls and embedded tuples; expand any toasted attributes in
!      * embedded tuples.  This preserves the invariant that toasting can only
!      * go one level deep.
!      *
!      * We can skip calling toast_flatten_tuple_attribute() if the attribute
!      * couldn't possibly be of composite type.  All composite datums are
!      * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
!      * if an attribute is already toasted, it must have been sent to disk
!      * already and so cannot contain toasted attributes.
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
-             hasnull = true;
-         else if (att[i]->attlen == -1 &&
-                  att[i]->attalign == 'd' &&
-                  att[i]->attndims == 0 &&
-                  !VARATT_IS_EXTENDED(values[i]))
          {
!             values[i] = toast_flatten_tuple_attribute(values[i],
!                                                       att[i]->atttypid,
!                                                       att[i]->atttypmod);
          }
      }

--- 1420,1433 ----
                          numberOfAttributes, MaxTupleAttributeNumber)));

      /*
!      * Check for nulls
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
          if (isnull[i])
          {
!             hasnull = true;
!             break;
          }
      }

diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index b4c68e9..7da10e9 100644
*** a/src/backend/access/common/indextuple.c
--- b/src/backend/access/common/indextuple.c
*************** index_form_tuple(TupleDesc tupleDescript
*** 158,163 ****
--- 158,168 ----
      if (tupmask & HEAP_HASVARWIDTH)
          infomask |= INDEX_VAR_MASK;

+     /* Also assert we got rid of external attributes */
+ #ifdef TOAST_INDEX_HACK
+     Assert((tupmask & HEAP_HASEXTERNAL) == 0);
+ #endif
+
      /*
       * Here we make sure that the size will fit in the field reserved for it
       * in t_info.
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 9a821d3..dde74d4 100644
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
*************** toast_insert_or_update(Relation rel, Hea
*** 991,996 ****
--- 991,999 ----
   *
   *    "Flatten" a tuple to contain no out-of-line toasted fields.
   *    (This does not eliminate compressed or short-header datums.)
+  *
+  *    Note: we expect the caller already checked HeapTupleHasExternal(tup),
+  *    so there is no need for a short-circuit path.
   * ----------
   */
  HeapTuple
*************** toast_flatten_tuple(HeapTuple tup, Tuple
*** 1068,1126 ****


  /* ----------
!  * toast_flatten_tuple_attribute -
   *
!  *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
!  *    This must be invoked on any potentially-composite field that is to be
!  *    inserted into a tuple.    Doing this preserves the invariant that toasting
!  *    goes only one level deep in a tuple.
   *
!  *    Note that flattening does not mean expansion of short-header varlenas,
!  *    so in one sense toasting is allowed within composite datums.
   * ----------
   */
  Datum
! toast_flatten_tuple_attribute(Datum value,
!                               Oid typeId, int32 typeMod)
  {
-     TupleDesc    tupleDesc;
-     HeapTupleHeader olddata;
      HeapTupleHeader new_data;
      int32        new_header_len;
      int32        new_data_len;
      int32        new_tuple_len;
      HeapTupleData tmptup;
!     Form_pg_attribute *att;
!     int            numAttrs;
      int            i;
-     bool        need_change = false;
      bool        has_nulls = false;
      Datum        toast_values[MaxTupleAttributeNumber];
      bool        toast_isnull[MaxTupleAttributeNumber];
      bool        toast_free[MaxTupleAttributeNumber];

-     /*
-      * See if it's a composite type, and get the tupdesc if so.
-      */
-     tupleDesc = lookup_rowtype_tupdesc_noerror(typeId, typeMod, true);
-     if (tupleDesc == NULL)
-         return value;            /* not a composite type */
-
-     att = tupleDesc->attrs;
-     numAttrs = tupleDesc->natts;
-
-     /*
-      * Break down the tuple into fields.
-      */
-     olddata = DatumGetHeapTupleHeader(value);
-     Assert(typeId == HeapTupleHeaderGetTypeId(olddata));
-     Assert(typeMod == HeapTupleHeaderGetTypMod(olddata));
      /* Build a temporary HeapTuple control structure */
!     tmptup.t_len = HeapTupleHeaderGetDatumLength(olddata);
      ItemPointerSetInvalid(&(tmptup.t_self));
      tmptup.t_tableOid = InvalidOid;
!     tmptup.t_data = olddata;

      Assert(numAttrs <= MaxTupleAttributeNumber);
      heap_deform_tuple(&tmptup, tupleDesc, toast_values, toast_isnull);

--- 1071,1131 ----


  /* ----------
!  * toast_flatten_tuple_to_datum -
   *
!  *    "Flatten" a tuple containing out-of-line toasted fields into a Datum.
!  *    The result is always palloc'd in the current memory context.
   *
!  *    We have a general rule that Datums of container types (rows, arrays,
!  *    ranges, etc) must not contain any external TOAST pointers.  Without
!  *    this rule, we'd have to look inside each Datum when preparing a tuple
!  *    for storage, which would be expensive and would fail to extend cleanly
!  *    to new sorts of container types.
!  *
!  *    However, we don't want to say that tuples represented as HeapTuples
!  *    can't contain toasted fields, so instead this routine should be called
!  *    when such a HeapTuple is being converted into a Datum.
!  *
!  *    While we're at it, we decompress any compressed fields too.  This is not
!  *    necessary for correctness, but reflects an expectation that compression
!  *    will be more effective if applied to the whole tuple not individual
!  *    fields.  We are not so concerned about that that we want to deconstruct
!  *    and reconstruct tuples just to get rid of compressed fields, however.
!  *    So callers typically won't call this unless they see that the tuple has
!  *    at least one external field.
!  *
!  *    On the other hand, in-line short-header varlena fields are left alone.
!  *    If we "untoasted" them here, they'd just get changed back to short-header
!  *    format anyway within heap_fill_tuple.
   * ----------
   */
  Datum
! toast_flatten_tuple_to_datum(HeapTupleHeader tup,
!                              uint32 tup_len,
!                              TupleDesc tupleDesc)
  {
      HeapTupleHeader new_data;
      int32        new_header_len;
      int32        new_data_len;
      int32        new_tuple_len;
      HeapTupleData tmptup;
!     Form_pg_attribute *att = tupleDesc->attrs;
!     int            numAttrs = tupleDesc->natts;
      int            i;
      bool        has_nulls = false;
      Datum        toast_values[MaxTupleAttributeNumber];
      bool        toast_isnull[MaxTupleAttributeNumber];
      bool        toast_free[MaxTupleAttributeNumber];

      /* Build a temporary HeapTuple control structure */
!     tmptup.t_len = tup_len;
      ItemPointerSetInvalid(&(tmptup.t_self));
      tmptup.t_tableOid = InvalidOid;
!     tmptup.t_data = tup;

+     /*
+      * Break down the tuple into fields.
+      */
      Assert(numAttrs <= MaxTupleAttributeNumber);
      heap_deform_tuple(&tmptup, tupleDesc, toast_values, toast_isnull);

*************** toast_flatten_tuple_attribute(Datum valu
*** 1144,1164 ****
                  new_value = heap_tuple_untoast_attr(new_value);
                  toast_values[i] = PointerGetDatum(new_value);
                  toast_free[i] = true;
-                 need_change = true;
              }
          }
      }

      /*
-      * If nothing to untoast, just return the original tuple.
-      */
-     if (!need_change)
-     {
-         ReleaseTupleDesc(tupleDesc);
-         return value;
-     }
-
-     /*
       * Calculate the new size of the tuple.
       *
       * This should match the reconstruction code in toast_insert_or_update.
--- 1149,1159 ----
*************** toast_flatten_tuple_attribute(Datum valu
*** 1166,1172 ****
      new_header_len = offsetof(HeapTupleHeaderData, t_bits);
      if (has_nulls)
          new_header_len += BITMAPLEN(numAttrs);
!     if (olddata->t_infomask & HEAP_HASOID)
          new_header_len += sizeof(Oid);
      new_header_len = MAXALIGN(new_header_len);
      new_data_len = heap_compute_data_size(tupleDesc,
--- 1161,1167 ----
      new_header_len = offsetof(HeapTupleHeaderData, t_bits);
      if (has_nulls)
          new_header_len += BITMAPLEN(numAttrs);
!     if (tup->t_infomask & HEAP_HASOID)
          new_header_len += sizeof(Oid);
      new_header_len = MAXALIGN(new_header_len);
      new_data_len = heap_compute_data_size(tupleDesc,
*************** toast_flatten_tuple_attribute(Datum valu
*** 1178,1191 ****
      /*
       * Copy the existing tuple header, but adjust natts and t_hoff.
       */
!     memcpy(new_data, olddata, offsetof(HeapTupleHeaderData, t_bits));
      HeapTupleHeaderSetNatts(new_data, numAttrs);
      new_data->t_hoff = new_header_len;
!     if (olddata->t_infomask & HEAP_HASOID)
!         HeapTupleHeaderSetOid(new_data, HeapTupleHeaderGetOid(olddata));

!     /* Reset the datum length field, too */
      HeapTupleHeaderSetDatumLength(new_data, new_tuple_len);

      /* Copy over the data, and fill the null bitmap if needed */
      heap_fill_tuple(tupleDesc,
--- 1173,1188 ----
      /*
       * Copy the existing tuple header, but adjust natts and t_hoff.
       */
!     memcpy(new_data, tup, offsetof(HeapTupleHeaderData, t_bits));
      HeapTupleHeaderSetNatts(new_data, numAttrs);
      new_data->t_hoff = new_header_len;
!     if (tup->t_infomask & HEAP_HASOID)
!         HeapTupleHeaderSetOid(new_data, HeapTupleHeaderGetOid(tup));

!     /* Set the composite-Datum header fields correctly */
      HeapTupleHeaderSetDatumLength(new_data, new_tuple_len);
+     HeapTupleHeaderSetTypeId(new_data, tupleDesc->tdtypeid);
+     HeapTupleHeaderSetTypMod(new_data, tupleDesc->tdtypmod);

      /* Copy over the data, and fill the null bitmap if needed */
      heap_fill_tuple(tupleDesc,
*************** toast_flatten_tuple_attribute(Datum valu
*** 1202,1208 ****
      for (i = 0; i < numAttrs; i++)
          if (toast_free[i])
              pfree(DatumGetPointer(toast_values[i]));
-     ReleaseTupleDesc(tupleDesc);

      return PointerGetDatum(new_data);
  }
--- 1199,1204 ----
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 0eba025..833c4ed 100644
*** a/src/backend/executor/execQual.c
--- b/src/backend/executor/execQual.c
*************** ExecEvalWholeRowFast(WholeRowVarExprStat
*** 896,903 ****
  {
      Var           *variable = (Var *) wrvstate->xprstate.expr;
      TupleTableSlot *slot;
-     HeapTuple    tuple;
-     TupleDesc    tupleDesc;
      HeapTupleHeader dtuple;

      if (isDone)
--- 896,901 ----
*************** ExecEvalWholeRowFast(WholeRowVarExprStat
*** 927,958 ****
      if (wrvstate->wrv_junkFilter != NULL)
          slot = ExecFilterJunk(wrvstate->wrv_junkFilter, slot);

-     tuple = ExecFetchSlotTuple(slot);
-     tupleDesc = slot->tts_tupleDescriptor;
-
      /*
!      * We have to make a copy of the tuple so we can safely insert the Datum
!      * overhead fields, which are not set in on-disk tuples.
       */
!     dtuple = (HeapTupleHeader) palloc(tuple->t_len);
!     memcpy((char *) dtuple, (char *) tuple->t_data, tuple->t_len);
!
!     HeapTupleHeaderSetDatumLength(dtuple, tuple->t_len);

      /*
!      * If the Var identifies a named composite type, label the tuple with that
!      * type; otherwise use what is in the tupleDesc.
       */
      if (variable->vartype != RECORDOID)
      {
          HeapTupleHeaderSetTypeId(dtuple, variable->vartype);
          HeapTupleHeaderSetTypMod(dtuple, variable->vartypmod);
      }
-     else
-     {
-         HeapTupleHeaderSetTypeId(dtuple, tupleDesc->tdtypeid);
-         HeapTupleHeaderSetTypMod(dtuple, tupleDesc->tdtypmod);
-     }

      return PointerGetDatum(dtuple);
  }
--- 925,944 ----
      if (wrvstate->wrv_junkFilter != NULL)
          slot = ExecFilterJunk(wrvstate->wrv_junkFilter, slot);

      /*
!      * Copy the slot tuple and make sure any toasted fields get detoasted.
       */
!     dtuple = DatumGetHeapTupleHeader(ExecFetchSlotTupleDatum(slot));

      /*
!      * If the Var identifies a named composite type, label the datum with that
!      * type; otherwise we'll use the slot's info.
       */
      if (variable->vartype != RECORDOID)
      {
          HeapTupleHeaderSetTypeId(dtuple, variable->vartype);
          HeapTupleHeaderSetTypMod(dtuple, variable->vartypmod);
      }

      return PointerGetDatum(dtuple);
  }
*************** ExecEvalWholeRowSlow(WholeRowVarExprStat
*** 1029,1041 ****
      }

      /*
!      * We have to make a copy of the tuple so we can safely insert the Datum
!      * overhead fields, which are not set in on-disk tuples.
       */
!     dtuple = (HeapTupleHeader) palloc(tuple->t_len);
!     memcpy((char *) dtuple, (char *) tuple->t_data, tuple->t_len);

!     HeapTupleHeaderSetDatumLength(dtuple, tuple->t_len);
      HeapTupleHeaderSetTypeId(dtuple, variable->vartype);
      HeapTupleHeaderSetTypMod(dtuple, variable->vartypmod);

--- 1015,1027 ----
      }

      /*
!      * Copy the slot tuple and make sure any toasted fields get detoasted.
       */
!     dtuple = DatumGetHeapTupleHeader(ExecFetchSlotTupleDatum(slot));

!     /*
!      * Reset datum's type ID fields to match the Var.
!      */
      HeapTupleHeaderSetTypeId(dtuple, variable->vartype);
      HeapTupleHeaderSetTypMod(dtuple, variable->vartypmod);

diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index af74202..928b5e3 100644
*** a/src/backend/executor/execTuples.c
--- b/src/backend/executor/execTuples.c
***************
*** 89,94 ****
--- 89,95 ----
  #include "postgres.h"

  #include "access/htup_details.h"
+ #include "access/tuptoaster.h"
  #include "funcapi.h"
  #include "catalog/pg_type.h"
  #include "nodes/nodeFuncs.h"
*************** ExecFetchSlotMinimalTuple(TupleTableSlot
*** 700,726 ****
   *        ExecFetchSlotTupleDatum
   *            Fetch the slot's tuple as a composite-type Datum.
   *
!  *        We convert the slot's contents to local physical-tuple form,
!  *        and fill in the Datum header fields.  Note that the result
!  *        always points to storage owned by the slot.
   * --------------------------------
   */
  Datum
  ExecFetchSlotTupleDatum(TupleTableSlot *slot)
  {
      HeapTuple    tup;
-     HeapTupleHeader td;
      TupleDesc    tupdesc;

!     /* Make sure we can scribble on the slot contents ... */
!     tup = ExecMaterializeSlot(slot);
!     /* ... and set up the composite-Datum header fields, in case not done */
!     td = tup->t_data;
      tupdesc = slot->tts_tupleDescriptor;
!     HeapTupleHeaderSetDatumLength(td, tup->t_len);
!     HeapTupleHeaderSetTypeId(td, tupdesc->tdtypeid);
!     HeapTupleHeaderSetTypMod(td, tupdesc->tdtypmod);
!     return PointerGetDatum(td);
  }

  /* --------------------------------
--- 701,721 ----
   *        ExecFetchSlotTupleDatum
   *            Fetch the slot's tuple as a composite-type Datum.
   *
!  *        The result is always freshly palloc'd in the caller's memory context.
   * --------------------------------
   */
  Datum
  ExecFetchSlotTupleDatum(TupleTableSlot *slot)
  {
      HeapTuple    tup;
      TupleDesc    tupdesc;

!     /* Fetch slot's contents in regular-physical-tuple form */
!     tup = ExecFetchSlotTuple(slot);
      tupdesc = slot->tts_tupleDescriptor;
!
!     /* Convert to Datum form */
!     return heap_copy_tuple_as_datum(tup, tupdesc);
  }

  /* --------------------------------
*************** BuildTupleFromCStrings(AttInMetadata *at
*** 1133,1138 ****
--- 1128,1193 ----
  }

  /*
+  * HeapTupleHeaderGetDatum - convert a HeapTupleHeader pointer to a Datum.
+  *
+  * This must *not* get applied to an on-disk tuple; the tuple should be
+  * freshly made by heap_form_tuple or some wrapper routine for it (such as
+  * BuildTupleFromCStrings).  Be sure also that the tupledesc used to build
+  * the tuple has a properly "blessed" rowtype.
+  *
+  * Formerly this was a macro equivalent to PointerGetDatum, relying on the
+  * fact that heap_form_tuple fills in the appropriate tuple header fields
+  * for a composite Datum.  However, we now require that composite Datums not
+  * contain any external TOAST pointers.  We do not want heap_form_tuple itself
+  * to enforce that; more specifically, the rule applies only to actual Datums
+  * and not to HeapTuple structures.  Therefore, HeapTupleHeaderGetDatum is
+  * now a function that detects whether there are externally-toasted fields
+  * and constructs a new tuple with inlined fields if so.  We still need
+  * heap_form_tuple to insert the Datum header fields, because otherwise this
+  * code would have no way to obtain a tupledesc for the tuple.
+  *
+  * Note that if we do build a new tuple, it's palloc'd in the current
+  * memory context.    Beware of code that changes context between the initial
+  * heap_form_tuple/etc call and calling HeapTuple(Header)GetDatum.
+  *
+  * For performance-critical callers, it could be worthwhile to take extra
+  * steps to ensure that there aren't TOAST pointers in the output of
+  * heap_form_tuple to begin with.  It's likely however that the costs of the
+  * typcache lookup and tuple disassembly/reassembly are swamped by TOAST
+  * dereference costs, so that the benefits of such extra effort would be
+  * minimal.
+  *
+  * XXX it would likely be better to create wrapper functions that produce
+  * a composite Datum from the field values in one step.  However, there's
+  * enough code using the existing APIs that we couldn't get rid of this
+  * hack anytime soon.
+  */
+ Datum
+ HeapTupleHeaderGetDatum(HeapTupleHeader tuple)
+ {
+     Datum        result;
+     TupleDesc    tupDesc;
+
+     /* No work if there are no external TOAST pointers in the tuple */
+     if (!HeapTupleHeaderHasExternal(tuple))
+         return PointerGetDatum(tuple);
+
+     /* Use the type data saved by heap_form_tuple to look up the rowtype */
+     tupDesc = lookup_rowtype_tupdesc(HeapTupleHeaderGetTypeId(tuple),
+                                      HeapTupleHeaderGetTypMod(tuple));
+
+     /* And do the flattening */
+     result = toast_flatten_tuple_to_datum(tuple,
+                                         HeapTupleHeaderGetDatumLength(tuple),
+                                           tupDesc);
+
+     ReleaseTupleDesc(tupDesc);
+
+     return result;
+ }
+
+
+ /*
   * Functions for sending tuples to the frontend (or other specified destination)
   * as though it is a SELECT result. These are used by utility commands that
   * need to project directly to the destination and don't need or want full
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index b5af19a..f0a89d2 100644
*** a/src/backend/executor/functions.c
--- b/src/backend/executor/functions.c
*************** postquel_get_single_result(TupleTableSlo
*** 954,960 ****
          /* We must return the whole tuple as a Datum. */
          fcinfo->isnull = false;
          value = ExecFetchSlotTupleDatum(slot);
-         value = datumCopy(value, fcache->typbyval, fcache->typlen);
      }
      else
      {
--- 954,959 ----
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 36dcfcf..e0325c4 100644
*** a/src/backend/executor/spi.c
--- b/src/backend/executor/spi.c
*************** SPI_returntuple(HeapTuple tuple, TupleDe
*** 742,753 ****
          oldcxt = MemoryContextSwitchTo(_SPI_current->savedcxt);
      }

!     dtup = (HeapTupleHeader) palloc(tuple->t_len);
!     memcpy((char *) dtup, (char *) tuple->t_data, tuple->t_len);
!
!     HeapTupleHeaderSetDatumLength(dtup, tuple->t_len);
!     HeapTupleHeaderSetTypeId(dtup, tupdesc->tdtypeid);
!     HeapTupleHeaderSetTypMod(dtup, tupdesc->tdtypmod);

      if (oldcxt)
          MemoryContextSwitchTo(oldcxt);
--- 742,748 ----
          oldcxt = MemoryContextSwitchTo(_SPI_current->savedcxt);
      }

!     dtup = DatumGetHeapTupleHeader(heap_copy_tuple_as_datum(tuple, tupdesc));

      if (oldcxt)
          MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 438cbb3..521c3da 100644
*** a/src/backend/utils/adt/rowtypes.c
--- b/src/backend/utils/adt/rowtypes.c
***************
*** 19,24 ****
--- 19,25 ----
  #include "access/htup_details.h"
  #include "access/tuptoaster.h"
  #include "catalog/pg_type.h"
+ #include "funcapi.h"
  #include "libpq/pqformat.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index a3eba98..039d4b4 100644
*** a/src/include/access/htup_details.h
--- b/src/include/access/htup_details.h
*************** do { \
*** 476,481 ****
--- 476,484 ----
      (tup)->t_infomask2 = ((tup)->t_infomask2 & ~HEAP_NATTS_MASK) | (natts) \
  )

+ #define HeapTupleHeaderHasExternal(tup) \
+         (((tup)->t_infomask & HEAP_HASEXTERNAL) != 0)
+

  /*
   * BITMAPLEN(NATTS) -
*************** extern Datum heap_getsysattr(HeapTuple t
*** 730,735 ****
--- 733,739 ----
                  bool *isnull);
  extern HeapTuple heap_copytuple(HeapTuple tuple);
  extern void heap_copytuple_with_tuple(HeapTuple src, HeapTuple dest);
+ extern Datum heap_copy_tuple_as_datum(HeapTuple tuple, TupleDesc tupleDesc);
  extern HeapTuple heap_form_tuple(TupleDesc tupleDescriptor,
                  Datum *values, bool *isnull);
  extern HeapTuple heap_modify_tuple(HeapTuple tuple,
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 296d016..e038e1a 100644
*** a/src/include/access/tuptoaster.h
--- b/src/include/access/tuptoaster.h
*************** extern struct varlena *heap_tuple_untoas
*** 184,199 ****
  extern HeapTuple toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc);

  /* ----------
!  * toast_flatten_tuple_attribute -
   *
!  *    If a Datum is of composite type, "flatten" it to contain no toasted fields.
!  *    This must be invoked on any potentially-composite field that is to be
!  *    inserted into a tuple.    Doing this preserves the invariant that toasting
!  *    goes only one level deep in a tuple.
   * ----------
   */
! extern Datum toast_flatten_tuple_attribute(Datum value,
!                               Oid typeId, int32 typeMod);

  /* ----------
   * toast_compress_datum -
--- 184,197 ----
  extern HeapTuple toast_flatten_tuple(HeapTuple tup, TupleDesc tupleDesc);

  /* ----------
!  * toast_flatten_tuple_to_datum -
   *
!  *    "Flatten" a tuple containing out-of-line toasted fields into a Datum.
   * ----------
   */
! extern Datum toast_flatten_tuple_to_datum(HeapTupleHeader tup,
!                              uint32 tup_len,
!                              TupleDesc tupleDesc);

  /* ----------
   * toast_compress_datum -
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 22539ee..edb97f6 100644
*** a/src/include/fmgr.h
--- b/src/include/fmgr.h
*************** extern struct varlena *pg_detoast_datum_
*** 313,319 ****
  #define PG_RETURN_TEXT_P(x)    PG_RETURN_POINTER(x)
  #define PG_RETURN_BPCHAR_P(x)  PG_RETURN_POINTER(x)
  #define PG_RETURN_VARCHAR_P(x) PG_RETURN_POINTER(x)
! #define PG_RETURN_HEAPTUPLEHEADER(x)  PG_RETURN_POINTER(x)


  /*-------------------------------------------------------------------------
--- 313,319 ----
  #define PG_RETURN_TEXT_P(x)    PG_RETURN_POINTER(x)
  #define PG_RETURN_BPCHAR_P(x)  PG_RETURN_POINTER(x)
  #define PG_RETURN_VARCHAR_P(x) PG_RETURN_POINTER(x)
! #define PG_RETURN_HEAPTUPLEHEADER(x)  return HeapTupleHeaderGetDatum(x)


  /*-------------------------------------------------------------------------
diff --git a/src/include/funcapi.h b/src/include/funcapi.h
index 3610fc8..a3a12f7 100644
*** a/src/include/funcapi.h
--- b/src/include/funcapi.h
*************** extern TupleDesc build_function_result_t
*** 200,205 ****
--- 200,207 ----
   * HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values) -
   *        build a HeapTuple given user data in C string form. values is an array
   *        of C strings, one for each attribute of the return tuple.
+  * Datum HeapTupleHeaderGetDatum(HeapTupleHeader tuple) - convert a
+  *        HeapTupleHeader to a Datum.
   *
   * Macro declarations:
   * HeapTupleGetDatum(HeapTuple tuple) - convert a HeapTuple to a Datum.
*************** extern TupleDesc build_function_result_t
*** 216,224 ****
   *----------
   */

! #define HeapTupleGetDatum(_tuple)        PointerGetDatum((_tuple)->t_data)
  /* obsolete version of above */
! #define TupleGetDatum(_slot, _tuple)    PointerGetDatum((_tuple)->t_data)

  extern TupleDesc RelationNameGetTupleDesc(const char *relname);
  extern TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases);
--- 218,226 ----
   *----------
   */

! #define HeapTupleGetDatum(tuple)        HeapTupleHeaderGetDatum((tuple)->t_data)
  /* obsolete version of above */
! #define TupleGetDatum(_slot, _tuple)    HeapTupleGetDatum(_tuple)

  extern TupleDesc RelationNameGetTupleDesc(const char *relname);
  extern TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases);
*************** extern TupleDesc TypeGetTupleDesc(Oid ty
*** 227,232 ****
--- 229,235 ----
  extern TupleDesc BlessTupleDesc(TupleDesc tupdesc);
  extern AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc);
  extern HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values);
+ extern Datum HeapTupleHeaderGetDatum(HeapTupleHeader tuple);
  extern TupleTableSlot *TupleDescGetSlot(TupleDesc tupdesc);


diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 3749fac..5d54399 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*************** exec_eval_datum(PLpgSQL_execstate *estat
*** 4473,4490 ****
                  tup = make_tuple_from_row(estate, row, row->rowtupdesc);
                  if (tup == NULL)    /* should not happen */
                      elog(ERROR, "row not compatible with its own tupdesc");
-                 MemoryContextSwitchTo(oldcontext);
                  *typeid = row->rowtupdesc->tdtypeid;
                  *typetypmod = row->rowtupdesc->tdtypmod;
                  *value = HeapTupleGetDatum(tup);
                  *isnull = false;
                  break;
              }

          case PLPGSQL_DTYPE_REC:
              {
                  PLpgSQL_rec *rec = (PLpgSQL_rec *) datum;
-                 HeapTupleData worktup;

                  if (!HeapTupleIsValid(rec->tup))
                      ereport(ERROR,
--- 4473,4489 ----
                  tup = make_tuple_from_row(estate, row, row->rowtupdesc);
                  if (tup == NULL)    /* should not happen */
                      elog(ERROR, "row not compatible with its own tupdesc");
                  *typeid = row->rowtupdesc->tdtypeid;
                  *typetypmod = row->rowtupdesc->tdtypmod;
                  *value = HeapTupleGetDatum(tup);
                  *isnull = false;
+                 MemoryContextSwitchTo(oldcontext);
                  break;
              }

          case PLPGSQL_DTYPE_REC:
              {
                  PLpgSQL_rec *rec = (PLpgSQL_rec *) datum;

                  if (!HeapTupleIsValid(rec->tup))
                      ereport(ERROR,
*************** exec_eval_datum(PLpgSQL_execstate *estat
*** 4496,4516 ****
                  /* Make sure we have a valid type/typmod setting */
                  BlessTupleDesc(rec->tupdesc);

-                 /*
-                  * In a trigger, the NEW and OLD parameters are likely to be
-                  * on-disk tuples that don't have the desired Datum fields.
-                  * Copy the tuple body and insert the right values.
-                  */
                  oldcontext = MemoryContextSwitchTo(estate->eval_econtext->ecxt_per_tuple_memory);
-                 heap_copytuple_with_tuple(rec->tup, &worktup);
-                 HeapTupleHeaderSetDatumLength(worktup.t_data, worktup.t_len);
-                 HeapTupleHeaderSetTypeId(worktup.t_data, rec->tupdesc->tdtypeid);
-                 HeapTupleHeaderSetTypMod(worktup.t_data, rec->tupdesc->tdtypmod);
-                 MemoryContextSwitchTo(oldcontext);
                  *typeid = rec->tupdesc->tdtypeid;
                  *typetypmod = rec->tupdesc->tdtypmod;
!                 *value = HeapTupleGetDatum(&worktup);
                  *isnull = false;
                  break;
              }

--- 4495,4506 ----
                  /* Make sure we have a valid type/typmod setting */
                  BlessTupleDesc(rec->tupdesc);

                  oldcontext = MemoryContextSwitchTo(estate->eval_econtext->ecxt_per_tuple_memory);
                  *typeid = rec->tupdesc->tdtypeid;
                  *typetypmod = rec->tupdesc->tdtypmod;
!                 *value = heap_copy_tuple_as_datum(rec->tup, rec->tupdesc);
                  *isnull = false;
+                 MemoryContextSwitchTo(oldcontext);
                  break;
              }

diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index 6dce5c9..4286691 100644
*** a/src/test/regress/expected/arrays.out
--- b/src/test/regress/expected/arrays.out
*************** select * from t1;
*** 1676,1678 ****
--- 1676,1708 ----
   [5:5]={"(42,43)"}
  (1 row)

+ -- Check that arrays of composites are safely detoasted when needed
+ create temp table src (f1 text);
+ insert into src
+   select string_agg(random()::text,'') from generate_series(1,10000);
+ create type textandtext as (c1 text, c2 text);
+ create temp table dest (f1 textandtext[]);
+ insert into dest select array[row(f1,f1)::textandtext] from src;
+ select length(md5((f1[1]).c2)) from dest;
+  length
+ --------
+      32
+ (1 row)
+
+ delete from src;
+ select length(md5((f1[1]).c2)) from dest;
+  length
+ --------
+      32
+ (1 row)
+
+ truncate table src;
+ drop table src;
+ select length(md5((f1[1]).c2)) from dest;
+  length
+ --------
+      32
+ (1 row)
+
+ drop table dest;
+ drop type textandtext;
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index c25bf6e..4b69067 100644
*** a/src/test/regress/regress.c
--- b/src/test/regress/regress.c
*************** make_tuple_indirect(PG_FUNCTION_ARGS)
*** 810,814 ****

      MemoryContextSwitchTo(old_context);

!     PG_RETURN_HEAPTUPLEHEADER(newtup->t_data);
  }
--- 810,823 ----

      MemoryContextSwitchTo(old_context);

!     /*
!      * We intentionally don't use PG_RETURN_HEAPTUPLEHEADER here, because that
!      * would cause the indirect toast pointers to be flattened out of the
!      * tuple immediately, rendering subsequent testing irrelevant.  So just
!      * return the HeapTupleHeader pointer as-is.  This violates the general
!      * rule that composite Datums shouldn't contain toast pointers, but so
!      * long as the regression test scripts don't insert the result of this
!      * function into a container type (record, array, etc) it should be OK.
!      */
!     PG_RETURN_POINTER(newtup->t_data);
  }
diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql
index 92af172..d9f7cbf 100644
*** a/src/test/regress/sql/arrays.sql
--- b/src/test/regress/sql/arrays.sql
*************** insert into t1 (f1[5].q1) values(42);
*** 459,461 ****
--- 459,478 ----
  select * from t1;
  update t1 set f1[5].q2 = 43;
  select * from t1;
+
+ -- Check that arrays of composites are safely detoasted when needed
+
+ create temp table src (f1 text);
+ insert into src
+   select string_agg(random()::text,'') from generate_series(1,10000);
+ create type textandtext as (c1 text, c2 text);
+ create temp table dest (f1 textandtext[]);
+ insert into dest select array[row(f1,f1)::textandtext] from src;
+ select length(md5((f1[1]).c2)) from dest;
+ delete from src;
+ select length(md5((f1[1]).c2)) from dest;
+ truncate table src;
+ drop table src;
+ select length(md5((f1[1]).c2)) from dest;
+ drop table dest;
+ drop type textandtext;

Re: Composite Datums containing toasted fields are a bad idea(?)

От
Heikki Linnakangas
Дата:
On 04/25/2014 02:40 AM, Tom Lane wrote:
> * The patch changes HeapTupleGetDatum from a simple inline macro into
> a function call.  This means that third-party extensions will not get
> protection against creation of toast-pointer-containing composite Datums
> until they recompile.

One consequence of that is that an extension compiled with headers from 
new minor version won't work with binaries from an older minor version. 
Packagers beware.

- Heikki



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 04/25/2014 02:40 AM, Tom Lane wrote:
>> * The patch changes HeapTupleGetDatum from a simple inline macro into
>> a function call.  This means that third-party extensions will not get
>> protection against creation of toast-pointer-containing composite Datums
>> until they recompile.

> One consequence of that is that an extension compiled with headers from 
> new minor version won't work with binaries from an older minor version. 
> Packagers beware.

Yeah, that's a possible issue, though I think we've done such things
before.  In any case, alternative approaches to fixing this would likely
also involve extensions needing to call core functions that don't exist
today; though maybe the number of affected extensions would be smaller.
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
Hi,

On 2014-04-24 19:40:30 -0400, Tom Lane wrote:
> * Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
> might happen if the caller changes CurrentMemoryContext between
> heap_form_tuple and HeapTupleGetDatum.

It's fscking ugly to allocate memory in a PG_RETURN_... But I don't
really have a better backward compatible idea :(

Greetings,

Andres Freund



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-04-24 19:40:30 -0400, Tom Lane wrote:
>> * Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
>> might happen if the caller changes CurrentMemoryContext between
>> heap_form_tuple and HeapTupleGetDatum.

> It's fscking ugly to allocate memory in a PG_RETURN_... But I don't
> really have a better backward compatible idea :(

It's hardly without precedent; see PG_RETURN_INT64 or PG_RETURN_FLOAT8 on
a 32-bit machine, for starters.  There's never been an assumption that
these macros couldn't do that.
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
On 2014-04-25 11:22:09 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-04-24 19:40:30 -0400, Tom Lane wrote:
> >> * Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
> >> might happen if the caller changes CurrentMemoryContext between
> >> heap_form_tuple and HeapTupleGetDatum.
> 
> > It's fscking ugly to allocate memory in a PG_RETURN_... But I don't
> > really have a better backward compatible idea :(
> 
> It's hardly without precedent; see PG_RETURN_INT64 or PG_RETURN_FLOAT8 on
> a 32-bit machine, for starters.  There's never been an assumption that
> these macros couldn't do that.

There's a fair bit of difference between allocating 8 bytes and
allocation of nearly unbounded size... But as I said, I don't really
have a better idea.

I agree that the risk from this patch seems more manageable than your
previous approach.

The case I am worried most about is queries like:
SELECT a, b FROM f WHERE f > ROW(38, 'whatever') ORDER BY f;
I've seen such generated by a some query generators for paging. But I
guess that's something we're going to have to accept.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> The case I am worried most about is queries like:
> SELECT a, b FROM f WHERE f > ROW(38, 'whatever') ORDER BY f;
> I've seen such generated by a some query generators for paging. But I
> guess that's something we're going to have to accept.

Meh ... is it likely that the columns involved in an ordering comparison
would be so wide as to be toasted out-of-line?  Such a query would only be
fast if the row value were indexed, which would pretty much preclude use
of wide columns.

I'm actually more worried about the function-returning-tuple case, as that
might bite people who thought they'd use some cute functional notation or
other and it wouldn't cost 'em anything.
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > The case I am worried most about is queries like:
> > SELECT a, b FROM f WHERE f > ROW(38, 'whatever') ORDER BY f;
> > I've seen such generated by a some query generators for paging. But I
> > guess that's something we're going to have to accept.
> 
> Meh ... is it likely that the columns involved in an ordering comparison
> would be so wide as to be toasted out-of-line?  Such a query would only be
> fast if the row value were indexed, which would pretty much preclude use
> of wide columns.

In the cases I've seen it it was usually used in addition to a indexable
condition, just for paging across different http requests.

As completely ridiculous example:
before:
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_rewrite r WHERE r > ('x'::name, '11854'::oid, NULL, NULL, NULL,
NULL);                                              QUERY PLAN                                                
 
----------------------------------------------------------------------------------------------------------Seq Scan on
pg_rewriter  (cost=0.00..12.36 rows=36 width=720) (actual time=0.425..0.425 rows=0 loops=1)  Filter: (r.* >
ROW('x'::name,11854::oid, NULL::unknown, NULL::unknown, NULL::unknown, NULL::unknown))  Rows Removed by Filter: 109
Buffers:shared hit=11Planning time: 0.141 msExecution time: 0.485 ms
 

after:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_rewrite r WHERE r > ('x'::name, '11854'::oid, NULL, NULL, NULL, NULL);
                                         QUERY PLAN                                                 
 
------------------------------------------------------------------------------------------------------------Seq Scan on
pg_rewriter  (cost=0.00..12.36 rows=36 width=720) (actual time=14.257..14.257 rows=0 loops=1)  Filter: (r.* >
ROW('x'::name,11854::oid, NULL::unknown, NULL::unknown, NULL::unknown, NULL::unknown))  Rows Removed by Filter: 109
Buffers:shared hit=152Planning time: 0.139 msExecution time: 14.310 ms
 
(6 rows)


> I'm actually more worried about the function-returning-tuple case, as that
> might bite people who thought they'd use some cute functional notation or
> other and it wouldn't cost 'em anything.

Right, that's not actually all that infrequent :/.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
On 2014-04-25 18:25:44 +0200, Andres Freund wrote:
> On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > The case I am worried most about is queries like:
> > > SELECT a, b FROM f WHERE f > ROW(38, 'whatever') ORDER BY f;
> > > I've seen such generated by a some query generators for paging. But I
> > > guess that's something we're going to have to accept.
> > 
> > Meh ... is it likely that the columns involved in an ordering comparison
> > would be so wide as to be toasted out-of-line?  Such a query would only be
> > fast if the row value were indexed, which would pretty much preclude use
> > of wide columns.
> 
> In the cases I've seen it it was usually used in addition to a indexable
> condition, just for paging across different http requests.
> 
> As completely ridiculous example:

> before:
> postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_rewrite r WHERE r > ('x'::name, '11854'::oid, NULL, NULL,
NULL,NULL);
 
>                                                 QUERY PLAN                                                

Just for some clarity, that also happens with expressions like:
WHERE   ROW(ev_class, rulename, ev_action) >= ROW('pg_rewrite'::regclass, '_RETURN', NULL)
ORDER BY ROW(ev_class, rulename, ev_action);

which is what is generated by such query generators - where the leading
columns *are* indexed but not necessarily unique.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
>>> Meh ... is it likely that the columns involved in an ordering comparison
>>> would be so wide as to be toasted out-of-line?  Such a query would only be
>>> fast if the row value were indexed, which would pretty much preclude use
>>> of wide columns.

> Just for some clarity, that also happens with expressions like:
> WHERE
>     ROW(ev_class, rulename, ev_action) >= ROW('pg_rewrite'::regclass, '_RETURN', NULL)
> ORDER BY ROW(ev_class, rulename, ev_action);

> which is what is generated by such query generators - where the leading
> columns *are* indexed but not necessarily unique.

Ah, I see.  Well, we're pretty darn stupid about such queries anyway :-(.
Your first example could be greatly improved by expanding the whole-row
Var into a ROW() construct (so that RowCompareExpr could be used), and
the second one by exploding the ROW() order-by into separate order-by
columns.  Maybe someday we can do that, or persuade the query generators
not to generate such brain-dead SQL in the first place.  But in the
meantime these coding techniques lead to highly suboptimal plans anyway,
with or without TOAST considerations.  It's also worth noting that
it's merest luck that the existing code isn't *slower* about such
queries; if there were any significant number of comparisons of the
toasted columns occurring during the sort step, it could come out far
behind.  So I'm not finding myself terribly concerned here.

Also, I did a bit more research and verified that my patch doesn't cause
any extra detoasting activity for simple set-returning-function cases,
for example:

regression=# create or replace function pgr() returns setof pg_rewrite as
'declare r pg_rewrite;                                     
begin
for r in select * from pg_rewrite loop return next r;
end loop;
end' language plpgsql;
CREATE FUNCTION

regression=# explain (analyze, buffers) select r.* from pgr() r;                                                QUERY
PLAN                    
 
------------------------------------------------------------------------------------------------------------Function
Scanon pgr r  (cost=0.25..10.25 rows=1000 width=135) (actual time=0.881..0.911 rows=177 loops=1)  Buffers: shared
hit=36Planningtime: 0.059 msExecution time: 0.986 ms
 

The same for SQL-language functions, either inlined or not.  It's not so
good if you insist on putting the SRF call in the targetlist:

explain (analyze, buffers) select pgr();                                         QUERY PLAN

 
------------------------------------------------------------------------------------------Result  (cost=0.00..5.25
rows=1000width=0) (actual time=0.941..10.575 rows=177 loops=1)  Buffers: shared hit=179Planning time: 0.029 msExecution
time:10.677 ms
 

On the other hand, in real-world usage (not EXPLAIN), a query like that is
certainly going to be detoasting all the fields anyway to return them to
the client.

On the whole I feel fairly good about the opinion that this change won't
be disastrous for mainstream usages, and will be beneficial for
performance some of the time.  Since I'm not hearing any volunteers to
try to convert the other approach into a complete patch, I plan to push
forward with this one.
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
On 2014-04-27 14:18:46 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> >> On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
> >>> Meh ... is it likely that the columns involved in an ordering comparison
> >>> would be so wide as to be toasted out-of-line?  Such a query would only be
> >>> fast if the row value were indexed, which would pretty much preclude use
> >>> of wide columns.
> 
> > Just for some clarity, that also happens with expressions like:
> > WHERE
> >     ROW(ev_class, rulename, ev_action) >= ROW('pg_rewrite'::regclass, '_RETURN', NULL)
> > ORDER BY ROW(ev_class, rulename, ev_action);
> 
> > which is what is generated by such query generators - where the leading
> > columns *are* indexed but not necessarily unique.
> 
> Ah, I see.  Well, we're pretty darn stupid about such queries anyway :-(.
> Your first example could be greatly improved by expanding the whole-row
> Var into a ROW() construct (so that RowCompareExpr could be used), and
> the second one by exploding the ROW() order-by into separate order-by
> columns.

The problem is that - at least to my knowledge - it's not possible to do
the WHERE part as an indexable clause using individual columns.

> On the whole I feel fairly good about the opinion that this change won't
> be disastrous for mainstream usages, and will be beneficial for
> performance some of the time.

I am less convinced of that. But I don't have a better idea. How about
letting it stew in HEAD for a while? It's not like it's affecting all
that many people, given the amount of reports over the last couple
years.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-04-27 14:18:46 -0400, Tom Lane wrote:
>> Ah, I see.  Well, we're pretty darn stupid about such queries anyway :-(.
>> Your first example could be greatly improved by expanding the whole-row
>> Var into a ROW() construct (so that RowCompareExpr could be used), and
>> the second one by exploding the ROW() order-by into separate order-by
>> columns.

> The problem is that - at least to my knowledge - it's not possible to do
> the WHERE part as an indexable clause using individual columns.

You mean like this?

regression=# EXPLAIN verbose SELECT * FROM pg_rewrite r WHERE r > ('x'::name, '11854'::oid, NULL, NULL, NULL, NULL,
null);                                                     QUERY PLAN
    
 

-------------------------------------------------------------------------------------------------------------------------Seq
Scanon pg_catalog.pg_rewrite r  (cost=0.00..46.21 rows=59 width=983)  Output: rulename, ev_class, ev_type, ev_enabled,
is_instead,ev_qual, ev_action  Filter: (r.* > ROW('x'::name, 11854::oid, NULL::unknown, NULL::unknown, NULL::unknown,
NULL::unknown,NULL::unknown))Planning time: 0.119 ms
 
(4 rows)

regression=# EXPLAIN verbose SELECT * FROM pg_rewrite r WHERE row(rulename, ev_class, ev_type, ev_enabled, is_instead,
ev_qual,ev_action) > ('x'::name, '11854'::oid, NULL, NULL, NULL, NULL, null);
                                                          QUERY PLAN
                                                  
 

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Index
Scanusing pg_rewrite_rel_rulename_index on pg_catalog.pg_rewrite r  (cost=0.15..13.50 rows=1 width=983)  Output:
rulename,ev_class, ev_type, ev_enabled, is_instead, ev_qual, ev_action  Index Cond: (ROW(r.rulename, r.ev_class) >=
ROW('x'::name,11854::oid))  Filter: (ROW(r.rulename, r.ev_class, r.ev_type, r.ev_enabled, r.is_instead,
(r.ev_qual)::text,(r.ev_action)::text) > ROW('x'::name, 11854::oid, NULL::"char", NULL::"char", NULL::boolean,
NULL::text,NULL::text))Planning time: 0.201 ms
 
(5 rows)

The code for extracting prefixes of RowCompareExprs like that has existed
for quite some time.  But it doesn't understand about whole-row variables.

>> On the whole I feel fairly good about the opinion that this change won't
>> be disastrous for mainstream usages, and will be beneficial for
>> performance some of the time.

> I am less convinced of that. But I don't have a better idea. How about
> letting it stew in HEAD for a while? It's not like it's affecting all
> that many people, given the amount of reports over the last couple
> years.

Well, mumble.  It's certainly true that it took a long time for someone to
produce a reproducible test case.  But it's not like we don't regularly
hear reports of corrupted data with "missing chunk number 0 for toast
value ...".  Are you really going to argue that few of those reports can
be blamed on this class of bug?  If so, on what evidence?  Of course
I have no evidence either to claim that this *is* biting people; we don't
know, since it never previously occurred to us to ask complainants if they
were using arrays-of-composites or one of the other risk cases.  But it
seems to me that letting a known data-loss bug go unfixed on the grounds
that the fix might create performance issues for some people is not a
prudent way to proceed.  People expect a database to store their data
reliably, period.
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
On 2014-04-27 17:49:53 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-04-27 14:18:46 -0400, Tom Lane wrote:
> >> Ah, I see.  Well, we're pretty darn stupid about such queries anyway :-(.
> >> Your first example could be greatly improved by expanding the whole-row
> >> Var into a ROW() construct (so that RowCompareExpr could be used), and
> >> the second one by exploding the ROW() order-by into separate order-by
> >> columns.
> 
> > The problem is that - at least to my knowledge - it's not possible to do
> > the WHERE part as an indexable clause using individual columns.
> 
> You mean like this?
> 
> ..
> The code for extracting prefixes of RowCompareExprs like that has existed
> for quite some time.  But it doesn't understand about whole-row variables.

Yea, but:

> Just for some clarity, that also happens with expressions like:
> WHERE
>     ROW(ev_class, rulename, ev_action) >= ROW('pg_rewrite'::regclass, '_RETURN', NULL)
> ORDER BY ROW(ev_class, rulename, ev_action);

Previously we wouldn't detoast ev_action here, but now we do.

e.g.
set enable_seqscan = false; set enable_bitmapscan = false;
EXPLAIN (ANALYZE, BUFFERS)
SELECT oid
FROM pg_rewrite
WHERE ROW(ev_class, rulename, ev_action) >= ROW('pg_user_mappings'::regclass, '_RETURN',NULL)
ORDER BY ROW(ev_class, rulename, ev_action);

goes from 0.527ms to 9.698ms. And that's damn few rows.

> >> On the whole I feel fairly good about the opinion that this change won't
> >> be disastrous for mainstream usages, and will be beneficial for
> >> performance some of the time.
> 
> > I am less convinced of that. But I don't have a better idea. How about
> > letting it stew in HEAD for a while? It's not like it's affecting all
> > that many people, given the amount of reports over the last couple
> > years.
> 
> Well, mumble.  It's certainly true that it took a long time for someone to
> produce a reproducible test case.  But it's not like we don't regularly
> hear reports of corrupted data with "missing chunk number 0 for toast
> value ...".  Are you really going to argue that few of those reports can
> be blamed on this class of bug?  If so, on what evidence?

No, I am not claiming that.

> Of course
> I have no evidence either to claim that this *is* biting people; we don't
> know, since it never previously occurred to us to ask complainants if they
> were using arrays-of-composites or one of the other risk cases.  But it
> seems to me that letting a known data-loss bug go unfixed on the grounds
> that the fix might create performance issues for some people is not a
> prudent way to proceed.  People expect a database to store their data
> reliably, period.

I agree that this needs to get backpatched at some point. But people
also get very upset if queries gets slower by a magnitude or two after a
minor version upgrade. And this does have the potential to do that, no?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
>> Just for some clarity, that also happens with expressions like:
>> WHERE
>> ROW(ev_class, rulename, ev_action) >= ROW('pg_rewrite'::regclass, '_RETURN', NULL)
>> ORDER BY ROW(ev_class, rulename, ev_action);

> Previously we wouldn't detoast ev_action here, but now we do.

No, we don't; not in that WHERE expression, because it's a RowCompareExpr
which does not involve forming any composite datums.  We'd only detoast if
the row comparison actually gets to that column --- which is the same
behavior as before.

The extra detoast you're seeing in this example is caused by the ROW()
in the ORDER BY.  Which, as I said, is just bad SQL coding.

> I agree that this needs to get backpatched at some point. But people
> also get very upset if queries gets slower by a magnitude or two after a
> minor version upgrade. And this does have the potential to do that, no?

They don't get nearly as upset as they do if the database loses their
data.  Yes, in an ideal world, we could fix this and not suffer any
performance loss anywhere.  But no such option is on the table, and
waiting is not going to make one appear.  What waiting *will* do is
afford more opportunities for this bug to eat people's data.
        regards, tom lane



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Andres Freund
Дата:
On 2014-04-27 18:44:16 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> >> Just for some clarity, that also happens with expressions like:
> >> WHERE
> >> ROW(ev_class, rulename, ev_action) >= ROW('pg_rewrite'::regclass, '_RETURN', NULL)
> >> ORDER BY ROW(ev_class, rulename, ev_action);
> 
> > Previously we wouldn't detoast ev_action here, but now we do.
> ...
> The extra detoast you're seeing in this example is caused by the ROW()
> in the ORDER BY.  Which, as I said, is just bad SQL coding.

Good point.

> > I agree that this needs to get backpatched at some point. But people
> > also get very upset if queries gets slower by a magnitude or two after a
> > minor version upgrade. And this does have the potential to do that, no?
> 
> They don't get nearly as upset as they do if the database loses their
> data.  Yes, in an ideal world, we could fix this and not suffer any
> performance loss anywhere.  But no such option is on the table, and
> waiting is not going to make one appear.  What waiting *will* do is
> afford more opportunities for this bug to eat people's data.

We make tradeoffs between performance and safety *all the time*. A new
performance regression that has the potential of affecting a fair number
of installations very well can be worse than a decade old correctness
bug. A bug that only will hit in some specific usage scenarios and will
only affect individual datums.

And you *have* come up with an alternative approach. It might very well
be inferior, but that certainly doesn't make this approach one without
alternatives.

Anyway, I've now voiced my doubts about immediate backpatching. Any
other opinions?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Composite Datums containing toasted fields are a bad idea(?)

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-04-27 18:44:16 -0400, Tom Lane wrote:
>> They don't get nearly as upset as they do if the database loses their
>> data.  Yes, in an ideal world, we could fix this and not suffer any
>> performance loss anywhere.  But no such option is on the table, and
>> waiting is not going to make one appear.  What waiting *will* do is
>> afford more opportunities for this bug to eat people's data.

> We make tradeoffs between performance and safety *all the time*.

Really?  I'm not aware of any case where we've left unrecoverable data
corruption go unfixed.  It's true that we've rejected fixing some cases
where plpgsql code might transiently try to use a stale toast pointer ---
but that's a lot different from losing stored data permanently.

> And you *have* come up with an alternative approach.

No, I haven't.  I experimented with a different approach, at Noah's
insistence.  But I did not and will not try to extend it to a complete
solution, first because I'm unsure that a 100% solution can reasonably
be reached that way, and second because that approach *also* entails
performance penalties --- unavoidable ones, unlike this approach where
people can modify their SQL code to avoid fetching unnecessary columns.

If somebody *else* is sufficiently hell-bent on doing it that way that
they will take responsibility for completing and back-patching that
approach, I will stand aside and let them find out the downsides.
Otherwise, I propose to commit and back-patch this version.
        regards, tom lane