Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

From: Tom Lane
Subject: Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Date:
Msg-id: 23800.1332947161@sss.pgh.pa.us
In reply to: Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)  (Peter Geoghegan <peter@2ndquadrant.com>)
List: pgsql-hackers
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Since you've already removed the intoClause chunk, I'm not sure how
> far underway the review effort is - would you like me to produce a new
> revision, or is that unnecessary?

I've whacked it around to the point that that wouldn't be too helpful
as far as the code goes.  (Just for transparency I'll attach what I've
currently got, which mostly consists of getting rid of the static state
and cleaning up the scanner interface a bit.  I've not yet touched the
jumble-producing code, but I think it needs work too.)  However, if
you've got or can produce the appropriate documentation updates, that
would save me some time.

            regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 5d3bea09b1b63df0dcfd1b1e0dc1817025190176..b95333e5805e10c4290bb2f0575db30e906a445e 100644
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 3,8 ****
--- 3,31 ----
   * pg_stat_statements.c
   *        Track statement execution times across a whole database cluster.
   *
+  * Execution costs are totalled for each distinct source query, and kept in
+  * a shared hashtable.  (We track only as many distinct queries as will fit
+  * in the designated amount of shared memory.)
+  *
+  * As of Postgres 9.2, this module normalizes query entries.  Normalization
+  * is a process whereby similar queries, typically differing only in their
+  * constants (though the exact rules are somewhat more subtle than that) are
+  * recognized as equivalent, and are tracked as a single entry.  This is
+  * particularly useful for non-prepared queries.
+  *
+  * Normalization is implemented by fingerprinting queries, selectively
+  * serializing those fields of each query tree's nodes that are judged to be
+  * essential to the query.  This is referred to as a query jumble.  This is
+  * distinct from a regular serialization in that various extraneous
+  * information is ignored as irrelevant or not essential to the query, such
+  * as the collation of Vars and, most notably, the values of constants.
+  *
+  * This jumble is acquired at the end of parse analysis of each query, and
+  * a 32-bit hash of it is stored into the query's Query.queryId field.
+  * The server then copies this value around, making it available in plan
+  * tree(s) generated from the query.  The executor can then use this value
+  * to blame query costs on the proper queryId.
+  *
   * Note about locking issues: to create or delete an entry in the shared
   * hashtable, one must hold pgss->lock exclusively.  Modifying any field
   * in an entry except the counters requires the same.  To look up an entry,
***************
*** 27,32 ****
--- 50,58 ----
  #include "funcapi.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
+ #include "parser/analyze.h"
+ #include "parser/parsetree.h"
+ #include "parser/scanner.h"
  #include "pgstat.h"
  #include "storage/fd.h"
  #include "storage/ipc.h"
*************** PG_MODULE_MAGIC;
*** 41,58 ****
  #define PGSS_DUMP_FILE    "global/pg_stat_statements.stat"

  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20100108;

  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)    (1.0)
  #define USAGE_INIT                (1.0)    /* including initial planning */
  #define USAGE_DECREASE_FACTOR    (0.99)    /* decreased every entry_dealloc */
  #define USAGE_DEALLOC_PERCENT    5        /* free this % of entries at once */

  /*
!  * Hashtable key that defines the identity of a hashtable entry.  The
!  * hash comparators do not assume that the query string is null-terminated;
!  * this lets us search for an mbcliplen'd string without copying it first.
   *
   * Presently, the query encoding is fully determined by the source database
   * and so we don't really need it to be in the key.  But that might not always
--- 67,88 ----
  #define PGSS_DUMP_FILE    "global/pg_stat_statements.stat"

  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20120103;

  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)    (1.0)
  #define USAGE_INIT                (1.0)    /* including initial planning */
+ #define USAGE_NON_EXEC_STICK    (1.0e1)    /* to make new entries sticky */
  #define USAGE_DECREASE_FACTOR    (0.99)    /* decreased every entry_dealloc */
  #define USAGE_DEALLOC_PERCENT    5        /* free this % of entries at once */
+ #define JUMBLE_SIZE                1024    /* query serialization buffer size */
+ /* Magic values for jumble */
+ #define MAG_RETURN_LIST            0xAE    /* returning list node follows */
+ #define MAG_LIMIT_OFFSET        0xBA    /* limit/offset node follows */

  /*
!  * Hashtable key that defines the identity of a hashtable entry.  We separate
!  * queries by user and by database even if they are otherwise identical.
   *
   * Presently, the query encoding is fully determined by the source database
   * and so we don't really need it to be in the key.  But that might not always
*************** typedef struct pgssHashKey
*** 63,70 ****
      Oid            userid;            /* user OID */
      Oid            dbid;            /* database OID */
      int            encoding;        /* query encoding */
!     int            query_len;        /* # of valid bytes in query string */
!     const char *query_ptr;        /* query string proper */
  } pgssHashKey;

  /*
--- 93,99 ----
      Oid            userid;            /* user OID */
      Oid            dbid;            /* database OID */
      int            encoding;        /* query encoding */
!     uint32        queryid;        /* query identifier */
  } pgssHashKey;

  /*
*************** typedef struct pgssEntry
*** 99,104 ****
--- 128,134 ----
  {
      pgssHashKey key;            /* hash key of entry - MUST BE FIRST */
      Counters    counters;        /* the statistics for this query */
+     int            query_len;        /* # of valid bytes in query string */
      slock_t        mutex;            /* protects the counters only */
      char        query[1];        /* VARIABLE LENGTH ARRAY - MUST BE LAST */
      /* Note: the allocated length of query[] is actually pgss->query_size */
*************** typedef struct pgssSharedState
*** 113,118 ****
--- 143,182 ----
      int            query_size;        /* max query length in bytes */
  } pgssSharedState;

+ /*
+  * Struct for tracking locations/lengths of constants during canonicalization
+  */
+ typedef struct pgssLocationLen
+ {
+     int            location;        /* start offset in query text */
+     int            length;            /* length in bytes, or -1 to ignore */
+ } pgssLocationLen;
+
+ /*
+  * Working state for computing a query jumble and producing a canonicalized
+  * query string
+  */
+ typedef struct pgssJumbleState
+ {
+     /* Jumble of current query tree */
+     unsigned char *jumble;
+
+     /* Number of bytes used in jumble[] */
+     Size        jumble_len;
+
+     /* Array of locations of constants that should be removed */
+     pgssLocationLen *clocations;
+
+     /* Allocated length of clocations array */
+     int            clocations_buf_size;
+
+     /* Current number of valid entries in clocations array */
+     int            clocations_count;
+
+     /* Stack of rangetable lists; first entry is for current Query */
+     List       *rangetables;
+ } pgssJumbleState;
+
  /*---- Local variables ----*/

  /* Current nesting depth of ExecutorRun calls */
*************** static int    nested_level = 0;
*** 120,125 ****
--- 184,190 ----

  /* Saved hook values in case of unload */
  static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
+ static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
  static ExecutorStart_hook_type prev_ExecutorStart = NULL;
  static ExecutorRun_hook_type prev_ExecutorRun = NULL;
  static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
*************** static int    pgss_max;            /* max # statemen
*** 151,156 ****
--- 216,222 ----
  static int    pgss_track;            /* tracking level */
  static bool pgss_track_utility; /* whether to track utility commands */
  static bool pgss_save;            /* whether to save stats across shutdown */
+ static bool pgss_string_key;    /* whether to always only hash query str */


  #define pgss_enabled() \
*************** PG_FUNCTION_INFO_V1(pg_stat_statements);
*** 170,175 ****
--- 236,242 ----

  static void pgss_shmem_startup(void);
  static void pgss_shmem_shutdown(int code, Datum arg);
+ static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
  static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
  static void pgss_ExecutorRun(QueryDesc *queryDesc,
                   ScanDirection direction,
*************** static void pgss_ProcessUtility(Node *pa
*** 181,192 ****
                      DestReceiver *dest, char *completionTag);
  static uint32 pgss_hash_fn(const void *key, Size keysize);
  static int    pgss_match_fn(const void *key1, const void *key2, Size keysize);
! static void pgss_store(const char *query, double total_time, uint64 rows,
!            const BufferUsage *bufusage);
  static Size pgss_memsize(void);
! static pgssEntry *entry_alloc(pgssHashKey *key);
  static void entry_dealloc(void);
  static void entry_reset(void);


  /*
--- 248,276 ----
                      DestReceiver *dest, char *completionTag);
  static uint32 pgss_hash_fn(const void *key, Size keysize);
  static int    pgss_match_fn(const void *key1, const void *key2, Size keysize);
! static uint32 pgss_hash_string(const char *str);
! static void pgss_store(const char *query, uint32 queryId,
!            double total_time, uint64 rows,
!            const BufferUsage *bufusage,
!            pgssJumbleState *jstate);
  static Size pgss_memsize(void);
! static pgssEntry *entry_alloc(pgssHashKey *key, const char *query, int query_len);
  static void entry_dealloc(void);
  static void entry_reset(void);
+ static uint32 JumbleQuery(pgssJumbleState *jstate, Query *query);
+ static void AppendJumble(pgssJumbleState *jstate,
+                          const unsigned char *item, Size size);
+ static void PerformJumble(pgssJumbleState *jstate, const Query *tree);
+ static void QualsNode(pgssJumbleState *jstate, const OpExpr *node);
+ static void LeafNode(pgssJumbleState *jstate, const Node *arg);
+ static void LimitOffsetNode(pgssJumbleState *jstate, const Node *node);
+ static void JoinExprNode(pgssJumbleState *jstate, const JoinExpr *node);
+ static void JoinExprNodeChild(pgssJumbleState *jstate, const Node *node);
+ static void RecordConstLocation(pgssJumbleState *jstate, int location);
+ static char *generate_normalized_query(pgssJumbleState *jstate, const char *query,
+                           int *query_len_p, int encoding);
+ static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query);
+ static int    comp_location(const void *a, const void *b);


  /*
*************** _PG_init(void)
*** 256,261 ****
--- 340,360 ----
                               NULL,
                               NULL);

+     /*
+      * Support legacy pg_stat_statements behavior, for compatibility with
+      * versions shipped with Postgres 8.4, 9.0 and 9.1
+      */
+     DefineCustomBoolVariable("pg_stat_statements.string_key",
+                         "Differentiate queries based on query string alone.",
+                              NULL,
+                              &pgss_string_key,
+                              false,
+                              PGC_POSTMASTER,
+                              0,
+                              NULL,
+                              NULL,
+                              NULL);
+
      EmitWarningsOnPlaceholders("pg_stat_statements");

      /*
*************** _PG_init(void)
*** 271,276 ****
--- 370,377 ----
       */
      prev_shmem_startup_hook = shmem_startup_hook;
      shmem_startup_hook = pgss_shmem_startup;
+     prev_post_parse_analyze_hook = post_parse_analyze_hook;
+     post_parse_analyze_hook = pgss_post_parse_analyze;
      prev_ExecutorStart = ExecutorStart_hook;
      ExecutorStart_hook = pgss_ExecutorStart;
      prev_ExecutorRun = ExecutorRun_hook;
*************** _PG_fini(void)
*** 291,296 ****
--- 392,398 ----
  {
      /* Uninstall hooks. */
      shmem_startup_hook = prev_shmem_startup_hook;
+     post_parse_analyze_hook = prev_post_parse_analyze_hook;
      ExecutorStart_hook = prev_ExecutorStart;
      ExecutorRun_hook = prev_ExecutorRun;
      ExecutorFinish_hook = prev_ExecutorFinish;
*************** pgss_shmem_startup(void)
*** 400,425 ****
              goto error;

          /* Previous incarnation might have had a larger query_size */
!         if (temp.key.query_len >= buffer_size)
          {
!             buffer = (char *) repalloc(buffer, temp.key.query_len + 1);
!             buffer_size = temp.key.query_len + 1;
          }

!         if (fread(buffer, 1, temp.key.query_len, file) != temp.key.query_len)
              goto error;
!         buffer[temp.key.query_len] = '\0';

          /* Clip to available length if needed */
!         if (temp.key.query_len >= query_size)
!             temp.key.query_len = pg_encoding_mbcliplen(temp.key.encoding,
!                                                        buffer,
!                                                        temp.key.query_len,
!                                                        query_size - 1);
!         temp.key.query_ptr = buffer;

          /* make the hashtable entry (discards old entries if too many) */
!         entry = entry_alloc(&temp.key);

          /* copy in the actual stats */
          entry->counters = temp.counters;
--- 502,530 ----
              goto error;

          /* Previous incarnation might have had a larger query_size */
!         if (temp.query_len >= buffer_size)
          {
!             buffer = (char *) repalloc(buffer, temp.query_len + 1);
!             buffer_size = temp.query_len + 1;
          }

!         if (fread(buffer, 1, temp.query_len, file) != temp.query_len)
              goto error;
!         buffer[temp.query_len] = '\0';
!
!         /* Skip loading "sticky" entries */
!         if (temp.counters.calls == 0)
!             continue;

          /* Clip to available length if needed */
!         if (temp.query_len >= query_size)
!             temp.query_len = pg_encoding_mbcliplen(temp.key.encoding,
!                                                    buffer,
!                                                    temp.query_len,
!                                                    query_size - 1);

          /* make the hashtable entry (discards old entries if too many) */
!         entry = entry_alloc(&temp.key, buffer, temp.query_len);

          /* copy in the actual stats */
          entry->counters = temp.counters;
*************** pgss_shmem_shutdown(int code, Datum arg)
*** 481,487 ****
      hash_seq_init(&hash_seq, pgss_hash);
      while ((entry = hash_seq_search(&hash_seq)) != NULL)
      {
!         int            len = entry->key.query_len;

          if (fwrite(entry, offsetof(pgssEntry, mutex), 1, file) != 1 ||
              fwrite(entry->query, 1, len, file) != len)
--- 586,592 ----
      hash_seq_init(&hash_seq, pgss_hash);
      while ((entry = hash_seq_search(&hash_seq)) != NULL)
      {
!         int            len = entry->query_len;

          if (fwrite(entry, offsetof(pgssEntry, mutex), 1, file) != 1 ||
              fwrite(entry->query, 1, len, file) != len)
*************** error:
*** 507,512 ****
--- 612,670 ----
  }

  /*
+  * Post-parse-analysis hook: mark query with a queryId
+  */
+ static void
+ pgss_post_parse_analyze(ParseState *pstate, Query *query)
+ {
+     pgssJumbleState jstate;
+     BufferUsage bufusage;
+
+     /* Assert we didn't do this already */
+     Assert(query->queryId == 0);
+
+     /* Safety check... */
+     if (!pgss || !pgss_hash)
+         return;
+
+     /* We do nothing with utility statements at this stage */
+     if (query->utilityStmt)
+         return;
+
+     /* Set up workspace for query jumbling */
+     jstate.jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+     jstate.jumble_len = 0;
+     jstate.clocations_buf_size = 32;
+     jstate.clocations = (pgssLocationLen *)
+         palloc(jstate.clocations_buf_size * sizeof(pgssLocationLen));
+     jstate.clocations_count = 0;
+     jstate.rangetables = NIL;
+
+     /* Compute query ID and mark the Query node with it */
+     query->queryId = JumbleQuery(&jstate, query);
+
+     /*
+      * For non-parameterized queries, we immediately create a hash table
+      * entry for the query, so that we can record the canonicalized form
+      * of the query string.  For parameterized queries, it generally is not
+      * worth the trouble to construct a canonicalized string; also, if we
+      * did not identify any suppressable constants, the canonicalized
+      * string would be the same anyway, so no need for an early entry.
+      */
+     if (pstate->p_paramref_hook == NULL && jstate.clocations_count > 0)
+     {
+         memset(&bufusage, 0, sizeof(bufusage));
+
+         pgss_store(pstate->p_sourcetext,
+                    query->queryId,
+                    0,
+                    0,
+                    &bufusage,
+                    &jstate);
+     }
+ }
+
+ /*
   * ExecutorStart hook: start up tracking if needed
   */
  static void
*************** pgss_ExecutorEnd(QueryDesc *queryDesc)
*** 589,594 ****
--- 747,759 ----
  {
      if (queryDesc->totaltime && pgss_enabled())
      {
+         uint32        queryId;
+
+         if (pgss_string_key)
+             queryId = pgss_hash_string(queryDesc->sourceText);
+         else
+             queryId = queryDesc->plannedstmt->queryId;
+
          /*
           * Make sure stats accumulation is done.  (Note: it's okay if several
           * levels of hook all do this.)
*************** pgss_ExecutorEnd(QueryDesc *queryDesc)
*** 596,604 ****
          InstrEndLoop(queryDesc->totaltime);

          pgss_store(queryDesc->sourceText,
                     queryDesc->totaltime->total,
                     queryDesc->estate->es_processed,
!                    &queryDesc->totaltime->bufusage);
      }

      if (prev_ExecutorEnd)
--- 761,771 ----
          InstrEndLoop(queryDesc->totaltime);

          pgss_store(queryDesc->sourceText,
+                    queryId,
                     queryDesc->totaltime->total,
                     queryDesc->estate->es_processed,
!                    &queryDesc->totaltime->bufusage,
!                    NULL);
      }

      if (prev_ExecutorEnd)
*************** pgss_ProcessUtility(Node *parsetree, con
*** 620,626 ****
          instr_time    start;
          instr_time    duration;
          uint64        rows = 0;
!         BufferUsage bufusage_start, bufusage;

          bufusage_start = pgBufferUsage;
          INSTR_TIME_SET_CURRENT(start);
--- 787,795 ----
          instr_time    start;
          instr_time    duration;
          uint64        rows = 0;
!         BufferUsage bufusage_start,
!                     bufusage;
!         uint32        queryId;

          bufusage_start = pgBufferUsage;
          INSTR_TIME_SET_CURRENT(start);
*************** pgss_ProcessUtility(Node *parsetree, con
*** 677,684 ****
          bufusage.time_write = pgBufferUsage.time_write;
          INSTR_TIME_SUBTRACT(bufusage.time_write, bufusage_start.time_write);

!         pgss_store(queryString, INSTR_TIME_GET_DOUBLE(duration), rows,
!                    &bufusage);
      }
      else
      {
--- 846,860 ----
          bufusage.time_write = pgBufferUsage.time_write;
          INSTR_TIME_SUBTRACT(bufusage.time_write, bufusage_start.time_write);

!         /* For utility statements, we just hash the query string directly */
!         queryId = pgss_hash_string(queryString);
!
!         pgss_store(queryString,
!                    queryId,
!                    INSTR_TIME_GET_DOUBLE(duration),
!                    rows,
!                    &bufusage,
!                    NULL);
      }
      else
      {
*************** pgss_hash_fn(const void *key, Size keysi
*** 702,709 ****
      /* we don't bother to include encoding in the hash */
      return hash_uint32((uint32) k->userid) ^
          hash_uint32((uint32) k->dbid) ^
!         DatumGetUInt32(hash_any((const unsigned char *) k->query_ptr,
!                                 k->query_len));
  }

  /*
--- 878,884 ----
      /* we don't bother to include encoding in the hash */
      return hash_uint32((uint32) k->userid) ^
          hash_uint32((uint32) k->dbid) ^
!         hash_uint32((uint32) k->queryid);
  }

  /*
*************** pgss_match_fn(const void *key1, const vo
*** 718,740 ****
      if (k1->userid == k2->userid &&
          k1->dbid == k2->dbid &&
          k1->encoding == k2->encoding &&
!         k1->query_len == k2->query_len &&
!         memcmp(k1->query_ptr, k2->query_ptr, k1->query_len) == 0)
          return 0;
      else
          return 1;
  }

  /*
   * Store some statistics for a statement.
   */
  static void
! pgss_store(const char *query, double total_time, uint64 rows,
!            const BufferUsage *bufusage)
  {
      pgssHashKey key;
      double        usage;
      pgssEntry  *entry;

      Assert(query != NULL);

--- 893,932 ----
      if (k1->userid == k2->userid &&
          k1->dbid == k2->dbid &&
          k1->encoding == k2->encoding &&
!         k1->queryid == k2->queryid)
          return 0;
      else
          return 1;
  }

  /*
+  * Given an arbitrarily long query string, produce a hash for the purposes of
+  * identifying the query, without canonicalizing constants. Used when hashing
+  * utility statements, or for legacy compatibility mode.
+  */
+ static uint32
+ pgss_hash_string(const char *str)
+ {
+     return hash_any((const unsigned char *) str, strlen(str));
+ }
+
+ /*
   * Store some statistics for a statement.
+  *
+  * If jstate is not NULL then we're trying to create an entry for which
+  * we have no statistics as yet; we just want to record the canonicalized
+  * query string while we can.
   */
  static void
! pgss_store(const char *query, uint32 queryId,
!            double total_time, uint64 rows,
!            const BufferUsage *bufusage,
!            pgssJumbleState *jstate)
  {
      pgssHashKey key;
      double        usage;
      pgssEntry  *entry;
+     char       *norm_query = NULL;

      Assert(query != NULL);

*************** pgss_store(const char *query, double tot
*** 746,779 ****
      key.userid = GetUserId();
      key.dbid = MyDatabaseId;
      key.encoding = GetDatabaseEncoding();
!     key.query_len = strlen(query);
!     if (key.query_len >= pgss->query_size)
!         key.query_len = pg_encoding_mbcliplen(key.encoding,
!                                               query,
!                                               key.query_len,
!                                               pgss->query_size - 1);
!     key.query_ptr = query;
!
!     usage = USAGE_EXEC(duration);

      /* Lookup the hash table entry with shared lock. */
      LWLockAcquire(pgss->lock, LW_SHARED);

      entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
      if (!entry)
      {
!         /* Must acquire exclusive lock to add a new entry. */
          LWLockRelease(pgss->lock);
!         LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
!         entry = entry_alloc(&key);
      }

!     /* Grab the spinlock while updating the counters. */
      {
          volatile pgssEntry *e = (volatile pgssEntry *) entry;

          SpinLockAcquire(&e->mutex);
!         e->counters.calls += 1;
          e->counters.total_time += total_time;
          e->counters.rows += rows;
          e->counters.shared_blks_hit += bufusage->shared_blks_hit;
--- 938,1026 ----
      key.userid = GetUserId();
      key.dbid = MyDatabaseId;
      key.encoding = GetDatabaseEncoding();
!     key.queryid = queryId;

      /* Lookup the hash table entry with shared lock. */
      LWLockAcquire(pgss->lock, LW_SHARED);

      entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);
+
+     /*
+      * When creating an entry just to store the canonicalized string, make it
+      * artificially sticky so that it will probably still be there when
+      * executed.  Strictly speaking, query strings are canonicalized on a best
+      * effort basis, though it would be difficult to demonstrate this even
+      * under artificial conditions.
+      */
+     if (jstate && !entry)
+         usage = USAGE_NON_EXEC_STICK;
+     else
+         usage = USAGE_EXEC(duration);
+
      if (!entry)
      {
!         int            query_len;
!
!         /*
!          * We'll need exclusive lock to make a new entry.  There is no point
!          * in holding shared lock while we canonicalize the string, though.
!          */
          LWLockRelease(pgss->lock);
!
!         query_len = strlen(query);
!
!         if (jstate)
!         {
!             /* Canonicalize the string if enabled */
!             norm_query = generate_normalized_query(jstate, query,
!                                                    &query_len,
!                                                    key.encoding);
!
!             /* Acquire exclusive lock as required by entry_alloc() */
!             LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
!
!             entry = entry_alloc(&key, norm_query, query_len);
!         }
!         else
!         {
!             /*
!              * We're just going to store the query string as-is; but we
!              * have to truncate it if over-length.
!              */
!             if (query_len >= pgss->query_size)
!                 query_len = pg_encoding_mbcliplen(key.encoding,
!                                                   query,
!                                                   query_len,
!                                                   pgss->query_size - 1);
!
!             /* Acquire exclusive lock as required by entry_alloc() */
!             LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
!
!             entry = entry_alloc(&key, query, query_len);
!         }
      }

!     /*
!      * Grab the spinlock while updating the counters (see comment about
!      * locking rules at the head of the file)
!      */
      {
          volatile pgssEntry *e = (volatile pgssEntry *) entry;

          SpinLockAcquire(&e->mutex);
!
!         /*
!          * If we're entering real data, "unstick" entry if it was previously
!          * sticky, and then increment calls.
!          */
!         if (!jstate)
!         {
!             if (e->counters.calls == 0)
!                 e->counters.usage = USAGE_INIT;
!
!             e->counters.calls += 1;
!         }
!
          e->counters.total_time += total_time;
          e->counters.rows += rows;
          e->counters.shared_blks_hit += bufusage->shared_blks_hit;
*************** pgss_store(const char *query, double tot
*** 786,798 ****
          e->counters.local_blks_written += bufusage->local_blks_written;
          e->counters.temp_blks_read += bufusage->temp_blks_read;
          e->counters.temp_blks_written += bufusage->temp_blks_written;
!         e->counters.time_read +=  INSTR_TIME_GET_DOUBLE(bufusage->time_read);
          e->counters.time_write += INSTR_TIME_GET_DOUBLE(bufusage->time_write);
          e->counters.usage += usage;
          SpinLockRelease(&e->mutex);
      }

      LWLockRelease(pgss->lock);
  }

  /*
--- 1033,1050 ----
          e->counters.local_blks_written += bufusage->local_blks_written;
          e->counters.temp_blks_read += bufusage->temp_blks_read;
          e->counters.temp_blks_written += bufusage->temp_blks_written;
!         e->counters.time_read += INSTR_TIME_GET_DOUBLE(bufusage->time_read);
          e->counters.time_write += INSTR_TIME_GET_DOUBLE(bufusage->time_write);
          e->counters.usage += usage;
+
          SpinLockRelease(&e->mutex);
      }

      LWLockRelease(pgss->lock);
+
+     /* We postpone this pfree until we're out of the lock */
+     if (norm_query)
+         pfree(norm_query);
  }

  /*
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 883,889 ****

              qstr = (char *)
                  pg_do_encoding_conversion((unsigned char *) entry->query,
!                                           entry->key.query_len,
                                            entry->key.encoding,
                                            GetDatabaseEncoding());
              values[i++] = CStringGetTextDatum(qstr);
--- 1135,1141 ----

              qstr = (char *)
                  pg_do_encoding_conversion((unsigned char *) entry->query,
!                                           entry->query_len,
                                            entry->key.encoding,
                                            GetDatabaseEncoding());
              values[i++] = CStringGetTextDatum(qstr);
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 902,907 ****
--- 1154,1163 ----
              SpinLockRelease(&e->mutex);
          }

+         /* Skip entry if unexecuted (ie, it's a pending "sticky" entry) */
+         if (tmp.calls == 0)
+             continue;
+
          values[i++] = Int64GetDatumFast(tmp.calls);
          values[i++] = Float8GetDatumFast(tmp.total_time);
          values[i++] = Int64GetDatumFast(tmp.rows);
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 923,930 ****
              values[i++] = Float8GetDatumFast(tmp.time_write);
          }

!         Assert(i == sql_supports_v1_1_counters ? \
!             PG_STAT_STATEMENTS_COLS : PG_STAT_STATEMENTS_COLS_V1_0);

          tuplestore_putvalues(tupstore, tupdesc, values, nulls);
      }
--- 1179,1186 ----
              values[i++] = Float8GetDatumFast(tmp.time_write);
          }

!         Assert(i == sql_supports_v1_1_counters ?
!                PG_STAT_STATEMENTS_COLS : PG_STAT_STATEMENTS_COLS_V1_0);

          tuplestore_putvalues(tupstore, tupdesc, values, nulls);
      }
*************** pgss_memsize(void)
*** 957,976 ****
   * Allocate a new hashtable entry.
   * caller must hold an exclusive lock on pgss->lock
   *
   * Note: despite needing exclusive lock, it's not an error for the target
   * entry to already exist.    This is because pgss_store releases and
   * reacquires lock after failing to find a match; so someone else could
   * have made the entry while we waited to get exclusive lock.
   */
  static pgssEntry *
! entry_alloc(pgssHashKey *key)
  {
      pgssEntry  *entry;
      bool        found;

-     /* Caller must have clipped query properly */
-     Assert(key->query_len < pgss->query_size);
-
      /* Make space if needed */
      while (hash_get_num_entries(pgss_hash) >= pgss_max)
          entry_dealloc();
--- 1213,1231 ----
   * Allocate a new hashtable entry.
   * caller must hold an exclusive lock on pgss->lock
   *
+  * "query" need not be null-terminated; we rely on query_len instead
+  *
   * Note: despite needing exclusive lock, it's not an error for the target
   * entry to already exist.    This is because pgss_store releases and
   * reacquires lock after failing to find a match; so someone else could
   * have made the entry while we waited to get exclusive lock.
   */
  static pgssEntry *
! entry_alloc(pgssHashKey *key, const char *query, int query_len)
  {
      pgssEntry  *entry;
      bool        found;

      /* Make space if needed */
      while (hash_get_num_entries(pgss_hash) >= pgss_max)
          entry_dealloc();
*************** entry_alloc(pgssHashKey *key)
*** 982,997 ****
      {
          /* New entry, initialize it */

-         /* dynahash tried to copy the key for us, but must fix query_ptr */
-         entry->key.query_ptr = entry->query;
          /* reset the statistics */
          memset(&entry->counters, 0, sizeof(Counters));
          entry->counters.usage = USAGE_INIT;
          /* re-initialize the mutex each time ... we assume no one using it */
          SpinLockInit(&entry->mutex);
          /* ... and don't forget the query text */
!         memcpy(entry->query, key->query_ptr, key->query_len);
!         entry->query[key->query_len] = '\0';
      }

      return entry;
--- 1237,1252 ----
      {
          /* New entry, initialize it */

          /* reset the statistics */
          memset(&entry->counters, 0, sizeof(Counters));
          entry->counters.usage = USAGE_INIT;
          /* re-initialize the mutex each time ... we assume no one using it */
          SpinLockInit(&entry->mutex);
          /* ... and don't forget the query text */
!         Assert(query_len >= 0 && query_len < pgss->query_size);
!         entry->query_len = query_len;
!         memcpy(entry->query, query, query_len);
!         entry->query[query_len] = '\0';
      }

      return entry;
*************** entry_alloc(pgssHashKey *key)
*** 1003,1010 ****
  static int
  entry_cmp(const void *lhs, const void *rhs)
  {
!     double        l_usage = (*(pgssEntry * const *) lhs)->counters.usage;
!     double        r_usage = (*(pgssEntry * const *) rhs)->counters.usage;

      if (l_usage < r_usage)
          return -1;
--- 1258,1265 ----
  static int
  entry_cmp(const void *lhs, const void *rhs)
  {
!     double        l_usage = (*(pgssEntry *const *) lhs)->counters.usage;
!     double        r_usage = (*(pgssEntry *const *) rhs)->counters.usage;

      if (l_usage < r_usage)
          return -1;
*************** entry_reset(void)
*** 1070,1072 ****
--- 1325,2416 ----

      LWLockRelease(pgss->lock);
  }
+
+ /*
+  * JumbleQuery: Selectively serialize query tree, and return a hash
+  * representing that serialization - its queryId.
+  *
+  * Note that this doesn't necessarily uniquely identify the query across
+  * different databases and encodings.
+  */
+ static uint32
+ JumbleQuery(pgssJumbleState *jstate, Query *query)
+ {
+     PerformJumble(jstate, query);
+     return hash_any(jstate->jumble, jstate->jumble_len);
+ }
+
+ /*
+  * AppendJumble: Append a value that is substantive in a given query to
+  * the current jumble.
+  */
+ static void
+ AppendJumble(pgssJumbleState *jstate, const unsigned char *item, Size size)
+ {
+     unsigned char *jumble = jstate->jumble;
+     Size        jumble_len = jstate->jumble_len;
+
+     /*
+      * Whenever the jumble buffer is full, we hash the current contents and
+      * reset the buffer to contain just that hash value, thus relying on the
+      * hash to summarize everything so far.
+      */
+     while (size > 0)
+     {
+         Size    part_size;
+
+         if (jumble_len >= JUMBLE_SIZE)
+         {
+             uint32    start_hash = hash_any(jumble, JUMBLE_SIZE);
+
+             memcpy(jumble, &start_hash, sizeof(start_hash));
+             jumble_len = sizeof(start_hash);
+         }
+         part_size = Min(size, JUMBLE_SIZE - jumble_len);
+         memcpy(jumble + jumble_len, item, part_size);
+         jumble_len += part_size;
+         item += part_size;
+         size -= part_size;
+     }
+     jstate->jumble_len = jumble_len;
+ }
+
+ /*
+  * Wrapper around AppendJumble to encapsulate details of serialization
+  * of individual local variable elements.
+  */
+ #define APP_JUMB(item) \
+     AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+
+ /*
+  * PerformJumble: Selectively serialize the query tree and canonicalize
+  * constants (i.e.    don't consider their actual value - just their type).
+  */
+ static void
+ PerformJumble(pgssJumbleState *jstate, const Query *tree)
+ {
+     /* table join tree (FROM and WHERE clauses) */
+     FromExpr   *jt = (FromExpr *) tree->jointree;
+     /* # of result tuples to skip (int8 expr) */
+     FuncExpr   *off = (FuncExpr *) tree->limitOffset;
+     /* # of result tuples to return (int8 expr) */
+     FuncExpr   *limcount = (FuncExpr *) tree->limitCount;
+     ListCell   *l;
+
+     /* Push this query level's rtable onto the stack */
+     jstate->rangetables = lcons(tree->rtable, jstate->rangetables);
+
+     APP_JUMB(tree->resultRelation);
+
+     /* WITH list (of CommonTableExpr's) */
+     foreach(l, tree->cteList)
+     {
+         CommonTableExpr *cte = (CommonTableExpr *) lfirst(l);
+         Query       *cteq = (Query *) cte->ctequery;
+
+         if (cteq)
+             PerformJumble(jstate, cteq);
+     }
+     if (jt)
+     {
+         if (jt->quals)
+         {
+             if (IsA(jt->quals, OpExpr))
+             {
+                 QualsNode(jstate, (OpExpr *) jt->quals);
+             }
+             else
+             {
+                 LeafNode(jstate, (Node *) jt->quals);
+             }
+         }
+         /* table join tree */
+         foreach(l, jt->fromlist)
+         {
+             Node       *fr = lfirst(l);
+
+             if (IsA(fr, JoinExpr))
+             {
+                 JoinExprNode(jstate, (JoinExpr *) fr);
+             }
+             else if (IsA(fr, RangeTblRef))
+             {
+                 RangeTblRef *rtf = (RangeTblRef *) fr;
+                 RangeTblEntry *rte = rt_fetch(rtf->rtindex, tree->rtable);
+
+                 APP_JUMB(rte->relid);
+                 APP_JUMB(rte->rtekind);
+                 /* Subselection in where clause */
+                 if (rte->subquery)
+                     PerformJumble(jstate, rte->subquery);
+
+                 /* Function call in where clause */
+                 if (rte->funcexpr)
+                     LeafNode(jstate, (Node *) rte->funcexpr);
+             }
+             else
+             {
+                 ereport(WARNING,
+                         (errcode(ERRCODE_INTERNAL_ERROR),
+                     errmsg("unexpected, unrecognized fromlist node type: %d",
+                            (int) nodeTag(fr))));
+             }
+         }
+     }
+
+     /*
+      * target list (of TargetEntry) columns returned by query
+      */
+     foreach(l, tree->targetList)
+     {
+         TargetEntry *tg = (TargetEntry *) lfirst(l);
+         Node       *e = (Node *) tg->expr;
+
+         if (tg->ressortgroupref)
+             /* nonzero if referenced by a sort/group - for ORDER BY */
+             APP_JUMB(tg->ressortgroupref);
+         APP_JUMB(tg->resno);    /* column number for select */
+
+         /*
+          * Handle the various types of nodes in the select list of this query
+          */
+         LeafNode(jstate, e);
+     }
+     /* return-values list (of TargetEntry) */
+     foreach(l, tree->returningList)
+     {
+         TargetEntry *rt = (TargetEntry *) lfirst(l);
+         Expr       *e = (Expr *) rt->expr;
+         unsigned char magic = MAG_RETURN_LIST;
+
+         APP_JUMB(magic);
+
+         /*
+          * Handle the various types of nodes in the select list of this query
+          */
+         LeafNode(jstate, (Node *) e);
+     }
+     /* a list of SortGroupClause's */
+     foreach(l, tree->groupClause)
+     {
+         SortGroupClause *gc = (SortGroupClause *) lfirst(l);
+
+         APP_JUMB(gc->tleSortGroupRef);
+         APP_JUMB(gc->nulls_first);
+     }
+
+     if (tree->havingQual)
+     {
+         if (IsA(tree->havingQual, OpExpr))
+         {
+             OpExpr       *na = (OpExpr *) tree->havingQual;
+
+             QualsNode(jstate, na);
+         }
+         else
+         {
+             Node       *n = (Node *) tree->havingQual;
+
+             LeafNode(jstate, n);
+         }
+     }
+
+     foreach(l, tree->windowClause)
+     {
+         WindowClause *wc = (WindowClause *) lfirst(l);
+         ListCell   *il;
+
+         APP_JUMB(wc->frameOptions);
+         foreach(il, wc->partitionClause)        /* PARTITION BY list */
+         {
+             Node       *n = (Node *) lfirst(il);
+
+             LeafNode(jstate, n);
+         }
+         foreach(il, wc->orderClause)    /* ORDER BY list */
+         {
+             Node       *n = (Node *) lfirst(il);
+
+             LeafNode(jstate, n);
+         }
+     }
+
+     foreach(l, tree->distinctClause)
+     {
+         SortGroupClause *dc = (SortGroupClause *) lfirst(l);
+
+         APP_JUMB(dc->tleSortGroupRef);
+         APP_JUMB(dc->nulls_first);
+     }
+
+     /*
+      * Don't look at tree->sortClause, because the value ressortgroupref is
+      * already serialized when we iterate through targetList
+      */
+
+     if (off)
+         LimitOffsetNode(jstate, (Node *) off);
+
+     if (limcount)
+         LimitOffsetNode(jstate, (Node *) limcount);
+
+     if (tree->setOperations)
+     {
+         /*
+          * set-operation tree if this is top level of a UNION/INTERSECT/EXCEPT
+          * query
+          */
+         SetOperationStmt *topop = (SetOperationStmt *) tree->setOperations;
+
+         APP_JUMB(topop->op);
+         APP_JUMB(topop->all);
+
+         /* leaf selects are RTE subselections */
+         foreach(l, tree->rtable)
+         {
+             RangeTblEntry *r = (RangeTblEntry *) lfirst(l);
+
+             if (r->subquery)
+                 PerformJumble(jstate, r->subquery);
+         }
+     }
+
+     /* Pop the rangetable stack */
+     jstate->rangetables = list_delete_first(jstate->rangetables);
+ }
+
+ /*
+  * Perform selective serialization of "Quals" nodes when
+  * they're IsA(*, OpExpr)
+  */
+ static void
+ QualsNode(pgssJumbleState *jstate, const OpExpr *node)
+ {
+     ListCell   *l;
+
+     APP_JUMB(node->xpr);
+     APP_JUMB(node->opno);
+     foreach(l, node->args)
+     {
+         Node       *arg = (Node *) lfirst(l);
+
+         LeafNode(jstate, arg);
+     }
+ }
+
+ /*
+  * LeafNode: Selectively serialize a selection of parser/prim nodes that are
+  * frequently, though certainly not necessarily, leaf nodes, such as Vars
+  * (columns), constants and function calls
+  */
+ static void
+ LeafNode(pgssJumbleState *jstate, const Node *arg)
+ {
+     ListCell   *l;
+
+     /* Use the node's NodeTag as a magic number */
+     APP_JUMB(arg->type);
+
+     if (IsA(arg, Const))
+     {
+         Const       *c = (Const *) arg;
+
+         /*
+          * Datatype of the constant is a differentiator
+          */
+         APP_JUMB(c->consttype);
+         RecordConstLocation(jstate, c->location);
+     }
+     else if (IsA(arg, CoerceToDomain))
+     {
+         CoerceToDomain *cd = (CoerceToDomain *) arg;
+
+         /*
+          * Datatype of the constant is a differentiator
+          */
+         APP_JUMB(cd->resulttype);
+         LeafNode(jstate, (Node *) cd->arg);
+     }
+     else if (IsA(arg, Var))
+     {
+         Var           *v = (Var *) arg;
+         List       *rtable;
+         RangeTblEntry *rte;
+         ListCell   *lc;
+
+         rtable = (List *) list_nth(jstate->rangetables, v->varlevelsup);
+         rte = rt_fetch(v->varno, rtable);
+
+         APP_JUMB(rte->relid);
+
+         foreach(lc, rte->values_lists)
+         {
+             List       *sublist = (List *) lfirst(lc);
+             ListCell   *lc2;
+
+             foreach(lc2, sublist)
+             {
+                 Node       *col = (Node *) lfirst(lc2);
+
+                 LeafNode(jstate, col);
+             }
+         }
+         APP_JUMB(v->varattno);
+     }
+     else if (IsA(arg, CurrentOfExpr))
+     {
+         CurrentOfExpr *CoE = (CurrentOfExpr *) arg;
+
+         APP_JUMB(CoE->cvarno);
+         APP_JUMB(CoE->cursor_param);
+     }
+     else if (IsA(arg, CollateExpr))
+     {
+         CollateExpr *Ce = (CollateExpr *) arg;
+
+         APP_JUMB(Ce->collOid);
+     }
+     else if (IsA(arg, FieldSelect))
+     {
+         FieldSelect *Fs = (FieldSelect *) arg;
+
+         APP_JUMB(Fs->resulttype);
+         LeafNode(jstate, (Node *) Fs->arg);
+     }
+     else if (IsA(arg, NamedArgExpr))
+     {
+         NamedArgExpr *Nae = (NamedArgExpr *) arg;
+
+         APP_JUMB(Nae->argnumber);
+         LeafNode(jstate, (Node *) Nae->arg);
+     }
+     else if (IsA(arg, Param))
+     {
+         Param       *p = ((Param *) arg);
+
+         APP_JUMB(p->paramkind);
+         APP_JUMB(p->paramid);
+     }
+     else if (IsA(arg, RelabelType))
+     {
+         RelabelType *rt = (RelabelType *) arg;
+
+         APP_JUMB(rt->resulttype);
+         LeafNode(jstate, (Node *) rt->arg);
+     }
+     else if (IsA(arg, WindowFunc))
+     {
+         WindowFunc *wf = (WindowFunc *) arg;
+
+         APP_JUMB(wf->winfnoid);
+         foreach(l, wf->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, FuncExpr))
+     {
+         FuncExpr   *f = (FuncExpr *) arg;
+
+         APP_JUMB(f->funcid);
+         foreach(l, f->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, OpExpr) ||IsA(arg, DistinctExpr))
+     {
+         QualsNode(jstate, (OpExpr *) arg);
+     }
+     else if (IsA(arg, CoerceViaIO))
+     {
+         CoerceViaIO *Cio = (CoerceViaIO *) arg;
+
+         APP_JUMB(Cio->coerceformat);
+         APP_JUMB(Cio->resulttype);
+         LeafNode(jstate, (Node *) Cio->arg);
+     }
+     else if (IsA(arg, Aggref))
+     {
+         Aggref       *a = (Aggref *) arg;
+
+         APP_JUMB(a->aggfnoid);
+         foreach(l, a->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, SubLink))
+     {
+         SubLink    *s = (SubLink *) arg;
+
+         APP_JUMB(s->subLinkType);
+         /* Serialize select-list subselect recursively */
+         if (s->subselect)
+             PerformJumble(jstate, (Query *) s->subselect);
+
+         if (s->testexpr)
+             LeafNode(jstate, (Node *) s->testexpr);
+         foreach(l, s->operName)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, TargetEntry))
+     {
+         TargetEntry *rt = (TargetEntry *) arg;
+         Node       *e = (Node *) rt->expr;
+
+         APP_JUMB(rt->resorigtbl);
+         APP_JUMB(rt->ressortgroupref);
+         LeafNode(jstate, e);
+     }
+     else if (IsA(arg, BoolExpr))
+     {
+         BoolExpr   *be = (BoolExpr *) arg;
+
+         APP_JUMB(be->boolop);
+         foreach(l, be->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, NullTest))
+     {
+         NullTest   *nt = (NullTest *) arg;
+         Node       *arg = (Node *) nt->arg;
+
+         APP_JUMB(nt->nulltesttype);        /* IS NULL, IS NOT NULL */
+         APP_JUMB(nt->argisrow); /* is input a composite type ? */
+         LeafNode(jstate, arg);
+     }
+     else if (IsA(arg, ArrayExpr))
+     {
+         ArrayExpr  *ae = (ArrayExpr *) arg;
+
+         APP_JUMB(ae->array_typeid);        /* type of expression result */
+         APP_JUMB(ae->element_typeid);    /* common type of array elements */
+         foreach(l, ae->elements)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, CaseExpr))
+     {
+         CaseExpr   *ce = (CaseExpr *) arg;
+
+         Assert(ce->casetype != InvalidOid);
+         APP_JUMB(ce->casetype);
+         foreach(l, ce->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+         if (ce->arg)
+             LeafNode(jstate, (Node *) ce->arg);
+
+         if (ce->defresult)
+         {
+             /*
+              * Default result (ELSE clause).
+              *
+              * May be NULL, because no else clause was actually specified, and
+              * thus the value is equivalent to SQL ELSE NULL
+              */
+             LeafNode(jstate, (Node *) ce->defresult);
+         }
+     }
+     else if (IsA(arg, CaseTestExpr))
+     {
+         CaseTestExpr *ct = (CaseTestExpr *) arg;
+
+         APP_JUMB(ct->typeId);
+     }
+     else if (IsA(arg, CaseWhen))
+     {
+         CaseWhen   *cw = (CaseWhen *) arg;
+         Node       *res = (Node *) cw->result;
+         Node       *exp = (Node *) cw->expr;
+
+         if (res)
+             LeafNode(jstate, res);
+         if (exp)
+             LeafNode(jstate, exp);
+     }
+     else if (IsA(arg, MinMaxExpr))
+     {
+         MinMaxExpr *cw = (MinMaxExpr *) arg;
+
+         APP_JUMB(cw->minmaxtype);
+         APP_JUMB(cw->op);
+         foreach(l, cw->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, ScalarArrayOpExpr))
+     {
+         ScalarArrayOpExpr *sa = (ScalarArrayOpExpr *) arg;
+
+         APP_JUMB(sa->opfuncid);
+         APP_JUMB(sa->useOr);
+         foreach(l, sa->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, CoalesceExpr))
+     {
+         CoalesceExpr *ca = (CoalesceExpr *) arg;
+
+         foreach(l, ca->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, ArrayCoerceExpr))
+     {
+         ArrayCoerceExpr *ac = (ArrayCoerceExpr *) arg;
+
+         LeafNode(jstate, (Node *) ac->arg);
+     }
+     else if (IsA(arg, WindowClause))
+     {
+         WindowClause *wc = (WindowClause *) arg;
+
+         foreach(l, wc->partitionClause)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+         foreach(l, wc->orderClause)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, SortGroupClause))
+     {
+         SortGroupClause *sgc = (SortGroupClause *) arg;
+
+         APP_JUMB(sgc->tleSortGroupRef);
+         APP_JUMB(sgc->nulls_first);
+     }
+     else if (IsA(arg, Integer) ||
+              IsA(arg, Float) ||
+              IsA(arg, String) ||
+              IsA(arg, BitString) ||
+              IsA(arg, Null)
+         )
+     {
+         /*
+          * It is not necessary to serialize Value nodes - they are seen when
+          * aliases are used, which are ignored.
+          */
+         return;
+     }
+     else if (IsA(arg, BooleanTest))
+     {
+         BooleanTest *bt = (BooleanTest *) arg;
+
+         APP_JUMB(bt->booltesttype);
+         LeafNode(jstate, (Node *) bt->arg);
+     }
+     else if (IsA(arg, ArrayRef))
+     {
+         ArrayRef   *ar = (ArrayRef *) arg;
+
+         APP_JUMB(ar->refarraytype);
+         foreach(l, ar->refupperindexpr)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+         foreach(l, ar->reflowerindexpr)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+         if (ar->refexpr)
+             LeafNode(jstate, (Node *) ar->refexpr);
+         if (ar->refassgnexpr)
+             LeafNode(jstate, (Node *) ar->refassgnexpr);
+     }
+     else if (IsA(arg, NullIfExpr))
+     {
+         /* NullIfExpr is just a typedef for OpExpr */
+         QualsNode(jstate, (OpExpr *) arg);
+     }
+     else if (IsA(arg, RowExpr))
+     {
+         RowExpr    *re = (RowExpr *) arg;
+
+         APP_JUMB(re->row_format);
+         foreach(l, re->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+
+     }
+     else if (IsA(arg, XmlExpr))
+     {
+         XmlExpr    *xml = (XmlExpr *) arg;
+
+         APP_JUMB(xml->op);
+         foreach(l, xml->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+         /* non-XML expressions for xml_attributes */
+         foreach(l, xml->named_args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+         /* parallel list of Value strings */
+         foreach(l, xml->arg_names)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, RowCompareExpr))
+     {
+         RowCompareExpr *rc = (RowCompareExpr *) arg;
+
+         foreach(l, rc->largs)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+         foreach(l, rc->rargs)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(arg, SetToDefault))
+     {
+         SetToDefault *sd = (SetToDefault *) arg;
+
+         APP_JUMB(sd->typeId);
+         APP_JUMB(sd->typeMod);
+     }
+     else if (IsA(arg, ConvertRowtypeExpr))
+     {
+         ConvertRowtypeExpr *Cr = (ConvertRowtypeExpr *) arg;
+
+         APP_JUMB(Cr->convertformat);
+         APP_JUMB(Cr->resulttype);
+         LeafNode(jstate, (Node *) Cr->arg);
+     }
+     else if (IsA(arg, FieldStore))
+     {
+         FieldStore *Fs = (FieldStore *) arg;
+
+         LeafNode(jstate, (Node *) Fs->arg);
+         foreach(l, Fs->newvals)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else
+     {
+         elog(WARNING, "unrecognized node type in LeafNode: %d",
+              (int) nodeTag(arg));
+     }
+ }
+
+ /*
+  * Perform selective serialization of limit or offset nodes
+  */
+ static void
+ LimitOffsetNode(pgssJumbleState *jstate, const Node *node)
+ {
+     ListCell   *l;
+     unsigned char magic = MAG_LIMIT_OFFSET;
+
+     APP_JUMB(magic);
+
+     if (IsA(node, FuncExpr))
+     {
+
+         foreach(l, ((FuncExpr *) node)->args)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else
+     {
+         /* Fall back on leaf node representation */
+         LeafNode(jstate, node);
+     }
+ }
+
+ /*
+  * JoinExprNode: Perform selective serialization of JoinExpr nodes
+  */
+ static void
+ JoinExprNode(pgssJumbleState *jstate, const JoinExpr *node)
+ {
+     Node       *larg = node->larg;        /* left subtree */
+     Node       *rarg = node->rarg;        /* right subtree */
+     ListCell   *l;
+
+     Assert(IsA(node, JoinExpr));
+
+     APP_JUMB(node->jointype);
+     APP_JUMB(node->isNatural);
+
+     if (node->quals)
+     {
+         if (IsA(node->quals, OpExpr))
+         {
+             QualsNode(jstate, (OpExpr *) node->quals);
+         }
+         else
+         {
+             LeafNode(jstate, (Node *) node->quals);
+         }
+     }
+     /* USING clause, if any (list of String) */
+     foreach(l, node->usingClause)
+     {
+         Node       *arg = (Node *) lfirst(l);
+
+         LeafNode(jstate, arg);
+     }
+     if (larg)
+         JoinExprNodeChild(jstate, larg);
+     if (rarg)
+         JoinExprNodeChild(jstate, rarg);
+ }
+
+ /*
+  * JoinExprNodeChild: Serialize children of the JoinExpr node
+  */
+ static void
+ JoinExprNodeChild(pgssJumbleState *jstate, const Node *node)
+ {
+     if (IsA(node, RangeTblRef))
+     {
+         RangeTblRef *rt = (RangeTblRef *) node;
+         List       *rtable;
+         RangeTblEntry *rte;
+         ListCell   *l;
+
+         rtable = (List *) linitial(jstate->rangetables);
+         rte = rt_fetch(rt->rtindex, rtable);
+
+         APP_JUMB(rte->relid);
+         APP_JUMB(rte->jointype);
+
+         if (rte->subquery)
+             PerformJumble(jstate, rte->subquery);
+
+         foreach(l, rte->joinaliasvars)
+         {
+             Node       *arg = (Node *) lfirst(l);
+
+             LeafNode(jstate, arg);
+         }
+     }
+     else if (IsA(node, JoinExpr))
+     {
+         JoinExprNode(jstate, (JoinExpr *) node);
+     }
+     else
+     {
+         LeafNode(jstate, node);
+     }
+ }
+
+ /*
+  * Record location of constant within query string of query tree that is
+  * currently being walked.
+  */
+ static void
+ RecordConstLocation(pgssJumbleState *jstate, int location)
+ {
+     /* -1 indicates unknown or undefined location */
+     if (location >= 0)
+     {
+         /* enlarge array if needed */
+         if (jstate->clocations_count >= jstate->clocations_buf_size)
+         {
+             jstate->clocations_buf_size *= 2;
+             jstate->clocations = (pgssLocationLen *)
+                 repalloc(jstate->clocations,
+                          jstate->clocations_buf_size *
+                          sizeof(pgssLocationLen));
+         }
+         jstate->clocations[jstate->clocations_count++].location = location;
+     }
+ }
+
+ /*
+  * Generate a normalized version of the query string that will be used to
+  * represent all similar queries.
+  *
+  * Note that the normalized representation may well vary depending on
+  * just which "equivalent" query is used to create the hashtable entry.
+  * We assume this is OK.
+  *
+  * *query_len_p contains the input string length, and is updated with
+  * the result string length (which cannot be longer) on exit.
+  *
+  * Returns a palloc'd string, which is not necessarily null-terminated.
+  */
+ static char *
+ generate_normalized_query(pgssJumbleState *jstate, const char *query,
+                           int *query_len_p, int encoding)
+ {
+     char       *norm_query;
+     int            query_len = *query_len_p;
+     int            max_output_len;
+     int            i,
+                 len_to_wrt,            /* Length (in bytes) to write */
+                 quer_loc = 0,        /* Source query byte location */
+                 n_quer_loc = 0,        /* Normalized query byte location */
+                 last_off = 0,        /* Offset from start for previous tok */
+                 last_tok_len = 0;    /* Length (in bytes) of that tok */
+
+     /*
+      * Get constants' lengths - core system only gives us locations.
+      * Note this also ensures the items are sorted by location.
+      */
+     fill_in_constant_lengths(jstate, query);
+
+     /* Allocate result buffer, ensuring we limit result to allowed size */
+     max_output_len = Min(query_len, pgss->query_size - 1);
+     norm_query = palloc(max_output_len);
+
+     for (i = 0; i < jstate->clocations_count; i++)
+     {
+         int        off,            /* Offset from start for cur tok */
+                 tok_len;         /* Length (in bytes) of that tok */
+
+         off = jstate->clocations[i].location;
+         tok_len = jstate->clocations[i].length;
+
+         if (tok_len < 0)
+             continue;            /* ignore any duplicates */
+
+         /* Copy next chunk, or as much as will fit */
+         len_to_wrt = off - last_off;
+         len_to_wrt -= last_tok_len;
+         len_to_wrt = Min(len_to_wrt, max_output_len - n_quer_loc);
+
+         Assert(len_to_wrt >= 0);
+         memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+         n_quer_loc += len_to_wrt;
+
+         if (n_quer_loc < max_output_len)
+             norm_query[n_quer_loc++] = '?';
+
+         quer_loc = off + tok_len;
+         last_off = off;
+         last_tok_len = tok_len;
+
+         /* If we run out of space, might as well stop iterating */
+         if (n_quer_loc >= max_output_len)
+             break;
+     }
+
+     /*
+      * We've copied up until the last ignorable constant.  Copy over the
+      * remaining bytes of the original query string, or at least as much as
+      * will fit.
+      */
+     len_to_wrt = query_len - quer_loc;
+     len_to_wrt = Min(len_to_wrt, max_output_len - n_quer_loc);
+
+     Assert(len_to_wrt >= 0);
+     memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
+     n_quer_loc += len_to_wrt;
+
+     /*
+      * If we ran out of space, we need to do an encoding-aware truncation,
+      * just to make sure we don't have an incomplete character at the end.
+      */
+     if (n_quer_loc >= max_output_len)
+         query_len = pg_encoding_mbcliplen(encoding,
+                                           norm_query,
+                                           n_quer_loc,
+                                           pgss->query_size - 1);
+     else
+         query_len = n_quer_loc;
+
+     *query_len_p = query_len;
+     return norm_query;
+ }
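
To make the copy loop above concrete, here is a minimal standalone sketch of
the same replace-constants-with-'?' idea.  The constant locations and lengths
are hard-coded rather than computed by fill_in_constant_lengths below, and the
duplicate handling and output-size clamping are omitted; the query text and
the '?' placeholders follow the behaviour described in the comments, while the
toy_* names and malloc-based allocation are illustrative only.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct ToyLocationLen
    {
        int     location;           /* byte offset of constant in query */
        int     length;             /* byte length of the constant */
    } ToyLocationLen;

    /*
     * Copy the query, replacing each recorded constant with a single '?'.
     * Assumes locs[] is sorted by location, non-overlapping, and in range.
     */
    static char *
    toy_normalize(const char *query, const ToyLocationLen *locs, int nlocs)
    {
        int     qlen = (int) strlen(query);
        char   *out = malloc(qlen + 1);
        int     in = 0,
                outpos = 0;

        for (int i = 0; i < nlocs; i++)
        {
            int     chunk = locs[i].location - in;

            memcpy(out + outpos, query + in, chunk);
            outpos += chunk;
            out[outpos++] = '?';
            in = locs[i].location + locs[i].length;
        }
        memcpy(out + outpos, query + in, qlen - in);
        outpos += qlen - in;
        out[outpos] = '\0';
        return out;
    }

    int
    main(void)
    {
        /* "-2" starts at offset 30, "'x'" at offset 43 */
        const char *query = "SELECT * FROM foo WHERE bar = -2 AND baz = 'x'";
        ToyLocationLen locs[] = {{30, 2}, {43, 3}};
        char       *norm = toy_normalize(query, locs, 2);

        puts(norm);   /* prints: SELECT * FROM foo WHERE bar = ? AND baz = ? */
        free(norm);
        return 0;
    }

Because the recorded location points at the '-' and the length covers both
tokens of "-2", a negative constant collapses to a single '?' exactly like a
positive one; that is the special case fill_in_constant_lengths handles below.
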
+
+ /*
+  * Given a valid SQL string, and constant-location records whose lengths are
+  * uninitialized, fill in the corresponding lengths of those constants.
+  *
+  * The constants may use any allowed constant syntax, such as float literals,
+  * bit-strings, single-quoted strings and dollar-quoted strings.  This is
+  * accomplished by using the public API for the core scanner.
+  *
+  * It is the caller's job to ensure that the string is a valid SQL statement
+  * with constants at the indicated locations.  Since in practice the string
+  * has already been parsed, and the locations that the caller provides will
+  * have originated from within the authoritative parser, this should not be
+  * a problem.
+  *
+  * Duplicate constant locations are possible, and will have their lengths
+  * marked as '-1', so that they are later ignored.
+  *
+  * N.B. There is an assumption that a '-' character at a Const location begins
+  * a negative numeric constant.  This precludes there ever being another
+  * reason for a constant to start with a '-'.
+  */
+ static void
+ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query)
+ {
+     pgssLocationLen *locs;
+     core_yyscan_t yyscanner;
+     core_yy_extra_type yyextra;
+     core_YYSTYPE yylval;
+     YYLTYPE        yylloc;
+     int            last_loc = -1;
+     int            i;
+
+     /*
+      * Sort the records by location so that we can process them in order
+      * while scanning the query text.
+      */
+     if (jstate->clocations_count > 1)
+         qsort(jstate->clocations, jstate->clocations_count,
+               sizeof(pgssLocationLen), comp_location);
+     locs = jstate->clocations;
+
+     /* initialize the flex scanner --- should match raw_parser() */
+     yyscanner = scanner_init(query,
+                              &yyextra,
+                              ScanKeywords,
+                              NumScanKeywords);
+
+     /* Search for each constant, in sequence */
+     for (i = 0; i < jstate->clocations_count; i++)
+     {
+         int            loc = locs[i].location;
+
+         Assert(loc >= 0);
+
+         if (loc <= last_loc)
+         {
+             /* Duplicate constant, ignore */
+             locs[i].length = -1;
+             continue;
+         }
+
+         /* Lex tokens until we find the desired constant */
+         for (;;)
+         {
+             int            tok;
+
+             tok = core_yylex(&yylval, &yylloc, yyscanner);
+
+             /* We should not hit end-of-string, but if we do, behave sanely */
+             if (tok == 0)
+             {
+                 locs[i].length = -1;
+                 break;            /* out of inner for-loop */
+             }
+
+             /*
+              * We should find the token position exactly, but if we somehow
+              * run past it, work with that.
+              */
+             if (yylloc >= loc)
+             {
+                 if (query[loc] == '-')
+                 {
+                     /*
+                      * It's a negative value - this is the one and only case
+                      * where we replace more than a single token.
+                      *
+                      * Do not compensate for the core system's special-case
+                      * adjustment of location to that of the leading '-'
+                      * operator in the event of a negative constant.  It is
+                      * also useful for our purposes to start from the minus
+                      * symbol.  In this way, queries like "select * from foo
+                      * where bar = 1" and "select * from foo where bar = -2"
+                      * will have identical normalized query strings.
+                      */
+                     tok = core_yylex(&yylval, &yylloc, yyscanner);
+                     if (tok == 0)
+                     {
+                         locs[i].length = -1;
+                         break;    /* out of inner for-loop */
+                     }
+                 }
+
+                 /*
+                  * We now rely on the assumption that flex has placed a zero
+                  * byte after the text of the current token in scanbuf.
+                  */
+                 locs[i].length = strlen(yyextra.scanbuf + loc);
+                 break;            /* out of inner for-loop */
+             }
+         }
+
+         last_loc = loc;
+     }
+
+     scanner_finish(yyscanner);
+ }
+
+ /*
+  * comp_location: comparator for qsorting pgssLocationLen structs by location
+  */
+ static int
+ comp_location(const void *a, const void *b)
+ {
+     int            l = ((const pgssLocationLen *) a)->location;
+     int            r = ((const pgssLocationLen *) b)->location;
+
+     if (l < r)
+         return -1;
+     else if (l > r)
+         return +1;
+     else
+         return 0;
+ }
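
As a closing illustration of the preprocessing fill_in_constant_lengths does
before it starts lexing, the toy program below (invented names; the
lexer-driven length computation itself is omitted) sorts the recorded
locations with a comp_location-style comparator and marks repeated locations
with length -1, which generate_normalized_query then skips.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct ToyLocationLen
    {
        int     location;
        int     length;             /* -1 means "ignore this entry" */
    } ToyLocationLen;

    /* Same ordering rule as comp_location above. */
    static int
    toy_comp_location(const void *a, const void *b)
    {
        int     l = ((const ToyLocationLen *) a)->location;
        int     r = ((const ToyLocationLen *) b)->location;

        if (l < r)
            return -1;
        else if (l > r)
            return +1;
        else
            return 0;
    }

    int
    main(void)
    {
        /* locations as they might be recorded while walking a query tree */
        ToyLocationLen locs[] = {{43, 0}, {30, 0}, {30, 0}};
        int     n = 3;
        int     last_loc = -1;

        qsort(locs, n, sizeof(ToyLocationLen), toy_comp_location);

        for (int i = 0; i < n; i++)
        {
            if (locs[i].location <= last_loc)
                locs[i].length = -1;    /* duplicate location: ignore */
            else
                last_loc = locs[i].location;
            printf("location %d, length %d\n",
                   locs[i].location, locs[i].length);
        }
        return 0;
    }
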
