Thread: Performance improvements for src/port/snprintf.c

From: Tom Lane

Over in the what-about-%m thread, we speculated about replacing the
platform's *printf functions if they didn't support %m, which would
basically mean using src/port/snprintf.c on all non-glibc platforms,
rather than only on Windows as happens right now (ignoring some
obsolete platforms with busted snprintf's).

I've been looking into the possible performance consequences of that,
in particular comparing snprintf.c to the library versions on macOS,
FreeBSD, OpenBSD, and NetBSD.  While it held up well in simpler cases,
I noted that it was significantly slower on long format strings, which
I traced to two separate problems:

1. Our implementation always scans the format string twice, so that it
can sort out argument-ordering options (%n$).  Everybody else is bright
enough to do that only for formats that actually use %n$, and it turns
out that it doesn't really cost anything extra to do so: you can just
perform the extra scan when and if you first find a dollar specifier.
(Perhaps there's an arguable downside for this, with invalid format
strings that have non-dollar conversion specs followed by dollar ones:
with this approach we might fetch some arguments before realizing that
the format is broken.  But a wrong format can cause indefinitely bad
results already, so that seems like a pretty thin objection to me,
especially if all other implementations share the same hazard.)

2. Our implementation is shoving simple data characters in the format
out to the result buffer one at a time.  More common is to skip to the
next % as fast as possible, and then dump anything skipped over using
the string-output code path, reducing the overhead of buffer overrun
checking.
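
The two points above can be pictured with a small stand-alone sketch. It is not the code from the patch: the Sink type and the sink_write() and scan_format() functions are hypothetical names invented for this illustration, conversion specs are skipped rather than formatted, and the extra argument scan is reduced to setting a flag. It only shows literal runs going out in one bulk write each, and the %n$ check happening when a spec is actually encountered rather than in a separate up-front pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output sink: a fixed buffer plus a count of bulk writes. */
typedef struct
{
    char        buf[256];
    size_t      used;
    int         nwrites;
} Sink;

/* Append len bytes with one memcpy, silently truncating when full. */
static void
sink_write(Sink *s, const char *data, size_t len)
{
    size_t      avail;

    assert(s->used < sizeof(s->buf));
    avail = sizeof(s->buf) - 1 - s->used;
    if (len > avail)
        len = avail;
    memcpy(s->buf + s->used, data, len);
    s->used += len;
    s->buf[s->used] = '\0';
    s->nwrites++;
}

/*
 * Walk a format string once.  Literal runs are copied to the sink in bulk;
 * conversion specs produce no output in this sketch.  Returns true if any
 * spec starts with digits followed by '$', which is the point where a real
 * implementation would do its one extra argument-ordering scan.
 */
static bool
scan_format(const char *format, Sink *s)
{
    bool        saw_dollar = false;

    while (*format != '\0')
    {
        const char *p;

        if (*format != '%')
        {
            /* Skip to the next '%' (or the end) and dump the run at once */
            const char *next_pct = format + 1;

            while (*next_pct != '\0' && *next_pct != '%')
                next_pct++;
            sink_write(s, format, (size_t) (next_pct - format));
            format = next_pct;
            continue;
        }

        format++;               /* step over '%' */

        /* Digits immediately followed by '$' mark a positional spec */
        p = format;
        while (*p >= '0' && *p <= '9')
            p++;
        if (p > format && *p == '$')
            saw_dollar = true;

        /* Crudely skip the rest of the spec, up to its conversion letter */
        while (*format != '\0' && strchr("diouxXcspeEfgG%", *format) == NULL)
            format++;
        if (*format != '\0')
            format++;
    }
    return saw_dollar;
}
```

With a format such as "hello %d world" this issues two bulk writes instead of twelve single-character ones, and it never enters the positional-argument path at all.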

The attached patch fixes both of those things, and also does some
micro-optimization hacking to avoid loops around dopr_outch() as well
as unnecessary use of pass-by-ref arguments.  This version stacks up
pretty well against all the libraries I compared it to.  The remaining
weak spot is that floating-point conversions are consistently 30%-50%
slower than the native libraries, which is not terribly surprising
considering that our implementation involves calling the native sprintf
and then massaging the result.  Perhaps there's a way to improve that
without writing our own floating-point conversion code, but I'm not
seeing an easy way offhand.  I don't think that's a showstopper though.
This code is now faster than the native code for very many other cases,
so on average it should cause no real performance problem.
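
As a rough picture of the "avoid loops around dopr_outch()" part, here is a minimal sketch of emitting padding with one memset instead of a per-character loop. The Out type and the out_repeat(), out_bytes(), pad_amount() and pad_string() names are hypothetical and exist only for this example; the real code works on a PrintfTarget and can also flush to a stream, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded output cursor; not the PrintfTarget from the patch. */
typedef struct
{
    char       *ptr;            /* next byte to write */
    char       *end;            /* one past the last usable byte */
} Out;

/* Emit n copies of c with one memset, clipping at the end of the buffer. */
static void
out_repeat(Out *o, int c, int n)
{
    ptrdiff_t   avail = o->end - o->ptr;

    if (n > avail)
        n = (int) avail;
    if (n > 0)
    {
        memset(o->ptr, c, (size_t) n);
        o->ptr += n;
    }
}

/* Emit len bytes of str with one memcpy, clipping the same way. */
static void
out_bytes(Out *o, const char *str, int len)
{
    ptrdiff_t   avail = o->end - o->ptr;

    if (len > avail)
        len = (int) avail;
    if (len > 0)
    {
        memcpy(o->ptr, str, (size_t) len);
        o->ptr += len;
    }
}

/* Padding as a return value: positive pads on the left, negative on the right. */
static int
pad_amount(int minlen, int vallen, int leftjust)
{
    int         padlen = minlen - vallen;

    if (padlen < 0)
        padlen = 0;
    return leftjust ? -padlen : padlen;
}

/* Write value into dst, space-padded to minlen columns and NUL-terminated. */
static void
pad_string(char *dst, size_t dstsize, const char *value,
           int minlen, int leftjust)
{
    Out         o;
    int         vallen = (int) strlen(value);
    int         padlen = pad_amount(minlen, vallen, leftjust);

    assert(dstsize > 0);
    o.ptr = dst;
    o.end = dst + dstsize - 1;  /* reserve room for the terminator */

    if (padlen > 0)
        out_repeat(&o, ' ', padlen);
    out_bytes(&o, value, vallen);
    if (padlen < 0)
        out_repeat(&o, ' ', -padlen);
    *o.ptr = '\0';
}
```

Returning the pad length by value, rather than through a pointer, mirrors the pass-by-ref cleanup mentioned above: the compiler can keep it in a register, and the buffer-space check happens once per run instead of once per character.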

I've attached both the patch and a simple performance testbed in case
anybody wants to do their own measurements.  For reference's sake,
these are the specific test cases I looked at:

        snprintf(buffer, sizeof(buffer),
                 "%2$.*3$f %1$d\n",
                 42, 123.456, 2);

        snprintf(buffer, sizeof(buffer),
                 "%.*g", 15, 123.456);

        snprintf(buffer, sizeof(buffer),
                 "%d %d", 15, 16);

        snprintf(buffer, sizeof(buffer),
                 "%10d", 15);

        snprintf(buffer, sizeof(buffer),
                 "%s",

"0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");

        snprintf(buffer, sizeof(buffer),
                 "%d 0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",

        snprintf(buffer, sizeof(buffer),
                 "%1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
                 42);
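
For anyone who wants to time cases like these without the attached testbed, a minimal harness might look like the following. This is not the attached program: the function name time_two_ints() and the choice of clock() as the timer are assumptions made for this sketch, and it times only the "%d %d" case against whatever snprintf the platform links in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Run the "%d %d" case `iterations` times and return the elapsed CPU time
 * in seconds, or -1.0 if clock() cannot report processor time.
 */
static double
time_two_ints(long iterations, char *buffer, size_t buflen)
{
    clock_t     start;
    clock_t     stop;
    long        i;

    assert(buffer != NULL && buflen > 0);

    start = clock();
    if (start == (clock_t) -1)
        return -1.0;

    for (i = 0; i < iterations; i++)
        snprintf(buffer, buflen, "%d %d", 15, 16);

    stop = clock();
    if (stop == (clock_t) -1)
        return -1.0;

    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

A large iteration count (millions) is needed before the result rises above timer granularity, and comparing two implementations means building the same loop once against each of them.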

A couple of other notes of interest:

* The skip-to-next-% searches could alternatively be implemented with
strchr(), although then you need a strlen() call if there isn't another %.
glibc's version of strchr() is fast enough to make that a win, but since
we're not contemplating using this atop glibc, that's not a case we care
about.  On other platforms the manual loop mostly seems to be faster.

* NetBSD seems to have a special fast path for the case that the format
string is exactly "%s".  I did not adopt that idea here, reasoning that
checking for it would add overhead to all other cases, making it probably
a net loss overall.  I'm prepared to listen to arguments otherwise,
though.  It is a common case; I just doubt it's common enough (and
other library authors seem to agree).
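
To make the first note concrete, here are the two ways of skipping to the next % side by side. The function names next_pct_manual() and next_pct_strchr() are made up for this comparison; both return a pointer to the next '%' or, if there is none, to the terminating NUL, which is where the extra strlen() call comes from in the strchr() variant.

```c
#include <assert.h>
#include <string.h>

/* Manual loop: one pass, stops at '%' or at the terminating NUL. */
static const char *
next_pct_manual(const char *s)
{
    assert(s != NULL);
    while (*s != '\0' && *s != '%')
        s++;
    return s;
}

/*
 * strchr() variant: a single library call when a '%' exists, but a second
 * pass (strlen) is needed to find the end when it does not.
 */
static const char *
next_pct_strchr(const char *s)
{
    const char *p;

    assert(s != NULL);
    p = strchr(s, '%');
    return (p != NULL) ? p : s + strlen(s);
}
```

The two are interchangeable in behavior, so which one wins is purely a question of how well the platform's strchr() is tuned.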

I'll add this to the upcoming CF.

            regards, tom lane

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..211ff1b 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 295,301 ****
  }


! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 295,303 ----
  }


! static bool find_arguments(const char *format, va_list args,
!                PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char
*** 307,317 ****
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);


  /*
--- 309,320 ----
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int    compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);


  /*
*************** static void trailing_pad(int *padlen, Pr
*** 320,329 ****
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
!     const char *format_start = format;
      int            ch;
      bool        have_dollar;
-     bool        have_non_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
--- 323,331 ----
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
!     const char *first_pct = NULL;
      int            ch;
      bool        have_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
*************** dopr(PrintfTarget *target, const char *f
*** 335,559 ****
      int            precision;
      int            zpad;
      int            forcesign;
-     int            last_dollar;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
-     int            i;
-     PrintfArgType argtypes[NL_ARGMAX + 1];
      PrintfArgValue argvalues[NL_ARGMAX + 1];

      /*
!      * Parse the format string to determine whether there are %n$ format
!      * specs, and identify the types and order of the format parameters.
       */
!     have_dollar = have_non_dollar = false;
!     last_dollar = 0;
!     MemSet(argtypes, 0, sizeof(argtypes));

!     while ((ch = *format++) != '\0')
      {
!         if (ch != '%')
!             continue;
!         longflag = longlongflag = pointflag = 0;
!         fmtpos = accum = 0;
!         afterstar = false;
! nextch1:
!         ch = *format++;
!         if (ch == '\0')
!             break;                /* illegal, but we don't complain */
!         switch (ch)
          {
!             case '-':
!             case '+':
!                 goto nextch1;
!             case '0':
!             case '1':
!             case '2':
!             case '3':
!             case '4':
!             case '5':
!             case '6':
!             case '7':
!             case '8':
!             case '9':
!                 accum = accum * 10 + (ch - '0');
!                 goto nextch1;
!             case '.':
!                 pointflag = 1;
!                 accum = 0;
!                 goto nextch1;
!             case '*':
!                 if (afterstar)
!                     have_non_dollar = true; /* multiple stars */
!                 afterstar = true;
!                 accum = 0;
!                 goto nextch1;
!             case '$':
!                 have_dollar = true;
!                 if (accum <= 0 || accum > NL_ARGMAX)
!                     goto bad_format;
!                 if (afterstar)
!                 {
!                     if (argtypes[accum] &&
!                         argtypes[accum] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[accum] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, accum);
!                     afterstar = false;
!                 }
!                 else
!                     fmtpos = accum;
!                 accum = 0;
!                 goto nextch1;
!             case 'l':
!                 if (longflag)
!                     longlongflag = 1;
!                 else
!                     longflag = 1;
!                 goto nextch1;
!             case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
!                 longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
!                 longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
! #else
!                 /* assume size_t is same size as int */
! #endif
!                 goto nextch1;
!             case 'h':
!             case '\'':
!                 /* ignore these */
!                 goto nextch1;
!             case 'd':
!             case 'i':
!             case 'o':
!             case 'u':
!             case 'x':
!             case 'X':
!                 if (fmtpos)
!                 {
!                     PrintfArgType atype;

!                     if (longlongflag)
!                         atype = ATYPE_LONGLONG;
!                     else if (longflag)
!                         atype = ATYPE_LONG;
!                     else
!                         atype = ATYPE_INT;
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != atype)
!                         goto bad_format;
!                     argtypes[fmtpos] = atype;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'c':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 's':
!             case 'p':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_CHARPTR)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_CHARPTR;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'e':
!             case 'E':
!             case 'f':
!             case 'g':
!             case 'G':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_DOUBLE)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_DOUBLE;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
                  break;
!             case '%':
                  break;
          }

          /*
!          * If we finish the spec with afterstar still set, there's a
!          * non-dollar star in there.
           */
!         if (afterstar)
!             have_non_dollar = true;
!     }
!
!     /* Per spec, you use either all dollar or all not. */
!     if (have_dollar && have_non_dollar)
!         goto bad_format;
!
!     /*
!      * In dollar mode, collect the arguments in physical order.
!      */
!     for (i = 1; i <= last_dollar; i++)
!     {
!         switch (argtypes[i])
!         {
!             case ATYPE_NONE:
!                 goto bad_format;
!             case ATYPE_INT:
!                 argvalues[i].i = va_arg(args, int);
!                 break;
!             case ATYPE_LONG:
!                 argvalues[i].l = va_arg(args, long);
!                 break;
!             case ATYPE_LONGLONG:
!                 argvalues[i].ll = va_arg(args, int64);
!                 break;
!             case ATYPE_DOUBLE:
!                 argvalues[i].d = va_arg(args, double);
!                 break;
!             case ATYPE_CHARPTR:
!                 argvalues[i].cptr = va_arg(args, char *);
!                 break;
!         }
!     }
!
!     /*
!      * At last we can parse the format for real.
!      */
!     format = format_start;
!     while ((ch = *format++) != '\0')
!     {
!         if (target->failed)
!             break;

!         if (ch != '%')
!         {
!             dopr_outch(ch, target);
!             continue;
!         }
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
--- 337,387 ----
      int            precision;
      int            zpad;
      int            forcesign;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
      PrintfArgValue argvalues[NL_ARGMAX + 1];

      /*
!      * Initially, we suppose the format string does not use %n$.  The first
!      * time we come to a conversion spec that has that, we'll call
!      * find_arguments() to check for consistent use of %n$ and fill the
!      * argvalues array with the argument values in the correct order.
       */
!     have_dollar = false;

!     while (*format != '\0')
      {
!         /* Locate next conversion specifier */
!         if (*format != '%')
          {
!             const char *next_pct = format + 1;

!             while (*next_pct != '\0' && *next_pct != '%')
!                 next_pct++;
!
!             /* Dump literal data we just scanned over */
!             dostr(format, next_pct - format, target);
!             if (target->failed)
                  break;
!
!             if (*next_pct == '\0')
                  break;
+             format = next_pct;
          }

          /*
!          * Remember start of first conversion spec; if we find %n$, then it's
!          * sufficient for find_arguments() to start here, without rescanning
!          * earlier literal text.
           */
!         if (first_pct == NULL)
!             first_pct = format;

!         /* Process conversion spec starting at *format */
!         format++;
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
*************** nextch2:
*** 597,603 ****
              case '*':
                  if (have_dollar)
                  {
!                     /* process value after reading n$ */
                      afterstar = true;
                  }
                  else
--- 425,435 ----
              case '*':
                  if (have_dollar)
                  {
!                     /*
!                      * We'll process value after reading n$.  Note it's OK to
!                      * assume have_dollar is set correctly, because in a valid
!                      * format string the initial % must have had n$ if * does.
!                      */
                      afterstar = true;
                  }
                  else
*************** nextch2:
*** 628,633 ****
--- 460,473 ----
                  accum = 0;
                  goto nextch2;
              case '$':
+                 /* First dollar sign? */
+                 if (!have_dollar)
+                 {
+                     /* Yup, so examine all conversion specs in format */
+                     if (!find_arguments(first_pct, args, argvalues))
+                         goto bad_format;
+                     have_dollar = true;
+                 }
                  if (afterstar)
                  {
                      /* fetch and process star value */
*************** nextch2:
*** 806,811 ****
--- 646,655 ----
                  dopr_outch('%', target);
                  break;
          }
+
+         /* Check for failure after each conversion spec */
+         if (target->failed)
+             break;
      }

      return;
*************** bad_format:
*** 815,822 ****
      target->failed = true;
  }

  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
--- 659,896 ----
      target->failed = true;
  }

+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+                PrintfArgValue *argvalues)
+ {
+     int            ch;
+     bool        afterstar;
+     int            accum;
+     int            longlongflag;
+     int            longflag;
+     int            fmtpos;
+     int            i;
+     int            last_dollar;
+     PrintfArgType argtypes[NL_ARGMAX + 1];
+
+     /* Initialize to "no dollar arguments known" */
+     last_dollar = 0;
+     MemSet(argtypes, 0, sizeof(argtypes));
+
+     /*
+      * This loop must accept the same format strings as the one in dopr().
+      * However, we don't need to analyze them to the same level of detail.
+      *
+      * Since we're only called if there's a dollar-type spec somewhere, we can
+      * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+      * all argument references in the format string must be one or the other.
+      */
+     while (*format != '\0')
+     {
+         /* Locate next conversion specifier */
+         if (*format != '%')
+         {
+             const char *next_pct = format + 1;
+
+             while (*next_pct != '\0' && *next_pct != '%')
+                 next_pct++;
+             if (*next_pct == '\0')
+                 break;
+             format = next_pct;
+         }
+
+         /* Process conversion spec starting at *format */
+         format++;
+         longflag = longlongflag = 0;
+         fmtpos = accum = 0;
+         afterstar = false;
+ nextch1:
+         ch = *format++;
+         if (ch == '\0')
+             break;                /* illegal, but we don't complain */
+         switch (ch)
+         {
+             case '-':
+             case '+':
+                 goto nextch1;
+             case '0':
+             case '1':
+             case '2':
+             case '3':
+             case '4':
+             case '5':
+             case '6':
+             case '7':
+             case '8':
+             case '9':
+                 accum = accum * 10 + (ch - '0');
+                 goto nextch1;
+             case '.':
+                 accum = 0;
+                 goto nextch1;
+             case '*':
+                 if (afterstar)
+                     return false;    /* previous star missing dollar */
+                 afterstar = true;
+                 accum = 0;
+                 goto nextch1;
+             case '$':
+                 if (accum <= 0 || accum > NL_ARGMAX)
+                     return false;
+                 if (afterstar)
+                 {
+                     if (argtypes[accum] &&
+                         argtypes[accum] != ATYPE_INT)
+                         return false;
+                     argtypes[accum] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, accum);
+                     afterstar = false;
+                 }
+                 else
+                     fmtpos = accum;
+                 accum = 0;
+                 goto nextch1;
+             case 'l':
+                 if (longflag)
+                     longlongflag = 1;
+                 else
+                     longflag = 1;
+                 goto nextch1;
+             case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+                 longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+                 longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+                 /* assume size_t is same size as int */
+ #endif
+                 goto nextch1;
+             case 'h':
+             case '\'':
+                 /* ignore these */
+                 goto nextch1;
+             case 'd':
+             case 'i':
+             case 'o':
+             case 'u':
+             case 'x':
+             case 'X':
+                 if (fmtpos)
+                 {
+                     PrintfArgType atype;
+
+                     if (longlongflag)
+                         atype = ATYPE_LONGLONG;
+                     else if (longflag)
+                         atype = ATYPE_LONG;
+                     else
+                         atype = ATYPE_INT;
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != atype)
+                         return false;
+                     argtypes[fmtpos] = atype;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'c':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_INT)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 's':
+             case 'p':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_CHARPTR)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_CHARPTR;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'e':
+             case 'E':
+             case 'f':
+             case 'g':
+             case 'G':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_DOUBLE)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_DOUBLE;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case '%':
+                 break;
+         }
+
+         /*
+          * If we finish the spec with afterstar still set, there's a
+          * non-dollar star in there.
+          */
+         if (afterstar)
+             return false;        /* non-dollar conversion spec */
+     }
+
+     /*
+      * Format appears valid so far, so collect the arguments in physical
+      * order.  (Since we rejected any non-dollar specs that would have
+      * collected arguments, we know that dopr() hasn't collected any yet.)
+      */
+     for (i = 1; i <= last_dollar; i++)
+     {
+         switch (argtypes[i])
+         {
+             case ATYPE_NONE:
+                 return false;
+             case ATYPE_INT:
+                 argvalues[i].i = va_arg(args, int);
+                 break;
+             case ATYPE_LONG:
+                 argvalues[i].l = va_arg(args, long);
+                 break;
+             case ATYPE_LONGLONG:
+                 argvalues[i].ll = va_arg(args, int64);
+                 break;
+             case ATYPE_DOUBLE:
+                 argvalues[i].d = va_arg(args, double);
+                 break;
+             case ATYPE_CHARPTR:
+                 argvalues[i].cptr = va_arg(args, char *);
+                 break;
+         }
+     }
+
+     return true;
+ }
+
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 831,847 ****
      else
          vallen = strlen(value);

!     adjust_padlen(minlen, vallen, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dostr(value, vallen, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 905,921 ----
      else
          vallen = strlen(value);

!     padlen = compute_padlen(minlen, vallen, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dostr(value, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtint(int64 value, char type, int force
*** 869,875 ****
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen = 0;        /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
--- 943,949 ----
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen;            /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
*************** fmtint(int64 value, char type, int force
*** 917,958 ****

          do
          {
!             convert[vallen++] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

!     while (zeropad-- > 0)
!         dopr_outch('0', target);

!     while (vallen > 0)
!         dopr_outch(convert[--vallen], target);

!     trailing_pad(&padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen = 0;        /* amount to pad */

!     adjust_padlen(minlen, 1, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dopr_outch(value, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 991,1031 ----

          do
          {
!             convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     padlen = compute_padlen(minlen, vallen + zeropad, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

!     if (zeropad > 0)
!         dopr_outchmulti('0', zeropad, target);

!     dostr(convert + sizeof(convert) - vallen, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen;            /* amount to pad */

!     padlen = compute_padlen(minlen, 1, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dopr_outch(value, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtfloat(double value, char type, int fo
*** 966,972 ****
      char        fmt[32];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen = 0;        /* amount to pad with spaces */

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
--- 1039,1045 ----
      char        fmt[32];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen;            /* amount to pad with spaces */

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 1006,1014 ****
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
--- 1079,1087 ----
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
*************** fmtfloat(double value, char type, int fo
*** 1021,1036 ****
          {
              /* pad after exponent */
              dostr(convert, epos - convert, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
          }
      }
      else
--- 1094,1109 ----
          {
              /* pad after exponent */
              dostr(convert, epos - convert, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
          }
      }
      else
*************** fmtfloat(double value, char type, int fo
*** 1039,1045 ****
          dostr(convert, vallen, target);
      }

!     trailing_pad(&padlen, target);
      return;

  fail:
--- 1112,1118 ----
          dostr(convert, vallen, target);
      }

!     trailing_pad(padlen, target);
      return;

  fail:
*************** fail:
*** 1049,1054 ****
--- 1122,1134 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(*str, target);
+         return;
+     }
+
      while (slen > 0)
      {
          int            avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1092,1097 ****
--- 1172,1213 ----
      *(target->bufptr++) = c;
  }

+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(c, target);
+         return;
+     }
+
+     while (slen > 0)
+     {
+         int            avail;
+
+         if (target->bufend != NULL)
+             avail = target->bufend - target->bufptr;
+         else
+             avail = slen;
+         if (avail <= 0)
+         {
+             /* buffer full, can we dump to stream? */
+             if (target->stream == NULL)
+             {
+                 target->nchars += slen; /* no, lose the data */
+                 return;
+             }
+             flushbuffer(target);
+             continue;
+         }
+         avail = Min(avail, slen);
+         memset(target->bufptr, c, avail);
+         target->bufptr += avail;
+         slen -= avail;
+     }
+ }
+

  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1107,1148 ****
  }


! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
!     *padlen = minlen - vallen;
!     if (*padlen < 0)
!         *padlen = 0;
      if (leftjust)
!         *padlen = -(*padlen);
  }


  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
      if (*padlen > 0 && zpad)
      {
!         if (*signvalue)
          {
!             dopr_outch(*signvalue, target);
              --(*padlen);
!             *signvalue = 0;
          }
!         while (*padlen > 0)
          {
!             dopr_outch(zpad, target);
!             --(*padlen);
          }
      }
!     while (*padlen > (*signvalue != 0))
      {
!         dopr_outch(' ', target);
!         --(*padlen);
      }
!     if (*signvalue)
      {
!         dopr_outch(*signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
--- 1223,1270 ----
  }


! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
!     int            padlen;
!
!     padlen = minlen - vallen;
!     if (padlen < 0)
!         padlen = 0;
      if (leftjust)
!         padlen = -padlen;
!     return padlen;
  }


  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+     int            maxpad;
+
      if (*padlen > 0 && zpad)
      {
!         if (signvalue)
          {
!             dopr_outch(signvalue, target);
              --(*padlen);
!             signvalue = 0;
          }
!         if (*padlen > 0)
          {
!             dopr_outchmulti(zpad, *padlen, target);
!             *padlen = 0;
          }
      }
!     maxpad = (signvalue != 0);
!     if (*padlen > maxpad)
      {
!         dopr_outchmulti(' ', *padlen - maxpad, target);
!         *padlen = maxpad;
      }
!     if (signvalue)
      {
!         dopr_outch(signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1152,1162 ****


  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
!     while (*padlen < 0)
!     {
!         dopr_outch(' ', target);
!         ++(*padlen);
!     }
  }
--- 1274,1281 ----


  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
!     if (padlen < 0)
!         dopr_outchmulti(' ', -padlen, target);
  }
#include "postgres_fe.h"

#include "portability/instr_time.h"

#include "snprintf.c"

int
main(int argc, char **argv)
{
    int count = 0;
    char buffer[1000];
    instr_time    start;
    instr_time    stop;
    double elapsed;
    double elapsed2;
    int i;

    if (argc > 1)
        count = atoi(argv[1]);
    if (count <= 0)
        count = 1000000;

    INSTR_TIME_SET_CURRENT(start);

    for (i = 0; i < count; i++)
    {
        snprintf(buffer, sizeof(buffer),
                 "%d %d", 15, 16);
    }

    INSTR_TIME_SET_CURRENT(stop);
    INSTR_TIME_SUBTRACT(stop, start);
    elapsed = INSTR_TIME_GET_MILLISEC(stop);

    printf("snprintf time = %g ms total, %g ms per iteration\n",
           elapsed, elapsed / count);

    INSTR_TIME_SET_CURRENT(start);

    for (i = 0; i < count; i++)
    {
        pg_snprintf(buffer, sizeof(buffer),
                    "%d %d", 15, 16);
    }

    INSTR_TIME_SET_CURRENT(stop);
    INSTR_TIME_SUBTRACT(stop, start);
    elapsed2 = INSTR_TIME_GET_MILLISEC(stop);

    printf("pg_snprintf time = %g ms total, %g ms per iteration\n",
           elapsed2, elapsed2 / count);
    printf("ratio = %.3f\n", elapsed2 / elapsed);

    return 0;
}
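
Editorial sketch, not part of the patch: the dopr_outchmulti() helper in the diff above replaces per-character padding loops with one bounds check and one memset(). The standalone fragment below illustrates that idea under simplifying assumptions. OutBuf and outbuf_fill are hypothetical names, the buffer is fixed-size, and there is no stream to flush to, so characters that do not fit are only counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for PrintfTarget: a fixed buffer plus
 * a count of characters that did not fit.
 */
typedef struct OutBuf
{
    char       *ptr;        /* next free byte */
    char       *end;        /* one past the last usable byte */
    int         dropped;    /* characters discarded because buffer was full */
} OutBuf;

/*
 * Emit character c repeated n times, clamping to the space that remains.
 * One bounds computation and one memset() replace n calls to a
 * per-character output routine.
 */
static void
outbuf_fill(OutBuf *out, int c, int n)
{
    int         avail;

    assert(n >= 0);

    avail = (int) (out->end - out->ptr);
    if (n > avail)
    {
        /* remember how much was lost, then write only what fits */
        out->dropped += n - avail;
        n = avail;
    }
    memset(out->ptr, c, (size_t) n);
    out->ptr += n;
}
```

As a usage example, filling three '0' characters into an 8-byte buffer and then requesting ten spaces leaves the buffer full and records five dropped characters.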

Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
I wrote:
> [ snprintf-speedups-1.patch ]

Here's a slightly improved version of that, with two changes:

* Given the current state of the what-about-%m thread, it's no longer
academic how well this performs relative to glibc's version.  I poked
at that and found that a lot of the discrepancy came from glibc using
strchrnul() to find the next format specifier --- apparently, that
function is a *lot* faster than the equivalent manual loop.  So this
version uses that if available.

* I thought of a couple of easy wins for fmtfloat.  We can pass the
precision spec down to the platform's sprintf using "*" notation instead
of converting it to text and back, and that also simplifies matters enough
that we can avoid using an sprintf call to build the simplified format
string.  This seems to get us down to the vicinity of a 10% speed penalty
on microbenchmarks of just float conversion, which is enough to satisfy
me given the other advantages of switching to our own snprintf.

            regards, tom lane
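
Editorial sketch, not part of the patch: the two changes described in this message can be illustrated in isolation. The fragment below is a minimal version under stated assumptions. next_percent and format_double are hypothetical names, HAVE_STRCHRNUL is assumed to come from the build system as in the attached diff, and the platform snprintf is assumed to accept "*" for the precision, as C99 specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return a pointer to the next '%' in s, or to the terminating NUL if
 * there is none.  With glibc, strchrnul() needs _GNU_SOURCE defined
 * before <string.h> is included; otherwise the portable loop is used.
 */
static const char *
next_percent(const char *s)
{
#ifdef HAVE_STRCHRNUL
    return strchrnul(s, '%');
#else
    while (*s != '\0' && *s != '%')
        s++;
    return s;
#endif
}

/*
 * Format a double with a run-time precision by handing the precision to
 * the platform snprintf through "*", rather than printing it into a
 * format string first.  The four-character format is assembled by hand,
 * so no extra sprintf call is needed to build it.
 */
static int
format_double(char *dst, size_t dstlen, double value, char conv,
              int precision)
{
    char        fmt[5];

    /* only floating-point conversion characters are accepted here */
    assert(conv != '\0' && strchr("eEfgG", conv) != NULL);

    fmt[0] = '%';
    fmt[1] = '.';
    fmt[2] = '*';
    fmt[3] = conv;
    fmt[4] = '\0';

    return snprintf(dst, dstlen, fmt, precision, value);
}
```

Everything between the current position and the pointer returned by next_percent() is literal text that can be emitted with a single string-output call; format_double(buf, sizeof(buf), 3.14159, 'f', 2) is equivalent to calling snprintf with "%.2f".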

diff --git a/configure b/configure
index 836d68d..dff9f0c 100755
*** a/configure
--- b/configure
*************** fi
*** 15032,15038 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15032,15038 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 6e14106..c00bb8f 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1535,1541 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1535,1541 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 827574e..da9cfa7 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 519,524 ****
--- 519,527 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #undef HAVE_STDLIB_H

+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+
  /* Define to 1 if you have the `strerror' function. */
  #undef HAVE_STRERROR

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 46ce49d..73d7424 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 390,395 ****
--- 390,398 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #define HAVE_STDLIB_H 1

+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+
  /* Define to 1 if you have the `strerror' function. */
  #ifndef HAVE_STRERROR
  #define HAVE_STRERROR 1
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..66151c2 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 295,301 ****
  }


! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 295,303 ----
  }


! static bool find_arguments(const char *format, va_list args,
!                PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char
*** 307,317 ****
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);


  /*
--- 309,320 ----
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int    compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);


  /*
*************** static void trailing_pad(int *padlen, Pr
*** 320,329 ****
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
!     const char *format_start = format;
      int            ch;
      bool        have_dollar;
-     bool        have_non_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
--- 323,331 ----
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
!     const char *first_pct = NULL;
      int            ch;
      bool        have_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
*************** dopr(PrintfTarget *target, const char *f
*** 335,559 ****
      int            precision;
      int            zpad;
      int            forcesign;
-     int            last_dollar;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
-     int            i;
-     PrintfArgType argtypes[NL_ARGMAX + 1];
      PrintfArgValue argvalues[NL_ARGMAX + 1];

      /*
!      * Parse the format string to determine whether there are %n$ format
!      * specs, and identify the types and order of the format parameters.
       */
!     have_dollar = have_non_dollar = false;
!     last_dollar = 0;
!     MemSet(argtypes, 0, sizeof(argtypes));

!     while ((ch = *format++) != '\0')
      {
!         if (ch != '%')
!             continue;
!         longflag = longlongflag = pointflag = 0;
!         fmtpos = accum = 0;
!         afterstar = false;
! nextch1:
!         ch = *format++;
!         if (ch == '\0')
!             break;                /* illegal, but we don't complain */
!         switch (ch)
          {
!             case '-':
!             case '+':
!                 goto nextch1;
!             case '0':
!             case '1':
!             case '2':
!             case '3':
!             case '4':
!             case '5':
!             case '6':
!             case '7':
!             case '8':
!             case '9':
!                 accum = accum * 10 + (ch - '0');
!                 goto nextch1;
!             case '.':
!                 pointflag = 1;
!                 accum = 0;
!                 goto nextch1;
!             case '*':
!                 if (afterstar)
!                     have_non_dollar = true; /* multiple stars */
!                 afterstar = true;
!                 accum = 0;
!                 goto nextch1;
!             case '$':
!                 have_dollar = true;
!                 if (accum <= 0 || accum > NL_ARGMAX)
!                     goto bad_format;
!                 if (afterstar)
!                 {
!                     if (argtypes[accum] &&
!                         argtypes[accum] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[accum] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, accum);
!                     afterstar = false;
!                 }
!                 else
!                     fmtpos = accum;
!                 accum = 0;
!                 goto nextch1;
!             case 'l':
!                 if (longflag)
!                     longlongflag = 1;
!                 else
!                     longflag = 1;
!                 goto nextch1;
!             case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
!                 longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
!                 longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
  #else
!                 /* assume size_t is same size as int */
  #endif
-                 goto nextch1;
-             case 'h':
-             case '\'':
-                 /* ignore these */
-                 goto nextch1;
-             case 'd':
-             case 'i':
-             case 'o':
-             case 'u':
-             case 'x':
-             case 'X':
-                 if (fmtpos)
-                 {
-                     PrintfArgType atype;

!                     if (longlongflag)
!                         atype = ATYPE_LONGLONG;
!                     else if (longflag)
!                         atype = ATYPE_LONG;
!                     else
!                         atype = ATYPE_INT;
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != atype)
!                         goto bad_format;
!                     argtypes[fmtpos] = atype;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'c':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 's':
!             case 'p':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_CHARPTR)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_CHARPTR;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'e':
!             case 'E':
!             case 'f':
!             case 'g':
!             case 'G':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_DOUBLE)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_DOUBLE;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
                  break;
!             case '%':
                  break;
          }

          /*
!          * If we finish the spec with afterstar still set, there's a
!          * non-dollar star in there.
           */
!         if (afterstar)
!             have_non_dollar = true;
!     }
!
!     /* Per spec, you use either all dollar or all not. */
!     if (have_dollar && have_non_dollar)
!         goto bad_format;
!
!     /*
!      * In dollar mode, collect the arguments in physical order.
!      */
!     for (i = 1; i <= last_dollar; i++)
!     {
!         switch (argtypes[i])
!         {
!             case ATYPE_NONE:
!                 goto bad_format;
!             case ATYPE_INT:
!                 argvalues[i].i = va_arg(args, int);
!                 break;
!             case ATYPE_LONG:
!                 argvalues[i].l = va_arg(args, long);
!                 break;
!             case ATYPE_LONGLONG:
!                 argvalues[i].ll = va_arg(args, int64);
!                 break;
!             case ATYPE_DOUBLE:
!                 argvalues[i].d = va_arg(args, double);
!                 break;
!             case ATYPE_CHARPTR:
!                 argvalues[i].cptr = va_arg(args, char *);
!                 break;
!         }
!     }
!
!     /*
!      * At last we can parse the format for real.
!      */
!     format = format_start;
!     while ((ch = *format++) != '\0')
!     {
!         if (target->failed)
!             break;

!         if (ch != '%')
!         {
!             dopr_outch(ch, target);
!             continue;
!         }
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
--- 337,397 ----
      int            precision;
      int            zpad;
      int            forcesign;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
      PrintfArgValue argvalues[NL_ARGMAX + 1];

      /*
!      * Initially, we suppose the format string does not use %n$.  The first
!      * time we come to a conversion spec that has that, we'll call
!      * find_arguments() to check for consistent use of %n$ and fill the
!      * argvalues array with the argument values in the correct order.
       */
!     have_dollar = false;

!     while (*format != '\0')
      {
!         /* Locate next conversion specifier */
!         if (*format != '%')
          {
!             const char *next_pct = format + 1;
!
!             /*
!              * If strchrnul exists (it's a glibc-ism), it's a good bit faster
!              * than the equivalent manual loop.  Note: this doesn't compile
!              * cleanly without -D_GNU_SOURCE, but we normally use that on
!              * glibc platforms.
!              */
! #ifdef HAVE_STRCHRNUL
!             next_pct = strchrnul(next_pct, '%');
  #else
!             while (*next_pct != '\0' && *next_pct != '%')
!                 next_pct++;
  #endif

!             /* Dump literal data we just scanned over */
!             dostr(format, next_pct - format, target);
!             if (target->failed)
                  break;
!
!             if (*next_pct == '\0')
                  break;
+             format = next_pct;
          }

          /*
!          * Remember start of first conversion spec; if we find %n$, then it's
!          * sufficient for find_arguments() to start here, without rescanning
!          * earlier literal text.
           */
!         if (first_pct == NULL)
!             first_pct = format;

!         /* Process conversion spec starting at *format */
!         format++;
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
*************** nextch2:
*** 597,603 ****
              case '*':
                  if (have_dollar)
                  {
!                     /* process value after reading n$ */
                      afterstar = true;
                  }
                  else
--- 435,445 ----
              case '*':
                  if (have_dollar)
                  {
!                     /*
!                      * We'll process value after reading n$.  Note it's OK to
!                      * assume have_dollar is set correctly, because in a valid
!                      * format string the initial % must have had n$ if * does.
!                      */
                      afterstar = true;
                  }
                  else
*************** nextch2:
*** 628,633 ****
--- 470,483 ----
                  accum = 0;
                  goto nextch2;
              case '$':
+                 /* First dollar sign? */
+                 if (!have_dollar)
+                 {
+                     /* Yup, so examine all conversion specs in format */
+                     if (!find_arguments(first_pct, args, argvalues))
+                         goto bad_format;
+                     have_dollar = true;
+                 }
                  if (afterstar)
                  {
                      /* fetch and process star value */
*************** nextch2:
*** 806,811 ****
--- 656,665 ----
                  dopr_outch('%', target);
                  break;
          }
+
+         /* Check for failure after each conversion spec */
+         if (target->failed)
+             break;
      }

      return;
*************** bad_format:
*** 815,822 ****
      target->failed = true;
  }

  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
--- 669,903 ----
      target->failed = true;
  }

+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+                PrintfArgValue *argvalues)
+ {
+     int            ch;
+     bool        afterstar;
+     int            accum;
+     int            longlongflag;
+     int            longflag;
+     int            fmtpos;
+     int            i;
+     int            last_dollar;
+     PrintfArgType argtypes[NL_ARGMAX + 1];
+
+     /* Initialize to "no dollar arguments known" */
+     last_dollar = 0;
+     MemSet(argtypes, 0, sizeof(argtypes));
+
+     /*
+      * This loop must accept the same format strings as the one in dopr().
+      * However, we don't need to analyze them to the same level of detail.
+      *
+      * Since we're only called if there's a dollar-type spec somewhere, we can
+      * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+      * all argument references in the format string must be one or the other.
+      */
+     while (*format != '\0')
+     {
+         /* Locate next conversion specifier */
+         if (*format != '%')
+         {
+             /* Unlike dopr, we can just quit if there's no more specifiers */
+             format = strchr(format + 1, '%');
+             if (format == NULL)
+                 break;
+         }
+
+         /* Process conversion spec starting at *format */
+         format++;
+         longflag = longlongflag = 0;
+         fmtpos = accum = 0;
+         afterstar = false;
+ nextch1:
+         ch = *format++;
+         if (ch == '\0')
+             break;                /* illegal, but we don't complain */
+         switch (ch)
+         {
+             case '-':
+             case '+':
+                 goto nextch1;
+             case '0':
+             case '1':
+             case '2':
+             case '3':
+             case '4':
+             case '5':
+             case '6':
+             case '7':
+             case '8':
+             case '9':
+                 accum = accum * 10 + (ch - '0');
+                 goto nextch1;
+             case '.':
+                 accum = 0;
+                 goto nextch1;
+             case '*':
+                 if (afterstar)
+                     return false;    /* previous star missing dollar */
+                 afterstar = true;
+                 accum = 0;
+                 goto nextch1;
+             case '$':
+                 if (accum <= 0 || accum > NL_ARGMAX)
+                     return false;
+                 if (afterstar)
+                 {
+                     if (argtypes[accum] &&
+                         argtypes[accum] != ATYPE_INT)
+                         return false;
+                     argtypes[accum] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, accum);
+                     afterstar = false;
+                 }
+                 else
+                     fmtpos = accum;
+                 accum = 0;
+                 goto nextch1;
+             case 'l':
+                 if (longflag)
+                     longlongflag = 1;
+                 else
+                     longflag = 1;
+                 goto nextch1;
+             case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+                 longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+                 longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+                 /* assume size_t is same size as int */
+ #endif
+                 goto nextch1;
+             case 'h':
+             case '\'':
+                 /* ignore these */
+                 goto nextch1;
+             case 'd':
+             case 'i':
+             case 'o':
+             case 'u':
+             case 'x':
+             case 'X':
+                 if (fmtpos)
+                 {
+                     PrintfArgType atype;
+
+                     if (longlongflag)
+                         atype = ATYPE_LONGLONG;
+                     else if (longflag)
+                         atype = ATYPE_LONG;
+                     else
+                         atype = ATYPE_INT;
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != atype)
+                         return false;
+                     argtypes[fmtpos] = atype;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'c':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_INT)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 's':
+             case 'p':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_CHARPTR)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_CHARPTR;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'e':
+             case 'E':
+             case 'f':
+             case 'g':
+             case 'G':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_DOUBLE)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_DOUBLE;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case '%':
+                 break;
+         }
+
+         /*
+          * If we finish the spec with afterstar still set, there's a
+          * non-dollar star in there.
+          */
+         if (afterstar)
+             return false;        /* non-dollar conversion spec */
+     }
+
+     /*
+      * Format appears valid so far, so collect the arguments in physical
+      * order.  (Since we rejected any non-dollar specs that would have
+      * collected arguments, we know that dopr() hasn't collected any yet.)
+      */
+     for (i = 1; i <= last_dollar; i++)
+     {
+         switch (argtypes[i])
+         {
+             case ATYPE_NONE:
+                 return false;
+             case ATYPE_INT:
+                 argvalues[i].i = va_arg(args, int);
+                 break;
+             case ATYPE_LONG:
+                 argvalues[i].l = va_arg(args, long);
+                 break;
+             case ATYPE_LONGLONG:
+                 argvalues[i].ll = va_arg(args, int64);
+                 break;
+             case ATYPE_DOUBLE:
+                 argvalues[i].d = va_arg(args, double);
+                 break;
+             case ATYPE_CHARPTR:
+                 argvalues[i].cptr = va_arg(args, char *);
+                 break;
+         }
+     }
+
+     return true;
+ }
+
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 831,847 ****
      else
          vallen = strlen(value);

!     adjust_padlen(minlen, vallen, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dostr(value, vallen, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 912,928 ----
      else
          vallen = strlen(value);

!     padlen = compute_padlen(minlen, vallen, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dostr(value, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtint(int64 value, char type, int force
*** 869,875 ****
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen = 0;        /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
--- 950,956 ----
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen;            /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
*************** fmtint(int64 value, char type, int force
*** 917,958 ****

          do
          {
!             convert[vallen++] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

!     while (zeropad-- > 0)
!         dopr_outch('0', target);

!     while (vallen > 0)
!         dopr_outch(convert[--vallen], target);

!     trailing_pad(&padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen = 0;        /* amount to pad */

!     adjust_padlen(minlen, 1, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dopr_outch(value, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 998,1038 ----

          do
          {
!             convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     padlen = compute_padlen(minlen, vallen + zeropad, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

!     if (zeropad > 0)
!         dopr_outchmulti('0', zeropad, target);

!     dostr(convert + sizeof(convert) - vallen, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen;            /* amount to pad */

!     padlen = compute_padlen(minlen, 1, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dopr_outch(value, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtfloat(double value, char type, int fo
*** 963,972 ****
      int            signvalue = 0;
      int            prec;
      int            vallen;
!     char        fmt[32];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen = 0;        /* amount to pad with spaces */

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
--- 1043,1056 ----
      int            signvalue = 0;
      int            prec;
      int            vallen;
!     char        fmt[8];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen;            /* amount to pad with spaces */
!
!     /* Handle sign (NaNs have no sign) */
!     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
!         value = -value;

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 988,1004 ****

      if (pointflag)
      {
-         if (sprintf(fmt, "%%.%d%c", prec, type) < 0)
-             goto fail;
          zeropadlen = precision - prec;
      }
-     else if (sprintf(fmt, "%%%c", type) < 0)
-         goto fail;
-
-     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
-         value = -value;
-
-     vallen = sprintf(convert, fmt, value);
      if (vallen < 0)
          goto fail;

--- 1072,1092 ----

      if (pointflag)
      {
          zeropadlen = precision - prec;
+         fmt[0] = '%';
+         fmt[1] = '.';
+         fmt[2] = '*';
+         fmt[3] = type;
+         fmt[4] = '\0';
+         vallen = sprintf(convert, fmt, prec, value);
+     }
+     else
+     {
+         fmt[0] = '%';
+         fmt[1] = type;
+         fmt[2] = '\0';
+         vallen = sprintf(convert, fmt, value);
      }
      if (vallen < 0)
          goto fail;

*************** fmtfloat(double value, char type, int fo
*** 1006,1014 ****
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
--- 1094,1102 ----
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
*************** fmtfloat(double value, char type, int fo
*** 1019,1036 ****
              epos = strrchr(convert, 'E');
          if (epos)
          {
!             /* pad after exponent */
              dostr(convert, epos - convert, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
          }
      }
      else
--- 1107,1124 ----
              epos = strrchr(convert, 'E');
          if (epos)
          {
!             /* pad before exponent */
              dostr(convert, epos - convert, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
          }
      }
      else
*************** fmtfloat(double value, char type, int fo
*** 1039,1045 ****
          dostr(convert, vallen, target);
      }

!     trailing_pad(&padlen, target);
      return;

  fail:
--- 1127,1133 ----
          dostr(convert, vallen, target);
      }

!     trailing_pad(padlen, target);
      return;

  fail:
*************** fail:
*** 1049,1054 ****
--- 1137,1149 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(*str, target);
+         return;
+     }
+
      while (slen > 0)
      {
          int            avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1092,1097 ****
--- 1187,1228 ----
      *(target->bufptr++) = c;
  }

+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(c, target);
+         return;
+     }
+
+     while (slen > 0)
+     {
+         int            avail;
+
+         if (target->bufend != NULL)
+             avail = target->bufend - target->bufptr;
+         else
+             avail = slen;
+         if (avail <= 0)
+         {
+             /* buffer full, can we dump to stream? */
+             if (target->stream == NULL)
+             {
+                 target->nchars += slen; /* no, lose the data */
+                 return;
+             }
+             flushbuffer(target);
+             continue;
+         }
+         avail = Min(avail, slen);
+         memset(target->bufptr, c, avail);
+         target->bufptr += avail;
+         slen -= avail;
+     }
+ }
+

  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1107,1148 ****
  }


! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
!     *padlen = minlen - vallen;
!     if (*padlen < 0)
!         *padlen = 0;
      if (leftjust)
!         *padlen = -(*padlen);
  }


  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
      if (*padlen > 0 && zpad)
      {
!         if (*signvalue)
          {
!             dopr_outch(*signvalue, target);
              --(*padlen);
!             *signvalue = 0;
          }
!         while (*padlen > 0)
          {
!             dopr_outch(zpad, target);
!             --(*padlen);
          }
      }
!     while (*padlen > (*signvalue != 0))
      {
!         dopr_outch(' ', target);
!         --(*padlen);
      }
!     if (*signvalue)
      {
!         dopr_outch(*signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
--- 1238,1285 ----
  }


! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
!     int            padlen;
!
!     padlen = minlen - vallen;
!     if (padlen < 0)
!         padlen = 0;
      if (leftjust)
!         padlen = -padlen;
!     return padlen;
  }


  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+     int            maxpad;
+
      if (*padlen > 0 && zpad)
      {
!         if (signvalue)
          {
!             dopr_outch(signvalue, target);
              --(*padlen);
!             signvalue = 0;
          }
!         if (*padlen > 0)
          {
!             dopr_outchmulti(zpad, *padlen, target);
!             *padlen = 0;
          }
      }
!     maxpad = (signvalue != 0);
!     if (*padlen > maxpad)
      {
!         dopr_outchmulti(' ', *padlen - maxpad, target);
!         *padlen = maxpad;
      }
!     if (signvalue)
      {
!         dopr_outch(signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1152,1162 ****


  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
!     while (*padlen < 0)
!     {
!         dopr_outch(' ', target);
!         ++(*padlen);
!     }
  }
--- 1289,1296 ----


  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
!     if (padlen < 0)
!         dopr_outchmulti(' ', -padlen, target);
  }
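
The padding changes in the patch above replace per-character loops around dopr_outch() with a single bulk fill. A minimal standalone sketch of that idea follows. It assumes a hypothetical bounded buffer type, `OutBuf`, and a hypothetical helper, `outch_multi`; unlike the real PrintfTarget it has no stream to flush to, so anything that does not fit is simply counted as dropped.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for PrintfTarget: a bounded buffer with no stream. */
typedef struct
{
    char       *bufptr;         /* next free byte */
    char       *bufend;         /* one past the last usable byte */
    int         dropped;        /* bytes that did not fit */
} OutBuf;

/*
 * Emit character c repeated n times.  One bounds computation and one
 * memset() replace n separate per-character overrun checks.
 */
static void
outch_multi(OutBuf *out, int c, int n)
{
    int         avail = (int) (out->bufend - out->bufptr);
    int         ncopy;

    assert(n >= 0);

    ncopy = (n < avail) ? n : avail;
    memset(out->bufptr, c, (size_t) ncopy);
    out->bufptr += ncopy;
    out->dropped += n - ncopy;  /* remember what was lost */
}
```

With an 8-byte buffer, asking for five spaces and then five zeroes stores five spaces and three zeroes and records two dropped bytes.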

Re: Performance improvements for src/port/snprintf.c

From
Alexander Kuzmenkov
Date:

I benchmarked this, using your testbed and comparing to libc sprintf (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all compiled with gcc 5.4.0 with -O2. I used bigger decimals in one of the formats, but otherwise they are the same as yours. Here is the table of conversion time relative to libc (lower is faster):

format                                 pg      stb
("%2$.*3$f %1$d\n", 42, 123.456, 2)    1.03    -
("%.*g", 15, 123.456)                  1.08    0.31
("%10d", 15)                           0.63    0.52
("%s", "012345678900123456789001234    2.06    6.20
("%d 012345678900123456789001234567    2.03    1.81
("%1$d 0123456789001234567890012345    1.34    -
("%d %d", 845879348, 994502893)        1.97    0.59

Surprisingly, our implementation is considerably faster than libc on "%10d" (0.63 of libc's time). Stb is faster than we are with floats, but it uses its own algorithm for that. It is also faster with decimals, probably because it uses a two-digit lookup table rather than a one-digit table as we do. Unfortunately it doesn't support dollar (%n$) argument specifiers.

1. https://github.com/nothings/stb/blob/master/stb_sprintf.h
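
The two-digit lookup table mentioned above can be sketched as follows. This only illustrates the general technique; it is neither stb's nor PostgreSQL's actual code, and the names `digit_pairs` and `u64_to_dec` are hypothetical. Each loop iteration divides by 100 and copies two ASCII digits at once, filling the buffer backwards from its end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "00", "01", ..., "99": two ASCII digits for each value 0..99 */
static const char digit_pairs[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/*
 * Convert value to decimal, writing digits backwards from the end of buf.
 * Returns a pointer to the first digit and stores the digit count in *len.
 * The result is not NUL-terminated.
 */
static const char *
u64_to_dec(uint64_t value, char *buf, size_t buflen, size_t *len)
{
    char       *p = buf + buflen;

    assert(buflen >= 20);       /* 2^64 - 1 has 20 decimal digits */

    while (value >= 100)
    {
        size_t      idx = (size_t) (value % 100) * 2;

        value /= 100;
        p -= 2;
        memcpy(p, digit_pairs + idx, 2);
    }
    if (value >= 10)
    {
        p -= 2;
        memcpy(p, digit_pairs + (size_t) value * 2, 2);
    }
    else
        *--p = (char) ('0' + value);

    *len = (size_t) (buf + buflen - p);
    return p;
}
```

This halves the number of divisions relative to a one-digit table, at the cost of a 200-byte table.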

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
> I benchmarked this, using your testbed and comparing to libc sprintf 
> (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all 
> compiled with gcc 5.

Thanks for reviewing!

The cfbot noticed that the recent dlopen patch conflicted with this in
configure.in, so here's a rebased version.  The code itself didn't change.

            regards, tom lane

diff --git a/configure b/configure
index dd77742..5fa9396 100755
*** a/configure
--- b/configure
*************** fi
*** 15060,15066 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15060,15066 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 3ada48b..93e8556 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1544,1550 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1544,1550 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 4094e22..752a547 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 531,536 ****
--- 531,539 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #undef HAVE_STDLIB_H

+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+
  /* Define to 1 if you have the `strerror' function. */
  #undef HAVE_STRERROR

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 6618b43..ea72c44 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 402,407 ****
--- 402,410 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #define HAVE_STDLIB_H 1

+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+
  /* Define to 1 if you have the `strerror' function. */
  #ifndef HAVE_STRERROR
  #define HAVE_STRERROR 1
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..66151c2 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 295,301 ****
  }


! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 295,303 ----
  }


! static bool find_arguments(const char *format, va_list args,
!                PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char
*** 307,317 ****
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);


  /*
--- 309,320 ----
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int    compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);


  /*
*************** static void trailing_pad(int *padlen, Pr
*** 320,329 ****
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
!     const char *format_start = format;
      int            ch;
      bool        have_dollar;
-     bool        have_non_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
--- 323,331 ----
  static void
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
!     const char *first_pct = NULL;
      int            ch;
      bool        have_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
*************** dopr(PrintfTarget *target, const char *f
*** 335,559 ****
      int            precision;
      int            zpad;
      int            forcesign;
-     int            last_dollar;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
-     int            i;
-     PrintfArgType argtypes[NL_ARGMAX + 1];
      PrintfArgValue argvalues[NL_ARGMAX + 1];

      /*
!      * Parse the format string to determine whether there are %n$ format
!      * specs, and identify the types and order of the format parameters.
       */
!     have_dollar = have_non_dollar = false;
!     last_dollar = 0;
!     MemSet(argtypes, 0, sizeof(argtypes));

!     while ((ch = *format++) != '\0')
      {
!         if (ch != '%')
!             continue;
!         longflag = longlongflag = pointflag = 0;
!         fmtpos = accum = 0;
!         afterstar = false;
! nextch1:
!         ch = *format++;
!         if (ch == '\0')
!             break;                /* illegal, but we don't complain */
!         switch (ch)
          {
!             case '-':
!             case '+':
!                 goto nextch1;
!             case '0':
!             case '1':
!             case '2':
!             case '3':
!             case '4':
!             case '5':
!             case '6':
!             case '7':
!             case '8':
!             case '9':
!                 accum = accum * 10 + (ch - '0');
!                 goto nextch1;
!             case '.':
!                 pointflag = 1;
!                 accum = 0;
!                 goto nextch1;
!             case '*':
!                 if (afterstar)
!                     have_non_dollar = true; /* multiple stars */
!                 afterstar = true;
!                 accum = 0;
!                 goto nextch1;
!             case '$':
!                 have_dollar = true;
!                 if (accum <= 0 || accum > NL_ARGMAX)
!                     goto bad_format;
!                 if (afterstar)
!                 {
!                     if (argtypes[accum] &&
!                         argtypes[accum] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[accum] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, accum);
!                     afterstar = false;
!                 }
!                 else
!                     fmtpos = accum;
!                 accum = 0;
!                 goto nextch1;
!             case 'l':
!                 if (longflag)
!                     longlongflag = 1;
!                 else
!                     longflag = 1;
!                 goto nextch1;
!             case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
!                 longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
!                 longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
  #else
!                 /* assume size_t is same size as int */
  #endif
-                 goto nextch1;
-             case 'h':
-             case '\'':
-                 /* ignore these */
-                 goto nextch1;
-             case 'd':
-             case 'i':
-             case 'o':
-             case 'u':
-             case 'x':
-             case 'X':
-                 if (fmtpos)
-                 {
-                     PrintfArgType atype;

!                     if (longlongflag)
!                         atype = ATYPE_LONGLONG;
!                     else if (longflag)
!                         atype = ATYPE_LONG;
!                     else
!                         atype = ATYPE_INT;
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != atype)
!                         goto bad_format;
!                     argtypes[fmtpos] = atype;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'c':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 's':
!             case 'p':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_CHARPTR)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_CHARPTR;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'e':
!             case 'E':
!             case 'f':
!             case 'g':
!             case 'G':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_DOUBLE)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_DOUBLE;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
                  break;
!             case '%':
                  break;
          }

          /*
!          * If we finish the spec with afterstar still set, there's a
!          * non-dollar star in there.
           */
!         if (afterstar)
!             have_non_dollar = true;
!     }
!
!     /* Per spec, you use either all dollar or all not. */
!     if (have_dollar && have_non_dollar)
!         goto bad_format;
!
!     /*
!      * In dollar mode, collect the arguments in physical order.
!      */
!     for (i = 1; i <= last_dollar; i++)
!     {
!         switch (argtypes[i])
!         {
!             case ATYPE_NONE:
!                 goto bad_format;
!             case ATYPE_INT:
!                 argvalues[i].i = va_arg(args, int);
!                 break;
!             case ATYPE_LONG:
!                 argvalues[i].l = va_arg(args, long);
!                 break;
!             case ATYPE_LONGLONG:
!                 argvalues[i].ll = va_arg(args, int64);
!                 break;
!             case ATYPE_DOUBLE:
!                 argvalues[i].d = va_arg(args, double);
!                 break;
!             case ATYPE_CHARPTR:
!                 argvalues[i].cptr = va_arg(args, char *);
!                 break;
!         }
!     }
!
!     /*
!      * At last we can parse the format for real.
!      */
!     format = format_start;
!     while ((ch = *format++) != '\0')
!     {
!         if (target->failed)
!             break;

!         if (ch != '%')
!         {
!             dopr_outch(ch, target);
!             continue;
!         }
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
--- 337,397 ----
      int            precision;
      int            zpad;
      int            forcesign;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
      PrintfArgValue argvalues[NL_ARGMAX + 1];

      /*
!      * Initially, we suppose the format string does not use %n$.  The first
!      * time we come to a conversion spec that has that, we'll call
!      * find_arguments() to check for consistent use of %n$ and fill the
!      * argvalues array with the argument values in the correct order.
       */
!     have_dollar = false;

!     while (*format != '\0')
      {
!         /* Locate next conversion specifier */
!         if (*format != '%')
          {
!             const char *next_pct = format + 1;
!
!             /*
!              * If strchrnul exists (it's a glibc-ism), it's a good bit faster
!              * than the equivalent manual loop.  Note: this doesn't compile
!              * cleanly without -D_GNU_SOURCE, but we normally use that on
!              * glibc platforms.
!              */
! #ifdef HAVE_STRCHRNUL
!             next_pct = strchrnul(next_pct, '%');
  #else
!             while (*next_pct != '\0' && *next_pct != '%')
!                 next_pct++;
  #endif

!             /* Dump literal data we just scanned over */
!             dostr(format, next_pct - format, target);
!             if (target->failed)
                  break;
!
!             if (*next_pct == '\0')
                  break;
+             format = next_pct;
          }

          /*
!          * Remember start of first conversion spec; if we find %n$, then it's
!          * sufficient for find_arguments() to start here, without rescanning
!          * earlier literal text.
           */
!         if (first_pct == NULL)
!             first_pct = format;

!         /* Process conversion spec starting at *format */
!         format++;
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
*************** nextch2:
*** 597,603 ****
              case '*':
                  if (have_dollar)
                  {
!                     /* process value after reading n$ */
                      afterstar = true;
                  }
                  else
--- 435,445 ----
              case '*':
                  if (have_dollar)
                  {
!                     /*
!                      * We'll process value after reading n$.  Note it's OK to
!                      * assume have_dollar is set correctly, because in a valid
!                      * format string the initial % must have had n$ if * does.
!                      */
                      afterstar = true;
                  }
                  else
*************** nextch2:
*** 628,633 ****
--- 470,483 ----
                  accum = 0;
                  goto nextch2;
              case '$':
+                 /* First dollar sign? */
+                 if (!have_dollar)
+                 {
+                     /* Yup, so examine all conversion specs in format */
+                     if (!find_arguments(first_pct, args, argvalues))
+                         goto bad_format;
+                     have_dollar = true;
+                 }
                  if (afterstar)
                  {
                      /* fetch and process star value */
*************** nextch2:
*** 806,811 ****
--- 656,665 ----
                  dopr_outch('%', target);
                  break;
          }
+
+         /* Check for failure after each conversion spec */
+         if (target->failed)
+             break;
      }

      return;
*************** bad_format:
*** 815,822 ****
      target->failed = true;
  }

  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
--- 669,903 ----
      target->failed = true;
  }

+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+                PrintfArgValue *argvalues)
+ {
+     int            ch;
+     bool        afterstar;
+     int            accum;
+     int            longlongflag;
+     int            longflag;
+     int            fmtpos;
+     int            i;
+     int            last_dollar;
+     PrintfArgType argtypes[NL_ARGMAX + 1];
+
+     /* Initialize to "no dollar arguments known" */
+     last_dollar = 0;
+     MemSet(argtypes, 0, sizeof(argtypes));
+
+     /*
+      * This loop must accept the same format strings as the one in dopr().
+      * However, we don't need to analyze them to the same level of detail.
+      *
+      * Since we're only called if there's a dollar-type spec somewhere, we can
+      * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+      * all argument references in the format string must be one or the other.
+      */
+     while (*format != '\0')
+     {
+         /* Locate next conversion specifier */
+         if (*format != '%')
+         {
+             /* Unlike dopr, we can just quit if there's no more specifiers */
+             format = strchr(format + 1, '%');
+             if (format == NULL)
+                 break;
+         }
+
+         /* Process conversion spec starting at *format */
+         format++;
+         longflag = longlongflag = 0;
+         fmtpos = accum = 0;
+         afterstar = false;
+ nextch1:
+         ch = *format++;
+         if (ch == '\0')
+             break;                /* illegal, but we don't complain */
+         switch (ch)
+         {
+             case '-':
+             case '+':
+                 goto nextch1;
+             case '0':
+             case '1':
+             case '2':
+             case '3':
+             case '4':
+             case '5':
+             case '6':
+             case '7':
+             case '8':
+             case '9':
+                 accum = accum * 10 + (ch - '0');
+                 goto nextch1;
+             case '.':
+                 accum = 0;
+                 goto nextch1;
+             case '*':
+                 if (afterstar)
+                     return false;    /* previous star missing dollar */
+                 afterstar = true;
+                 accum = 0;
+                 goto nextch1;
+             case '$':
+                 if (accum <= 0 || accum > NL_ARGMAX)
+                     return false;
+                 if (afterstar)
+                 {
+                     if (argtypes[accum] &&
+                         argtypes[accum] != ATYPE_INT)
+                         return false;
+                     argtypes[accum] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, accum);
+                     afterstar = false;
+                 }
+                 else
+                     fmtpos = accum;
+                 accum = 0;
+                 goto nextch1;
+             case 'l':
+                 if (longflag)
+                     longlongflag = 1;
+                 else
+                     longflag = 1;
+                 goto nextch1;
+             case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+                 longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+                 longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+                 /* assume size_t is same size as int */
+ #endif
+                 goto nextch1;
+             case 'h':
+             case '\'':
+                 /* ignore these */
+                 goto nextch1;
+             case 'd':
+             case 'i':
+             case 'o':
+             case 'u':
+             case 'x':
+             case 'X':
+                 if (fmtpos)
+                 {
+                     PrintfArgType atype;
+
+                     if (longlongflag)
+                         atype = ATYPE_LONGLONG;
+                     else if (longflag)
+                         atype = ATYPE_LONG;
+                     else
+                         atype = ATYPE_INT;
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != atype)
+                         return false;
+                     argtypes[fmtpos] = atype;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'c':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_INT)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 's':
+             case 'p':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_CHARPTR)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_CHARPTR;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'e':
+             case 'E':
+             case 'f':
+             case 'g':
+             case 'G':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_DOUBLE)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_DOUBLE;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case '%':
+                 break;
+         }
+
+         /*
+          * If we finish the spec with afterstar still set, there's a
+          * non-dollar star in there.
+          */
+         if (afterstar)
+             return false;        /* non-dollar conversion spec */
+     }
+
+     /*
+      * Format appears valid so far, so collect the arguments in physical
+      * order.  (Since we rejected any non-dollar specs that would have
+      * collected arguments, we know that dopr() hasn't collected any yet.)
+      */
+     for (i = 1; i <= last_dollar; i++)
+     {
+         switch (argtypes[i])
+         {
+             case ATYPE_NONE:
+                 return false;
+             case ATYPE_INT:
+                 argvalues[i].i = va_arg(args, int);
+                 break;
+             case ATYPE_LONG:
+                 argvalues[i].l = va_arg(args, long);
+                 break;
+             case ATYPE_LONGLONG:
+                 argvalues[i].ll = va_arg(args, int64);
+                 break;
+             case ATYPE_DOUBLE:
+                 argvalues[i].d = va_arg(args, double);
+                 break;
+             case ATYPE_CHARPTR:
+                 argvalues[i].cptr = va_arg(args, char *);
+                 break;
+         }
+     }
+
+     return true;
+ }
+
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 831,847 ****
      else
          vallen = strlen(value);

!     adjust_padlen(minlen, vallen, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dostr(value, vallen, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 912,928 ----
      else
          vallen = strlen(value);

!     padlen = compute_padlen(minlen, vallen, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dostr(value, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtint(int64 value, char type, int force
*** 869,875 ****
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen = 0;        /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
--- 950,956 ----
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen;            /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
*************** fmtint(int64 value, char type, int force
*** 917,958 ****

          do
          {
!             convert[vallen++] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

!     while (zeropad-- > 0)
!         dopr_outch('0', target);

!     while (vallen > 0)
!         dopr_outch(convert[--vallen], target);

!     trailing_pad(&padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen = 0;        /* amount to pad */

!     adjust_padlen(minlen, 1, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dopr_outch(value, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 998,1038 ----

          do
          {
!             convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     padlen = compute_padlen(minlen, vallen + zeropad, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

!     if (zeropad > 0)
!         dopr_outchmulti('0', zeropad, target);

!     dostr(convert + sizeof(convert) - vallen, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen;            /* amount to pad */

!     padlen = compute_padlen(minlen, 1, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dopr_outch(value, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtfloat(double value, char type, int fo
*** 963,972 ****
      int            signvalue = 0;
      int            prec;
      int            vallen;
!     char        fmt[32];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen = 0;        /* amount to pad with spaces */

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
--- 1043,1056 ----
      int            signvalue = 0;
      int            prec;
      int            vallen;
!     char        fmt[8];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen;            /* amount to pad with spaces */
!
!     /* Handle sign (NaNs have no sign) */
!     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
!         value = -value;

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 988,1004 ****

      if (pointflag)
      {
-         if (sprintf(fmt, "%%.%d%c", prec, type) < 0)
-             goto fail;
          zeropadlen = precision - prec;
      }
-     else if (sprintf(fmt, "%%%c", type) < 0)
-         goto fail;
-
-     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
-         value = -value;
-
-     vallen = sprintf(convert, fmt, value);
      if (vallen < 0)
          goto fail;

--- 1072,1092 ----

      if (pointflag)
      {
          zeropadlen = precision - prec;
+         fmt[0] = '%';
+         fmt[1] = '.';
+         fmt[2] = '*';
+         fmt[3] = type;
+         fmt[4] = '\0';
+         vallen = sprintf(convert, fmt, prec, value);
+     }
+     else
+     {
+         fmt[0] = '%';
+         fmt[1] = type;
+         fmt[2] = '\0';
+         vallen = sprintf(convert, fmt, value);
      }
      if (vallen < 0)
          goto fail;

*************** fmtfloat(double value, char type, int fo
*** 1006,1014 ****
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
--- 1094,1102 ----
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
*************** fmtfloat(double value, char type, int fo
*** 1019,1036 ****
              epos = strrchr(convert, 'E');
          if (epos)
          {
!             /* pad after exponent */
              dostr(convert, epos - convert, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
          }
      }
      else
--- 1107,1124 ----
              epos = strrchr(convert, 'E');
          if (epos)
          {
!             /* pad before exponent */
              dostr(convert, epos - convert, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
          }
      }
      else
*************** fmtfloat(double value, char type, int fo
*** 1039,1045 ****
          dostr(convert, vallen, target);
      }

!     trailing_pad(&padlen, target);
      return;

  fail:
--- 1127,1133 ----
          dostr(convert, vallen, target);
      }

!     trailing_pad(padlen, target);
      return;

  fail:
*************** fail:
*** 1049,1054 ****
--- 1137,1149 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(*str, target);
+         return;
+     }
+
      while (slen > 0)
      {
          int            avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1092,1097 ****
--- 1187,1228 ----
      *(target->bufptr++) = c;
  }

+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(c, target);
+         return;
+     }
+
+     while (slen > 0)
+     {
+         int            avail;
+
+         if (target->bufend != NULL)
+             avail = target->bufend - target->bufptr;
+         else
+             avail = slen;
+         if (avail <= 0)
+         {
+             /* buffer full, can we dump to stream? */
+             if (target->stream == NULL)
+             {
+                 target->nchars += slen; /* no, lose the data */
+                 return;
+             }
+             flushbuffer(target);
+             continue;
+         }
+         avail = Min(avail, slen);
+         memset(target->bufptr, c, avail);
+         target->bufptr += avail;
+         slen -= avail;
+     }
+ }
+

  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1107,1148 ****
  }


! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
!     *padlen = minlen - vallen;
!     if (*padlen < 0)
!         *padlen = 0;
      if (leftjust)
!         *padlen = -(*padlen);
  }


  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
      if (*padlen > 0 && zpad)
      {
!         if (*signvalue)
          {
!             dopr_outch(*signvalue, target);
              --(*padlen);
!             *signvalue = 0;
          }
!         while (*padlen > 0)
          {
!             dopr_outch(zpad, target);
!             --(*padlen);
          }
      }
!     while (*padlen > (*signvalue != 0))
      {
!         dopr_outch(' ', target);
!         --(*padlen);
      }
!     if (*signvalue)
      {
!         dopr_outch(*signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
--- 1238,1285 ----
  }


! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
!     int            padlen;
!
!     padlen = minlen - vallen;
!     if (padlen < 0)
!         padlen = 0;
      if (leftjust)
!         padlen = -padlen;
!     return padlen;
  }


  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+     int            maxpad;
+
      if (*padlen > 0 && zpad)
      {
!         if (signvalue)
          {
!             dopr_outch(signvalue, target);
              --(*padlen);
!             signvalue = 0;
          }
!         if (*padlen > 0)
          {
!             dopr_outchmulti(zpad, *padlen, target);
!             *padlen = 0;
          }
      }
!     maxpad = (signvalue != 0);
!     if (*padlen > maxpad)
      {
!         dopr_outchmulti(' ', *padlen - maxpad, target);
!         *padlen = maxpad;
      }
!     if (signvalue)
      {
!         dopr_outch(signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1152,1162 ****


  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
!     while (*padlen < 0)
!     {
!         dopr_outch(' ', target);
!         ++(*padlen);
!     }
  }
--- 1289,1296 ----


  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
!     if (padlen < 0)
!         dopr_outchmulti(' ', -padlen, target);
  }
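
As a standalone illustration of the padding changes in the patch above, here is a minimal sketch. It assumes a simplified fixed-buffer target (`SketchTarget` and every `sketch_*` name are hypothetical; the real PrintfTarget also handles streams and flushing). It keeps the sign convention of compute_padlen(): a positive result pads on the left, a negative one pads on the right.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for PrintfTarget: a fixed buffer only. */
typedef struct SketchTarget
{
    char       *bufptr;         /* next free byte in the buffer */
    char       *bufend;         /* one past the last usable byte */
    int         dropped;        /* bytes that did not fit */
} SketchTarget;

/* Same sign convention as compute_padlen(): > 0 pads left, < 0 pads right. */
static int
sketch_compute_padlen(int minlen, int vallen, int leftjust)
{
    int         padlen = minlen - vallen;

    if (padlen < 0)
        padlen = 0;
    if (leftjust)
        padlen = -padlen;
    return padlen;
}

/* Clamp a requested length to the space left, counting what is dropped. */
static int
sketch_reserve(int want, SketchTarget *target)
{
    int         avail = (int) (target->bufend - target->bufptr);
    int         n = (want < avail) ? want : avail;

    if (n < 0)
        n = 0;
    if (want > n)
        target->dropped += want - n;
    return n;
}

/* Emit c repeated slen times with one memset() instead of a per-char loop. */
static void
sketch_outchmulti(int c, int slen, SketchTarget *target)
{
    int         n = sketch_reserve(slen, target);

    memset(target->bufptr, c, (size_t) n);
    target->bufptr += n;
}

/* Pad and copy a string, following the shape of the patched fmtstr(). */
static void
sketch_fmtstr(const char *value, int leftjust, int minlen, SketchTarget *target)
{
    int         vallen = (int) strlen(value);
    int         padlen = sketch_compute_padlen(minlen, vallen, leftjust);
    int         n;

    if (padlen > 0)
        sketch_outchmulti(' ', padlen, target);
    n = sketch_reserve(vallen, target);
    memcpy(target->bufptr, value, (size_t) n);
    target->bufptr += n;
    if (padlen < 0)
        sketch_outchmulti(' ', -padlen, target);
}
```

The point of the change is that the buffer-space check happens once per run of pad characters rather than once per character.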

Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-09-12 14:14:15 -0400, Tom Lane wrote:
> Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
> > I benchmarked this, using your testbed and comparing to libc sprintf
> > (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
> > compiled with gcc 5.
>
> Thanks for reviewing!
>
> The cfbot noticed that the recent dlopen patch conflicted with this in
> configure.in, so here's a rebased version.  The code itself didn't change.

Conflicts again, but not too hard to resolve.

The mini benchmark from http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
is significantly improved by this patch.

> 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:
>
> COPY somefloats TO '/dev/null';
> COPY 10000000
> Time: 24575.770 ms (00:24.576)
>
> 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:
>
> COPY somefloats TO '/dev/null';
> COPY 10000000
> Time: 12877.037 ms (00:12.877)

This patch:

postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO '/dev/null';
Time: 0.269 ms
┌────────────┐
│ pg_prewarm │
├────────────┤
│      73530 │
└────────────┘
(1 row)

Time: 34.983 ms
COPY 10000000
Time: 15511.478 ms (00:15.511)


The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
+   38.15%  postgres  libc-2.27.so      [.] __GI___printf_fp_l
+   13.98%  postgres  libc-2.27.so      [.] hack_digit
+    7.54%  postgres  libc-2.27.so      [.] __mpn_mul_1
+    7.32%  postgres  postgres          [.] CopyOneRowTo
+    6.12%  postgres  libc-2.27.so      [.] vfprintf
+    3.14%  postgres  libc-2.27.so      [.] __strlen_avx2
+    1.97%  postgres  postgres          [.] heap_deform_tuple
+    1.77%  postgres  postgres          [.] AllocSetAlloc
+    1.43%  postgres  postgres          [.] psprintf
+    1.25%  postgres  libc-2.27.so      [.] _IO_str_init_static_internal
+    1.09%  postgres  libc-2.27.so      [.] _IO_vsnprintf
+    1.09%  postgres  postgres          [.] appendBinaryStringInfo

The profile of master with this patch is:

+   32.38%  postgres  libc-2.27.so      [.] __GI___printf_fp_l
+   11.08%  postgres  libc-2.27.so      [.] hack_digit
+    9.55%  postgres  postgres          [.] CopyOneRowTo
+    6.24%  postgres  libc-2.27.so      [.] __mpn_mul_1
+    5.01%  postgres  libc-2.27.so      [.] vfprintf
+    4.91%  postgres  postgres          [.] dopr.constprop.4
+    3.53%  postgres  libc-2.27.so      [.] __strlen_avx2
+    1.55%  postgres  libc-2.27.so      [.] __strchrnul_avx2
+    1.49%  postgres  libc-2.27.so      [.] __memmove_avx_unaligned_erms
+    1.35%  postgres  postgres          [.] AllocSetAlloc
+    1.32%  postgres  libc-2.27.so      [.] _IO_str_init_static_internal
+    1.30%  postgres  postgres          [.] FunctionCall1Coll
+    1.27%  postgres  postgres          [.] psprintf
+    1.16%  postgres  postgres          [.] appendBinaryStringInfo
+    1.16%  postgres  libc-2.27.so      [.] _IO_old_init
+    1.06%  postgres  postgres          [.] heap_deform_tuple
+    1.02%  postgres  libc-2.27.so      [.] sprintf
+    1.02%  postgres  libc-2.27.so      [.] _IO_vsprintf

(all functions above 1%)


I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).
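
For reference, the `sprintf("%.*f")` call mentioned here is what the patched fmtfloat() ends up issuing: the format is a fixed "%.*<type>" string assembled by hand, and the precision travels as an int argument instead of being printed into the format by a separate sprintf call. A minimal sketch of that step follows; `sketch_fmt_double` is a hypothetical name, and snprintf is used rather than sprintf only to keep the sketch bounds-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a double using a hand-assembled "%.*<type>" format, passing the
 * precision as an argument.  Returns what snprintf() returns.
 */
static int
sketch_fmt_double(char *out, size_t outsize, double value, int prec, char type)
{
    char        fmt[8];

    /* only the float conversions that fmtfloat() is called for */
    assert(type != '\0' && strchr("eEfgG", type) != NULL);

    fmt[0] = '%';
    fmt[1] = '.';
    fmt[2] = '*';
    fmt[3] = type;
    fmt[4] = '\0';
    return snprintf(out, outsize, fmt, prec, value);
}
```

Either way the C library still parses a format string and does the actual float conversion, which is the part the profiles above attribute to __GI___printf_fp_l.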

- Andres


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
> On 2018-09-12 14:14:15 -0400, Tom Lane wrote:
> > Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
> > > I benchmarked this, using your testbed and comparing to libc sprintf
> > > (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
> > > compiled with gcc 5.
> >
> > Thanks for reviewing!
> >
> > The cfbot noticed that the recent dlopen patch conflicted with this in
> > configure.in, so here's a rebased version.  The code itself didn't change.
> 
> Conflicts again, but not too hard to resolve.
> 
> The mini benchmark from
> http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
> is significantly improved by this patch.
> 
> > 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:
> >
> > COPY somefloats TO '/dev/null';
> > COPY 10000000
> > Time: 24575.770 ms (00:24.576)
> >
> > 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:
> >
> > COPY somefloats TO '/dev/null';
> > COPY 10000000
> > Time: 12877.037 ms (00:12.877)
> 
> This patch:
> 
> postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO '/dev/null';
> Time: 0.269 ms
> ┌────────────┐
> │ pg_prewarm │
> ├────────────┤
> │      73530 │
> └────────────┘
> (1 row)
> 
> Time: 34.983 ms
> COPY 10000000
> Time: 15511.478 ms (00:15.511)
> 
> 
> The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
> +   38.15%  postgres  libc-2.27.so      [.] __GI___printf_fp_l
> +   13.98%  postgres  libc-2.27.so      [.] hack_digit
> +    7.54%  postgres  libc-2.27.so      [.] __mpn_mul_1
> +    7.32%  postgres  postgres          [.] CopyOneRowTo
> +    6.12%  postgres  libc-2.27.so      [.] vfprintf
> +    3.14%  postgres  libc-2.27.so      [.] __strlen_avx2
> +    1.97%  postgres  postgres          [.] heap_deform_tuple
> +    1.77%  postgres  postgres          [.] AllocSetAlloc
> +    1.43%  postgres  postgres          [.] psprintf
> +    1.25%  postgres  libc-2.27.so      [.] _IO_str_init_static_internal
> +    1.09%  postgres  libc-2.27.so      [.] _IO_vsnprintf
> +    1.09%  postgres  postgres          [.] appendBinaryStringInfo
> 
> The profile of master with this patch is:
> 
> +   32.38%  postgres  libc-2.27.so      [.] __GI___printf_fp_l
> +   11.08%  postgres  libc-2.27.so      [.] hack_digit
> +    9.55%  postgres  postgres          [.] CopyOneRowTo
> +    6.24%  postgres  libc-2.27.so      [.] __mpn_mul_1
> +    5.01%  postgres  libc-2.27.so      [.] vfprintf
> +    4.91%  postgres  postgres          [.] dopr.constprop.4
> +    3.53%  postgres  libc-2.27.so      [.] __strlen_avx2
> +    1.55%  postgres  libc-2.27.so      [.] __strchrnul_avx2
> +    1.49%  postgres  libc-2.27.so      [.] __memmove_avx_unaligned_erms
> +    1.35%  postgres  postgres          [.] AllocSetAlloc
> +    1.32%  postgres  libc-2.27.so      [.] _IO_str_init_static_internal
> +    1.30%  postgres  postgres          [.] FunctionCall1Coll
> +    1.27%  postgres  postgres          [.] psprintf
> +    1.16%  postgres  postgres          [.] appendBinaryStringInfo
> +    1.16%  postgres  libc-2.27.so      [.] _IO_old_init
> +    1.06%  postgres  postgres          [.] heap_deform_tuple
> +    1.02%  postgres  libc-2.27.so      [.] sprintf
> +    1.02%  postgres  libc-2.27.so      [.] _IO_vsprintf
> 
> (all functions above 1%)
> 
> 
> I assume this partially is just the additional layers of function calls
> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
> addition to pretty much the same work as before (i.e. sprintf("%.*f")).

I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
of strfromd() instead of sprintf yields a *better* runtime than
master.

Time: 10255.134 ms (00:10.255)

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-08-17 14:32:59 -0400, Tom Lane wrote:
> I've been looking into the possible performance consequences of that,
> in particular comparing snprintf.c to the library versions on macOS,
> FreeBSD, OpenBSD, and NetBSD.  While it held up well in simpler cases,
> I noted that it was significantly slower on long format strings, which
> I traced to two separate problems:

> Perhaps there's a way to improve that
> without writing our own floating-point conversion code, but I'm not
> seeing an easy way offhand.  I don't think that's a showstopper though.
> This code is now faster than the native code for very many other cases,
> so on average it should cause no real performance problem.

I kinda wonder if we shouldn't replace the non pg_* functions in
snprintf.c with a more modern copy of a compatibly licensed libc. Looks
to me like our implementation has forked off some BSD a fair while ago.

There seem to be a few choices. Among others:
- freebsd libc:
  https://github.com/freebsd/freebsd/blob/master/lib/libc/stdio/vfprintf.c
  (floating point stuff is elsewhere)
- musl libc:
  https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c
- stb (as Alexander referenced earlier)
  https://github.com/nothings/stb/blob/master/stb_sprintf.h

I've not benchmarked any of these. Just by looking at the code, the musl
one looks by far the most compact - looks like all the relevant code is
in the one file referenced.

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
>> I assume this partially is just the additional layers of function calls
>> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
>> addition to pretty much the same work as before (i.e. sprintf("%.*f")).

No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.

Well, ok, dopr() wasn't there before, but I trust you're not claiming
that glibc's implementation of snprintf() is totally flat either.

I think it's just that snprintf.c is a bit slower in this case.  If you
look at glibc's implementation, they've expended a heck of a lot of code
and sweat on it.  The only reason we could hope to beat it is that we're
prepared to throw out some functionality, like LC_NUMERIC handling.

> I'm *NOT* proposing that as the actual solution, but as a datapoint, it
> might be interesting that hardcoding the precision and thus allowing use
> of strfromd() instead of sprintf yields a *better* runtime than
> master.

Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it).  But we could use it where
available.  And it doesn't seem unreasonable to have a fast path for
the specific precision value(s) that float4/8out will actually use.
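
A rough sketch of what "use it where available, with a fast path for specific precisions" could look like. Everything named `sketch_*` or `SKETCH_*` is hypothetical, the glibc version check stands in for what would really be a configure test, and the precisions 15 and 6 are an assumption about what float8out and float4out request by default, not something taken from this thread.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical availability check: strfromd() appeared in glibc 2.25.
 * glibc only declares it in <stdlib.h> when a feature-test macro such as
 * _GNU_SOURCE is set, so the prototype is repeated here to let the sketch
 * compile either way.
 */
#if defined(__GLIBC__)
#if __GLIBC_PREREQ(2, 25)
#define SKETCH_HAVE_STRFROMD 1
extern int strfromd(char *str, size_t n, const char *format, double fp);
#endif
#endif

/*
 * Convert value with "%.<prec>g".  For the assumed common precisions the
 * format is a literal handed to strfromd(); anything else, or any platform
 * without strfromd(), falls back to snprintf() with "%.*g".
 */
static int
sketch_double_to_str(char *out, size_t outsize, double value, int prec)
{
#ifdef SKETCH_HAVE_STRFROMD
    if (prec == 15)
        return strfromd(out, outsize, "%.15g", value);
    if (prec == 6)
        return strfromd(out, outsize, "%.6g", value);
#endif
    return snprintf(out, outsize, "%.*g", prec, value);
}
```

Both branches return the length the full result needs, so callers can treat them the same way.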

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I kinda wonder if we shouldn't replace the non pg_* functions in
> snprintf.c with a more modern copy of a compatibly licensed libc. Looks
> to me like our implementation has forked off some BSD a fair while ago.

Maybe, but the benchmarking I was doing last month didn't convince me
that the *BSD versions were remarkably fast.  There are a lot of cases
where our version is faster.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
> >> I assume this partially is just the additional layers of function calls
> >> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
> >> addition to pretty much the same work as before (i.e. sprintf("%.*f")).
> 
> No, there are no additional layers that weren't there before ---
> snprintf.c's snprintf() slots in directly where the platform's did before.

Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case.  And after the unconditional replacement,
we're doing a bunch of *additional* work (at the very least we're
parsing the format string twice).

> Well, ok, dopr() wasn't there before, but I trust you're not claiming
> that glibc's implementation of snprintf() is totally flat either.

I don't even think it's all that fast...


> > I'm *NOT* proposing that as the actual solution, but as a datapoint, it
> > might be interesting that hardcoding the precision and thus allowing use
> > of strfromd() instead of sprintf yields a *better* runtime than
> > master.
> 
> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> that (my RHEL6 box doesn't seem to have it).  But we could use it where
> available.  And it doesn't seem unreasonable to have a fast path for
> the specific precision value(s) that float4/8out will actually use.

It's C99 afaict.  What I did for my quick hack is to just hack the
precision as characters into the format that dopr() uses...
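
To make the quick hack concrete: writing the precision into the format as characters means producing something like "%.15g" instead of "%.*g" plus an int argument. The sketch below is a guess at the shape of it, not the code that was actually benchmarked; the `sketch_*` names and the 0..999 precision limit are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "%.<prec><type>" into fmt by hand, without calling sprintf().
 * Needs room for up to 7 bytes ("%.999g" plus the terminator).
 * Returns 0 on success, -1 if the request cannot be represented.
 */
static int
sketch_build_float_format(char *fmt, size_t fmtsize, int prec, char type)
{
    size_t      pos = 0;

    if (fmtsize < 7 || prec < 0 || prec > 999)
        return -1;
    if (type == '\0' || strchr("eEfgG", type) == NULL)
        return -1;

    fmt[pos++] = '%';
    fmt[pos++] = '.';
    if (prec >= 100)
        fmt[pos++] = (char) ('0' + prec / 100);
    if (prec >= 10)
        fmt[pos++] = (char) ('0' + (prec / 10) % 10);
    fmt[pos++] = (char) ('0' + prec % 10);
    fmt[pos++] = type;
    fmt[pos] = '\0';
    return 0;
}

/* Format value using the hand-built format; -1 on an invalid request. */
static int
sketch_format_double(char *out, size_t outsize, double value, int prec, char type)
{
    char        fmt[8];

    if (sketch_build_float_format(fmt, sizeof(fmt), prec, type) != 0)
        return -1;
    return snprintf(out, outsize, fmt, value);
}
```

A format of this shape, with no flags, width, or `*`, is also the only kind strfromd() accepts, which is why embedding the precision is a precondition for trying that function.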

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
>> No, there are no additional layers that weren't there before ---
>> snprintf.c's snprintf() slots in directly where the platform's did before.

> Hm? What I mean is that we can't realistically be faster with the
> current architecture, because for floating point we end up doing glibc
> sprintf() in either case.

Oh, you mean specifically for the float conversion case.  I still say
that I will *not* accept judging this code solely on the float case.
The string and integer cases are at least as important if not more so.

>> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
>> that (my RHEL6 box doesn't seem to have it).

> It's C99 afaict.

It's not in POSIX 2008, and I don't see it in my admittedly-draft
copy of C99 either.  But that's not real relevant -- I don't see
much reason not to use it if we want a quick and dirty answer for
the platforms that have it.

If we had more ambition, we might consider stealing the float
conversion logic out of the "stb" library that Alexander pointed
to upthread.  It says it's public domain, so there's no license
impediment to borrowing some code ...

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-09-26 20:25:44 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> >> No, there are no additional layers that weren't there before ---
> >> snprintf.c's snprintf() slots in directly where the platform's did before.
> 
> > Hm? What I mean is that we can't realistically be faster with the
> > current architecture, because for floating point we end up doing glibc
> > sprintf() in either case.
> 
> Oh, you mean specifically for the float conversion case.  I still say
> that I will *not* accept judging this code solely on the float case.

Oh, it should definitely not be judged solely based on floating point,
we agree.


> The string and integer cases are at least as important if not more so.

I think the integer stuff has become a *little* bit less important,
because we converted the hot cases over to pg_ltoa etc.
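
For the integer path that remains in snprintf.c, the patch upthread changes fmtint() to fill its scratch buffer from the end, so the digits are already in print order and can go out through one dostr() call instead of one dopr_outch() per digit. A minimal sketch of that idea, with hypothetical names (this is neither pg_ltoa nor the patched function itself):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert uvalue to digits in base 8, 10 or 16, writing them right-to-left
 * into the tail of convert[].  Returns a pointer to the first digit and
 * stores the digit count in *len.  The result is not NUL-terminated.
 */
static const char *
sketch_utoa(unsigned long long uvalue, unsigned int base,
            char *convert, size_t convsize, int *len)
{
    static const char cvt[] = "0123456789abcdef";
    int         vallen = 0;

    assert(base == 8 || base == 10 || base == 16);
    assert(convsize >= 64);     /* same scratch size as fmtint() uses */

    do
    {
        convert[convsize - (size_t) (++vallen)] = cvt[uvalue % base];
        uvalue = uvalue / base;
    } while (uvalue);

    *len = vallen;
    return convert + convsize - (size_t) vallen;
}
```

The caller then emits `len` bytes starting at the returned pointer in a single copy.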


> >> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> >> that (my RHEL6 box doesn't seem to have it).
> 
> > It's C99 afaict.
> 
> It's not in POSIX 2008, and I don't see it in my admittedly-draft
> copy of C99 either.  But that's not real relevant -- I don't see
> much reason not to use it if we want a quick and dirty answer for
> the platforms that have it.

Right, I really just wanted some more baseline numbers.


> If we had more ambition, we might consider stealing the float
> conversion logic out of the "stb" library that Alexander pointed
> to upthread.  It says it's public domain, so there's no license
> impediment to borrowing some code ...

Yea, I started to play around with doing so with musl, but based on
my early benchmarks it's not fast enough to bother.  I've not integrated
it into our code, but instead printed two floating point numbers with
your test:

musl 5000000 iterations:
snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
ratio = 1.340

glibc 5000000 iterations:
snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
ratio = 1.564

So there's pretty clearly no point in even considering starting from
musl.
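
For anyone reading the numbers above: the ratio line is the pg_snprintf time divided by the library snprintf time (4215.1 / 3144.46 for musl, 2629.46 / 1680.82 for glibc), and the per-iteration figure is the total divided by the iteration count. A minimal sketch of a timing loop of that kind follows. It is not the testbed used in this thread; the `sketch_*` names and the format string are assumptions, and clock() measures CPU time rather than wall time.

```c
#include <stdio.h>
#include <time.h>

/* Time `iterations` snprintf() calls that print two doubles; returns ms. */
static double
sketch_time_snprintf_ms(long iterations)
{
    char        buf[128];
    volatile int sink = 0;      /* keeps the loop from being optimized away */
    clock_t     start = clock();

    for (long i = 0; i < iterations; i++)
        sink += snprintf(buf, sizeof(buf), "%g %g",
                         1.0 / 3.0, 12345.678 + (double) i);

    (void) sink;
    return (double) (clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Ratio as reported above: candidate time over baseline time. */
static double
sketch_ratio(double candidate_ms, double baseline_ms)
{
    return candidate_ms / baseline_ms;
}
```

A ratio above 1.0 means the candidate (here pg_snprintf) is slower than the baseline it is compared against.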

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-09-26 17:40:22 -0700, Andres Freund wrote:
> On 2018-09-26 20:25:44 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> > >> No, there are no additional layers that weren't there before ---
> > >> snprintf.c's snprintf() slots in directly where the platform's did before.
> > 
> > > Hm? What I mean is that we can't realistically be faster with the
> > > current architecture, because for floating point we end up doing glibc
> > > sprintf() in either case.
> > 
> > Oh, you mean specifically for the float conversion case.  I still say
> > that I will *not* accept judging this code solely on the float case.
> 
> Oh, it should definitely not be judged solely based on floating point,
> we agree.
> 
> 
> > The string and integer cases are at least as important if not more so.
> 
> I think the integer stuff has become a *little* bit less important,
> because we converted the hot cases over to pg_lto etc.
> 
> 
> > >> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> > >> that (my RHEL6 box doesn't seem to have it).
> > 
> > > It's C99 afaict.
> > 
> > It's not in POSIX 2008, and I don't see it in my admittedly-draft
> > copy of C99 either.  But that's not real relevant -- I don't see
> > much reason not to use it if we want a quick and dirty answer for
> > the platforms that have it.
> 
> Right, I really just wanted some more baseline numbers.
> 
> 
> > If we had more ambition, we might consider stealing the float
> > conversion logic out of the "stb" library that Alexander pointed
> > to upthread.  It says it's public domain, so there's no license
> > impediment to borrowing some code ...
> 
> Yea, I started to play around with doing so with musl, but based on
> my early benchmarks it's not fast enough to bother.  I've not integrated
> it into our code, but instead printed two floating point numbers with
> your test:
> 
> musl 5000000 iterations:
> snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
> pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
> ratio = 1.340
> 
> glibc 5000000 iterations:
> snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
> pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
> ratio = 1.564
> 
> So there's pretty clearly no point in even considering starting from
> musl.

Hm, stb's results just for floating point aren't bad. The above numbers
were for %f %f. But as the minimal usage would be for the internal
usage in dopr(), here's a comparison using %.*f:

snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration
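
For reference, %.*f takes its precision from an int argument rather than
from digits embedded in the format string, so the caller does not have to
build a format first. A minimal sketch of the two styles follows; it
assumes nothing about the benchmark program itself, and format_fixed and
format_fixed_two_step are hypothetical names, not functions in snprintf.c:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "value" with a run-time precision by passing
 * the precision as an int argument that "%.*f" consumes.
 */
static int
format_fixed(char *buf, size_t buflen, int prec, double value)
{
    assert(prec >= 0);
    return snprintf(buf, buflen, "%.*f", prec, value);
}

/*
 * Hypothetical helper: the two-step style, which first builds a format
 * such as "%.3f" and then uses it for the real conversion.
 */
static int
format_fixed_two_step(char *buf, size_t buflen, int prec, double value)
{
    char        fmt[32];

    assert(prec >= 0);
    if (snprintf(fmt, sizeof(fmt), "%%.%df", prec) < 0)
        return -1;
    return snprintf(buf, buflen, fmt, value);
}
```

Both helpers should produce the same text for the same inputs; the first
one just skips the intermediate formatting call.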

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-09-26 17:57:05 -0700, Andres Freund wrote:
> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> pg time = 1434.57 ms total, 0.000286915 ms per iteration
> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Reading around the interwebz led me to look at ryu

https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially since we currently specify a
"too high" precision to guarantee round-trip safety.
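
To illustrate what "minimally sized roundtrip-safe" means, here is a
brute-force sketch. It is not Ryu and makes no claim about how Ryu works
internally: it simply tries %.*g with increasing precision until strtod()
returns the original double. The name shortest_roundtrip is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (brute force, not Ryu): write into buf the shortest
 * "%.*g" rendering of "value" that strtod() parses back to the same double.
 * Returns the string length, or -1 on error.  17 significant digits always
 * suffice for an IEEE double, so the loop stops there.  NaN never compares
 * equal, so it simply ends up printed at the last precision tried.
 */
static int
shortest_roundtrip(char *buf, size_t buflen, double value)
{
    int         prec;
    int         len = -1;

    assert(buflen >= 32);
    for (prec = 1; prec <= 17; prec++)
    {
        len = snprintf(buf, buflen, "%.*g", prec, value);
        if (len < 0 || (size_t) len >= buflen)
            return -1;
        if (strtod(buf, NULL) == value)
            break;              /* shortest precision that round-trips */
    }
    return len;
}
```

For 0.1 this stops at one significant digit and yields "0.1", whereas
printing with a fixed %.17g yields "0.10000000000000001", which is the
kind of "too high" precision mentioned above.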

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Here's a rebased version of <15785.1536776055@sss.pgh.pa.us>.

I think we should try to get this reviewed and committed before
we worry more about the float business.  It would be silly to
not be benchmarking any bigger changes against the low-hanging
fruit here.

            regards, tom lane
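
For readers skimming the patch, the literal-text handling in dopr() boils
down to "find the next %, then emit everything before it in one go". A
standalone sketch of that idea follows; it is an illustration only, not
code from the patch, and copy_literal_run is a hypothetical name. It uses
the portable loop rather than strchrnul().

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy the literal text at the start of "format" (up
 * to the next '%' or the end of the string) into dst with one bounds check
 * and one memcpy, instead of one check per character.  Returns the number
 * of bytes copied and sets *rest to the '%' or terminating NUL found.
 */
static size_t
copy_literal_run(char *dst, size_t dstlen, const char *format,
                 const char **rest)
{
    const char *p = format;
    size_t      n;

    assert(dst != NULL && format != NULL && rest != NULL);

    /* Portable equivalent of strchrnul(format, '%') */
    while (*p != '\0' && *p != '%')
        p++;

    n = (size_t) (p - format);
    if (n > dstlen)
        n = dstlen;             /* truncate rather than overrun dst */
    memcpy(dst, format, n);
    *rest = p;
    return n;
}
```

Calling it on "abc%d" copies "abc" and leaves *rest pointing at the '%',
where conversion-spec parsing would take over.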

diff --git a/configure b/configure
index 6414ec1..0448c6b 100755
*** a/configure
--- b/configure
*************** fi
*** 15100,15106 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15100,15106 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 158d5a1..23b5bb8 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1571,1577 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1571,1577 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 90dda8e..7894caa 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 523,528 ****
--- 523,531 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #undef HAVE_STDLIB_H

+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+
  /* Define to 1 if you have the `strerror_r' function. */
  #undef HAVE_STRERROR_R

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 93bb773..f7a051d 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 394,399 ****
--- 394,402 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #define HAVE_STDLIB_H 1

+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+
  /* Define to 1 if you have the `strerror_r' function. */
  /* #undef HAVE_STRERROR_R */

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 2c77eec..1469878 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 310,316 ****
  }


! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 310,318 ----
  }


! static bool find_arguments(const char *format, va_list args,
!                PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char
*** 322,332 ****
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);


  /*
--- 324,335 ----
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int    compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);


  /*
*************** static void
*** 336,345 ****
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
      int            save_errno = errno;
!     const char *format_start = format;
      int            ch;
      bool        have_dollar;
-     bool        have_non_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
--- 339,347 ----
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
      int            save_errno = errno;
!     const char *first_pct = NULL;
      int            ch;
      bool        have_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
*************** dopr(PrintfTarget *target, const char *f
*** 351,576 ****
      int            precision;
      int            zpad;
      int            forcesign;
-     int            last_dollar;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
-     int            i;
-     PrintfArgType argtypes[NL_ARGMAX + 1];
      PrintfArgValue argvalues[NL_ARGMAX + 1];

      /*
!      * Parse the format string to determine whether there are %n$ format
!      * specs, and identify the types and order of the format parameters.
       */
!     have_dollar = have_non_dollar = false;
!     last_dollar = 0;
!     MemSet(argtypes, 0, sizeof(argtypes));

!     while ((ch = *format++) != '\0')
      {
!         if (ch != '%')
!             continue;
!         longflag = longlongflag = pointflag = 0;
!         fmtpos = accum = 0;
!         afterstar = false;
! nextch1:
!         ch = *format++;
!         if (ch == '\0')
!             break;                /* illegal, but we don't complain */
!         switch (ch)
          {
!             case '-':
!             case '+':
!                 goto nextch1;
!             case '0':
!             case '1':
!             case '2':
!             case '3':
!             case '4':
!             case '5':
!             case '6':
!             case '7':
!             case '8':
!             case '9':
!                 accum = accum * 10 + (ch - '0');
!                 goto nextch1;
!             case '.':
!                 pointflag = 1;
!                 accum = 0;
!                 goto nextch1;
!             case '*':
!                 if (afterstar)
!                     have_non_dollar = true; /* multiple stars */
!                 afterstar = true;
!                 accum = 0;
!                 goto nextch1;
!             case '$':
!                 have_dollar = true;
!                 if (accum <= 0 || accum > NL_ARGMAX)
!                     goto bad_format;
!                 if (afterstar)
!                 {
!                     if (argtypes[accum] &&
!                         argtypes[accum] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[accum] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, accum);
!                     afterstar = false;
!                 }
!                 else
!                     fmtpos = accum;
!                 accum = 0;
!                 goto nextch1;
!             case 'l':
!                 if (longflag)
!                     longlongflag = 1;
!                 else
!                     longflag = 1;
!                 goto nextch1;
!             case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
!                 longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
!                 longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
  #else
!                 /* assume size_t is same size as int */
  #endif
-                 goto nextch1;
-             case 'h':
-             case '\'':
-                 /* ignore these */
-                 goto nextch1;
-             case 'd':
-             case 'i':
-             case 'o':
-             case 'u':
-             case 'x':
-             case 'X':
-                 if (fmtpos)
-                 {
-                     PrintfArgType atype;

!                     if (longlongflag)
!                         atype = ATYPE_LONGLONG;
!                     else if (longflag)
!                         atype = ATYPE_LONG;
!                     else
!                         atype = ATYPE_INT;
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != atype)
!                         goto bad_format;
!                     argtypes[fmtpos] = atype;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'c':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 's':
!             case 'p':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_CHARPTR)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_CHARPTR;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'e':
!             case 'E':
!             case 'f':
!             case 'g':
!             case 'G':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_DOUBLE)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_DOUBLE;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
                  break;
!             case 'm':
!             case '%':
                  break;
          }

          /*
!          * If we finish the spec with afterstar still set, there's a
!          * non-dollar star in there.
           */
!         if (afterstar)
!             have_non_dollar = true;
!     }
!
!     /* Per spec, you use either all dollar or all not. */
!     if (have_dollar && have_non_dollar)
!         goto bad_format;
!
!     /*
!      * In dollar mode, collect the arguments in physical order.
!      */
!     for (i = 1; i <= last_dollar; i++)
!     {
!         switch (argtypes[i])
!         {
!             case ATYPE_NONE:
!                 goto bad_format;
!             case ATYPE_INT:
!                 argvalues[i].i = va_arg(args, int);
!                 break;
!             case ATYPE_LONG:
!                 argvalues[i].l = va_arg(args, long);
!                 break;
!             case ATYPE_LONGLONG:
!                 argvalues[i].ll = va_arg(args, int64);
!                 break;
!             case ATYPE_DOUBLE:
!                 argvalues[i].d = va_arg(args, double);
!                 break;
!             case ATYPE_CHARPTR:
!                 argvalues[i].cptr = va_arg(args, char *);
!                 break;
!         }
!     }
!
!     /*
!      * At last we can parse the format for real.
!      */
!     format = format_start;
!     while ((ch = *format++) != '\0')
!     {
!         if (target->failed)
!             break;

!         if (ch != '%')
!         {
!             dopr_outch(ch, target);
!             continue;
!         }
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
--- 353,413 ----
      int            precision;
      int            zpad;
      int            forcesign;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
      PrintfArgValue argvalues[NL_ARGMAX + 1];

      /*
!      * Initially, we suppose the format string does not use %n$.  The first
!      * time we come to a conversion spec that has that, we'll call
!      * find_arguments() to check for consistent use of %n$ and fill the
!      * argvalues array with the argument values in the correct order.
       */
!     have_dollar = false;

!     while (*format != '\0')
      {
!         /* Locate next conversion specifier */
!         if (*format != '%')
          {
!             const char *next_pct = format + 1;
!
!             /*
!              * If strchrnul exists (it's a glibc-ism), it's a good bit faster
!              * than the equivalent manual loop.  Note: this doesn't compile
!              * cleanly without -D_GNU_SOURCE, but we normally use that on
!              * glibc platforms.
!              */
! #ifdef HAVE_STRCHRNUL
!             next_pct = strchrnul(next_pct, '%');
  #else
!             while (*next_pct != '\0' && *next_pct != '%')
!                 next_pct++;
  #endif

!             /* Dump literal data we just scanned over */
!             dostr(format, next_pct - format, target);
!             if (target->failed)
                  break;
!
!             if (*next_pct == '\0')
                  break;
+             format = next_pct;
          }

          /*
!          * Remember start of first conversion spec; if we find %n$, then it's
!          * sufficient for find_arguments() to start here, without rescanning
!          * earlier literal text.
           */
!         if (first_pct == NULL)
!             first_pct = format;

!         /* Process conversion spec starting at *format */
!         format++;
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
*************** nextch2:
*** 614,620 ****
              case '*':
                  if (have_dollar)
                  {
!                     /* process value after reading n$ */
                      afterstar = true;
                  }
                  else
--- 451,461 ----
              case '*':
                  if (have_dollar)
                  {
!                     /*
!                      * We'll process value after reading n$.  Note it's OK to
!                      * assume have_dollar is set correctly, because in a valid
!                      * format string the initial % must have had n$ if * does.
!                      */
                      afterstar = true;
                  }
                  else
*************** nextch2:
*** 645,650 ****
--- 486,499 ----
                  accum = 0;
                  goto nextch2;
              case '$':
+                 /* First dollar sign? */
+                 if (!have_dollar)
+                 {
+                     /* Yup, so examine all conversion specs in format */
+                     if (!find_arguments(first_pct, args, argvalues))
+                         goto bad_format;
+                     have_dollar = true;
+                 }
                  if (afterstar)
                  {
                      /* fetch and process star value */
*************** nextch2:
*** 832,837 ****
--- 681,690 ----
                  dopr_outch('%', target);
                  break;
          }
+
+         /* Check for failure after each conversion spec */
+         if (target->failed)
+             break;
      }

      return;
*************** bad_format:
*** 841,848 ****
      target->failed = true;
  }

  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
--- 694,929 ----
      target->failed = true;
  }

+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+                PrintfArgValue *argvalues)
+ {
+     int            ch;
+     bool        afterstar;
+     int            accum;
+     int            longlongflag;
+     int            longflag;
+     int            fmtpos;
+     int            i;
+     int            last_dollar;
+     PrintfArgType argtypes[NL_ARGMAX + 1];
+
+     /* Initialize to "no dollar arguments known" */
+     last_dollar = 0;
+     MemSet(argtypes, 0, sizeof(argtypes));
+
+     /*
+      * This loop must accept the same format strings as the one in dopr().
+      * However, we don't need to analyze them to the same level of detail.
+      *
+      * Since we're only called if there's a dollar-type spec somewhere, we can
+      * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+      * all argument references in the format string must be one or the other.
+      */
+     while (*format != '\0')
+     {
+         /* Locate next conversion specifier */
+         if (*format != '%')
+         {
+             /* Unlike dopr, we can just quit if there's no more specifiers */
+             format = strchr(format + 1, '%');
+             if (format == NULL)
+                 break;
+         }
+
+         /* Process conversion spec starting at *format */
+         format++;
+         longflag = longlongflag = 0;
+         fmtpos = accum = 0;
+         afterstar = false;
+ nextch1:
+         ch = *format++;
+         if (ch == '\0')
+             break;                /* illegal, but we don't complain */
+         switch (ch)
+         {
+             case '-':
+             case '+':
+                 goto nextch1;
+             case '0':
+             case '1':
+             case '2':
+             case '3':
+             case '4':
+             case '5':
+             case '6':
+             case '7':
+             case '8':
+             case '9':
+                 accum = accum * 10 + (ch - '0');
+                 goto nextch1;
+             case '.':
+                 accum = 0;
+                 goto nextch1;
+             case '*':
+                 if (afterstar)
+                     return false;    /* previous star missing dollar */
+                 afterstar = true;
+                 accum = 0;
+                 goto nextch1;
+             case '$':
+                 if (accum <= 0 || accum > NL_ARGMAX)
+                     return false;
+                 if (afterstar)
+                 {
+                     if (argtypes[accum] &&
+                         argtypes[accum] != ATYPE_INT)
+                         return false;
+                     argtypes[accum] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, accum);
+                     afterstar = false;
+                 }
+                 else
+                     fmtpos = accum;
+                 accum = 0;
+                 goto nextch1;
+             case 'l':
+                 if (longflag)
+                     longlongflag = 1;
+                 else
+                     longflag = 1;
+                 goto nextch1;
+             case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+                 longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+                 longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+                 /* assume size_t is same size as int */
+ #endif
+                 goto nextch1;
+             case 'h':
+             case '\'':
+                 /* ignore these */
+                 goto nextch1;
+             case 'd':
+             case 'i':
+             case 'o':
+             case 'u':
+             case 'x':
+             case 'X':
+                 if (fmtpos)
+                 {
+                     PrintfArgType atype;
+
+                     if (longlongflag)
+                         atype = ATYPE_LONGLONG;
+                     else if (longflag)
+                         atype = ATYPE_LONG;
+                     else
+                         atype = ATYPE_INT;
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != atype)
+                         return false;
+                     argtypes[fmtpos] = atype;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'c':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_INT)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 's':
+             case 'p':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_CHARPTR)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_CHARPTR;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'e':
+             case 'E':
+             case 'f':
+             case 'g':
+             case 'G':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_DOUBLE)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_DOUBLE;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'm':
+             case '%':
+                 break;
+         }
+
+         /*
+          * If we finish the spec with afterstar still set, there's a
+          * non-dollar star in there.
+          */
+         if (afterstar)
+             return false;        /* non-dollar conversion spec */
+     }
+
+     /*
+      * Format appears valid so far, so collect the arguments in physical
+      * order.  (Since we rejected any non-dollar specs that would have
+      * collected arguments, we know that dopr() hasn't collected any yet.)
+      */
+     for (i = 1; i <= last_dollar; i++)
+     {
+         switch (argtypes[i])
+         {
+             case ATYPE_NONE:
+                 return false;
+             case ATYPE_INT:
+                 argvalues[i].i = va_arg(args, int);
+                 break;
+             case ATYPE_LONG:
+                 argvalues[i].l = va_arg(args, long);
+                 break;
+             case ATYPE_LONGLONG:
+                 argvalues[i].ll = va_arg(args, int64);
+                 break;
+             case ATYPE_DOUBLE:
+                 argvalues[i].d = va_arg(args, double);
+                 break;
+             case ATYPE_CHARPTR:
+                 argvalues[i].cptr = va_arg(args, char *);
+                 break;
+         }
+     }
+
+     return true;
+ }
+
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 857,873 ****
      else
          vallen = strlen(value);

!     adjust_padlen(minlen, vallen, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dostr(value, vallen, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 938,954 ----
      else
          vallen = strlen(value);

!     padlen = compute_padlen(minlen, vallen, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dostr(value, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtint(int64 value, char type, int force
*** 895,901 ****
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen = 0;        /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
--- 976,982 ----
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen;            /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
*************** fmtint(int64 value, char type, int force
*** 943,984 ****

          do
          {
!             convert[vallen++] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

!     while (zeropad-- > 0)
!         dopr_outch('0', target);

!     while (vallen > 0)
!         dopr_outch(convert[--vallen], target);

!     trailing_pad(&padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen = 0;        /* amount to pad */

!     adjust_padlen(minlen, 1, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dopr_outch(value, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 1024,1064 ----

          do
          {
!             convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     padlen = compute_padlen(minlen, vallen + zeropad, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

!     if (zeropad > 0)
!         dopr_outchmulti('0', zeropad, target);

!     dostr(convert + sizeof(convert) - vallen, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen;            /* amount to pad */

!     padlen = compute_padlen(minlen, 1, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dopr_outch(value, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtfloat(double value, char type, int fo
*** 989,998 ****
      int            signvalue = 0;
      int            prec;
      int            vallen;
!     char        fmt[32];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen = 0;        /* amount to pad with spaces */

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
--- 1069,1082 ----
      int            signvalue = 0;
      int            prec;
      int            vallen;
!     char        fmt[8];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen;            /* amount to pad with spaces */
!
!     /* Handle sign (NaNs have no sign) */
!     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
!         value = -value;

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 1014,1030 ****

      if (pointflag)
      {
-         if (sprintf(fmt, "%%.%d%c", prec, type) < 0)
-             goto fail;
          zeropadlen = precision - prec;
      }
-     else if (sprintf(fmt, "%%%c", type) < 0)
-         goto fail;
-
-     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
-         value = -value;
-
-     vallen = sprintf(convert, fmt, value);
      if (vallen < 0)
          goto fail;

--- 1098,1118 ----

      if (pointflag)
      {
          zeropadlen = precision - prec;
+         fmt[0] = '%';
+         fmt[1] = '.';
+         fmt[2] = '*';
+         fmt[3] = type;
+         fmt[4] = '\0';
+         vallen = sprintf(convert, fmt, prec, value);
+     }
+     else
+     {
+         fmt[0] = '%';
+         fmt[1] = type;
+         fmt[2] = '\0';
+         vallen = sprintf(convert, fmt, value);
      }
      if (vallen < 0)
          goto fail;

*************** fmtfloat(double value, char type, int fo
*** 1032,1040 ****
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
--- 1120,1128 ----
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
*************** fmtfloat(double value, char type, int fo
*** 1045,1062 ****
              epos = strrchr(convert, 'E');
          if (epos)
          {
!             /* pad after exponent */
              dostr(convert, epos - convert, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
          }
      }
      else
--- 1133,1150 ----
              epos = strrchr(convert, 'E');
          if (epos)
          {
!             /* pad before exponent */
              dostr(convert, epos - convert, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
          }
      }
      else
*************** fmtfloat(double value, char type, int fo
*** 1065,1071 ****
          dostr(convert, vallen, target);
      }

!     trailing_pad(&padlen, target);
      return;

  fail:
--- 1153,1159 ----
          dostr(convert, vallen, target);
      }

!     trailing_pad(padlen, target);
      return;

  fail:
*************** fail:
*** 1075,1080 ****
--- 1163,1175 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(*str, target);
+         return;
+     }
+
      while (slen > 0)
      {
          int            avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1118,1123 ****
--- 1213,1254 ----
      *(target->bufptr++) = c;
  }

+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(c, target);
+         return;
+     }
+
+     while (slen > 0)
+     {
+         int            avail;
+
+         if (target->bufend != NULL)
+             avail = target->bufend - target->bufptr;
+         else
+             avail = slen;
+         if (avail <= 0)
+         {
+             /* buffer full, can we dump to stream? */
+             if (target->stream == NULL)
+             {
+                 target->nchars += slen; /* no, lose the data */
+                 return;
+             }
+             flushbuffer(target);
+             continue;
+         }
+         avail = Min(avail, slen);
+         memset(target->bufptr, c, avail);
+         target->bufptr += avail;
+         slen -= avail;
+     }
+ }
+

  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1133,1174 ****
  }


! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
!     *padlen = minlen - vallen;
!     if (*padlen < 0)
!         *padlen = 0;
      if (leftjust)
!         *padlen = -(*padlen);
  }


  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
      if (*padlen > 0 && zpad)
      {
!         if (*signvalue)
          {
!             dopr_outch(*signvalue, target);
              --(*padlen);
!             *signvalue = 0;
          }
!         while (*padlen > 0)
          {
!             dopr_outch(zpad, target);
!             --(*padlen);
          }
      }
!     while (*padlen > (*signvalue != 0))
      {
!         dopr_outch(' ', target);
!         --(*padlen);
      }
!     if (*signvalue)
      {
!         dopr_outch(*signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
--- 1264,1311 ----
  }


! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
!     int            padlen;
!
!     padlen = minlen - vallen;
!     if (padlen < 0)
!         padlen = 0;
      if (leftjust)
!         padlen = -padlen;
!     return padlen;
  }


  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+     int            maxpad;
+
      if (*padlen > 0 && zpad)
      {
!         if (signvalue)
          {
!             dopr_outch(signvalue, target);
              --(*padlen);
!             signvalue = 0;
          }
!         if (*padlen > 0)
          {
!             dopr_outchmulti(zpad, *padlen, target);
!             *padlen = 0;
          }
      }
!     maxpad = (signvalue != 0);
!     if (*padlen > maxpad)
      {
!         dopr_outchmulti(' ', *padlen - maxpad, target);
!         *padlen = maxpad;
      }
!     if (signvalue)
      {
!         dopr_outch(signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1178,1188 ****


  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
!     while (*padlen < 0)
!     {
!         dopr_outch(' ', target);
!         ++(*padlen);
!     }
  }
--- 1315,1322 ----


  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
!     if (padlen < 0)
!         dopr_outchmulti(' ', -padlen, target);
  }
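
The padding changes in this diff share one pattern: a per-character loop around dopr_outch() becomes a single bounded memset. A minimal standalone sketch of that pattern follows. It assumes a hypothetical fixed-size `OutBuf` in place of the real PrintfTarget, has no stream flushing, and `outbuf_fill` is an illustrative name, not a function from snprintf.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PrintfTarget: a bounded buffer, no stream. */
typedef struct
{
    char       *ptr;            /* next write position */
    char       *end;            /* one past the last writable byte */
    size_t      dropped;        /* bytes that did not fit */
} OutBuf;

/* Emit 'count' copies of 'c' with one memset instead of a per-char loop. */
static void
outbuf_fill(OutBuf *out, int c, size_t count)
{
    size_t      avail;
    size_t      n;

    assert(out->ptr <= out->end);
    avail = (size_t) (out->end - out->ptr);
    n = (count < avail) ? count : avail;
    memset(out->ptr, c, n);
    out->ptr += n;
    out->dropped += count - n;  /* remember what was lost to overrun */
}
```

The overrun check happens once per call rather than once per character, which is where the saving comes from.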

Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Reading around the interwebz led me to look at ryu

> https://dl.acm.org/citation.cfm?id=3192369
> https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

> That's an algorithm that always generates the minimally sized
> roundtrip-safe string output for a floating point number. That makes it
> unsuitable for the innards of printf, but it very well could be
> interesting for e.g. float8out, especially when we currently specify a
> "too high" precision to guarantee round-trip safety.

Yeah, the whole business of round-trip safety is a bit worrisome.
If we change printf, and it produces different low-order digits
than before, will floats still round-trip correctly?  I think we
have to ensure that they do.  If we just use strfromd(), then it's
libc's problem if the results change ... but if we stick in some
code we got from elsewhere, it's our problem.
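
The round-trip requirement can be checked directly. The sketch below assumes IEEE 754 doubles and a correctly rounding libc, and `double_round_trips` is a hypothetical helper, not anything in the PostgreSQL tree. It prints a value with a given number of significant digits and tests whether strtod() recovers the same bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return true if 'value' survives a trip through text at the given
 * number of significant digits.  The bit pattern is compared with memcmp
 * so that -0.0 and 0.0 are told apart.  NaNs are not handled here.
 */
static bool
double_round_trips(double value, int ndigits)
{
    char        buf[64];
    double      back;
    int         len;

    len = snprintf(buf, sizeof(buf), "%.*g", ndigits, value);
    assert(len > 0 && (size_t) len < sizeof(buf));
    back = strtod(buf, NULL);
    return memcmp(&back, &value, sizeof(double)) == 0;
}
```

With 17 significant digits the check should pass for any finite double; with too few digits (say 3 for 1.0/3.0) it fails.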

BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out?  The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-09-26 21:30:25 -0400, Tom Lane wrote:
> Here's a rebased version of <15785.1536776055@sss.pgh.pa.us>.
> 
> I think we should try to get this reviewed and committed before
> we worry more about the float business.  It would be silly to
> not be benchmarking any bigger changes against the low-hanging
> fruit here.

Yea, no arguments there.

I'll try to have a look tomorrow.

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Thomas Munro
Date:
On Thu, Sep 27, 2018 at 1:18 PM Andres Freund <andres@anarazel.de> wrote:
> On 2018-09-26 17:57:05 -0700, Andres Freund wrote:
> > snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> > pg time = 1434.57 ms total, 0.000286915 ms per iteration
> > stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
> Reading around the interwebz led me to look at ryu
>
> https://dl.acm.org/citation.cfm?id=3192369
> https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
>
> That's an algorithm that always generates the minimally sized
> roundtrip-safe string output for a floating point number. That makes it
> unsuitable for the innards of printf, but it very well could be
> interesting for e.g. float8out, especially when we currently specify a
> "too high" precision to guarantee round-trip safety.

Wow.  While all the algorithms have that round trip goal, they keep
doing it faster.  I was once interested in their speed for a work
problem, and looked into the 30 year old dragon4 and 8 year old grisu3
algorithms.  It's amazing to me that we have a new algorithm in 2018
for this ancient problem, and it claims to be 3 times faster than the
competition.  (Hah, I see that "ryū" is Japanese for dragon.  "Grisù"
is a dragon from an Italian TV series.)

--
Thomas Munro
http://www.enterprisedb.com


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-09-26 21:44:41 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Reading around the interwebz led me to look at ryu
> 
> > https://dl.acm.org/citation.cfm?id=3192369
> > https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
> 
> > That's an algorithm that always generates the minimally sized
> > roundtrip-safe string output for a floating point number. That makes it
> > unsuitable for the innards of printf, but it very well could be
> > interesting for e.g. float8out, especially when we currently specify a
> > "too high" precision to guarantee round-trip safety.
> 
> Yeah, the whole business of round-trip safety is a bit worrisome.

Seems like using a better algorithm also has the potential to make the
output a bit smaller / more readable than what we currently produce.
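
To illustrate what "smaller / more readable" means here, the following brute-force sketch tries increasing precisions until the text reads back as the same double. It is not Ryu and is far slower. `shortest_round_trip` is a hypothetical name, and the sketch assumes finite inputs and IEEE 754 doubles.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format 'value' into buf using the fewest significant digits (1..17)
 * that strtod() maps back to the same double.  Returns the digit count
 * used.  Intended for finite values only.
 */
static int
shortest_round_trip(double value, char *buf, size_t buflen)
{
    int         ndigits;

    assert(buflen >= 32);       /* room for any "%.17g" result */
    for (ndigits = 1; ndigits < 17; ndigits++)
    {
        snprintf(buf, buflen, "%.*g", ndigits, value);
        if (strtod(buf, NULL) == value)
            return ndigits;
    }
    /* 17 digits is always enough for an IEEE 754 double */
    snprintf(buf, buflen, "%.17g", value);
    return 17;
}
```

For 0.1 this stops at one digit, whereas a fixed "%.17g" would print all seventeen.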


> If we change printf, and it produces different low-order digits
> than before, will floats still round-trip correctly?  I think we
> have to ensure that they do.

Yea, I think that's an absolutely hard requirement.  It'd possibly be a
good idea to add an assert that enforces that, although I'm not sure
it's worth the portability issues around crappy system libcs that do
randomly different things.


> BTW, were you thinking of plugging in strfromd() inside snprintf.c,
> or just invoking it directly from float[48]out?  The latter would
> presumably be cheaper, and it'd solve the most pressing performance
> problem, if not every problem.

I wasn't actually seriously suggesting we should use strfromd, but I
guess one way to deal with this would be to add a wrapper routine that
could directly be called from float[48]out *and* from fmtfloat(). Wonder
if it'd be worthwhile to *not* pass that wrapper a format string, but
instead pass the precision as an explicit argument.  Would make the use
in snprintf.c a bit more annoying (due to fFeEgG support), but probably
considerably simpler and faster if we ever reimplement that ourselves.
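
A rough sketch of such a wrapper, assuming it simply delegates to the platform's snprintf for now. The name `format_double` and its signature are illustrative only. The precision travels as an int via "%.*", the same trick the patch uses in fmtfloat().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical wrapper: format 'value' with conversion 'fmtcode' (one of
 * fFeEgG) and an explicit precision, or the libc default precision if
 * 'precision' is negative.  Returns the snprintf() result.
 */
static int
format_double(char *buf, size_t buflen, double value,
              char fmtcode, int precision)
{
    char        fmt[8];

    assert(fmtcode != '\0' && strchr("fFeEgG", fmtcode) != NULL);
    if (precision >= 0)
    {
        /* build "%.*<code>": no text round trip for the precision */
        fmt[0] = '%';
        fmt[1] = '.';
        fmt[2] = '*';
        fmt[3] = fmtcode;
        fmt[4] = '\0';
        return snprintf(buf, buflen, fmt, precision, value);
    }
    /* build "%<code>" */
    fmt[0] = '%';
    fmt[1] = fmtcode;
    fmt[2] = '\0';
    return snprintf(buf, buflen, fmt, value);
}
```

A caller such as float8out could then pass its precision directly, and a hand-written conversion could later replace the snprintf calls without changing callers.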

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-09-26 21:44:41 -0400, Tom Lane wrote:
>> BTW, were you thinking of plugging in strfromd() inside snprintf.c,
>> or just invoking it directly from float[48]out?  The latter would
>> presumably be cheaper, and it'd solve the most pressing performance
>> problem, if not every problem.

> I wasn't actually seriously suggesting we should use strfromd, but I
> guess one way to deal with this would be to add a wrapper routine that
> could directly be called from float[48]out *and* from fmtfloat().

Yeah, something along that line occurred to me a bit later.

> Wonder
> if it'd be worthwhile to *not* pass that wrapper a format string, but
> instead pass the precision as an explicit argument.

Right, getting rid of the round trip to text for the precision seems
like a win.  I'm surprised that strfromd is defined the way it is and
not with something like (double val, char fmtcode, int precision, ...)

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andrew Gierth
Date:
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 Andres> Hm, stb's results just for floating point isn't bad. The above
 Andres> numbers were for %f %f. But as the minimal usage would be about
 Andres> the internal usage of dopr(), here's comparing %.*f:

 Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
 Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
 Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Hmm. We had a case recently on IRC where the performance of float8out
turned out to be the major bottleneck: a table of about 2.7 million rows
and ~70 float columns showed an overhead of ~66 seconds for doing COPY
as opposed to COPY BINARY (the actual problem report was that doing
"select * from table" from R was taking a minute+ longer than expected,
we got comparative timings for COPY just to narrow down causes).

That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
time(float8send)) per conversion (Linux server, hardware unknown).

That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
small factor but totally dominated the query time.
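
The per-conversion figure follows from the numbers above: 2.7 million rows times 70 columns is about 189 million conversions, and 66 seconds spread over those comes to roughly 0.00035 ms each. As a small sketch (the helper name is made up for illustration):

```c
#include <assert.h>

/* Overhead per conversion in milliseconds, given the total in seconds. */
static double
per_conversion_ms(double total_seconds, double rows, double columns)
{
    assert(rows > 0 && columns > 0);
    return total_seconds * 1000.0 / (rows * columns);
}
```

Calling `per_conversion_ms(66.0, 2.7e6, 70.0)` gives a value of about 0.00035, in line with the estimate quoted in the message.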

-- 
Andrew (irc:RhodiumToad)


Re: Performance improvements for src/port/snprintf.c

From
Thomas Munro
Date:
On Thu, Sep 27, 2018 at 3:55 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
>  Andres> Hm, stb's results just for floating point isn't bad. The above
>  Andres> numbers were for %f %f. But as the minimal usage would be about
>  Andres> the internal usage of dopr(), here's comparing %.*f:
>
>  Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
>  Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
>  Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
> Hmm. We had a case recently on IRC where the performance of float8out
> turned out to be the major bottleneck: a table of about 2.7 million rows
> and ~70 float columns showed an overhead of ~66 seconds for doing COPY
> as opposed to COPY BINARY (the actual problem report was that doing
> "select * from table" from R was taking a minute+ longer than expected,
> we got comparative timings for COPY just to narrow down causes).
>
> That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
> time(float8send)) per conversion (Linux server, hardware unknown).
>
> That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
> small factor but totally dominated the query time.

For perfect and cheap round trip to ASCII, not for human consumption,
I wonder about the hexadecimal binary float literal format from C99
(and showing up in other places too).
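
For reference, C99's "%a" conversion and strtod() already speak that format. The sketch below uses two hypothetical helper names and shows the exact round trip. The precise text produced by "%a" can differ between C libraries, so only the round-trip property is relied on.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write 'value' as a C99 hexadecimal floating-point string. */
static int
double_to_hex(double value, char *buf, size_t buflen)
{
    int         len = snprintf(buf, buflen, "%a", value);

    assert(len > 0 && (size_t) len < buflen);
    return len;
}

/* Parse a hexadecimal (or decimal) floating-point string. */
static double
hex_to_double(const char *text)
{
    return strtod(text, NULL);
}
```

No decimal rounding is involved in either direction, so the conversion is exact for finite values.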

-- 
Thomas Munro
http://www.enterprisedb.com


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:

On September 26, 2018 8:53:27 PM PDT, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
>
> Andres> Hm, stb's results just for floating point isn't bad. The above
> Andres> numbers were for %f %f. But as the minimal usage would be about
> Andres> the internal usage of dopr(), here's comparing %.*f:
>
> Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
> Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
>Hmm. We had a case recently on IRC where the performance of float8out
>turned out to be the major bottleneck: a table of about 2.7 million rows
>and ~70 float columns showed an overhead of ~66 seconds for doing COPY
>as opposed to COPY BINARY (the actual problem report was that doing
>"select * from table" from R was taking a minute+ longer than expected,
>we got comparative timings for COPY just to narrow down causes).
>
>That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
>time(float8send)) per conversion (Linux server, hardware unknown).

Sounds like it could pretty precisely be the cost measured above. My laptop's
a bit faster than most server CPUs and the test has perfect branch prediction...


>That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
>small factor but totally dominated the query time.


Ugh.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:

On September 26, 2018 9:04:08 PM PDT, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>On Thu, Sep 27, 2018 at 3:55 PM Andrew Gierth
><andrew@tao11.riddles.org.uk> wrote:
>> >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
>>  Andres> Hm, stb's results just for floating point isn't bad. The above
>>  Andres> numbers were for %f %f. But as the minimal usage would be about
>>  Andres> the internal usage of dopr(), here's comparing %.*f:
>>
>>  Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
>>  Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
>>  Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>>
>> Hmm. We had a case recently on IRC where the performance of float8out
>> turned out to be the major bottleneck: a table of about 2.7 million rows
>> and ~70 float columns showed an overhead of ~66 seconds for doing COPY
>> as opposed to COPY BINARY (the actual problem report was that doing
>> "select * from table" from R was taking a minute+ longer than expected,
>> we got comparative timings for COPY just to narrow down causes).
>>
>> That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
>> time(float8send)) per conversion (Linux server, hardware unknown).
>>
>> That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
>> small factor but totally dominated the query time.
>
>For perfect and cheap round trip to ASCII, not for human consumption,
>I wonder about the hexadecimal binary float literal format from C99
>(and showing up in other places too).

I'm not quite sure how we realistically would migrate to that though. Clients
and their users won't understand it, and the more knowledgeable ones will
already use the binary protocol.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-09-26 21:30:25 -0400, Tom Lane wrote:
> Here's a rebased version of <15785.1536776055@sss.pgh.pa.us>.
> 
> I think we should try to get this reviewed and committed before
> we worry more about the float business.  It would be silly to
> not be benchmarking any bigger changes against the low-hanging
> fruit here.

I've looked through the patch. Looks good to me. Some minor notes:

- How about adding our own strchrnul for the case where we don't
  HAVE_STRCHRNUL?  It's possible that other platforms have something
  similar, and the code would look more readable that way.
- I know it's not new, but is it actually correct to use va_arg(args, int64)
  for ATYPE_LONGLONG?
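
For the first note, a fallback along these lines would probably do. It is only a sketch, and the name `fallback_strchrnul` is an assumption rather than anything from the patch. Unlike glibc's strchrnul(), which returns `char *`, this version keeps the const qualifier.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical fallback for platforms without strchrnul(): like strchr(),
 * but returns a pointer to the terminating NUL instead of NULL when 'c'
 * is not found.
 */
static const char *
fallback_strchrnul(const char *s, int c)
{
    assert(s != NULL);
    while (*s != '\0' && *s != (char) c)
        s++;
    return s;
}
```

Scanning a format string for the next '%' then needs no NULL check: the distance from the start to the returned pointer is the run of plain characters to copy.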

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Here's a version of this patch rebased over commit 625b38ea0.

That commit's fix for the possibly-expensive memset means that we need
to reconsider performance numbers for this patch.  I re-ran my previous
tests, and it's still looking like this is a substantial win, as it makes
snprintf.c faster than the native snprintf for most non-float cases.
We're still stuck at something like a 10% penalty for float cases.

While there might be value in implementing our own float printing code,
I have a pretty hard time getting excited about the cost/benefit ratio
of that.  I think that what we probably really ought to do here is hack
float4out/float8out to bypass the extra overhead, as in the 0002 patch
below.

For reference, I attach the testbed I'm using now plus some results.
I wasn't able to get my cranky NetBSD system up today, so I don't
have results for that.  However, I did add recent glibc (Fedora 28)
to the mix, and I was interested to discover that they seem to have
added a fast-path for format strings that are exactly "%s", just as
NetBSD did.  I wonder if we should reconsider our position on doing
that.  It'd be a simple enough addition...
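
To make the idea concrete, here is a sketch of such a fast path in a standalone wrapper. `fast_snprintf` is a hypothetical name, and the general case is simply handed to the platform's vsnprintf here. Nothing in it is taken from glibc, NetBSD, or the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical snprintf wrapper with a fast path for the exact format
 * "%s".  Returns the length the full output would have, like snprintf().
 */
static int
fast_snprintf(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list     args;
    int         result;

    assert(buf != NULL && buflen > 0);
    va_start(args, fmt);
    if (fmt[0] == '%' && fmt[1] == 's' && fmt[2] == '\0')
    {
        /* no format parsing at all: copy the argument, truncating if needed */
        const char *s = va_arg(args, const char *);
        size_t      len = strlen(s);
        size_t      n = (len < buflen - 1) ? len : buflen - 1;

        memcpy(buf, s, n);
        buf[n] = '\0';
        result = (int) len;
    }
    else
        result = vsnprintf(buf, buflen, fmt, args);
    va_end(args);
    return result;
}
```

The extra cost for every other format is three character comparisons, which is why the addition looks cheap.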

            regards, tom lane

diff --git a/configure b/configure
index 6414ec1..0448c6b 100755
*** a/configure
--- b/configure
*************** fi
*** 15100,15106 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15100,15106 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
    as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 158d5a1..23b5bb8 100644
*** a/configure.in
--- b/configure.in
*************** PGAC_FUNC_WCSTOMBS_L
*** 1571,1577 ****
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1571,1577 ----
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 90dda8e..7894caa 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 523,528 ****
--- 523,531 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #undef HAVE_STDLIB_H

+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+
  /* Define to 1 if you have the `strerror_r' function. */
  #undef HAVE_STRERROR_R

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 93bb773..f7a051d 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***************
*** 394,399 ****
--- 394,402 ----
  /* Define to 1 if you have the <stdlib.h> header file. */
  #define HAVE_STDLIB_H 1

+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+
  /* Define to 1 if you have the `strerror_r' function. */
  /* #undef HAVE_STRERROR_R */

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 1be5f70..3094ad8 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** flushbuffer(PrintfTarget *target)
*** 314,320 ****
  }


! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 314,322 ----
  }


! static bool find_arguments(const char *format, va_list args,
!                PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*************** static void fmtfloat(double value, char
*** 326,336 ****
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);


  /*
--- 328,339 ----
           PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, PrintfTarget *target);
  static int    adjust_sign(int is_negative, int forcesign, int *signvalue);
! static int    compute_padlen(int minlen, int vallen, int leftjust);
! static void leading_pad(int zpad, int signvalue, int *padlen,
              PrintfTarget *target);
! static void trailing_pad(int padlen, PrintfTarget *target);


  /*
*************** static void
*** 340,349 ****
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
      int            save_errno = errno;
!     const char *format_start = format;
      int            ch;
      bool        have_dollar;
-     bool        have_non_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
--- 343,351 ----
  dopr(PrintfTarget *target, const char *format, va_list args)
  {
      int            save_errno = errno;
!     const char *first_pct = NULL;
      int            ch;
      bool        have_dollar;
      bool        have_star;
      bool        afterstar;
      int            accum;
*************** dopr(PrintfTarget *target, const char *f
*** 355,580 ****
      int            precision;
      int            zpad;
      int            forcesign;
-     int            last_dollar;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
-     int            i;
-     PrintfArgType argtypes[PG_NL_ARGMAX + 1];
      PrintfArgValue argvalues[PG_NL_ARGMAX + 1];

      /*
!      * Parse the format string to determine whether there are %n$ format
!      * specs, and identify the types and order of the format parameters.
       */
!     have_dollar = have_non_dollar = false;
!     last_dollar = 0;
!     MemSet(argtypes, 0, sizeof(argtypes));

!     while ((ch = *format++) != '\0')
      {
!         if (ch != '%')
!             continue;
!         longflag = longlongflag = pointflag = 0;
!         fmtpos = accum = 0;
!         afterstar = false;
! nextch1:
!         ch = *format++;
!         if (ch == '\0')
!             break;                /* illegal, but we don't complain */
!         switch (ch)
          {
!             case '-':
!             case '+':
!                 goto nextch1;
!             case '0':
!             case '1':
!             case '2':
!             case '3':
!             case '4':
!             case '5':
!             case '6':
!             case '7':
!             case '8':
!             case '9':
!                 accum = accum * 10 + (ch - '0');
!                 goto nextch1;
!             case '.':
!                 pointflag = 1;
!                 accum = 0;
!                 goto nextch1;
!             case '*':
!                 if (afterstar)
!                     have_non_dollar = true; /* multiple stars */
!                 afterstar = true;
!                 accum = 0;
!                 goto nextch1;
!             case '$':
!                 have_dollar = true;
!                 if (accum <= 0 || accum > PG_NL_ARGMAX)
!                     goto bad_format;
!                 if (afterstar)
!                 {
!                     if (argtypes[accum] &&
!                         argtypes[accum] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[accum] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, accum);
!                     afterstar = false;
!                 }
!                 else
!                     fmtpos = accum;
!                 accum = 0;
!                 goto nextch1;
!             case 'l':
!                 if (longflag)
!                     longlongflag = 1;
!                 else
!                     longflag = 1;
!                 goto nextch1;
!             case 'z':
! #if SIZEOF_SIZE_T == 8
! #ifdef HAVE_LONG_INT_64
!                 longflag = 1;
! #elif defined(HAVE_LONG_LONG_INT_64)
!                 longlongflag = 1;
! #else
! #error "Don't know how to print 64bit integers"
! #endif
  #else
!                 /* assume size_t is same size as int */
  #endif
-                 goto nextch1;
-             case 'h':
-             case '\'':
-                 /* ignore these */
-                 goto nextch1;
-             case 'd':
-             case 'i':
-             case 'o':
-             case 'u':
-             case 'x':
-             case 'X':
-                 if (fmtpos)
-                 {
-                     PrintfArgType atype;

!                     if (longlongflag)
!                         atype = ATYPE_LONGLONG;
!                     else if (longflag)
!                         atype = ATYPE_LONG;
!                     else
!                         atype = ATYPE_INT;
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != atype)
!                         goto bad_format;
!                     argtypes[fmtpos] = atype;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'c':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_INT)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_INT;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 's':
!             case 'p':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_CHARPTR)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_CHARPTR;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
!                 break;
!             case 'e':
!             case 'E':
!             case 'f':
!             case 'g':
!             case 'G':
!                 if (fmtpos)
!                 {
!                     if (argtypes[fmtpos] &&
!                         argtypes[fmtpos] != ATYPE_DOUBLE)
!                         goto bad_format;
!                     argtypes[fmtpos] = ATYPE_DOUBLE;
!                     last_dollar = Max(last_dollar, fmtpos);
!                 }
!                 else
!                     have_non_dollar = true;
                  break;
!             case 'm':
!             case '%':
                  break;
          }

          /*
!          * If we finish the spec with afterstar still set, there's a
!          * non-dollar star in there.
           */
!         if (afterstar)
!             have_non_dollar = true;
!     }
!
!     /* Per spec, you use either all dollar or all not. */
!     if (have_dollar && have_non_dollar)
!         goto bad_format;
!
!     /*
!      * In dollar mode, collect the arguments in physical order.
!      */
!     for (i = 1; i <= last_dollar; i++)
!     {
!         switch (argtypes[i])
!         {
!             case ATYPE_NONE:
!                 goto bad_format;
!             case ATYPE_INT:
!                 argvalues[i].i = va_arg(args, int);
!                 break;
!             case ATYPE_LONG:
!                 argvalues[i].l = va_arg(args, long);
!                 break;
!             case ATYPE_LONGLONG:
!                 argvalues[i].ll = va_arg(args, int64);
!                 break;
!             case ATYPE_DOUBLE:
!                 argvalues[i].d = va_arg(args, double);
!                 break;
!             case ATYPE_CHARPTR:
!                 argvalues[i].cptr = va_arg(args, char *);
!                 break;
!         }
!     }
!
!     /*
!      * At last we can parse the format for real.
!      */
!     format = format_start;
!     while ((ch = *format++) != '\0')
!     {
!         if (target->failed)
!             break;

!         if (ch != '%')
!         {
!             dopr_outch(ch, target);
!             continue;
!         }
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
--- 357,417 ----
      int            precision;
      int            zpad;
      int            forcesign;
      int            fmtpos;
      int            cvalue;
      int64        numvalue;
      double        fvalue;
      char       *strvalue;
      PrintfArgValue argvalues[PG_NL_ARGMAX + 1];

      /*
!      * Initially, we suppose the format string does not use %n$.  The first
!      * time we come to a conversion spec that has that, we'll call
!      * find_arguments() to check for consistent use of %n$ and fill the
!      * argvalues array with the argument values in the correct order.
       */
!     have_dollar = false;

!     while (*format != '\0')
      {
!         /* Locate next conversion specifier */
!         if (*format != '%')
          {
!             const char *next_pct = format + 1;
!
!             /*
!              * If strchrnul exists (it's a glibc-ism), it's a good bit faster
!              * than the equivalent manual loop.  Note: this doesn't compile
!              * cleanly without -D_GNU_SOURCE, but we normally use that on
!              * glibc platforms.
!              */
! #ifdef HAVE_STRCHRNUL
!             next_pct = strchrnul(next_pct, '%');
  #else
!             while (*next_pct != '\0' && *next_pct != '%')
!                 next_pct++;
  #endif

!             /* Dump literal data we just scanned over */
!             dostr(format, next_pct - format, target);
!             if (target->failed)
                  break;
!
!             if (*next_pct == '\0')
                  break;
+             format = next_pct;
          }

          /*
!          * Remember start of first conversion spec; if we find %n$, then it's
!          * sufficient for find_arguments() to start here, without rescanning
!          * earlier literal text.
           */
!         if (first_pct == NULL)
!             first_pct = format;

!         /* Process conversion spec starting at *format */
!         format++;
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;
*************** nextch2:
*** 618,624 ****
              case '*':
                  if (have_dollar)
                  {
!                     /* process value after reading n$ */
                      afterstar = true;
                  }
                  else
--- 455,465 ----
              case '*':
                  if (have_dollar)
                  {
!                     /*
!                      * We'll process value after reading n$.  Note it's OK to
!                      * assume have_dollar is set correctly, because in a valid
!                      * format string the initial % must have had n$ if * does.
!                      */
                      afterstar = true;
                  }
                  else
*************** nextch2:
*** 649,654 ****
--- 490,503 ----
                  accum = 0;
                  goto nextch2;
              case '$':
+                 /* First dollar sign? */
+                 if (!have_dollar)
+                 {
+                     /* Yup, so examine all conversion specs in format */
+                     if (!find_arguments(first_pct, args, argvalues))
+                         goto bad_format;
+                     have_dollar = true;
+                 }
                  if (afterstar)
                  {
                      /* fetch and process star value */
*************** nextch2:
*** 836,841 ****
--- 685,694 ----
                  dopr_outch('%', target);
                  break;
          }
+
+         /* Check for failure after each conversion spec */
+         if (target->failed)
+             break;
      }

      return;
*************** bad_format:
*** 845,852 ****
      target->failed = true;
  }

  static void
! fmtstr(char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
--- 698,933 ----
      target->failed = true;
  }

+ /*
+  * find_arguments(): sort out the arguments for a format spec with %n$
+  *
+  * If format is valid, return true and fill argvalues[i] with the value
+  * for the conversion spec that has %i$ or *i$.  Else return false.
+  */
+ static bool
+ find_arguments(const char *format, va_list args,
+                PrintfArgValue *argvalues)
+ {
+     int            ch;
+     bool        afterstar;
+     int            accum;
+     int            longlongflag;
+     int            longflag;
+     int            fmtpos;
+     int            i;
+     int            last_dollar;
+     PrintfArgType argtypes[PG_NL_ARGMAX + 1];
+
+     /* Initialize to "no dollar arguments known" */
+     last_dollar = 0;
+     MemSet(argtypes, 0, sizeof(argtypes));
+
+     /*
+      * This loop must accept the same format strings as the one in dopr().
+      * However, we don't need to analyze them to the same level of detail.
+      *
+      * Since we're only called if there's a dollar-type spec somewhere, we can
+      * fail immediately if we find a non-dollar spec.  Per the C99 standard,
+      * all argument references in the format string must be one or the other.
+      */
+     while (*format != '\0')
+     {
+         /* Locate next conversion specifier */
+         if (*format != '%')
+         {
+             /* Unlike dopr, we can just quit if there's no more specifiers */
+             format = strchr(format + 1, '%');
+             if (format == NULL)
+                 break;
+         }
+
+         /* Process conversion spec starting at *format */
+         format++;
+         longflag = longlongflag = 0;
+         fmtpos = accum = 0;
+         afterstar = false;
+ nextch1:
+         ch = *format++;
+         if (ch == '\0')
+             break;                /* illegal, but we don't complain */
+         switch (ch)
+         {
+             case '-':
+             case '+':
+                 goto nextch1;
+             case '0':
+             case '1':
+             case '2':
+             case '3':
+             case '4':
+             case '5':
+             case '6':
+             case '7':
+             case '8':
+             case '9':
+                 accum = accum * 10 + (ch - '0');
+                 goto nextch1;
+             case '.':
+                 accum = 0;
+                 goto nextch1;
+             case '*':
+                 if (afterstar)
+                     return false;    /* previous star missing dollar */
+                 afterstar = true;
+                 accum = 0;
+                 goto nextch1;
+             case '$':
+                 if (accum <= 0 || accum > PG_NL_ARGMAX)
+                     return false;
+                 if (afterstar)
+                 {
+                     if (argtypes[accum] &&
+                         argtypes[accum] != ATYPE_INT)
+                         return false;
+                     argtypes[accum] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, accum);
+                     afterstar = false;
+                 }
+                 else
+                     fmtpos = accum;
+                 accum = 0;
+                 goto nextch1;
+             case 'l':
+                 if (longflag)
+                     longlongflag = 1;
+                 else
+                     longflag = 1;
+                 goto nextch1;
+             case 'z':
+ #if SIZEOF_SIZE_T == 8
+ #ifdef HAVE_LONG_INT_64
+                 longflag = 1;
+ #elif defined(HAVE_LONG_LONG_INT_64)
+                 longlongflag = 1;
+ #else
+ #error "Don't know how to print 64bit integers"
+ #endif
+ #else
+                 /* assume size_t is same size as int */
+ #endif
+                 goto nextch1;
+             case 'h':
+             case '\'':
+                 /* ignore these */
+                 goto nextch1;
+             case 'd':
+             case 'i':
+             case 'o':
+             case 'u':
+             case 'x':
+             case 'X':
+                 if (fmtpos)
+                 {
+                     PrintfArgType atype;
+
+                     if (longlongflag)
+                         atype = ATYPE_LONGLONG;
+                     else if (longflag)
+                         atype = ATYPE_LONG;
+                     else
+                         atype = ATYPE_INT;
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != atype)
+                         return false;
+                     argtypes[fmtpos] = atype;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'c':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_INT)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_INT;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 's':
+             case 'p':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_CHARPTR)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_CHARPTR;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'e':
+             case 'E':
+             case 'f':
+             case 'g':
+             case 'G':
+                 if (fmtpos)
+                 {
+                     if (argtypes[fmtpos] &&
+                         argtypes[fmtpos] != ATYPE_DOUBLE)
+                         return false;
+                     argtypes[fmtpos] = ATYPE_DOUBLE;
+                     last_dollar = Max(last_dollar, fmtpos);
+                 }
+                 else
+                     return false;    /* non-dollar conversion spec */
+                 break;
+             case 'm':
+             case '%':
+                 break;
+         }
+
+         /*
+          * If we finish the spec with afterstar still set, there's a
+          * non-dollar star in there.
+          */
+         if (afterstar)
+             return false;        /* non-dollar conversion spec */
+     }
+
+     /*
+      * Format appears valid so far, so collect the arguments in physical
+      * order.  (Since we rejected any non-dollar specs that would have
+      * collected arguments, we know that dopr() hasn't collected any yet.)
+      */
+     for (i = 1; i <= last_dollar; i++)
+     {
+         switch (argtypes[i])
+         {
+             case ATYPE_NONE:
+                 return false;
+             case ATYPE_INT:
+                 argvalues[i].i = va_arg(args, int);
+                 break;
+             case ATYPE_LONG:
+                 argvalues[i].l = va_arg(args, long);
+                 break;
+             case ATYPE_LONGLONG:
+                 argvalues[i].ll = va_arg(args, int64);
+                 break;
+             case ATYPE_DOUBLE:
+                 argvalues[i].d = va_arg(args, double);
+                 break;
+             case ATYPE_CHARPTR:
+                 argvalues[i].cptr = va_arg(args, char *);
+                 break;
+         }
+     }
+
+     return true;
+ }
+
  static void
! fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
         int pointflag, PrintfTarget *target)
  {
      int            padlen,
*************** fmtstr(char *value, int leftjust, int mi
*** 861,877 ****
      else
          vallen = strlen(value);

!     adjust_padlen(minlen, vallen, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dostr(value, vallen, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 942,958 ----
      else
          vallen = strlen(value);

!     padlen = compute_padlen(minlen, vallen, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dostr(value, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtint(int64 value, char type, int force
*** 899,905 ****
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen = 0;        /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
--- 980,986 ----
      int            signvalue = 0;
      char        convert[64];
      int            vallen = 0;
!     int            padlen;            /* amount to pad */
      int            zeropad;        /* extra leading zeroes */

      switch (type)
*************** fmtint(int64 value, char type, int force
*** 947,988 ****

          do
          {
!             convert[vallen++] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     adjust_padlen(minlen, vallen + zeropad, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

!     while (zeropad-- > 0)
!         dopr_outch('0', target);

!     while (vallen > 0)
!         dopr_outch(convert[--vallen], target);

!     trailing_pad(&padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen = 0;        /* amount to pad */

!     adjust_padlen(minlen, 1, leftjust, &padlen);

!     while (padlen > 0)
      {
!         dopr_outch(' ', target);
!         --padlen;
      }

      dopr_outch(value, target);

!     trailing_pad(&padlen, target);
  }

  static void
--- 1028,1068 ----

          do
          {
!             convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
              uvalue = uvalue / base;
          } while (uvalue);
      }

      zeropad = Max(0, precision - vallen);

!     padlen = compute_padlen(minlen, vallen + zeropad, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

!     if (zeropad > 0)
!         dopr_outchmulti('0', zeropad, target);

!     dostr(convert + sizeof(convert) - vallen, vallen, target);

!     trailing_pad(padlen, target);
  }

  static void
  fmtchar(int value, int leftjust, int minlen, PrintfTarget *target)
  {
!     int            padlen;            /* amount to pad */

!     padlen = compute_padlen(minlen, 1, leftjust);

!     if (padlen > 0)
      {
!         dopr_outchmulti(' ', padlen, target);
!         padlen = 0;
      }

      dopr_outch(value, target);

!     trailing_pad(padlen, target);
  }

  static void
*************** fmtfloat(double value, char type, int fo
*** 993,1002 ****
      int            signvalue = 0;
      int            prec;
      int            vallen;
!     char        fmt[32];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen = 0;        /* amount to pad with spaces */

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
--- 1073,1086 ----
      int            signvalue = 0;
      int            prec;
      int            vallen;
!     char        fmt[8];
      char        convert[1024];
      int            zeropadlen = 0; /* amount to pad with zeroes */
!     int            padlen;            /* amount to pad with spaces */
!
!     /* Handle sign (NaNs have no sign) */
!     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
!         value = -value;

      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
*************** fmtfloat(double value, char type, int fo
*** 1018,1034 ****

      if (pointflag)
      {
-         if (sprintf(fmt, "%%.%d%c", prec, type) < 0)
-             goto fail;
          zeropadlen = precision - prec;
      }
-     else if (sprintf(fmt, "%%%c", type) < 0)
-         goto fail;
-
-     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
-         value = -value;
-
-     vallen = sprintf(convert, fmt, value);
      if (vallen < 0)
          goto fail;

--- 1102,1122 ----

      if (pointflag)
      {
          zeropadlen = precision - prec;
+         fmt[0] = '%';
+         fmt[1] = '.';
+         fmt[2] = '*';
+         fmt[3] = type;
+         fmt[4] = '\0';
+         vallen = sprintf(convert, fmt, prec, value);
+     }
+     else
+     {
+         fmt[0] = '%';
+         fmt[1] = type;
+         fmt[2] = '\0';
+         vallen = sprintf(convert, fmt, value);
      }
      if (vallen < 0)
          goto fail;

*************** fmtfloat(double value, char type, int fo
*** 1036,1044 ****
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     adjust_padlen(minlen, vallen + zeropadlen, leftjust, &padlen);

!     leading_pad(zpad, &signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
--- 1124,1132 ----
      if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
          zeropadlen = 0;

!     padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);

!     leading_pad(zpad, signvalue, &padlen, target);

      if (zeropadlen > 0)
      {
*************** fmtfloat(double value, char type, int fo
*** 1049,1066 ****
              epos = strrchr(convert, 'E');
          if (epos)
          {
!             /* pad after exponent */
              dostr(convert, epos - convert, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             while (zeropadlen-- > 0)
!                 dopr_outch('0', target);
          }
      }
      else
--- 1137,1154 ----
              epos = strrchr(convert, 'E');
          if (epos)
          {
!             /* pad before exponent */
              dostr(convert, epos - convert, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
              dostr(epos, vallen - (epos - convert), target);
          }
          else
          {
              /* no exponent, pad after the digits */
              dostr(convert, vallen, target);
!             if (zeropadlen > 0)
!                 dopr_outchmulti('0', zeropadlen, target);
          }
      }
      else
*************** fmtfloat(double value, char type, int fo
*** 1069,1075 ****
          dostr(convert, vallen, target);
      }

!     trailing_pad(&padlen, target);
      return;

  fail:
--- 1157,1163 ----
          dostr(convert, vallen, target);
      }

!     trailing_pad(padlen, target);
      return;

  fail:
*************** fail:
*** 1079,1084 ****
--- 1167,1179 ----
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(*str, target);
+         return;
+     }
+
      while (slen > 0)
      {
          int            avail;
*************** dopr_outch(int c, PrintfTarget *target)
*** 1122,1127 ****
--- 1217,1258 ----
      *(target->bufptr++) = c;
  }

+ static void
+ dopr_outchmulti(int c, int slen, PrintfTarget *target)
+ {
+     /* fast path for common case of slen == 1 */
+     if (slen == 1)
+     {
+         dopr_outch(c, target);
+         return;
+     }
+
+     while (slen > 0)
+     {
+         int            avail;
+
+         if (target->bufend != NULL)
+             avail = target->bufend - target->bufptr;
+         else
+             avail = slen;
+         if (avail <= 0)
+         {
+             /* buffer full, can we dump to stream? */
+             if (target->stream == NULL)
+             {
+                 target->nchars += slen; /* no, lose the data */
+                 return;
+             }
+             flushbuffer(target);
+             continue;
+         }
+         avail = Min(avail, slen);
+         memset(target->bufptr, c, avail);
+         target->bufptr += avail;
+         slen -= avail;
+     }
+ }
+

  static int
  adjust_sign(int is_negative, int forcesign, int *signvalue)
*************** adjust_sign(int is_negative, int forcesi
*** 1137,1178 ****
  }


! static void
! adjust_padlen(int minlen, int vallen, int leftjust, int *padlen)
  {
!     *padlen = minlen - vallen;
!     if (*padlen < 0)
!         *padlen = 0;
      if (leftjust)
!         *padlen = -(*padlen);
  }


  static void
! leading_pad(int zpad, int *signvalue, int *padlen, PrintfTarget *target)
  {
      if (*padlen > 0 && zpad)
      {
!         if (*signvalue)
          {
!             dopr_outch(*signvalue, target);
              --(*padlen);
!             *signvalue = 0;
          }
!         while (*padlen > 0)
          {
!             dopr_outch(zpad, target);
!             --(*padlen);
          }
      }
!     while (*padlen > (*signvalue != 0))
      {
!         dopr_outch(' ', target);
!         --(*padlen);
      }
!     if (*signvalue)
      {
!         dopr_outch(*signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
--- 1268,1315 ----
  }


! static int
! compute_padlen(int minlen, int vallen, int leftjust)
  {
!     int            padlen;
!
!     padlen = minlen - vallen;
!     if (padlen < 0)
!         padlen = 0;
      if (leftjust)
!         padlen = -padlen;
!     return padlen;
  }


  static void
! leading_pad(int zpad, int signvalue, int *padlen, PrintfTarget *target)
  {
+     int            maxpad;
+
      if (*padlen > 0 && zpad)
      {
!         if (signvalue)
          {
!             dopr_outch(signvalue, target);
              --(*padlen);
!             signvalue = 0;
          }
!         if (*padlen > 0)
          {
!             dopr_outchmulti(zpad, *padlen, target);
!             *padlen = 0;
          }
      }
!     maxpad = (signvalue != 0);
!     if (*padlen > maxpad)
      {
!         dopr_outchmulti(' ', *padlen - maxpad, target);
!         *padlen = maxpad;
      }
!     if (signvalue)
      {
!         dopr_outch(signvalue, target);
          if (*padlen > 0)
              --(*padlen);
          else if (*padlen < 0)
*************** leading_pad(int zpad, int *signvalue, in
*** 1182,1192 ****


  static void
! trailing_pad(int *padlen, PrintfTarget *target)
  {
!     while (*padlen < 0)
!     {
!         dopr_outch(' ', target);
!         ++(*padlen);
!     }
  }
--- 1319,1326 ----


  static void
! trailing_pad(int padlen, PrintfTarget *target)
  {
!     if (padlen < 0)
!         dopr_outchmulti(' ', -padlen, target);
  }
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index df35557..2e68991 100644
*** a/src/backend/utils/adt/float.c
--- b/src/backend/utils/adt/float.c
*************** float4out(PG_FUNCTION_ARGS)
*** 258,269 ****
              break;
          default:
              {
                  int            ndig = FLT_DIG + extra_float_digits;

                  if (ndig < 1)
                      ndig = 1;

!                 ascii = psprintf("%.*g", ndig, num);
              }
      }

--- 258,287 ----
              break;
          default:
              {
+                 /*
+                  * We don't go through snprintf.c here because, for this
+                  * particular choice of format string, it adds nothing of
+                  * value to the native behavior of sprintf() --- except
+                  * handling buffer overrun.  We just make the buffer big
+                  * enough to not have to worry.
+                  */
+ #undef sprintf
                  int            ndig = FLT_DIG + extra_float_digits;
+                 int            len PG_USED_FOR_ASSERTS_ONLY;

+                 /* Neither of these limits can trigger, but be paranoid */
                  if (ndig < 1)
                      ndig = 1;
+                 else if (ndig > 32)
+                     ndig = 32;

!                 ascii = (char *) palloc(64);
!
!                 len = sprintf(ascii, "%.*g", ndig, num);
!
!                 Assert(len > 0 && len < 64);
!
! #define sprintf pg_sprintf
              }
      }

*************** float8out_internal(double num)
*** 494,505 ****
              break;
          default:
              {
                  int            ndig = DBL_DIG + extra_float_digits;

                  if (ndig < 1)
                      ndig = 1;

!                 ascii = psprintf("%.*g", ndig, num);
              }
      }

--- 512,541 ----
              break;
          default:
              {
+                 /*
+                  * We don't go through snprintf.c here because, for this
+                  * particular choice of format string, it adds nothing of
+                  * value to the native behavior of sprintf() --- except
+                  * handling buffer overrun.  We just make the buffer big
+                  * enough to not have to worry.
+                  */
+ #undef sprintf
                  int            ndig = DBL_DIG + extra_float_digits;
+                 int            len PG_USED_FOR_ASSERTS_ONLY;

+                 /* Neither of these limits can trigger, but be paranoid */
                  if (ndig < 1)
                      ndig = 1;
+                 else if (ndig > 32)
+                     ndig = 32;

!                 ascii = (char *) palloc(64);
!
!                 len = sprintf(ascii, "%.*g", ndig, num);
!
!                 Assert(len > 0 && len < 64);
!
! #define sprintf pg_sprintf
              }
      }

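The literal-text handling in the patch above (skip ahead to the next % and hand the whole skipped run to the string-output path in one call) can be sketched in isolation roughly as follows. This is only an illustration, not code from snprintf.c: scan_to_pct() and copy_literal_run() are hypothetical names, and the fixed output buffer stands in for the PrintfTarget machinery. The loop mirrors the portable fallback used when strchrnul is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the next '%' in s, or to the
 * terminating NUL if there is none (the same contract as glibc's strchrnul).
 */
static const char *
scan_to_pct(const char *s)
{
    while (*s != '\0' && *s != '%')
        s++;
    return s;
}

/*
 * Hypothetical helper: copy the literal run at the start of format (the
 * text before the first '%') into out with one memcpy, instead of one
 * call per character.  The run is truncated to fit outsize, and out is
 * always NUL-terminated.  Returns the number of characters copied.
 */
static size_t
copy_literal_run(const char *format, char *out, size_t outsize)
{
    const char *end = scan_to_pct(format);
    size_t      len = (size_t) (end - format);

    assert(out != NULL && outsize > 0);

    if (len >= outsize)
        len = outsize - 1;      /* keep room for the terminator */
    memcpy(out, format, len);
    out[len] = '\0';
    return len;
}
```

With a format such as "abc%d", copy_literal_run() copies the three literal characters in one step and leaves the conversion spec for the caller, so the buffer-space check happens once per run rather than once per character.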
old glibc (RHEL6) vs snprintf.c HEAD:

Test case: %2$.*3$f %1$d
snprintf time = 768.907 ms total, 0.000768907 ms per iteration
pg_snprintf time = 1059.85 ms total, 0.00105985 ms per iteration
ratio = 1.378

Test case: %.*g
snprintf time = 782.535 ms total, 0.000782535 ms per iteration
pg_snprintf time = 1061.71 ms total, 0.00106171 ms per iteration
ratio = 1.357

Test case: %d %d
snprintf time = 162.616 ms total, 0.000162616 ms per iteration
pg_snprintf time = 135.103 ms total, 0.000135103 ms per iteration
ratio = 0.831

Test case: %10d
snprintf time = 143.027 ms total, 0.000143027 ms per iteration
pg_snprintf time = 123.307 ms total, 0.000123307 ms per iteration
ratio = 0.862

Test case: %s
snprintf time = 306.78 ms total, 0.00030678 ms per iteration
pg_snprintf time = 89.8803 ms total, 8.98803e-05 ms per iteration
ratio = 0.293

Test case: %sx
snprintf time = 310.329 ms total, 0.000310329 ms per iteration
pg_snprintf time = 97.4798 ms total, 9.74798e-05 ms per iteration
ratio = 0.314

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 155.288 ms total, 0.000155288 ms per iteration
pg_snprintf time = 1393.57 ms total, 0.00139357 ms per iteration
ratio = 8.974

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 238.737 ms total, 0.000238737 ms per iteration
pg_snprintf time = 1407 ms total, 0.001407 ms per iteration
ratio = 5.894
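
For reading these listings: the ratio line is consistent with pg_snprintf time divided by snprintf time, so a value below 1 means snprintf.c was the faster of the two. The per-iteration figures are consistent with 1,000,000 iterations per test, although that count is inferred from the numbers and is not stated in the output. A minimal sketch of the arithmetic, using hypothetical helper names that are not taken from the benchmark program:

```c
#include <assert.h>

/*
 * Hypothetical helper: ratio as it appears to be reported above, i.e.
 * pg_snprintf's total time relative to the platform snprintf's.
 */
static double
timing_ratio(double pg_snprintf_ms, double snprintf_ms)
{
    assert(snprintf_ms > 0.0);
    return pg_snprintf_ms / snprintf_ms;
}

/*
 * Hypothetical helper: per-iteration time in milliseconds.  The iteration
 * count is an assumption supplied by the caller.
 */
static double
ms_per_iteration(double total_ms, long iterations)
{
    assert(iterations > 0);
    return total_ms / (double) iterations;
}
```

For the first test case above, timing_ratio(1059.85, 768.907) comes out at about 1.378, matching the reported ratio.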


old glibc (RHEL6) vs snprintf.c v5 patch:

Test case: %2$.*3$f %1$d
snprintf time = 768.425 ms total, 0.000768425 ms per iteration
pg_snprintf time = 895.599 ms total, 0.000895599 ms per iteration
ratio = 1.165

Test case: %.*g
snprintf time = 781.704 ms total, 0.000781704 ms per iteration
pg_snprintf time = 854.84 ms total, 0.00085484 ms per iteration
ratio = 1.094

Test case: %d %d
snprintf time = 163.596 ms total, 0.000163596 ms per iteration
pg_snprintf time = 134.837 ms total, 0.000134837 ms per iteration
ratio = 0.824

Test case: %10d
snprintf time = 143.684 ms total, 0.000143684 ms per iteration
pg_snprintf time = 91.3437 ms total, 9.13437e-05 ms per iteration
ratio = 0.636

Test case: %s
snprintf time = 307.769 ms total, 0.000307769 ms per iteration
pg_snprintf time = 70.5951 ms total, 7.05951e-05 ms per iteration
ratio = 0.229

Test case: %sx
snprintf time = 309.641 ms total, 0.000309641 ms per iteration
pg_snprintf time = 85.2016 ms total, 8.52016e-05 ms per iteration
ratio = 0.275

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 154.108 ms total, 0.000154108 ms per iteration
pg_snprintf time = 126.425 ms total, 0.000126425 ms per iteration
ratio = 0.820

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 239.205 ms total, 0.000239205 ms per iteration
pg_snprintf time = 220.167 ms total, 0.000220167 ms per iteration
ratio = 0.920


new glibc (Fedora 28) vs snprintf.c HEAD:

Test case: %2$.*3$f %1$d
snprintf time = 486.994 ms total, 0.000486994 ms per iteration
pg_snprintf time = 579.769 ms total, 0.000579769 ms per iteration
ratio = 1.191

Test case: %.*g
snprintf time = 472.281 ms total, 0.000472281 ms per iteration
pg_snprintf time = 640.499 ms total, 0.000640499 ms per iteration
ratio = 1.356

Test case: %d %d
snprintf time = 82.5786 ms total, 8.25786e-05 ms per iteration
pg_snprintf time = 82.8582 ms total, 8.28582e-05 ms per iteration
ratio = 1.003

Test case: %10d
snprintf time = 76.3904 ms total, 7.63904e-05 ms per iteration
pg_snprintf time = 68.9414 ms total, 6.89414e-05 ms per iteration
ratio = 0.902

Test case: %s
snprintf time = 7.80277 ms total, 7.80277e-06 ms per iteration
pg_snprintf time = 38.8177 ms total, 3.88177e-05 ms per iteration
ratio = 4.975

Test case: %sx
snprintf time = 58.1852 ms total, 5.81852e-05 ms per iteration
pg_snprintf time = 40.532 ms total, 4.0532e-05 ms per iteration
ratio = 0.697

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 69.3546 ms total, 6.93546e-05 ms per iteration
pg_snprintf time = 578.326 ms total, 0.000578326 ms per iteration
ratio = 8.339

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 124.402 ms total, 0.000124402 ms per iteration
pg_snprintf time = 589.417 ms total, 0.000589417 ms per iteration
ratio = 4.738


new glibc (Fedora 28) vs snprintf.c v5 patch:

Test case: %2$.*3$f %1$d
snprintf time = 488.016 ms total, 0.000488016 ms per iteration
pg_snprintf time = 486.862 ms total, 0.000486862 ms per iteration
ratio = 0.998

Test case: %.*g
snprintf time = 472.963 ms total, 0.000472963 ms per iteration
pg_snprintf time = 514.987 ms total, 0.000514987 ms per iteration
ratio = 1.089

Test case: %d %d
snprintf time = 82.7157 ms total, 8.27157e-05 ms per iteration
pg_snprintf time = 80.7262 ms total, 8.07262e-05 ms per iteration
ratio = 0.976

Test case: %10d
snprintf time = 77.337 ms total, 7.7337e-05 ms per iteration
pg_snprintf time = 48.0275 ms total, 4.80275e-05 ms per iteration
ratio = 0.621

Test case: %s
snprintf time = 7.74283 ms total, 7.74283e-06 ms per iteration
pg_snprintf time = 25.7698 ms total, 2.57698e-05 ms per iteration
ratio = 3.328

Test case: %sx
snprintf time = 59.1255 ms total, 5.91255e-05 ms per iteration
pg_snprintf time = 30.5548 ms total, 3.05548e-05 ms per iteration
ratio = 0.517

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 68.0464 ms total, 6.80464e-05 ms per iteration
pg_snprintf time = 55.9024 ms total, 5.59024e-05 ms per iteration
ratio = 0.822

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 124.641 ms total, 0.000124641 ms per iteration
pg_snprintf time = 79.1252 ms total, 7.91252e-05 ms per iteration
ratio = 0.635


FreeBSD 11.0 vs snprintf.c HEAD:

Test case: %2$.*3$f %1$d
snprintf time = 592.785 ms total, 0.000592785 ms per iteration
pg_snprintf time = 604.65 ms total, 0.00060465 ms per iteration
ratio = 1.020

Test case: %.*g
snprintf time = 514.67 ms total, 0.00051467 ms per iteration
pg_snprintf time = 822.477 ms total, 0.000822477 ms per iteration
ratio = 1.598

Test case: %d %d
snprintf time = 182.617 ms total, 0.000182617 ms per iteration
pg_snprintf time = 81.1515 ms total, 8.11515e-05 ms per iteration
ratio = 0.444

Test case: %10d
snprintf time = 147.134 ms total, 0.000147134 ms per iteration
pg_snprintf time = 60.1539 ms total, 6.01539e-05 ms per iteration
ratio = 0.409

Test case: %s
snprintf time = 130.103 ms total, 0.000130103 ms per iteration
pg_snprintf time = 65.1186 ms total, 6.51186e-05 ms per iteration
ratio = 0.501

Test case: %sx
snprintf time = 161.099 ms total, 0.000161099 ms per iteration
pg_snprintf time = 67.597 ms total, 6.7597e-05 ms per iteration
ratio = 0.420

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 272.016 ms total, 0.000272016 ms per iteration
pg_snprintf time = 576.049 ms total, 0.000576049 ms per iteration
ratio = 2.118

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 377.863 ms total, 0.000377863 ms per iteration
pg_snprintf time = 591.675 ms total, 0.000591675 ms per iteration
ratio = 1.566


FreeBSD 11.0 vs snprintf.c v5 patch:

Test case: %2$.*3$f %1$d
snprintf time = 506.083 ms total, 0.000506083 ms per iteration
pg_snprintf time = 406.696 ms total, 0.000406696 ms per iteration
ratio = 0.804

Test case: %.*g
snprintf time = 516.361 ms total, 0.000516361 ms per iteration
pg_snprintf time = 583.85 ms total, 0.00058385 ms per iteration
ratio = 1.131

Test case: %d %d
snprintf time = 183.341 ms total, 0.000183341 ms per iteration
pg_snprintf time = 115.537 ms total, 0.000115537 ms per iteration
ratio = 0.630

Test case: %10d
snprintf time = 148.071 ms total, 0.000148071 ms per iteration
pg_snprintf time = 77.2526 ms total, 7.72526e-05 ms per iteration
ratio = 0.522

Test case: %s
snprintf time = 130.47 ms total, 0.00013047 ms per iteration
pg_snprintf time = 63.1665 ms total, 6.31665e-05 ms per iteration
ratio = 0.484

Test case: %sx
snprintf time = 161.038 ms total, 0.000161038 ms per iteration
pg_snprintf time = 66.365 ms total, 6.6365e-05 ms per iteration
ratio = 0.412

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 275.3 ms total, 0.0002753 ms per iteration
pg_snprintf time = 169.824 ms total, 0.000169824 ms per iteration
ratio = 0.617

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 377.169 ms total, 0.000377169 ms per iteration
pg_snprintf time = 324.277 ms total, 0.000324277 ms per iteration
ratio = 0.860


OpenBSD 6.0 vs snprintf.c HEAD:

Test case: %2$.*3$f %1$d
snprintf time = 366.693 ms total, 0.000366693 ms per iteration
pg_snprintf time = 575.859 ms total, 0.000575859 ms per iteration
ratio = 1.570

Test case: %.*g
snprintf time = 487.42 ms total, 0.00048742 ms per iteration
pg_snprintf time = 783.286 ms total, 0.000783286 ms per iteration
ratio = 1.607

Test case: %d %d
snprintf time = 175.423 ms total, 0.000175423 ms per iteration
pg_snprintf time = 95.583 ms total, 9.5583e-05 ms per iteration
ratio = 0.545

Test case: %10d
snprintf time = 145.509 ms total, 0.000145509 ms per iteration
pg_snprintf time = 80.7363 ms total, 8.07363e-05 ms per iteration
ratio = 0.555

Test case: %s
snprintf time = 160.49 ms total, 0.00016049 ms per iteration
pg_snprintf time = 69.3179 ms total, 6.93179e-05 ms per iteration
ratio = 0.432

Test case: %sx
snprintf time = 179.017 ms total, 0.000179017 ms per iteration
pg_snprintf time = 72.7243 ms total, 7.27243e-05 ms per iteration
ratio = 0.406

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 893.578 ms total, 0.000893578 ms per iteration
pg_snprintf time = 721.957 ms total, 0.000721957 ms per iteration
ratio = 0.808

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 1696.39 ms total, 0.00169639 ms per iteration
pg_snprintf time = 731.144 ms total, 0.000731144 ms per iteration
ratio = 0.431


OpenBSD 6.0 vs snprintf.c v5 patch:

Test case: %2$.*3$f %1$d
snprintf time = 368.158 ms total, 0.000368158 ms per iteration
pg_snprintf time = 408.526 ms total, 0.000408526 ms per iteration
ratio = 1.110

Test case: %.*g
snprintf time = 486.741 ms total, 0.000486741 ms per iteration
pg_snprintf time = 548.396 ms total, 0.000548396 ms per iteration
ratio = 1.127

Test case: %d %d
snprintf time = 175.101 ms total, 0.000175101 ms per iteration
pg_snprintf time = 123.1 ms total, 0.0001231 ms per iteration
ratio = 0.703

Test case: %10d
snprintf time = 145.661 ms total, 0.000145661 ms per iteration
pg_snprintf time = 82.1964 ms total, 8.21964e-05 ms per iteration
ratio = 0.564

Test case: %s
snprintf time = 160.213 ms total, 0.000160213 ms per iteration
pg_snprintf time = 62.7285 ms total, 6.27285e-05 ms per iteration
ratio = 0.392

Test case: %sx
snprintf time = 178.462 ms total, 0.000178462 ms per iteration
pg_snprintf time = 66.0236 ms total, 6.60236e-05 ms per iteration
ratio = 0.370

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 892.687 ms total, 0.000892687 ms per iteration
pg_snprintf time = 183.339 ms total, 0.000183339 ms per iteration
ratio = 0.205

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 1672.81 ms total, 0.00167281 ms per iteration
pg_snprintf time = 218.761 ms total, 0.000218761 ms per iteration
ratio = 0.131


macOS Mojave vs snprintf.c HEAD:

Test case: %2$.*3$f %1$d
snprintf time = 215.462 ms total, 0.000215462 ms per iteration
pg_snprintf time = 301.006 ms total, 0.000301006 ms per iteration
ratio = 1.397

Test case: %.*g
snprintf time = 329.865 ms total, 0.000329865 ms per iteration
pg_snprintf time = 501.367 ms total, 0.000501367 ms per iteration
ratio = 1.520

Test case: %d %d
snprintf time = 83.8561 ms total, 8.38561e-05 ms per iteration
pg_snprintf time = 65.7689 ms total, 6.57689e-05 ms per iteration
ratio = 0.784

Test case: %10d
snprintf time = 65.5346 ms total, 6.55346e-05 ms per iteration
pg_snprintf time = 59.9587 ms total, 5.99587e-05 ms per iteration
ratio = 0.915

Test case: %s
snprintf time = 67.0085 ms total, 6.70085e-05 ms per iteration
pg_snprintf time = 32.315 ms total, 3.2315e-05 ms per iteration
ratio = 0.482

Test case: %sx
snprintf time = 78.1157 ms total, 7.81157e-05 ms per iteration
pg_snprintf time = 32.7136 ms total, 3.27136e-05 ms per iteration
ratio = 0.419

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 172.349 ms total, 0.000172349 ms per iteration
pg_snprintf time = 412.957 ms total, 0.000412957 ms per iteration
ratio = 2.396

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 303.392 ms total, 0.000303392 ms per iteration
pg_snprintf time = 427.246 ms total, 0.000427246 ms per iteration
ratio = 1.408


macOS Mojave vs snprintf.c v5 patch:

Test case: %2$.*3$f %1$d
snprintf time = 212.13 ms total, 0.00021213 ms per iteration
pg_snprintf time = 244.277 ms total, 0.000244277 ms per iteration
ratio = 1.152

Test case: %.*g
snprintf time = 337.137 ms total, 0.000337137 ms per iteration
pg_snprintf time = 371.386 ms total, 0.000371386 ms per iteration
ratio = 1.102

Test case: %d %d
snprintf time = 80.876 ms total, 8.0876e-05 ms per iteration
pg_snprintf time = 64.8164 ms total, 6.48164e-05 ms per iteration
ratio = 0.801

Test case: %10d
snprintf time = 60.1313 ms total, 6.01313e-05 ms per iteration
pg_snprintf time = 40.7794 ms total, 4.07794e-05 ms per iteration
ratio = 0.678

Test case: %s
snprintf time = 61.55 ms total, 6.155e-05 ms per iteration
pg_snprintf time = 24.1078 ms total, 2.41078e-05 ms per iteration
ratio = 0.392

Test case: %sx
snprintf time = 71.2324 ms total, 7.12324e-05 ms per iteration
pg_snprintf time = 35.2292 ms total, 3.52292e-05 ms per iteration
ratio = 0.495

Test case: %d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 149.266 ms total, 0.000149266 ms per iteration
pg_snprintf time = 101.321 ms total, 0.000101321 ms per iteration
ratio = 0.679

Test case: %1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890
snprintf time = 306.861 ms total, 0.000306861 ms per iteration
pg_snprintf time = 122.933 ms total, 0.000122933 ms per iteration
ratio = 0.401

#include "postgres_fe.h"

#include "portability/instr_time.h"

#include "snprintf.c"


#define TIMETEST(fmtstr,...) \
    printf("Test case: %s\n", fmtstr); \
    INSTR_TIME_SET_CURRENT(start); \
    for (i = 0; i < count; i++) \
        snprintf(buffer, sizeof(buffer), fmtstr, __VA_ARGS__); \
    INSTR_TIME_SET_CURRENT(stop); \
    INSTR_TIME_SUBTRACT(stop, start); \
    elapsed = INSTR_TIME_GET_MILLISEC(stop); \
    printf("snprintf time = %g ms total, %g ms per iteration\n", \
           elapsed, elapsed / count); \
    INSTR_TIME_SET_CURRENT(start); \
    for (i = 0; i < count; i++) \
        pg_snprintf(buffer, sizeof(buffer), fmtstr, __VA_ARGS__); \
    INSTR_TIME_SET_CURRENT(stop); \
    INSTR_TIME_SUBTRACT(stop, start); \
    elapsed2 = INSTR_TIME_GET_MILLISEC(stop); \
    printf("pg_snprintf time = %g ms total, %g ms per iteration\n", \
           elapsed2, elapsed2 / count); \
    printf("ratio = %.3f\n\n", elapsed2 / elapsed)

int
main(int argc, char **argv)
{
    int            count = 0;
    char        buffer[1000];
    instr_time    start;
    instr_time    stop;
    double        elapsed;
    double        elapsed2;
    int            i;

    if (argc > 1)
        count = atoi(argv[1]);
    if (count <= 0)
        count = 1000000;

    TIMETEST("%2$.*3$f %1$d", 42, 123.456, 2);

    TIMETEST("%.*g", 15, 123.456);

    TIMETEST("%d %d", 15, 16);

    TIMETEST("%10d", 15);

    TIMETEST("%s",

"0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");

    TIMETEST("%sx",

"0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");

    TIMETEST("%d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
             42);

    TIMETEST("%1$d
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
             42);

    return 0;
}

Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I've looked through the patch. Looks good to me. Some minor notes:

[ didn't see this till after sending my previous ]

> - How about adding our own strchrnul for the case where we don't
>   HAVE_STRCHRNUL?  It's possible that other platforms have something
>   similar, and the code would look more readable that way.

Sure, we could just make a "static inline strchrnul()" for use when
!HAVE_STRCHRNUL.  No objection.
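
A fallback of the kind discussed here could look roughly like the
following.  This is only a sketch: the name my_strchrnul is made up for
illustration (so that it cannot clash with a libc-provided strchrnul),
and the HAVE_STRCHRNUL guard that would select it is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical fallback for platforms without strchrnul().
 * Returns a pointer to the first occurrence of c in s, or to the
 * terminating NUL byte if c does not occur at all.
 */
static inline const char *
my_strchrnul(const char *s, int c)
{
    assert(s != NULL);
    while (*s != '\0' && *s != (char) c)
        s++;
    return s;
}
```

With a helper like this, the distance from the current format position
to the next '%' (or to the end of the string) is simply
my_strchrnul(format, '%') - format.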

> - I know it's not new, but is it actually correct to use va_arg(args, int64)
>   for ATYPE_LONGLONG?

Well, the problem with just doing s/int64/long long/g is that the
code would then fail on compilers without a "long long" type.
We could ifdef our way around that, but I don't think the code would
end up prettier.

Given that we only ever use "ll" modifiers via INT64_FORMAT, and that
that'll only be set to "ll" if int64 is indeed "long long", those code
paths should be dead code in any situation where the type pun is wrong.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-10-02 17:54:31 -0400, Tom Lane wrote:
> Here's a version of this patch rebased over commit 625b38ea0.
> 
> That commit's fix for the possibly-expensive memset means that we need
> to reconsider performance numbers for this patch.  I re-ran my previous
> tests, and it's still looking like this is a substantial win, as it makes
> snprintf.c faster than the native snprintf for most non-float cases.
> We're still stuck at something like 10% penalty for float cases.

Cool.  Let's get that in...


> While there might be value in implementing our own float printing code,
> I have a pretty hard time getting excited about the cost/benefit ratio
> of that.  I think that what we probably really ought to do here is hack
> float4out/float8out to bypass the extra overhead, as in the 0002 patch
> below.

I'm thinking we should do a bit more than just that hack. I'm thinking
of something (barely tested) like

int
pg_double_to_string(char *buf, size_t bufsize, char tp, int precision, double val)
{
    char        fmt[8];

#ifdef HAVE_STRFROMD

    if (precision != -1)
    {
        fmt[0] = '%';
        fmt[1] = '.';
        fmt[2] = '0' + precision / 10;
        fmt[3] = '0' + precision % 10;
        fmt[4] = tp;
        fmt[5] = '\0';
    }
    else
    {
        fmt[0] = '%';
        fmt[1] = tp;
        fmt[2] = '\0';
    }

    return strfromd(buf, bufsize, fmt, val);
#else

    if (precision != -1)
    {
        fmt[0] = '%';
        fmt[1] = '.';
        fmt[2] = '*';
        fmt[3] = tp;
        fmt[4] = '\0';
    }
    else
    {
        fmt[0] = '%';
        fmt[1] = tp;
        fmt[2] = '\0';
    }

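#undef snprintf
    /* pass the precision argument only when the format contains ".*" */
    if (precision != -1)
        return snprintf(buf, bufsize, fmt, precision, val);
    return snprintf(buf, bufsize, fmt, val);
#define snprintf pg_snprintf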
#endif
}

and putting that in string.h or such.

Then we'd likely be faster both when going through pg_sprintf etc when
strfromd is available, and by using it directly in float8out etc, we'd
be at least as fast as before.

I can clean that up, just not tonight.


FWIW, I think there's still a significant argument to be made that we
should work on our floating point IO performance. Both on the input and
output side. It's a significant practical problem. But both a fix like
you describe, and my proposal, should bring us to at least the previous
level of performance for the hot paths. So that'd then just be an
independent consideration.


Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-02 17:54:31 -0400, Tom Lane wrote:
>> Here's a version of this patch rebased over commit 625b38ea0.

> Cool.  Let's get that in...

Cool, I'll push it shortly.

>> While there might be value in implementing our own float printing code,
>> I have a pretty hard time getting excited about the cost/benefit ratio
>> of that.  I think that what we probably really ought to do here is hack
>> float4out/float8out to bypass the extra overhead, as in the 0002 patch
>> below.

> I'm thinking we should do a bit more than just that hack. I'm thinking
> of something (barely tested) like

Meh.  The trouble with that is that it relies on the platform's snprintf,
not sprintf, and that brings us right back into a world of portability
hurt.  I don't feel that the move to C99 gets us out of worrying about
noncompliant snprintfs --- we're only requiring a C99 *compiler*, not
libc.  See buildfarm member gharial for a counterexample.

I'm happy to look into whether using strfromd when available buys us
anything over using sprintf.  I'm not entirely convinced that it will,
because of the need to ASCII-ize and de-ASCII-ize the precision, but
it's worth checking.
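
To make the "ASCII-ize the precision" step concrete, here is a sketch of
building the restricted format string ("%", optional ".precision", one
conversion letter) that strfromd takes.  The helper names are
hypothetical, precisions above 99 are simply not handled, and snprintf
stands in for the actual conversion call so that the example does not
depend on strfromd being available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "%.<precision><tp>" into fmt, or just
 * "%<tp>" when precision is negative.  Only 0..99 is supported here.
 */
static void
build_float_format(char fmt[8], char tp, int precision)
{
    char       *p = fmt;

    assert(precision < 100);
    *p++ = '%';
    if (precision >= 0)
    {
        *p++ = '.';
        if (precision >= 10)
            *p++ = (char) ('0' + precision / 10);   /* tens digit */
        *p++ = (char) ('0' + precision % 10);       /* units digit */
    }
    *p++ = tp;
    *p = '\0';
}

/* Stand-in for the real conversion call; snprintf accepts the same format. */
static int
format_double(char *buf, size_t bufsize, char tp, int precision, double val)
{
    char        fmt[8];

    build_float_format(fmt, tp, precision);
    return snprintf(buf, bufsize, fmt, val);
}
```

The library then has to parse those digits back into an integer, which
is the round trip whose cost is in question.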

> FWIW, I think there's still a significant argument to be made that we
> should work on our floating point IO performance. Both on the input and
> output side. It's a significant practical problem. But both a fix like
> you describe, and my proposal, should bring us to at least the previous
> level of performance for the hot paths. So that'd then just be an
> independent consideration.

Well, an independent project anyway.  I concur that it would have value;
but whether it's worth the effort, and the possible behavioral changes,
is not very clear to me.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-03 08:20:14 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> >> While there might be value in implementing our own float printing code,
> >> I have a pretty hard time getting excited about the cost/benefit ratio
> >> of that.  I think that what we probably really ought to do here is hack
> >> float4out/float8out to bypass the extra overhead, as in the 0002 patch
> >> below.
> 
> > I'm thinking we should do a bit more than just that hack. I'm thinking
> > of something (barely tested) like
> 
> Meh.  The trouble with that is that it relies on the platform's snprintf,
> not sprintf, and that brings us right back into a world of portability
> hurt.  I don't feel that the move to C99 gets us out of worrying about
> noncompliant snprintfs --- we're only requiring a C99 *compiler*, not
> libc.  See buildfarm member gharial for a counterexample.

Oh, we could just use sprintf() and tell strfromd the buffer is large
enough. I only used snprintf because it seemed more symmetric, and
because I was at most 1/3 awake.


> I'm happy to look into whether using strfromd when available buys us
> anything over using sprintf.  I'm not entirely convinced that it will,
> because of the need to ASCII-ize and de-ASCII-ize the precision, but
> it's worth checking.

It's definitely faster.  It's not a full-blown format parser, so I guess
the cost of the conversion isn't too bad:
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strfrom-skeleton.c;hb=HEAD#l68

CREATE TABLE somefloats(id serial, data1 float8, data2 float8, data3 float8);
INSERT INTO somefloats(data1, data2, data3) SELECT random(), random(), random() FROM generate_series(1, 10000000);
VACUUM FREEZE somefloats;

I'm comparing the times of:
COPY somefloats TO '/dev/null';

master (including your commit):
16177.202 ms

snprintf using sprintf via pg_double_to_string:
16195.787 ms

snprintf using strfromd via pg_double_to_string:
14856.974 ms

float8out using sprintf via pg_double_to_string:
16176.169 ms

float8out using strfromd via pg_double_to_string:
13532.698 ms



FWIW, it seems that using a local buffer and then pstrdup'ing that in
float8out_internal is a bit faster, and would probably save a bit of
memory on average:

float8out using sprintf via pg_double_to_string, pstrdup:
15370.774 ms

float8out using strfromd via pg_double_to_string, pstrdup:
13498.331 ms
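
The two allocation strategies being compared can be sketched in plain C
as follows.  malloc and memcpy stand in for palloc and pstrdup,
MAXDOUBLEWIDTH and both function names are assumptions made for the
example, and nothing here reproduces the measured numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXDOUBLEWIDTH 128      /* assumed worst-case size for this sketch */

/* Variant 1: allocate the worst-case size up front and format into it. */
static char *
double_out_direct(double val)
{
    char       *result = malloc(MAXDOUBLEWIDTH);

    assert(result != NULL);
    snprintf(result, MAXDOUBLEWIDTH, "%.15g", val);
    return result;
}

/* Variant 2: format into a stack buffer, then copy only the bytes needed. */
static char *
double_out_copy(double val)
{
    char        local[MAXDOUBLEWIDTH];
    size_t      len;
    char       *result;

    snprintf(local, sizeof(local), "%.15g", val);
    len = strlen(local);        /* the extra pass a pstrdup-style copy implies */
    result = malloc(len + 1);
    assert(result != NULL);
    memcpy(result, local, len + 1);
    return result;
}
```

Both variants return the same string; they differ only in how many
bytes are requested from the allocator and in the extra strlen plus
copy that the second one performs.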


Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
I wrote:
> ... However, I did add recent glibc (Fedora 28)
> to the mix, and I was interested to discover that they seem to have
> added a fast-path for format strings that are exactly "%s", just as
> NetBSD did.  I wonder if we should reconsider our position on doing
> that.  It'd be a simple enough addition...

I experimented with adding an initial check for "format is exactly %s"
at the top of dopr(), and couldn't get excited about that.  Instrumenting
things showed that the optimization fired in only 1.8% of the calls
during a run of our core regression tests.  Now, that might not count
as a really representative workload, but it doesn't make me think that
the case is worth optimizing for us.
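
For reference, the whole-format check described above amounts to
something like this.  It is a sketch, not the code that was actually
tried; format_is_exactly_s is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true only when the entire format string is "%s".
 * The && chain stops at the first mismatch, so it never reads past the
 * terminating NUL of a shorter string.
 */
static bool
format_is_exactly_s(const char *format)
{
    assert(format != NULL);
    return format[0] == '%' && format[1] == 's' && format[2] == '\0';
}
```

A caller would test this once at the top of the formatting routine and,
on success, copy the single string argument straight to the output.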

But then it occurred to me that there's more than one way to skin this
cat.  We could, for an even cheaper extra test, detect that any one
format specifier is just "%s", and use the same kind of fast-path
within the loop.  With the same sort of instrumentation, I found that
a full 45% of the format specs executed in the core regression tests
are just %s.  That makes me think that a patch along the lines of the
attached is a good win for our use-cases.  Comparing to Fedora 28's
glibc, this gets us to

Test case: %s
snprintf time = 8.83615 ms total, 8.83615e-06 ms per iteration
pg_snprintf time = 23.9372 ms total, 2.39372e-05 ms per iteration
ratio = 2.709

Test case: %sx
snprintf time = 59.4481 ms total, 5.94481e-05 ms per iteration
pg_snprintf time = 29.8983 ms total, 2.98983e-05 ms per iteration
ratio = 0.503

versus what we have as of this morning's commit:

Test case: %s
snprintf time = 7.7427 ms total, 7.7427e-06 ms per iteration
pg_snprintf time = 26.2439 ms total, 2.62439e-05 ms per iteration
ratio = 3.390

Test case: %sx
snprintf time = 61.4452 ms total, 6.14452e-05 ms per iteration
pg_snprintf time = 32.7516 ms total, 3.27516e-05 ms per iteration
ratio = 0.533

The penalty for non-%s cases seems to be a percent or so, although
it's barely above the noise floor in my tests.
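
The kind of instrumentation mentioned above can be approximated outside
the server by counting conversion specs in a format string.  The sketch
below is an assumption about how one might do that; count_specs and its
simplified notion of flags and length modifiers are not taken from
snprintf.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical counter: *total receives the number of conversion specs
 * in format, *plain_s the number of those that are exactly "%s".
 */
static void
count_specs(const char *format, int *total, int *plain_s)
{
    assert(format != NULL && total != NULL && plain_s != NULL);
    *total = *plain_s = 0;

    while (*format != '\0')
    {
        if (*format++ != '%')
            continue;           /* ordinary data character */
        if (*format == '\0')
            break;              /* dangling '%' at end of string */
        if (*format == '%')
        {
            format++;           /* "%%" is a literal percent sign */
            continue;
        }
        (*total)++;
        if (*format == 's')
            (*plain_s)++;       /* the case a per-spec fast path catches */
        /* skip flags, width, precision, %n$ and length modifiers */
        format += strspn(format, "-+ #0123456789.*$hlqjztL");
        if (*format != '\0')
            format++;           /* consume the conversion character */
    }
}
```

Summing these two counters over every format string seen during a test
run gives the fraction of specs that such a fast path would handle.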

            regards, tom lane

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index cad7345..b9b6add 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** dopr(PrintfTarget *target, const char *f
*** 431,436 ****
--- 431,449 ----

          /* Process conversion spec starting at *format */
          format++;
+
+         /* Fast path for conversion spec that is exactly %s */
+         if (*format == 's')
+         {
+             format++;
+             strvalue = va_arg(args, char *);
+             Assert(strvalue != NULL);
+             dostr(strvalue, strlen(strvalue), target);
+             if (target->failed)
+                 break;
+             continue;
+         }
+
          fieldwidth = precision = zpad = leftjust = forcesign = 0;
          longflag = longlongflag = pointflag = 0;
          fmtpos = accum = 0;

Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> FWIW, it seems that using a local buffer and then pstrdup'ing that in
> float8out_internal is a bit faster, and would probably save a bit of
> memory on average:
> float8out using sprintf via pg_double_to_string, pstrdup:
> 15370.774
> float8out using strfromd via pg_double_to_string, pstrdup:
> 13498.331

[ scratches head ... ]  How would that work?  Seems like it necessarily
adds a strlen() call to whatever we'd be doing otherwise.  palloc isn't
going to be any faster just from asking it for slightly fewer bytes.
I think there might be something wrong with your test scenario ...
or there's more noise in the numbers than you thought.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-03 12:07:32 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > FWIW, it seems that using a local buffer and then pstrdup'ing that in
> > float8out_internal is a bit faster, and would probably save a bit of
> > memory on average:
> > float8out using sprintf via pg_double_to_string, pstrdup:
> > 15370.774
> > float8out using strfromd via pg_double_to_string, pstrdup:
> > 13498.331
> 
> [ scratches head ... ]  How would that work?  Seems like it necessarily
> adds a strlen() call to whatever we'd be doing otherwise.  palloc isn't
> going to be any faster just from asking it for slightly fewer bytes.
> I think there might be something wrong with your test scenario ...
> or there's more noise in the numbers than you thought.

I guess the difference is that we're more likely to find reusable chunks
in aset.c and/or need fewer OS allocations.  As the memory is going to
be touched again very shortly afterwards, the cache effects probably are
negligible.

The strlen definitely shows up in profiles, it just seems to save at
least as much as it costs.

Doesn't strike me as THAT odd?

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-03 12:07:32 -0400, Tom Lane wrote:
>> [ scratches head ... ]  How would that work?  Seems like it necessarily
>> adds a strlen() call to whatever we'd be doing otherwise.  palloc isn't
>> going to be any faster just from asking it for slightly fewer bytes.
>> I think there might be something wrong with your test scenario ...
>> or there's more noise in the numbers than you thought.

> I guess the difference is that we're more likely to find reusable chunks
> in aset.c and/or need fewer OS allocations.  As the memory is going to
> be touched again very shortly afterwards, the cache effects probably are
> negligible.

> The strlen definitely shows up in profiles, it just seems to save at
> least as much as it costs.

> Doesn't strike me as THAT odd?

What it strikes me as is excessively dependent on one particular test
scenario.  I don't mind optimizations that are tradeoffs between
well-understood costs, but this smells like handwaving that's going to
lose as much or more often than winning, once it hits the real world.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-03 11:59:27 -0400, Tom Lane wrote:
> I wrote:
> > ... However, I did add recent glibc (Fedora 28)
> > to the mix, and I was interested to discover that they seem to have
> > added a fast-path for format strings that are exactly "%s", just as
> > NetBSD did.  I wonder if we should reconsider our position on doing
> > that.  It'd be a simple enough addition...
> 
> I experimented with adding an initial check for "format is exactly %s"
> at the top of dopr(), and couldn't get excited about that.  Instrumenting
> things showed that the optimization fired in only 1.8% of the calls
> during a run of our core regression tests.  Now, that might not count
> as a really representative workload, but it doesn't make me think that
> the case is worth optimizing for us.

Seems right.  I also have a hard time believing that any of those "%s"
printfs are performance critical - we'd hopefully just have avoided the
sprintf in that case.


> But then it occurred to me that there's more than one way to skin this
> cat.  We could, for an even cheaper extra test, detect that any one
> format specifier is just "%s", and use the same kind of fast-path
> within the loop.  With the same sort of instrumentation, I found that
> a full 45% of the format specs executed in the core regression tests
> are just %s.  That makes me think that a patch along the lines of the
> attached is a good win for our use-cases.  Comparing to Fedora 28's
> glibc, this gets us to

Hm, especially if we special case the float->string conversions directly
at the hot callsites, that seems reasonable.  I kinda wish we could just
easily move the format string processing to compile-time, but given
translatability that won't be widely possible even if it were otherwise
feasible.

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-03 12:22:13 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-10-03 12:07:32 -0400, Tom Lane wrote:
> >> [ scratches head ... ]  How would that work?  Seems like it necessarily
> >> adds a strlen() call to whatever we'd be doing otherwise.  palloc isn't
> >> going to be any faster just from asking it for slightly fewer bytes.
> >> I think there might be something wrong with your test scenario ...
> >> or there's more noise in the numbers than you thought.
> 
> > I guess the difference is that we're more likely to find reusable chunks
> > in aset.c and/or need fewer OS allocations.  As the memory is going to
> > be touched again very shortly afterwards, the cache effects probably are
> > negligible.
> 
> > The strlen definitely shows up in profiles, it just seems to save at
> > least as much as it costs.
> 
> > Doesn't strike me as THAT odd?
> 
> What it strikes me as is excessively dependent on one particular test
> scenario.  I don't mind optimizations that are tradeoffs between
> well-understood costs, but this smells like handwaving that's going to
> lose as much or more often than winning, once it hits the real world.

I'm not particularly wedded to doing the allocation differently - I was
just mildly wondering if the increased size of the allocations could be
problematic. And that led me to testing that. And reporting it.  I
don't think the real-world test differences are that large in this
specific case, but whatever.

It seems the general "use strfromd if available" approach is generally
useful, even if we need to serialize the precision.  Putting it into an
inline appears to be helpful, avoids some of the otherwise precision
related branches.  Do you have any feelings about which header to put
the code in?  I used common/string.h so far.

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-03 11:59:27 -0400, Tom Lane wrote:
>> I experimented with adding an initial check for "format is exactly %s"
>> at the top of dopr(), and couldn't get excited about that.  Instrumenting
>> things showed that the optimization fired in only 1.8% of the calls
>> during a run of our core regression tests.  Now, that might not count
>> as a really representative workload, but it doesn't make me think that
>> the case is worth optimizing for us.

> Seems right.  I also have a hard time believing that any of those "%s"
> printfs are performance critical - we'd hopefully just have avoided the
> sprintf in that case.

Yup, that's probably a good chunk of the reason why there aren't very
many.  But we *do* use %s a lot to assemble multiple strings or combine
them with fixed text, which is why the other form of the optimization
is useful.
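
A minimal sketch of the per-specifier fast path, for illustration only. mini_format is a hypothetical helper, not code from snprintf.c or from the attached patch. It understands only literal text, "%%" and plain "%s"; anything else is copied through unchanged, where a real implementation would fall into its general conversion parser.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/*
 * mini_format: hypothetical illustration, not the snprintf.c code.
 * Handles literal text, "%%" and plain "%s" only.  Output is truncated to
 * fit dst; the return value is the number of bytes stored, not counting
 * the terminating NUL.
 */
size_t
mini_format(char *dst, size_t dstlen, const char *fmt, ...)
{
    va_list     args;
    size_t      used = 0;

    assert(dst != NULL && dstlen > 0);
    va_start(args, fmt);
    while (*fmt != '\0')
    {
        const char *chunk;
        size_t      len;

        if (fmt[0] == '%' && fmt[1] == 's')
        {
            /* Fast path: no flags, width or precision to interpret. */
            chunk = va_arg(args, char *);
            len = strlen(chunk);
            fmt += 2;
        }
        else if (fmt[0] == '%' && fmt[1] == '%')
        {
            chunk = "%";
            len = 1;
            fmt += 2;
        }
        else
        {
            /*
             * Literal run up to the next '%'.  A real implementation would
             * send any other conversion spec to its general parser; this
             * sketch just copies it through.
             */
            const char *next = strchr(fmt + 1, '%');

            chunk = fmt;
            len = (next != NULL) ? (size_t) (next - fmt) : strlen(fmt);
            fmt += len;
        }

        if (len > dstlen - 1 - used)
            len = dstlen - 1 - used;    /* truncate to the space left */
        memcpy(dst + used, chunk, len);
        used += len;
    }
    va_end(args);
    dst[used] = '\0';
    return used;
}
```

The point of the fast path is that a bare "%s" needs one strlen() and one bulk copy, with no flag, width or precision handling. That is the common case when several strings are assembled with fixed text between them.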

>> But then it occurred to me that there's more than one way to skin this
>> cat.  We could, for an even cheaper extra test, detect that any one
>> format specifier is just "%s", and use the same kind of fast-path
>> within the loop.  With the same sort of instrumentation, I found that
>> a full 45% of the format specs executed in the core regression tests
>> are just %s.  That makes me think that a patch along the lines of the
>> attached is a good win for our use-cases.  Comparing to Fedora 28's
>> glibc, this gets us to

> Hm, especially if we special case the float->string conversions directly
> at the hot callsites, that seems reasonable.  I kinda wish we could just
> easily move the format string processing to compile-time, but given
> translatability that won't be widely possible even if it were otherwise
> feasible.

Yeah, there's a mighty big pile of infrastructure that depends on the
way *printf works.  I agree that one way or another we're going to be
special-casing float8out and float4out.

BTW, I poked around in the related glibc sources the other day, and
it seemed like they are doing some sort of quasi-compilation of format
strings.  I couldn't figure out how they made it pay, though --- without
some way to avoid re-compiling the same format string over and over,
seems like it couldn't net out as a win.  But if they are avoiding
that, I didn't find where.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> It seems the general "use strfromd if available" approach is generally
> useful, even if we need to serialize the precision.

Agreed.

> Putting it into an
> inline appears to be helpful, avoids some of the otherwise precision
> related branches.  Do you have any feelings about which header to put
> the code in?  I used common/string.h so far.

I do not think it should be in a header, for two reasons:

(1) The need to use sprintf for portability means that we need very
tight constraints on the precision spec *and* the buffer size *and*
the format type (%f pretty much destroys certainty about how long the
output string is).  So this isn't going to be general purpose code.
I think just writing it into float[48]out is sufficient.

(2) It's already the case that most code trying to emit floats ought
to go through float[48]out, in order to have standardized treatment
of Inf and NaN.  Providing some other API in a common header would
just create a temptation to break that policy.

Now, if we did write our own float output code then we would standardize
Inf/NaN outputs inside that, and both of these issues would go away ...
but I think what we'd do is provide something strfromd-like as an
alternate API for that code, so we still won't need a wrapper.
And anyway it doesn't sound like either of us care to jump that hurdle
right now.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-03 12:54:52 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It seems the general "use strfromd if available" approach is generally
> > useful, even if we need to serialize the precision.
> 
> Agreed.
> 
> > Putting it into an
> > inline appears to be helpful, avoids some of the otherwise precision
> > related branches.  Do you have any feelings about which header to put
> > the code in?  I used common/string.h so far.
> 
> I do not think it should be in a header, for two reasons:
> 
> (1) The need to use sprintf for portability means that we need very
> tight constraints on the precision spec *and* the buffer size *and*
> the format type (%f pretty much destroys certainty about how long the
> output string is).  So this isn't going to be general purpose code.
> I think just writing it into float[48]out is sufficient.

Well, the numbers suggest it's also useful to do so from snprintf - it's
not that rare that we output floating point numbers from semi
performance critical code, even leaving aside float[48]out.  So I'm not
convinced that we shouldn't do this from within snprintf.c too. Now we
could open-code it twice, but I'm not sure I see the point.

If we just define the API as having to guarantee there's enough space
for the output format, I think it'll work well enough for now?
snprintf.c already assumes everything floating point can be output in
1024 chars, no?

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> - I know it's not new, but is it actually correct to use va_arg(args, int64)
>> for ATYPE_LONGLONG?

> Well, the problem with just doing s/int64/long long/g is that the
> code would then fail on compilers without a "long long" type.
> We could ifdef our way around that, but I don't think the code would
> end up prettier.

I spent a bit more time thinking about that point.  My complaint about
lack of long long should be moot given that we're now requiring C99.
So the two cases we need to worry about are (1) long long exists and
is 64 bits, and (2) long long exists and is wider than 64 bits.  In
case (1) there's nothing actively wrong with the code as it stands.
In case (2), if we were to fix the problem by s/int64/long long/g,
the result would be that we'd be doing the arithmetic for all
integer-to-text conversions in 128 bits, which seems likely to be
pretty expensive.

So a "real" fix would probably require having separate versions of
fmtint for long and long long.  I'm not terribly excited about
going there.  I can see it happening some day when/if we need to
use 128-bit math more extensively than today, but I do not think
that day is close.  (Are there *any* platforms where "long long"
is 128 bits today?)

Having said that, maybe there's a case for changing the type spec
in only the va_arg() call, and leaving snprintf's related local
variables as int64.  (Is that what you actually meant?)  Then,
if long long really is different from int64, at least we have
predictable truncation behavior after fetching the value, rather
than undefined behavior while fetching it.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-03 12:54:52 -0400, Tom Lane wrote:
>> (1) The need to use sprintf for portability means that we need very
>> tight constraints on the precision spec *and* the buffer size *and*
>> the format type (%f pretty much destroys certainty about how long the
>> output string is).  So this isn't going to be general purpose code.
>> I think just writing it into float[48]out is sufficient.

> Well, the numbers suggest it's also useful to do so from snprintf - it's
> not that rare that we output floating point numbers from semi
> performance critical code, even leaving aside float[48]out.  So I'm not
> convinced that we shouldn't do this from within snprintf.c too. Now we
> could open-code it twice, but I'm not sure I see the point.

I do not see the point of messing with snprintf.c here.  I doubt that
strfromd is faster than the existing sprintf call (because the latter
can use ".*" instead of serializing and deserializing the precision).
Even if it is, I do not want to expose an attractive-nuisance API
in a header, and I think this would be exactly that.

> If we just define the API as having to guarantee there's enough space
> for the output format, I think it'll work well enough for now?

No, because that's a recipe for buffer-overflow bugs.  It's *hard*
to be sure the buffer is big enough, and easy to make breakable
assumptions.

> snprintf.c already assumes everything floating point can be output in
> 1024 chars, no?

Indeed, and it's got hacks like a forced limit to precision 350 in order
to make that safe.  I don't want to be repeating the reasoning in
fmtfloat() in a bunch of other places.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-03 13:31:09 -0400, Tom Lane wrote:
> I do not see the point of messing with snprintf.c here.  I doubt that
> strfromd is faster than the existing sprintf call (because the latter
> can use ".*" instead of serializing and deserializing the precision).

I'm confused, the numbers I posted clearly show that it's faster?

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-03 13:18:35 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> - I know it's not new, but is it actually correct to use va_arg(args, int64)
> >> for ATYPE_LONGLONG?
> 
> > Well, the problem with just doing s/int64/long long/g is that the
> > code would then fail on compilers without a "long long" type.
> > We could ifdef our way around that, but I don't think the code would
> > end up prettier.
> 
> I spent a bit more time thinking about that point.  My complaint about
> lack of long long should be moot given that we're now requiring C99.

True, I didn't think of that.


> So the two cases we need to worry about are (1) long long exists and
> is 64 bits, and (2) long long exists and is wider than 64 bits.  In
> case (1) there's nothing actively wrong with the code as it stands.
> In case (2), if we were to fix the problem by s/int64/long long/g,
> the result would be that we'd be doing the arithmetic for all
> integer-to-text conversions in 128 bits, which seems likely to be
> pretty expensive.

Yea, that seems quite undesirable.


> So a "real" fix would probably require having separate versions of
> fmtint for long and long long.  I'm not terribly excited about
> going there.  I can see it happening some day when/if we need to
> use 128-bit math more extensively than today, but I do not think
> that day is close.

Right, that seems a bit off.


> (Are there *any* platforms where "long long" is 128 bits today?)

Not that I'm aware of.


> Having said that, maybe there's a case for changing the type spec
> in only the va_arg() call, and leaving snprintf's related local
> variables as int64.  (Is that what you actually meant?)  Then,
> if long long really is different from int64, at least we have
> predictable truncation behavior after fetching the value, rather
> than undefined behavior while fetching it.

Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
about we simply add a static assert that long long isn't bigger than
int64?
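
The two options under discussion can be sketched side by side. This is an illustration under stated assumptions, not the snprintf.c code: fetch_longlong_arg is a hypothetical helper, int64_t stands in for PostgreSQL's int64, and the assertion uses the C11 keyword rather than any project macro.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/*
 * The static-assert option: fail the build if long long could hold values
 * that int64_t cannot.  Uses the C11 keyword; older compilers need a macro.
 */
_Static_assert(sizeof(long long) <= sizeof(int64_t),
               "long long is wider than int64_t");

/*
 * The va_arg option: name the argument's real type in va_arg(), then narrow
 * the fetched value.  "count" exists only because a variadic function needs
 * one named parameter.
 */
int64_t
fetch_longlong_arg(int count, ...)
{
    va_list     args;
    int64_t     value;

    assert(count == 1);
    va_start(args, count);
    value = (int64_t) va_arg(args, long long);
    va_end(args);
    return value;
}
```

Fetching with the caller's actual type keeps the va_arg() call well defined. If the types ever differed in width, the narrowing conversion that follows would be implementation-defined rather than undefined.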

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-03 13:31:09 -0400, Tom Lane wrote:
>> I do not see the point of messing with snprintf.c here.  I doubt that
>> strfromd is faster than the existing sprintf call (because the latter
>> can use ".*" instead of serializing and deserializing the precision).

> I'm confused, the numbers I posted clearly show that it's faster?

Those were in the context of whether float8out went through snprintf.c
or directly to strfromd, no?

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
On 2018-10-03 13:40:03 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-10-03 13:31:09 -0400, Tom Lane wrote:
> >> I do not see the point of messing with snprintf.c here.  I doubt that
> >> strfromd is faster than the existing sprintf call (because the latter
> >> can use ".*" instead of serializing and deserializing the precision).
>
> > I'm confused, the numbers I posted clearly show that it's faster?
>
> Those were in the context of whether float8out went through snprintf.c
> or directly to strfromd, no?

I measured both, changing float8out directly, and just adapting
snprintf.c:

> snprintf using sprintf via pg_double_to_string:
> 16195.787 ms
>
> snprintf using strfromd via pg_double_to_string:
> 14856.974 ms
>
> float8out using sprintf via pg_double_to_string:
> 16176.169 ms
>
> float8out using strfromd via pg_double_to_string:
> 13532.698 ms

So when using pg's snprintf() to print a single floating point number
with precision, we get nearly a 10% boost. The win unsurprisingly is
bigger when not going through snprintf.c.
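
For reference, the relative savings implied by the four timings quoted above can be worked out with a small helper. percent_saved is a hypothetical name, and the sketch assumes all four figures are in the same unit.

```c
#include <assert.h>

/* Relative time saved, in percent, going from before_ms to after_ms. */
double
percent_saved(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return 100.0 * (before_ms - after_ms) / before_ms;
}
```

With the quoted figures, percent_saved(16195.787, 14856.974) comes to roughly 8.3 for the snprintf path, and percent_saved(16176.169, 13532.698) to roughly 16.3 for the float8out path.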

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-03 13:18:35 -0400, Tom Lane wrote:
>> Having said that, maybe there's a case for changing the type spec
>> in only the va_arg() call, and leaving snprintf's related local
>> variables as int64.  (Is that what you actually meant?)  Then,
>> if long long really is different from int64, at least we have
>> predictable truncation behavior after fetching the value, rather
>> than undefined behavior while fetching it.

> Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
> about we simply add a static assert that long long isn't bigger than
> int64?

WFM, I'll make it happen.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> So when using pg's snprintf() to print a single floating point number
> with precision, we get nearly a 10% boost.

I just tested that using my little standalone testbed, and I failed
to replicate the result.  I do see that strfromd is slightly faster,
but it's just a few percent measuring snprintf.c in isolation --- in
the overall context of COPY, I don't see how you get to 10% net savings.

So I continue to think there's something fishy about your test case.

BTW, so far as I can tell on F28, strfromd isn't exposed without
"-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
what else does that affect?

            regards, tom lane

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index b9b6add..f75369c 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1137,17 +1137,19 @@ fmtfloat(double value, char type, int forcesign, int leftjust,
         zeropadlen = precision - prec;
         fmt[0] = '%';
         fmt[1] = '.';
-        fmt[2] = '*';
-        fmt[3] = type;
-        fmt[4] = '\0';
-        vallen = sprintf(convert, fmt, prec, value);
+        fmt[2] = (prec / 100) + '0';
+        fmt[3] = ((prec % 100) / 10) + '0';
+        fmt[4] = (prec % 10) + '0';
+        fmt[5] = type;
+        fmt[6] = '\0';
+        vallen = strfromd(convert, sizeof(convert), fmt, value);
     }
     else
     {
         fmt[0] = '%';
         fmt[1] = type;
         fmt[2] = '\0';
-        vallen = sprintf(convert, fmt, value);
+        vallen = strfromd(convert, sizeof(convert), fmt, value);
     }
     if (vallen < 0)
         goto fail;
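
The format-building step in the diff above can be checked in isolation. The sketch below is an illustration, not part of the patch: build_float_format and same_as_star_precision are hypothetical names, and snprintf() stands in for strfromd() so that the code builds on platforms that do not provide strfromd().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Serialize the precision as three decimal digits, as in the diff above:
 * prec = 15, type = 'g' gives "%.015g".  fmt must have room for 7 bytes.
 */
void
build_float_format(char *fmt, int prec, char type)
{
    assert(prec >= 0 && prec <= 999);
    fmt[0] = '%';
    fmt[1] = '.';
    fmt[2] = (char) ((prec / 100) + '0');
    fmt[3] = (char) (((prec % 100) / 10) + '0');
    fmt[4] = (char) ((prec % 10) + '0');
    fmt[5] = type;
    fmt[6] = '\0';
}

/*
 * Return 1 if the serialized format and the "%.*" form produce the same
 * text for this value, else 0.
 */
int
same_as_star_precision(int prec, char type, double value)
{
    char        serialized[7];
    char        star[5] = {'%', '.', '*', type, '\0'};
    char        out1[512];
    char        out2[512];

    build_float_format(serialized, prec, type);
    snprintf(out1, sizeof(out1), serialized, value);
    snprintf(out2, sizeof(out2), star, prec, value);
    return strcmp(out1, out2) == 0;
}
```

Three digits are enough because fmtfloat() has already limited prec to at most 350 by this point. The cost being debated in this thread is that the library then has to parse those digits back out of the format string, which the ".*" form avoids.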

Re: Performance improvements for src/port/snprintf.c

From
Alvaro Herrera
Date:
On 2018-Oct-03, Tom Lane wrote:

> Andres Freund <andres@anarazel.de> writes:

> BTW, so far as I can tell on F28, strfromd isn't exposed without
> "-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
> what else does that affect?

https://en.cppreference.com/w/c/experimental/fpext1

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-03 14:01:35 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > So when using pg's snprintf() to print a single floating point number
> > with precision, we get nearly a 10% boost.
> 
> I just tested that using my little standalone testbed, and I failed
> to replicate the result.  I do see that strfromd is slightly faster,
> but it's just a few percent measuring snprintf.c in isolation --- in
> the overall context of COPY, I don't see how you get to 10% net savings.

I just tested your patch, and I see (best of three):

master:
16224.727 ms
hack-use-of-strfromd.patch:
14944.927 ms

So not quite 10%, but pretty close.


CREATE TABLE somefloats(id serial, data1 float8, data2 float8, data3 float8);
INSERT INTO somefloats(data1, data2, data3) SELECT random(), random(), random() FROM generate_series(1, 10000000);
VACUUM FREEZE somefloats;

COPY somefloats TO '/dev/null';

What difference do you see?


> So I continue to think there's something fishy about your test case.
> 
> BTW, so far as I can tell on F28, strfromd isn't exposed without
> "-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
> what else does that affect?

My copy says:

#undef __GLIBC_USE_IEC_60559_BFP_EXT
#if defined __USE_GNU || defined __STDC_WANT_IEC_60559_BFP_EXT__
# define __GLIBC_USE_IEC_60559_BFP_EXT 1
#else
# define __GLIBC_USE_IEC_60559_BFP_EXT 0
#endif

And __USE_GNU is enabled by
#ifdef    _GNU_SOURCE
# define __USE_GNU    1
#endif

So I don't think anything's needed to enable that in pg, given that we
define _GNU_SOURCE


Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
I wrote:
>> Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
>> about we simply add a static assert that long long isn't bigger than
>> int64?

> WFM, I'll make it happen.

Actually, while writing a comment to go with that assertion, I decided
this was dumb.  If we're expecting the compiler to have "long long",
and if we're convinced that no platforms define "long long" as wider
than 64 bits, we may as well go with the s/int64/long long/g solution.
That should result in no code change on any platform today.  And it
will still work correctly, if maybe a bit inefficiently, on some
hypothetical future platform where long long is wider.  We (or our
successors) can worry about optimizing that when the time comes.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-03 14:01:35 -0400, Tom Lane wrote:
>> BTW, so far as I can tell on F28, strfromd isn't exposed without
>> "-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
>> what else does that affect?

> So I don't think anything's needed to enable that in pg, given that we
> define _GNU_SOURCE

Ah, OK.  I thought my test case had _GNU_SOURCE defined already,
but it didn't.  You might want to do something like what I stuck
in for strchrnul, though:

/*
 * glibc's <string.h> declares strchrnul only if _GNU_SOURCE is defined.
 * While we typically use that on glibc platforms, configure will set
 * HAVE_STRCHRNUL whether it's used or not.  Fill in the missing declaration
 * so that this file will compile cleanly with or without _GNU_SOURCE.
 */
#ifndef _GNU_SOURCE
extern char *strchrnul(const char *s, int c);
#endif


            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> [ let's use strfromd ]

So I'm having second thoughts about this, based on the fact that
strfromd() isn't strictly a glibc-ism but is defined in an ISO/IEC
standard.  That means that we can expect to see it start showing up
on other platforms (though a quick search did not find any evidence
that it has yet).  And that means that we'd better consider
quality-of-implementation issues.  We know that glibc's version is
fractionally faster than using sprintf with "%.*g", but what are
the odds that that will be true universally?  I don't have a warm
feeling about it, given that strfromd's API isn't a very good impedance
match to what we really need.

I really think that what we ought to do is apply the float[48]out hack
I showed in <30551.1538517271@sss.pgh.pa.us> and call it good, at least
till such time as somebody wants to propose a full-on reimplementation of
float output.  I don't want to buy back into having platform dependencies
in this area after having just expended a lot of sweat to get rid of them.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-05 11:54:59 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > [ let's use strfromd ]
> 
> So I'm having second thoughts about this, based on the fact that
> strfromd() isn't strictly a glibc-ism but is defined in an ISO/IEC
> standard.  That means that we can expect to see it start showing up
> on other platforms (though a quick search did not find any evidence
> that it has yet).  And that means that we'd better consider
> quality-of-implementation issues.  We know that glibc's version is
> fractionally faster than using sprintf with "%.*g", but what are
> the odds that that will be true universally?  I don't have a warm
> feeling about it, given that strfromd's API isn't a very good impedance
> match to what we really need.
> 
> I really think that what we ought to do is apply the float[48]out hack
> I showed in <30551.1538517271@sss.pgh.pa.us> and call it good, at least
> till such time as somebody wants to propose a full-on reimplementation of
> float output.  I don't want to buy back into having platform dependencies
> in this area after having just expended a lot of sweat to get rid of them.

I'm not convinced. Because of some hypothetical platform that may
introduce strfromd() in a broken/slower manner, but where sprintf() is
correct, we should not do the minimal work to alleviate an actual
performance bottleneck in a trivial manner on linux? Our most widely
used platform?  If we find a platform where it's borked, we could just
add a small hack into their platform template file.

Greetings,

Andres Freund


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-05 11:54:59 -0400, Tom Lane wrote:
>> I really think that what we ought to do is apply the float[48]out hack
>> I showed in <30551.1538517271@sss.pgh.pa.us> and call it good, at least
>> till such time as somebody wants to propose a full-on reimplementation of
>> float output.  I don't want to buy back into having platform dependencies
>> in this area after having just expended a lot of sweat to get rid of them.

> I'm not convinced. Because of some hypothetical platform that may
> introduce strfromd() in a broken/slower manner, but where sprintf() is
> correct, we should not do the minimal work to alleviate an actual
> performance bottleneck in a trivial manner on linux? Our most widely
> used platform?  If we find a platform where it's borked, we could just
> add a small hack into their platform template file.

If it were a significant performance improvement, I'd be okay with that
conclusion, but my measurements say that it's not.  The extra complication
is not free, and in my judgement it's not worth it.

We certainly do need to buy back the performance we lost in float[48]out,
but the hack I suggested does so --- on all platforms, not only Linux.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
I stepped back a bit from the raw performance question and thought about
what we actually want functionally in snprintf's float handling.  There
are a couple of points worth making:

* The fact that float[48]out explicitly handle NaN and Inf cases is a
leftover from when they had to cope with varying behavior of
platform-specific snprintf implementations.  Now that we've standardized
on snprintf.c, it makes a lot more sense to enforce standardized printing
of these values inside snprintf.c.  That not only avoids repeated tests
for these cases at different code levels, but ensures that the uniform
behavior exists for all our invocations of *printf, not just float[48]out.

* snprintf.c doesn't really work right for IEEE minus zero, as I recently
noted in another thread (<23662.1538067926@sss.pgh.pa.us>).  While this
is not of significance for float[48]out, it might be a problem for other
callers.  Now that we've enforced usage of snprintf.c across-the-board,
I think it's more important to worry about these corner cases.  It's not
that expensive to fix either; we can test for minus zero with something
like this:
        static const double dzero = 0.0;
        if (value == 0.0 &&
            memcmp(&value, &dzero, sizeof(double)) != 0)
(ie, "it's equal to zero but not bitwise equal to zero").  While that
looks like it might be expensive, I find that recent versions of both
gcc and clang can optimize the memcmp call down to something like
        cmpq    $0, 8(%rsp)
so I think it's well worth the cost to get this right.

The attached proposed patch addresses both of those points.
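
The minus-zero test described above can be wrapped into a small standalone function. This is a sketch assuming IEEE 754 doubles; is_minus_zero and minus_zero_examples are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <string.h>

/* Return 1 for IEEE minus zero, 0 for every other value including +0.0. */
int
is_minus_zero(double value)
{
    static const double dzero = 0.0;

    /* "equal to zero but not bitwise equal to zero" */
    return value == 0.0 &&
        memcmp(&value, &dzero, sizeof(double)) != 0;
}

/* Usage: only -0.0 passes both halves of the test. */
void
minus_zero_examples(void)
{
    assert(is_minus_zero(-0.0));
    assert(!is_minus_zero(0.0));
    assert(!is_minus_zero(-1.0));
}
```

A plain "value < 0" check cannot do this job, because minus zero compares equal to zero. The bitwise comparison is what tells the two zeroes apart.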

Also, in service of the performance angle, I went ahead and made a
roughly strfromd-like entry point in snprintf.c, but using an API
that doesn't force textual conversion of the precision spec.
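
To make the shape of such an entry point concrete, here is a rough sketch. It borrows the parameter list of the pg_strfromd() declaration in the patch, but the body is not the patch's implementation: sketch_strfromd is a hypothetical name, the work is delegated to the C library's snprintf(), and the precision limits of 1 and 32 are assumptions chosen to keep the output inside a small buffer.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/*
 * sketch_strfromd: hypothetical illustration.  Formats value like "%.*g"
 * with an integer precision and standardized NaN/Infinity spellings.
 * Returns what snprintf() returns: the length the full output needs.
 */
int
sketch_strfromd(char *str, size_t count, int precision, double value)
{
    assert(str != NULL && count > 0);

    /* Standardized spellings, as float[48]out printed them before. */
    if (isnan(value))
        return snprintf(str, count, "%s", "NaN");
    if (isinf(value))
        return snprintf(str, count, "%s",
                        (value > 0) ? "Infinity" : "-Infinity");

    /* Clamp limits are an assumption of this sketch. */
    if (precision < 1)
        precision = 1;
    else if (precision > 32)
        precision = 32;

    /* The precision stays an int; no digits are written into a format. */
    return snprintf(str, count, "%.*g", precision, value);
}
```

Because the precision travels as an int argument, the caller never serializes it and the callee never parses it back, which is the mismatch with strfromd()'s interface raised earlier in the thread.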

As best I can tell, this patch puts the performance of float8out
on par with what it was in v11, measuring using a tight loop like

    while (count-- > 0)
    {
        char *str = float8out_internal(val);
        pfree(str);
        CHECK_FOR_INTERRUPTS();
    }

For me, this is within a percent or two either way on a couple of
different machines; that's within the noise level.

            regards, tom lane

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index df35557..260377c 100644
*** a/src/backend/utils/adt/float.c
--- b/src/backend/utils/adt/float.c
*************** Datum
*** 243,272 ****
  float4out(PG_FUNCTION_ARGS)
  {
      float4        num = PG_GETARG_FLOAT4(0);
!     char       *ascii;
!
!     if (isnan(num))
!         PG_RETURN_CSTRING(pstrdup("NaN"));
!
!     switch (is_infinite(num))
!     {
!         case 1:
!             ascii = pstrdup("Infinity");
!             break;
!         case -1:
!             ascii = pstrdup("-Infinity");
!             break;
!         default:
!             {
!                 int            ndig = FLT_DIG + extra_float_digits;
!
!                 if (ndig < 1)
!                     ndig = 1;
!
!                 ascii = psprintf("%.*g", ndig, num);
!             }
!     }

      PG_RETURN_CSTRING(ascii);
  }

--- 243,252 ----
  float4out(PG_FUNCTION_ARGS)
  {
      float4        num = PG_GETARG_FLOAT4(0);
!     char       *ascii = (char *) palloc(32);
!     int            ndig = FLT_DIG + extra_float_digits;

+     (void) pg_strfromd(ascii, 32, ndig, num);
      PG_RETURN_CSTRING(ascii);
  }

*************** float8out(PG_FUNCTION_ARGS)
*** 479,508 ****
  char *
  float8out_internal(double num)
  {
!     char       *ascii;
!
!     if (isnan(num))
!         return pstrdup("NaN");
!
!     switch (is_infinite(num))
!     {
!         case 1:
!             ascii = pstrdup("Infinity");
!             break;
!         case -1:
!             ascii = pstrdup("-Infinity");
!             break;
!         default:
!             {
!                 int            ndig = DBL_DIG + extra_float_digits;
!
!                 if (ndig < 1)
!                     ndig = 1;
!
!                 ascii = psprintf("%.*g", ndig, num);
!             }
!     }

      return ascii;
  }

--- 459,468 ----
  char *
  float8out_internal(double num)
  {
!     char       *ascii = (char *) palloc(32);
!     int            ndig = DBL_DIG + extra_float_digits;

+     (void) pg_strfromd(ascii, 32, ndig, num);
      return ascii;
  }

diff --git a/src/include/port.h b/src/include/port.h
index e654d5c..0729c3f 100644
*** a/src/include/port.h
--- b/src/include/port.h
*************** extern int    pg_printf(const char *fmt,...
*** 187,192 ****
--- 187,195 ----
  #define fprintf            pg_fprintf
  #define printf(...)        pg_printf(__VA_ARGS__)

+ /* This is also provided by snprintf.c */
+ extern int    pg_strfromd(char *str, size_t count, int precision, double value);
+
  /* Replace strerror() with our own, somewhat more robust wrapper */
  extern char *pg_strerror(int errnum);
  #define strerror pg_strerror
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index ef496fa..897c683 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*************** fmtfloat(double value, char type, int fo
*** 1111,1120 ****
      int            zeropadlen = 0; /* amount to pad with zeroes */
      int            padlen;            /* amount to pad with spaces */

-     /* Handle sign (NaNs have no sign) */
-     if (!isnan(value) && adjust_sign((value < 0), forcesign, &signvalue))
-         value = -value;
-
      /*
       * We rely on the regular C library's sprintf to do the basic conversion,
       * then handle padding considerations here.
--- 1111,1116 ----
*************** fmtfloat(double value, char type, int fo
*** 1128,1161 ****
       * bytes and limit requested precision to 350 digits; this should prevent
       * buffer overrun even with non-IEEE math.  If the original precision
       * request was more than 350, separately pad with zeroes.
       */
      if (precision < 0)            /* cover possible overflow of "accum" */
          precision = 0;
      prec = Min(precision, 350);

!     if (pointflag)
      {
!         zeropadlen = precision - prec;
!         fmt[0] = '%';
!         fmt[1] = '.';
!         fmt[2] = '*';
!         fmt[3] = type;
!         fmt[4] = '\0';
!         vallen = sprintf(convert, fmt, prec, value);
      }
      else
      {
!         fmt[0] = '%';
!         fmt[1] = type;
!         fmt[2] = '\0';
!         vallen = sprintf(convert, fmt, value);
!     }
!     if (vallen < 0)
!         goto fail;

!     /* If it's infinity or NaN, forget about doing any zero-padding */
!     if (zeropadlen > 0 && !isdigit((unsigned char) convert[vallen - 1]))
!         zeropadlen = 0;

      padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);

--- 1124,1185 ----
       * bytes and limit requested precision to 350 digits; this should prevent
       * buffer overrun even with non-IEEE math.  If the original precision
       * request was more than 350, separately pad with zeroes.
+      *
+      * We handle infinities and NaNs specially to ensure platform-independent
+      * output.
       */
      if (precision < 0)            /* cover possible overflow of "accum" */
          precision = 0;
      prec = Min(precision, 350);

!     if (isnan(value))
      {
!         strcpy(convert, "NaN");
!         vallen = 3;
!         /* no zero padding, regardless of precision spec */
      }
      else
      {
!         /*
!          * Handle sign (NaNs have no sign, so we don't do this in the case
!          * above).  "value < 0.0" will not be true for IEEE minus zero, so we
!          * detect that by looking for the case where value equals 0.0
!          * according to == but not according to memcmp.
!          */
!         static const double dzero = 0.0;

!         if (adjust_sign((value < 0.0 ||
!                          (value == 0.0 &&
!                           memcmp(&value, &dzero, sizeof(double)) != 0)),
!                         forcesign, &signvalue))
!             value = -value;
!
!         if (isinf(value))
!         {
!             strcpy(convert, "Infinity");
!             vallen = 8;
!             /* no zero padding, regardless of precision spec */
!         }
!         else if (pointflag)
!         {
!             zeropadlen = precision - prec;
!             fmt[0] = '%';
!             fmt[1] = '.';
!             fmt[2] = '*';
!             fmt[3] = type;
!             fmt[4] = '\0';
!             vallen = sprintf(convert, fmt, prec, value);
!         }
!         else
!         {
!             fmt[0] = '%';
!             fmt[1] = type;
!             fmt[2] = '\0';
!             vallen = sprintf(convert, fmt, value);
!         }
!         if (vallen < 0)
!             goto fail;
!     }

      padlen = compute_padlen(minlen, vallen + zeropadlen, leftjust);

*************** fail:
*** 1197,1202 ****
--- 1221,1316 ----
      target->failed = true;
  }

+ /*
+  * Nonstandard entry point to print a double value efficiently.
+  *
+  * This is approximately equivalent to strfromd(), but has an API more
+  * adapted to what float8out() wants.  The behavior is like snprintf()
+  * with a format of "%.ng", where n is the specified precision.
+  * However, the target buffer must be nonempty (i.e. count > 0), and
+  * the precision is silently bounded to a sane range.
+  */
+ int
+ pg_strfromd(char *str, size_t count, int precision, double value)
+ {
+     PrintfTarget target;
+     int            signvalue = 0;
+     int            vallen;
+     char        fmt[8];
+     char        convert[64];
+
+     /* Set up the target like pg_snprintf, but require nonempty buffer */
+     Assert(count > 0);
+     target.bufstart = target.bufptr = str;
+     target.bufend = str + count - 1;
+     target.stream = NULL;
+     target.nchars = 0;
+     target.failed = false;
+
+     /*
+      * We bound precision to a reasonable range; the combination of this and
+      * the knowledge that we're using "g" format without padding allows the
+      * convert[] buffer to be reasonably small.
+      */
+     if (precision < 1)
+         precision = 1;
+     else if (precision > 32)
+         precision = 32;
+
+     /*
+      * The rest is just an inlined version of the fmtfloat() logic above,
+      * simplified using the knowledge that no padding is wanted.
+      */
+     if (isnan(value))
+     {
+         strcpy(convert, "NaN");
+         vallen = 3;
+     }
+     else
+     {
+         static const double dzero = 0.0;
+
+         if (value < 0.0 ||
+             (value == 0.0 &&
+              memcmp(&value, &dzero, sizeof(double)) != 0))
+         {
+             signvalue = '-';
+             value = -value;
+         }
+
+         if (isinf(value))
+         {
+             strcpy(convert, "Infinity");
+             vallen = 8;
+         }
+         else
+         {
+             fmt[0] = '%';
+             fmt[1] = '.';
+             fmt[2] = '*';
+             fmt[3] = 'g';
+             fmt[4] = '\0';
+             vallen = sprintf(convert, fmt, precision, value);
+             if (vallen < 0)
+             {
+                 target.failed = true;
+                 goto fail;
+             }
+         }
+     }
+
+     if (signvalue)
+         dopr_outch(signvalue, &target);
+
+     dostr(convert, vallen, &target);
+
+ fail:
+     *(target.bufptr) = '\0';
+     return target.failed ? -1 : (target.bufptr - target.bufstart
+                                  + target.nchars);
+ }
+
+
  static void
  dostr(const char *str, int slen, PrintfTarget *target)
  {
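
The minus-zero test that the patch uses in both fmtfloat() and pg_strfromd() can be tried in isolation. What follows is a minimal sketch, assuming IEEE 754 doubles; the helper name is_negative_double is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): report whether a non-NaN double
 * carries a minus sign, treating IEEE minus zero as negative.
 */
static bool
is_negative_double(double value)
{
    static const double dzero = 0.0;

    /* NaNs have no usable sign; the patch handles them before this point */
    assert(!isnan(value));

    /* "value < 0.0" is false for -0.0, so also compare the raw bytes */
    return value < 0.0 ||
        (value == 0.0 && memcmp(&value, &dzero, sizeof(double)) != 0);
}
```

With this, is_negative_double(-0.0) is true while is_negative_double(0.0) is false, which is the distinction the plain "value < 0.0" test cannot make.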

Re: Performance improvements for src/port/snprintf.c

From
Andrew Gierth
Date:
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 Andres> I'm not convinced. Because of some hypothetical platform that
 Andres> may introduce strfromd() in a broken/slower manner, but where
 Andres> sprintf() is correct, we should not do the minimal work to
 Andres> alleviate an actual performance bottleneck in a trivial manner
 Andres> on linux? Our most widely used platform? If we find a platform
 Andres> where it's borked, we could just add a small hack into their
 Andres> platform template file.

So here's a thing: I finally got to doing my performance tests for using
the Ryu float output code in float[48]out.

Ryu is so blazing fast that with it, COPY of a table with 2 million rows
of 12 random float8 columns (plus id) becomes FASTER in text mode than
in binary mode (rather than ~5x slower):

copy binary flttst to '/dev/null';  -- binary
Time: 3222.444 ms (00:03.222)

copy flttst to '/dev/null';  -- non-Ryu
Time: 16416.161 ms (00:16.416)

copy flttst to '/dev/null';  -- Ryu
Time: 2691.642 ms (00:02.692)

(And yes, I've double-checked the results and they look correct, other
than the formatting differences. COPY BINARY seems to have a bit more
overhead than text mode, even for just doing integers, I don't know
why.)

-- 
Andrew (irc:RhodiumToad)


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> So here's a thing: I finally got to doing my performance tests for using
> the Ryu float output code in float[48]out.
> Ryu is so blazing fast that with it, COPY of a table with 2 million rows
> of 12 random float8 columns (plus id) becomes FASTER in text mode than
> in binary mode (rather than ~5x slower):

Oh yeah?  Where's the code for this?

> (And yes, I've double-checked the results and they look correct, other
> than the formatting differences. COPY BINARY seems to have a bit more
> overhead than text mode, even for just doing integers, I don't know
> why.)

The per-column overhead is more (length word vs delimiter) and I think
the APIs for send/recv functions are potentially a bit less efficient
too.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> Ryu is so blazing fast that with it, COPY of a table with 2 million
 >> rows of 12 random float8 columns (plus id) becomes FASTER in text
 >> mode than in binary mode (rather than ~5x slower):

 Tom> Oh yeah?  Where's the code for this?

Upstream code is at https://github.com/ulfjack/ryu

Most of that is benchmarking, java, and other stuff not interesting to
us; the guts are under ryu/ and are dual-licensed under Boost 1.0 (which
I think we can use, since the only difference from BSD seems to be a
permissive one) as well as Apache 2.0 (which AFAIK we can't use).

I attach the patch I've used for testing, which has these changes from
upstream Ryu:

  - added ryu_ prefix to entry point functions
  - changed some #include file locations
  - added #define NDEBUG since there are a bunch of plain C assert()s

but I didn't touch the formatting or style of the Ryu code so it's all
C99 and // comments and OTB etc.

For testing purposes what I did was to change float[48]out to use the
Ryu code iff extra_float_digits > 0. This isn't likely what a final
version should do, just a convenience flag. The regression tests for
float8 fail of course since Ryu's output format differs (it always
includes an exponent, but the code for that part can be tweaked without
touching the main algorithm).

-- 
Andrew (irc:RhodiumToad)


Attachments

Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>  Tom> Oh yeah?  Where's the code for this?

> Upstream code is at https://github.com/ulfjack/ryu
> ...
> I attach the patch I've used for testing, which has these changes from
> upstream Ryu:

Thanks.  Just scanning through the code quickly, I note that it assumes
IEEE float format, which is probably okay but I suppose we might want
a configure switch to disable it (and revert to platform sprintf).
I couldn't immediately figure out if it's got endianness assumptions;
but even if it does, that'd likely only affect the initial disassembly
of the IEEE format, so probably not a huge deal.
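
To make "initial disassembly of the IEEE format" concrete, here is a minimal sketch of splitting a double into its three fields. It assumes a 64-bit IEEE 754 double whose byte order matches that of 64-bit integers; the DoubleParts type and decompose_double name are hypothetical and not taken from the Ryu sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three IEEE 754 binary64 fields */
typedef struct
{
    int         sign;       /* 1 if the sign bit is set */
    int         exponent;   /* biased exponent, 0..2047 */
    uint64_t    mantissa;   /* the 52 explicit fraction bits */
} DoubleParts;

static DoubleParts
decompose_double(double value)
{
    uint64_t    bits;
    DoubleParts parts;

    /* Only meaningful where double is a 64-bit IEEE 754 value */
    assert(sizeof(double) == sizeof(uint64_t));

    /* memcpy sidesteps aliasing rules; ints and floats must share byte order */
    memcpy(&bits, &value, sizeof(bits));

    parts.sign = (int) (bits >> 63);
    parts.exponent = (int) ((bits >> 52) & 0x7FF);
    parts.mantissa = bits & ((UINT64_C(1) << 52) - 1);
    return parts;
}
```

Everything after this step works on plain integers, so any byte-order dependence would be confined to the memcpy.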

I wonder which variant of the code you were testing (e.g. HAS_UINT128
or not).

There's a pretty large gap between this code and PG coding conventions,
both as to layout and portability rules.  I wonder if we'd be better off
to implement the algorithm afresh instead of whacking this particular
code past the point of unrecognizability.

> The regression tests for
> float8 fail of course since Ryu's output format differs (it always
> includes an exponent, but the code for that part can be tweaked without
> touching the main algorithm).

Yeah, one would hope.  But I wonder whether it always produces the
same low-order digits, and if not, whether people will complain.
We just had somebody griping about a change in insignificant zeroes
in timestamp output :-(.  Still, seems worth further investigation.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Thanks. Just scanning through the code quickly, I note that it
 Tom> assumes IEEE float format, which is probably okay but I suppose we
 Tom> might want a configure switch to disable it (and revert to
 Tom> platform sprintf).

Yeah, but even s390 these days supports IEEE floats in hardware so I'm
not sure there are any platforms left that don't (that we care about).

 Tom> I couldn't immediately figure out if it's got endianness
 Tom> assumptions; but even if it does, that'd likely only affect the
 Tom> initial disassembly of the IEEE format, so probably not a huge
 Tom> deal.

Upstream docs say it's fine with big-endian as long as the endianness of
ints and floats is the same.

 Tom> I wonder which variant of the code you were testing (e.g.
 Tom> HAS_UINT128 or not).

I was using clang 3.9.1 on FreeBSD amd64, and HAS_UINT128 ends up
enabled by this test:

#if defined(__SIZEOF_INT128__) && !defined(_MSC_VER) && !defined(RYU_ONLY_64_BIT_OPS)
#define HAS_UINT128
...
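
As an illustration of what such a switch selects between, the sketch below computes the high 64 bits of a 64x64-bit product both ways. It is not the Ryu code: the function names are hypothetical, and only the preprocessor test is copied from the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Portable path: build the high word from four 32x32->64 partial products */
static uint64_t
mul64_high_portable(uint64_t a, uint64_t b)
{
    uint64_t    a_lo = (uint32_t) a;
    uint64_t    a_hi = a >> 32;
    uint64_t    b_lo = (uint32_t) b;
    uint64_t    b_hi = b >> 32;
    uint64_t    lo_lo = a_lo * b_lo;
    uint64_t    hi_lo = a_hi * b_lo;
    uint64_t    lo_hi = a_lo * b_hi;
    uint64_t    hi_hi = a_hi * b_hi;
    /* cannot overflow: at most (2^32 - 1) + (2^32 - 1) + (2^32 - 1)^2 */
    uint64_t    cross = (lo_lo >> 32) + (uint32_t) hi_lo + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

static uint64_t
mul64_high(uint64_t a, uint64_t b)
{
#if defined(__SIZEOF_INT128__) && !defined(_MSC_VER) && !defined(RYU_ONLY_64_BIT_OPS)
    /* Wide path: let the compiler do a single 128-bit multiply */
    uint64_t    wide = (uint64_t) (((unsigned __int128) a * b) >> 64);

    /* cross-check against the portable path in assert-enabled builds */
    assert(wide == mul64_high_portable(a, b));
    return wide;
#else
    return mul64_high_portable(a, b);
#endif
}
```

Both paths return the same value; the choice only affects how many machine multiplies are needed.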

 >> The regression tests for float8 fail of course since Ryu's output
 >> format differs (it always includes an exponent, but the code for
 >> that part can be tweaked without touching the main algorithm).

 Tom> Yeah, one would hope. But I wonder whether it always produces the
 Tom> same low-order digits, and if not, whether people will complain.

It won't produce the same low-order digits in general, since it has a
different objective: rather than outputting a decimal value which is the
true float value rounded to a fixed size by decimal rounding rules, it
produces the shortest decimal value which falls within the binary float
rounding interval of the true float value. i.e. the objective is to be
able to round-trip back to float and get the identical result.
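
The objective can be illustrated without Ryu by brute force: try increasing precisions until the text reads back as the same double. This is only a sketch for finite values. It assumes a correctly rounding snprintf() and strtod() in the "C" locale, it is far slower than Ryu, and in rare edge cases it may not find the very shortest string. The helper name shortest_roundtrip is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: write into buf the "%.*g" rendering of a finite value
 * using the fewest significant digits that strtod() reads back as the same
 * double.  Returns the number of significant digits used.
 */
static int
shortest_roundtrip(double value, char *buf, size_t buflen)
{
    int         ndig;

    assert(buflen >= 32);       /* room for 17 digits, sign, point, exponent */

    for (ndig = 1; ndig < 17; ndig++)
    {
        snprintf(buf, buflen, "%.*g", ndig, value);
        if (strtod(buf, NULL) == value)
            return ndig;
    }

    /* 17 significant digits always suffice for an IEEE 754 double */
    snprintf(buf, buflen, "%.*g", 17, value);
    return 17;
}
```

For 0.1 this stops at one digit and for 1.9 at two, whereas a fixed 18-digit format prints the extra low-order digits of the exact binary value.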

One option would be to stick with snprintf if extra_float_digits is less
than 0 (or less than or equal to 0 and make the default 1) and use ryu
otherwise, so that the option to get rounded floats is still there.
(Apparently some people do use negative values of extra_float_digits.)
Unlike other format-changing GUCs, this one already exists and is
already used by people who want more or less precision, including by
pg_dump where round-trip conversion is the requirement.
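
As a sketch of that option (the "less than or equal to 0" variant), the choice could look like the following. The enum and function names are hypothetical, and the -15..3 bound reflects the range the GUC accepts at the time of writing; none of this is taken from the test patch.

```c
#include <assert.h>

/* Hypothetical names, for illustration only */
typedef enum
{
    FLOAT_OUT_SNPRINTF,         /* fixed digit count, decimal rounding */
    FLOAT_OUT_SHORTEST          /* shortest round-trip output, e.g. via Ryu */
} FloatOutMethod;

static FloatOutMethod
choose_float_out(int extra_float_digits)
{
    /* assumed current GUC range */
    assert(extra_float_digits >= -15 && extra_float_digits <= 3);

    /* people asking for fewer digits keep the rounded snprintf output */
    return (extra_float_digits > 0) ? FLOAT_OUT_SHORTEST : FLOAT_OUT_SNPRINTF;
}
```

Under this rule pg_dump, which sets extra_float_digits to a positive value, would get the round-trip-safe path.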

Here are some examples of differences in digits, comparing ryu output
with sprintf output at extra_float_digits=3:

  Pi:     ryu 3.141592653589793E0
      sprintf 3.14159265358979312
   e:     ryu 2.7182818284590455E0
      sprintf 2.71828182845904553
1/10:     ryu 1E-1
      sprintf 0.100000000000000006
 1/3:     ryu 3.333333333333333E-1
      sprintf 0.333333333333333315
 2/3:     ryu 6.666666666666666E-1
      sprintf 0.66666666666666663
 
-- 
Andrew.


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> Yeah, one would hope. But I wonder whether it always produces the
>  Tom> same low-order digits, and if not, whether people will complain.

> It won't produce the same low-order digits in general, since it has a
> different objective: rather than outputting a decimal value which is the
> true float value rounded to a fixed size by decimal rounding rules, it
> produces the shortest decimal value which falls within the binary float
> rounding interval of the true float value. i.e. the objective is to be
> able to round-trip back to float and get the identical result.

So I'm thinking that there are two, hopefully separable, issues here:

1. The algorithm for deciding how many digits to print.

2. The speed.

Now, "shortest value that converts back exactly" is technically cool,
but I am not sure that it solves any real-world problem that we have.
I'm also worried that introducing it would result in complaints like
https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com

As for #2, my *very* short once-over of the code led me to think that
the speed win comes mostly from use of wide integer arithmetic, and
maybe from throwing big lookup tables at the problem.  If so, it's very
likely possible that we could adopt those techniques without necessarily
buying into the shortest-exact rule for how many digits to print.

> One option would be to stick with snprintf if extra_float_digits is less
> than 0 (or less than or equal to 0 and make the default 1) and use ryu
> otherwise, so that the option to get rounded floats is still there.
> (Apparently some people do use negative values of extra_float_digits.)
> Unlike other format-changing GUCs, this one already exists and is
> already used by people who want more or less precision, including by
> pg_dump where round-trip conversion is the requirement.

I wouldn't necessarily object to having some value of extra_float_digits
that selects the shortest-exact rule, but I'm thinking maybe it should
be a value we don't currently accept.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Now, "shortest value that converts back exactly" is technically
 Tom> cool, but I am not sure that it solves any real-world problem that
 Tom> we have.

Well, it seems to me that it is perfect for pg_dump.

Also it's kind of a problem that our default float output is not
round-trip safe - people do keep wondering why they can select a row and
it'll show a certain value, but then doing WHERE col = 'xxx' on that
value does not find the row. Yes, testing equality of floats is bad, but
there's no reason to put in extra landmines.
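
A small sketch of that landmine, assuming IEEE 754 doubles and a correctly rounding C library: with the default of 15 significant digits (DBL_DIG), the printed text of 0.1 + 0.2 does not read back as the stored value. The helper name roundtrips_with_digits is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: does "%.*g" output with ndig significant digits read
 * back as exactly the same double?
 */
static bool
roundtrips_with_digits(double value, int ndig)
{
    char        buf[64];

    assert(ndig >= 1 && ndig <= 32);    /* keeps the output within buf */

    snprintf(buf, sizeof(buf), "%.*g", ndig, value);
    return strtod(buf, NULL) == value;
}
```

Here roundtrips_with_digits(0.1 + 0.2, 15) is false, because the value prints as 0.3 and the literal 0.3 is a different double, while 17 digits do round-trip.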

 Tom> I'm also worried that introducing it would result in complaints like
 Tom> https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com

Frankly for a >20x performance improvement in float8out I don't think
that's an especially big deal.

 Tom> As for #2, my *very* short once-over of the code led me to think
 Tom> that the speed win comes mostly from use of wide integer
 Tom> arithmetic,

Data point: forcing it to use 64-bit only (#define RYU_ONLY_64_BIT_OPS)
makes negligible difference on my test setup.

 Tom> and maybe from throwing big lookup tables at the problem. If so,
 Tom> it's very likely possible that we could adopt those techniques
 Tom> without necessarily buying into the shortest-exact rule for how
 Tom> many digits to print.

If you read the ACM paper (linked from the upstream github repo), it
explains how the algorithm works by combining the radix conversion step
with (the initial iterations of) the operation of finding the shortest
representation. This allows limiting the number of bits needed for the
intermediate results so that it can all be done in fixed-size integers,
rather than using an arbitrary-precision approach.

I do not see any obvious way to use this code to generate the same
output in the final digits that we currently do (in the sense of
overly-exact values like outputting 1.89999999999999991 for 1.9 when
extra_float_digits=3).

 >> One option would be to stick with snprintf if extra_float_digits is
 >> less than 0 (or less than or equal to 0 and make the default 1) and
 >> use ryu otherwise, so that the option to get rounded floats is still
 >> there. (Apparently some people do use negative values of
 >> extra_float_digits.) Unlike other format-changing GUCs, this one
 >> already exists and is already used by people who want more or less
 >> precision, including by pg_dump where round-trip conversion is the
 >> requirement.

 Tom> I wouldn't necessarily object to having some value of
 Tom> extra_float_digits that selects the shortest-exact rule, but I'm
 Tom> thinking maybe it should be a value we don't currently accept.

Why would anyone currently set extra_float_digits > 0 if not to get
round-trip-safe values?

-- 
Andrew (irc:RhodiumToad)


Re: Performance improvements for src/port/snprintf.c

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> Now, "shortest value that converts back exactly" is technically
>  Tom> cool, but I am not sure that it solves any real-world problem that
>  Tom> we have.

> Well, it seems to me that it is perfect for pg_dump.

Perhaps.  I was hoping for something we could slot into snprintf.c;
not being able to select the number of digits to output is clearly
a deal-breaker for that usage.  But perhaps it's reasonable to allow
"extra_float_digits = 3" to be redefined as meaning "use the shortest
value that converts back exactly" in float[48]out.

However, it seems like it should still be on the table to look at
other code that just does sprintf's job faster (such as the stb
code Alexander mentioned).  If anything like that is acceptable
for the general case, then we have to ask whether ryu is enough
faster than *that* code, not faster than what we have now, to
justify carrying another umpteen KB of independent code path
for the dump-and-restore case.

> Also it's kind of a problem that our default float output is not
> round-trip safe - people do keep wondering why they can select a row and
> it'll show a certain value, but then doing WHERE col = 'xxx' on that
> value does not find the row.

Unfortunately, I do not think it's going to be acceptable for default
float output (as opposed to the dump/restore case) to become round-trip
safe.  The number of people complaining today would be dwarfed by the
number of people complaining about extra garbage digits in their results.
There isn't any compromise that will make things "just work" for people
who are unaware of the subtleties of float arithmetic.

            regards, tom lane


Re: Performance improvements for src/port/snprintf.c

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> However, it seems like it should still be on the table to look at
 Tom> other code that just does sprintf's job faster (such as the stb
 Tom> code Alexander mentioned).

The stb sprintf is indeed a lot faster for floats than other sprintfs,
but (a) it's still quite a lot slower than Ryu (COPY of my test table is
4.2 seconds with stb, vs 2.7 seconds with Ryu), and (b) it also produces
changes in the insignificant digits, so while (it claims) the values are
still round-trip convertible, they are neither the shortest
representation nor the exact representation.

For example, consider 1.9, which is 0x3FFE666666666666:

exact value: 1.899999999999999911182158029987476766109466552734375
accepted input range:
  min: 1.89999999999999980015985556747182272374629974365234375
  max: 1.90000000000000002220446049250313080847263336181640625

exact value rounded to 18 SF: 1.89999999999999991

Ryu output: 1.9E0
STB (%*.18g) output: 1.89999999999999992
sprintf (%*.18g) output: 1.89999999999999991

So while STB's output is in the acceptable range, it's not the result of
rounding the exact value to 18 digits (as sprintf does on my system at
least), nor is it the shortest. Testing a bunch of random values, it
usually seems to be off from the rounded exact result by +/- 1 in the
last digit.
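
The claim that all three outputs fall in the accepted input range can be checked by parsing them back. This is a sketch, assuming a correctly rounding strtod() in the "C" locale; the helper name parses_to is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: true if text is entirely consumed by strtod() and
 * yields exactly the expected double.
 */
static bool
parses_to(const char *text, double expected)
{
    char       *end;
    double      parsed;

    assert(text != NULL);

    parsed = strtod(text, &end);
    /* reject empty input and trailing junk */
    return end != text && *end == '\0' && parsed == expected;
}
```

The strings "1.9E0", "1.89999999999999992" and "1.89999999999999991" all parse to the same double as 1.9, while "1.8999999999999997" lies below the minimum shown above and does not.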

-- 
Andrew (irc:RhodiumToad)


Re: Performance improvements for src/port/snprintf.c

From
Andres Freund
Date:
Hi,

On 2018-10-07 12:59:18 +0100, Andrew Gierth wrote:
> >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
>  Tom> Now, "shortest value that converts back exactly" is technically
>  Tom> cool, but I am not sure that it solves any real-world problem that
>  Tom> we have.
> 
> Well, it seems to me that it is perfect for pg_dump.
> 
> Also it's kind of a problem that our default float output is not
> round-trip safe - people do keep wondering why they can select a row and
> it'll show a certain value, but then doing WHERE col = 'xxx' on that
> value does not find the row. Yes, testing equality of floats is bad, but
> there's no reason to put in extra landmines.

+1


>  Tom> I'm also worried that introducing it would result in complaints like
>  Tom> https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com
> 
> Frankly for a >20x performance improvement in float8out I don't think
> that's an especially big deal.

+1. There are plenty of complaints where we just say "sorry that it bothers
you, but these larger concerns made us do it that way".


> I do not see any obvious way to use this code to generate the same
> output in the final digits that we currently do (in the sense of
> overly-exact values like outputting 1.89999999999999991 for 1.9 when
> extra_float_digits=3).

But, why would that be required? Just to placate people wanting exactly
the same output as before?  I don't quite get how that'd be a useful
requirement.

Obviously we *do* need to support outputting non-exponent style output
where appropriate, but that should mostly be different massaging of
d2d()'s output, instead of calling to_chars() as the ryu upstream code
does.  ISTM we also need to support *reducing* the precision (for the
case where people intentionally reduce extra_float_digits), but that
similarly should be a SMOP, right?
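
A sketch of the kind of massaging meant here: turning shortest decimal digits plus an exponent into non-exponent notation. The input representation (a digit string and a scientific exponent, so that "19" with exponent 0 means 1.9) and the function name digits_to_fixed are assumptions for illustration; they are not the interface of d2d().

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: render d.ddd x 10^exponent without an exponent.
 * Returns the output length, or -1 if out is too small.
 */
static int
digits_to_fixed(const char *digits, int exponent, char *out, size_t outlen)
{
    size_t      ndig = strlen(digits);
    size_t      needed;
    size_t      pos = 0;
    size_t      i;

    assert(ndig > 0);

    if (exponent < 0)
        needed = 2 + (size_t) (-exponent - 1) + ndig;   /* "0." zeros digits */
    else if ((size_t) exponent >= ndig - 1)
        needed = (size_t) exponent + 1;                 /* digits zeros */
    else
        needed = ndig + 1;                              /* digits with a point */
    if (needed + 1 > outlen)
        return -1;

    if (exponent < 0)
    {
        out[pos++] = '0';
        out[pos++] = '.';
        for (i = 0; i < (size_t) (-exponent - 1); i++)
            out[pos++] = '0';
        memcpy(out + pos, digits, ndig);
        pos += ndig;
    }
    else
    {
        size_t      intlen = (size_t) exponent + 1;

        /* integer part, padded with zeroes if the digits run out */
        for (i = 0; i < intlen; i++)
            out[pos++] = (i < ndig) ? digits[i] : '0';
        if (intlen < ndig)
        {
            out[pos++] = '.';
            memcpy(out + pos, digits + intlen, ndig - intlen);
            pos += ndig - intlen;
        }
    }
    out[pos] = '\0';
    return (int) pos;
}
```

For example, the digits "19" with exponent 0 come out as "1.9", and "1" with exponent -1 as "0.1". Reducing precision would be a separate step that rounds the digit string before formatting.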

- Andres