Re: Why is pq_begintypsend so slow?

From Tom Lane
Subject Re: Why is pq_begintypsend so slow?
Date
Msg-id 6648.1589819885@sss.pgh.pa.us
In reply to Re: Why is pq_begintypsend so slow?  (Andres Freund <andres@anarazel.de>)
Responses Re: Why is pq_begintypsend so slow?  (Ranier Vilela <ranier.vf@gmail.com>)
Re: Why is pq_begintypsend so slow?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
>> FWIW, I've also observed, in another thread (the node func generation
>> thing [1]), that inlining enlargeStringInfo() helps a lot, especially
>> when inlining some of its callers. Moving e.g. appendStringInfo() inline
>> allows the compiler to sometimes optimize away the strlen. But if
>> e.g. an inlined appendBinaryStringInfo() still calls enlargeStringInfo()
>> unconditionally, successive appends cannot optimize away memory accesses
>> for ->len/->data.

> With a set of patches doing so, int4send itself is not a significant
> factor for my test benchmark [1] anymore.

This thread seems to have died out, possibly because the last set of
patches that Andres posted was sufficiently complicated and invasive
that nobody wanted to review it.  I thought about this again after
seeing that [1] is mostly about pq_begintypsend overhead, and had
an epiphany: there isn't really a strong reason for pq_begintypsend
to be inserting bits into the buffer at all.  The bytes will be
filled by pq_endtypsend, and nothing in between should be touching
them.  So I propose 0001 attached.  It's poking into the stringinfo
abstraction a bit more than I would want to do if there weren't a
compelling performance reason to do so, but there evidently is.
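
The reserve-now, fill-later idea can be sketched outside PostgreSQL as
follows.  This is only a minimal standalone illustration, not the patch:
MiniBuf and the mini_* helpers are hypothetical stand-ins for StringInfo,
pq_begintypsend and pq_endtypsend.  The sketch assumes a fixed-size buffer
and stores the total length in native byte order, which is simpler than
what the real code does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringInfoData; not PostgreSQL's definition. */
typedef struct MiniBuf
{
    char       *data;
    int         len;
    int         maxlen;
} MiniBuf;

/* Start a message: allocate, then skip the length word without writing it. */
static void
mini_begin(MiniBuf *buf)
{
    buf->maxlen = 1024;
    buf->data = malloc(buf->maxlen);
    assert(buf->data != NULL);
    assert(buf->maxlen > 4);
    buf->len = 4;               /* bytes 0..3 stay uninitialized for now */
}

/* Append payload after the reserved header (this sketch never grows). */
static void
mini_append(MiniBuf *buf, const void *src, int n)
{
    assert(buf->len + n <= buf->maxlen);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Finish: only now are the four reserved bytes given a value. */
static uint32_t
mini_end(MiniBuf *buf)
{
    uint32_t    total = (uint32_t) buf->len;

    memcpy(buf->data, &total, sizeof(total));
    return total;
}
```

Nothing reads the reserved bytes between mini_begin() and mini_end(), so
zero-filling them up front would be wasted work; that is the same
observation 0001 relies on.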

With 0001, pq_begintypsend drops from being the top single routine
in a profile of a test case like [1] to being well down the list.
The next biggest cost compared to text-format output is that
printtup() itself is noticeably more expensive.  A lot of the extra
cost there seems to be from pq_sendint32(), which is getting inlined
into printtup(), and there probably isn't much we can do to make that
cheaper. But eliminating a common subexpression as in 0002 below does
help noticeably, at least with the rather old gcc I'm using.

For me, the combination of these two eliminates most but not quite
all of the cost penalty of binary over text output as seen in [1].

            regards, tom lane

[1] https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com

diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index a6f990c..03b7404 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -328,11 +328,16 @@ void
 pq_begintypsend(StringInfo buf)
 {
     initStringInfo(buf);
-    /* Reserve four bytes for the bytea length word */
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
-    appendStringInfoCharMacro(buf, '\0');
+
+    /*
+     * Reserve four bytes for the bytea length word.  We don't need to fill
+     * them with anything (pq_endtypsend will do that), and this function is
+     * enough of a hot spot that it's worth cheating to save some cycles. Note
+     * in particular that we don't bother to guarantee that the buffer is
+     * null-terminated.
+     */
+    Assert(buf->maxlen > 4);
+    buf->len = 4;
 }
 
 /* --------------------------------
diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index dd1bac0..a9315c6 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -438,11 +438,12 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
         {
             /* Binary output */
             bytea       *outputbytes;
+            int            outputlen;

             outputbytes = SendFunctionCall(&thisState->finfo, attr);
-            pq_sendint32(buf, VARSIZE(outputbytes) - VARHDRSZ);
-            pq_sendbytes(buf, VARDATA(outputbytes),
-                         VARSIZE(outputbytes) - VARHDRSZ);
+            outputlen = VARSIZE(outputbytes) - VARHDRSZ;
+            pq_sendint32(buf, outputlen);
+            pq_sendbytes(buf, VARDATA(outputbytes), outputlen);
         }
     }

