Em seg., 18 de mai. de 2020 às 13:38, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
Andres Freund <andres@anarazel.de> writes: >> FWIW, I've also observed, in another thread (the node func generation >> thing [1]), that inlining enlargeStringInfo() helps a lot, especially >> when inlining some of its callers. Moving e.g. appendStringInfo() inline >> allows the compiler to sometimes optimize away the strlen. But if >> e.g. an inlined appendBinaryStringInfo() still calls enlargeStringInfo() >> unconditionally, successive appends cannot optimize away memory accesses >> for ->len/->data.
> With a set of patches doing so, int4send itself is not a significant > factor for my test benchmark [1] anymore.
This thread seems to have died out, possibly because the last set of patches that Andres posted was sufficiently complicated and invasive that nobody wanted to review it. I thought about this again after seeing that [1] is mostly about pq_begintypsend overhead, and had an epiphany: there isn't really a strong reason for pq_begintypsend to be inserting bits into the buffer at all. The bytes will be filled by pq_endtypsend, and nothing in between should be touching them. So I propose 0001 attached. It's poking into the stringinfo abstraction a bit more than I would want to do if there weren't a compelling performance reason to do so, but there evidently is.
With 0001, pq_begintypsend drops from being the top single routine in a profile of a test case like [1] to being well down the list. The next biggest cost compared to text-format output is that printtup() itself is noticeably more expensive. A lot of the extra cost there seems to be from pq_sendint32(), which is getting inlined into printtup(), and there probably isn't much we can do to make that cheaper. But eliminating a common subexpression as in 0002 below does help noticeably, at least with the rather old gcc I'm using.
Again, I see problems with the types declared in Postgres. 1. pq_sendint32 (StringInfo buf, uint32 i) 2. extern void pq_sendbytes (StringInfo buf, const char * data, int datalen);
Wouldn't it be better to declare outputlen (0002) as uint32? To avoid converting from (int) to (uint32), even if afterwards there is a conversion from (uint32) to (int)?