Hi,
While looking at a profile I randomly noticed that we spend a surprising
amount of time in snprintf() and its subsidiary functions. That turns
out to be
    if (strcmp(portal->commandTag, "SELECT") == 0)
        snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
                 "SELECT " UINT64_FORMAT, nprocessed);
in PortalRun(). That's actually fairly trivial to optimize: we don't
need the full-blown snprintf machinery here. A quick benchmark
replacing it with:
    memcpy(completionTag, "SELECT ", sizeof("SELECT "));
    pg_lltoa(nprocessed, completionTag + 7);
yields a ~2% increase in TPS, which is larger than I expected. The code
is obviously less pretty, but it's not actually that bad.
Attached is the patch I used for benchmarking. I wonder if I just hit
some specific version of glibc that regressed snprintf performance, or
whether others can reproduce this.
If it's actually reproducible, I think we should go for it, but we
should also update the rest of the completionTag writes in the same file.
Greetings,
Andres Freund