On 2017-09-12 00:19:48 -0700, Andres Freund wrote:
> Hi,
>
> I've recently seen a benchmark in which pg_mbcliplen() showed up
> prominently. Which it will basically in any benchmark with longer query
> strings, but fast queries. That's not that uncommon.
>
> I wonder if we could avoid the cost of pg_mbcliplen() from within
> pgstat_report_activity(), by moving some of the cost to the read
> side. pgstat values are obviously read far less frequently in nearly all
> cases that are performance relevant.
>
> Therefore I wonder if we couldn't just store a querystring that's
> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
> read side. I think that should work because all *server side* encodings
> store character lengths in the *first* byte of a multibyte character
> (at least one clientside encoding, gb18030, doesn't behave that way).
>
> That'd necessitate an added memory copy in pg_stat_get_activity(), but
> that seems fairly harmless.
>
> Faults in my thinking?

Here's a patch that implements that idea. It seems to work well. I'm a
bit loath to add proper regression tests for this, since they'd be awfully
dependent on specific track_activity_query_size settings. I did confirm
using gdb that I see incomplete characters before
pgstat_clip_activity(), but not after.
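
To illustrate the idea outside of the patch, here is a minimal standalone
sketch. It is not the patch's code: the function names
(report_activity_raw, clip_activity, utf8_char_len) are hypothetical, and
it assumes UTF-8 as the only encoding, whereas the real read side would
go through pg_mbcliplen() for whatever the server encoding is. The write
side does a plain memcpy() of a byte prefix; the read side walks the
buffer using the length implied by each character's first byte and stops
before a trailing incomplete character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: byte length of a UTF-8 character, derived from its
 * first byte only. Invalid lead bytes are treated as one-byte characters.
 */
static int
utf8_char_len(unsigned char first)
{
    if (first < 0x80)
        return 1;
    if ((first & 0xE0) == 0xC0)
        return 2;
    if ((first & 0xF0) == 0xE0)
        return 3;
    if ((first & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/*
 * Write side (hot path): encoding-unaware copy of at most bufsize - 1
 * bytes. This may cut a multibyte character in half.
 */
static void
report_activity_raw(char *dst, size_t bufsize, const char *query)
{
    size_t      len = strlen(query);

    assert(bufsize > 0);
    if (len > bufsize - 1)
        len = bufsize - 1;
    memcpy(dst, query, len);
    dst[len] = '\0';
}

/*
 * Read side (cold path): return the number of leading bytes of raw that
 * form only complete characters, dropping a truncated trailing one.
 */
static size_t
clip_activity(const char *raw)
{
    size_t      len = strlen(raw);
    size_t      pos = 0;

    while (pos < len)
    {
        size_t      clen = (size_t) utf8_char_len((unsigned char) raw[pos]);

        if (pos + clen > len)
            break;              /* last character is incomplete */
        pos += clen;
    }
    return pos;
}
```

A reader would copy the raw buffer first and then keep only the first
clip_activity() bytes of the copy, which is where the extra memory copy
mentioned above comes from.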

I've renamed st_activity to st_activity_raw to increase the likelihood
that potential external users of st_activity notice and adapt. That
increases the noise, but imo to a very bearable amount. I don't feel
strongly about it though.

Greetings,

Andres Freund