Re: Speed up JSON escape processing with SIMD plus other optimisations

From: David Rowley
Subject: Re: Speed up JSON escape processing with SIMD plus other optimisations
Msg-id: CAApHDvqQr0A2CopL85d0Uphk9uQxBdWcxzfMp1qqKYPe8vJNEw@mail.gmail.com
In reply to: Re: Speed up JSON escape processing with SIMD plus other optimisations  (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers
On Wed, 24 Jul 2024 at 22:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 02/07/2024 07:49, David Rowley wrote:
> > I've attached a rebased set of patches.  The previous set no longer applied.
>
> I looked briefly at the first patch. Seems reasonable.
>
> One little thing that caught my eye is that in populate_scalar(), you
> sometimes make a temporary copy of the string to add the
> null-terminator, but then call escape_json() which doesn't need the
> null-terminator anymore. See attached patch to avoid that. However, it's
> not clear to me how to reach that codepath, or if it's reachable at all. I
> tried to add a NOTICE there and ran the regression tests, but got no
> failures.

Thanks for noticing that. It seems like a good simplification
regardless. I've incorporated it.
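
For anyone who hasn't looked at the patch, the simplification amounts
to roughly the following (a minimal sketch with illustrative variable
names, not the exact populate_scalar() code; the function names match
what I pushed, per below):

/* before: build a NUL-terminated copy purely for escape_json() */
char	   *tmp = palloc(len + 1);

memcpy(tmp, json, len);
tmp[len] = '\0';
escape_json(buf, tmp);
pfree(tmp);

/* after: the length-taking variant needs no terminator */
escape_json_with_len(buf, json, len);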

I made another pass over the 0001 and 0003 patches and, after a bit of
renaming, pushed the result.  I ended up keeping escape_json() as-is
and naming the new function escape_json_with_len().  The text version
is named escape_json_text(). I think I originally did it the other way
around because I thought I'd be able to adjust more locations than I
did. Having it this way around means slightly less churn.
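
For reference, the resulting family looks roughly like this (sketching
the signatures; the headers are authoritative):

void escape_json(StringInfo buf, const char *str);
void escape_json_with_len(StringInfo buf, const char *str, int len);
void escape_json_text(StringInfo buf, const text *txt);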

I did another round of testing on the SIMD patch (attached as v5-0001)
as I wondered if the SIMD loop maybe shouldn't wait too long before
copying the bytes to the destination string.  With a very large JSON
string, if we look ahead too far, then by the time we flush those
bytes out to the destination buffer we may already have started
evicting L1 cachelines for parts of the buffer that are still to be
flushed.  I put this to the test (test 3) and found that with a 1MB
JSON string it is faster to flush every 512 bytes than it is to only
flush after checking the entire 1MB.  With a 10kB JSON string (test
2), the extra code to flush every 512 bytes seems to slow things down.
I'm a bit undecided about whether the flushing is worthwhile or not.
It really depends on the length of the JSON strings we'd like to
optimise for. It might be possible to get the best of both, but I
think that would require manually implementing portions of
appendBinaryStringInfo(), and I'd rather not go there. Does anyone
have any thoughts about that?
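
To show the shape of the idea, here's a rough standalone sketch of the
scan-and-flush loop using raw SSE2 intrinsics. This is not the patch
itself; the function name and the omitted fallback handling are
illustrative, and it's only meant to show where the periodic flush
sits:

#include "postgres.h"

#include <emmintrin.h>			/* SSE2 */

#include "lib/stringinfo.h"

#define ESCAPE_JSON_MAX_LOOKHEAD 512	/* flush threshold tested below */

static void
escape_json_sketch(StringInfo buf, const char *str, int len)
{
	const char *copyptr = str;	/* start of the pending, unflushed clean run */
	int			i = 0;

	appendStringInfoCharMacro(buf, '"');

	/* vector loop: scan 16 bytes at a time while nothing needs escaping */
	while (i + (int) sizeof(__m128i) <= len)
	{
		__m128i		chunk = _mm_loadu_si128((const __m128i *) &str[i]);

		/* match '"', '\\' and control bytes (unsigned value <= 0x1F) */
		__m128i		quote = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('"'));
		__m128i		bslash = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('\\'));
		__m128i		ctrl = _mm_cmpeq_epi8(_mm_max_epu8(chunk, _mm_set1_epi8(0x1F)),
										  _mm_set1_epi8(0x1F));

		if (_mm_movemask_epi8(_mm_or_si128(_mm_or_si128(quote, bslash), ctrl)) != 0)
			break;				/* escape char in this chunk; go bytewise */

		i += (int) sizeof(__m128i);

		/*
		 * Flush the clean run every ESCAPE_JSON_MAX_LOOKHEAD bytes so the
		 * source bytes are still in L1 when they're copied, instead of
		 * accumulating the whole (possibly multi-MB) run first.
		 */
		if (&str[i] - copyptr >= ESCAPE_JSON_MAX_LOOKHEAD)
		{
			appendBinaryStringInfo(buf, copyptr, (int) (&str[i] - copyptr));
			copyptr = &str[i];
		}
	}

	/* copy whatever clean run remains */
	appendBinaryStringInfo(buf, copyptr, (int) (&str[i] - copyptr));

	/*
	 * Bytes from str[i] onwards (an escape char and/or a sub-16-byte tail)
	 * still need the scalar escaping loop, omitted here, followed by the
	 * closing '"'.
	 */
}

The cost of the early flushing is the extra branch per chunk and the
more frequent appendBinaryStringInfo() calls, which is presumably
what's slowing down the 10kB case.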

Test 2 (10KB) shows a ~261% performance increase without early
flushing, which drops to ~227% when flushing every 512 bytes. Test 3
(1MB) shows a ~99% increase without early flushing, which rises to
~156% when flushing every 512 bytes.

bench.sql: select row_to_json(j1)::jsonb from j1;

## Test 1 (variable JSON strings up to 1KB)
create table j1 (very_long_column_name_to_test_json_escape text);
insert into j1 select repeat('x', x) from generate_series(0,1024)x;
vacuum freeze j1;

master @ 17a5871d:
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared
postgres | grep tps; done
tps = 364.410386 (without initial connection time)
tps = 367.914165 (without initial connection time)
tps = 365.794513 (without initial connection time)

master + v5-0001
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared
postgres | grep tps; done
tps = 683.570613 (without initial connection time)
tps = 685.206578 (without initial connection time)
tps = 679.014056 (without initial connection time)

## Test 2 (10KB JSON strings)
create table j1 (very_long_column_name_to_test_json_escape text);
insert into j1 select repeat('x', 1024*10) from generate_series(0,1024)x;
vacuum freeze j1;

master @ 17a5871d:
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared
postgres | grep tps; done
tps = 23.872630 (without initial connection time)
tps = 26.232014 (without initial connection time)
tps = 26.495739 (without initial connection time)

master + v5-0001
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared
postgres | grep tps; done
tps = 96.813515 (without initial connection time)
tps = 96.023632 (without initial connection time)
tps = 99.630428 (without initial connection time)

master + v5-0001 ESCAPE_JSON_MAX_LOOKHEAD 512
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared
postgres | grep tps; done
tps = 83.597442 (without initial connection time)
tps = 85.045554 (without initial connection time)
tps = 82.105907 (without initial connection time)

## Test 3 (1MB JSON strings)
create table j1 (very_long_column_name_to_test_json_escape text);
insert into j1 select repeat('x', 1024*1024) from generate_series(0,10)x;
vacuum freeze j1;

master @ 17a5871d:
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared
postgres | grep tps; done
tps = 18.885922 (without initial connection time)
tps = 18.829701 (without initial connection time)
tps = 18.889369 (without initial connection time)

master + v5-0001
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared
postgres | grep tps; done
tps = 37.464967 (without initial connection time)
tps = 37.536676 (without initial connection time)
tps = 37.561387 (without initial connection time)

master + v5-0001 ESCAPE_JSON_MAX_LOOKHEAD 512
$ for i in {1..3}; do pgbench -n -f bench.sql -T 10 -M prepared
postgres | grep tps; done
tps = 48.296320 (without initial connection time)
tps = 48.118151 (without initial connection time)
tps = 48.507530 (without initial connection time)

David
