Обсуждение: Optimize JsonbContainerTypeName by reordering type checks

Поиск
Список
Период
Сортировка

Optimize JsonbContainerTypeName by reordering type checks

От
Chao Li
Дата:
Hi Hacker,

While reading jsonb related code, I found JsonbContainerTypeName() can 
be optimized. The function currently checks for the less common scalar 
container type before checking for objects and arrays.

This patch reorders the checks to prioritize the most common cases. The 
macros JsonContainerIsArray() and JsonContainerIsObject() are simple bit 
checks and are now evaluated first. This avoids the overhead of calling 
the JsonbExtractScalar() function in the vast majority of use cases.

I did the following test:

```

CREATE TABLE test_jsonb_types (
     id SERIAL PRIMARY KEY,
     data JSONB
);


INSERT INTO test_jsonb_types (data) VALUES
('{"name": "Alice", "age": 30}'),
('[1, 2, "three"]'),
('"hello world"'),
('12345'),
('true'),
('null');


evantest=# SELECT id, data, jsonb_typeof(data) AS data_type FROM 
test_jsonb_types;
  id |             data             | data_type
----+------------------------------+-----------
   1 | {"age": 30, "name": "Alice"} | object
   2 | [1, 2, "three"]              | array
   3 | "hello world"                | string
   4 | 12345                        | number
   5 | true                         | boolean
   6 | null                         | null
(6 rows)

```

Best regards,

--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Вложения

Re: Optimize JsonbContainerTypeName by reordering type checks

От
Chao Li
Дата:
Hi Hacker,

This is an innocent refactor that doesn't change logic and improve performance of jsonb_typeof() a little bit.

I have done a performance test that proves a tiny performance improvement:

With the master branch:
```
% pgbench -n --no-vacuum -T 120 -c 8 -j 4 -f test_jsonb_type.sql evantest
tps = 83290.213196 (without initial connection time)

# second run
tps = 83709.266846 (without initial connection time)

# third run
tps = 83531.014990 (without initial connection time)
```

With this patch:
```
% pgbench -n --no-vacuum -T 120 -c 8 -j 4 -f test_jsonb_type.sql evantest
tps = 84172.324715 (without initial connection time)

# second run
tps = 84082.619469 (without initial connection time)

# third run
tps = 84260.671778 (without initial connection time)
```

With 3 runs average tps, this patch makes ~0.8% improvement for jsonb_typeof().

My test script is attached, it can run against any database. And attached v2 patch file has a tiny change over v1.

Best regards,
Chao Li (Evan)
---------------------
HighGo Software Co., Ltd.
https://www.highgo.com/
Вложения

Re: Optimize JsonbContainerTypeName by reordering type checks

От
David Rowley
Дата:
On Tue, 9 Sept 2025 at 01:28, Chao Li <li.evan.chao@gmail.com> wrote:
> With 3 runs average tps, this patch makes ~0.8% improvement for jsonb_typeof().
>
> My test script is attached, it can run against any database. And attached v2 patch file has a tiny change over v1.

I'd say this is wishful thinking. You're saying that with the overhead
of planning, executor startup, most of executor run and executor
shutdown and subsequent cleanup, you're expecting a speedup from
swapping the order of operations in this function when you're calling
it once!? That could be true if the function was very expensive, but
this is not an expensive function. It'll amount to a few dozen
instructions for the function out of several 10s of millions, when you
consider the cost of processing the entire query.

To try to see if your patch makes any difference, I modified your test
to call the function 1 million times:

SELECT jsonb_typeof(
    CASE x%10
        WHEN 0 THEN '{"key1": "value"}'::jsonb
        WHEN 1 THEN '{"key2": "value"}'::jsonb
        WHEN 2 THEN '{"key3": "value"}'::jsonb
        WHEN 3 THEN '[1, 2, 3]'::jsonb
        WHEN 4 THEN '[1, 2, 3]'::jsonb
        ELSE '123'::jsonb
    END
) from generate_Series(1,1000000) x

Ran it 10 times on my Zen4 laptop:

$ echo master && for i in {1..10}; do pgbench -n -f
test_jsonb_type.sql -T 10 postgres | grep latency; done
master
latency average = 209.944 ms
latency average = 210.247 ms
latency average = 210.412 ms
latency average = 210.665 ms
latency average = 210.312 ms
latency average = 210.254 ms
latency average = 210.188 ms
latency average = 210.384 ms
latency average = 210.238 ms
latency average = 210.947 ms

$ echo patched && for i in {1..10}; do pgbench -n -f
test_jsonb_type.sql -T 10 postgres | grep latency; done
patched
latency average = 217.047 ms
latency average = 219.044 ms
latency average = 217.396 ms
latency average = 217.775 ms
latency average = 218.465 ms
latency average = 218.515 ms
latency average = 217.576 ms
latency average = 217.880 ms
latency average = 217.580 ms
latency average = 218.497 ms

That's 3.5% slower.

You may find it better to learn about "perf record" or even "perf
top".  These are useful tools to see where CPU time is being spent. If
you have a query you'd like to make faster, you might get some
indication of something that can be done to improve the situation with
these tools. They show which functions the CPU is spending the most
time in. Here's the perf top output in master for the query I posted
above:

  15.19%  postgres          [.] ExecInterpExpr
   4.67%  postgres          [.] AllocSetReset
   4.34%  postgres          [.] ExecMakeTableFunctionResult
   4.23%  postgres          [.] palloc0
   4.02%  postgres          [.] BufFileWrite
   3.92%  postgres          [.] BufFileReadCommon
   3.84%  postgres          [.] ExecScan
   3.27%  postgres          [.] heap_form_minimal_tuple
   3.05%  postgres          [.] AllocSetAlloc
   2.73%  postgres          [.] JsonbIteratorNext
   2.56%  postgres          [.] heap_fill_tuple
   2.41%  postgres          [.] tts_minimal_getsomeattrs
   2.27%  libc.so.6         [.] __memmove_avx_unaligned_erms
   2.23%  postgres          [.] ExecStoreMinimalTuple
   1.83%  postgres          [.] JsonbExtractScalar
   1.60%  postgres          [.] fill_val

JsonbExtractScalar isn't taking up much of the time, so it probably
isn't a very interesting area to focus optimisation work for this
query. You could maybe get rid of the CASE statement to remove some of
the ExecInterpExpr overhead, but still have my doubts if you'll ever
see this taking up a meaningful percentage of the overall CPU time.

David



Re: Optimize JsonbContainerTypeName by reordering type checks

От
Andreas Karlsson
Дата:
On 1/4/26 12:21 PM, David Rowley wrote:
> I'd say this is wishful thinking. You're saying that with the overhead
> of planning, executor startup, most of executor run and executor
> shutdown and subsequent cleanup, you're expecting a speedup from
> swapping the order of operations in this function when you're calling
> it once!? That could be true if the function was very expensive, but
> this is not an expensive function. It'll amount to a few dozen
> instructions for the function out of several 10s of millions, when you
> consider the cost of processing the entire query.

On top of this the idea the scalars are a less common thing is dubious. 
Why would that be? I would assume it depends on the input data. So even 
if it had been a speedup, which perf output does not seem to indicate 
that it would be, it would have slowed down some workloads while 
speeding up others.

Andreas