Hello hackers,
When playing with JSON_TABLE, I tried to replace tenk1 in regression tests
with a view based on JSON_TABLE, with the same content, and discovered
that for one sub-optimal query it's execution duration increased many-fold.
With the preparation script attached, I see the following durations
(for a build compiled by clang 18.1.3 with -O3):
explain (verbose, analyze)
select
(select max((select i.unique2 from tenk1 i where i.unique1 = o.unique1)))
from tenk1 o;
-- original tenk1
Execution Time: 4769.481 ms
explain (verbose, analyze)
select
(select max((select i.unique2 from jsonb_rs_tenk1 i where i.unique1 = o.unique1)))
from jsonb_rs_tenk1 o;
-- Function Call: jsonb_to_recordset...
Execution Time: 6841.767 ms
explain (verbose, analyze)
select
(select max((select i.unique2 from jsontable_tenk1 i where i.unique1 = o.unique1)))
from jsontable_tenk1 o;
-- Table Function Call: JSON_TABLE...
Execution Time: 288310.131 ms
(with 63% of time spent inside ExecEvalJsonExprPath())
Just for fun I've tested also XMLTABLE with the similar content:
explain (verbose, analyze)
select
(select max((select i.unique2 from xml_tenk1 i where i.unique1 = o.unique1)))
from xml_tenk1 o;
-- Table Function Call: XMLTABLE...
Execution Time: 1235066.636 ms
Maybe it's worth to add a note to the JSON_TABLE() documentation saying that
jsonb_to_recordset is (inherently?) more performant when processing arrays
of flat structures for users not to re-discover this fact...
Best regards,
Alexander