On 09/12/2014 08:52 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Sep 12, 2014 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> It's certainly possible that there is a test case for which Heikki's
>>> approach is superior, but if so we haven't seen it. And since its
>>> approach is also more complicated, sticking with the simpler
>>> lengths-only approach seems like the way to go.
>
>> Huh, OK. I'm slightly surprised, but that's why we benchmark these things.
>
> The argument for Heikki's patch was never that it would offer better
> performance; it's obvious (at least to me) that it won't.
Performance was one argument for sure. It's not hard to come up with a
case where the all-lengths approach is much slower: take a huge array
with, say, a million elements, and fetch the last element in a tight
loop. And do that in a PL/pgSQL function, without storing the datum to
disk, so that it doesn't get toasted. With lengths only, fetching the
Nth element means walking and summing the lengths of all the preceding
elements, whereas stored offsets let you jump straight to it. Not a
very common thing to do in real life, although something like that
might come up if you do a lot of JSON processing in PL/pgSQL.
IOW, something like this:
do $$
declare
  ja jsonb;
  i int4;
begin
  -- Build a 100000-element jsonb array.
  select json_agg(g) into ja from generate_series(1, 100000) g;
  -- Fetch an element near the end, in a tight loop.
  for i in 1..100000 loop
    perform ja ->> 90000;
  end loop;
end;
$$;
should perform much better with current git master or "my patch" than
with the all-lengths patch.
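For contrast, here's a sketch (untested, not part of any benchmark I've
run) of the cheap case: fetching the first element, which has no
preceding entries to walk, should be fast under either encoding:

do $$
declare
  ja jsonb;
  i int4;
begin
  select json_agg(g) into ja from generate_series(1, 100000) g;
  -- Element 0 has no preceding entries, so the lengths-only
  -- encoding doesn't need to sum anything to find it.
  for i in 1..100000 loop
    perform ja ->> 0;
  end loop;
end;
$$;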
I'm OK with going for the all-lengths approach anyway; it's simpler, and
working with huge arrays is hopefully not that common. But it's not a
completely open-and-shut case.
- Heikki