Обсуждение: jsonb access operators inefficiency

Поиск
Список
Период
Сортировка

jsonb access operators inefficiency

От
Teodor Sigaev
Дата:
Hi!

jsonb operators ->text, ->>text,->int, ->>int use inefficient methods to access
to needed field, proportional O(N/2). Attached patch suggests for text operators
O(log(N)) and for integer - O(1). The fuctions with fast access already are
existed in current code and are used in contains operation, for example.
Attached patch uses that functions instead of sequentual loop over object/array.
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: jsonb access operators inefficiency

От
Andrew Dunstan
Дата:
On 05/30/2014 09:35 AM, Teodor Sigaev wrote:
> Hi!
>
> jsonb operators ->text, ->>text,->int, ->>int use inefficient methods 
> to access to needed field, proportional O(N/2). Attached patch 
> suggests for text operators O(log(N)) and for integer - O(1). The 
> fuctions with fast access already are existed in current code and are 
> used in contains operation, for example. Attached patch uses that 
> functions instead of sequentual loop over object/array.

Teodor, thanks for this.

My only worry is this sort of code, which in a quick search I didn't 
find used elsewhere
   -                   (void) JsonbToCString(jtext, &tjb->root , -1);   -                   result =
cstring_to_text_with_len(jtext->data,  jtext->len);   +                   appendStringInfoSpaces(jtext, VARHDRSZ);   +
                (void) JsonbToCString(jtext, v->val.binary.data,   -1);   +   +                   result =
(text*)jtext->data;  +                   SET_VARSIZE(result, jtext->len);
 


If we're going to construct varlena objects inside a StringInfo, maybe 
we need a proper API for it. Isn't there a danger that data member of 
the StringInfo won't be properly aligned to allow us to do this? In any 
case, we should get most of the benefit of your patch without this 
"optimization".

cheers

andrew




Re: jsonb access operators inefficiency

От
Andrew Dunstan
Дата:
On 05/30/2014 11:08 AM, Andrew Dunstan wrote:
>
> On 05/30/2014 09:35 AM, Teodor Sigaev wrote:
>> Hi!
>>
>> jsonb operators ->text, ->>text,->int, ->>int use inefficient methods 
>> to access to needed field, proportional O(N/2). Attached patch 
>> suggests for text operators O(log(N)) and for integer - O(1). The 
>> fuctions with fast access already are existed in current code and are 
>> used in contains operation, for example. Attached patch uses that 
>> functions instead of sequentual loop over object/array.
>
> Teodor, thanks for this.
>
> My only worry is this sort of code, which in a quick search I didn't 
> find used elsewhere
>
>    -                   (void) JsonbToCString(jtext, &tjb->root , -1);
>    -                   result = cstring_to_text_with_len(jtext->data,
>    jtext->len);
>    +                   appendStringInfoSpaces(jtext, VARHDRSZ);
>    +                   (void) JsonbToCString(jtext, v->val.binary.data,
>    -1);
>    +
>    +                   result = (text*)jtext->data;
>    +                   SET_VARSIZE(result, jtext->len);
>
>
> If we're going to construct varlena objects inside a StringInfo, maybe 
> we need a proper API for it. Isn't there a danger that data member of 
> the StringInfo won't be properly aligned to allow us to do this? In 
> any case, we should get most of the benefit of your patch without this 
> "optimization".


I see that palloc.h says:
   The result of palloc() is always word-aligned


so maybe my alignment fear is misplaced. So my remaining question is 
whether this is OK stylistically.

cheers

andrew





Re: jsonb access operators inefficiency

От
Teodor Sigaev
Дата:
>
> If we're going to construct varlena objects inside a StringInfo, maybe
> we need a proper API for it. Isn't there a danger that data member of
> the StringInfo won't be properly aligned to allow us to do this? In any
> case, we should get most of the benefit of your patch without this
> "optimization".

I believe that StringInfo->data is palloc'ed, it means it's MAXALIGNed.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru                                      WWW:
http://www.sigaev.ru/



Re: jsonb access operators inefficiency

От
Teodor Sigaev
Дата:
> I see that palloc.h says:
>
>     The result of palloc() is always word-aligned

void *
repalloc(void *pointer, Size size)
{
...    /*     * Try to detect bogus pointers handed to us, poorly though we can.     * Presumably, a pointer that isn't
MAXALIGNEDisn't pointing at an     * allocated chunk.     */    Assert(pointer != NULL);    Assert(pointer == (void *)
MAXALIGN(pointer));
...


>
>
> so maybe my alignment fear is misplaced. So my remaining question is
> whether this is OK stylistically.

Something like this?
initStringInfoVarlena()/makeStringInfoVarlena()
{initStringInfo()appendStringInfoSpaces(jtext, VARHDRSZ);
}

char*
formStringInfoVarlena()
{SET_VARSIZE(jtext->data, jtext->len);return  jtext->data;
}

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru                                      WWW:
http://www.sigaev.ru/



Re: jsonb access operators inefficiency

От
Oleg Bartunov
Дата:
The patch really improves access performance to jsonb.  On the
delicious bookmarks I got 5 times better performance.Now jsonb
outperforms json on simple access  (slide 12 of pgcon presentation) by
103 times !

Oleg

On Fri, May 30, 2014 at 9:35 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
> Hi!
>
> jsonb operators ->text, ->>text,->int, ->>int use inefficient methods to
> access to needed field, proportional O(N/2). Attached patch suggests for
> text operators O(log(N)) and for integer - O(1). The fuctions with fast
> access already are existed in current code and are used in contains
> operation, for example. Attached patch uses that functions instead of
> sequentual loop over object/array.
> --
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                    WWW:
> http://www.sigaev.ru/
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



Re: jsonb access operators inefficiency

От
Andrew Dunstan
Дата:
On 05/30/2014 01:30 PM, Oleg Bartunov wrote:
> The patch really improves access performance to jsonb.  On the
> delicious bookmarks I got 5 times better performance.Now jsonb
> outperforms json on simple access  (slide 12 of pgcon presentation) by
> 103 times !
>
>

(Oleg, please try not to top-post)

The question is whether the speedup comes from the reduction in lookup 
times or the reduction in string copying. I have a strong suspicion that 
it's mostly the first, not the second.

cheers

andrew



Re: jsonb access operators inefficiency

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> If we're going to construct varlena objects inside a StringInfo, maybe 
> we need a proper API for it. Isn't there a danger that data member of 
> the StringInfo won't be properly aligned to allow us to do this? In any 
> case, we should get most of the benefit of your patch without this 
> "optimization".

As noted, the data buffer is maxaligned; but nonetheless I agree that
this is a serious stylistic abuse, and it's not buying much compared
to the rest of the patch.  I'd stick with the cstring_to_text_with_len
coding.

At the other end of the process, why are we using PG_GETARG_TEXT_P
and not PG_GETARG_TEXT_PP to avoid a "detoast" cycle on short-header
inputs?  The function body is using VARDATA_ANY/VARSIZE_ANY_EXHDR,
so it's already prepared for unaligned input.
        regards, tom lane



Re: jsonb access operators inefficiency

От
Teodor Sigaev
Дата:
> The question is whether the speedup comes from the reduction in lookup
> times or the reduction in string copying. I have a strong suspicion that
> it's mostly the first, not the second.

Agree with about Oleg's test, but some users put huge values into 
json... Actually, I don't insist, ITSM, that's easy way to prevent extra 
copying.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru                                      WWW:
http://www.sigaev.ru/



Re: jsonb access operators inefficiency

От
Andrew Dunstan
Дата:
On 05/30/2014 01:41 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> If we're going to construct varlena objects inside a StringInfo, maybe
>> we need a proper API for it. Isn't there a danger that data member of
>> the StringInfo won't be properly aligned to allow us to do this? In any
>> case, we should get most of the benefit of your patch without this
>> "optimization".
> As noted, the data buffer is maxaligned; but nonetheless I agree that
> this is a serious stylistic abuse, and it's not buying much compared
> to the rest of the patch.  I'd stick with the cstring_to_text_with_len
> coding.
>
> At the other end of the process, why are we using PG_GETARG_TEXT_P
> and not PG_GETARG_TEXT_PP to avoid a "detoast" cycle on short-header
> inputs?  The function body is using VARDATA_ANY/VARSIZE_ANY_EXHDR,
> so it's already prepared for unaligned input.
>
>             


Committed with these changes.

cheers

andrew