Re: [COMMITTERS] pgsql: Clean up jsonb code.

From: Alexander Korotkov
Subject: Re: [COMMITTERS] pgsql: Clean up jsonb code.
Date:
Msg-id: CAPpHfdvhWPm82t6T3a-dZAYM1oVoo+kK4R_5T_HqZ4cqCUGEgg@mail.gmail.com
In reply to: Re: [COMMITTERS] pgsql: Clean up jsonb code.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: [COMMITTERS] pgsql: Clean up jsonb code.
           Re: [COMMITTERS] pgsql: Clean up jsonb code.
List: pgsql-hackers
On Thu, May 8, 2014 at 11:45 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 05/08/2014 02:25 AM, Peter Geoghegan wrote:
>> findJsonbValueFromSuperHeader()'s lowbound argument
>> previously served to establish a low bound for searching when
>> searching for multiple keys (so the second and subsequent
>> user-supplied key could skip much of the object).

> Got that.

>> In the case of
>> jsonb_exists_any(), say, if you only have a reasonable expectation
>> that about 1 key exists, and that happens to be the last key that the
>> user passed to the text[] argument (to the existence/? operator), then
>> n - 1 calls to what is now findJsonbValueFromContainer() (which now
>> does not accept a lowbound) are wasted.

> Check.

>> That's elem_count - 1
>> top-level binary searches of the entire jsonb. Or elem_count such
>> calls rather than 1 call (plus 1 sort of the supplied array) in the
>> common case where jsonb_exists_any() will return false.

> Ok, but I don't see any big difference in that regard. Before this commit, findJsonbValueFromContainer() was still called elem_count times. Each call was somewhat cheaper, because the lower bound of the binary search was initialized to where the previous search ended, but you still had to perform the search.
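The lowbound trick being discussed can be sketched roughly as follows. This is an illustrative stand-alone sketch, not PostgreSQL's actual code: `find_with_lowbound` is a hypothetical name, and a flat sorted string array stands in for the jsonb container. The point is that when the lookup keys are themselves sorted, each search can start where the previous one ended:

```c
#include <assert.h>
#include <string.h>

/*
 * Binary search for key in the sorted array keys[*lo .. n), returning its
 * index or -1.  On success, *lo is advanced past the match; on failure it
 * records where the search converged.  When the lookup keys are supplied
 * in sorted order, each subsequent search therefore scans a shrinking
 * suffix of the container instead of the whole thing.
 */
int
find_with_lowbound(const char *const *keys, int n, const char *key, int *lo)
{
    int left = *lo;
    int right = n;

    while (left < right)
    {
        int mid = left + (right - left) / 2;
        int cmp = strcmp(keys[mid], key);

        if (cmp < 0)
            left = mid + 1;
        else if (cmp > 0)
            right = mid;
        else
        {
            *lo = mid + 1;      /* the next (larger) key must be to the right */
            return mid;
        }
    }
    *lo = left;                 /* remember where this search gave up */
    return -1;
}
```

As Heikki notes, this only shrinks each search's range; it does not change the number of searches performed.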

>> Granted, that might not be that bad right now, given that it's only
>> ever (say) elem_count or elem_count - 1 wasted binary searches through
>> the *top* level, but that might not always be true.

> If we are ever to perform a deep search, I think we'll want to do much more to optimize it than just keep track of the lower bound. For example, start an iterator over the tree and check for all of the keys in one go.
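The flat-level analogue of that "all keys in one go" idea is a merge-style scan over two sorted key lists. The sketch below is hypothetical (`count_matches` is not a PostgreSQL function, and the real jsonb case would need a tree iterator rather than flat arrays), but it shows why a single pass can beat repeated binary searches: each list is advanced at most once, for O(n + m) comparisons total.

```c
#include <assert.h>
#include <string.h>

/*
 * Count how many of the sorted lookup keys appear in the sorted container
 * keys, in one merge-style pass over both lists.
 */
int
count_matches(const char *const *container, int n,
              const char *const *lookup, int m)
{
    int i = 0, j = 0, found = 0;

    while (i < n && j < m)
    {
        int cmp = strcmp(container[i], lookup[j]);

        if (cmp < 0)
            i++;                /* container key too small, skip it */
        else if (cmp > 0)
            j++;                /* lookup key absent from container */
        else
        {
            found++;            /* match: advance both lists */
            i++;
            j++;
        }
    }
    return found;
}
```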

>> And even today,
>> sorting a presumably much smaller user-passed lookup array once has to
>> be cheaper than searching through the entire jsonb perhaps elem_count
>> times per call.

> Well, the quick testing I did suggested otherwise. It's not a big difference, but sorting has all kinds of overhead, like pallocing the array to sort, copying the elements around etc. For a small array, the startup cost of sorting trumps the savings in the binary searches. Possibly the way the sorting was done was not optimal, and you could reduce the copying and allocations involved in that, but it's hardly worth the trouble.

With current head I can't load the delicious dataset into jsonb format: I get a segfault. It looks like memory corruption.

$ gzip -c -d js.copy.gz | psql postgres -c 'copy js from stdin;'
WARNING:  problem in alloc set ExprContext: bogus aset link in block 0x14a846000, chunk 0x14a84d278
CONTEXT:  COPY js, line 246766
WARNING:  problem in alloc set ExprContext: bad single-chunk 0x14a804b18 in block 0x14a7e6000
CONTEXT:  COPY js, line 1009820
WARNING:  problem in alloc set ExprContext: bogus aset link in block 0x14a7e6000, chunk 0x14a804b18
CONTEXT:  COPY js, line 1009820
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

You can get the dataset here:

------
With best regards,
Alexander Korotkov.   
