Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Дата
Msg-id 28554.1399414853@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)  ("David E. Wheeler" <david@justatheory.com>)
Ответы Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Список pgsql-hackers
"David E. Wheeler" <david@justatheory.com> writes:
> On May 6, 2014, at 2:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Well, then, we only have a few days to come up with a name.

> What are the options?

We have no proposals as yet.

I've been looking at the source code to try to understand the difference
between the two opclasses (and BTW I concur with the opinions expressed
recently about the poor state of the internal documentation for jsonb).
If I've got it straight:

jsonb_ops indexes keys and values separately, so for instance "{xyz: 2}"
would give rise to GIN entries that are effectively the strings "Kxyz"
and "V2".  If you're looking for tuples containing "{xyz: 2}" then you
would be looking for the AND of those independent index entries, which
fortunately GIN is pretty good at computing.  But you could also look
for just keys or just values.

jsonb_hash_ops creates an index entry only for values, but what it
stores is a hash of both the value and the key it's stored under.
So in this example you'd get a hash combining "xyz" and "2".  This
means the only type of query you can perform is like "find JSON tuples
containing {xyz: 2}".

Because jsonb_ops stores the *whole* value, you can do lossless index
searches (no recheck needed on the heap tuple), but you also run the
risk of long strings failing to fit into an index entry.  Since jsonb_ops
reduces everything to a hash, there's no possibility of index failure,
but all queries are lossy and require recheck.

TBH, at this point I'm sort of agreeing with the thought expressed
upthread that maybe neither of these should be the default as-is.
They seem like rather arbitrary combinations of choices.  In particular
I wonder why there's not an option to store keys and values separately,
but as hashes not as the original strings, so that indexability of
everything could be guaranteed.  Or a variant of that might be to hash
only strings that are too large to fit in an index entry, and force
recheck only when searching for a string that needed hashing.

I wonder whether the most effective use of time at this point
wouldn't be to fix jsonb_ops to do that, rather than arguing about
what to rename it to.  If it didn't have the failure-for-long-strings
problem I doubt anybody would be unhappy about making it the default.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [COMMITTERS] pgsql: pgindent run for 9.4
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [COMMITTERS] pgsql: pgindent run for 9.4