Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Дата
Msg-id CAM3SWZTerM-f8nhmuatR-YhPiMwrqR01=TwtCNM1E9PERkvAAA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Список pgsql-hackers
On Sat, May 10, 2014 at 1:42 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> For example, if you have an object like '{"foo": {"bar": 123 } }', one will
> index "foo", "foo->bar", and "foo->bar->123" while the other will index
> "foo", "bar" and "123".

That isn't quite right, if we're talking about the user's perspective,
and concerns about what is made indexable. You cannot actually query
"foo->bar" there, because containment semantics don't support it.
Basically, you'd have to write that like this:

select * from docs where jdoc @> '{"foo":{"bar":*}'::jsonb;

I've added a wildcard to the rhs jsonb here, which of course won't
work, the proximate cause being that that simply isn't valid jsonb.
It's also something inherently impossible to support with the current
jsonb_hash_op's indexing strategy. That only hashes elements and
values, mixing in keys from all outer nesting levels (so it's possible
for there to be 0 *nentries). In the strict physical sense, it only
indexes "123" (with all keys mixed in to the hash) from your example,
because that's the only element or value that appears.

Your description of only indexing the "leaf level" is not ideal,
because it makes me think of B-Trees. Unlike B-Trees, jsonb can and
frequently will have jsonb_hash_ops-indexable primitive
values/elements at all nesting levels (that is, both "inner"
containers and "leaf" containers).

> Whether the opclasses use hashing to shorten the key is an orthogonal
> property, and IMHO not as important. To reflect that, I suggest that we name
> the opclasses:
>
> json_path_ops
> json_value_ops
>
> or something along those lines.

I would like to once again emphasize the user-level distinction
between the two: one (the text storage opclass) is better for
hstore-style indexing, where jsonb data is heterogeneous in structure,
most probably a hodge-podge of key/value pairs. The other (the hash
storage opclass) is better for document-database style use cases,
where keys alone have low selectivity, and there is an
unenforced/implicit schema that is mostly adhered to by client
applications. That's why I don't think it's much of a problem that the
example query above won't work (you could do something with
expressions, or expression indexes to make something work, but it
won't work with the text-storage opclass because there is no
appropriate operator). In general, for those document-database style
use cases, indexing keys alone isn't useful. Wherever that turns out
to be untrue, ad-hoc expression indexes of the text storage opclass
are probably the best solution.

Anyway, I agree with your general assessment; hashing is nothing more
than an implementation detail. I'm not sure which of your proposed
names is intended for which opclass, though. Do you mean
"jsonb_path_ops" and "jsonb_key_ops"? That makes perfect sense to me,
because indexing keys alone is the main user-level distinction.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: New pg_lsn type doesn't have hash/btree opclasses
Следующее
От: Tom Lane
Дата:
Сообщение: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)