Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Дата
Msg-id CAM3SWZS-12EHfqvDzTbtkEsQHfVLJxx8XP1=jrCXK2wh9uKW-Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> As the code stands, you don't have a choice on any of those things. The
> decisions have been made by us, PostgreSQL developers. The only choice you
> have is between jsonb_ops and jsonb_hash_ops, with a strange combination of
> tradeoffs in both. Sure, they're still useful, if not optimal, for a
> wide-range of applications. For more complicated cases, you will have to
> resort to expression indexes. It bugs me greatly that the underlying indexam
> could do all those things, we're just not exposing the capability.

Why would you ever not have to use expression indexes? Idiomatic usage
of jsonb involves expression indexes because it's desirable to index
only a expression. People will want to do things like only index the
nested "tags" array far more frequently then they'll only want to
index keys (that is, Object pair keys) in the entire document. I don't
get why you'd say that they'd "resort" to expression indexes, like
they're a kludge. Have you ever tried out one of the new document
databases? I suggest you do. Expression indexes on jsonb map pretty
closely onto how you're frequently expected to index data in those
systems. That's something that they make heavy use of. Why would you
ever not really have to consider ahead of time what is important
enough to be indexed, and what is not?

> ISTM we need a way to parameterize opclasses, so that when you create the
> index, you specify the above things.

That would be nice.

> In the absence of parameterizable opclasses, it would be much more flexible
> to have opclasses that index, keys, values, key-value pairs and paths
> separately, instead of the current json_ops and json_hash_ops opclasses
> which index all of those in the same index. That way, if you only e.g. ever
> query on the existence of a key, you'd only need to index the keys.

I think only ever needing to index the keys is not a common use-case.
It pretty much makes exactly as much sense to do so as it would with
hstore, and yet hstore doesn't support that after all these years.

> I don't understand how we ended up with the current dichotomy of json_ops
> and json_hash_ops...

It makes sense if you consider jsonb_ops best suited to simpler
hstore-style indexing, while jsonb_hash_ops is best suited to testing
containment of JSON documents, potentially with lots of nesting. These
documents are typically homogeneous in structure. Idiomatic usage of
systems like MongoDB involves "collections" of fairly homogeneous
documents. If there is a lot of variability in their structure within
a collection, the collection more or less becomes impossible to
usefully query. They aim to be flexible, but still implicitly require
you to insert data with a half-way sensible/consistent structure. This
makes separately indexing the keys less than compelling as a default,
because there is so much duplication of keys in practice.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: libpq api wart: no PQconnect variant that can consume output of PQconninfoParse(...)
Следующее
От: Nicolas Barbier
Дата:
Сообщение: Re: Proposal for Merge Join for Non '=' Operators