Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Дата
Msg-id 5344EAA4.1050605@vmware.com
обсуждение исходный текст
Ответ на Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)  (Andrew Dunstan <andrew@dunslane.net>)
Ответы Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)  (Peter Geoghegan <pg@heroku.com>)
Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)  (Alexander Korotkov <aekorotkov@gmail.com>)
Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 04/09/2014 01:18 AM, Andrew Dunstan wrote:
>
> On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
>> On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, let me see if I understand the situation correctly:
>>>
>>> * jsonb_ops supports more operators
>>>
>>> * jsonb_hash_ops produces smaller, better-performing indexes
>>>
>>> * jsonb_ops falls over on inputs with wide field values, but
>>> jsonb_hash_ops does not
>> There might be some compelling cases for indexing existence rather
>> than containment, since the recheck flag isn't set there, but in
>> general this summary seems sound. I would say that broadly, existence
>> is a less useful operator than containment, and so jsonb_hash_ops is
>> broadly more compelling. I didn't propose changing the default due to
>> concerns about the POLA, but I'm happy to be told that those concerns
>> were out of proportion to the practical benefits of a different
>> default.
>
> I tend to agree with Tom that POLA will be more violated by the default
> ops class not being able to index some values.

Yeah.

<rant>

Both of the operator classes are actually much less flexible than I'd 
like. Firstly, they index everything. In many cases, that's not what you 
want, so you end up with much larger indexes than necessary. Secondly, 
jsonb_ops indexes all values separately from the keys. That makes the 
index pretty much useless for a query on, say, WHERE json @> 
'{"needs_processing":true}', if all the rows also contain a key-value 
pair "active":true. Thirdly, inequality operators are not supported; you 
can't search for rows with (the json-syntax equivalent of) "price < 
12.3". Fourthly, sometimes you would want to include the "path" to an 
entry in the key, sometimes not.

If I understood correctly the way jsonb_hash_ops works, the limitation 
compared to jsonb_ops is that it cannot be used for foo ? 'bar' type 
queries. And the reason for that limitation is that it hashes the whole 
path to the key; the naked values are not indexes separately. But why 
not? jsonb_ops does - why is that decision related to whether you hash 
or not? Or it could index both. Sure, it would be wasteful when you 
don't need to support foo ? 'bar', but the point is that it should be up 
to the DBA to decide, based on his needs.

As the code stands, you don't have a choice on any of those things. The 
decisions have been made by us, PostgreSQL developers. The only choice 
you have is between jsonb_ops and jsonb_hash_ops, with a strange 
combination of tradeoffs in both. Sure, they're still useful, if not 
optimal, for a wide-range of applications. For more complicated cases, 
you will have to resort to expression indexes. It bugs me greatly that 
the underlying indexam could do all those things, we're just not 
exposing the capability.

ISTM we need a way to parameterize opclasses, so that when you create 
the index, you specify the above things.

</rant>

The ship has cleatly sailed to add parameterized opclasses to 9.4, but 
let's keep it in mind when we decide on the defaults.

In the absence of parameterizable opclasses, it would be much more 
flexible to have opclasses that index, keys, values, key-value pairs and 
paths separately, instead of the current json_ops and json_hash_ops 
opclasses which index all of those in the same index. That way, if you 
only e.g. ever query on the existence of a key, you'd only need to index 
the keys.

I don't understand how we ended up with the current dichotomy of 
json_ops and json_hash_ops...

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sergey Muraviov
Дата:
Сообщение: Re: Problem with displaying "wide" tables in psql
Следующее
От: Ashutosh Bapat
Дата:
Сообщение: Pointer to structure in ECPG