Re: working around JSONB's lack of stats?

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: working around JSONB's lack of stats?
Дата
Msg-id 54CC3130.5030604@BlueTreble.com
обсуждение исходный текст
Ответ на Re: working around JSONB's lack of stats?  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-performance
On 1/30/15 2:26 PM, Josh Berkus wrote:
> On 01/28/2015 03:50 PM, Peter Geoghegan wrote:
>> On Wed, Jan 28, 2015 at 3:42 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> jsonb_col @> '[ "key1" ]'
>>> or jsonb_col ? 'key1'
>>>          if in MCE, assign % from MCE
>>>          otherwise assign 1% of non-MCE %
>>>
>>> jsonb_col @> '{ "key1": "value1" }'
>>>          if in MCE, assign MCE% * 0.1
>>>          otherwise assign 0.01 of non-MCE %
>>>
>>> Does that make sense?
>>
>> I suspect it makes a lot less sense. The way people seem to want to
>> use jsonb is as a document store with a bit of flexibility. Individual
>> JSON documents tend to be fairly homogeneous in structure within a
>> table, just like with systems like MongoDB. Strings within arrays are
>> keys for our purposes, and these are often used for tags and so on.
>> But Strings that are the key of an object/pair are much less useful to
>> index, in my estimation.
>
> Yeah, I see your point; except for arrays, people are usually searching
> for a key:value pair, and the existence of the key is not in doubt.
>
> That would make the "element" the key:value pair, no?  But
> realistically, we would only want to do that for simple keys and values.
>
> Although: if you "flatten" a nested JSON structure into just keys with
> scalar values (and array items as their own thing), then you could have
> a series of expanded key:value pairs to put into MCE.
>
> For example:
>
> { house : { city : San Francisco,
>       sqft: 1200,
>       color: blue,
>       occupants: [ mom, dad, child1 ]
>       }
>    occupation: programmer
> }
>
> ... would get flattened out into the following pairs:
>
> city: san francisco
> sqft: 1200
> color: blue
> occupants: [ mom ]
> occupants: [ dad ]
> occupants: [ child1 ]
> occupation: programmer
>
> This would probably work because there aren't a lot of data structures
> where people would have the same key:value pair in different locations
> in the JSON, and care about it stats-wise.  Alternatetly, if the same
> key-value pair appears multiple times in the same sample row, we could
> cut the MC% by that multiple.

Even if there were multiple occurrences, this would probably still be an
improvement.

Another idea... at one time in the past when discussing statistics on
multiple columns, one idea was to build statistics on indexes. If we
built that, we could also do the same thing for at least JSONB (not sure
about JSON). Obviously doesn't help for stuff you haven't indexed, but
presumably if you care about performance and have any significant size
of data you've also indexed parts of the JSON, yes?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


В списке pgsql-performance по дате отправления:

Предыдущее
От: Jim Nasby
Дата:
Сообщение: Re: why pg_class.relfrozenxid needs to be updated for frozen tables (where all rows have xmin=2)?
Следующее
От: Jim Nasby
Дата:
Сообщение: Re: Query performance