Re: Collecting statistics about contents of JSONB columns

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Collecting statistics about contents of JSONB columns
Дата
Msg-id CAM-w4HNeUQu0Dvh=-fd+h2ywZEfnxWpLDxHb8HjvqkMPevkekA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Collecting statistics about contents of JSONB columns  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On Thu, 6 Jan 2022 at 14:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
> Not sure I understand. I wasn't suggesting any user-defined filtering,
> but something done by default, similarly to what we do for regular MCV
> lists, based on frequency. We'd include frequent paths while excluding
> rare ones.
>
> So no need for a user interface.

Not sure but I think he was agreeing with you. That we should figure
out the baseline behaviour and get it as useful as possible first then
later look at adding some way to customize it. I agree -- I don't
think the user interface will be hard technically but I think it will
require some good ideas and there could be lots of bikeshedding. And a
lot of users will never even use it anyways so it's important to get
the defaults as useful as possible.

> Similarly for the non-scalar values - I don't think we can really keep
> regular statistics on such values (for the same reason why it's not
> enough for whole JSONB columns), so why to build/store that anyway.

For a default behaviour I wonder if it wouldn't be better to just
flatten and extract all the scalars. So if there's no matching path
then at least we have some way to estimate how often a scalar appears
anywhere in the json document.

That amounts to assuming the user knows the right path to find a given
scalar and there isn't a lot of overlap between keys. So it would at
least do something useful if you have something like {gender: female,
name: {first: nancy, last: reagan], state: california, country: usa}.
It might get things slightly wrong if you have some people named
"georgia" or have names that can be first or last names.

But it would generally be doing something more or less useful as long
as they look for "usa" in the country field and "male" in the gender
field. If they looked for "male" in $.name.first path it would give
bad estimates but assuming they know their data structure they won't
be doing that.

-- 
greg



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Blake, Geoff"
Дата:
Сообщение: Re: Add spin_delay() implementation for Arm in s_lock.h
Следующее
От: Robert Haas
Дата:
Сообщение: Re: refactoring basebackup.c