Re: PATCH: add support for IN and @> in functional-dependencystatistics use

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: PATCH: add support for IN and @> in functional-dependencystatistics use
Дата
Msg-id CAEZATCXnQtw-ZytE+rp1Uby9KMs0hWim-unMa-nMdmS5MMgtbQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PATCH: add support for IN and @> in functional-dependencystatistics use  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: PATCH: add support for IN and @> in functional-dependencystatistics use  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
[ For the sake of the archives, some of the discussion on the other
thread [1-3] should really have been on this thread. ]

On Sun, 2 Feb 2020 at 18:41, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I think the challenge here is in applying the functional dependency
> computed for the whole array to individual elements. I'm not sure we can
> do that.
>
> For example, with a table like this:
>
>    CREATE TABLE t (a int, b int[]);
>    CREATE STATISTICS s (dependencies) ON a, b FROM t;
>
> Let's say the functional dependency is "perfect" i.e. has strength 1.0.
> But that only tells us dependency for complete array values, we don't
> know how much information we gain by knowledge of subset of the values.
>

The more I think about this example, the more I think this is really
just a special case of the more general problem of compatibility of
clauses. Once you add support for IN (...) clauses, any query of the
form

  SELECT ... WHERE (any clauses on col a) AND (any clauses on col b)

can be recast as

  SELECT ... WHERE a IN (...) AND b IN (...)

so any counter-example with bad estimates produced with a query in the
first form can also be written in the second form.

I think we should really be thinking in terms of making a strong
functional dependency (a => b) applicable generally to queries in the
first form, which will work well if the clauses on b are compatible
with those on b, but not if they're incompatible. However, that's not
so very different from the current state without extended stats, which
assumes independence, and will return poor estimates if the
columns/clauses aren't independent.

So I'd be tempted to apply a tidied up version of the patch from [3],
and then lift all restrictions from dependency_is_compatible_clause(),
other than the requirement that the clause refer to a single variable.

Regards,
Dean

[1] https://www.postgresql.org/message-id/CAEZATCXaNFZyOhR4XXAfkvj1tibRBEjje6ZbXwqWUB_tqbH%3Drw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20200309181915.5lxhuw2qxoihfoqo%40development
[3] https://www.postgresql.org/message-id/CAEZATCUic8PwhTnexC%2BUx-Z_e5MhWD-8jk%3DJ1MtnVW8TJD%2BVHw%40mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "曾文旌(义从)"
Дата:
Сообщение: Re: [Proposal] Global temporary tables
Следующее
От: Julien Rouhaud
Дата:
Сообщение: Re: Planning counters in pg_stat_statements (using pgss_store)