Re: strange row count estimates with conditions on multiple column

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: strange row count estimates with conditions on multiple column
Дата
Msg-id 13285.1289962995@sss.pgh.pa.us
обсуждение исходный текст
Ответ на strange row count estimates with conditions on multiple column  (Tomas Vondra <tv@fuzzy.cz>)
Ответы Re: strange row count estimates with conditions on multiple column  (Tomas Vondra <tv@fuzzy.cz>)
Список pgsql-general
Tomas Vondra <tv@fuzzy.cz> writes:
> Results from (A) and (B) seem strange to me because AFAIK there are no
> multi-column statistics available, and accoring to this thread

> http://archives.postgresql.org/pgsql-hackers/2009-03/msg00052.php

> the single-column estimates are not multiplied (which would be OK only
> in case of statistically independent columns).

You're misreading that thread: it's discussing row inequality
comparisons, as in your example (D).  Row equality comparisons are the
same as a bunch of per-column equality comparisons, which is why (A) and
(B) behave the same, and for that you *will* get a multiplication of the
assumed-independent clause selectivities.

> I'm not quite sure why (C) has an estimate of 1.

It's smart enough to see that each of the clauses is a range constraint
on the variable, so you get fairly tight estimates on the number of
matches ... and then those two small selectivities are multiplied
together.  It does not however notice that the range bounds are actually
equal, which would allow it to convert the estimate to a simple equality
estimate, which in many cases (including this one) would be better.
I think we've discussed special-casing that, but it doesn't look like
anybody got around to it yet.  It's a little bit tricky to do because
the range estimator doesn't really distinguish < from <= --- which
normally doesn't matter a lot, but it does when you're considering
"x >= 33 and x <= 33" versus "x > 33 and x < 33".

> And I do have exactly the same problem with the estimate in (D). Where
> the heck did 227232 come from?

It doesn't recognize that this case is a range comparison (which was a
point made in the thread you cited).  So you get a dumb multiplication
of the selectivities for col_a >= 33 and col_a <= 33.

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Glen Parker
Дата:
Сообщение: Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Survey on backing up unlogged tables: help us with PostgreSQL development!