Re: improving GROUP BY estimation

Поиск

Список

Период

Сортировка

От	Mark Dilger
Тема	Re: improving GROUP BY estimation
Дата	26 февраля 2016 г. 02:16:16
Msg-id	03A3B58D-6A2D-44FE-9B03-E21948CFCBAA@gmail.com обсуждение исходный текст
Ответ на	improving GROUP BY estimation (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы	Re: improving GROUP BY estimation Re: improving GROUP BY estimation
Список	pgsql-hackers

Дерево обсуждения

> On Feb 23, 2016, at 5:12 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> <snip>
>
> So much better. Clearly, there are cases where this will over-estimate the cardinality - for example when the values
aresomehow correlated. 
>

I applied your patch, which caused a few regression tests to fail.  Attached
is a patch that includes the necessary changes to the expected test results.

It is not hard to write test cases where your patched version overestimates
the number of rows by a very similar factor as the old code underestimates
them.  My very first test, which was not specifically designed to demonstrate
this, happens to be one such example:

CREATE TABLE t (a INT, b int);
INSERT INTO t SELECT sqrt(gs)::int, gs FROM generate_series(1,10000000) gs;
ANALYZE t;
EXPLAIN SELECT a FROM t WHERE b < 1000 GROUP BY a;
                          QUERY PLAN
---------------------------------------------------------------
 HashAggregate  (cost=169250.21..169258.71 rows=850 width=4)
   Group Key: a
   ->  Seq Scan on t  (cost=0.00..169247.71 rows=1000 width=4)
         Filter: (b < 1000)
(4 rows)

SELECT COUNT(*) FROM (SELECT a FROM t WHERE b < 1000 GROUP BY a) AS ss;
 count
-------
    32
(1 row)

So, it estimates 850 rows where only 32 are returned .  Without applying your patch,
it estimates just 1 row where 32 are returned.  That's an overestimate of roughly 26 times,
rather than an underestimate of 32 times.

As a patch review, I'd say that your patch does what you claim it does, and it applies
cleanly, and passes the regression tests with my included modifications.  I think there
needs to be some discussion on the list about whether the patch is a good idea.

Mark Dilger

Вложения

estimate-num-groups-v2.txt

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Julien Rouhaud
Дата: 26 февраля 2016 г., 01:56:21
Сообщение: Re: create opclass documentation outdated

Следующее

От: Masahiko Sawada
Дата: 26 февраля 2016 г., 02:53:39
Сообщение: Re: Support for N synchronous standby servers - take 2

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: improving GROUP BY estimation

Вложения

Предыдущее

Следующее