Re: [PERFORM] Cheaper subquery scan not considered unless offset 0

From: Tom Lane
Subject: Re: [PERFORM] Cheaper subquery scan not considered unless offset 0
Msg-id: 29740.1509289080@sss.pgh.pa.us
In reply to: [PERFORM] Cheaper subquery scan not considered unless offset 0  (Benjamin Coutu <ben.coutu@zeyos.com>)
List: pgsql-performance
Benjamin Coutu <ben.coutu@zeyos.com> writes:
> Please consider the following three semantically equivalent, but differently written queries:
> ...
> Queries A + B generate the same plan and execute as follows:

>         ->  Finalize HashAggregate  (cost=32879.78..33102.62 rows=22285 width=12) (actual time=450.724..458.667 rows=15521 loops=1)
>               Group Key: b.item
>               Filter: (sum(b.amount) >= '1'::double precision)
>               Rows Removed by Filter: 48277

> Plan C though, thanks to the "offset optimization fence", executes the following, more efficient plan:

>   ->  Subquery Scan on c  (cost=32768.35..33269.76 rows=7428 width=12) (actual time=456.591..475.204 rows=15521 loops=1 total=475.204)
>         Filter: (c.stock >= '1'::double precision)
>         Rows Removed by Filter: 48277
>         ->  Finalize HashAggregate  (cost=32768.35..32991.20 rows=22285 width=12) (actual time=456.582..468.124 rows=63798 loops=1 total=468.124)
>               Group Key: b.item
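
The queries themselves are elided in the quote above; as a purely hypothetical sketch inferred from the identifiers visible in the plans (table b, columns item and amount, aggregate alias stock), the three shapes were presumably along these lines:

    -- A: post-aggregation filter written as HAVING
    SELECT item, sum(amount) AS stock
      FROM b
      GROUP BY item
      HAVING sum(amount) >= 1;

    -- B: the same filter written against an aggregating subquery;
    -- the planner pulls the subquery up, so this plans exactly like A
    SELECT c.* FROM
      (SELECT item, sum(amount) AS stock FROM b GROUP BY item) c
      WHERE c.stock >= 1;

    -- C: as B, but OFFSET 0 is an optimization fence; the subquery is
    -- not pulled up and the filter stays on the Subquery Scan
    SELECT c.* FROM
      (SELECT item, sum(amount) AS stock FROM b GROUP BY item OFFSET 0) c
      WHERE c.stock >= 1;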

Huh.  So we can see that the grouping step produces 63798 rows in reality,
of which 15521 pass the >= filter condition.  In Plan C, the planner
estimates the total number of group rows at 22285; then, having no
information about the statistics of c.stock, it uses DEFAULT_INEQ_SEL
(0.333) as the filter selectivity estimate, arriving at 7428 as the
estimated number of result rows for the subquery.
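
As a cross-check on that arithmetic: DEFAULT_INEQ_SEL is defined as 0.3333333333333333 (src/include/utils/selfuncs.h), and 22285 * 0.3333333333333333 = 7428.33, which rounds to the 7428 shown in the plan.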

In Plan A+B, the planner presumably estimated the number of group rows at
22285 as well, but then it comes up with 22285 as the overall result.
Uh, what about the HAVING?

Evidently, the difference between 7428 and 22285 estimated rows out of
the subquery is enough to prompt a change in join plan for this query.
Since the true number is in between, it's just luck that Plan C is faster.
I don't put any great amount of stock in one join plan or the other
having been chosen for this case based on those estimates.

But ... what about the HAVING?  I took a quick look around and couldn't
find anyplace where the selectivity of an aggregate's filter condition
gets accounted for, which explains this observed behavior.  That seems
like a big oversight :-(
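
For anyone who wants to reproduce the estimation behavior, here is a minimal sketch against a made-up table (the table, data, and group count are assumptions; the point is the row estimates in the EXPLAIN output, not the query results):

    CREATE TABLE b (item int, amount float8);
    INSERT INTO b
      SELECT i % 1000, random() FROM generate_series(1, 100000) i;
    ANALYZE b;

    -- HAVING form: with the oversight described above, the aggregate
    -- node's row estimate stays at the group-count estimate (~1000),
    -- i.e. the HAVING qual is effectively given selectivity 1.0
    EXPLAIN SELECT item, sum(amount) AS stock
      FROM b GROUP BY item
      HAVING sum(amount) >= 1;

    -- Fenced form: the filter sits on the Subquery Scan and gets
    -- DEFAULT_INEQ_SEL, i.e. roughly 1000 * 0.3333 = 333 rows
    EXPLAIN SELECT c.* FROM
      (SELECT item, sum(amount) AS stock
         FROM b GROUP BY item OFFSET 0) c
      WHERE c.stock >= 1;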

Now, it's true that we're basically never gonna be able to do better than
default selectivity estimates for post-aggregation filter conditions.
Maybe, at some point in the dim past, somebody intentionally decided that
applying the standard selectivity estimation logic to HAVING clauses was a
loser.  But I don't see any comments to that effect, and anyway taking the
selectivity as 1.0 all the time doesn't seem very bright either.

Changing this in back branches might be too much of a behavioral change,
but it seems like we oughta change HEAD to apply standard selectivity
estimation to the HAVING clause.
        regards, tom lane


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
