Re: Wrong estimation of rows for hash join

From: Alban Hertroys
Subject: Re: Wrong estimation of rows for hash join
Date:
Msg-id: EC92797C-7836-4585-89EA-971DBA8D8BBE@solfertje.student.utwente.nl
In response to: Wrong estimation of rows for hash join  (Christian Schröder <cs@deriva.de>)
Responses: Re: Wrong estimation of rows for hash join
List: pgsql-general
On 16 Oct 2009, at 10:59, Christian Schröder wrote:

> Hi list,
> I have the following query:
>   SELECT *
>   FROM base
>   INNER JOIN pt USING (x) WHERE pt.y IN ('1121', '11411', '11421',
> '1161', '1162');
>
> "explain analyze" yields the following result:
>
>                                                        QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=529.18..164930.70 rows=28374 width=0) (actual time=10.834..4877.326 rows=245298 loops=1)
>    Hash Cond: (b.x = pt.x)
>    ->  Seq Scan on b  (cost=0.00..159579.93 rows=1210093 width=4) (actual time=0.018..2464.871 rows=1210250 loops=1)
>    ->  Hash  (cost=527.41..527.41 rows=142 width=4) (actual time=10.775..10.775 rows=138 loops=1)
>          ->  Seq Scan on pt  (cost=0.00..527.41 rows=142 width=4) (actual time=0.057..10.556 rows=138 loops=1)
>                Filter: (y = ANY ('{1121,11411,11421,1161,1162}'::bpchar[]))
> Total runtime: 5170.837 ms
>
> As you can see, the estimated number of rows in the join is much
> lower than the actual number. Normally, the join is part of a more
> complex query which gets really slow, probably (among other reasons)
> because the query planner uses the incorrect row estimate for its
> further plans.
>
> Question is: What can I do to improve that estimation? There is a
> foreign key from base.x to pt.x and both fields are declared not null.

The planner seems to think that x and y have a low selectivity, hence
the sequential scans. The estimate on the hash join seems to indicate
that the planner is actually still being too optimistic about the
selectivity of those columns. Is it really the case that those values
are random, or can their frequency in the table be predicted?
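A common first step before anything as involved as custom selectivity functions is simply giving ANALYZE more to work with. A minimal sketch, assuming the table and column names from the query above; 1000 is an arbitrary illustrative target:

```sql
-- Raise the per-column statistics target so ANALYZE takes a larger
-- sample and keeps longer most-common-values / histogram lists,
-- which usually improves join selectivity estimates.
ALTER TABLE pt ALTER COLUMN y SET STATISTICS 1000;
ALTER TABLE base ALTER COLUMN x SET STATISTICS 1000;
ANALYZE pt;
ANALYZE base;
```

Then re-run EXPLAIN ANALYZE and see whether the hash-join row estimate moves closer to the actual 245298.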

If so, you can create a selectivity function and a domain that uses
that selectivity function for equality. Check Nathan's recent video on
the subject (posted just yesterday).

I'm also somewhat surprised to see an array of what appear to be
integers being cast to bpchar[]. Did you define those coordinates(?) as
character types? Numeric comparisons tend to be faster than string
comparisons, which should make some difference on sequential scans.
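If those values really are integer codes, the column type could be changed in place. A hypothetical migration, assuming pt.y holds only integer-like strings; verify that before running anything like this:

```sql
-- bpchar values are blank-padded, so trim before casting.
-- USING rewrites every existing row, which takes a lock and time
-- on a large table.
ALTER TABLE pt ALTER COLUMN y TYPE integer USING trim(y)::integer;
```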

> Side question: What can I do to improve the speed of the hash join
> itself? I understand that 2.5 secs are needed to perform the
> sequential scan of table b, but where do the remaining 2.5 secs come
> from?


As I read it, the seq scans take up the first 2.5 s and the actual Hash
Join the other 2.5 s.
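The hash side here is tiny (138 rows), so memory is probably not the bottleneck, but it costs nothing to rule out hash batching by raising work_mem for the session before re-testing. A sketch; the 64MB figure is an arbitrary example, not a recommendation:

```sql
-- Session-level only; a larger work_mem keeps the hash table in a
-- single in-memory batch if it would otherwise spill to disk.
SET work_mem = '64MB';
EXPLAIN ANALYZE
SELECT *
FROM base
INNER JOIN pt USING (x)
WHERE pt.y IN ('1121', '11411', '11421', '1161', '1162');
```

If the timings don't change, the remaining 2.5 s is simply per-row probe and emit work for the 245298 result rows.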

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.




