Re: New hashed IN code ignores distinctiveness of subquery

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: New hashed IN code ignores distinctiveness of subquery
Дата
Msg-id 29875.1043608189@sss.pgh.pa.us
обсуждение исходный текст
Ответ на New hashed IN code ignores distinctiveness of subquery  (Bradley Baetz <bbaetz@acm.org>)
Ответы Re: New hashed IN code ignores distinctiveness of subquery  (Bradley Baetz <bbaetz@acm.org>)
Список pgsql-bugs
Bradley Baetz <bbaetz@acm.org> writes:
> I've been trying out the new hased subselect code from CVS. It appears
> that the planner isn't taking the distinctiveness of the values from the
> subselect into account:

This isn't really anything to do with the new IN code, but is a
long-standing problem: cost_mergejoin doesn't apply any penalty factor
for the case where there are lots of duplicates in both inner and outer
relation (causing rescans of big chunks of the inner relation).  You can
see the rescanning happening in the EXPLAIN ANALYZE output:

>    ->  Merge Join  (cost=0.00..3485661.38 rows=5570 width=8) (actual
> time=0.15..1429696.69 rows=50000 loops=1)
>          Merge Cond: ("outer".product_id = "inner".product_id)
>          ->  Index Scan using bugs_product_id_idx on bugs
> (cost=0.00..2313.33 rows=50000 width=4) (actual time=0.12..358.43
> rows=50000 loops=1)
>          ->  Index Scan using bugs_product_id_idx on bugs
> (cost=0.00..2313.33 rows=50000 width=4) (actual time=0.01..1152455.44
> rows=277884160 loops=1)
  ^^^^^^^^^^^^^^

277884160 rows pulled from a 50000-row relation means a heck of a lot of
rescanning went on :-(

The good news is that the system *is* aware of the small number of
distinct values in the table (note the dead-on estimate of the number of
distinct rows in your other query; which I think is from new-for-7.4
code, though the required stats have been available since 7.2).

I think it'd probably be possible to make some reasonable estimate of
the amount of rescanning required, and then inflate the mergejoin cost
estimate proportionally.  I have not gotten around to looking at the
problem though.  Care to take a shot at it?

            regards, tom lane

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Bradley Baetz
Дата:
Сообщение: New hashed IN code ignores distinctiveness of subquery
Следующее
От: Neil Conway
Дата:
Сообщение: Re: Bug #883: explain analyze causes postgres to die