Re: issue with query optimizer when joining two partitioned tables

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: issue with query optimizer when joining two partitioned tables
Дата
Msg-id 4E180920.8040806@enterprisedb.com
обсуждение исходный текст
Ответ на issue with query optimizer when joining two partitioned tables  (Anish Kejariwal <anishkej@gmail.com>)
Ответы Re: issue with query optimizer when joining two partitioned tables
Список pgsql-performance
On 09.07.2011 00:36, Anish Kejariwal wrote:
> My guess as to what happened:
> -because the icecream parent table has zero records, the query optimizer
> chooses the incorrect execution plan
> -when I do select * from icecream, the optimizer now knows how many records
> are really in the icecream table, by knowing that the icecream table has
> partitions.

"select * from icecream" won't have any direct effect on the
optimization of subsequent queries. What probably happened is that
autoanalyze ran in the background while you ran that select, and
analyzed some of the partitions. Simply waiting a while would've had the
same effect.

> Next, if I run vacuum analyze on the parent table, I again get a wrong/slow
> execution plan (this time it uses the hash join).  Again, I think this is
> because the parent table itself has zero records.
>
>
>
>             QUERY PLAN
>
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   HashAggregate  (cost=191926.03..191962.95 rows=1136 width=41) (actual
> time=28967.567..28994.395 rows=11028 loops=1)
>     ->   Hash Join  (cost=166424.79..191585.47 rows=11352 width=41) (actual
> time=28539.196..28917.830 rows=11028 loops=1)
>           Hash Cond: (r.widget_id = rc.widget_id)
>           ->   Append  (cost=4.28..1149.30 rows=11352 width=21) (actual
> time=0.054..54.068 rows=11028 loops=1)
>                 ->   Bitmap Heap Scan on widget r  (cost=4.28..12.75 rows=1
> width=48) (actual time=0.013..0.013 rows=0 loops=1)
>                       Recheck Cond: (widgetset_id = 5)
>                       Filter: (widget_type_id = 4)
>                       ->   Bitmap Index Scan on widget_widgetset_id_idx
>   (cost=0.00..4.28 rows=4 width=0) (actual time=0.007..0.007 rows=0 loops=1)
>                             Index Cond: (widgetset_id = 5)
>                 ->   Index Scan using widget_part_5_widget_widget_type_id_idx
> on widget_part_5 r  (cost=0.00..1136.55 rows=11351 width=21) (actual
> time=0.035..22.419 rows=11028 loops=1)
>                       Index Cond: (widget_type_id = 4)
>                       Filter: (widgetset_id = 5)
>           ->   Hash  (cost=92214.73..92214.73 rows=4041823 width=24) (actual
> time=28438.419..28438.419 rows=4041866 loops=1)
>                 Buckets: 524288  Batches: 2  Memory Usage: 118449kB
>                 ->   Append  (cost=0.00..92214.73 rows=4041823 width=24)
> (actual time=0.020..14896.908 rows=4041866 loops=1)
>                       ->   Seq Scan on icecream rc  (cost=0.00..23.00 rows=5
> width=24) (actual time=0.002..0.002 rows=0 loops=1)
>                             Filter: (dataset_id = 281)
>                       ->   Seq Scan on icecream_part_281 rc
>   (cost=0.00..92191.73 rows=4041818 width=24) (actual time=0.012..5718.592
> rows=4041866 loops=1)
>                             Filter: (dataset_id = 281)
>   Total runtime: 29007.937 ms
> (20 rows)

The cost estimates in the above slow plan are pretty accurate, so I
suspect the cost estimates for the fast plan are not, or the planner
would choose that.

> I could of course disable hash join and merge join to force postgres to use
> a nested loop, but my system is often joining these two tables, and I'd
> rather not have to set this in every single place.
> set enable_mergejoin=off;
> set enable_hashjoin=off;
> set enable_nestloop = on;

Can you do explain analyze with these settings? That might give us a
clue on where it's going wrong.

Also, I suspect that when you load more data into icecream, the planner
might start to pick the faster plan, because the seqscan on icecream
will start to look more expensive compared to the index scan and nested
loop join in the faster plan.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

В списке pgsql-performance по дате отправления:

Предыдущее
От: Gael Le Mignot
Дата:
Сообщение: Memory usage of auto-vacuum
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: Memory usage of auto-vacuum