Re: Planner incorrectly choosing seq scan over index scan

Поиск
Список
Период
Сортировка
От Meetesh Karia
Тема Re: Planner incorrectly choosing seq scan over index scan
Дата
Msg-id fc5b04ca05080200054a6da4c3@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Planner incorrectly choosing seq scan over index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Planner incorrectly choosing seq scan over index scan
Список pgsql-performance
Thanks Tom,

That modifies the query plan slightly, but the planner still decides to do a hash join for the lte_user table aliased 't'.  Though, if I make this change and set enable_hashjoin to off, the query plan (and execution time) gets even better.

enable_hashjoin = on
----------------------------------
QUERY PLAN
Sort  (cost=10113.35..10122.02 rows=3467 width=48) (actual time=1203.000..1203.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Nested Loop  (cost=8711.19..9909.50 rows=3467 width=48) (actual time=1156.000..1203.000 rows=3467 loops=1)
        ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
              Index Cond: (617004 = user_id)
        ->  Hash Join  (cost=8711.19..9776.46 rows=3467 width=40) (actual time=1156.000..1187.000 rows=3467 loops=1)
              Hash Cond: ("outer".targetid = "inner".user_id)
              ->  Seq Scan on candidates617004 c  (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..16.000 rows=3467 loops=1)
                    Filter: (sourceid = 617004)
              ->  Hash  (cost=8012.55..8012.55 rows=279455 width=16) (actual time=1141.000..1141.000 rows=0 loops=1)
                    ->  Seq Scan on lte_user t  (cost=0.00..8012.55 rows=279455 width=16) (actual time=0.000..720.000 rows=279395 loops=1)
Total runtime: 1218.000 ms

enable_hashjoin = off
-----------------------------------
QUERY PLAN
Sort  (cost=10942.56..10951.22 rows=3467 width=48) (actual time=188.000..188.000 rows=3467 loops=1)
  Sort Key: c.sourceid, c.targetid
  ->  Nested Loop  (cost=0.00..10738.71 rows=3467 width=48) (actual time=0.000..188.000 rows=3467 loops=1)
        ->  Index Scan using lte_user_pkey on lte_user s  (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
              Index Cond: (617004 = user_id)
        ->  Nested Loop  (cost=0.00..10605.67 rows=3467 width=40) (actual time=0.000..157.000 rows=3467 loops=1)
              ->  Seq Scan on candidates617004 c  (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..15.000 rows=3467 loops=1)
                    Filter: (sourceid = 617004)
              ->  Index Scan using lte_user_pkey on lte_user t  (cost=0.00..3.02 rows=1 width=16) (actual time=0.028..0.037 rows=1 loops=3467)
                    Index Cond: ("outer".targetid = t.user_id)
Total runtime: 188.000 ms

Thanks,
Meetesh

On 8/2/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Meetesh Karia <meetesh.karia@gmail.com> writes:
> Sure. The lte_user table is just a collection of users. user_id is assigned=
> uniquely using a sequence. During some processing, we create a candidates=
> table (candidates617004 in our case). This table is usually a temp table.=
> sourceid is a user_id (in this case it is always 617004) and targetid is=20
> also a user_id (2860 distinct values out of 3467). The rest of the=20
> information is either only used in the select clause or not used at all=20
> during this processing.

If you know that sourceid has only a single value, it'd probably be
helpful to call out that value in the query, ie,
        where ... AND c.sourceId = 617004 ...

                        regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Planner incorrectly choosing seq scan over index scan
Следующее
От: "Alon Goldshuv"
Дата:
Сообщение: Re: [PATCHES] COPY FROM performance improvements