Re: Index usage with sub select or outer joins

От: Julien Theulier
Тема: Re: Index usage with sub select or outer joins
Дата: ,
Msg-id: 004d01c944d8$ae1b01c0$0a510540$@com
(см: обсуждение, исходный текст)
Ответ на: Re: Index usage with sub select or inner joins  (Joshua Tolley)
Список: pgsql-performance

Скрыть дерево обсуждения

Using index for IS NULL query  ("Andrus", )
 Re: Using index for IS NULL query  (Tom Lane, )
  Re: Using index for IS NULL query  (Andreas Kretschmer, )
  Re: Using index for IS NULL query  (Matthew Wakeling, )
 Re: Using index for IS NULL query  (Andreas Kretschmer, )
 Re: Using index for IS NULL query  (Tomas Vondra, )
  Re: Using index for IS NULL query  ("Vladimir Sitnikov", )
   Re: Using index for IS NULL query  ("Scott Marlowe", )
   Re: Using index for IS NULL query  (Tomas Vondra, )
  Index usage with sub select or inner joins  ("Julien Theulier", )
   Re: Index usage with sub select or inner joins  (Joshua Tolley, )
    Re: Index usage with sub select or outer joins  ("Julien Theulier", )

Hello, Joshua,

I did different test cases and here are the results (numbers in seconds),
using (case sub queries) or not (case join) the index:
Rows (main table)    Outer Join        Sub queries
setting
1396163 rows    39.2            19.6
work_mem=256Mb
3347443 rows     72.2            203.1
work_mem=256Mb
3347443 rows     70.3            31.1
work_mem=1024Mb
4321072 rows     115            554.9
work_mem=256Mb
4321072 rows     111            583
work_mem=1024Mb
All outer joins where done without index uses

To force the use of the index for the first case (outer join), I have change
the seq_scan cost (from 1 to 2.5), it takes now only 6.1s for the outer join
on 1.4M rows. New explain plan below:
"HashAggregate  (cost=457881.84..460248.84 rows=39450 width=49)"
"  ->  Nested Loop Left Join  (cost=0.00..456994.22 rows=39450 width=49)"
"        ->  Seq Scan on bm_us_views_main_2608 a  (cost=0.00..223677.45
rows=39450 width=41)"
"              Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND (qty >
1))"
"        ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..5.65 rows=13 width=19)"
"              Index Cond: ((b.item_id = a.item_id) AND (b.bid_date <
a.pv_timestamp) AND (b.bid_date >= (a.pv_timestamp - '60 days'::interval)))"

Index bm_us_bids_item_ix is on item_id, bidder_id (not used in the
condition) & bid_date

What can be the recommendations on tuning the different costs so it can
better estimate the seq scan & index scans costs? I think the issue is
there. But didn't find any figures helping to choose the correct parameters
according to cpu & disks speed

Regards,
Julien Theulier

-----Message d'origine-----
De : Joshua Tolley [mailto:]
Envoyé : mercredi 12 novembre 2008 14:54
À : Julien Theulier
Cc : 
Objet : Re: [PERFORM] Index usage with sub select or inner joins

On Wed, Nov 12, 2008 at 02:22:47PM +0100, Julien Theulier wrote:
> QUESTION: Why the planner choose seq scan in the first case & indexes
> scan in the second case? In a more general way, I observed that the
> planner has difficulties to select index scans & does in almost all
> the cases seq scan, when doing join queries. After investigations, it
> looks like when you join table a with table b on a column x and y and
> you have an index on column x only, the planner is not able to choose
> the index scan. You have to build the index corresponding exactly to
> the join statement btw the 2 tables

Short, general answer: index scans aren't always faster than sequential
scans, and the planner is smart enough to know that. Googling "Why isn't
postgresql using my index" provides more detailed results, but in short, if
it scans an index, it has to read pages from the index, and for all the
tuples it finds in the index, it has to read once again from the heap,
whereas a sequential scan requires reading once from the heap. If your query
will visit most of the rows of the table, pgsql will choose a sequential
scan over an index scan.

- Josh / eggyknap



В списке pgsql-performance по дате сообщения:

От: Tom Lane
Дата:
Сообщение: Re: Performance Question
От: Franck Routier
Дата:
Сообщение: Disk usage question