Обсуждение: why my query is not using index??

Поиск
Список
Период
Сортировка

why my query is not using index??

От
HyunSung Jang
Дата:
postgres=# explain ANALYZE select * from test where today < '2004-01-01';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..19.51 rows=334 width=44) (actual
time=0.545..2.429 rows=721 loops=1)
   Filter: (today < '2004-01-01 00:00:00'::timestamp without time zone)
 Total runtime: 3.072 ms
(3 rows)

postgres=# explain ANALYZE select * from test where today > '2003-01-01'
and today < '2004-01-01';
                                                                  QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_today on test  (cost=0.00..18.89 rows=6 width=44)
(actual time=0.055..1.098 rows=365 loops=1)
   Index Cond: ((today > '2003-01-01 00:00:00'::timestamp without time
zone) AND (today < '2004-01-01 00:00:00'::timestamp without time zone))
 Total runtime: 1.471 ms
(3 rows)

hello

I was expected 1st query should using index, but it doesn't
2nd query doing perfect as you see.

can you explain to me why it's not doing that i expected??
now I'm currently using postgresql 8.0pre3 on linux

/hyunsung jang.

Re: why my query is not using index??

От
Janning Vygen
Дата:
Am Mittwoch, 6. Oktober 2004 09:31 schrieben Sie:
> postgres=# explain ANALYZE select * from test where today < '2004-01-01';
>                                              QUERY PLAN
>------------------------- Seq Scan on test  (cost=0.00..19.51 rows=334
> width=44) (actual
> time=0.545..2.429 rows=721 loops=1)
>    Filter: (today < '2004-01-01 00:00:00'::timestamp without time zone)
>  Total runtime: 3.072 ms
> (3 rows)
>
> postgres=# explain ANALYZE select * from test where today > '2003-01-01'
> and today < '2004-01-01';
>                                                                   QUERY
> PLAN
> --------------------------------------------------------------- Index
> Scan using idx_today on test  (cost=0.00..18.89 rows=6 width=44) (actual
> time=0.055..1.098 rows=365 loops=1)
>    Index Cond: ((today > '2003-01-01 00:00:00'::timestamp without time
> zone) AND (today < '2004-01-01 00:00:00'::timestamp without time zone))
>  Total runtime: 1.471 ms
> (3 rows)
>
> hello
>
> I was expected 1st query should using index, but it doesn't
> 2nd query doing perfect as you see.

postgres uses a seq scan if its faster. In your case postgres seems to know
that most of your rows have a date < 2004-01-01 and so doesn't need to
consult the index if it has to read every page anyway. seq scan can be faster
on small tables. try (in psql) "SET enable_seqscan TO off;"  before running
your query and see how postgres plans it without using seq scan.

janning




Re: why my query is not using index??

От
Tom Lane
Дата:
HyunSung Jang <siche@siche.net> writes:
> can you explain to me why it's not doing that i expected??

Have you ANALYZEd this table recently?  The estimated row counts seem
way off.

            regards, tom lane

Re: why my query is not using index??

От
Francisco Reyes
Дата:
On Mon, 11 Oct 2004, Janning Vygen wrote:

> postgres uses a seq scan if its faster. In your case postgres seems to know
> that most of your rows have a date < 2004-01-01 and so doesn't need to
> consult the index if it has to read every page anyway. seq scan can be faster
> on small tables. try (in psql) "SET enable_seqscan TO off;"  before running
> your query and see how postgres plans it without using seq scan.


I was about to post and saw this message.
I have a query that was using sequential scans. Upon turning seqscan to
off it changed to using the index. What does that mean?
The tables are under 5k records so I wonder if that is why the optimizer
is option, on it's default state, to do sequential scans.

I was also wondering if there is a relation between the sequential scans
and the fact that my entire query is a series of left joins:

(1)FROM Accounts
(2)LEFT JOIN Equity_Positions ON Accounts.Account_ID =
(3)Equity_Positions.Account_ID
(4)LEFT JOIN Equities USING( Equity_ID )
(5)LEFT JOIN Benchmarks USING( Benchmark_ID )
(6)LEFT JOIN Equity_Prices ON Equities.equity_id = Equity_Prices.equity_id
(7)         AND Equity_Positions.Equity_Date = Equity_Prices.Date
(8)LEFT JOIN Benchmark_Positions ON Equities.Benchmark_ID =
(9)Benchmark_Positions.Benchmark_ID
(10)         AND Equity_Positions.Equity_Date =
(11)Benchmark_Positions.Benchmark_Date
(12)WHERE Client_ID =32

When I saw the default explain I was surprised to see that indexes were
not been used. For example the join on lines 4,5 are exactly the primary
key of the tables yet a sequential scan was used.

The default explain was:

Sort  (cost=382.01..382.15 rows=56 width=196)
    Sort Key: accounts.account_group, accounts.account_name,
equities.equity_description, equity_positions.equity_date
    ->  Hash Left Join  (cost=357.36..380.39 rows=56 width=196)
          Hash Cond: (("outer".benchmark_id = "inner".benchmark_id) AND ("outer".equity_date = "inner".benchmark_date))
          ->  Hash Left Join  (cost=353.41..375.46 rows=56 width=174)
                Hash Cond: (("outer".equity_id = "inner".equity_id) AND ("outer".equity_date = "inner".date))
                ->  Hash Left Join  (cost=292.22..296.90 rows=56 width=159)
                     Hash Cond: ("outer".benchmark_id = "inner".benchmark_id)
                     ->  Merge Right Join  (cost=290.40..294.51 rows=56 width=137)
                            Merge Cond: ("outer".equity_id = "inner".equity_id)
                            ->  Sort  (cost=47.19..48.83 rows=655 width=70)
                                  Sort Key: equities.equity_id
                                  ->  Seq Scan on equities  (cost=0.00..16.55 rows=655 width=70)
                            ->  Sort  (cost=243.21..243.35 rows=56 width=67)
                                  Sort Key: equity_positions.equity_id
                                  ->  Nested Loop Left Join  (cost=0.00..241.58 rows=56 width=67)
                                        ->  Seq Scan on accounts  (cost=0.00..5.80 rows=3 width=44)
                                              Filter: (client_id = 32)
                                      ->  Index Scan using positions_acct_equity_date on equity_positions
(cost=0.00..78.30rows=23 width=27) 
                                              Index Cond: ("outer".account_id = equity_positions.account_id)
                      ->  Hash  (cost=1.66..1.66 rows=66 width=22)
                            ->  Seq Scan on benchmarks  (cost=0.00..1.66 rows=66 width=22)
                ->  Hash  (cost=50.79..50.79 rows=2079 width=23)
                      ->  Seq Scan on equity_prices  (cost=0.00..50.79 rows=2079 width=23)
          ->  Hash  (cost=3.30..3.30 rows=130 width=30)
                ->  Seq Scan on benchmark_positions  (cost=0.00..3.30 rows=130 width=30)


After set enable_seqscan to off;
It becomes

Sort  (cost=490.82..490.96 rows=56 width=196)
    Sort Key: accounts.account_group, accounts.account_name,
equities.equity_description, equity_positions.equity_date
    ->  Merge Left Join  (cost=309.75..489.20 rows=56 width=196)
          Merge Cond: ("outer".benchmark_id = "inner".benchmark_id)
          Join Filter: ("outer".equity_date = "inner".benchmark_date)
          ->  Nested Loop Left Join  (cost=309.75..644.88 rows=56 width=174)
                ->  Merge Left Join  (cost=309.75..315.90 rows=56 width=159)
                      Merge Cond: ("outer".benchmark_id = "inner".benchmark_id)
                      ->  Sort  (cost=309.75..309.89 rows=56 width=137)
                            Sort Key: equities.benchmark_id
                            ->  Merge Right Join  (cost=254.43..308.12 rows=56 width=137)
                                  Merge Cond: ("outer".equity_id = "inner".equity_id)
                                  ->  Index Scan using equities_pkey on equities  (cost=0.00..51.21 rows=655 width=70)
                                  ->  Sort  (cost=254.43..254.57 rows=56 width=67)
                                        Sort Key: equity_positions.equity_id
                                        ->  Nested Loop Left Join  (cost=0.00..252.81 rows=56 width=67)
                                              ->  Index Scan using accounts_pkey on accounts  (cost=0.00..17.02 rows=3
width=44)
                                                    Filter: (client_id = 32)
                                              ->  Index Scan using positions_acct_equity_date on equity_positions
(cost=0.00..78.30rows=23 width=27) 
                                                    Index Cond: ("outer".account_id = equity_positions.account_id)
                      ->  Index Scan using benchmarks_pkey on benchmarks  (cost=0.00..5.57 rows=66 width=22)
                ->  Index Scan using equity_prices_equity_date on equity_prices  (cost=0.00..5.86 rows=1 width=23)
                      Index Cond: (("outer".equity_id = equity_prices.equity_id) AND ("outer".equity_date =
equity_prices.date))
          ->  Index Scan using benchpositions_acct_equity_date on benchmark_positions  (cost=0.00..10.82 rows=130
width=30)

Re: why my query is not using index??

От
Janning Vygen
Дата:
Am Montag, 11. Oktober 2004 22:49 schrieb Francisco Reyes:
> On Mon, 11 Oct 2004, Janning Vygen wrote:
> > postgres uses a seq scan if its faster. In your case postgres seems to
> > know that most of your rows have a date < 2004-01-01 and so doesn't need
> > to consult the index if it has to read every page anyway. seq scan can be
> > faster on small tables. try (in psql) "SET enable_seqscan TO off;"
> > before running your query and see how postgres plans it without using seq
> > scan.
>
> I was about to post and saw this message.
> I have a query that was using sequential scans. Upon turning seqscan to
> off it changed to using the index. What does that mean?

enable_seqscan off means that postgres is not allowed to use seqscan.
default is on and postgres decides for each table lookup which method is
faster: seq scan or index scan. thats what the planner does: deciding which
access method might be the fastest.

> The tables are under 5k records so I wonder if that is why the optimizer
> is option, on it's default state, to do sequential scans.

if you have small tables, postgres is using seqscan to reduce disk lookups.
postgresql reads disk pages in 8k blocks. if your whole table is under 8k
there is no reason for postgres to load an index from another disk page
because it has to load the whole disk anyway.

not sure, but i think postgres also analyzes the table to see which values are
in there. if you have a huge table with a column of integers and postgres
knows that 99% are of value 1 and you are looking for a row with a value of
1, why should it use an index just to see that it has to load the whole table
to find a matching row.

And that's why you can't make performance tests with small tables. you need
test data which is as close as possible to real data.

> I was also wondering if there is a relation between the sequential scans
> and the fact that my entire query is a series of left joins:

no.

janning

Re: why my query is not using index??

От
John Meinel
Дата:
Francisco Reyes wrote:
> On Mon, 11 Oct 2004, Janning Vygen wrote:
>
[...]
> When I saw the default explain I was surprised to see that indexes were
> not been used. For example the join on lines 4,5 are exactly the primary
> key of the tables yet a sequential scan was used.
>

Note this:
> The default explain was:
>
> Sort  (cost=382.01..382.15 rows=56 width=196)
>    Sort Key: accounts.account_group, accounts.account_name,

[...]

Versus this:
>
> After set enable_seqscan to off;
> It becomes
>
> Sort  (cost=490.82..490.96 rows=56 width=196)
>    Sort Key: accounts.account_group, accounts.account_name,

[...]

Postgres believes that it will cost 382 to do a sequential scan, versus
490 for an indexed scan. Hence why it prefers to do the sequential scan.
Try running explain analyze to see if how accurate it is.

As Janning mentioned, sometimes sequential scans *are* faster. If the
number of entries that will be found is large compared to the number of
total entries (I don't know the percentages, but probably >30-40%), then
it is faster to just load the data and scan through it, rather than
doing a whole bunch of indexed lookups.

John
=:->

Вложения

Re: why my query is not using index??

От
Greg Stark
Дата:
John Meinel <john@johnmeinel.com> writes:

> As Janning mentioned, sometimes sequential scans *are* faster. If the number of
> entries that will be found is large compared to the number of total entries (I
> don't know the percentages, but probably >30-40%),

Actually 30%-40% is unrealistic. The traditional rule of thumb for the
break-even point was 10%. In POstgres the actual percentage varies based on
how wide the records are and how correlated the location of the records is
with the index. Usually it's between 5%-10% but it can be even lower than that
sometimes.

--
greg

Re: why my query is not using index??

От
Francisco Reyes
Дата:
On Mon, 11 Oct 2004, John Meinel wrote:

> Postgres believes that it will cost 382 to do a sequential scan, versus 490
> for an indexed scan. Hence why it prefers to do the sequential scan. Try
> running explain analyze to see if how accurate it is.

With explain analyze I have with sequential scan on
Sort  (cost=382.01..382.15 rows=56 width=196)
(actual time=64.346..64.469 rows=24 loops=1)


And with seqscan off
Sort  (cost=490.82..490.96 rows=56 width=196)
(actual time=56.668..56.789 rows=24 loops=1)

So I guess that for this particular query I am better off setting the
seqscan off.

Re: why my query is not using index??

От
Tom Lane
Дата:
Francisco Reyes <lists@natserv.com> writes:
> With explain analyze I have with sequential scan on
> Sort  (cost=382.01..382.15 rows=56 width=196)
> (actual time=64.346..64.469 rows=24 loops=1)

> And with seqscan off
> Sort  (cost=490.82..490.96 rows=56 width=196)
> (actual time=56.668..56.789 rows=24 loops=1)

> So I guess that for this particular query I am better off setting the
> seqscan off.

For that kind of margin, you'd be a fool to do any such thing.

You might want to look at making some adjustment to random_page_cost
to bring the estimated costs in line with reality (though I'd counsel
taking more than one example into account while you tweak it).  But
setting seqscan off as a production setting is just a recipe for
shooting yourself in the foot.

            regards, tom lane