Обсуждение: seq scan instead of index scan

Поиск
Список
Период
Сортировка

seq scan instead of index scan

От
Karl Larsson
Дата:
Hello.

I have a problem I don't understand. I hope it's a simple problem and I'm
just stupid.

When I make a subquery Postgres don't care about my indexes and makes
a seq scan instead of a index scan. Why?

Is it possible that the subquery change the datatype and by this make
a index scan impossible? Can I somehow see the datatypes used by the
query?

Below is the test I'm running.

/ Karl Larsson


CREATE TABLE table_one (
    id bigint PRIMARY KEY NOT NULL
);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"table_one_pkey" for table "table_one"

CREATE TABLE table_two (
    id bigint PRIMARY KEY NOT NULL
);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"table_two_pkey" for table "table_two"




INSERT INTO table_one VALUES (4);
INSERT INTO table_one VALUES (3);
INSERT INTO table_one VALUES (5);
INSERT INTO table_one VALUES (2);
INSERT INTO table_one VALUES (6);
INSERT INTO table_one VALUES (1);

INSERT INTO table_two VALUES (14);
INSERT INTO table_two VALUES (12);
INSERT INTO table_two VALUES (10);
INSERT INTO table_two VALUES (8);
INSERT INTO table_two VALUES (6);
INSERT INTO table_two VALUES (4);
INSERT INTO table_two VALUES (2);



EXPLAIN ANALYZE
SELECT t2.id
FROM table_two AS t2, (
    SELECT id
    FROM table_one AS t1
    WHERE t1.id < 6
  ) AS foo
WHERE t2.id = foo.id;



                              
                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=35.44..78.58 rows=647 width=8) (actual time=0.076..0.088 rows=2 loops=1)
   Hash Cond: (t2.id = t1.id)
   ->  Seq Scan on table_two t2  (cost=0.00..29.40 rows=1940 width=8) (actual time=0.007..0.021 rows=7 loops=1)
   ->  Hash  (cost=27.35..27.35 rows=647 width=8) (actual time=0.038..0.038 rows=5 loops=1)
         ->  Bitmap Heap Scan on table_one t1  (cost=9.26..27.35 rows=647 width=8) (actual time=0.014..0.022 rows=5 loops=1)
               Recheck Cond: (id < 6)
               ->  Bitmap Index Scan on table_one_pkey  (cost=0.00..9.10 rows=647 width=0) (actual time=0.008..0.008 rows=5 loops=1)
                     Index Cond: (id < 6)
 Total runtime: 0.133 ms

Re: seq scan instead of index scan

От
Scott Marlowe
Дата:
On Thu, Dec 17, 2009 at 4:22 PM, Karl Larsson <karl.larsson47@gmail.com> wrote:
> Hello.
>
> I have a problem I don't understand. I hope it's a simple problem and I'm
> just stupid.
>
> When I make a subquery Postgres don't care about my indexes and makes
> a seq scan instead of a index scan. Why?

PostgreSQL uses an intelligent query planner that predicets how many
rows it will get back for each plan and chooses accordingly.  Since a
few dozen rows will all likely fit in the same block, it's way faster
to sequentially scan the table than to use an index scan.

Note that pgsql always has to go back to the original table to get the
rows anyway, since visibility info is not stored in the indexes.

Re: seq scan instead of index scan

От
"Kevin Grittner"
Дата:
Karl Larsson <karl.larsson47@gmail.com> wrote:

> When I make a subquery Postgres don't care about my indexes and
> makes a seq scan instead of a index scan. Why?

>  Total runtime: 0.133 ms

Because it thinks that it's faster that way with the particular data
you now have in your tables.  With more data, it might think some
other plan is faster.  It's running in less than 1/7500 second --
how sure are you that it would be significantly faster another way?

-Kevin

Re: seq scan instead of index scan

От
Karl Larsson
Дата:


On Fri, Dec 18, 2009 at 12:26 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Dec 17, 2009 at 4:22 PM, Karl Larsson <karl.larsson47@gmail.com> wrote:
> Hello.
>
> I have a problem I don't understand. I hope it's a simple problem and I'm
> just stupid.
>
> When I make a subquery Postgres don't care about my indexes and makes
> a seq scan instead of a index scan. Why?

PostgreSQL uses an intelligent query planner that predicets how many
rows it will get back for each plan and chooses accordingly.  Since a
few dozen rows will all likely fit in the same block, it's way faster
to sequentially scan the table than to use an index scan.

Note that pgsql always has to go back to the original table to get the
rows anyway, since visibility info is not stored in the indexes.

I forgot to mention  that I have a reel problem with 937(and growing) rows of data. My test tables
and test query is just to exemplify my problem. But I'll extend table_two and see if it change anything.

/ Karl Larsson

Re: seq scan instead of index scan

От
Greg Smith
Дата:
Karl Larsson wrote:
> When I make a subquery Postgres don't care about my indexes and makes
> a seq scan instead of a index scan. Why?
Data set is just too small for it to matter.  Watch what happens if I
continue from what you posted with much bigger tables:

postgres=# truncate table table_one;
TRUNCATE TABLE
postgres=# truncate table table_two;
TRUNCATE TABLE
postgres=# insert into table_one (select generate_series(1,100000));
INSERT 0 100000
postgres=# insert into table_two (select generate_series(1,100000));
INSERT 0 100000
postgres=# analyze;
ANALYZE
postgres=# EXPLAIN ANALYZE
SELECT t2.id
FROM table_two AS t2, (
    SELECT id
    FROM table_one AS t1
    WHERE t1.id < 6
  ) AS foo
WHERE t2.id = foo.id;
                                                             QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..91.35 rows=10 width=8) (actual
time=0.024..0.048 rows=5 loops=1)
   ->  Index Scan using table_one_pkey on table_one t1  (cost=0.00..8.44
rows=10 width=8) (actual time=0.009..0.013 rows=5 loops=1)
         Index Cond: (id < 6)
   ->  Index Scan using table_two_pkey on table_two t2  (cost=0.00..8.28
rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=5)
         Index Cond: (t2.id = t1.id)
 Total runtime: 0.097 ms
(6 rows)

There's the index scan on both tables that you were expecting.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: seq scan instead of index scan

От
Scott Marlowe
Дата:
On Thu, Dec 17, 2009 at 4:46 PM, Karl Larsson <karl.larsson47@gmail.com> wrote:
>
>
> On Fri, Dec 18, 2009 at 12:26 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Thu, Dec 17, 2009 at 4:22 PM, Karl Larsson <karl.larsson47@gmail.com>
>> wrote:
>> > Hello.
>> >
>> > I have a problem I don't understand. I hope it's a simple problem and
>> > I'm
>> > just stupid.
>> >
>> > When I make a subquery Postgres don't care about my indexes and makes
>> > a seq scan instead of a index scan. Why?
>>
>> PostgreSQL uses an intelligent query planner that predicets how many
>> rows it will get back for each plan and chooses accordingly.  Since a
>> few dozen rows will all likely fit in the same block, it's way faster
>> to sequentially scan the table than to use an index scan.
>>
>> Note that pgsql always has to go back to the original table to get the
>> rows anyway, since visibility info is not stored in the indexes.
>
> I forgot to mention  that I have a reel problem with 937(and growing) rows
> of data. My test tables
> and test query is just to exemplify my problem. But I'll extend table_two
> and see if it change anything.

Best bet is to post the real problem, not a semi-representational made
up one.  Unless the made up "test case" is truly representative and
recreates the failure pretty much the same was as the original.

Re: seq scan instead of index scan

От
Karl Larsson
Дата:
On Fri, Dec 18, 2009 at 1:10 AM, Greg Smith <greg@2ndquadrant.com> wrote:
Karl Larsson wrote:
When I make a subquery Postgres don't care about my indexes and makes
a seq scan instead of a index scan. Why?
Data set is just too small for it to matter.  Watch what happens if I continue from what you posted with much bigger tables:

postgres=# truncate table table_one;
TRUNCATE TABLE
postgres=# truncate table table_two;
TRUNCATE TABLE
postgres=# insert into table_one (select generate_series(1,100000));
INSERT 0 100000
postgres=# insert into table_two (select generate_series(1,100000));
INSERT 0 100000
postgres=# analyze;
ANALYZE
postgres=# EXPLAIN ANALYZE

SELECT t2.id
FROM table_two AS t2, (
  SELECT id
  FROM table_one AS t1
  WHERE t1.id < 6
 ) AS foo
WHERE t2.id = foo.id;
                                                           QUERY PLAN                                                            ------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..91.35 rows=10 width=8) (actual time=0.024..0.048 rows=5 loops=1)
 ->  Index Scan using table_one_pkey on table_one t1  (cost=0.00..8.44 rows=10 width=8) (actual time=0.009..0.013 rows=5 loops=1)
       Index Cond: (id < 6)
 ->  Index Scan using table_two_pkey on table_two t2  (cost=0.00..8.28 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=5)
       Index Cond: (t2.id = t1.id)
Total runtime: 0.097 ms
(6 rows)

There's the index scan on both tables that you were expecting.

True. Thank you. I'll try this on my reel problem as well but I have a gut feeling it
won't work there since those tables are bigger.

/ Karl Larsson

Re: seq scan instead of index scan

От
Scott Marlowe
Дата:
On Thu, Dec 17, 2009 at 6:10 PM, Karl Larsson <karl.larsson47@gmail.com> wrote:
> On Fri, Dec 18, 2009 at 1:10 AM, Greg Smith <greg@2ndquadrant.com> wrote:
>>
>> Karl Larsson wrote:
>>>
>>> When I make a subquery Postgres don't care about my indexes and makes
>>> a seq scan instead of a index scan. Why?
>>
>> Data set is just too small for it to matter.  Watch what happens if I
>> continue from what you posted with much bigger tables:
>>
>> postgres=# truncate table table_one;
>> TRUNCATE TABLE
>> postgres=# truncate table table_two;
>> TRUNCATE TABLE
>> postgres=# insert into table_one (select generate_series(1,100000));
>> INSERT 0 100000
>> postgres=# insert into table_two (select generate_series(1,100000));
>> INSERT 0 100000
>> postgres=# analyze;
>> ANALYZE
>> postgres=# EXPLAIN ANALYZE
>> SELECT t2.id
>> FROM table_two AS t2, (
>>   SELECT id
>>   FROM table_one AS t1
>>   WHERE t1.id < 6
>>  ) AS foo
>> WHERE t2.id = foo.id;
>>                                                            QUERY PLAN
>>
>>
 ------------------------------------------------------------------------------------------------------------------------------------
>> Nested Loop  (cost=0.00..91.35 rows=10 width=8) (actual time=0.024..0.048
>> rows=5 loops=1)
>>  ->  Index Scan using table_one_pkey on table_one t1  (cost=0.00..8.44
>> rows=10 width=8) (actual time=0.009..0.013 rows=5 loops=1)
>>        Index Cond: (id < 6)
>>  ->  Index Scan using table_two_pkey on table_two t2  (cost=0.00..8.28
>> rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=5)
>>        Index Cond: (t2.id = t1.id)
>> Total runtime: 0.097 ms
>> (6 rows)
>>
>> There's the index scan on both tables that you were expecting.
>
> True. Thank you. I'll try this on my reel problem as well but I have a gut
> feeling it
> won't work there since those tables are bigger.

Run it with explain analyze on the real table / SQL query and if it
doesn't run well, post it here.  Note you can do a lot to tune the
query planner, with things like random_page_cost, cpu_* cost
parameters, effective_cache_size and so on.  For troubleshooting
purposes you can use set enable_method=off where method can be things
like indexscan, nestloop, and so on.  Use show all to see them.

Re: seq scan instead of index scan

От
Karl Larsson
Дата:
> Best bet is to post the real problem, not a semi-representational made 
> up one.  Unless the made up "test case" is truly representative and
>  recreates the failure pretty much the same was as the original.

I agree at some level but I generally believe other people won't read
a big mail like that. In this case it might come to a big post from me
one day soon. :-)

Thanks to all who helped me.

/ Karl Larsson

Re: seq scan instead of index scan

От
Scott Marlowe
Дата:
On Thu, Dec 17, 2009 at 6:17 PM, Karl Larsson <karl.larsson47@gmail.com> wrote:
>> Best bet is to post the real problem, not a semi-representational made
>> up one.  Unless the made up "test case" is truly representative and
>>  recreates the failure pretty much the same was as the original.
>
> I agree at some level but I generally believe other people won't read
> a big mail like that. In this case it might come to a big post from me
> one day soon. :-)

You're on the one mailing list where they will read big posts.  It's
best if you can attach the explain analyze output as an attachment
tho, to keep it's format readable.

Re: seq scan instead of index scan

От
Tom Lane
Дата:
Greg Smith <greg@2ndquadrant.com> writes:
> Karl Larsson wrote:
>> When I make a subquery Postgres don't care about my indexes and makes
>> a seq scan instead of a index scan. Why?

> Data set is just too small for it to matter.  Watch what happens if I
> continue from what you posted with much bigger tables:
> ...
> There's the index scan on both tables that you were expecting.

And if you go much past that, it's likely to switch *away* from
indexscans again (eg, to a hash join, which has no use for ordered
input).  This is not wrong.  Indexes have their place but they are not
the solution for every query.

            regards, tom lane