Обсуждение: seq scan instead of index scan
Hello.
I have a problem I don't understand. I hope it's a simple problem and I'm
just stupid.
When I make a subquery Postgres don't care about my indexes and makes
a seq scan instead of a index scan. Why?
Is it possible that the subquery change the datatype and by this make
a index scan impossible? Can I somehow see the datatypes used by the
query?
Below is the test I'm running.
/ Karl Larsson
CREATE TABLE table_one (
id bigint PRIMARY KEY NOT NULL
);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"table_one_pkey" for table "table_one"
CREATE TABLE table_two (
id bigint PRIMARY KEY NOT NULL
);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"table_two_pkey" for table "table_two"
INSERT INTO table_one VALUES (4);
INSERT INTO table_one VALUES (3);
INSERT INTO table_one VALUES (5);
INSERT INTO table_one VALUES (2);
INSERT INTO table_one VALUES (6);
INSERT INTO table_one VALUES (1);
INSERT INTO table_two VALUES (14);
INSERT INTO table_two VALUES (12);
INSERT INTO table_two VALUES (10);
INSERT INTO table_two VALUES (8);
INSERT INTO table_two VALUES (6);
INSERT INTO table_two VALUES (4);
INSERT INTO table_two VALUES (2);
EXPLAIN ANALYZE
SELECT t2.id
FROM table_two AS t2, (
SELECT id
FROM table_one AS t1
WHERE t1.id < 6
) AS foo
WHERE t2.id = foo.id;
I have a problem I don't understand. I hope it's a simple problem and I'm
just stupid.
When I make a subquery Postgres don't care about my indexes and makes
a seq scan instead of a index scan. Why?
Is it possible that the subquery change the datatype and by this make
a index scan impossible? Can I somehow see the datatypes used by the
query?
Below is the test I'm running.
/ Karl Larsson
CREATE TABLE table_one (
id bigint PRIMARY KEY NOT NULL
);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"table_one_pkey" for table "table_one"
CREATE TABLE table_two (
id bigint PRIMARY KEY NOT NULL
);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"table_two_pkey" for table "table_two"
INSERT INTO table_one VALUES (4);
INSERT INTO table_one VALUES (3);
INSERT INTO table_one VALUES (5);
INSERT INTO table_one VALUES (2);
INSERT INTO table_one VALUES (6);
INSERT INTO table_one VALUES (1);
INSERT INTO table_two VALUES (14);
INSERT INTO table_two VALUES (12);
INSERT INTO table_two VALUES (10);
INSERT INTO table_two VALUES (8);
INSERT INTO table_two VALUES (6);
INSERT INTO table_two VALUES (4);
INSERT INTO table_two VALUES (2);
EXPLAIN ANALYZE
SELECT t2.id
FROM table_two AS t2, (
SELECT id
FROM table_one AS t1
WHERE t1.id < 6
) AS foo
WHERE t2.id = foo.id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=35.44..78.58 rows=647 width=8) (actual time=0.076..0.088 rows=2 loops=1)
Hash Cond: (t2.id = t1.id)
-> Seq Scan on table_two t2 (cost=0.00..29.40 rows=1940 width=8) (actual time=0.007..0.021 rows=7 loops=1)
-> Hash (cost=27.35..27.35 rows=647 width=8) (actual time=0.038..0.038 rows=5 loops=1)
-> Bitmap Heap Scan on table_one t1 (cost=9.26..27.35 rows=647 width=8) (actual time=0.014..0.022 rows=5 loops=1)
Recheck Cond: (id < 6)
-> Bitmap Index Scan on table_one_pkey (cost=0.00..9.10 rows=647 width=0) (actual time=0.008..0.008 rows=5 loops=1)
Index Cond: (id < 6)
Total runtime: 0.133 ms
-------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=35.44..78.58 rows=647 width=8) (actual time=0.076..0.088 rows=2 loops=1)
Hash Cond: (t2.id = t1.id)
-> Seq Scan on table_two t2 (cost=0.00..29.40 rows=1940 width=8) (actual time=0.007..0.021 rows=7 loops=1)
-> Hash (cost=27.35..27.35 rows=647 width=8) (actual time=0.038..0.038 rows=5 loops=1)
-> Bitmap Heap Scan on table_one t1 (cost=9.26..27.35 rows=647 width=8) (actual time=0.014..0.022 rows=5 loops=1)
Recheck Cond: (id < 6)
-> Bitmap Index Scan on table_one_pkey (cost=0.00..9.10 rows=647 width=0) (actual time=0.008..0.008 rows=5 loops=1)
Index Cond: (id < 6)
Total runtime: 0.133 ms
On Thu, Dec 17, 2009 at 4:22 PM, Karl Larsson <karl.larsson47@gmail.com> wrote: > Hello. > > I have a problem I don't understand. I hope it's a simple problem and I'm > just stupid. > > When I make a subquery Postgres don't care about my indexes and makes > a seq scan instead of a index scan. Why? PostgreSQL uses an intelligent query planner that predicets how many rows it will get back for each plan and chooses accordingly. Since a few dozen rows will all likely fit in the same block, it's way faster to sequentially scan the table than to use an index scan. Note that pgsql always has to go back to the original table to get the rows anyway, since visibility info is not stored in the indexes.
Karl Larsson <karl.larsson47@gmail.com> wrote: > When I make a subquery Postgres don't care about my indexes and > makes a seq scan instead of a index scan. Why? > Total runtime: 0.133 ms Because it thinks that it's faster that way with the particular data you now have in your tables. With more data, it might think some other plan is faster. It's running in less than 1/7500 second -- how sure are you that it would be significantly faster another way? -Kevin
On Fri, Dec 18, 2009 at 12:26 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
I forgot to mention that I have a reel problem with 937(and growing) rows of data. My test tables
and test query is just to exemplify my problem. But I'll extend table_two and see if it change anything.
/ Karl Larsson
On Thu, Dec 17, 2009 at 4:22 PM, Karl Larsson <karl.larsson47@gmail.com> wrote:PostgreSQL uses an intelligent query planner that predicets how many
> Hello.
>
> I have a problem I don't understand. I hope it's a simple problem and I'm
> just stupid.
>
> When I make a subquery Postgres don't care about my indexes and makes
> a seq scan instead of a index scan. Why?
rows it will get back for each plan and chooses accordingly. Since a
few dozen rows will all likely fit in the same block, it's way faster
to sequentially scan the table than to use an index scan.
Note that pgsql always has to go back to the original table to get the
rows anyway, since visibility info is not stored in the indexes.
I forgot to mention that I have a reel problem with 937(and growing) rows of data. My test tables
and test query is just to exemplify my problem. But I'll extend table_two and see if it change anything.
/ Karl Larsson
Karl Larsson wrote: > When I make a subquery Postgres don't care about my indexes and makes > a seq scan instead of a index scan. Why? Data set is just too small for it to matter. Watch what happens if I continue from what you posted with much bigger tables: postgres=# truncate table table_one; TRUNCATE TABLE postgres=# truncate table table_two; TRUNCATE TABLE postgres=# insert into table_one (select generate_series(1,100000)); INSERT 0 100000 postgres=# insert into table_two (select generate_series(1,100000)); INSERT 0 100000 postgres=# analyze; ANALYZE postgres=# EXPLAIN ANALYZE SELECT t2.id FROM table_two AS t2, ( SELECT id FROM table_one AS t1 WHERE t1.id < 6 ) AS foo WHERE t2.id = foo.id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..91.35 rows=10 width=8) (actual time=0.024..0.048 rows=5 loops=1) -> Index Scan using table_one_pkey on table_one t1 (cost=0.00..8.44 rows=10 width=8) (actual time=0.009..0.013 rows=5 loops=1) Index Cond: (id < 6) -> Index Scan using table_two_pkey on table_two t2 (cost=0.00..8.28 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=5) Index Cond: (t2.id = t1.id) Total runtime: 0.097 ms (6 rows) There's the index scan on both tables that you were expecting. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
On Thu, Dec 17, 2009 at 4:46 PM, Karl Larsson <karl.larsson47@gmail.com> wrote: > > > On Fri, Dec 18, 2009 at 12:26 AM, Scott Marlowe <scott.marlowe@gmail.com> > wrote: >> >> On Thu, Dec 17, 2009 at 4:22 PM, Karl Larsson <karl.larsson47@gmail.com> >> wrote: >> > Hello. >> > >> > I have a problem I don't understand. I hope it's a simple problem and >> > I'm >> > just stupid. >> > >> > When I make a subquery Postgres don't care about my indexes and makes >> > a seq scan instead of a index scan. Why? >> >> PostgreSQL uses an intelligent query planner that predicets how many >> rows it will get back for each plan and chooses accordingly. Since a >> few dozen rows will all likely fit in the same block, it's way faster >> to sequentially scan the table than to use an index scan. >> >> Note that pgsql always has to go back to the original table to get the >> rows anyway, since visibility info is not stored in the indexes. > > I forgot to mention that I have a reel problem with 937(and growing) rows > of data. My test tables > and test query is just to exemplify my problem. But I'll extend table_two > and see if it change anything. Best bet is to post the real problem, not a semi-representational made up one. Unless the made up "test case" is truly representative and recreates the failure pretty much the same was as the original.
On Fri, Dec 18, 2009 at 1:10 AM, Greg Smith <greg@2ndquadrant.com> wrote:
True. Thank you. I'll try this on my reel problem as well but I have a gut feeling it
won't work there since those tables are bigger.
/ Karl Larsson
Karl Larsson wrote:Data set is just too small for it to matter. Watch what happens if I continue from what you posted with much bigger tables:When I make a subquery Postgres don't care about my indexes and makes
a seq scan instead of a index scan. Why?
postgres=# truncate table table_one;
TRUNCATE TABLE
postgres=# truncate table table_two;
TRUNCATE TABLE
postgres=# insert into table_one (select generate_series(1,100000));
INSERT 0 100000
postgres=# insert into table_two (select generate_series(1,100000));
INSERT 0 100000
postgres=# analyze;
ANALYZE
postgres=# EXPLAIN ANALYZENested Loop (cost=0.00..91.35 rows=10 width=8) (actual time=0.024..0.048 rows=5 loops=1)
SELECT t2.id
FROM table_two AS t2, (
SELECT id
FROM table_one AS t1
WHERE t1.id < 6
) AS foo
WHERE t2.id = foo.id;
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------
-> Index Scan using table_one_pkey on table_one t1 (cost=0.00..8.44 rows=10 width=8) (actual time=0.009..0.013 rows=5 loops=1)
Index Cond: (id < 6)
-> Index Scan using table_two_pkey on table_two t2 (cost=0.00..8.28 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=5)
Index Cond: (t2.id = t1.id)
Total runtime: 0.097 ms
(6 rows)
There's the index scan on both tables that you were expecting.
True. Thank you. I'll try this on my reel problem as well but I have a gut feeling it
won't work there since those tables are bigger.
/ Karl Larsson
On Thu, Dec 17, 2009 at 6:10 PM, Karl Larsson <karl.larsson47@gmail.com> wrote: > On Fri, Dec 18, 2009 at 1:10 AM, Greg Smith <greg@2ndquadrant.com> wrote: >> >> Karl Larsson wrote: >>> >>> When I make a subquery Postgres don't care about my indexes and makes >>> a seq scan instead of a index scan. Why? >> >> Data set is just too small for it to matter. Watch what happens if I >> continue from what you posted with much bigger tables: >> >> postgres=# truncate table table_one; >> TRUNCATE TABLE >> postgres=# truncate table table_two; >> TRUNCATE TABLE >> postgres=# insert into table_one (select generate_series(1,100000)); >> INSERT 0 100000 >> postgres=# insert into table_two (select generate_series(1,100000)); >> INSERT 0 100000 >> postgres=# analyze; >> ANALYZE >> postgres=# EXPLAIN ANALYZE >> SELECT t2.id >> FROM table_two AS t2, ( >> SELECT id >> FROM table_one AS t1 >> WHERE t1.id < 6 >> ) AS foo >> WHERE t2.id = foo.id; >> QUERY PLAN >> >> ------------------------------------------------------------------------------------------------------------------------------------ >> Nested Loop (cost=0.00..91.35 rows=10 width=8) (actual time=0.024..0.048 >> rows=5 loops=1) >> -> Index Scan using table_one_pkey on table_one t1 (cost=0.00..8.44 >> rows=10 width=8) (actual time=0.009..0.013 rows=5 loops=1) >> Index Cond: (id < 6) >> -> Index Scan using table_two_pkey on table_two t2 (cost=0.00..8.28 >> rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=5) >> Index Cond: (t2.id = t1.id) >> Total runtime: 0.097 ms >> (6 rows) >> >> There's the index scan on both tables that you were expecting. > > True. Thank you. I'll try this on my reel problem as well but I have a gut > feeling it > won't work there since those tables are bigger. Run it with explain analyze on the real table / SQL query and if it doesn't run well, post it here. Note you can do a lot to tune the query planner, with things like random_page_cost, cpu_* cost parameters, effective_cache_size and so on. For troubleshooting purposes you can use set enable_method=off where method can be things like indexscan, nestloop, and so on. Use show all to see them.
> Best bet is to post the real problem, not a semi-representational made
I agree at some level but I generally believe other people won't read
a big mail like that. In this case it might come to a big post from me
one day soon. :-)
Thanks to all who helped me.
/ Karl Larsson
> up one. Unless the made up "test case" is truly representative and
> recreates the failure pretty much the same was as the original.
> recreates the failure pretty much the same was as the original.
I agree at some level but I generally believe other people won't read
a big mail like that. In this case it might come to a big post from me
one day soon. :-)
Thanks to all who helped me.
/ Karl Larsson
On Thu, Dec 17, 2009 at 6:17 PM, Karl Larsson <karl.larsson47@gmail.com> wrote: >> Best bet is to post the real problem, not a semi-representational made >> up one. Unless the made up "test case" is truly representative and >> recreates the failure pretty much the same was as the original. > > I agree at some level but I generally believe other people won't read > a big mail like that. In this case it might come to a big post from me > one day soon. :-) You're on the one mailing list where they will read big posts. It's best if you can attach the explain analyze output as an attachment tho, to keep it's format readable.
Greg Smith <greg@2ndquadrant.com> writes: > Karl Larsson wrote: >> When I make a subquery Postgres don't care about my indexes and makes >> a seq scan instead of a index scan. Why? > Data set is just too small for it to matter. Watch what happens if I > continue from what you posted with much bigger tables: > ... > There's the index scan on both tables that you were expecting. And if you go much past that, it's likely to switch *away* from indexscans again (eg, to a hash join, which has no use for ordered input). This is not wrong. Indexes have their place but they are not the solution for every query. regards, tom lane