Discussion: Query on PostgreSQL 7.4.2 not using index
Hi all,
I have the following running on postgresql version 7.4.2:
CREATE SEQUENCE agenda_user_group_id_seq
    MINVALUE 1
    MAXVALUE 9223372036854775807
    CYCLE
    INCREMENT 1
    START 1;
CREATE TABLE AGENDA_USERS_GROUPS
(
    AGENDA_USER_GROUP_ID INT8
        CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
        DEFAULT NEXTVAL('agenda_user_group_id_seq'),
    USER_ID NUMERIC(10)
        CONSTRAINT fk_agenda_uid REFERENCES AGENDA_USERS (USER_ID)
        ON DELETE CASCADE
        NOT NULL,
    GROUP_ID NUMERIC(10)
        CONSTRAINT fk_agenda_gid REFERENCES AGENDA_GROUPS (GROUP_ID)
        ON DELETE CASCADE
        NOT NULL,
    CREATION_DATE DATE
        DEFAULT CURRENT_DATE,
    CONSTRAINT un_agndusrgrp_usergroup
        UNIQUE (USER_ID, GROUP_ID)
);
CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS (USER_ID);
CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS (GROUP_ID);
When I execute:
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = 9;
it does a sequential scan and doesn't use the index, and I don't
understand why. Any idea? I have the same setup on PostgreSQL 8.1 and
there it uses the index :-|
Thanks
--
Arnau
On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote:
> When I execute:
>
> EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
> WHERE group_id = 9;
Try
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id::int8 = 9;
or
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = '9';
and let us know what happens.
--
Postgresql & php tutorials
http://www.designmagick.com/
chris smith wrote:
> On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote:
>>When I execute:
>>
>>EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
>>WHERE group_id = 9;
>
>
> Try
>
> EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
> WHERE group_id::int8 = 9;
>
> or
>
> EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
> WHERE group_id = '9';
>
> and let us know what happens.
>
Same result. The table has 2547556 rows:
espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups
espsm_moviltelevision-# WHERE group_id::int8 = 9;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Seq Scan on agenda_users_groups (cost=0.00..59477.34 rows=12738
width=8) (actual time=3409.541..11818.794 rows=367026 loops=1)
Filter: ((group_id)::bigint = 9)
Total runtime: 13452.114 ms
(3 rows)
espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups
espsm_moviltelevision-# WHERE group_id = '9';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675
width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
Filter: (group_id = 9::numeric)
Total runtime: 7259.861 ms
(3 rows)
espsm_moviltelevision=# select count(*) from agenda_users_groups ;
count
---------
2547556
Thanks
--
Arnau
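The two 7.4 plans above estimate very different row counts for what is logically the same predicate: 339675 rows for `group_id = '9'`, but only 12738 for `group_id::int8 = 9`. 12738 is 2547556 × 0.005, which appears to be the planner's default equality selectivity (0.5%), applied when it has no statistics for an expression such as the cast. The 8.1 plan later in the thread shows the same pattern (2555437 × 0.005 ≈ 12777). As a sketch, the statistics the planner does have for the bare column can be inspected through the standard `pg_stats` view, assuming the table has been analyzed:

```sql
-- Optionally refresh the statistics first.
ANALYZE agenda_users_groups;

-- Statistics gathered by ANALYZE for agenda_users_groups.group_id.
-- most_common_vals / most_common_freqs list the frequent values and their
-- share of the table; the planner uses them to estimate the row count for
-- "group_id = constant".
SELECT null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'agenda_users_groups'
  AND attname = 'group_id';
```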
On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote:
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups
> espsm_moviltelevision-# WHERE group_id = '9';
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675
> width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
> Filter: (group_id = 9::numeric)
> Total runtime: 7259.861 ms
> (3 rows)
Arnau,
Why do you use a numeric instead of an integer/bigint? IIRC, there were a
few problems with indexes on numeric columns in older versions of
PostgreSQL.
You can't change the type of a column in 7.4, so create a new integer
column, copy the values into it, drop the old one, and rename the new
one. Run VACUUM ANALYZE and recreate your index. It should work far
better with an int.
Note that you will have to update all the tables referencing this key...
--
Guillaume
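Guillaume's steps can be sketched as follows for 7.4, which has no `ALTER TABLE ... ALTER COLUMN ... TYPE` (that form arrived in 8.0). This rests on several assumptions. `group_id_new` is a hypothetical temporary column name. `integer` is assumed to be wide enough for every existing value, even though `NUMERIC(10)` can hold numbers above 2147483647. `agenda_groups.group_id` is assumed to be converted separately, before the foreign key is restored. Try it on a copy of the database first.

```sql
-- Sketch for PostgreSQL 7.4 (no ALTER COLUMN ... TYPE before 8.0).
-- group_id_new is a hypothetical temporary column name.
BEGIN;

-- 1. New integer column, filled from the old NUMERIC(10) column.
--    The cast raises an "integer out of range" error (aborting the
--    transaction) if a value does not fit; use bigint in that case.
ALTER TABLE agenda_users_groups ADD COLUMN group_id_new integer;
UPDATE agenda_users_groups SET group_id_new = group_id::integer;
ALTER TABLE agenda_users_groups ALTER COLUMN group_id_new SET NOT NULL;

-- 2. Drop the objects built on the old column, then the column itself.
ALTER TABLE agenda_users_groups DROP CONSTRAINT fk_agenda_gid;
ALTER TABLE agenda_users_groups DROP CONSTRAINT un_agndusrgrp_usergroup;
DROP INDEX i_agnusrsgrs_groupid;
ALTER TABLE agenda_users_groups DROP COLUMN group_id;

-- 3. Give the new column the old name and recreate the index and the
--    unique constraint.
ALTER TABLE agenda_users_groups RENAME COLUMN group_id_new TO group_id;
CREATE INDEX i_agnusrsgrs_groupid ON agenda_users_groups (group_id);
ALTER TABLE agenda_users_groups
    ADD CONSTRAINT un_agndusrgrp_usergroup UNIQUE (user_id, group_id);

COMMIT;

-- VACUUM cannot run inside a transaction block.
VACUUM ANALYZE agenda_users_groups;

-- 4. Only after agenda_groups.group_id has been converted to integer the
--    same way (assumption), restore the foreign key with matching types.
ALTER TABLE agenda_users_groups
    ADD CONSTRAINT fk_agenda_gid FOREIGN KEY (group_id)
    REFERENCES agenda_groups (group_id) ON DELETE CASCADE;
```

Dropping the constraints and the index explicitly before `DROP COLUMN` is a choice of clarity: it makes visible exactly which objects need to be recreated afterwards.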
On Tue, 2006-04-25 at 08:49, Arnau wrote:
> chris smith wrote:
> > On 4/25/06, Arnau <arnaulist@andromeiberica.com> wrote:
SNIP
> The same, the table has 2547556 entries:
>
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups
> espsm_moviltelevision-# WHERE group_id::int8 = 9;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on agenda_users_groups (cost=0.00..59477.34 rows=12738
> width=8) (actual time=3409.541..11818.794 rows=367026 loops=1)
> Filter: ((group_id)::bigint = 9)
> Total runtime: 13452.114 ms
> (3 rows)
>
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups
> espsm_moviltelevision-# WHERE group_id = '9';
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675
> width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
> Filter: (group_id = 9::numeric)
> Total runtime: 7259.861 ms
> (3 rows)
>
> espsm_moviltelevision=# select count(*) from agenda_users_groups ;
> count
> ---------
> 2547556
OK, a few points.
1: 7.4.2 is WAY out of date for the 7.4 series. The 7.4 series itself is
also a bit out of date, and many performance issues have been addressed
in the 8.x series. You absolutely should update to the latest 7.4
release, as there are known data-loss bugs and other issues in 7.4.2.
2: An index scan isn't always faster. In this instance, it looks like
the number of rows that match in the last version of your query (367026
of 2547556) is well over 10% of the table. Assuming your average row
takes up less than 10% or so of a block, which is pretty common, you're
going to have to hit almost every block anyway to get your data. So an
index scan is no win.
3: To test whether or not an index scan IS a win, you can use the
enable_xxx settings to prove it to yourself:
set enable_seqscan = off;
explain analyze <your query here>;
and compare. Note that enable_seqscan = off is a sledgehammer, not a
nudge, and generally should NOT be used in production. If an index scan
is generally a win for you but the database isn't using it, you might
need to tune the database for your machine. Note that you should NOT
tune your database based on a single query. You'll need to reach a
compromise on your settings that makes all your queries run reasonably
fast without the planner making insane decisions. One of the better
PostgreSQL tuning docs out there is the one at:
http://www.varlena.com/GeneralBits/Tidbits/perf.html .
Good luck.
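Scott's point 3 as a concrete psql session: a minimal sketch using the query from this thread. `SET` affects only the current session, and `RESET` restores the session default afterwards. Running each `EXPLAIN ANALYZE` more than once is worthwhile, because the first run may be slower while the table is still being read into cache.

```sql
-- The plan the planner picks on its own.
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = 9;

-- Make sequential scans look prohibitively expensive, for this session
-- only. This is a testing aid, not a production setting.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = 9;

-- Restore the session default.
RESET enable_seqscan;
```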
Arnau <arnaulist@andromeiberica.com> writes:
> Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675
> width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
> Filter: (group_id = 9::numeric)
> Total runtime: 7259.861 ms
> (3 rows)
> espsm_moviltelevision=# select count(*) from agenda_users_groups ;
> count
> ---------
> 2547556
So the SELECT is fetching nearly 15% of the rows in the table. The
planner is doing *the right thing* to use a seqscan, at least for
this particular group_id value.
regards, tom lane
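Tom's figure follows directly from the numbers above: 367026 matching rows out of 2547556 is about 14.4%. The 7.4 planner's own estimate (339675 rows, about 13.3%) was close to that. The sketch below performs the same check in SQL for one `group_id` value (9, as in this thread). It reads the table more than once, so it is intended for a one-off check rather than routine use.

```sql
-- Share of the table's rows that match group_id = 9, in percent.
SELECT count(*) AS matching_rows,
       (SELECT count(*) FROM agenda_users_groups) AS total_rows,
       round(100.0 * count(*)
             / (SELECT count(*) FROM agenda_users_groups), 1) AS match_pct
FROM agenda_users_groups
WHERE group_id = 9;
```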
Tom Lane wrote:
> Arnau <arnaulist@andromeiberica.com> writes:
>
>
>> Seq Scan on agenda_users_groups (cost=0.00..53108.45 rows=339675
>>width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
>> Filter: (group_id = 9::numeric)
>> Total runtime: 7259.861 ms
>>(3 rows)
>
>
>>espsm_moviltelevision=# select count(*) from agenda_users_groups ;
>> count
>>---------
>> 2547556
>
>
> So the SELECT is fetching nearly 15% of the rows in the table. The
> planner is doing *the right thing* to use a seqscan, at least for
> this particular group_id value.
I have done the same tests on 8.1.0.
espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id = 9;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on agenda_users_groups (cost=2722.26..30341.78
rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)
Recheck Cond: (group_id = 9::numeric)
-> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.26
rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)
Index Cond: (group_id = 9::numeric)
Total runtime: 1004.966 ms
(5 rows)
espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id::int8 = 9;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Seq Scan on agenda_users_groups (cost=0.00..60947.43 rows=12777
width=8) (actual time=457.963..2244.928 rows=367026 loops=1)
Filter: ((group_id)::bigint = 9)
Total runtime: 2571.496 ms
(3 rows)
espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id::int8 = '9';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Seq Scan on agenda_users_groups (cost=0.00..60947.43 rows=12777
width=8) (actual time=407.193..2182.880 rows=367026 loops=1)
Filter: ((group_id)::bigint = 9::bigint)
Total runtime: 2506.998 ms
(3 rows)
espsm_moviltelevision=# select count(*) from agenda_users_groups ;
count
---------
2555437
(1 row)
So PostgreSQL 8.1 does use the index, and I don't understand why. I
tried to tune the configuration on this server; is that the reason? Or
is it because this is a newer version of PostgreSQL?
Thanks for all the replies
--
Arnau
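Tom's reply later in the thread points to bitmap scans as the main difference. Still, one way to rule configuration in or out is to compare the planner cost settings on both servers. The following is a minimal sketch using the `pg_settings` view. These parameters exist in both 7.4 and 8.1, but defaults can differ between versions, so it is better to compare the actual values than to assume them.

```sql
-- Run on both the 7.4 and the 8.1 server and compare the output.
SELECT version();

-- Settings that feed the planner's seq-scan-versus-index-scan costing.
SELECT name, setting
FROM pg_settings
WHERE name IN ('random_page_cost', 'effective_cache_size',
               'cpu_tuple_cost', 'cpu_index_tuple_cost',
               'cpu_operator_cost', 'shared_buffers')
ORDER BY name;
```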
On Tue, 2006-04-25 at 10:47, Arnau wrote:
> Tom Lane wrote:
> > Arnau <arnaulist@andromeiberica.com> writes:
> >
> >>espsm_moviltelevision=# select count(*) from agenda_users_groups ;
> >> count
> >>---------
> >> 2547556
> >
> > So the SELECT is fetching nearly 15% of the rows in the table. The
> > planner is doing *the right thing* to use a seqscan, at least for
> > this particular group_id value.
>
> I have done the same tests on 8.1.0.
>
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups WHERE group_id = 9;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on agenda_users_groups (cost=2722.26..30341.78
> rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)
> Recheck Cond: (group_id = 9::numeric)
> -> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.26
> rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)
> Index Cond: (group_id = 9::numeric)
> Total runtime: 1004.966 ms
> (5 rows)
How big are these individual records? I'm guessing a fairly good size,
since an index scan is winning.
> espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
> agenda_users_groups WHERE group_id::int8 = 9;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
> Seq Scan on agenda_users_groups (cost=0.00..60947.43 rows=12777
> width=8) (actual time=457.963..2244.928 rows=367026 loops=1)
> Filter: ((group_id)::bigint = 9)
> Total runtime: 2571.496 ms
> (3 rows)
OK. Stop and think about what you're telling postgresql to do here.
You're telling it to cast the field group_id to int8, then compare it to
9. How can it cast the group_id to int8 without fetching it? That's
right, you're ensuring a seq scan. You need to put the int8 cast on the
other side of that equality comparison, like:
where group_id = 9::int8
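Here is Scott's point side by side. Roughly speaking, a plain index on `group_id` can only be used when the indexed column appears bare in the condition. Casting the constant instead of the column keeps the column bare. If queries really must compare `group_id::int8`, an alternative is an expression index on exactly that expression (expression indexes are available since 7.4). `i_agnusrsgrs_groupid_int8` is a hypothetical index name. Whether the planner then actually chooses an index still depends on the selectivity discussed above.

```sql
-- Cast on the column: the condition is on the expression group_id::int8,
-- which the plain index on group_id does not cover, so expect a seq scan.
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id::int8 = 9;

-- Cast on the constant: the column stays bare, so the index is usable
-- (the 8.1 plans in this thread show the constant folded to 9::numeric).
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = 9::int8;

-- Alternative: index the cast expression itself
-- (the double parentheses are required for an expression index).
CREATE INDEX i_agnusrsgrs_groupid_int8
    ON agenda_users_groups ((group_id::int8));
```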
Arnau <arnaulist@andromeiberica.com> writes:
> I have done the same tests on 8.1.0.
Bitmap scans are a totally different animal that doesn't exist in 7.4.
A plain indexscan, such as 7.4 knows about, is generally not effective
for fetching more than a percent or two of the table. The crossover
point for a bitmap scan is much higher (don't know exactly, but probably
something like 30-50%).
regards, tom lane
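On 8.1, the difference Tom describes can be observed directly, because 8.1 also provides an `enable_bitmapscan` setting. The following is a sketch comparing the two index strategies for this query. It applies to 8.1 or later only, since 7.4 has neither bitmap scans nor that setting.

```sql
-- 8.1+: the default plan (a Bitmap Heap Scan in the output above).
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = 9;

-- Discourage bitmap scans and sequential scans for this session, so the
-- planner should fall back to a plain Index Scan, the only index
-- strategy 7.4 has.
SET enable_bitmapscan = off;
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = 9;

RESET enable_bitmapscan;
RESET enable_seqscan;
```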
>>I have done the same tests on 8.1.0.
>>
>>
>>espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
>>agenda_users_groups WHERE group_id = 9;
>> QUERY PLAN
>>----------------------------------------------------------------------------------------------------------------------------------------------
>> Bitmap Heap Scan on agenda_users_groups (cost=2722.26..30341.78
>>rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)
>> Recheck Cond: (group_id = 9::numeric)
>> -> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.26
>>rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)
>> Index Cond: (group_id = 9::numeric)
>> Total runtime: 1004.966 ms
>>(5 rows)
>
>
> How big are these individual records? I'm guessing a fairly good size,
> since an index scan is winning.
How can I find out the size of an individual record?
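One way to answer this from the system catalogs, as a sketch. `pg_class.relpages` and `pg_class.reltuples` (refreshed by VACUUM and ANALYZE) give the average number of rows per page, and with the default 8 kB page, 8192 divided by that figure approximates the bytes per row, including overhead. The `avg_width` column of `pg_stats` gives the average stored width of each analyzed column. The first two queries also work on 7.4. `pg_relation_size()` is assumed to be available only from 8.1 on.

```sql
-- Average rows per page, from the estimates stored in pg_class.
-- NULLIF avoids a division by zero on a table that was never vacuumed.
SELECT relpages,
       reltuples,
       reltuples / NULLIF(relpages, 0) AS rows_per_page
FROM pg_class
WHERE relname = 'agenda_users_groups';

-- Average width in bytes of each column's stored values, per ANALYZE.
-- The sum approximates the data part of a row, excluding the row header.
SELECT attname, avg_width
FROM pg_stats
WHERE tablename = 'agenda_users_groups'
ORDER BY attname;

-- 8.1 and later: on-disk size of the table in bytes.
SELECT pg_relation_size('agenda_users_groups');
```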
>
>
>>espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
>>agenda_users_groups WHERE group_id::int8 = 9;
>> QUERY PLAN
>>-------------------------------------------------------------------------------------------------------------------------------
>> Seq Scan on agenda_users_groups (cost=0.00..60947.43 rows=12777
>>width=8) (actual time=457.963..2244.928 rows=367026 loops=1)
>> Filter: ((group_id)::bigint = 9)
>> Total runtime: 2571.496 ms
>>(3 rows)
>
>
> OK. Stop and think about what you're telling postgresql to do here.
>
> You're telling it to cast the field group_id to int8, then compare it to
> 9. How can it cast the group_id to int8 without fetching it? That's
> right, you're ensuring a seq scan. You need to put the int8 cast on the
> other side of that equality comparison, like:
>
> where group_id = 9::int8
I just did what Chris Smith asked me to do :). Here are the results I
get when I move the cast to the constant:
espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id = 9::int8;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on agenda_users_groups (cost=2722.33..30343.06
rows=400379 width=8) (actual time=147.723..714.473 rows=367026 loops=1)
Recheck Cond: (group_id = 9::numeric)
-> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.33
rows=400379 width=0) (actual time=145.015..145.015 rows=367026 loops=1)
Index Cond: (group_id = 9::numeric)
Total runtime: 1038.537 ms
(5 rows)
espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
agenda_users_groups WHERE group_id = '9'::int8;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on agenda_users_groups (cost=2722.33..30343.06
rows=400379 width=8) (actual time=153.858..1192.838 rows=367026 loops=1)
Recheck Cond: (group_id = 9::numeric)
-> Bitmap Index Scan on i_agnusrsgrs_groupid (cost=0.00..2722.33
rows=400379 width=0) (actual time=151.298..151.298 rows=367026 loops=1)
Index Cond: (group_id = 9::numeric)
Total runtime: 1527.039 ms
(5 rows)
Thanks
--
Arnau
> OK. Stop and think about what you're telling postgresql to do here.
>
> You're telling it to cast the field group_id to int8, then compare it to
> 9. How can it cast the group_id to int8 without fetching it? That's
> right, you're ensuring a seq scan. You need to put the int8 cast on the
> other side of that equality comparison, like:
Yeah, that one was my fault :) I couldn't remember which way it went, or
whether 7.4.x had issues with int8 indexes.
--
Postgresql & php tutorials
http://www.designmagick.com/