Обсуждение: Another seq scan instead of index question
Hi everyone,
I've a table:
Table "cdsongs"
Attribute | Type | Modifier
----------------+-------------------+-------------------------------------
songid | integer | default
nextval('seq_songid'::text)
cdid | integer |
track | integer |
song | character varying |
extt | text |
fk_products_id | integer |
Indices: cdsongs_cdid,
cdsongs_songid,
idx_cdsongs_song
wich an index :
CREATE INDEX idx_cdsongs_song ON cdsongs (lower(song));
But postgresql refuses to use it. The vacuum output is :
depos=# VACUUM VERBOSE ANALYZE cdsongs;
NOTICE: --Relation cdsongs--
NOTICE: Pages 41232: Changed 0, reaped 0, Empty 0, New 0; Tup
4210874: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 52, MaxLen
2025; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU
0.64s/0.72u sec.
NOTICE: Index cdsongs_cdid: Pages 9223; Tuples 4210874. CPU
0.47s/3.80u sec.
NOTICE: Index cdsongs_songid: Pages 9223; Tuples 4210874. CPU
0.46s/3.68u sec.
NOTICE: Index idx_cdsongs_song: Pages 21888; Tuples 4210874. CPU
1.24s/3.93u sec.
NOTICE: --Relation pg_toast_61094022--
NOTICE: Pages 5: Changed 0, reaped 0, Empty 0, New 0; Tup 28: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 63, MaxLen 2034; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_61094022_idx: Pages 2; Tuples 28. CPU
0.00s/0.00u sec.
NOTICE: Analyzing...
There are 4210874 rows, which is a lot compared to the expected rows
returned, so why does it still use seq scan ? (I'd expect seq scan if
it was returning loads of rows!).
depos=# explain select * from cdsongs where lower(song) like
'mushroom festival in hell';
NOTICE: QUERY PLAN:
Seq Scan on cdsongs (cost=0.00..104395.11 rows=42109 width=40)
EXPLAIN
depos=# set enable_seqscan = off;
SET VARIABLE
depos=# explain select * from cdsongs where lower(song) like
'mushroom festival in hell';
NOTICE: QUERY PLAN:
Index Scan using idx_cdsongs_song on cdsongs (cost=0.00..115549.17
rows=42109 width=40)
EXPLAIN
And indead, if I force seqscan off and perform the query, it's dead
fast. Otherwise it uses seqscan and takes a long long time.
Cheers for any help!
Nick
--
Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !
Nicholas Piper <nick@nickpiper.co.uk> writes:
> There are 4210874 rows, which is a lot compared to the expected rows
> returned, so why does it still use seq scan ?
Well, no, it isn't "a lot". The row estimate is just about 1% of the
total rows, which suggests strongly that you're getting a default
selectivity estimate rather than anything real. Note also that you have
about 100 rows per disk page (4210874/41232). So it's estimating that
it will need to fetch about one row out of every page, on which basis
the indexscan looks pretty unattractive --- it can't save any I/O.
Your real problem is the bogus selectivity estimate. What version
are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd
be interested to see what you get from
select attname,attdispersion,s.*
from pg_statistic s, pg_attribute a, pg_class c
where starelid = c.oid and attrelid = c.oid and staattnum = attnum
and relname = 'cdsongs';
regards, tom lane
On Tue, 07 Aug 2001, Tom Lane wrote:
> Your real problem is the bogus selectivity estimate. What version
> are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd
> be interested to see what you get from
I'm on 7.1
(PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.4)
> select attname,attdispersion,s.*
> from pg_statistic s, pg_attribute a, pg_class c
> where starelid = c.oid and attrelid = c.oid and staattnum = attnum
> and relname = 'cdsongs';
That doesn't look good to me as it contains dodgy bytes...
attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval |
stahival
---------+---------------+----------+-----------+-------+-------------+---------------+---------------+----------+--------------------------------------------------------------
songid | -1 | 61094022 | 1 | 97 | 0 | 2.3748e-07 | 1 | 1 |
4210874
cdid | 4.65505e-06 | 61094022 | 2 | 97 | 0 | 2.32731e-05 | 677 | 1 |
364477
track | 0.0231229 | 61094022 | 3 | 97 | 0 | 0.086018 | 0 | 0 | 97
song | 0.000647958 | 61094022 | 4 | 1066 | 0 | 0.00319886 | | |
þöföramca
extt | 0.890813 | 61094022 | 5 | 664 | 0.937611 | 2.9685e-05 | The Residents | | ýòî
ïðîéäåò- ëåòè íà âîñõîä,
À ÿ, òàê è áûòü, íà çàêàò.
The data came from freedb, translated to postgresql by a small C
program which I didn't write.
Nick
--
Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !
Nicholas Piper <nick@nickpiper.co.uk> writes:
> On Tue, 07 Aug 2001, Tom Lane wrote:
>> Your real problem is the bogus selectivity estimate. What version
>> are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd
>> be interested to see what you get from
> I'm on 7.1
> (PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.4)
Okay ... [ looks at code ... ] oh, I see the problem. The pattern-match
selectivity code doesn't try to deal with "lower(foo) LIKE ...":
/*
* If expression is not var op constant for
* a simple var of a real relation (no subqueries, for now),
* then punt and return a default estimate.
*/
This could probably be improved, but I don't have time to think about it
now. In the meantime you could try knocking DEFAULT_MATCH_SEL down a
little bit. (It's already been reduced to 0.005 in current sources,
in fact.) See src/backend/utils/adt/selfuncs.c.
regards, tom lane
On Tue, 07 Aug 2001, Nicholas Piper wrote: Replying to myself with some more information. > On Tue, 07 Aug 2001, Tom Lane wrote: > > select attname,attdispersion,s.* > > from pg_statistic s, pg_attribute a, pg_class c > > where starelid = c.oid and attrelid = c.oid and staattnum = attnum > > and relname = 'cdsongs'; > That doesn't look good to me as it contains dodgy bytes... I've removed (hopefully) the nasty entries in my tables by deleting where string is > 'zzzzzz'. The new table which Tom said would be useful is ; attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival ---------+---------------+----------+-----------+-------+-------------+---------------+---------------+----------+--------------------------- songid | -1 | 61094022 | 1 | 97 | 0 | 2.38796e-07 | 1 | 1 | 4210874 cdid | 4.68083e-06 | 61094022 | 2 | 97 | 0 | 2.3402e-05 | 677 | 1 | 364477 track | 0.0231398 | 61094022 | 3 | 97 | 0 | 0.0860681 | 0 | 0 | 97 song | 0.000651592 | 61094022 | 4 | 1066 | 0 | 0.00321658 | | | zzo, Moderatocon Allegro extt | 0.890565 | 61094022 | 5 | 664 | 0.937465 | 2.98494e-05 | The Residents | | zur Ruhgebracht Unfortunately I'm still getting the same results: depos=# explain select * from cdsongs where lower(song) like 'mushroom festival in hell'; NOTICE: QUERY PLAN: Seq Scan on cdsongs (cost=0.00..103835.24 rows=41877 width=40) EXPLAIN depos=# set enable_seqscan = off; SET VARIABLE depos=# explain select * from cdsongs where lower(song) like 'mushroom festival in hell'; NOTICE: QUERY PLAN: Index Scan using idx_cdsongs_song on cdsongs (cost=0.00..114921.91 rows=41877 width=40) EXPLAIN -- Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/ Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !