Re: performace problem after VACUUM ANALYZE
От | Tom Lane |
---|---|
Тема | Re: performace problem after VACUUM ANALYZE |
Дата | |
Msg-id | 12084.1045264757@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | performace problem after VACUUM ANALYZE (Scott Cain <cain@cshl.org>) |
Список | pgsql-performance |
Scott Cain <cain@cshl.org> writes: > Here is the query that is causing the problems: > select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id > from feature f, featureloc fl > where > fl.srcfeature_id = 1 and > ((fl.strand=1 and fl.nbeg <= 393164 and fl.nend >= 390956) OR > (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and > f.feature_id = fl.feature_id > [ and the index he'd like it to use is ] > Index "featureloc_src_strand_beg_end" > Column | Type > ---------------+---------- > srcfeature_id | integer > strand | smallint > nbeg | integer > nend | integer > btree After fooling with this I see a couple of problems. One is the same old cross-datatype-comparison issue that so frequently bites people: "1" and "-1" are integer constants, and comparing them to a smallint column isn't an indexable operation. You need casts. (Or, forget the "optimization" of storing strand as a smallint. Depending on your row layout, it's quite likely saving you no space anyway.) Problem two is that the optimizer isn't smart enough to recognize that a query condition laid out in this form should be processed as two indexscans --- it would possibly have gotten it right if the first index column had been inside the OR, but not this way. The upshot is that when you force it to use index featureloc_src_strand_beg_end, it's actually only using the srcfeature_id column of the index --- which is slow of course, and also explains why the optimizer doesn't find that option very attractive. I had some success in getting current sources to generate a desirable plan by doing this: regression=# explain select distinct * regression-# from feature f join featureloc fl on (f.feature_id = fl.feature_id) where regression-# ((fl.srcfeature_id = 1 and fl.strand=1::int2 and fl.nbeg <= 393164 and fl.nend >= 390956) OR regression(# (fl.srcfeature_id = 1 and fl.strand=-1::int2 and fl.nend <= 393164 and fl.nbeg >= 390956)); Unique (cost=34.79..34.85 rows=5 width=50) -> Sort (cost=34.79..34.80 rows=5 width=50) Sort Key: f.name, fl.nbeg, fl.nend, fl.strand -> Hash Join (cost=9.68..34.73 rows=5 width=50) Hash Cond: ("outer".feature_id = "inner".feature_id) -> Seq Scan on feature f (cost=0.00..20.00 rows=1000 width=36) -> Hash (cost=9.68..9.68 rows=1 width=14) -> Index Scan using featureloc_src_strand_beg_end, featureloc_src_strand_beg_end on featureloc fl (cost=0.00..9.68 rows=1 width=14) Index Cond: (((srcfeature_id = 1) AND (strand = 1::smallint) AND (nbeg <= 393164) AND (nend >=390956)) OR ((srcfeature_id = 1) AND (strand = -1::smallint) AND (nbeg >= 390956) AND (nend <= 393164))) Filter: (((srcfeature_id = 1) AND (strand = 1::smallint) AND (nbeg <= 393164) AND (nend >= 390956))OR ((srcfeature_id = 1) AND (strand = -1::smallint) AND (nend <= 393164) AND (nbeg >= 390956))) (10 rows) Shoving the join condition into an explicit JOIN clause is a hack, but it nicely does the job of keeping the WHERE clause as a pristine OR-of-ANDs structure, so that the optimizer can hardly fail to notice that that's the preferred canonical form. I would strongly recommend an upgrade to PG 7.3, both on general principles and because you can actually see what the indexscan condition is in EXPLAIN's output. Before 7.3 you had to grovel through EXPLAIN VERBOSE to be sure what was really happening with a multicolumn index. regards, tom lane
В списке pgsql-performance по дате отправления:
Предыдущее
От: ShengQiang ShuДата:
Сообщение: Re: [Gmod-schema] Re: performace problem after VACUUM ANALYZE
Следующее
От: SLetovsky@aol.comДата:
Сообщение: Re: [Gmod-schema] Re: performace problem after VACUUM ANALYZE