Обсуждение: Speed of joins using sparse indexes
This mail tries to explain the solution that I've found to address the problem of the joins that uses tables with very sparse indexes. The exact problem was : How can I manage the problem of select a few rows with a boolean atribute when they are 5 rows with flag='Y' in a table of 100000 rows? I't must to be an index, but the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know : if I ask for the 100000 rows with flag='N' then Seq Scan is the solution, but the interesting query is the other : to extract the 5 rows with flag='Y' from whitin the 100000 rows with the flag='N'. A possible solution to optimize this kind of query is to create an auxiliar table with the id's of the 5 rows with flag='Y', maintained by rules watching the attribute flag in the target table. In this manner, I never do a update/insert in the flag table and I replace the "flag='Y'" in the query in favour of "TABLE.id=FLAG_TABLE.id" (another join). It's a kind of tell to Postgres "Hey, I'm very interested in the rows with flag='Y'" ... :) and the results in speed-up are amazing. I think is better than "fake" a dense index to change the behaviour of the optimizer. Any suggestions? Roberto. ... sorry for my English ;) -- Roberto Moreda Resp. Dpto. Informática Handem/San Luis Tlf +34 981 779000 Fax +34 981 779022 Pol. Piadela Sur, Autovía A6 Sal.567 15300 Betanzos (A Coruña) - España
[Charset iso-8859-1 unsupported, filtering to ASCII...] > This mail tries to explain the solution that I've found to address the > problem of the joins that uses tables with very sparse indexes. > > The exact problem was : > > How can I manage the problem of select a few rows with a boolean atribute > when they are 5 rows with flag='Y' in a table of 100000 rows? > I't must to be an index, but > the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know : > if I ask for the 100000 rows with flag='N' then Seq Scan is the solution, > but the interesting query is the other : to extract the 5 rows with > flag='Y' from whitin the 100000 rows with the flag='N'. > > A possible solution to optimize this kind of query is to create an auxiliar > table with the id's of the 5 rows with flag='Y', maintained by rules watching > the attribute flag in the target table. In this manner, I never do a > update/insert in the flag table and I replace the "flag='Y'" in the query in > favour of "TABLE.id=FLAG_TABLE.id" (another join). > > It's a kind of tell to Postgres "Hey, I'm very interested in the rows with > flag='Y'" ... :) and the results in speed-up are amazing. This sounds like a very good solution. Rules can help. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
have you tried a composite index with another, more distributed attribute (like a timestamp)? - with boolean as first component. create index ix on tablename(boolean-attribute,timestamp-attribute); ------------------------------------------------------------ Rex McMaster rmcm@compsoft.com.au rex@mcmaster.wattle.id.au PGP Public key: http://www.compsoft.com.au/~rmcm/pgp-pk Roberto Moreda writes: > This mail tries to explain the solution that I've found to address the > problem of the joins that uses tables with very sparse indexes. > > The exact problem was : > > How can I manage the problem of select a few rows with a boolean atribute > when they are 5 rows with flag='Y' in a table of 100000 rows? > I't must to be an index, but > the optimizer asumes that a Seq Scan is more cheap... yes, yes... I know : > if I ask for the 100000 rows with flag='N' then Seq Scan is the solution, > but the interesting query is the other : to extract the 5 rows with > flag='Y' from whitin the 100000 rows with the flag='N'. > > A possible solution to optimize this kind of query is to create an auxiliar > table with the id's of the 5 rows with flag='Y', maintained by rules watching > the attribute flag in the target table. In this manner, I never do a > update/insert in the flag table and I replace the "flag='Y'" in the query in > favour of "TABLE.id=FLAG_TABLE.id" (another join). > > It's a kind of tell to Postgres "Hey, I'm very interested in the rows with > flag='Y'" ... :) and the results in speed-up are amazing. > > I think is better than "fake" a dense index to change the behaviour of the > optimizer. > > Any suggestions? > > Roberto. > > ... sorry for my English ;) > > -- > Roberto Moreda > Resp. Dpto. Informática Handem/San Luis > Tlf +34 981 779000 > Fax +34 981 779022 > Pol. Piadela Sur, Autovía A6 Sal.567 > 15300 Betanzos (A Coruña) - España --