Discussion: Clustered/covering indexes (or lack thereof :-)
This is probably a FAQ, but I can't find a good answer...

So - are there common techniques to compensate for the lack of
clustered/covering indexes in PostgreSQL? To be more specific - here is my
table (simplified):

topic_id int
post_id int
post_text varchar(1024)

The most used query is: SELECT post_id, post_text FROM Posts WHERE
topic_id=XXX. Normally I would have created a clustered index on topic_id,
and the whole query would take ~1 disk seek.

What would be the common way to handle this in PostgreSQL, provided that I
can't afford 1 disk seek per record returned?

--
View this message in context: http://www.nabble.com/Clustered-covering-indexes-%28or-lack-thereof-%3A-%29-tf4789321.html#a13700848
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

adrobj wrote:
> This is probably a FAQ, but I can't find a good answer...
>
> So - are there common techniques to compensate for the lack of
> clustered/covering indexes in PostgreSQL? To be more specific - here is my
> table (simplified):
>
> topic_id int
> post_id int
> post_text varchar(1024)
>
> The most used query is: SELECT post_id, post_text FROM Posts WHERE
> topic_id=XXX. Normally I would have created a clustered index on topic_id,
> and the whole query would take ~1 disk seek.
>
> What would be the common way to handle this in PostgreSQL, provided that I
> can't afford 1 disk seek per record returned?

You can cluster the table, see
http://www.postgresql.org/docs/8.2/interactive/sql-cluster.html.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

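[Editor's sketch] For readers who want to see what the suggestion looks like in practice, here is a minimal example based on the simplified schema from the question. The table name posts and the index name posts_topic_id_idx are assumptions made for illustration; they do not come from the thread.

```sql
-- Hypothetical names: the thread gives only the column list.
CREATE TABLE posts (
    topic_id  int,
    post_id   int,
    post_text varchar(1024)
);

CREATE INDEX posts_topic_id_idx ON posts (topic_id);

-- Physically rewrite the table in the order of the index, so that rows
-- sharing a topic_id end up on adjacent pages.
-- This is the PostgreSQL 8.3+ form; on 8.2 (the version linked above)
-- the equivalent is: CLUSTER posts_topic_id_idx ON posts;
CLUSTER posts USING posts_topic_id_idx;

-- Refresh planner statistics after the rewrite.
ANALYZE posts;
```

CLUSTER is a one-time reordering: rows inserted or updated afterwards are not kept in index order, and the command holds an exclusive lock on the table while it runs.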
On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote:
> This is probably a FAQ, but I can't find a good answer...
>
> So - are there common techniques to compensate for the lack of
> clustered/covering indexes in PostgreSQL? To be more specific - here is my
> table (simplified):
>
> topic_id int
> post_id int
> post_text varchar(1024)
>
> The most used query is: SELECT post_id, post_text FROM Posts WHERE
> topic_id=XXX. Normally I would have created a clustered index on topic_id,
> and the whole query would take ~1 disk seek.
>
> What would be the common way to handle this in PostgreSQL, provided that I
> can't afford 1 disk seek per record returned?
>

Periodically CLUSTER the table on the topic_id index. The table will not
be perfectly clustered at all times, but it will be close enough that it
won't make much difference.

There's still the hit of performing a CLUSTER, however.

Another option, if you have a relatively small number of topic_ids, is
to break it into separate tables, one for each topic_id.

Regards,
Jeff Davis

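[Editor's sketch] One way to decide when a periodic re-CLUSTER is due is to look at the planner's correlation statistic for topic_id. This is only an illustration: it assumes a table named posts (a hypothetical name) that has already been clustered once on a topic_id index, and no particular threshold is suggested by the thread.

```sql
-- correlation is near 1.0 (or -1.0) when the physical row order closely
-- follows topic_id order, and drifts toward 0 as new rows accumulate.
-- The value is only as fresh as the last ANALYZE.
ANALYZE posts;

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'posts'
  AND attname = 'topic_id';

-- When the ordering has degraded too far, re-cluster. Without an index
-- name, CLUSTER reuses the index the table was last clustered on.
CLUSTER posts;
ANALYZE posts;
```

The re-CLUSTER step is the "hit" mentioned above: it rewrites the whole table under an exclusive lock, so it is usually scheduled for a quiet period.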
In response to Jeff Davis <pgsql@j-davis.com>:

> On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote:
> > This is probably a FAQ, but I can't find a good answer...
> >
> > So - are there common techniques to compensate for the lack of
> > clustered/covering indexes in PostgreSQL? To be more specific - here is my
> > table (simplified):
> >
> > topic_id int
> > post_id int
> > post_text varchar(1024)
> >
> > The most used query is: SELECT post_id, post_text FROM Posts WHERE
> > topic_id=XXX. Normally I would have created a clustered index on topic_id,
> > and the whole query would take ~1 disk seek.
> >
> > What would be the common way to handle this in PostgreSQL, provided that I
> > can't afford 1 disk seek per record returned?
> >
>
> Periodically CLUSTER the table on the topic_id index. The table will not
> be perfectly clustered at all times, but it will be close enough that it
> won't make much difference.
>
> There's still the hit of performing a CLUSTER, however.
>
> Another option, if you have a relatively small number of topic_ids, is
> to break it into separate tables, one for each topic_id.

Or materialize the data, if performance is the utmost requirement.

Create second table:
materialized_topics (
 topic_id int,
 post_ids int[],
 post_texts text[]
)

Now add a trigger to your original table that updates materialized_topics
any time the first table is altered. Thus you always have fast lookups.

Of course, this may be non-optimal if that table sees a lot of updates.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
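
[Editor's sketch] The message above describes the trigger only in words. One possible shape for it is shown below. Everything except the materialized_topics column list is an assumption: the source table name posts, the function and trigger names, the primary key, and the choice to rebuild the whole row for a topic on every change rather than patch the arrays in place. It also assumes post_id is unique within a topic (so both arrays come out in the same order) and does not try to handle concurrent writers to the same topic.

```sql
CREATE TABLE materialized_topics (
    topic_id   int PRIMARY KEY,   -- primary key is an addition, for fast lookup
    post_ids   int[],
    post_texts text[]
);

-- Rebuild the materialized row for one topic from the source table.
-- If the topic has no posts left, the row is simply removed.
CREATE FUNCTION refresh_materialized_topic(p_topic_id int) RETURNS void AS $$
BEGIN
    DELETE FROM materialized_topics WHERE topic_id = p_topic_id;

    INSERT INTO materialized_topics (topic_id, post_ids, post_texts)
    SELECT p_topic_id,
           ARRAY(SELECT post_id FROM posts
                 WHERE topic_id = p_topic_id ORDER BY post_id),
           ARRAY(SELECT post_text::text FROM posts
                 WHERE topic_id = p_topic_id ORDER BY post_id)
    WHERE EXISTS (SELECT 1 FROM posts WHERE topic_id = p_topic_id);
END;
$$ LANGUAGE plpgsql;

-- Row-level trigger: refresh the old topic (UPDATE/DELETE) and the new
-- topic (INSERT/UPDATE), which also covers a post moving between topics.
CREATE FUNCTION posts_materialize_trigger() RETURNS trigger AS $$
BEGIN
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        PERFORM refresh_materialized_topic(OLD.topic_id);
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        PERFORM refresh_materialized_topic(NEW.topic_id);
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_materialize
AFTER INSERT OR UPDATE OR DELETE ON posts
FOR EACH ROW EXECUTE PROCEDURE posts_materialize_trigger();
```

With this in place the hot query becomes a single-row fetch, for example SELECT post_ids, post_texts FROM materialized_topics WHERE topic_id = 42. Rebuilding the row on every change is the simplest thing that stays correct, but it makes each write cost proportional to the size of the topic, which is the "lot of updates" caveat from the message above.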