Discussion: Clustered/covering indexes (or lack thereof :-)

Clustered/covering indexes (or lack thereof :-)

From: adrobj
Date: Sun, 2007-11-11 22:59 -0800

This is probably a FAQ, but I can't find a good answer...

So - are there common techniques to compensate for the lack of
clustered/covering indexes in PostgreSQL? To be more specific - here is my
table (simplified):

topic_id int
post_id int
post_text varchar(1024)

The most used query is: SELECT post_id, post_text FROM Posts WHERE
topic_id=XXX. Normally I would have created a clustered index on topic_id,
and the whole query would take ~1 disk seek.

What would be the common way to handle this in PostgreSQL, provided that I
can't afford 1 disk seek per record returned?

--
View this message in context:
http://www.nabble.com/Clustered-covering-indexes-%28or-lack-thereof-%3A-%29-tf4789321.html#a13700848
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Clustered/covering indexes (or lack thereof :-)

From: Heikki Linnakangas
Date:

adrobj wrote:
> The most used query is: SELECT post_id, post_text FROM Posts WHERE
> topic_id=XXX. Normally I would have created a clustered index on topic_id,
> and the whole query would take ~1 disk seek.
>
> What would be the common way to handle this in PostgreSQL, provided that I
> can't afford 1 disk seek per record returned?

You can cluster the table, see
http://www.postgresql.org/docs/8.2/interactive/sql-cluster.html.
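A minimal sketch of what clustering looks like, assuming the table is called posts and the index posts_topic_id_idx (both names are hypothetical). CLUSTER rewrites the table once in index order, so all rows for one topic_id end up on neighbouring pages and the query reads a few pages instead of roughly one per row. The ordering is not maintained for rows inserted or updated afterwards. The block uses the CLUSTER table USING index form (8.3 and later); the 8.2 page linked above documents the older CLUSTER indexname ON tablename spelling.

```sql
-- Hypothetical table mirroring the simplified schema in the question.
CREATE TABLE IF NOT EXISTS posts (
    topic_id  int NOT NULL,
    post_id   int PRIMARY KEY,
    post_text varchar(1024)
);

-- Plain B-tree index on the lookup column (the name is an assumption).
CREATE INDEX IF NOT EXISTS posts_topic_id_idx ON posts (topic_id);

-- Rewrite the table physically ordered by topic_id.
-- Holds an ACCESS EXCLUSIVE lock on posts while the rewrite runs.
-- (8.2 syntax: CLUSTER posts_topic_id_idx ON posts;)
CLUSTER posts USING posts_topic_id_idx;

-- Refresh planner statistics after the rewrite.
ANALYZE posts;
```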

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Clustered/covering indexes (or lack thereof :-)

From: Jeff Davis
Date:

On Sun, 2007-11-11 at 22:59 -0800, adrobj wrote:
> The most used query is: SELECT post_id, post_text FROM Posts WHERE
> topic_id=XXX. Normally I would have created a clustered index on topic_id,
> and the whole query would take ~1 disk seek.
>
> What would be the common way to handle this in PostgreSQL, provided that I
> can't afford 1 disk seek per record returned?
>

Periodically CLUSTER the table on the topic_id index. The table will not
be perfectly clustered at all times, but it will be close enough that it
won't make much difference.

There's still the hit of performing a CLUSTER, however.
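A sketch of the periodic approach, reusing the hypothetical posts / posts_topic_id_idx names and written for a current PostgreSQL release. ALTER TABLE ... CLUSTER ON records which index to use, so a scheduled job (cron, for instance) only has to run CLUSTER posts. The sample rows are made up and interleave topics the way posts arrive over time. The correlation column of pg_stats is one rough way to see how far the physical order has drifted since the last run.

```sql
-- Hypothetical table and index, as in the question (names are assumptions).
CREATE TABLE IF NOT EXISTS posts (
    topic_id  int NOT NULL,
    post_id   int PRIMARY KEY,
    post_text varchar(1024)
);
CREATE INDEX IF NOT EXISTS posts_topic_id_idx ON posts (topic_id);

-- Illustrative data: topics interleaved, as when posts trickle in over time.
INSERT INTO posts (topic_id, post_id, post_text)
SELECT g % 100, g, 'post ' || g
FROM generate_series(1, 10000) AS g
ON CONFLICT (post_id) DO NOTHING;

-- Remember which index future CLUSTER runs should use (no rewrite yet).
ALTER TABLE posts CLUSTER ON posts_topic_id_idx;

-- The periodic job: re-sort the table by the remembered index.
-- Each run holds an ACCESS EXCLUSIVE lock on posts until it finishes.
CLUSTER posts;
ANALYZE posts;

-- Near 1.0: physical order tracks topic_id. As new rows are appended
-- the value drifts toward 0, a hint that another CLUSTER is due.
SELECT correlation
FROM pg_stats
WHERE tablename = 'posts' AND attname = 'topic_id';
```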

Another option, if you have a relatively small number of topic_ids, is
to break it into separate tables, one for each topic_id.
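A sketch of the one-table-per-topic idea using declarative LIST partitioning, which only exists from PostgreSQL 10 on; at the time of this thread the same effect took table inheritance, a CHECK constraint on each child table and constraint_exclusion. The table names and the two topic ids are hypothetical. A query that filters on topic_id only touches the matching child table, whose pages hold nothing but that topic's rows.

```sql
-- Hypothetical partitioned variant of the posts table (PostgreSQL 10+).
CREATE TABLE posts_by_topic (
    topic_id  int NOT NULL,
    post_id   int NOT NULL,
    post_text varchar(1024)
) PARTITION BY LIST (topic_id);

-- One child table per topic_id.
CREATE TABLE posts_topic_1 PARTITION OF posts_by_topic FOR VALUES IN (1);
CREATE TABLE posts_topic_2 PARTITION OF posts_by_topic FOR VALUES IN (2);

-- Rows are routed to the right child automatically.
INSERT INTO posts_by_topic (topic_id, post_id, post_text)
VALUES (1, 10, 'first'), (2, 20, 'second'), (1, 11, 'third');

-- Partition pruning restricts this scan to posts_topic_1.
SELECT post_id, post_text FROM posts_by_topic WHERE topic_id = 1;
```

Every new topic needs its own partition created ahead of time, which is why this only fits a small, fairly stable set of topic ids.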

Regards,
    Jeff Davis


Re: Clustered/covering indexes (or lack thereof :-)

From: Bill Moran
Date:

In response to Jeff Davis <pgsql@j-davis.com>:

> Periodically CLUSTER the table on the topic_id index. The table will not
> be perfectly clustered at all times, but it will be close enough that it
> won't make much difference.
>
> There's still the hit of performing a CLUSTER, however.
>
> Another option, if you have a relatively small number of topic_ids, is
> to break it into separate tables, one for each topic_id.

Or materialize the data, if performance is the utmost requirement.

Create a second table:
CREATE TABLE materialized_topics (
 topic_id int PRIMARY KEY,
 post_ids int[],
 post_texts text[]
);

Now add a trigger to your original table that updates materialized_topics
whenever the original table is modified.  Thus you always have fast lookups.

Of course, this may be non-optimal if that table sees a lot of updates.
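A minimal PL/pgSQL sketch of such a trigger. The function and trigger names are hypothetical, and INSERT ... ON CONFLICT needs PostgreSQL 9.5 or later. Every write to posts rebuilds the whole array row for the affected topic (or deletes it once the topic has no posts left), which is exactly what gets expensive on a heavily updated table. It is not hardened against two concurrent transactions writing the same topic: the one that commits last can leave a row that misses the other's post, so a production version would need extra locking.

```sql
-- Hypothetical source table, as in the question.
CREATE TABLE IF NOT EXISTS posts (
    topic_id  int NOT NULL,
    post_id   int PRIMARY KEY,
    post_text varchar(1024)
);

-- One row per topic holding all of its posts.
CREATE TABLE IF NOT EXISTS materialized_topics (
    topic_id   int PRIMARY KEY,
    post_ids   int[],
    post_texts text[]
);

-- Rebuild (or remove) the summary row for a single topic.
CREATE OR REPLACE FUNCTION refresh_materialized_topic(t int) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM posts WHERE topic_id = t) THEN
        INSERT INTO materialized_topics (topic_id, post_ids, post_texts)
        SELECT t,
               array_agg(post_id ORDER BY post_id),
               array_agg(post_text::text ORDER BY post_id)
        FROM posts
        WHERE topic_id = t
        ON CONFLICT (topic_id) DO UPDATE
            SET post_ids   = EXCLUDED.post_ids,
                post_texts = EXCLUDED.post_texts;
    ELSE
        DELETE FROM materialized_topics WHERE topic_id = t;
    END IF;
END;
$$;

-- Row-level trigger: refresh the old topic and/or the new topic.
CREATE OR REPLACE FUNCTION posts_sync_materialized() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        PERFORM refresh_materialized_topic(OLD.topic_id);
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        PERFORM refresh_materialized_topic(NEW.topic_id);
    END IF;
    RETURN NULL;  -- ignored for AFTER ROW triggers
END;
$$;

DROP TRIGGER IF EXISTS posts_sync_materialized ON posts;
CREATE TRIGGER posts_sync_materialized
AFTER INSERT OR UPDATE OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION posts_sync_materialized();
```

Reads then become a single-row lookup, e.g. SELECT post_ids, post_texts FROM materialized_topics WHERE topic_id = 42;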

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023