Обсуждение: CLUSTER versus a dedicated table

Поиск
Список
Период
Сортировка

CLUSTER versus a dedicated table

От
Robert James
Дата:
Hi.  I'm interested in understanding the differences between
CLUSTERing a table and making a dedicated one.

We have a table with about 1 million records.  On a given day, only
about 1% of them are of interest.  That 1% changes every day (it's
WHERE active_date = today), and so we index and cluster on it.

Even so, the planner shows a very large cost for the Index Scan: about
3500.  If I instead do a SELECT INTO temp_table FROM big_table WHERE
active_date = today, and then do SELECT * FROM temp_table, I get a
planned cost of 65.  Yet, the actual time for both queries is almost
identical.

Questions:
1. Why is there such a discrepancy between the planner's estimate and
the actual cost?

2. In a case like this, will I in general see a performance gain by
doing a daily SELECT INTO and then querying from that table? My ad hoc
test doesn't indicate I would (despite the planner's prediction), and
I'd rather avoid this if it won't help.

3. In general, does CLUSTER provide all the performance benefits of a
dedicated table? If it doesn't, what does it lack?

Thank you.

Re: CLUSTER versus a dedicated table

От
"mark"
Дата:

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of Robert James
> Sent: Wednesday, June 01, 2011 5:55 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] CLUSTER versus a dedicated table
>
> Hi.  I'm interested in understanding the differences between
> CLUSTERing a table and making a dedicated one.
>
> We have a table with about 1 million records.  On a given day, only
> about 1% of them are of interest.  That 1% changes every day (it's
> WHERE active_date = today), and so we index and cluster on it.
>
> Even so, the planner shows a very large cost for the Index Scan: about
> 3500.  If I instead do a SELECT INTO temp_table FROM big_table WHERE
> active_date = today, and then do SELECT * FROM temp_table, I get a
> planned cost of 65.  Yet, the actual time for both queries is almost
> identical.
>
> Questions:
> 1. Why is there such a discrepancy between the planner's estimate and
> the actual cost?
>
> 2. In a case like this, will I in general see a performance gain by
> doing a daily SELECT INTO and then querying from that table? My ad hoc
> test doesn't indicate I would (despite the planner's prediction), and
> I'd rather avoid this if it won't help.
>
> 3. In general, does CLUSTER provide all the performance benefits of a
> dedicated table? If it doesn't, what does it lack?
>
> Thank you.

Start here:
http://wiki.postgresql.org/wiki/SlowQueryQuestions

1: there could be many reasons for the planner to come up a grossly
inaccurate ESTIMATE for some values. Last time the table was analyzed, is
usually where people start.

2: look at table partitioning it's pretty straight forward and sounds like
it might be a good fit for you. It will however involve some triggers or
rules and check constraints. Table partitioning has some downsides though,
you should be aware of what they are before you commit to it.

3: clustering, from a high level, just reorders the data on disk by a given
index. Depending on your setup keeping it close to that clustered ordering
might be trivial or it might not be. Big tables are relative to different
people, 1M rows might be a big table or it might not be, since you didn't
post the size of the table and indexes we can only guess.  Table
partitioning helps most with table maintenance, IMO,  but can be very useful
it the constraint exclusion can eliminate a large number of child tables
right off so it doesn't have to traverse large indexes or do lots of random
IO.


You will need to post at lot more specific info if you want more specific
help.  The guide to reporting slow queries or guide to reporting problems
and start gathering specific information and then post back to the list.



-Mark

>
> --
> Sent via pgsql-performance mailing list (pgsql-
> performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: CLUSTER versus a dedicated table

От
Merlin Moncure
Дата:
On Wed, Jun 1, 2011 at 7:54 PM, Robert James <srobertjames@gmail.com> wrote:
> Hi.  I'm interested in understanding the differences between
> CLUSTERing a table and making a dedicated one.
>
> We have a table with about 1 million records.  On a given day, only
> about 1% of them are of interest.  That 1% changes every day (it's
> WHERE active_date = today), and so we index and cluster on it.
>
> Even so, the planner shows a very large cost for the Index Scan: about
> 3500.  If I instead do a SELECT INTO temp_table FROM big_table WHERE
> active_date = today, and then do SELECT * FROM temp_table, I get a
> planned cost of 65.  Yet, the actual time for both queries is almost
> identical.
>
> Questions:
> 1. Why is there such a discrepancy between the planner's estimate and
> the actual cost?
>
> 2. In a case like this, will I in general see a performance gain by
> doing a daily SELECT INTO and then querying from that table? My ad hoc
> test doesn't indicate I would (despite the planner's prediction), and
> I'd rather avoid this if it won't help.
>
> 3. In general, does CLUSTER provide all the performance benefits of a
> dedicated table? If it doesn't, what does it lack?

no. i suspect you may be over thinking the problem -- what led you to
want to cluster in the first place?

merlin