Re: Table with active and historical data

От

Merlin Moncure

Тема

Дата

1 июня 2011 г. в 22:56:11

Msg-id

BANLkTik-RnmigAg43K6vB0XRt6WvbyeWOQ@mail.gmail.com

обсуждение

Ответ на

Table with active and historical data (Robert James)

Список

pgsql-general

Дерево обсуждения

Table with active and historical data Robert James <srobertjames@gmail.com> 1 июня 2011 г. в 20:30:22

Re: Table with active and historical data salah jubeh <s_jubeh@yahoo.com> 2 июня 2011 г. в 07:50:14

Re: Table with active and historical data Merlin Moncure <mmoncure@gmail.com> 1 июня 2011 г. в 22:56:11

On Wed, Jun 1, 2011 at 7:30 PM, Robert James  wrote:
> I have a table with a little active data and a lot of historical data.
>  I'd like to be able to access the active data very quickly - quicker
> than an index.  Here are the details:
>
> 1. Table has about 1 million records
> 2. Has a column active_date - on a given date, only about 1% are
> active.  active_date is indexed and clustered on.
> 3. Many of my queries are WHERE active_date = today.  Postgres uses
> the index for these, but still lakes quite a lot of time.  I repeat
> these queries regularly.

can we see a query and its 'explain analyze' that you think takes a lot of time?

> 4. I'd like to somehow presort or partition the data so that Postgres
> doesn't have to do an index scan each time.  I'm not sure how to do
> this? Idea?  I know it can be done with inheritance and triggers (
> http://stackoverflow.com/questions/994882/what-is-a-good-way-to-horizontal-shard-in-postgresql
> ), but that method looks a little too complex for me.  I'm looking for
> something simple.

an index scan should be good enough, but if it isn't you can look at
partitioning. let's make sure that's really necessary before doing it
however.

> 5. Finally, I should point out that I still do a large number of
> queries on historical data as well.
>
> What do you recommend? Ideas? Also: Why doesn't cluster on active_date
> solve the problem? Specifically, if I run SELECT * FROM full_table
> WHERE active_date = today, I get a cost of 3500.  If I first select
> those records into a new table, and then do SELECT * on the new table,
> I get a cost of 64.  Why is that? Why doesn't clustering pregroup
> them?

clustering is a tool that allows you to control which tuples are
grouped together on pages -- if you are pulling up more than one tuple
a time hopefully you can reduce the total number of pages you have to
scan by doing it.  The bigger the table is, the more that matters.

merlin

В списке pgsql-general по дате отправления

Предыдущее

От: Craig Ringer

Дата: 1 июня 2011 г. в 21:50:51

Сообщение: Re: Mixed up protocol packets in server response?

Следующее

От: Craig Ringer

Дата: 1 июня 2011 г. в 23:32:28

Сообщение: Re: SELECT to_timestamp crash PostgreSQL 9.1beta1