Обсуждение: Partitioned tables in queries

Поиск
Список
Период
Сортировка

Partitioned tables in queries

От
Kevin Keith
Дата:
I have a case where I am partitioning tables based on a date range in
version 8.1.4. For example:

table_with_millions_of_records
interaction_id  char(16) primary key
start_date   timestamp (without timezone) - indexed
.. other columns

child_1   start_date >= 2006-07-21 00:00:00
child_2   start_date >= 2006-07-20 00:00:00 and start_date < 2006-07-21
00:00:00
...
child_5   start_date >= 2006-07-17 00:00:00 and start_date < 2006-07-18
00:00:00

with rules on the parent and child tables that redirect the data to the
appropriate child table based on the start_date.

Because this table is going to grow very large (very quickly), and will
need to be purged daily, I created partitions, or child tables to hold
data for each day. I have done the same thing in Oracle in the past, and
the PostgreSQL solution works great. The archival process is very simple
- drop the expired child table. I am having one problem.

If I run a query on the full table (there are 5 child tables with data
for the last 5 days), and my where clause contains data for the current
day only:
where start_date > date_trunc('day', now())
all 5 child tables are scanned when I look at the output from explain
analyze.

My question is - can I force the planner to only scan the relevant child
table - when the key related to the partitioned data it part of the
where clause?

Thanks,

Kevin


...

Re: Partitioned tables in queries

От
Steve Atkins
Дата:
On Jul 21, 2006, at 12:17 PM, Kevin Keith wrote:

> I have a case where I am partitioning tables based on a date range
> in version 8.1.4. For example:
>
> table_with_millions_of_records
> interaction_id  char(16) primary key
> start_date   timestamp (without timezone) - indexed
> .. other columns
>
> child_1   start_date >= 2006-07-21 00:00:00
> child_2   start_date >= 2006-07-20 00:00:00 and start_date <
> 2006-07-21 00:00:00
> ...
> child_5   start_date >= 2006-07-17 00:00:00 and start_date <
> 2006-07-18 00:00:00
>
> with rules on the parent and child tables that redirect the data to
> the appropriate child table based on the start_date.
>
> Because this table is going to grow very large (very quickly), and
> will need to be purged daily, I created partitions, or child tables
> to hold data for each day. I have done the same thing in Oracle in
> the past, and the PostgreSQL solution works great. The archival
> process is very simple - drop the expired child table. I am having
> one problem.
>
> If I run a query on the full table (there are 5 child tables with
> data for the last 5 days), and my where clause contains data for
> the current day only:
> where start_date > date_trunc('day', now())
> all 5 child tables are scanned when I look at the output from
> explain analyze.
>
> My question is - can I force the planner to only scan the relevant
> child table - when the key related to the partitioned data it part
> of the where clause?

Yes. You'll need non-overlapping check constraints in each child
table and to set constraint_exclusion to "on" in postgresql.conf.

See http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html
for the gory details.

Cheers,
   Steve

Re: Partitioned tables in queries

От
Kevin Keith
Дата:
My post might have been a little premature - and I apologize for that.

I have figured out what was causing the problem:
1. Constraint exclusion was disabled. I re-enabled.
2. I found that using the now() function - and arbitrary interval will
produce a different execution plan that using a specific date. For example:
    assuming the current time is 16:00:
    a) where start_date > now() - interval '4 hours' scans all child tables.
    b) where start_date > '2006-07-21 12:00:00' only scans the child
table with today's data.

So am I to assume that the value in the query must be a constant, and
cannot be a result of a built-in function in order for
constraint_exclusion to work correctly?

Thanks,

Kevin


Kevin Keith wrote:
> I have a case where I am partitioning tables based on a date range in
> version 8.1.4. For example:
>
> table_with_millions_of_records
> interaction_id  char(16) primary key
> start_date   timestamp (without timezone) - indexed
> .. other columns
>
> child_1   start_date >= 2006-07-21 00:00:00
> child_2   start_date >= 2006-07-20 00:00:00 and start_date <
> 2006-07-21 00:00:00
> ...
> child_5   start_date >= 2006-07-17 00:00:00 and start_date <
> 2006-07-18 00:00:00
>
> with rules on the parent and child tables that redirect the data to
> the appropriate child table based on the start_date.
>
> Because this table is going to grow very large (very quickly), and
> will need to be purged daily, I created partitions, or child tables to
> hold data for each day. I have done the same thing in Oracle in the
> past, and the PostgreSQL solution works great. The archival process is
> very simple - drop the expired child table. I am having one problem.
>
> If I run a query on the full table (there are 5 child tables with data
> for the last 5 days), and my where clause contains data for the
> current day only:
> where start_date > date_trunc('day', now())
> all 5 child tables are scanned when I look at the output from explain
> analyze.
>
> My question is - can I force the planner to only scan the relevant
> child table - when the key related to the partitioned data it part of
> the where clause?
>
> Thanks,
>
> Kevin
>
>
> ...
>


Re: Partitioned tables in queries

От
"Andrew Hammond"
Дата:
> 2. I found that using the now() function - and arbitrary interval will
> produce a different execution plan that using a specific date. For example:
>     assuming the current time is 16:00:
>     a) where start_date > now() - interval '4 hours' scans all child tables.
>     b) where start_date > '2006-07-21 12:00:00' only scans the child
> table with today's data.
>
> So am I to assume that the value in the query must be a constant, and
> cannot be a result of a built-in function in order for
> constraint_exclusion to work correctly?

Have you tried WHERE start_date > (SELECT now() - interval '4 hours')?
Certainly using the constant will allow CBE to work. I think that a
subquery might too.