Re: Declarative partitioning grammar

Поиск
Список
Период
Сортировка
От	Mike
Тема	Re: Declarative partitioning grammar
Дата	11 января 2008 г. 23:04:02
Msg-id	1200096228.31759.19.camel@ipso.snappymail.ca обсуждение исходный текст
Ответ на	Declarative partitioning grammar (Gavin Sherry <swm@alcove.com.au>)
Ответы	Re: Declarative partitioning grammar (Jeff Cohen <jcohen@greenplum.com>)
Список	pgsql-hackers
Дерево обсуждения
Pardon my ignorance as I've never actually used partitioning before but
plan to in the near future, but couldn't the grammar resemble a common
WHERE clause more closely?

> ...   PARTITION BY RANGE(order_date)
>       (
>         START (date '2005-12-01') end (date '2007-12-01')
>            EVERY(interval '2 months')
>       );
>

PARTITION BY RANGE(order_date) ( WHERE order_date >= '2005-12-01' AND order_date < '2007-12-01' EVERY interval '2
months') 

OR

PARTITION BY RANGE(order_date) ( WHERE order_date BETWEEN '2005-12-01' AND '2007-12-01' )

Of course using '>','>=','<','<=' instead of start/end eliminates
any ambiguity along with the need for INCLUSIVE/EXCLUSIVE.

> ...  PARTITION BY LIST (state, deptno)
>       (
>         VALUES ('OR', 1, 'WA', 1),
>         VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
>         VALUES ('OR', 2, 'WA', 2),
>         VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
>         PARTITION region_null VALUES (NULL, NULL),
>         PARTITION region_other
>       );

PARTITION BY LIST (state,deptno) (PARTITION one WHERE state in ('OR', WA') AND deptno = 1 PARTITION two WHERE state
in('AZ', UT') AND deptno IN (1,2) PARTITION region_null WHERE state is null OR deptno is NULLPARTITION region_other 
);

Do you even need to list the columns in the PARTITION BY part?

PARTITION BY LIST (PARTITION one WHERE state in ('OR', WA') AND deptno = 1 PARTITION two WHERE state in ('AZ', UT')
ANDdeptno IN (1,2) PARTITION region_null WHERE state is null OR deptno is NULLPARTITION region_other 
);

Is there really a reason to not have a named partition as well? Sure it
saves a few keystrokes, but it makes trying to do anything with them at
a later date that much more difficult.

Your originally suggested grammar might be shorter to type, but using
WHERE clause syntax we are all familiar with seems a lot more intuitive
to me on the surface at least. Why not try to reuse grammar that already
exists as much as possible?

On Sat, 2008-01-12 at 00:19 +0100, Gavin Sherry wrote:

> CREATE TABLE is modified to accept a PARTITION BY clause. This clause
> contains one or more partition declarations. The syntax is as follows:
>
> PARTITION BY {partition_type} (column_name[, column_name...])
> [PARTITIONS number]
>   (
>        partition_declaration[, partition_declaration...]
>
>   )

> List
> ----
>
> ...    PARTITION BY LIST (state)
>        (PARTITION q1_northwest VALUES ('OR', 'WA'),
>         PARTITION q1_southwest VALUES ('AZ', 'UT', 'NM'),
>         PARTITION q1_northeast VALUES  ('NY', 'VM', 'NJ'),
>         PARTITION q1_southeast VALUES ('FL', 'GA'),
>         PARTITION q1_northcentral VALUES ('SD', 'WI'),
>         PARTITION q1_southcentral VALUES ('OK', 'TX'));
>
> Here, we produce 6 different partitions. The first partition groups
> states in the North West of the USA. We introduce here the named
> partition concept for clarity.
>
> Range
> -----
>
> Range has the most expressive grammar. I'll introduce it in steps:
>
> ...    PARTITION BY RANGE (b)
> (
> PARTITION aa start (date '2007-01-01') end (date '2008-01-01'),
> PARTITION bb start (date '2008-01-01') end (date '2009-01-01')
> );
>
> Here, we create 2 partitions: aa and bb. Partition aa has the range
> 2007-01-01 to 2008-01-01; partition bb has the range 2008-01-01 to
> 2009-01-01. Intervals always have this problem: are the bounds included
> in the range? To deal with this we define: the start of the range is
> included in the range. The ending bound is not. This can be modified
> with the keywords INCLUSIVE and EXCLUSIVE, which modify this property on
> a rule by rule basis.
>
> It is common that these partitions follow a pattern, such as following
> every week, month or year. So, we support the following specification:
>
> ...   PARTITION BY RANGE(order_date)
>       (
>         START (date '2005-12-01') end (date '2007-12-01')
>            EVERY(interval '2 months')
>       );
>
> If we like, we can mix the specification a little:
>
> ...   PARTITION BY RANGE(order_date)
>      ( PARTITION Q1_2005 end (date '2005-04-01'),
>        PARTITION Q2_2005 end (date '2005-07-01'),
>        PARTITION Q3_2005 end (date '2005-10-10'),
>        PARTITION Q4_2005 end (date '2006-01-01'),
>        START (date '2006-02-01') end (date '2008-04-01')
>              EVERY (interval '2 weeks')
>      );
>
> an interesting result of the flexibility of the grammar we've come up
> with is that you can do something like this:
>
> ...   PARTITION BY RANGE(order_date)
>       ( PARTITION minny end date '2004-12-01'),
>         end (date '2006-12-01'),
>         PARTITION maxny start (date '2006-12-01')
>       );
>
> Here, when order_date is less than 2004-12-01, we put the data in minny,
> when it is between 2004-12-01 and 2006-12-01 we put it in an unnamed
> partition and after this we put it in maxny.
>
> Tablespaces
> -----------
>
> We allow inline tablespace specification, such as:
>
> ...   PARTITION BY RANGE(order_date)
>       (
>         PARTITION minny TABLESPACE compress,
>         start (date '2004-12-01') end (date '2006-12-01') TABLESPACE hot,
>         PARTITION maxny TABLESPACE compress
>       );
>
> I've used the term compress here intentionally. A number of operating
> systems now ship file systems which can compress partitions. Users with
> issues with the amount of data they want to keep online can delay the
> time until they need new storage to a future date by compressing less
> regularly used data with this technique, for a performance cost. Data
> being used heavily can live on an uncompressed file system, affected.
>
> Multi-column support
> --------------------
>
> We can have multi-column partitions.
>
> ...  PARTITION BY LIST (state, deptno)
>       (
>         VALUES ('OR', 1, 'WA', 1),
>         VALUES ('AZ', 1, 'UT', 1, 'NM', 1),
>         VALUES ('OR', 2, 'WA', 2),
>         VALUES ('AZ', 2, 'UT', 2, 'NM', 2),
>         PARTITION region_null VALUES (NULL, NULL),
>         PARTITION region_other
>       );
>
> Looking at this syntax now, I think I prefer:
>
> VALUES ('OR', 1),('WA', 1)
>
> To specify keys for the same partition. Thoughts?
>
> Composite partition support
> ---------------------------
>
> Given that we're talking about systems with potentially very large
> amounts of data, power users may want to combine range partitioning with
> hash or list partitioning. For example, your analysis might always be on
> a date range but also be broken down by sales office. So, this would
> combine range and list partitioning (if the sales offices were known) or
> hash partitioning (if they weren't known).
>
> To do this, we introduce the SUBPARTITION clause:
>
> ...    PARTITION BY RANGE(order_date) SUBPARTITION BY HASH (office_id)
>         SUBPARTITIONS 8
>         (
>             start (date '2005-12-01') end (date '2007-12-01')
>                   every (interval '3 months'),
>             start (date '2007-12-01')
>                   end (date '2008-12-01') every (interval '1 month')
>         );
>
> The first partition specification covers 8 partitions, the second 12 for
> 20 partitions in total. Once we add the subpartitioning we have 160
> partitions in total (20 * 8).
>
> Subpartitioning by list can look like this (see templates below):
>
> ...    PARTITION BY RANGE(order_date) SUBPARTITION BY LIST (customer_id)
>        (
>            partition minny (subpartition c1 values (1), subpartition
>                             c2 values (2)),
>            start (date '2004-12-01') end (date '2006-12-01')
>                  (subpartition c1 values (1), subpartition c2 values (2)),
>            partition maxy (values (1), values (2)
>        )
>
> So, the list parameters of each sub partition look like arguments to the
> primary partition. Again, see templates below if you think this looks
> cumbersome.
>
> We do not preclude subpartitions of subpartitions. So, the following is
> valid:
>
> ...   PARTITION BY HASH(b)
> PARTITIONS 2
> SUBPARTITION BY HASH(d)
> SUBPARTITIONS 2,
> SUBPARTITION BY HASH(e) SUBPARTITIONS 2,
> SUBPARTITION BY HASH(f) SUBPARTITIONS 2,
> SUBPARTITION BY HASH(g) SUBPARTITIONS 2,
> SUBPARTITION BY HASH(h) SUBPARTITIONS 2;
>
> Subpartition templates
> ----------------------
>
> There are times we want subpartitions to be laid out in a specific way
> for all partitions. To do this, we use templates:
>
> ...   PARTITION BY RANGE (order_date)
>        SUBPARTITION BY LIST (state)
>        SUBPARTITION TEMPLATE
>        (
>        SUBPARTITION northwest VALUES ('OR', 'WA'),
>        SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),
>        SUBPARTITION northeast VALUES  ('NY', 'VM', 'NJ'),
>        SUBPARTITION southeast VALUES ('FL', 'GA'),
>        SUBPARTITION northcentral VALUES ('SD', 'WI'),
>        SUBPARTITION southcentral VALUES ('OK', 'TX')
>        )
>        (start (date '2001-01-01') end (date '2010-01-01')
>           every (interval '3 months')
>        )
>
> For each of the 36 odd partitions we create here, each is subpartitioned
> into geographical areas.
>
> Data management with ALTER
> --------------------------
>
> These are all arguments to ALTER TABLE. All of these require validation
> against the existing specification.
>
> ADD
> ---
>
> For range and list partitioning, it's important to be able to add
> partitions for data not covered by the existing specification. So, we
> propose:
>
> ...  ADD PARTITION q1_2008 end (date '2008-04-01')
>
> COALESCE (maybe)
> ----------------
>
> For hash partitions, remove a partition from the number of hash
> partitions and distribute its data to the remaining partitions.
>
> ... COALESCE PARTITION [name];
>
> I'm not sure if this is really used but other systems we looked at have
> it. Thoughts?
>
> DROP
> ----
>
> For list and range partitions, drop a specified partition from the set
> of partitions.
>
> ... DROP PARTITION minny;
>
> This drops a named partition. Often, it will be difficult for users to
> know partition names, and they might be unnamed. So, we allow this
> syntax:
>
> ... DROP PARTITION FOR(date '2007-01-01');
>
> for range partitions; and:
>
> ... DROP PARTITION FOR(VALUES('CA'));
>
> for list partitions.
>
> We've also discussed something like:
>
> ... DROP PARTITION FOR(POSITION(1));
>
> so that users can easily drop a specific partition in an array of range
> partitions. It seems to me, though, that the use case is generally to
> drop the oldest partition so perhaps we should have a more explicit
> syntax. Thoughts?
>
> EXCHANGE
> --------
>
> This sub-clause allows us to make a table a partition in a set of
> partitions or take a partition out of a set but keep it as a table. IBM
> uses ATTACH and DETACH, which is explicit but Oracle uses EXCHANGE. I'll
> explain the latter:
>
> ... EXCHANGE <partition identifier> WITH TABLE <table name>
>
> partition identifier is one of PARTITION <name> or PARTITION FOR(...).
> The partition in the partition set 'becomes' the table <table name> and
> vice-versa. Essentially, we'd swap the relfilenodes. This means that we
> have to first ADD PARTITION then swap the table and the partition.
> Thoughts?
>
> MERGE
> -----
>
> You can merge and list partitions and any two range partitions:
>
> ... MERGE <partition id>, <partition id> [INTO PARTITION <partition name>]
>
> For range partitions:
>
> ... MERGE PARTITION FOR(date '2006_01_01'), PARTITION FOR(date '2007-01-01');
>
> For list partitions:
>
> ... MERGE PARTITION FOR(VALUES('CA', 'MA')
>
> This begs the question of why we have COALESCE for hash partitioning. I
> don't know, it just seems like the right thing since you can't merge two
> hash partitions together (well, you shouldn't want to).
>
> RENAME
> ------
>
> Rename a partition. We can use partition name or FOR clause.
>
> SPLIT
> -----
>
> Split is used to divide a partition in two. It is designed for list and
> range partitioning but I guess we could/should support hash. I need to
> think about that. For RANGE partitions:
>
> ... SPLIT <partition id> <AT-clause> [INTO (PARTITION <partition name1>,
>      PARTITION <partition name2>)];
>
> AT clause specifies the point at which the partition is split in two:
>
> ... SPLIT PARTITION FOR(2000) AT 1000 INTO PARTITION (part1000,
>      part2000)
>
> We might want ways to do this with unnamed partitions, it seems to me.
> Thoughts?
>
> For list:
>
> ... SPLIT PARTITION region_east AT( VALUES ('CT', 'MA', 'MD') )
>      INTO
>      (
>       PARTITION region_east_1,
>       PARTITION region_east_2
>      );
>
> In this case, values from region_east specified in the AT() list are put in
> region_east_1 and the rest are put in region_east_2.
>
> I think a better way for supporting split with hash is via ADD. I'm sure
> some people think that ugly so I'd like feedback.
>
> TRUNCATE
> --------
>
> Truncate a specified partition:
>
> ... TRUNCATE PARTITION FOR ('2005-01-01')
>
> We could specify a name too.
>
> This will use truncate internally.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
--
Mike <ipso@snappymail.ca>
В списке pgsql-hackers по дате отправления:
Предыдущее

От: Ron Mayer
Дата: 11 января 2008 г., 22:53:05
Сообщение: Re: Declarative partitioning grammar
Следующее

От: "Warren Turkal"
Дата: 11 января 2008 г., 23:16:35
Сообщение: Re: Declarative partitioning grammar
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Declarative partitioning grammar

Предыдущее

Следующее