Обсуждение: Re: [COMMITTERS] pgsql: Implement table partitioning.

Поиск
Список
Период
Сортировка

Re: [COMMITTERS] pgsql: Implement table partitioning.

От
Robert Haas
Дата:
On Wed, Dec 7, 2016 at 1:20 PM, Robert Haas <rhaas@postgresql.org> wrote:
> Implement table partitioning.

Well, that didn't take long to cause problems.  The very first
buildfarm machine to report after this commit is longfin, which is
unhappy:

***************
*** 392,419 **** c text, d text ) PARTITION BY RANGE (a oid_ops, plusone(b), c collate "default", d
collate "en_US"); -- check relkind SELECT relkind FROM pg_class WHERE relname = 'partitioned';  relkind ---------
!  P
! (1 row)
 -- check that range partition key columns are marked NOT NULL SELECT attname, attnotnull FROM pg_attribute WHERE
attrelid=
 
'partitioned'::regclass AND attnum > 0;
!  attname | attnotnull
! ---------+------------
!  a       | t
!  b       | f
!  c       | t
!  d       | t
! (4 rows)
! -- prevent a function referenced in partition key from being dropped DROP FUNCTION plusone(int);
- ERROR:  cannot drop function plusone(integer) because other objects
depend on it
- DETAIL:  table partitioned depends on function plusone(integer)
- HINT:  Use DROP ... CASCADE to drop the dependent objects too. -- partitioned table cannot partiticipate in regular
inheritanceCREATE TABLE partitioned2 ( a int
 
--- 392,411 ---- c text, d text ) PARTITION BY RANGE (a oid_ops, plusone(b), c collate "default", d
collate "en_US");
+ ERROR:  collation "en_US" for encoding "SQL_ASCII" does not exist -- check relkind SELECT relkind FROM pg_class WHERE
relname= 'partitioned';  relkind ---------
 
! (0 rows)
 -- check that range partition key columns are marked NOT NULL SELECT attname, attnotnull FROM pg_attribute WHERE
attrelid=
 
'partitioned'::regclass AND attnum > 0;
! ERROR:  relation "partitioned" does not exist
! LINE 1: ...me, attnotnull FROM pg_attribute WHERE attrelid = 'partition...
!                                                              ^ -- prevent a function referenced in partition key from
beingdropped DROP FUNCTION plusone(int); -- partitioned table cannot partiticipate in regular inheritance CREATE TABLE
partitioned2( a int
 

No idea why yet, but I'll try to figure it out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [COMMITTERS] pgsql: Implement table partitioning.

От
Robert Haas
Дата:
On Wed, Dec 7, 2016 at 1:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>   -- partitioned table cannot partiticipate in regular inheritance
>   CREATE TABLE partitioned2 (
>   a int
> --- 392,411 ----
>   c text,
>   d text
>   ) PARTITION BY RANGE (a oid_ops, plusone(b), c collate "default", d
> collate "en_US");
> + ERROR:  collation "en_US" for encoding "SQL_ASCII" does not exist
...
> No idea why yet, but I'll try to figure it out.

And of course that'd be because relying on en_US isn't portable.  Sigh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [COMMITTERS] pgsql: Implement table partitioning.

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> And of course that'd be because relying on en_US isn't portable.  Sigh.

You can't rely on *any* collations other than C and POSIX.
        regards, tom lane



Re: [COMMITTERS] pgsql: Implement table partitioning.

От
Robert Haas
Дата:
On Wed, Dec 7, 2016 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> And of course that'd be because relying on en_US isn't portable.  Sigh.
>
> You can't rely on *any* collations other than C and POSIX.

I get it; I just missed that during review, and then sent that message
before I even looked at it carefully, so that you would know I was
working on it.  I think that it's fixed now; at any rate, the
buildfarm seems happy enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [COMMITTERS] pgsql: Implement table partitioning.

От
Amit Langote
Дата:
On 2016/12/08 3:33, Robert Haas wrote:
> On Wed, Dec 7, 2016 at 1:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>   -- partitioned table cannot partiticipate in regular inheritance
>>   CREATE TABLE partitioned2 (
>>   a int
>> --- 392,411 ----
>>   c text,
>>   d text
>>   ) PARTITION BY RANGE (a oid_ops, plusone(b), c collate "default", d
>> collate "en_US");
>> + ERROR:  collation "en_US" for encoding "SQL_ASCII" does not exist
> ...
>> No idea why yet, but I'll try to figure it out.
> 
> And of course that'd be because relying on en_US isn't portable.  Sigh.

Should've thought about the non-portability of locales.  Thanks for
catching and fixing anyway!

Thanks,
Amit





Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Keith Fiske
Дата:

On Wed, Dec 7, 2016 at 1:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Dec 7, 2016 at 1:20 PM, Robert Haas <rhaas@postgresql.org> wrote:
> Implement table partitioning.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Being that table partitioning is something I'm slightly interested in, figured I'd give it a whirl.

This example in the docs has an extraneous comma after the second column

CREATE TABLE cities (
    name         text not null,
    population   int,
) PARTITION BY LIST (initcap(name));

And the WITH OPTIONS clause does not appear to be working using another example from the docs. Not seeing any obvious typos.

keith@keith=# CREATE TABLE measurement_y2016m07
keith-#     PARTITION OF measurement (
keith(#     unitsales WITH OPTIONS DEFAULT 0
keith(# ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
2016-12-09 12:51:48.728 EST [11711] ERROR:  syntax error at or near "WITH" at character 80
2016-12-09 12:51:48.728 EST [11711] STATEMENT:  CREATE TABLE measurement_y2016m07
        PARTITION OF measurement (
        unitsales WITH OPTIONS DEFAULT 0
    ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
ERROR:  syntax error at or near "WITH"
LINE 3:     unitsales WITH OPTIONS DEFAULT 0
                      ^
Time: 0.184 ms

Removing the unit_sales default allows it to work fine

keith@keith=# CREATE TABLE measurement_y2016m07
    PARTITION OF measurement
 FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
CREATE TABLE
Time: 5.001 ms

Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Amit Langote
Дата:
Hi Keith,

On Sat, Dec 10, 2016 at 3:00 AM, Keith Fiske <keith@omniti.com> wrote:
> Being that table partitioning is something I'm slightly interested in,
> figured I'd give it a whirl.
>
> This example in the docs has an extraneous comma after the second column
>
> CREATE TABLE cities (
>     name         text not null,
>     population   int,
> ) PARTITION BY LIST (initcap(name));
>
> And the WITH OPTIONS clause does not appear to be working using another
> example from the docs. Not seeing any obvious typos.
>
> keith@keith=# CREATE TABLE measurement_y2016m07
> keith-#     PARTITION OF measurement (
> keith(#     unitsales WITH OPTIONS DEFAULT 0
> keith(# ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
> 2016-12-09 12:51:48.728 EST [11711] ERROR:  syntax error at or near "WITH"
> at character 80
> 2016-12-09 12:51:48.728 EST [11711] STATEMENT:  CREATE TABLE
> measurement_y2016m07
>         PARTITION OF measurement (
>         unitsales WITH OPTIONS DEFAULT 0
>     ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
> ERROR:  syntax error at or near "WITH"
> LINE 3:     unitsales WITH OPTIONS DEFAULT 0
>                       ^
> Time: 0.184 ms
>
> Removing the unit_sales default allows it to work fine

WITH OPTIONS keyword phrase is something that was made redundant in
the last version of the patch, but I forgot to remove the same in the
example.  I've sent a doc patch to fix that.

If you try - unitsales DEFAULT 0, it will work.  Note that I did not
specify WITH OPTIONS.

Thanks,
Amit



Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Keith Fiske
Дата:


On Fri, Dec 9, 2016 at 1:13 PM, Amit Langote <amitlangote09@gmail.com> wrote:
Hi Keith,

On Sat, Dec 10, 2016 at 3:00 AM, Keith Fiske <keith@omniti.com> wrote:
> Being that table partitioning is something I'm slightly interested in,
> figured I'd give it a whirl.
>
> This example in the docs has an extraneous comma after the second column
>
> CREATE TABLE cities (
>     name         text not null,
>     population   int,
> ) PARTITION BY LIST (initcap(name));
>
> And the WITH OPTIONS clause does not appear to be working using another
> example from the docs. Not seeing any obvious typos.
>
> keith@keith=# CREATE TABLE measurement_y2016m07
> keith-#     PARTITION OF measurement (
> keith(#     unitsales WITH OPTIONS DEFAULT 0
> keith(# ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
> 2016-12-09 12:51:48.728 EST [11711] ERROR:  syntax error at or near "WITH"
> at character 80
> 2016-12-09 12:51:48.728 EST [11711] STATEMENT:  CREATE TABLE
> measurement_y2016m07
>         PARTITION OF measurement (
>         unitsales WITH OPTIONS DEFAULT 0
>     ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
> ERROR:  syntax error at or near "WITH"
> LINE 3:     unitsales WITH OPTIONS DEFAULT 0
>                       ^
> Time: 0.184 ms
>
> Removing the unit_sales default allows it to work fine

WITH OPTIONS keyword phrase is something that was made redundant in
the last version of the patch, but I forgot to remove the same in the
example.  I've sent a doc patch to fix that.

If you try - unitsales DEFAULT 0, it will work.  Note that I did not
specify WITH OPTIONS.

Thanks,
Amit

That works. Thanks!

keith@keith=# CREATE TABLE measurement_y2016m07
    PARTITION OF measurement (
    unitsales DEFAULT 0
) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
CREATE TABLE
Time: 4.091 ms

Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Keith Fiske
Дата:

On Fri, Dec 9, 2016 at 1:23 PM, Keith Fiske <keith@omniti.com> wrote:


On Fri, Dec 9, 2016 at 1:13 PM, Amit Langote <amitlangote09@gmail.com> wrote:
Hi Keith,

On Sat, Dec 10, 2016 at 3:00 AM, Keith Fiske <keith@omniti.com> wrote:
> Being that table partitioning is something I'm slightly interested in,
> figured I'd give it a whirl.
>
> This example in the docs has an extraneous comma after the second column
>
> CREATE TABLE cities (
>     name         text not null,
>     population   int,
> ) PARTITION BY LIST (initcap(name));
>
> And the WITH OPTIONS clause does not appear to be working using another
> example from the docs. Not seeing any obvious typos.
>
> keith@keith=# CREATE TABLE measurement_y2016m07
> keith-#     PARTITION OF measurement (
> keith(#     unitsales WITH OPTIONS DEFAULT 0
> keith(# ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
> 2016-12-09 12:51:48.728 EST [11711] ERROR:  syntax error at or near "WITH"
> at character 80
> 2016-12-09 12:51:48.728 EST [11711] STATEMENT:  CREATE TABLE
> measurement_y2016m07
>         PARTITION OF measurement (
>         unitsales WITH OPTIONS DEFAULT 0
>     ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
> ERROR:  syntax error at or near "WITH"
> LINE 3:     unitsales WITH OPTIONS DEFAULT 0
>                       ^
> Time: 0.184 ms
>
> Removing the unit_sales default allows it to work fine

WITH OPTIONS keyword phrase is something that was made redundant in
the last version of the patch, but I forgot to remove the same in the
example.  I've sent a doc patch to fix that.

If you try - unitsales DEFAULT 0, it will work.  Note that I did not
specify WITH OPTIONS.

Thanks,
Amit

That works. Thanks!

keith@keith=# CREATE TABLE measurement_y2016m07
    PARTITION OF measurement (
    unitsales DEFAULT 0
) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
CREATE TABLE
Time: 4.091 ms


Working on a blog post for this feature and just found some more inconsistencies with the doc examples. Looks like the city_id column was defined in the measurements table when it should be in the cities table. The addition of the partition to the cities table fails since it's missing.

Examples should look like this:

CREATE TABLE measurement (
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE cities (
    city_id         bigserial not null,
    name         text not null,
    population   int
) PARTITION BY LIST (initcap(name));

I actually changed my example to have city_id use bigserial to show that sequences are inherited automatically. May be good to show that in the docs.

Another suggestion I had was for handling when data is inserted that doesn't match any defined child tables. Right now it just errors out, but in pg_partman I'd had it send the data to the parent instead to avoid data loss. I know that's not possible here, but how about syntax to define a child table as a "default" to take data that would normally be rejected? Maybe something like

CREATE TABLE measurement_default
    PARTITION OF measurement (
    unitsales DEFAULT 0
) FOR VALUES DEFAULT;


Keith

Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Robert Haas
Дата:
On Fri, Dec 9, 2016 at 5:55 PM, Keith Fiske <keith@omniti.com> wrote:
> Another suggestion I had was for handling when data is inserted that doesn't
> match any defined child tables. Right now it just errors out, but in
> pg_partman I'd had it send the data to the parent instead to avoid data
> loss. I know that's not possible here, but how about syntax to define a
> child table as a "default" to take data that would normally be rejected?
> Maybe something like
>
> CREATE TABLE measurement_default
>     PARTITION OF measurement (
>     unitsales DEFAULT 0
> ) FOR VALUES DEFAULT;

One thing that's tricky/annoying about this is that if you have a
DEFAULT partition and then add a partition, you have to scan the
DEFAULT partition for data that should be moved to the new partition.
That makes what would otherwise be a quick operation slow.  Still, I'm
sure there's a market for that feature.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Keith Fiske
Дата:

On Fri, Dec 9, 2016 at 10:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 9, 2016 at 5:55 PM, Keith Fiske <keith@omniti.com> wrote:
> Another suggestion I had was for handling when data is inserted that doesn't
> match any defined child tables. Right now it just errors out, but in
> pg_partman I'd had it send the data to the parent instead to avoid data
> loss. I know that's not possible here, but how about syntax to define a
> child table as a "default" to take data that would normally be rejected?
> Maybe something like
>
> CREATE TABLE measurement_default
>     PARTITION OF measurement (
>     unitsales DEFAULT 0
> ) FOR VALUES DEFAULT;

One thing that's tricky/annoying about this is that if you have a
DEFAULT partition and then add a partition, you have to scan the
DEFAULT partition for data that should be moved to the new partition.
That makes what would otherwise be a quick operation slow.  Still, I'm
sure there's a market for that feature.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

I would find that perfectly acceptable as long as a caveat about the performance impact was included in the documentation. My intent with putting the data in the parent in pg_partman was solely to avoid data loss and I also included a function for monitoring if data went into the parent. That sort of function may not have real utility in core, but I think the intent of the DEFAULT location is a catchall "just in case" and not really intended as a permanent data store. If people did use it that way, and a warning was included about its cost when adding new partitions, then that's on the user for doing that.

I recall reading in the other thread about this that you're looking to make locking across the partition set less strict eventually. If you could make the scan and data move not block on anything except the partitions involved, I think the performance impact of scanning the default partition and moving the data wouldn't even be that bad in the end.

Keith

Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Christophe Pettus
Дата:
> On Dec 9, 2016, at 22:52, Keith Fiske <keith@omniti.com> wrote:
> On Fri, Dec 9, 2016 at 10:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> One thing that's tricky/annoying about this is that if you have a
>> DEFAULT partition and then add a partition, you have to scan the
>> DEFAULT partition for data that should be moved to the new partition.
>> That makes what would otherwise be a quick operation slow.  Still, I'm
>> sure there's a market for that feature.
>
> I would find that perfectly acceptable as long as a caveat about the performance impact was included in the
documentation.

+1.  I don't think it's conceptually different from adding a column with a default, in that regard; you just have to
knowthe impact. 


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Jim Nasby
Дата:
On 12/10/16 1:02 PM, Christophe Pettus wrote:
>
>> On Dec 9, 2016, at 22:52, Keith Fiske <keith@omniti.com> wrote:
>> On Fri, Dec 9, 2016 at 10:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> One thing that's tricky/annoying about this is that if you have a
>>> DEFAULT partition and then add a partition, you have to scan the
>>> DEFAULT partition for data that should be moved to the new partition.
>>> That makes what would otherwise be a quick operation slow.  Still, I'm
>>> sure there's a market for that feature.
>>
>> I would find that perfectly acceptable as long as a caveat about the performance impact was included in the
documentation.
>
> +1.  I don't think it's conceptually different from adding a column with a default, in that regard; you just have to
knowthe impact.
 

FWIW, I can think of another option: always check the default partition 
for data, even if the data should only exist in a specific partition. If 
that proved to be too expensive in the normal case it could be optional.

Is it possible to manually (and incrementally) move data from the 
default partition to a table that will become the partition for that 
data and then do a fast cut-over once that's done? That would be akin to 
adding a field without a DEFAULT, adding the DEFAULT after that, and 
then slowly updating all the existing rows...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

От
Amit Langote
Дата:
On 2016/12/10 7:55, Keith Fiske wrote:
> Working on a blog post for this feature and just found some more
> inconsistencies with the doc examples. Looks like the city_id column was
> defined in the measurements table when it should be in the cities table.
> The addition of the partition to the cities table fails since it's missing.
> 
> Examples should look like this:
> 
> CREATE TABLE measurement (
>     logdate         date not null,
>     peaktemp        int,
>     unitsales       int
> ) PARTITION BY RANGE (logdate);
> 
> CREATE TABLE cities (
>     city_id         bigserial not null,
>     name         text not null,
>     population   int
> ) PARTITION BY LIST (initcap(name));
> 
> I actually changed my example to have city_id use bigserial to show that
> sequences are inherited automatically. May be good to show that in the docs.

Attached is a documentation patch fixing inconsistencies in the examples
that Keith reports and also improve them a bit (cities_west example sounds
a bit contrived now that I think).

Also, I posted a patch earlier [1] to mention the limitation that row
movement caused by UPDATE is treated an error.  I have combined it into
this patch, so that all the documentation fixes proposed are together.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/a4f261c2-8554-f443-05ff-d97dddc19689%40lab.ntt.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения