Thread: [HACKERS] Multi column range partition table

[HACKERS] Multi column range partition table

From

amul sul

Date:

22 June 2017, 17:48:28

Hi,

While working on another patch, I came across a case where
I need auto-generated partitions for a multi-column range-partitioned
table with the following range bounds:

PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)

Here, the lower bound of each partition is the upper bound of the
previous partition.

While trying to create partition p3 with the bound (10, UNBOUNDED) to (20, 10),
I got a partition overlap error.

Here is the SQL to reproduce this error:

CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
UNBOUNDED) TO (10, 10);
CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
(10, UNBOUNDED);
CREATE TABLE p3 PARTITION OF range_parted FOR VALUES FROM (10, UNBOUNDED) TO (20, 10);

ERROR:  partition "p3" would overlap partition "p2"

This happens because of how UNBOUNDED is handled: in a FROM clause it
means negative infinity.  Can't we explicitly treat it as positive
infinity here?

Thoughts/Comments?

Regards,
Amul
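
The overlap can be reproduced outside the server with a small model. This is only a sketch under stated assumptions, not PostgreSQL code: each bound is a Python tuple, UNBOUNDED is mapped to `-inf` in FROM and `+inf` in TO (the behaviour described above), and ranges are half-open, `[lower, upper)`. The names `NEG`, `POS` and `overlaps` are made up for this illustration.

```python
import math

NEG, POS = -math.inf, math.inf  # assumed stand-ins for UNBOUNDED in FROM / TO

# Existing partitions as half-open ranges [lower, upper), compared as tuples.
existing = {
    "p1": ((NEG, NEG), (10, 10)),
    "p2": ((10, 10), (10, POS)),
}

def overlaps(r1, r2):
    """Two half-open ranges overlap when each starts before the other ends."""
    (lo1, hi1), (lo2, hi2) = r1, r2
    return lo1 < hi2 and lo2 < hi1

# FROM (10, UNBOUNDED) is read as (10, -inf), which sorts before (10, 10).
p3_as_parsed = ((10, NEG), (20, 10))
conflicts = [name for name, rng in existing.items() if overlaps(p3_as_parsed, rng)]

# What the poster wanted: a lower bound of (10, +inf), right after p2's upper bound.
p3_as_wanted = ((10, POS), (20, 10))
no_conflicts = [name for name, rng in existing.items() if overlaps(p3_as_wanted, rng)]

print(conflicts)      # ['p1', 'p2']
print(no_conflicts)   # []
```

In this model the parsed bound collides with both existing partitions, while the intended bound collides with none; which partition the server names in its error message is a separate matter that the sketch does not try to reproduce.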

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

23 June 2017, 07:28:21

On 2017/06/22 20:48, amul sul wrote:
> Hi,
> 
> While working on another patch, I came across a case where
> I need auto-generated partitions for a multi-column range-partitioned
> table with the following range bounds:
> 
> PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
> PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
> PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
> PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
> PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)
> 
> In this, a lower bound of the partition is an upper bound of the
> previous partition.
> 
> While trying to create p3 partition with (10, UNBOUNDED) to (20, 10) bound,
> got an overlap partition error.
> 
> Here is the SQL to reproduce this error:
> 
> CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
> CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
> UNBOUNDED) TO (10, 10);
> CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
> (10, UNBOUNDED);
> CREATE TABLE p3 PARTITION OF range_parted FOR VALUES FROM (10, UNBOUNDED) TO (20, 10);
> 
> ERROR:  partition "p3" would overlap partition "p2"
> 
> This happened because of UNBOUNDED handling, where it is a negative infinite
> if it is in FROM clause.  Wondering can't we explicitly treat this as
> a positive infinite value, can we?

No, we cannot.  What would be greater than (or equal to) +infinity?
Nothing.  So, even if you want p3 to accept (10, 9890148), it won't,
because 9890148 is not >= +infinity.  It will accept only the rows where
the first column is > 10 (the second column is not checked in that case).

You will have to define p3 as follows:

CREATE TABLE p3 PARTITION OF range_parted FOR VALUES FROM (11, UNBOUNDED) TO (20, 10);

It's fine to use the previous partition's upper bound as the lower bound
of the current partition if the former does not contain an UNBOUNDED
value, because whereas a finite value divides the range into two parts
(assigned to the two partitions respectively), an UNBOUNDED value does
not.  The latter represents an abstract end of the range (either on the
positive side or the negative).

Does that make sense?

Thanks,
Amit
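
The argument above can be checked with a tuple-comparison model. This is a sketch rather than server code: `accepts` is a hypothetical helper implementing `lower <= row < upper`, and `math.inf` stands in for the infinities.

```python
import math

def accepts(lower, upper, row):
    """Half-open range check on tuples: lower <= row < upper."""
    return lower <= row < upper

upper = (20, 10)
plus_inf_lower = (10, math.inf)   # hypothetical FROM (10, +infinity)
int_lower = (11, -math.inf)       # FROM (11, UNBOUNDED), as suggested above

# No finite second column is >= +infinity, so i1 = 10 is never accepted.
row_10 = accepts(plus_inf_lower, upper, (10, 9890148))
# Any row with i1 > 10 (and below the upper bound) is accepted.
row_11 = accepts(plus_inf_lower, upper, (11, -5))

# For an integer i1, both lower bounds admit exactly the same rows.
rows = [(i1, i2) for i1 in range(8, 23) for i2 in (-1000, 0, 9, 10, 1000)]
same_rows = all(
    accepts(plus_inf_lower, upper, r) == accepts(int_lower, upper, r) for r in rows
)

print(row_10, row_11, same_rows)  # False True True
```

The equivalence holds here only because there is no integer strictly between 10 and 11.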

Re: [HACKERS] Multi column range partition table

From

amul sul

Date:

23 June 2017, 10:42:48

On Fri, Jun 23, 2017 at 6:58 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/06/22 20:48, amul sul wrote:
>> Hi,
>>
>> While working on another patch, I came across a case where
>> I need auto-generated partitions for a multi-column range-partitioned
>> table with the following range bounds:
>>
>> PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
>> PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
>> PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
>> PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
>> PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)
>>
>> In this, a lower bound of the partition is an upper bound of the
>> previous partition.
>>
>> While trying to create p3 partition with (10, UNBOUNDED) to (20, 10) bound,
>> got an overlap partition error.
>>
>> Here is the SQL to reproduce this error:
>>
>> CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
>> CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
>> UNBOUNDED) TO (10, 10);
>> CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
>> (10, UNBOUNDED);
>> CREATE TABLE p3 PARTITION OF range_parted FOR VALUES FROM (10, UNBOUNDED) TO (20, 10);
>>
>> ERROR:  partition "p3" would overlap partition "p2"
>>
>> This happened because of UNBOUNDED handling, where it is a negative infinite
>> if it is in FROM clause.  Wondering can't we explicitly treat this as
>> a positive infinite value, can we?
>
> No, we cannot.  What would be greater than (or equal to) +infinite?
> Nothing.  So, even if you will want p3 to accept (10, 9890148), it won't
> because 9890148 is not >= +infinite.  It will accept only the rows where
> the first column is > 10 (second column is not checked in that case).
>
> You will have to define p3 as follows:
>
> CREATE TABLE p3 PARTITION OF range_parted FOR VALUES FROM (11, UNBOUNDED) TO (20, 10);
>
What if the partition key column is FLOAT?

Regards,
Amul

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

23 June 2017, 12:02:04

On 2017/06/23 13:42, amul sul wrote:
> On Fri, Jun 23, 2017 at 6:58 AM, Amit Langote wrote:
>> On 2017/06/22 20:48, amul sul wrote:
>>> This happened because of UNBOUNDED handling, where it is a negative infinite
>>> if it is in FROM clause.  Wondering can't we explicitly treat this as
>>> a positive infinite value, can we?
>>
>> No, we cannot.  What would be greater than (or equal to) +infinite?
>> Nothing.  So, even if you will want p3 to accept (10, 9890148), it won't
>> because 9890148 is not >= +infinite.  It will accept only the rows where
>> the first column is > 10 (second column is not checked in that case).
>>
>> You will have to define p3 as follows:
>>
>> CREATE TABLE p3 PARTITION OF range_parted FOR VALUES FROM (11, UNBOUNDED) TO (20, 10);
>>
> What if the partition key column is FLOAT ?

I would say use a value that btfloat4cmp (or btfloat8cmp) reports as
greater than 10.

Of course, we can't write what I just said in the user-level
documentation, because the fact that we use system- or user-defined btree
comparison proc (btfloat4/8cmp) for partitioning may be irrelevant to the
users.  Although, we do mention in the documentation that we use btree
operator class specified semantics for partitioning.  In any case,
defining your partitioning or indexing on raw float type column(s) is
prone to semantic caveats, I'd think.

Also, there was an interesting exchange on this topic during the patch
development [1].  Excerpt:

"Same for ranges of floating-point numbers, which are also probably an
unlikely candidate for a partitioning key anyway."

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoaucSqQ%3DdJFhaojSpb1706MQYo1Tfn_3tWv6CVWhAOdrQ%40mail.gmail.com
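
For a floating-point key, the gap left by FROM (11, UNBOUNDED) can be illustrated as follows. This is a sketch under assumptions: Python floats are doubles, so it models float8 only; `math.nextafter` needs Python 3.9 or later; `accepts` is a hypothetical helper, not a PostgreSQL function.

```python
import math

def accepts(lower, upper, row):
    """Half-open range check on tuples: lower <= row < upper."""
    return lower <= row < upper

p2 = ((10.0, 10), (10.0, math.inf))    # FROM (10, 10) TO (10, UNBOUNDED)
p3 = ((11.0, -math.inf), (20.0, 10))   # FROM (11, UNBOUNDED) TO (20, 10)

row = (10.5, 0)
in_p2 = accepts(*p2, row)
in_p3 = accepts(*p3, row)   # both False: 10.5 falls into the gap

# "A value that the comparison function reports as greater than 10":
# the smallest double above 10.0.
just_above_10 = math.nextafter(10.0, math.inf)
p3_tight = ((just_above_10, -math.inf), (20.0, 10))
in_p3_tight = accepts(*p3_tight, row)

print(in_p2, in_p3, in_p3_tight)  # False False True
```

Writing such a bound by hand is impractical, which is the caveat raised above about partitioning on raw float columns.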

Re: [HACKERS] Multi column range partition table

From

Ashutosh Bapat

Date:

23 June 2017, 13:01:04

On Fri, Jun 23, 2017 at 6:58 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/06/22 20:48, amul sul wrote:
>> Hi,
>>
>> While working on another patch, I came across a case where
>> I need auto-generated partitions for a multi-column range-partitioned
>> table with the following range bounds:
>>
>> PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
>> PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
>> PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
>> PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
>> PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)
>>
>> In this, a lower bound of the partition is an upper bound of the
>> previous partition.
>>
>> While trying to create p3 partition with (10, UNBOUNDED) to (20, 10) bound,
>> got an overlap partition error.
>>
>> Here is the SQL to reproduce this error:
>>
>> CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
>> CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
>> UNBOUNDED) TO (10, 10);
>> CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
>> (10, UNBOUNDED);
>> CREATE TABLE p3 PARTITION OF range_parted FOR VALUES FROM (10, UNBOUNDED) TO (20, 10);
>>
>> ERROR:  partition "p3" would overlap partition "p2"
>>
>> This happened because of UNBOUNDED handling, where it is a negative infinite
>> if it is in FROM clause.  Wondering can't we explicitly treat this as
>> a positive infinite value, can we?

The way we have designed our syntax, we don't have a way to say that
p3 comes after p2 with no gap between them. But I don't think that's
your question. What you are struggling with is a way to specify a
lower bound (10, +infinity) so that anything with i1 > 10 would go to
partition p3.

>
> No, we cannot.  What would be greater than (or equal to) +infinite?
> Nothing.  So, even if you will want p3 to accept (10, 9890148), it won't
> because 9890148 is not >= +infinite.  It will accept only the rows where
> the first column is > 10 (second column is not checked in that case).
>
> You will have to define p3 as follows:
>
> CREATE TABLE p3 PARTITION OF range_parted FOR VALUES FROM (11, UNBOUNDED) TO (20, 10);

That's not exactly the same as specifying (10, +infinity) when i1 is a
float. A user cannot precisely tell what the acceptable value just
greater than 10 would be.

An UNBOUNDED in the lower bound is always considered to be -infinity
for that data type. There is no way to specify a lower bound that has
+infinity in it. +infinity as the lower bound for the first key may not
make sense (it would mean that the partition is always empty), but it
does make sense for keys after the first, as Amul has explained.

The question is do we have support for that and if not, will we
consider it for v10 or v11 and how.

>
> It's fine to use the previous partition's upper bound as the lower bound
> of the current partition, if the former does not contain an UNBOUNDED value,
> because whereas a finite value divides the range into two parts (assigned
> to the two partitions respectively), an UNBOUNDED value does not.  The
> latter represents an abstract end of the range (either on the positive
> side or the negative).

Not exactly for second key onwards.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

23 June 2017, 14:00:09

On 23 June 2017 at 08:01, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> The way we have designed our syntax, we don't have a way to tell that
> p3 comes after p2 and they have no gap between those. But I don't
> think that's your question. What you are struggling with is a way to
> specify a lower bound (10, +infinity) so that anything with i1 > 10
> would go to partition 3.
>

I think actually there is a fundamental problem here, which arises
because UNBOUNDED has 2 different meanings depending on context, and
thus it is not possible in general to specify the start of one range
to be equal to the end of the previous range, as is necessary to get
contiguous non-overlapping ranges.

Note that this isn't just a problem for floating point datatypes
either, it also applies to other types such as strings. For example,
given a partition over (text, int) types defined with the following
values:

  FROM ('a', UNBOUNDED) TO ('b', UNBOUNDED)

which is equivalent to

  FROM ('a', -INFINITY) TO ('b', +INFINITY)

where should the next range start?

Even if you were to find a way to specify "the next string after 'b'",
it wouldn't exactly be pretty. The problem is that the above partition
corresponds to "all the strings starting with 'a', plus the string
'b'", which is pretty ugly. A neater way to define the pair of ranges
in this case would be:

  FROM ('a', -INFINITY) TO ('b', -INFINITY)
  FROM ('b', -INFINITY) TO ('c', -INFINITY)

since then all strings starting with 'a' would fall into the first
partition and all the strings starting with 'b' would fall into the
second one.
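
The difference between the two definitions can be seen in a small model. It is a sketch with assumptions spelled out: Python compares strings by code point, whereas PostgreSQL uses the column's collation, so only the shape of the result carries over; `accepts` and the range names are hypothetical.

```python
import math

MINUS_INF, PLUS_INF = -math.inf, math.inf

def accepts(lower, upper, row):
    """Half-open range check on tuples: lower <= row < upper."""
    return lower <= row < upper

# FROM ('a', UNBOUNDED) TO ('b', UNBOUNDED), i.e. ('a', -inf) to ('b', +inf)
current = (("a", MINUS_INF), ("b", PLUS_INF))
# The neater pair proposed above
first = (("a", MINUS_INF), ("b", MINUS_INF))
second = (("b", MINUS_INF), ("c", MINUS_INF))

rows = [("a", 1), ("apple", 7), ("b", 3), ("banana", 2)]
placement = {
    row: (accepts(*current, row), accepts(*first, row), accepts(*second, row))
    for row in rows
}
for row, flags in placement.items():
    print(row, flags)
```

In this model the current-syntax range takes `('b', 3)` but not `('banana', 2)`, while the proposed pair puts everything starting with 'a' in the first range and everything starting with 'b' in the second.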

Currently, when there are 2 partition columns, the partition
constraint is defined as

  (a is not null) and (b is not null)
  and
  (a > al or (a = al and b >= bl))
  and
  (a < au or (a = au and b < bu))

If the upper bound bu were allowed to be -INFINITY (something that
should probably be forbidden unless the previous column's upper bound
were finite), then this would simplify to

  (a is not null) and (b is not null)
  and
  (a > al or (a = al and b >= bl))
  and
  (a < au)

and in the example above, where bl is -INFINITY, it would further simplify to

  (a is not null) and (b is not null)
  and
  (a >= al)
  and
  (a < au)

There would also be a similar simplification possible if the lower
bound of a partition column were allowed to be +INFINITY.
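
The simplification can be checked by brute force. The sketch below assumes the second column's bounds bl and bu are both -INFINITY and leaves out the NULL checks; `full_constraint` and `simplified` are hypothetical names, not functions from partition.c.

```python
import math

def full_constraint(a, b, al, bl, au, bu):
    """Two-column range constraint as written above (NULL checks omitted)."""
    return (a > al or (a == al and b >= bl)) and (a < au or (a == au and b < bu))

def simplified(a, al, au):
    """Form obtained when bl = -INFINITY and bu = -INFINITY."""
    return a >= al and a < au

al, au = 10, 20
agree = all(
    full_constraint(a, b, al, -math.inf, au, -math.inf) == simplified(a, al, au)
    for a in range(5, 26)
    for b in range(-50, 51)
)
print(agree)  # True
```

With bl = -INFINITY the test `b >= bl` is always true, and with bu = -INFINITY the test `b < bu` is always false, which is why the second column drops out entirely.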

So, I think that having UNBOUNDED represent both -INFINITY and
+INFINITY depending on context is a design flaw, and that we need to
allow both -INFINITY and +INFINITY as upper and lower bounds (provided
they are preceded by a column with a finite bound). I think that, in
general, that's the only way to allow contiguous non-overlapping
partitions to be defined on multiple columns.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

30 June 2017, 14:06:32

On 2017/06/23 17:00, Dean Rasheed wrote:
> On 23 June 2017 at 08:01, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> The way we have designed our syntax, we don't have a way to tell that
>> p3 comes after p2 and they have no gap between those. But I don't
>> think that's your question. What you are struggling with is a way to
>> specify a lower bound (10, +infinity) so that anything with i1 > 10
>> would go to partition 3.
> 
> I think actually there is a fundamental problem here, which arises
> because UNBOUNDED has 2 different meanings depending on context, and
> thus it is not possible in general to specify the start of one range
> to be equal to the end of the previous range, as is necessary to get
> contiguous non-overlapping ranges.

Okay, I thought about this a bit more, and I now realize that this
arbitrary-sounding restriction, allowing only -infinity in FROM and
+infinity in TO, limits the usefulness of being able to specify infinite
bounds at all.

> Note that this isn't just a problem for floating point datatypes
> either, it also applies to other types such as strings. For example,
> given a partition over (text, int) types defined with the following
> values:
> 
>   FROM ('a', UNBOUNDED) TO ('b', UNBOUNDED)
> 
> which is equivalent to
> 
>   FROM ('a', -INFINITY) TO ('b', +INFINITY)
> 
> where should the next range start?
> 
> Even if you were to find a way to specify "the next string after 'b'",
> it wouldn't exactly be pretty. The problem is that the above partition
> corresponds to "all the strings starting with 'a', plus the string
> 'b'", which is pretty ugly. A neater way to define the pair of ranges
> in this case would be:
> 
>   FROM ('a', -INFINITY) TO ('b', -INFINITY)
>   FROM ('b', -INFINITY) TO ('c', -INFINITY)
> 
> since then all strings starting with 'a' would fall into the first
> partition and all the strings starting with 'b' would fall into the
> second one.

I agree that a valid use case like the one above is awkward to express
currently.

> Currently, when there are 2 partition columns, the partition
> constraint is defined as
> 
>   (a is not null) and (b is not null)
>   and
>   (a > al or (a = al and b >= bl))
>   and
>   (a < au or (a = au and b < bu))
> 
> if the upper bound bu were allowed to be -INFINITY (something that
> should probably be forbidden unless the previous column's upper bound
> were finite), then this would simplify to
> 
>   (a is not null) and (b is not null)
>   and
>   (a > al or (a = al and b >= bl))
>   and
>   (a < au)
> 
> and in the example above, where bl is -INFINITY, it would further simplify to
> 
>   (a is not null) and (b is not null)
>   and
>   (a >= al)
>   and
>   (a < au)
> 
> There would also be a similar simplification possible if the lower
> bound of a partition column were allowed to be +INFINITY.

Yep.

> So, I think that having UNBOUNDED represent both -INFINITY and
> +INFINITY depending on context is a design flaw, and that we need to
> allow both -INFINITY and +INFINITY as upper and lower bounds (provided
> they are preceded by a column with a finite bound). I think that, in
> general, that's the only way to allow contiguous non-overlapping
> partitions to be defined on multiple columns.

Alright, I spent some time implementing a patch to allow specifying
-infinity and +infinity in arbitrary ways.  Of course, it prevents
nonsensical inputs with appropriate error messages.

While implementing it, I initially thought that the only grammar
modification required was to allow a sign before the unbounded keyword,
but it sounded strange to call the actual bound values -unbounded and
+unbounded.  While the keyword "unbounded" describes the property of
being unbounded, the actual values are really -infinity and +infinity.
So, I decided to instead modify the grammar to accept -infinity and
+infinity in the FROM and TO lists.  The sign is optional; in its
absence, infinity in FROM means -infinity and infinity in TO means
+infinity.  This decision may be seen as controversial, now that we are
actually in beta, if we decide to go with this patch at all.

Some adjustments were required in the logic in partition.c that depended
on the old assumption that all infinite values in the lower bound meant
-infinity and vice versa.  That includes get_qual_for_range() being able
to simplify the partition constraint as Dean mentioned in his email.

When testing the patch, I realized that the current code in
check_new_partition_bound() that checks for range partition overlap had a
latent bug that resulted in false positives for the new cases that the new
less restrictive syntax allowed.  I spent some time simplifying that code
while also fixing the aforementioned bug.  It's implemented in the
attached patch 0001.

0002 is the patch that implements the new syntax.

It's possible that this won't be considered a PG 10 open item but a new
feature and so PG 11 material, as Ashutosh also wondered.

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Multi column range partition table

From

Ashutosh Bapat

Date:

30 June 2017, 15:04:13

On Fri, Jun 30, 2017 at 1:36 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
> Alright, I spent some time implementing a patch to allow specifying
> -infinity and +infinity in arbitrary ways.  Of course, it prevents
> nonsensical inputs with appropriate error messages.

I don't think -infinity and +infinity are the right terms. For a
string or character data type there is no -infinity or +infinity.
The same goes for enums. We need to extend UNBOUNDED somehow to indicate
the end of a given type in the given direction. I thought about
UNBOUNDED LEFT/RIGHT, but whether LEFT indicates the negative or the
positive side would cause confusion. Also, LEFT/RIGHT may work for a
single-dimensional datatype but not for multi-dimensional spaces. How
about MINIMUM/MAXIMUM or UNBOUNDED MIN/MAX to indicate the extremities?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

02 July 2017, 22:59:53

On 30 June 2017 at 09:06, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> When testing the patch, I realized that the current code in
> check_new_partition_bound() that checks for range partition overlap had a
> latent bug that resulted in false positives for the new cases that the new
> less restrictive syntax allowed.  I spent some time simplifying that code
> while also fixing the aforementioned bug.  It's implemented in the
> attached patch 0001.
>

I haven't had time to look at 0002 yet. Looking at 0001, I'm not
convinced that it really represents much of a simplification, but I
do prefer the way it now consistently reports the first overlapping
partition in the error message.

I'm not entirely convinced by this change either:

-                        if (equal || off1 != off2)
+                        if (off2 > off1 + 1 || ((off2 == off1 + 1) && !equal))

Passing probe_is_bound = true to partition_bound_bsearch() will I
think cause it to return equal = false when the upper bound of one
partition equals the lower bound of another, so relying on the "equal"
flag here seems a bit dubious. I think I can just about convince
myself that it works, but not for the reasons stated in the comments.

It also seems unnecessary for this code to be doing 2 binary searches.
I think a better simplification would be to just do one binary search
to find the gap that the lower bound fits in, and then test the upper
bound of the new partition against the lower bound of the next
partition (if there is one), as in the attached patch.

Regards,
Dean
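
The single-search idea can be sketched as follows. This is not the code from the attached patch: PostgreSQL searches a sorted array holding both lower and upper bounds, whereas this simplified model, an assumption made for brevity, keeps a list of `(lower, upper, name)` entries sorted by lower bound and uses Python's `bisect`. The function name `find_overlap` is hypothetical.

```python
import bisect
import math

NEG, POS = -math.inf, math.inf

def find_overlap(partitions, new_lower, new_upper):
    """Return the name of an overlapping partition, or None.

    `partitions` is a list of (lower, upper, name), sorted by lower bound and
    already non-overlapping; all ranges are half-open, [lower, upper).
    """
    lowers = [p[0] for p in partitions]
    # One binary search: count existing partitions starting at or before new_lower.
    i = bisect.bisect_right(lowers, new_lower)
    # The partition before the gap must end at or before new_lower.
    if i > 0 and partitions[i - 1][1] > new_lower:
        return partitions[i - 1][2]
    # The partition after the gap must start at or after new_upper.
    if i < len(partitions) and partitions[i][0] < new_upper:
        return partitions[i][2]
    return None

parts = [
    ((NEG, NEG), (10, 10), "p1"),
    ((10, 10), (10, POS), "p2"),
    ((20, 10), (20, POS), "p4"),
]
print(find_overlap(parts, (10, NEG), (20, 10)))  # p1
print(find_overlap(parts, (10, POS), (20, 10)))  # None
print(find_overlap(parts, (11, NEG), (25, 0)))   # p4
```

Only the two neighbours of the gap need checking because the existing partitions are already sorted and disjoint.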

Attachments

simplify-new-range-partition-bounds-check.patch

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

02 July 2017, 23:15:15

On 30 June 2017 at 10:04, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Fri, Jun 30, 2017 at 1:36 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>
>> Alright, I spent some time implementing a patch to allow specifying
>> -infinity and +infinity in arbitrary ways.  Of course, it prevents
>> nonsensical inputs with appropriate error messages.
>
> I don't think -infinity and +infinity are the right terms. For a
> string or character data type there is no -infinity and +infinity.
> Similarly for enums. We need to extend UNBOUNDED somehow to indicate
> the end of a given type in the given direction. I thought about
> UNBOUNDED LEFT/RIGHT but then whether LEFT indicates -ve side or +side
> would cause confusion. Also LEFT/RIGHT may work for a single
> dimensional datatype but not for multi-dimensional spaces. How about
> MINIMUM/MAXIMUM or UNBOUNDED MIN/MAX to indicate the extremities.
>

Yes, I think you're right. Also, some datatypes include values that
are equal to +/-infinity, which would then behave differently from
unbounded as range bounds, so it wouldn't be a good idea to overload
that term.

My first thought was UNBOUNDED ABOVE/BELOW, because that matches the
terminology already in use of upper and lower bounds.

Regards,
Dean
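
The point about datatypes that contain infinite values can be illustrated with a sentinel that is distinct from the value `inf`. This is a sketch only: the class `Unbounded` and the constants below are hypothetical, and UNBOUNDED ABOVE/BELOW is syntax under discussion in this thread, not an existing API.

```python
import math
from functools import total_ordering

@total_ordering
class Unbounded:
    """Sentinel sorting below (sign=-1) or above (sign=+1) every real value."""

    def __init__(self, sign):
        self.sign = sign

    def __eq__(self, other):
        return isinstance(other, Unbounded) and self.sign == other.sign

    def __lt__(self, other):
        if isinstance(other, Unbounded):
            return self.sign < other.sign
        # Against an ordinary value: BELOW is always less, ABOVE never is.
        return self.sign < 0

UNBOUNDED_BELOW, UNBOUNDED_ABOVE = Unbounded(-1), Unbounded(+1)

def accepts(lower, upper, value):
    """Half-open range check: lower <= value < upper."""
    return lower <= value < upper

# A float column can hold the value 'Infinity' itself.
with_value_bound = accepts(0.0, math.inf, math.inf)        # upper bound is the value inf
with_sentinel = accepts(0.0, UNBOUNDED_ABOVE, math.inf)    # upper bound is "no bound"

print(with_value_bound, with_sentinel)  # False True
```

A row holding `inf` is rejected when the upper bound is the value infinity but accepted when the range is truly unbounded above, so the two notions should not share one keyword.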

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

03 July 2017, 08:32:15

Hi Dean,

Thanks a lot for the review.

On 2017/07/03 1:59, Dean Rasheed wrote:
> On 30 June 2017 at 09:06, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> When testing the patch, I realized that the current code in
>> check_new_partition_bound() that checks for range partition overlap had a
>> latent bug that resulted in false positives for the new cases that the new
>> less restrictive syntax allowed.  I spent some time simplifying that code
>> while also fixing the aforementioned bug.  It's implemented in the
>> attached patch 0001.
>>
> 
> I haven't had time to look at 0002 yet, but looking at 0001, I'm not
> convinced that this really represents much of a simplification, but I
> do prefer the way it now consistently reports the first overlapping
> partition in the error message.
> 
> I'm not entirely convinced by this change either:
> 
> -                        if (equal || off1 != off2)
> +                        if (off2 > off1 + 1 || ((off2 == off1 + 1) && !equal))
> 
> Passing probe_is_bound = true to partition_bound_bsearch() will I
> think cause it to return equal = false when the upper bound of one
> partition equals the lower bound of another, so relying on the "equal"
> flag here seems a bit dubious. I think I can just about convince
> myself that it works, but not for the reasons stated in the comments.

You are right.  What's actually happening in the case where I was thinking
equal would be set to true is that off2 ends up being equal to off1, so
the second arm of that || is not checked at all.

> It also seems unnecessary for this code to be doing 2 binary searches.
> I think a better simplification would be to just do one binary search
> to find the gap that the lower bound fits in, and then test the upper
> bound of the new partition against the lower bound of the next
> partition (if there is one), as in the attached patch.

I agree.  The patch looks good to me.

Thanks again.

Regards,
Amit

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

03 July 2017, 11:00:11

On 2017/07/03 2:15, Dean Rasheed wrote:
> On 30 June 2017 at 10:04, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Fri, Jun 30, 2017 at 1:36 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>
>>> Alright, I spent some time implementing a patch to allow specifying
>>> -infinity and +infinity in arbitrary ways.  Of course, it prevents
>>> nonsensical inputs with appropriate error messages.
>>
>> I don't think -infinity and +infinity are the right terms. For a
>> string or character data type there is no -infinity and +infinity.
>> Similarly for enums. We need to extend UNBOUNDED somehow to indicate
>> the end of a given type in the given direction. I thought about
>> UNBOUNDED LEFT/RIGHT but then whether LEFT indicates -ve side or +side
>> would cause confusion. Also LEFT/RIGHT may work for a single
>> dimensional datatype but not for multi-dimensional spaces. How about
>> MINIMUM/MAXIMUM or UNBOUNDED MIN/MAX to indicate the extremities.
>>
> 
> Yes, I think you're right. Also, some datatypes include values that
> are equal to +/-infinity, which would then behave differently from
> unbounded as range bounds, so it wouldn't be a good idea to overload
> that term.

Agree with you both that using (+/-) infinity may not be a good idea after
all.

> My first thought was UNBOUNDED ABOVE/BELOW, because that matches the
> terminology already in use of upper and lower bounds.

I was starting to like Ashutosh's suggested UNBOUNDED MIN/MAX syntax,
but could you clarify your comment that ABOVE/BELOW is the terminology
already in use for upper and lower bounds?  I couldn't find ABOVE/BELOW
anywhere in our existing syntax that uses the upper/lower bound notion,
so I was a little confused.

Also, I assume UNBOUNDED ABOVE signifies positive infinity and UNBOUNDED
BELOW negative infinity.

Thanks,
Amit

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

03 July 2017, 12:10:14

On 2017/07/03 14:00, Amit Langote wrote:
> On 2017/07/03 2:15, Dean Rasheed wrote:
>> On 30 June 2017 at 10:04, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> On Fri, Jun 30, 2017 at 1:36 PM, Amit Langote
>>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>>
>>>> Alright, I spent some time implementing a patch to allow specifying
>>>> -infinity and +infinity in arbitrary ways.  Of course, it prevents
>>>> nonsensical inputs with appropriate error messages.
>>>
>>> I don't think -infinity and +infinity are the right terms. For a
>>> string or character data type there is no -infinity and +infinity.
>>> Similarly for enums. We need to extend UNBOUNDED somehow to indicate
>>> the end of a given type in the given direction. I thought about
>>> UNBOUNDED LEFT/RIGHT but then whether LEFT indicates -ve side or +side
>>> would cause confusion. Also LEFT/RIGHT may work for a single
>>> dimensional datatype but not for multi-dimensional spaces. How about
>>> MINIMUM/MAXIMUM or UNBOUNDED MIN/MAX to indicate the extremities.
>>>
>>
>> Yes, I think you're right. Also, some datatypes include values that
>> are equal to +/-infinity, which would then behave differently from
>> unbounded as range bounds, so it wouldn't be a good idea to overload
>> that term.
> 
> Agree with you both that using (+/-) infinity may not be a good idea after
> all.
> 
>> My first thought was UNBOUNDED ABOVE/BELOW, because that matches the
>> terminology already in use of upper and lower bounds.
> 
> I was starting to like the Ashutosh's suggested UNBOUNDED MIN/MAX syntax,
> but could you clarify your comment that ABOVE/BELOW is the terminology
> already in use of upper and lower bounds?  I couldn't find ABOVE/BELOW in
> our existing syntax anywhere that uses the upper/lower bound notion, so
> was confused a little bit.
> 
> Also, I assume UNBOUNDED ABOVE signifies positive infinity and vice versa.

Anyway, here's the revised version of the syntax patch, which implements
the ABOVE/BELOW extension to the UNBOUNDED specification.

0001 is the patch that Dean posted [1] as a replacement for what I earlier
posted for simplifying range partition overlap check.

0002 is the UNBOUNDED syntax extension patch.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAEZATCVcBCBZsMcHj37TF%2BdcsjCtKZdZ_FAaJjaFMvfoXRqZMg%40mail.gmail.com

Attachments

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

03 July 2017, 14:36:19

On 3 July 2017 at 06:00, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/07/03 2:15, Dean Rasheed wrote:
>> My first thought was UNBOUNDED ABOVE/BELOW, because that matches the
>> terminology already in use of upper and lower bounds.
>
> I was starting to like the Ashutosh's suggested UNBOUNDED MIN/MAX syntax,
> but could you clarify your comment that ABOVE/BELOW is the terminology
> already in use of upper and lower bounds?  I couldn't find ABOVE/BELOW in
> our existing syntax anywhere that uses the upper/lower bound notion, so
> was confused a little bit.
>

I just meant that the words "above" and "below" more closely match the
already-used terms "upper" and "lower" for the bounds, so that
terminology seemed more consistent, e.g. "UNBOUNDED ABOVE" => no upper
bound.

> Also, I assume UNBOUNDED ABOVE signifies positive infinity and vice versa.
>

Right.

I'm not particularly wedded to that terminology. I always find naming
things hard, so if anyone can think of anything better, let's hear it.

The bigger question is do we want this for PG10? If so, time is
getting tight. My feeling is that we do, because otherwise we'd be
changing the syntax in PG11 of a feature only just released in PG10,
and I think the current syntax is flawed, so it would be better not to
have it in any public release. I'd feel better hearing from the
original committer though.

Meanwhile, I'll continue trying to review the latest patches...

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Ashutosh Bapat

Date:

03 July 2017, 14:43:55

On Mon, Jul 3, 2017 at 2:06 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 3 July 2017 at 06:00, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2017/07/03 2:15, Dean Rasheed wrote:
>>> My first thought was UNBOUNDED ABOVE/BELOW, because that matches the
>>> terminology already in use of upper and lower bounds.
>>
>> I was starting to like the Ashutosh's suggested UNBOUNDED MIN/MAX syntax,
>> but could you clarify your comment that ABOVE/BELOW is the terminology
>> already in use of upper and lower bounds?  I couldn't find ABOVE/BELOW in
>> our existing syntax anywhere that uses the upper/lower bound notion, so
>> was confused a little bit.
>>
>
> I just meant that the words "above" and "below" more closely match the
> already-used terms "upper" and "lower" for the bounds, so that
> terminology seemed more consistent, e.g. "UNBOUNDED ABOVE" => no upper
> bound.
>
>
>> Also, I assume UNBOUNDED ABOVE signifies positive infinity and vice versa.
>>
>
> Right.
>
> I'm not particularly wedded to that terminology. I always find naming
> things hard, so if anyone can think of anything better, let's hear it.

Yet another option: UNBOUNDED UPPER/LOWER.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

3 July 2017, 15:32:24

Hi Dean,

On 2017/07/03 17:36, Dean Rasheed wrote:
> On 3 July 2017 at 06:00, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2017/07/03 2:15, Dean Rasheed wrote:
>>> My first thought was UNBOUNDED ABOVE/BELOW, because that matches the
>>> terminology already in use of upper and lower bounds.
>>
>> I was starting to like the Ashutosh's suggested UNBOUNDED MIN/MAX syntax,
>> but could you clarify your comment that ABOVE/BELOW is the terminology
>> already in use of upper and lower bounds?  I couldn't find ABOVE/BELOW in
>> our existing syntax anywhere that uses the upper/lower bound notion, so
>> was confused a little bit.
>>
> 
> I just meant that the words "above" and "below" more closely match the
> already-used terms "upper" and "lower" for the bounds, so that
> terminology seemed more consistent, e.g. "UNBOUNDED ABOVE" => no upper
> bound.
>
>> Also, I assume UNBOUNDED ABOVE signifies positive infinity and vice versa.
>>
> 
> Right.

I see, thanks for clarifying.

> I'm not particularly wedded to that terminology. I always find naming
> things hard, so if anyone can think of anything better, let's hear it.
> 
> The bigger question is do we want this for PG10? If so, time is
> getting tight. My feeling is that we do, because otherwise we'd be
> changing the syntax in PG11 of a feature only just released in PG10,
> and I think the current syntax is flawed, so it would be better not to
> have it in any public release. I'd feel better hearing from the
> original committer though.

The way I have extended the syntax in the posted patch, ABOVE/BELOW (or
whatever we decide instead) are optional.  UNBOUNDED without the
ABOVE/BELOW specifications implicitly means UNBOUNDED ABOVE if in FROM and
vice versa, which seems to me like sensible default behavior and what's
already present in PG 10.

Do you think ABOVE/BELOW shouldn't really be optional?

> Meanwhile, I'll continue trying to review the latest patches...

I had forgotten to update the CREATE TABLE documentation in 0002 to
reflect the syntax extension.  Fixed in the attached latest patch.

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

4 July 2017, 13:49:05

On 3 July 2017 at 10:32, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/07/03 17:36, Dean Rasheed wrote:
>> The bigger question is do we want this for PG10? If so, time is
>> getting tight. My feeling is that we do, because otherwise we'd be
>> changing the syntax in PG11 of a feature only just released in PG10,
>> and I think the current syntax is flawed, so it would be better not to
>> have it in any public release. I'd feel better hearing from the
>> original committer though.
>
> The way I have extended the syntax in the posted patch, ABOVE/BELOW (or
> whatever we decide instead) are optional.  UNBOUNDED without the
> ABOVE/BELOW specifications implicitly means UNBOUNDED ABOVE if in FROM and
> vice versa, which seems to me like sensible default behavior and what's
> already present in PG 10.
>
> Do you think ABOVE/BELOW shouldn't really be optional?
>

Hmm, I'm not so sure about that.

The more I think about this, the more I think that the current design
is broken, and that introducing UNBOUNDED ABOVE/BELOW is just a
sticking plaster to cover that up. Yes, it allows nicer multi-column
ranges to be defined, as demonstrated upthread. But, it also allows
some pretty counterintuitive things like making the lower bound
exclusive and the upper bound inclusive.

I think that's actually the real problem with the current design. If I
have a single-column partition like
 (col) FROM (x) TO (y)

it's pretty clear that's a simple range, inclusive at the lower end
and exclusive at the upper end:
 (x) <= (col) < (y)

If I now make that a 2-column partition, but leave the second column
unbounded:
 (col1,col2) FROM (x,UNBOUNDED) TO (y,UNBOUNDED)

my initial expectation would have been for that to mean the same
thing, i.e.,
 (x) <= (col1) < (y)

but that only happens if "UNBOUNDED" means negative infinity in both
places. That then starts to give the sort of desirable properties
you'd expect, like using the same expression for the lower bound of
one partition as the upper bound of another makes the two partitions
contiguous.

But of course, that's not exactly a pretty design either, because then
you'd be saying that UNBOUNDED means positive infinity if it's the
upper bound of the first column, and negative infinity if it's the
lower bound of the first column or either bound of any other column.
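
A minimal sketch of the two readings, modelling the row-wise comparison
with Python tuples and using floating-point infinities as stand-ins for
UNBOUNDED. The helper name in_partition and the sample values are
illustrative assumptions, not PostgreSQL code:

```python
import math

NEG_INF, POS_INF = -math.inf, math.inf  # stand-ins for UNBOUNDED

def in_partition(row, lower, upper):
    """Row-wise test: lower <= row < upper (tuples compare column by column)."""
    return lower <= row < upper

x, y = 10, 20

# UNBOUNDED read as negative infinity in both bounds:
# FROM (x, -inf) TO (y, -inf) accepts exactly the rows with x <= col1 < y.
both_negative = [c1 for c1 in range(5, 26)
                 if in_partition((c1, 0), (x, NEG_INF), (y, NEG_INF))]

# Upper UNBOUNDED read as positive infinity instead (as in the current
# design): col1 = y is now accepted as well, so the upper end of the
# range on col1 is effectively inclusive.
upper_positive = [c1 for c1 in range(5, 26)
                  if in_partition((c1, 0), (x, NEG_INF), (y, POS_INF))]

print(both_negative[0], both_negative[-1])    # first and last accepted col1
print(upper_positive[0], upper_positive[-1])
```

Only the second reading lets col1 = y through, which is the asymmetry
described above.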

Another aspect of the current design I don't like is that you have to
keep repeating UNBOUNDED [ABOVE/BELOW], for each of the rest of the
columns in the bound, and anything else is an error. That's a pretty
verbose way of saying "the rest of the columns are unbounded".

So the more I think about this, the more I think that a cleaner design
would be as follows:

1). Don't allow UNBOUNDED, except in the first column, where it can
    keep its current meaning.

2). Allow the partition bounds to have fewer columns than the
    partition definition, and have that mean the same as it would have
    meant if you were partitioning by that many columns. So, for
    example, if you were partitioning by (col1,col2), you'd be allowed
    to define a partition like so:

      FROM (x) TO (y)

    and it would mean

      x <= col1 < y

    Or you'd be able to define a partition like

      FROM (x1,x2) TO (y)

    which would mean

      ((col1 > x1) OR (col1 = x1 AND col2 >= x2)) AND col1 < y

3). Don't allow any value after UNBOUNDED (i.e., only specify
    UNBOUNDED once in a partition bound).

This design has a few neat properties:

- Lower bounds are always inclusive and upper bounds are always exclusive.

- If the expression for the lower bound of one partition is the same
  as the expression for the upper bound of another, the 2 partitions
  are contiguous, making it easy to define a covering set of
  partitions.

- It's much easier to understand what a bound of "(x)" means than "(x,UNBOUNDED [ABOVE/BELOW])"

- It's much less verbose, and there's no needless repetition.
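
The proposed rules can be sketched as follows, assuming each bound is
compared only against as many leading key columns as it specifies. This
is an illustrative Python model (the function name is hypothetical), not
the patch itself:

```python
def in_partition(row, lower, upper):
    """Lower bound inclusive, upper bound exclusive; a bound with fewer
    values than the partition key constrains only the leading columns."""
    return row[:len(lower)] >= lower and row[:len(upper)] < upper

# Partition key is (col1, col2).
# FROM (1) TO (3) means 1 <= col1 < 3, whatever col2 is.
assert in_partition((1, -999), (1,), (3,))
assert not in_partition((3, -999), (1,), (3,))

# FROM (1, 5) TO (3) means
# ((col1 > 1) OR (col1 = 1 AND col2 >= 5)) AND col1 < 3.
assert in_partition((1, 5), (1, 5), (3,))
assert not in_partition((1, 4), (1, 5), (3,))

# Reusing (1, 5) as the upper bound of one partition and the lower bound
# of the next leaves no gap and no overlap between them.
left = ((0,), (1, 5))
right = ((1, 5), (3,))
for row in [(0, 0), (1, 4), (1, 5), (2, 0)]:
    assert in_partition(row, *left) != in_partition(row, *right)
```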

Of course, it's pretty late in the day to be proposing this kind of
redesign, but I fear that if we don't tackle it now, it will just be
harder to deal with in the future.

Actually, a quick, simple hacky implementation might be to just fill
in any omitted values in a partition bound with negative infinity
internally, and when printing a bound, omit any values after an
infinite value. But really, I think we'd want to tidy up the
implementation, and I think a number of things would actually get much
simpler. For example, get_qual_for_range() could simply stop when it
reached the end of the list of values for the bound, and it wouldn't
need to worry about an unbounded value following a bounded one.
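
A rough sketch of that quick approach, under the assumption that a
floating-point negative infinity stands in for the internal "negative
infinity" marker. The helper names pad_bound and format_bound are made
up for illustration, and dropping the padding itself when printing is
one reading of "omit any values after an infinite value":

```python
import math

NEG_INF = -math.inf  # stand-in for the internal negative-infinity marker

def pad_bound(values, key_columns):
    """Fill in omitted trailing values with negative infinity."""
    return tuple(values) + (NEG_INF,) * (key_columns - len(values))

def format_bound(bound):
    """Print a bound, dropping the padding and anything after it."""
    shown = []
    for value in bound:
        if value == NEG_INF:
            break
        shown.append(str(value))
    return "(" + ", ".join(shown) + ")"

lower = pad_bound((10,), 2)   # user wrote FROM (10)
upper = pad_bound((20,), 2)   # user wrote TO (20)

# Ordinary row-wise comparison on the padded bounds gives 10 <= col1 < 20.
assert lower <= (10, -5) < upper
assert not (lower <= (20, -5) < upper)
print(format_bound(lower), format_bound(upper))
```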

Thoughts?

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

5 July 2017, 15:43:13

Hi Dean,

On 2017/07/04 16:49, Dean Rasheed wrote:
> On 3 July 2017 at 10:32, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2017/07/03 17:36, Dean Rasheed wrote:
>>> The bigger question is do we want this for PG10? If so, time is
>>> getting tight. My feeling is that we do, because otherwise we'd be
>>> changing the syntax in PG11 of a feature only just released in PG10,
>>> and I think the current syntax is flawed, so it would be better not to
>>> have it in any public release. I'd feel better hearing from the
>>> original committer though.
>>
>> The way I have extended the syntax in the posted patch, ABOVE/BELOW (or
>> whatever we decide instead) are optional.  UNBOUNDED without the
>> ABOVE/BELOW specifications implicitly means UNBOUNDED ABOVE if in FROM and
>> vice versa, which seems to me like sensible default behavior and what's
>> already present in PG 10.
>>
>> Do you think ABOVE/BELOW shouldn't really be optional?
>>
> 
> Hmm, I'm not so sure about that.
> 
> The more I think about this, the more I think that the current design
> is broken, and that introducing UNBOUNDED ABOVE/BELOW is just a
> sticking plaster to cover that up. Yes, it allows nicer multi-column
> ranges to be defined, as demonstrated upthread. But, it also allows
> some pretty counterintuitive things like making the lower bound
> exclusive and the upper bound inclusive.

Yes, I kind of got that impression from the example, but wasn't able to
reach the same conclusion as yours that it stems from the underlying
design issues; I thought we'd just have to document them as caveats, but
that doesn't really sound nice.  Thanks for pointing that out.

> I think that's actually the real problem with the current design. If I
> have a single-column partition like
> 
>   (col) FROM (x) TO (y)
> 
> it's pretty clear that's a simple range, inclusive at the lower end
> and exclusive at the upper end:
> 
>   (x) <= (col) < (y)
> 
> If I now make that a 2-column partition, but leave the second column
> unbounded:
> 
>   (col1,col2) FROM (x,UNBOUNDED) TO (y,UNBOUNDED)
> 
> my initial expectation would have been for that to mean the same
> thing, i.e.,
> 
>   (x) <= (col1) < (y)
> 
> but that only happens if "UNBOUNDED" means negative infinity in both
> places. That then starts to give the sort of desirable properties
> you'd expect, like using the same expression for the lower bound of
> one partition as the upper bound of another makes the two partitions
> contiguous.
> 
> But of course, that's not exactly a pretty design either, because then
> you'd be saying that UNBOUNDED means positive infinity if it's the
> upper bound of the first column, and negative infinity if it's the
> lower bound of the first column or either bound of any other column.

Initially, I didn't understand the part where you said FROM (x, UNBOUNDED)
TO (y, UNBOUNDED) would mean the same thing as (x) <= (col1) < (y),
because the row comparison logic that underlies multi-column range
partition key comparisons appeared to me to contradict that.  But maybe
thinking about implementation details like this is what's clouding my
judgement about the correctness or intuitiveness of the current design.

> Another aspect of the current design I don't like is that you have to
> keep repeating UNBOUNDED [ABOVE/BELOW], for each of the rest of the
> columns in the bound, and anything else is an error. That's a pretty
> verbose way of saying "the rest of the columns are unbounded".
>
> So the more I think about this, the more I think that a cleaner design
> would be as follows:
> 
> 1). Don't allow UNBOUNDED, except in the first column, where it can
>     keep it's current meaning.
> 
> 2). Allow the partition bounds to have fewer columns than the
>     partition definition, and have that mean the same as it would have
>     meant if you were partitioning by that many columns. So, for
>     example, if you were partitioning by (col1,col2), you'd be allowed
>     to define a partition like so:
> 
>       FROM (x) TO (y)
> 
>     and it would mean
> 
>       x <= col1 < y
> 
>     Or you'd be able to define a partition like
> 
>       FROM (x1,x2) TO (y)
> 
>     which would mean
> 
>       ((col1 > x1) OR (col1 = x1 AND col2 >= x2)) AND col1 < y
> 
> 3). Don't allow any value after UNBOUNDED (i.e., only specify
>     UNBOUNDED once in a partition bound).

I assume we don't need the ability to specify ABOVE/BELOW in this design.

In retrospect, that sounds like something that was implemented in the
earlier versions of the patch, whereby there was no ability to specify
UNBOUNDED on a per-column basis.  So the syntax was:

FROM { (x [, ...]) | UNBOUNDED } TO { (y [, ...]) | UNBOUNDED }

But, it was pointed out to me [1] that that doesn't address the use case,
for example, where part1 goes up to (10, 10) and part2 goes from (10, 10)
up to (10, unbounded).

The new design will restrict unbounded range bounds to the partitions
at the tail ends.

> This design has a few neat properties:
> 
> - Lower bounds are always inclusive and upper bounds are always
>   exclusive.
> 
> - If the expression for the lower bound of one partition is the same
>   as the expression for the upper bound of another, the 2 partitions
>   are contiguous, making it easy to define a covering set of
>   partitions.
> 
> - It's much easier to understand what a bound of "(x)" means than
>   "(x,UNBOUNDED [ABOVE/BELOW])"
> 
> - It's much less verbose, and there's no needless repetition.

They all sound good to me.

> Of course, it's pretty late in the day to be proposing this kind of
> redesign, but I fear that if we don't tackle it now, it will just be
> harder to deal with in the future.
> 
> Actually, a quick, simple hacky implementation might be to just fill
> in any omitted values in a partition bound with negative infinity
> internally, and when printing a bound, omit any values after an
> infinite value. But really, I think we'd want to tidy up the
> implementation, and I think a number of things would actually get much
> simpler. For example, get_qual_for_range() could simply stop when it
> reached the end of the list of values for the bound, and it wouldn't
> need to worry about an unbounded value following a bounded one.
> 
> Thoughts?

I cooked up a patch for the "hacky" implementation for now, just as you
described in the above paragraph.  Will you be willing to give it a look?
I will also think about the non-hacky way of implementing this.

0001 is your patch to tidy up check_new_partition_bound()  (must be
applied before 0002)

0002 is the patch to implement the range partition syntax redesign that
you outlined above

Thanks again.

Regards,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoYJcUTcN7vVgg54GHtffH11JJWYZnfF4KiRxjV-iaACQg%40mail.gmail.com

Attachments

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

5 July 2017, 20:18:49

On 5 July 2017 at 10:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> So the more I think about this, the more I think that a cleaner design
>> would be as follows:
>>
>> 1). Don't allow UNBOUNDED, except in the first column, where it can
>>     keep it's current meaning.
>>
>> 2). Allow the partition bounds to have fewer columns than the
>>     partition definition, and have that mean the same as it would have
>>     meant if you were partitioning by that many columns. So, for
>>     example, if you were partitioning by (col1,col2), you'd be allowed
>>     to define a partition like so:
>>
>>       FROM (x) TO (y)
>>
>>     and it would mean
>>
>>       x <= col1 < y
>>
>>     Or you'd be able to define a partition like
>>
>>       FROM (x1,x2) TO (y)
>>
>>     which would mean
>>
>>       ((col1 > x1) OR (col1 = x1 AND col2 >= x2)) AND col1 < y
>>
>> 3). Don't allow any value after UNBOUNDED (i.e., only specify
>>     UNBOUNDED once in a partition bound).
>
> I assume we don't need the ability of specifying ABOVE/BELOW in this design.
>

Yes that's right.

> In retrospect, that sounds like something that was implemented in the
> earlier versions of the patch, whereby there was no ability to specify
> UNBOUNDED on a per-column basis.  So the syntax was:
>
> FROM { (x [, ...]) | UNBOUNDED } TO { (y [, ...]) | UNBOUNDED }
>

Yes, that's where I ended up too.

> But, it was pointed out to me [1] that that doesn't address the use case,
> for example, where part1 goes up to (10, 10) and part2 goes from (10, 10)
> up to (10, unbounded).
>
> The new design will limit the usage of unbounded range partitions at the
> tail ends.
>

True, but I don't think that's really a problem. When the first column
is a discrete type, an upper bound of (10, unbounded) can be rewritten
as (11) in the new design. When it's a continuous type, e.g. floating
point, it can no longer be represented, because (10.0, unbounded)
really means (col1 <= 10.0). But we've already decided not to support
anything other than inclusive lower bounds and exclusive upper bounds,
so allowing this upper bound goes against that design choice.
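
To illustrate the rewrite, here is a small Python check (a sketch only,
with positive infinity standing in for an unbounded upper value, and
hypothetical helper names) that the two upper bounds accept the same
integer rows, and that they diverge once col1 can take non-integer
values:

```python
import math

POS_INF = math.inf  # stand-in for an unbounded upper value

def below_old(row):
    """Old style upper bound (10, unbounded), compared row-wise."""
    return row < (10, POS_INF)

def below_new(row):
    """New style upper bound (11), compared on the first column only."""
    return row[:1] < (11,)

# Integer col1: both bounds accept exactly the rows with col1 <= 10.
int_rows = [(c1, c2) for c1 in range(0, 21) for c2 in (-100, 0, 100)]
assert all(below_old(r) == below_new(r) for r in int_rows)

# Floating-point col1: 10.5 falls below (11) but not below
# (10, unbounded), so the rewrite is no longer equivalent.
assert below_new((10.5, 0)) and not below_old((10.5, 0))
```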

>> Of course, it's pretty late in the day to be proposing this kind of
>> redesign, but I fear that if we don't tackle it now, it will just be
>> harder to deal with in the future.
>>
>> Actually, a quick, simple hacky implementation might be to just fill
>> in any omitted values in a partition bound with negative infinity
>> internally, and when printing a bound, omit any values after an
>> infinite value. But really, I think we'd want to tidy up the
>> implementation, and I think a number of things would actually get much
>> simpler. For example, get_qual_for_range() could simply stop when it
>> reached the end of the list of values for the bound, and it wouldn't
>> need to worry about an unbounded value following a bounded one.
>>
>> Thoughts?
>
> I cooked up a patch for the "hacky" implementation for now, just as you
> described in the above paragraph.  Will you be willing to give it a look?
> I will also think about the non-hacky way of implementing this.
>

OK, I'll take a look.

Meanwhile, I already had a go at the "non-hacky" implementation (WIP
patch attached). The more I worked on it, the simpler things got,
which I think is a good sign.

Part-way through, I realised that the PartitionRangeDatum Node type is
no longer needed, because each bound value is now necessarily finite,
so the lowerdatums and upperdatums lists in a PartitionBoundSpec can
now be made into lists of Const nodes, making them match the
listdatums field used for LIST partitioning, and then a whole lot of
related code gets simplified.

It needed a little bit more code in partition.c to track individual
bound sizes, but there were a number of other places that could be
simplified, so overall this represents a reduction in the code size
and complexity.

It's not complete (e.g., no doc updates yet), but it passes all the
tests, and so far seems to work as I would expect.

Regards,
Dean

Attachments

refactor-unbounded-range-partitions.patch

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

5 July 2017, 23:07:37

On 5 July 2017 at 10:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> In retrospect, that sounds like something that was implemented in the
> earlier versions of the patch, whereby there was no ability to specify
> UNBOUNDED on a per-column basis.  So the syntax was:
>
> FROM { (x [, ...]) | UNBOUNDED } TO { (y [, ...]) | UNBOUNDED }
>
> But, it was pointed out to me [1] that that doesn't address the use case,
> for example, where part1 goes up to (10, 10) and part2 goes from (10, 10)
> up to (10, unbounded).
>

[Reading that other thread]

It's a reasonable point that our syntax is quite different from
Oracle's, and doing this takes it even further away, and removes
support for things that they do support.

For the record, Oracle allows things like the following:

DROP TABLE t1;
CREATE TABLE t1 (a NUMBER, b NUMBER, c NUMBER)
PARTITION BY RANGE (a,b,c)
(
  PARTITION t1p1 VALUES LESS THAN (1,2,3),
  PARTITION t1p2 VALUES LESS THAN (2,3,4),
  PARTITION t1p3 VALUES LESS THAN (3,MAXVALUE,5),
  PARTITION t1p4 VALUES LESS THAN (4,MAXVALUE,6)
);

INSERT INTO t1 VALUES(1,2,3);
INSERT INTO t1 VALUES(2,3,4);
INSERT INTO t1 VALUES(3,4,5);
INSERT INTO t1 VALUES(3.01,4,5);
INSERT INTO t1 VALUES(4,5,10);

COLUMN subobject_name FORMAT a20;
SELECT a, b, c, subobject_name
  FROM t1, user_objects o
 WHERE o.data_object_id = dbms_rowid.rowid_object(t1.ROWID)
 ORDER BY a,b,c;

         A          B          C SUBOBJECT_NAME
---------- ---------- ---------- --------------------
         1          2          3 T1P2
         2          3          4 T1P3
         3          4          5 T1P3
      3.01          4          5 T1P4
         4          5         10 T1P4

So they use MAXVALUE instead of UNBOUNDED for an upper bound, which is
more explicit. They don't have an equivalent MINVALUE, but it's
arguably not necessary, since the first partition's lower bound is
implicitly unbounded.

With this syntax they don't need to worry about gaps or overlaps
between partitions, which is nice, but arguably less flexible.

They're also more lax about allowing finite values after MAXVALUE, and
they document the fact that any value after a MAXVALUE is ignored.
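
The routing in the output above can be reproduced with a small Python
model. It assumes rows go to the first partition whose VALUES LESS THAN
bound is strictly greater under row-wise comparison, with positive
infinity standing in for MAXVALUE; the route function is hypothetical
and only mimics the behaviour shown:

```python
import math

MAXVALUE = math.inf  # stand-in for Oracle's MAXVALUE

# (partition name, VALUES LESS THAN bound), in definition order.
PARTITIONS = [
    ("T1P1", (1, 2, 3)),
    ("T1P2", (2, 3, 4)),
    ("T1P3", (3, MAXVALUE, 5)),
    ("T1P4", (4, MAXVALUE, 6)),
]

def route(row):
    """Return the first partition whose bound is greater than the row."""
    for name, bound in PARTITIONS:
        if row < bound:
            return name
    return None  # no partition accepts the row

for row in [(1, 2, 3), (2, 3, 4), (3, 4, 5), (3.01, 4, 5), (4, 5, 10)]:
    print(row, route(row))
```

In this model the value written after MAXVALUE is never examined,
because no finite column value compares equal to the infinity in front
of it.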

I don't think their scheme provides any way to define a partition of
the above table that would hold all rows for which a < some value.

So if we were to go for maximum flexibility and compatibility with
Oracle, then perhaps what we would do is more like the original idea
of UNBOUNDED ABOVE/BELOW, except call them MINVALUE and MAXVALUE,
which conveniently are already unreserved keywords, as well as being
much shorter. Plus, we would also relax the constraint about having
finite values after MINVALUE/MAXVALUE.

I think I'll go play around with that idea to see what it looks like
in practice. Your previous patch already does much of that, and is far
less invasive.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

6 July 2017, 08:14:47

Hi Dean,

On 2017/07/05 23:18, Dean Rasheed wrote:
> On 5 July 2017 at 10:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> In retrospect, that sounds like something that was implemented in the
>> earlier versions of the patch, whereby there was no ability to specify
>> UNBOUNDED on a per-column basis.  So the syntax was:
>>
>> FROM { (x [, ...]) | UNBOUNDED } TO { (y [, ...]) | UNBOUNDED }
>>
> Yes, that's where I ended up too.

I see.

>> But, it was pointed out to me [1] that that doesn't address the use case,
>> for example, where part1 goes up to (10, 10) and part2 goes from (10, 10)
>> up to (10, unbounded).
>>
>> The new design will limit the usage of unbounded range partitions at the
>> tail ends.
> 
> True, but I don't think that's really a problem. When the first column
> is a discrete type, an upper bound of (10, unbounded) can be rewritten
> as (11) in the new design. When it's a continuous type, e.g. floating
> point, it can no longer be represented, because (10.0, unbounded)
> really means (col1 <= 10.0). But we've already decided not to support
> anything other than inclusive lower bounds and exclusive upper bounds,
> so allowing this upper bound goes against that design choice.

Yes.

>>> Of course, it's pretty late in the day to be proposing this kind of
>>> redesign, but I fear that if we don't tackle it now, it will just be
>>> harder to deal with in the future.
>>>
>>> Actually, a quick, simple hacky implementation might be to just fill
>>> in any omitted values in a partition bound with negative infinity
>>> internally, and when printing a bound, omit any values after an
>>> infinite value. But really, I think we'd want to tidy up the
>>> implementation, and I think a number of things would actually get much
>>> simpler. For example, get_qual_for_range() could simply stop when it
>>> reached the end of the list of values for the bound, and it wouldn't
>>> need to worry about an unbounded value following a bounded one.
>>>
>>> Thoughts?
>>
>> I cooked up a patch for the "hacky" implementation for now, just as you
>> described in the above paragraph.  Will you be willing to give it a look?
>> I will also think about the non-hacky way of implementing this.
> 
> OK, I'll take a look.

Thanks a lot for your time.

> Meanwhile, I already had a go at the "non-hacky" implementation (WIP
> patch attached). The more I worked on it, the simpler things got,
> which I think is a good sign.

It definitely looks good to me.  I was thinking of more or less the same
approach, but couldn't have done as clean a job as you've done with your
patch.

> Part-way through, I realised that the PartitionRangeDatum Node type is
> no longer needed, because each bound value is now necessarily finite,
> so the lowerdatums and upperdatums lists in a PartitionBoundSpec can
> now be made into lists of Const nodes, making them match the
> listdatums field used for LIST partitioning, and then a whole lot of
> related code gets simplified.

Yeah, seems that way.

> It needed a little bit more code in partition.c to track individual
> bound sizes, but there were a number of other places that could be
> simplified, so overall this represents a reduction in the code size
> and complexity.

Sounds reasonable.

> It's not complete (e.g., no doc updates yet), but it passes all the
> tests, and so far seems to work as I would expect.

Thanks a lot for working on it.

Regards,
Amit

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

6 July 2017, 15:30:21

On 5 July 2017 at 10:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> 0001 is your patch to tidy up check_new_partition_bound()  (must be
> applied before 0002)
>

I pushed this first patch, simplifying check_new_partition_bound() for
range partitions, since it seemed like a good simplification, but note
that I don't think that was actually the cause of the latent bug you
saw upthread.

I think the real issue was in partition_rbound_cmp() -- normally, if
the upper bound of one partition coincides with the lower bound of
another, that function would report the upper bound as the smaller
one, but that logic breaks if any of the bound values are infinite,
since then it will exit early, returning 0, without ever comparing the
"lower" flags on the bounds.

I'm tempted to push a fix for that independently, since it's a bug
waiting to happen, even though it's not possible to hit it currently.
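
A simplified model of that failure mode, written in Python rather than
C. It is not the real partition_rbound_cmp(); the function below and its
early_exit flag are hypothetical, and the example bound (10, negative
infinity) used as an upper bound is only reachable under the designs
being discussed, not with the current syntax:

```python
import math

def rbound_cmp(b1, b2, early_exit):
    """Compare two range bounds given as (values, is_lower).

    Returns -1, 0 or 1. With early_exit=True the comparison stops at the
    first pair of matching infinite values, mimicking the flaw.
    """
    (vals1, lower1), (vals2, lower2) = b1, b2
    for v1, v2 in zip(vals1, vals2):
        if early_exit and math.isinf(v1) and v1 == v2:
            return 0  # exits without looking at the "lower" flags
        if v1 != v2:
            return -1 if v1 < v2 else 1
    if lower1 == lower2:
        return 0
    # Same values: the (exclusive) upper bound sorts before the
    # (inclusive) lower bound.
    return 1 if lower1 else -1

upper = ((10, -math.inf), False)  # upper bound of one partition
lower = ((10, -math.inf), True)   # lower bound of the next one

assert rbound_cmp(upper, lower, early_exit=False) == -1
assert rbound_cmp(upper, lower, early_exit=True) == 0
```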

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

7 July 2017, 01:55:15

On 5 July 2017 at 18:07, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> So if we were to go for maximum flexibility and compatibility with
> Oracle, then perhaps what we would do is more like the original idea
> of UNBOUNDED ABOVE/BELOW, except call them MINVALUE and MAXVALUE,
> which conveniently are already unreserved keywords, as well as being
> much shorter. Plus, we would also relax the constraint about having
> finite values after MINVALUE/MAXVALUE.
>

So I know that I have flip-flopped a few times on this now, but I'm
now starting to think that this approach, replacing UNBOUNDED with
MINVALUE and MAXVALUE is the best way to go, along with permitting
finite values after MINVALUE/MAXVALUE.

This gives the greatest flexibility, it's not too verbose, and it
makes it easy to define contiguous sets of partitions just by making
the lower bound of one match the upper bound of another.
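
For instance, the bounds from the start of this thread can be written
that way. The Python model below is only a sketch, with infinities
standing in for MINVALUE and MAXVALUE and a hypothetical route helper,
but it shows each row landing in exactly one partition:

```python
import math

MINVALUE, MAXVALUE = -math.inf, math.inf  # stand-ins for the keywords

# name -> (FROM bound, TO bound); each lower bound repeats the previous
# partition's upper bound.
PARTITIONS = {
    "p1": ((MINVALUE, MINVALUE), (10, 10)),
    "p2": ((10, 10), (10, MAXVALUE)),
    "p3": ((10, MAXVALUE), (20, 10)),
    "p4": ((20, 10), (20, MAXVALUE)),
    "p5": ((20, MAXVALUE), (MAXVALUE, MAXVALUE)),
}

def route(row):
    """Return every partition with FROM <= row < TO (row-wise)."""
    return [name for name, (lo, hi) in PARTITIONS.items() if lo <= row < hi]

# No gaps and no overlaps over a grid of sample rows.
for i1 in range(0, 30):
    for i2 in range(0, 30):
        assert len(route((i1, i2))) == 1

print(route((10, 5)), route((10, 10)), route((15, 0)),
      route((20, 10)), route((21, 0)))
```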

With this approach, any partition bounds that Oracle allows are also
valid in PostgreSQL, not that I would normally give too much weight to
that, but it is I think quite a nice syntax. Of course, we also
support things that Oracle doesn't allow, such as MINVALUE and gaps
between partitions.

Parts of the patch are similar to your UNBOUNDED ABOVE/BELOW patch,
but there are a number of differences -- most notably, I replaced the
"infinite" boolean flag on PartitionRangeDatum with a 3-value enum and
did away with all the DefElem nodes and the associated special string
constants being copied and compared.
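
One way to picture the three-valued kind is sketched below in Python.
The names DatumKind, datum_key and bound_key are invented for this
illustration and are not taken from the patch:

```python
from enum import IntEnum

class DatumKind(IntEnum):
    MINVALUE = -1   # less than any finite value
    VALUE = 0       # an ordinary finite value
    MAXVALUE = 1    # greater than any finite value

def datum_key(kind, value=0):
    """Sort key for one bound datum: the kind decides first, and the
    value matters only when both datums are finite."""
    return (int(kind), value if kind == DatumKind.VALUE else 0)

def bound_key(datums):
    """Sort key for a whole bound, compared column by column."""
    return tuple(datum_key(*d) for d in datums)

ten_max = bound_key([(DatumKind.VALUE, 10), (DatumKind.MAXVALUE,)])
ten_big = bound_key([(DatumKind.VALUE, 10), (DatumKind.VALUE, 10**9)])
ten_min = bound_key([(DatumKind.VALUE, 10), (DatumKind.MINVALUE,)])

assert ten_min < ten_big < ten_max
```

Ordering the kinds as -1, 0, 1 lets a plain integer comparison settle
every case that involves MINVALUE or MAXVALUE, without any string
comparisons.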

However, this is also an incompatible syntax change, and any attempt
to support both the old and new syntaxes is likely to be messy, so we
really need to get consensus on whether this is the right thing to do,
and whether it *can* be done now for PG10.

Regards,
Dean

Attachments

Replace_UNBOUNDED_with_MINVALUE_and_MAXVALUE.patch

Re: [HACKERS] Multi column range partition table

From

Tom Lane

Date:

7 July 2017, 02:04:46

Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> However, this is also an incompatible syntax change, and any attempt
> to support both the old and new syntaxes is likely to be messy, so we
> really need to get consensus on whether this is the right thing to do,
> and whether it *can* be done now for PG10.

FWIW, I'd much rather see us get it right the first time than release
PG10 with a syntax that we'll regret later.  I do not think that beta2,
or even beta3, is too late for such a change.

I'm not taking a position on whether this proposal is actually better
than what we have.  But if there's a consensus that it is, we should
go ahead and do it, not worry that it's too late.
        regards, tom lane

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

7 July 2017, 02:24:07

On 6 July 2017 at 21:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> However, this is also an incompatible syntax change, and any attempt
>> to support both the old and new syntaxes is likely to be messy, so we
>> really need to get consensus on whether this is the right thing to do,
>> and whether it *can* be done now for PG10.
>
> FWIW, I'd much rather see us get it right the first time than release
> PG10 with a syntax that we'll regret later.  I do not think that beta2,
> or even beta3, is too late for such a change.
>
> I'm not taking a position on whether this proposal is actually better
> than what we have.  But if there's a consensus that it is, we should
> go ahead and do it, not worry that it's too late.
>

OK, thanks. That's good to know.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Joe Conway

Date:

7 July 2017, 03:43:44

On 07/06/2017 01:24 PM, Dean Rasheed wrote:
> On 6 July 2017 at 21:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>>> However, this is also an incompatible syntax change, and any attempt
>>> to support both the old and new syntaxes is likely to be messy, so we
>>> really need to get consensus on whether this is the right thing to do,
>>> and whether it *can* be done now for PG10.
>>
>> FWIW, I'd much rather see us get it right the first time than release
>> PG10 with a syntax that we'll regret later.  I do not think that beta2,
>> or even beta3, is too late for such a change.
>>
>> I'm not taking a position on whether this proposal is actually better
>> than what we have.  But if there's a consensus that it is, we should
>> go ahead and do it, not worry that it's too late.
>>
>
> OK, thanks. That's good to know.

I agree we should get this right the first time and I also agree with
Dean's proposal, so I guess I'm a +2

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

7 July 2017, 07:06:18

On 2017/07/06 18:30, Dean Rasheed wrote:
> On 5 July 2017 at 10:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> 0001 is your patch to tidy up check_new_partition_bound()  (must be
>> applied before 0002)
>>
> 
> I pushed this first patch, simplifying check_new_partition_bound() for
> range partitions, since it seemed like a good simplification, but note
> that I don't think that was actually the cause of the latent bug you
> saw upthread.

I like how simple check_new_partition_bound() has now become.

> I think the real issue was in partition_rbound_cmp() -- normally, if
> the upper bound of one partition coincides with the lower bound of
> another, that function would report the upper bound as the smaller
> one, but that logic breaks if any of the bound values are infinite,
> since then it will exit early, returning 0, without ever comparing the
> "lower" flags on the bounds.
> 
> I'm tempted to push a fix for that independently, since it's a bug
> waiting to happen, even though it's not possible to hit it currently.

Oops, you're right.  Thanks for the fix.
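
The early exit described above can be sketched in a few lines. This is a simplified model, not the actual partition_rbound_cmp() source: the name rbound_cmp, the string sentinels, the (columns, lower) tuple layout and the check_flags_after_sentinel switch are all assumptions made for illustration.

```python
# Simplified model of comparing two range bounds; not the PostgreSQL source.
# A bound is (columns, lower): columns is a list of ints or the sentinels
# MINVALUE / MAXVALUE, and lower says whether it is a lower (FROM) bound.
MINVALUE = "MINVALUE"
MAXVALUE = "MAXVALUE"


def _rank(col):
    # Order the sentinels around ordinary values: MINVALUE < value < MAXVALUE.
    return {MINVALUE: -1, MAXVALUE: 1}.get(col, 0)


def rbound_cmp(b1, b2, check_flags_after_sentinel=True):
    cols1, lower1 = b1
    cols2, lower2 = b2
    for c1, c2 in zip(cols1, cols2):
        r1, r2 = _rank(c1), _rank(c2)
        if r1 != r2:
            return -1 if r1 < r2 else 1
        if r1 != 0:
            # Both columns hold the same sentinel, so later columns are
            # irrelevant. Returning 0 right here, without looking at the
            # "lower" flags, models the early exit described above.
            if not check_flags_after_sentinel:
                return 0
            break
        if c1 != c2:
            return -1 if c1 < c2 else 1
    # Coinciding bounds: the upper bound is reported as the smaller one.
    if lower1 != lower2:
        return 1 if lower1 else -1
    return 0
```

In this model, comparing the upper bound (10, MAXVALUE) with the lower bound (10, MAXVALUE) gives -1 when the flags are checked, and 0 when the loop is allowed to exit early.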

Regards,
Amit

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

7 July 2017, 08:21:28

On 2017/07/07 4:55, Dean Rasheed wrote:
> On 5 July 2017 at 18:07, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> So if we were to go for maximum flexibility and compatibility with
>> Oracle, then perhaps what we would do is more like the original idea
>> of UNBOUNDED ABOVE/BELOW, except call them MINVALUE and MAXVALUE,
>> which conveniently are already unreserved keywords, as well as being
>> much shorter. Plus, we would also relax the constraint about having
>> finite values after MINVALUE/MAXVALUE.
>>
> So I know that I have flip-flopped a few times on this now, but I'm
> now starting to think that this approach, replacing UNBOUNDED with
> MINVALUE and MAXVALUE is the best way to go, along with permitting
> finite values after MINVALUE/MAXVALUE.

Sure.

> This gives the greatest flexibility, it's not too verbose, and it
> makes it easy to define contiguous sets of partitions just by making
> the lower bound of one match the upper bound of another.
> 
> With this approach, any partition bounds that Oracle allows are also
> valid in PostgreSQL, not that I would normally give too much weight to
> that, but it is I think quite a nice syntax. Of course, we also
> support things that Oracle doesn't allow, such as MINVALUE and gaps
> between partitions.

Agreed.  MINVALUE/MAXVALUE seems like a good way forward.

> Parts of the patch are similar to your UNBOUNDED ABOVE/BELOW patch,
> but there are a number of differences -- most notably, I replaced the
> "infinite" boolean flag on PartitionRangeDatum with a 3-value enum and
> did away with all the DefElem nodes and the associated special string
> constants being copied and compared.

That's better.
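
To make the difference between a boolean flag and a three-valued kind concrete, here is a minimal sketch. It is Python rather than the C of the patch; the class name, the helper names and the numeric values are assumptions, chosen only so that the kinds sort as MINVALUE, ordinary value, MAXVALUE.

```python
from enum import IntEnum


class RangeDatumKind(IntEnum):
    """Hypothetical stand-in for the three kinds of range datum."""
    MINVALUE = -1  # unbounded below: sorts before every ordinary value
    VALUE = 0      # an ordinary bound value
    MAXVALUE = 1   # unbounded above: sorts after every ordinary value


def kind_from_infinite_flag(infinite, lower):
    # With only a boolean flag, the meaning of "infinite" depends on whether
    # the datum sits in a lower (FROM) or an upper (TO) bound.
    if not infinite:
        return RangeDatumKind.VALUE
    return RangeDatumKind.MINVALUE if lower else RangeDatumKind.MAXVALUE


def sort_key(kind, value=0):
    # Compare by kind first; the value only matters for ordinary datums.
    return (int(kind), value if kind is RangeDatumKind.VALUE else 0)
```

With the kind stored explicitly, a datum no longer needs its surrounding bound to be interpreted, which is what allows MAXVALUE to appear in a FROM clause.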

> However, this is also an incompatible syntax change, and any attempt
> to support both the old and new syntaxes is likely to be messy, so we
> really need to get consensus on whether this is the right thing to do,
> and whether it *can* be done now for PG10.

+1 to releasing this syntax in PG 10.

The patch looks generally good, although I found and fixed some minor
issues (typos and such).  Please find attached the updated patch.

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Replace_UNBOUNDED_with_MINVALUE_and_MAXVALUE-v2.patch

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

7 July 2017, 15:30:46

On 7 July 2017 at 03:21, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> The patch looks generally good, although I found and fixed some minor
> issues (typos and such).  Please find attached the updated patch.
>

Thanks for the review. Those changes all look good. I also see that I
missed an example in the docs at the bottom of the CREATE TABLE page,
so I'll go update that.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

9 July 2017, 13:42:32

On 6 July 2017 at 22:43, Joe Conway <mail@joeconway.com> wrote:
> I agree we should get this right the first time and I also agree with
> Dean's proposal, so I guess I'm a +2
>

On 7 July 2017 at 03:21, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> +1 to releasing this syntax in PG 10.
>

So, that's 3 votes in favour of replacing UNBOUNDED with
MINVALUE/MAXVALUE for range partition bounds in PG 10. Not a huge
consensus, but no objections either. Anyone else have an opinion?

Robert, have you been following this thread?

I was thinking of pushing this later today, in time for beta2.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Noah Misch

Date:

11 July 2017, 12:32:28

On Sun, Jul 09, 2017 at 08:42:32AM +0100, Dean Rasheed wrote:
> On 6 July 2017 at 22:43, Joe Conway <mail@joeconway.com> wrote:
> > I agree we should get this right the first time and I also agree with
> > Dean's proposal, so I guess I'm a +2
> >
> 
> On 7 July 2017 at 03:21, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> > +1 to releasing this syntax in PG 10.
> >
> 
> So, that's 3 votes in favour of replacing UNBOUNDED with
> MINVALUE/MAXVALUE for range partition bounds in PG 10. Not a huge
> consensus, but no objections either. Anyone else have an opinion?
> 
> Robert, have you been following this thread?
> 
> I was thinking of pushing this later today, in time for beta2.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Re: [HACKERS] Multi column range partition table

From

Ashutosh Bapat

Date:

11 July 2017, 18:29:45

On Sun, Jul 9, 2017 at 1:12 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 6 July 2017 at 22:43, Joe Conway <mail@joeconway.com> wrote:
>> I agree we should get this right the first time and I also agree with
>> Dean's proposal, so I guess I'm a +2
>>
>
> On 7 July 2017 at 03:21, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> +1 to releasing this syntax in PG 10.
>>
>
> So, that's 3 votes in favour of replacing UNBOUNDED with
> MINVALUE/MAXVALUE for range partition bounds in PG 10. Not a huge
> consensus, but no objections either. Anyone else have an opinion?
>

+     <para>
+      Also note that some element types, such as <literal>timestamp</>,
+      have a notion of "infinity", which is just another value that can
+      be stored. This is different from <literal>MINVALUE</> and
+      <literal>MAXVALUE</>, which are not real values that can be stored,
+      but rather they are ways of saying the value is unbounded.
+      <literal>MAXVALUE</> can be thought of as being greater than any
+      other value, including "infinity" and <literal>MINVALUE</> as being
+      less than any other value, including "minus infinity". Thus the range
+      <literal>FROM ('infinity') TO (MAXVALUE)</> is not an empty range; it
+      allows precisely one value to be stored — the timestamp
+      "infinity".
      </para>

The description in this paragraph seems to attach the intuitive
meaning of the word "unbounded" to MAXVALUE and MINVALUE, which have
different intuitive meanings of their own. Not sure that's how we
should describe MAXVALUE/MINVALUE.

Most of the patch seems to be replacing "content" with "kind",
RangeDatumContent with PartitionRangeDatumKind  and RANGE_DATUM_FINITE
with PARTITION_RANGE_DATUM_VALUE. But those changes in name don't seem
to be adding much value to the patch. Replacing RANGE_DATUM_NEG_INF
and RANGE_DATUM_POS_INF with PARTITION_RANGE_DATUM_MINVALUE and
PARTITION_RANGE_DATUM_MAXVALUE looks like a good change in line with
MINVALUE/MAXVALUE change. Maybe we should reuse the previous
variables and enum type name, changing only those two, so that the total
change introduced by the patch is minimal.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

12 July 2017, 01:24:53

On 11 July 2017 at 13:29, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> +     <para>
> +      Also note that some element types, such as <literal>timestamp</>,
> +      have a notion of "infinity", which is just another value that can
> +      be stored. This is different from <literal>MINVALUE</> and
> +      <literal>MAXVALUE</>, which are not real values that can be stored,
> +      but rather they are ways of saying the value is unbounded.
> +      <literal>MAXVALUE</> can be thought of as being greater than any
> +      other value, including "infinity" and <literal>MINVALUE</> as being
> +      less than any other value, including "minus infinity". Thus the range
> +      <literal>FROM ('infinity') TO (MAXVALUE)</> is not an empty range; it
> +      allows precisely one value to be stored — the timestamp
> +      "infinity".
>       </para>
>
> The description in this paragraph seems to attach the intuitive
> meaning of the word "unbounded" to MAXVALUE and MINVALUE, which have
> different intuitive meanings of their own. Not sure that's how we
> should describe MAXVALUE/MINVALUE.
>

I'm not sure I understand your point. MINVALUE and MAXVALUE do mean
unbounded below and above respectively. This paragraph is just making
the point that that isn't the same as -/+ infinity.
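
A toy model may make the distinction easier to see. It uses Python floats, with float('inf') standing in for the timestamp 'infinity'; the sentinel objects and the in_range helper are hypothetical names for this sketch, not anything from the patch.

```python
import math

# Sentinels meaning "no bound on this side"; they are not storable values.
MINVALUE = object()
MAXVALUE = object()


def in_range(value, lower, upper):
    """Model FROM (lower) TO (upper): lower inclusive, upper exclusive."""
    # MINVALUE as a lower bound admits everything; MAXVALUE admits nothing.
    above_lower = lower is MINVALUE or (
        lower is not MAXVALUE and value >= lower
    )
    # MAXVALUE as an upper bound admits everything; MINVALUE admits nothing.
    below_upper = upper is MAXVALUE or (
        upper is not MINVALUE and value < upper
    )
    return above_lower and below_upper


# FROM ('infinity') TO (MAXVALUE) accepts the stored value "infinity" ...
accepts_infinity = in_range(math.inf, math.inf, MAXVALUE)
# ... but no finite value, however large.
accepts_finite = in_range(1e300, math.inf, MAXVALUE)
```

By contrast, a range whose upper bound is the stored value 'infinity' excludes that value in this model, because upper bounds are exclusive.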

> Most of the patch seems to be replacing "content" with "kind",
> RangeDatumContent with PartitionRangeDatumKind  and RANGE_DATUM_FINITE
> with PARTITION_RANGE_DATUM_VALUE. But those changes in name don't seem
> to be adding much value to the patch. Replacing RANGE_DATUM_NEG_INF
> and RANGE_DATUM_POS_INF with PARTITION_RANGE_DATUM_MINVALUE and
> PARTITION_RANGE_DATUM_MAXVALUE looks like a good change in line with
> MINVALUE/MAXVALUE change. Maybe we should reuse the previous
> variables and enum type name, changing only those two, so that the total
> change introduced by the patch is minimal.
>

No, this isn't just renaming that other enum. It's about replacing the
boolean "infinite" flag on PartitionRangeDatum with something that can
properly enumerate the 3 kinds of PartitionRangeDatum that are allowed
(and, as noted above, "finite"/"infinite" isn't the right terminology
either). Putting that new enum in parsenodes.h makes it globally
available, wherever the PartitionRangeDatum structure is used. A
side-effect of that change is that the old RangeDatumContent enum that
was local to partition.c is no longer needed.

RangeDatumContent wouldn't be a good name for a globally visible enum
of this kind because that name fails to link it to partitioning in any
way, and could easily be confused as having something to do with RTEs
or range types. Also, the term "content" is more traditionally used in
the Postgres sources for a field *holding* content, rather than a
field specifying the *kind* of content. On the other hand, you'll note
that the term "kind" is by far the most commonly used term for naming
this kind of enum, and any matching fields.

IMO, code consistency and readability takes precedence over keeping
patch sizes down.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Amit Langote

Date:

12 July 2017, 07:56:22

On 2017/07/12 4:24, Dean Rasheed wrote:
> On 11 July 2017 at 13:29, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>> Most of the patch seems to be replacing "content" with "kind",
>> RangeDatumContent with PartitionRangeDatumKind  and RANGE_DATUM_FINITE
>> with PARTITION_RANGE_DATUM_VALUE. But those changes in name don't seem
>> to be adding much value to the patch. Replacing RANGE_DATUM_NEG_INF
>> and RANGE_DATUM_POS_INF with PARTITION_RANGE_DATUM_MINVALUE and
>> PARTITION_RANGE_DATUM_MAXVALUE looks like a good change in line with
>> MINVALUE/MAXVALUE change. Maybe we should reuse the previous
>> variables and enum type name, changing only those two, so that the total
>> change introduced by the patch is minimal.
>>
> 
> No, this isn't just renaming that other enum. It's about replacing the
> boolean "infinite" flag on PartitionRangeDatum with something that can
> properly enumerate the 3 kinds of PartitionRangeDatum that are allowed
> (and, as noted above, "finite"/"infinite" isn't the right terminology
> either). Putting that new enum in parsenodes.h makes it globally
> available, wherever the PartitionRangeDatum structure is used. A
> side-effect of that change is that the old RangeDatumContent enum that
> was local to partition.c is no longer needed.
> 
> RangeDatumContent wouldn't be a good name for a globally visible enum
> of this kind because that name fails to link it to partitioning in any
> way, and could easily be confused as having something to do with RTEs
> or range types. Also, the term "content" is more traditionally used in
> the Postgres sources for a field *holding* content, rather than a
> field specifying the *kind* of content. On the other hand, you'll note
> that the term "kind" is by far the most commonly used term for naming
> this kind of enum, and any matching fields.
> 
> IMO, code consistency and readability takes precedence over keeping
> patch sizes down.

I agree with Dean here that the new global PartitionRangeDatumKind enum is
an improvement over the previous infinite flag in the parse node plus the
partition.c local RangeDatumContent enum for all the reasons he mentioned.

Thanks,
Amit

Re: [HACKERS] Multi column range partition table

From

Ashutosh Bapat

Date:

12 July 2017, 15:46:13

On Wed, Jul 12, 2017 at 12:54 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 11 July 2017 at 13:29, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> +     <para>
>> +      Also note that some element types, such as <literal>timestamp</>,
>> +      have a notion of "infinity", which is just another value that can
>> +      be stored. This is different from <literal>MINVALUE</> and
>> +      <literal>MAXVALUE</>, which are not real values that can be stored,
>> +      but rather they are ways of saying the value is unbounded.
>> +      <literal>MAXVALUE</> can be thought of as being greater than any
>> +      other value, including "infinity" and <literal>MINVALUE</> as being
>> +      less than any other value, including "minus infinity". Thus the range
>> +      <literal>FROM ('infinity') TO (MAXVALUE)</> is not an empty range; it
>> +      allows precisely one value to be stored — the timestamp
>> +      "infinity".
>>       </para>
>>
>> The description in this paragraph seems to attach the intuitive
>> meaning of the word "unbounded" to MAXVALUE and MINVALUE, which have
>> different intuitive meanings of their own. Not sure that's how we
>> should describe MAXVALUE/MINVALUE.
>>
>
> I'm not sure I understand your point. MINVALUE and MAXVALUE do mean
> unbounded below and above respectively. This paragraph is just making
> the point that that isn't the same as -/+ infinity.
>

What confuses me, and probably users, is that something named min/max"value"
is not a value but something lesser or greater than any other "value".
The paragraph above explains that <literal>FROM ('infinity') TO
(MAXVALUE)</> implies a partition containing only the infinity value.
What would <literal>FROM (MINVALUE) TO ('minus infinity')</> mean, and
would it be allowed? What would it contain, especially since upper
bounds are always exclusive?

>
>> Most of the patch seems to be replacing "content" with "kind",
>> RangeDatumContent with PartitionRangeDatumKind  and RANGE_DATUM_FINITE
>> with PARTITION_RANGE_DATUM_VALUE. But those changes in name don't seem
>> to be adding much value to the patch. Replacing RANGE_DATUM_NEG_INF
>> and RANGE_DATUM_POS_INF with PARTITION_RANGE_DATUM_MINVALUE and
>> PARTITION_RANGE_DATUM_MAXVALUE looks like a good change in line with
>> MINVALUE/MAXVALUE change. Maybe we should reuse the previous
>> variables and enum type name, changing only those two, so that the total
>> change introduced by the patch is minimal.
>>
>
> No, this isn't just renaming that other enum. It's about replacing the
> boolean "infinite" flag on PartitionRangeDatum with something that can
> properly enumerate the 3 kinds of PartitionRangeDatum that are allowed
> (and, as noted above, "finite"/"infinite" isn't the right terminology
> either).

Right. I think we need that change.

> Putting that new enum in parsenodes.h makes it globally
> available, wherever the PartitionRangeDatum structure is used. A
> side-effect of that change is that the old RangeDatumContent enum that
> was local to partition.c is no longer needed.

Hmm, I failed to notice the changes in the _out, _equal and _read functions.
The downside is that the enum cannot be used for anything other than
partitioning, though I cannot imagine where else we would use it.

>
> RangeDatumContent wouldn't be a good name for a globally visible enum
> of this kind because that name fails to link it to partitioning in any
> way, and could easily be confused as having something to do with RTEs
> or range types. Also, the term "content" is more traditionally used in
> the Postgres sources for a field *holding* content, rather than a
> field specifying the *kind* of content. On the other hand, you'll note
> that the term "kind" is by far the most commonly used term for naming
> this kind of enum, and any matching fields.

Ok.

>
> IMO, code consistency and readability takes precedence over keeping
> patch sizes down.

No doubt about that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

13 July 2017, 14:19:28

On 12 July 2017 at 10:46, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Jul 12, 2017 at 12:54 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 11 July 2017 at 13:29, Ashutosh Bapat
>>> The description in this paragraph seems to attach the intuitive
>>> meaning of the word "unbounded" to MAXVALUE and MINVALUE, which have
>>> different intuitive meanings of their own. Not sure that's how we
>>> should describe MAXVALUE/MINVALUE.
>>>
>> I'm not sure I understand your point. MINVALUE and MAXVALUE do mean
>> unbounded below and above respectively. This paragraph is just making
>> the point that that isn't the same as -/+ infinity.
>>
> What confuses me, and probably users, is that something named min/max"value"
> is not a value but something lesser or greater than any other "value".

Ah OK, I see what you're saying.

It's worth noting though that, after a little looking around, I found
that Oracle, MySQL and DB2 all use MINVALUE/MAXVALUE for unbounded
range partitions (although in the case of Oracle and MySQL, they
currently only support specifying upper bounds, and only use MAXVALUE
at the moment).

So MINVALUE/MAXVALUE are likely to be familiar to at least some people
coming from other databases. Of course, for those other databases, the
surrounding syntax for creating partitioned tables is completely
different, but at least this makes the bounds themselves portable (our
supported set of bounds will be a superset of those supported by
Oracle and MySQL, and I think the same as those supported by DB2).

I also personally quite like those terms, because they're nice and
concise, and it's pretty obvious which is which.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Robert Haas

Date:

14 July 2017, 11:12:26

On Sun, Jul 9, 2017 at 2:42 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 6 July 2017 at 22:43, Joe Conway <mail@joeconway.com> wrote:
>> I agree we should get this right the first time and I also agree with
>> Dean's proposal, so I guess I'm a +2
>
> On 7 July 2017 at 03:21, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> +1 to releasing this syntax in PG 10.
>
> So, that's 3 votes in favour of replacing UNBOUNDED with
> MINVALUE/MAXVALUE for range partition bounds in PG 10. Not a huge
> consensus, but no objections either. Anyone else have an opinion?
>
> Robert, have you been following this thread?

Uh, no.  Sorry.  I agree that it's a big problem that (10, UNBOUNDED)
interpreted as a maximum value means first_column <= 10 and when
interpreted as a minimum value means first_column >= 10, because those
things aren't opposites of each other.  I guess the proposal here
would make (10, MAXVALUE) as a maximum value mean first_column <= 10
and as a minimum would mean first_column > 10, and contrariwise for
MINVALUE.  That seems to restore the intended design principle of the
system, which is good, but...

...originally, Amit proposed to attach a postfix INCLUSIVE or
EXCLUSIVE to each bound specification, and this does feel like a bit
of a back door to the same place, kinda.  A partition defined to run
from (10, MAXVALUE) TO (11, MAXVALUE) is a lot like a partition
defined to run from (10) EXCLUSIVE to (11) EXCLUSIVE.  And if we
eventually decide to allow that, then what will be the difference
between a partition which starts at (10, MAXVALUE) EXCLUSIVE and one
which starts from (10, MAXVALUE) INCLUSIVE?

I haven't thought through this well enough to be sure that there's any
problem with what is being proposed, and I definitely don't have a
better solution off the top of my head, but I feel slightly nervous.

Apologies again for the slow response - will update again by Monday.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

16 July 2017, 16:40:55

On 14 July 2017 at 06:12, Robert Haas <robertmhaas@gmail.com> wrote:
> I agree that it's a big problem that (10, UNBOUNDED)
> interpreted as a maximum value means first_column <= 10 and when
> interpreted as a minimum value means first_column >= 10, because those
> things aren't opposites of each other.  I guess the proposal here
> would make (10, MAXVALUE) as a maximum value mean first_column <= 10
> and as a minimum would mean first_column > 10, and contrariwise for
> MINVALUE.  That seems to restore the intended design principle of the
> system, which is good

Right. So in general, when using MINVALUE/MAXVALUE for the 2nd column
of a 2-column partitioning scheme, the partition constraints simplify
as follows:
 FROM (x, MINVALUE) => col1 >= x
 FROM (x, MAXVALUE) => col1 > x

 TO (x, MINVALUE)   => col1 < x
 TO (x, MAXVALUE)   => col1 <= x

which restores the property that one partition can be made contiguous
with another by having the upper bounds of one partition equal to the
lower bounds of the other.

Note that the choice of MINVALUE or MAXVALUE only affects whether the
constraint on the previous column is inclusive or exclusive. That's
quite different from what an INCLUSIVE/EXCLUSIVE flag would do.
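
The four simplifications above can be checked with a small brute-force model. This is only a sketch under one assumption, namely that MINVALUE sorts below and MAXVALUE above every ordinary value in a column-by-column comparison; the helper names (key, satisfies_from, satisfies_to) are made up for the example.

```python
MINVALUE = "MINVALUE"
MAXVALUE = "MAXVALUE"


def key(bound):
    # Map each column to (rank, value) so that tuples compare column by
    # column, with MINVALUE below and MAXVALUE above every ordinary value.
    rank = {MINVALUE: -1, MAXVALUE: 1}
    return tuple((rank.get(c, 0), 0 if c in rank else c) for c in bound)


def satisfies_from(row, bound):
    # Lower (FROM) bounds are inclusive.
    return key(row) >= key(bound)


def satisfies_to(row, bound):
    # Upper (TO) bounds are exclusive.
    return key(row) < key(bound)


def check_simplifications(x=0, span=2):
    rows = [(a, b) for a in range(x - span, x + span + 1)
            for b in range(-span, span + 1)]
    return all(
        satisfies_from(r, (x, MINVALUE)) == (r[0] >= x)
        and satisfies_from(r, (x, MAXVALUE)) == (r[0] > x)
        and satisfies_to(r, (x, MINVALUE)) == (r[0] < x)
        and satisfies_to(r, (x, MAXVALUE)) == (r[0] <= x)
        for r in rows
    )
```

For example, with x = 10 the row (10, 99) satisfies TO (10, MAXVALUE) but not FROM (10, MAXVALUE), so two partitions sharing that bound neither overlap nor leave a gap.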

>, but...
>
> ...originally, Amit proposed to attach a postfix INCLUSIVE or
> EXCLUSIVE to each bound specification, and this does feel like a bit
> of a back door to the same place, kinda.  A partition defined to run
> from (10, MAXVALUE) TO (11, MAXVALUE) is a lot like a partition
> defined to run from (10) EXCLUSIVE to (11) EXCLUSIVE.  And if we
> eventually decide to allow that, then what will be the difference
> between a partition which starts at (10, MAXVALUE) EXCLUSIVE and one
> which starts from (10, MAXVALUE) INCLUSIVE?

The INCLUSIVE/EXCLUSIVE flag would apply to the constraint as a whole:
 FROM (x, y) INCLUSIVE => (col1, col2) >= (x, y)
 FROM (x, y) EXCLUSIVE => (col1, col2) > (x, y)

 TO (x, y) INCLUSIVE   => (col1, col2) <= (x, y)
 TO (x, y) EXCLUSIVE   => (col1, col2) < (x, y)

which, when expanded out, actually only affects the constraint on the
final column, and then only in the case where all the other columns
are equal to the partition bound value:
 FROM (x, y) INCLUSIVE => col1 > x OR (col1 = x AND col2 >= y)
 FROM (x, y) EXCLUSIVE => col1 > x OR (col1 = x AND col2 > y)

 TO (x, y) INCLUSIVE   => col1 < x OR (col1 = x AND col2 <= y)
 TO (x, y) EXCLUSIVE   => col1 < x OR (col1 = x AND col2 < y)

So while MINVALUE/MAXVALUE makes a particular column unbounded
below/above, and as a side-effect can influence the inclusivity of the
preceding column, INCLUSIVE/EXCLUSIVE affects the inclusivity of the
final column (something that MINVALUE/MAXVALUE cannot do).
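
The expansion of the row-wise comparisons can be verified exhaustively over a small grid. Python tuples compare lexicographically, which matches the row comparison used here; the function names are illustrative only, and INCLUSIVE/EXCLUSIVE is the hypothetical syntax under discussion, not something the patch adds.

```python
def from_inclusive(col1, col2, x, y):
    return col1 > x or (col1 == x and col2 >= y)


def from_exclusive(col1, col2, x, y):
    return col1 > x or (col1 == x and col2 > y)


def to_inclusive(col1, col2, x, y):
    return col1 < x or (col1 == x and col2 <= y)


def to_exclusive(col1, col2, x, y):
    return col1 < x or (col1 == x and col2 < y)


def check_expansions(limit=3):
    # Compare each expanded form with the row-wise comparison it came from.
    values = range(-limit, limit + 1)
    for col1 in values:
        for col2 in values:
            for x in values:
                for y in values:
                    row, bound = (col1, col2), (x, y)
                    assert from_inclusive(col1, col2, x, y) == (row >= bound)
                    assert from_exclusive(col1, col2, x, y) == (row > bound)
                    assert to_inclusive(col1, col2, x, y) == (row <= bound)
                    assert to_exclusive(col1, col2, x, y) == (row < bound)
    return True
```

The inclusive and exclusive forms differ only for rows where col1 = x and col2 = y, which is the sense in which the flag affects just the final column.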

MINVALUE/MAXVALUE takes precedence, in the sense that if the bound on
any column is MINVALUE/MAXVALUE, that column and any later columns are
unbounded and no longer appear in the partition constraint expression,
and so any INCLUSIVE/EXCLUSIVE flag would have no effect. That seems
pretty intuitive to me -- "unbounded inclusive" is no different from
"unbounded exclusive".

Technically, anything that can be done using INCLUSIVE/EXCLUSIVE can
also be done using MINVALUE/MAXVALUE, by artificially adding
another partitioning column and making it unbounded above/below, but
that would really just be a hack, and it (artificially adding an extra
column) would be unnecessary if we added INCLUSIVE/EXCLUSIVE support
in a later release. Thus, I think the 2 features would complement each
other quite nicely.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Robert Haas

Date:

17 July 2017, 21:34:45

On Sun, Jul 16, 2017 at 6:40 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Technically, anything that can be done using INCLUSIVE/EXCLUSIVE can
> also be done using MINVALUE/MAXVALUE, by artificially adding
> another partitioning column and making it unbounded above/below, but
> that would really just be a hack, and it (artificially adding an extra
> column) would be unnecessary if we added INCLUSIVE/EXCLUSIVE support
> in a later release. Thus, I think the 2 features would complement each
> other quite nicely.

OK, works for me.  I'm not really keen about the MINVALUE/MAXVALUE
syntax -- it's really +/- infinity, not a value at all -- but I
haven't got a better proposal and yours at least has the virtue of
perhaps being familiar to those who know about Oracle.

Do you want to own this open item, then?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

17 July 2017, 22:37:13

On 17 July 2017 at 16:34, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Jul 16, 2017 at 6:40 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> Technically, anything that can be done using INCLUSIVE/EXCLUSIVE can
>> also be done using MINVALUE/MAXVALUE, by artificially adding
>> another partitioning column and making it unbounded above/below, but
>> that would really just be a hack, and it (artificially adding an extra
>> column) would be unnecessary if we added INCLUSIVE/EXCLUSIVE support
>> in a later release. Thus, I think the 2 features would complement each
>> other quite nicely.
>
> OK, works for me.  I'm not really keen about the MINVALUE/MAXVALUE
> syntax -- it's really +/- infinity, not a value at all -- but I
> haven't got a better proposal and yours at least has the virtue of
> perhaps being familiar to those who know about Oracle.
>

Cool. Sounds like we've reached a consensus, albeit with some
reservations around the fact that MINVALUE/MAXVALUE aren't actually
values, despite their names.

+/- infinity *are* values for some datatypes such as timestamps, so it
had to be something different from that, and MINVALUE/MAXVALUE are
quite short and simple, and match the syntax used by 3 other
databases.


> Do you want to own this open item, then?
>

OK.

I need to give the patch another read-through, and then I'll aim to
push it sometime in the next few days.

Regards,
Dean

Re: [HACKERS] Multi column range partition table

From

Dean Rasheed

Date:

21 July 2017, 23:34:57

On 17 July 2017 at 17:37, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 17 July 2017 at 16:34, Robert Haas <robertmhaas@gmail.com> wrote:
>> Do you want to own this open item, then?
>>
> OK.
>
> I need to give the patch another read-through, and then I'll aim to
> push it sometime in the next few days.
>

Committed.

Regards,
Dean
