Re: [HACKERS] Multi column range partition table

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: [HACKERS] Multi column range partition table
Дата
Msg-id 132b3537-6cd2-2675-1ba3-234035079cd0@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] Multi column range partition table  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы Re: [HACKERS] Multi column range partition table  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] Multi column range partition table  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
On 2017/06/23 17:00, Dean Rasheed wrote:
> On 23 June 2017 at 08:01, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> The way we have designed our syntax, we don't have a way to tell that
>> p3 comes after p2 and they have no gap between those. But I don't
>> think that's your question. What you are struggling with is a way to
>> specify a lower bound (10, +infinity) so that anything with i1 > 10
>> would go to partition 3.
> 
> I think actually there is a fundamental problem here, which arises
> because UNBOUNDED has 2 different meanings depending on context, and
> thus it is not possible in general to specify the start of one range
> to be equal to the end of the previous range, as is necessary to get
> contiguous non-overlapping ranges.

Okay, I thought about this a bit more and I think I realize that this
arbitrary-sounding restriction of allowing only -infinity in FROM and
+infinity in TO limits the usefulness of the feature to specify infinite
bounds at all.

> Note that this isn't just a problem for floating point datatypes
> either, it also applies to other types such as strings. For example,
> given a partition over (text, int) types defined with the following
> values:
> 
>   FROM ('a', UNBOUNDED) TO ('b', UNBOUNDED)
> 
> which is equivalent to
> 
>   FROM ('a', -INFINITY) TO ('b', +INFINITY)
> 
> where should the next range start?
> 
> Even if you were to find a way to specify "the next string after 'b'",
> it wouldn't exactly be pretty. The problem is that the above partition
> corresponds to "all the strings starting with 'a', plus the string
> 'b', which is pretty ugly. A neater way to define the pair of ranges
> in this case would be:
> 
>   FROM ('a', -INFINITY) TO ('b', -INFINITY)
>   FROM ('b', -INFINITY) TO ('c', -INFINITY)
> 
> since then all strings starting with 'a' would fall into the first
> partition and all the strings starting with 'b' would fall into the
> second one.

I agree that a valid use case like the one above is awkward to express
currently.

> Currently, when there are 2 partition columns, the partition
> constraint is defined as
> 
>   (a is not null) and (b is not null)
>   and
>   (a > al or (a = al and b >= bl))
>   and
>   (a < au or (a = au and b < bu))
> 
> if the upper bound bu were allowed to be -INFINITY (something that
> should probably be forbidden unless the previous column's upper bound
> were finite), then this would simplify to
> 
>   (a is not null) and (b is not null)
>   and
>   (a > al or (a = al and b >= bl))
>   and
>   (a < au)
> 
> and in the example above, where al is -INFINITY, it would further simplify to
> 
>   (a is not null) and (b is not null)
>   and
>   (a >= al)
>   and
>   (a < au)
> 
> There would also be a similar simplification possible if the lower
> bound of a partition column were allowed to be +INFINITY.

Yep.

> So, I think that having UNBOUNDED represent both -INFINITY and
> +INFINITY depending on context is a design flaw, and that we need to
> allow both -INFINITY and +INFINITY as upper and lower bounds (provided
> they are preceded by a column with a finite bound). I think that, in
> general, that's the only way to allow contiguous non-overlapping
> partitions to be defined on multiple columns.

Alright, I spent some time implementing a patch to allow specifying
-infinity and +infinity in arbitrary ways.  Of course, it prevents
nonsensical inputs with appropriate error messages.

When implementing the same, I initially thought that the only grammar
modification required is to allow specifying a sign before the unbounded
keyword, but thought it sounded strange to call the actual bound values
-unbounded and +unbounded.  While the keyword "unbounded" describes the
property of being unbounded, actual values are really -infinity and
+infinity.  So, I decided to instead modify the grammar to accept
-infinity and +infinity in the FROM and TO lists.  The sign is optional
and in its absence, infinity in FROM means -infinity and vice versa.  This
decision may be seen as controversial, now that we are actually in beta,
if we decide to go with this patch at all.

Some adjustments were required in the logic in partition.c that depended
on the old assumption that all infinite values in the lower bound meant
-infinity and vice versa.  That includes get_qual_for_range() being able
to simplify the partition constraint as Dean mentioned in his email.

When testing the patch, I realized that the current code in
check_new_partition_bound() that checks for range partition overlap had a
latent bug that resulted in false positives for the new cases that the new
less restrictive syntax allowed.  I spent some time simplifying that code
while also fixing the aforementioned bug.  It's implemented in the
attached patch 0001.

0002 is the patch that implements the new syntax.

It's possible that this won't be considered a PG 10 open item but a new
feature and so PG 11 material, as Ashutosh also wondered.

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: [HACKERS] Fix a typo in aclchk.c
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server