Обсуждение: COLLATE: Hash partition vs UPDATE

Поиск

Список

Период

Сортировка

COLLATE: Hash partition vs UPDATE

От

Jesper Pedersen

Дата:

08 апреля 2019 г., 16:33:55

Hi,

The following case

-- test.sql --
CREATE TABLE test (a text PRIMARY KEY, b text) PARTITION BY HASH (a);
CREATE TABLE test_p0 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 0);
CREATE TABLE test_p1 PARTITION OF test FOR VALUES WITH (MODULUS 2, 
REMAINDER 1);
-- CREATE INDEX idx_test_b ON test USING HASH (b);

INSERT INTO test VALUES ('aaaa', 'aaaa');

-- Regression
UPDATE test SET b = 'bbbb' WHERE a = 'aaaa';
-- test.sql --

fails on master, which includes [1], with


psql:test.sql:9: ERROR:  could not determine which collation to use for 
string hashing
HINT:  Use the COLLATE clause to set the collation explicitly.


It passes on 11.x.

I'll add it to the open items list.

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5e1963fb764e9cc092e0f7b58b28985c311431d9

Best regards,
  Jesper

Re: COLLATE: Hash partition vs UPDATE

От

Amit Langote

Дата:

09 апреля 2019 г., 03:18:55

Hi Jesper,

On 2019/04/09 1:33, Jesper Pedersen wrote:
> Hi,
> 
> The following case
> 
> -- test.sql --
> CREATE TABLE test (a text PRIMARY KEY, b text) PARTITION BY HASH (a);
> CREATE TABLE test_p0 PARTITION OF test FOR VALUES WITH (MODULUS 2,
> REMAINDER 0);
> CREATE TABLE test_p1 PARTITION OF test FOR VALUES WITH (MODULUS 2,
> REMAINDER 1);
> -- CREATE INDEX idx_test_b ON test USING HASH (b);
> 
> INSERT INTO test VALUES ('aaaa', 'aaaa');
> 
> -- Regression
> UPDATE test SET b = 'bbbb' WHERE a = 'aaaa';
> -- test.sql --
> 
> fails on master, which includes [1], with
> 
> 
> psql:test.sql:9: ERROR:  could not determine which collation to use for
> string hashing
> HINT:  Use the COLLATE clause to set the collation explicitly.
> 
> 
> It passes on 11.x.

Thanks for the report.

This seems to broken since the following commit (I see you already cc'd
Peter):

commit 5e1963fb764e9cc092e0f7b58b28985c311431d9
Author: Peter Eisentraut <peter@eisentraut.org>
Date:   Fri Mar 22 12:09:32 2019 +0100

    Collations with nondeterministic comparison

As of this commit, hashing functions hashtext() and hashtextextended()
require a valid collation to be passed in.  ISTM,
satisfies_hash_partition() that's called by hash partition constraint
checking should have been changed to use FunctionCall2Coll() interface to
account for the requirements of the above commit.  I see that it did that
for compute_partition_hash_value(), which is used by hash partition tuple
routing.  That also seems to be covered by regression tests, but there are
no tests that cover satisfies_hash_partition().

Attached patch is an attempt to fix this.  I've also added Amul Sul who
can maybe comment on the satisfies_hash_partition() changes.

BTW, it seems we don't need to back-patch this to PG 11 which introduced
hash partitioning, because text hashing functions don't need collation
there, right?

Thanks,
Amit

Вложения

satisfies_hash_partition-collate-1.patch

Re: COLLATE: Hash partition vs UPDATE

От

Jesper Pedersen

Дата:

09 апреля 2019 г., 12:43:57

Hi Amit,

On 4/8/19 11:18 PM, Amit Langote wrote:
> As of this commit, hashing functions hashtext() and hashtextextended()
> require a valid collation to be passed in.  ISTM,
> satisfies_hash_partition() that's called by hash partition constraint
> checking should have been changed to use FunctionCall2Coll() interface to
> account for the requirements of the above commit.  I see that it did that
> for compute_partition_hash_value(), which is used by hash partition tuple
> routing.  That also seems to be covered by regression tests, but there are
> no tests that cover satisfies_hash_partition().
> 
> Attached patch is an attempt to fix this.  I've also added Amul Sul who
> can maybe comment on the satisfies_hash_partition() changes.
> 

Yeah, that works here - apart from an issue with the test case; fixed in 
the attached.

Best regards,
  Jesper

Вложения

satisfies_hash_partition-collate-2.patch

Re: COLLATE: Hash partition vs UPDATE

От

Amit Langote

Дата:

09 апреля 2019 г., 12:58:20

On Tue, Apr 9, 2019 at 9:44 PM Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
>
> Hi Amit,
>
> On 4/8/19 11:18 PM, Amit Langote wrote:
> > As of this commit, hashing functions hashtext() and hashtextextended()
> > require a valid collation to be passed in.  ISTM,
> > satisfies_hash_partition() that's called by hash partition constraint
> > checking should have been changed to use FunctionCall2Coll() interface to
> > account for the requirements of the above commit.  I see that it did that
> > for compute_partition_hash_value(), which is used by hash partition tuple
> > routing.  That also seems to be covered by regression tests, but there are
> > no tests that cover satisfies_hash_partition().
> >
> > Attached patch is an attempt to fix this.  I've also added Amul Sul who
> > can maybe comment on the satisfies_hash_partition() changes.
> >
>
> Yeah, that works here - apart from an issue with the test case; fixed in
> the attached.

Ah, crap.  Last minute changes are bad.

Thanks for fixing.

Thanks,
Amit

Re: COLLATE: Hash partition vs UPDATE

От

Tom Lane

Дата:

14 апреля 2019 г., 20:50:07

Jesper Pedersen <jesper.pedersen@redhat.com> writes:
> Yeah, that works here - apart from an issue with the test case; fixed in
> the attached.

Couple issues spotted in an eyeball review of that:

* There is code that supposes that partsupfunc[] is the last
field of ColumnsHashData, eg

            fcinfo->flinfo->fn_extra =
                MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
                                       offsetof(ColumnsHashData, partsupfunc) +
                                       sizeof(FmgrInfo) * nargs);

I'm a bit surprised that this patch manages to run without crashing,
because this would certainly not allocate space for partcollid[].

I think we would likely be well advised to do

-        FmgrInfo    partsupfunc[PARTITION_MAX_KEYS];
+        FmgrInfo    partsupfunc[FLEXIBLE_ARRAY_MEMBER];

to make it more obvious that that has to be the last field.  Or else
drop the cuteness with variable-size allocations of ColumnsHashData.
FmgrInfo is only 48 bytes, I'm not really sure that it's worth the
risk of bugs to "optimize" this.

* I see collation-less calls of the partsupfunc at both partbounds.c:2931
and partbounds.c:2970, but this patch touches only the first one.  How
can that be right?

            regards, tom lane

Re: COLLATE: Hash partition vs UPDATE

От

Amit Langote

Дата:

15 апреля 2019 г., 06:22:08

Thanks for the review.

On 2019/04/15 5:50, Tom Lane wrote:
> Jesper Pedersen <jesper.pedersen@redhat.com> writes:
>> Yeah, that works here - apart from an issue with the test case; fixed in 
>> the attached.
> 
> Couple issues spotted in an eyeball review of that:
> 
> * There is code that supposes that partsupfunc[] is the last
> field of ColumnsHashData, eg
> 
>             fcinfo->flinfo->fn_extra =
>                 MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
>                                        offsetof(ColumnsHashData, partsupfunc) +
>                                        sizeof(FmgrInfo) * nargs);
> 
> I'm a bit surprised that this patch manages to run without crashing,
> because this would certainly not allocate space for partcollid[].
> 
> I think we would likely be well advised to do
> 
> -        FmgrInfo    partsupfunc[PARTITION_MAX_KEYS];
> +        FmgrInfo    partsupfunc[FLEXIBLE_ARRAY_MEMBER];

I went with this:

-        FmgrInfo    partsupfunc[PARTITION_MAX_KEYS];
         Oid         partcollid[PARTITION_MAX_KEYS];
+        FmgrInfo    partsupfunc[FLEXIBLE_ARRAY_MEMBER];

> to make it more obvious that that has to be the last field.  Or else
> drop the cuteness with variable-size allocations of ColumnsHashData.
> FmgrInfo is only 48 bytes, I'm not really sure that it's worth the
> risk of bugs to "optimize" this.

I wonder if workloads on hash partitioned tables that require calling
satisfies_hash_partition repeatedly may not be as common as thought when
writing this code?  The only case I see where it's being repeatedly called
is bulk inserts into a hash-partitioned table, that too, only if BR
triggers on partitions necessitate rechecking the partition constraint.

> * I see collation-less calls of the partsupfunc at both partbounds.c:2931
> and partbounds.c:2970, but this patch touches only the first one.  How
> can that be right?

Oops, that's wrong.

Attached updated patch.

Thanks,
Amit

Вложения

satisfies_hash_partition-collate-3.patch

Re: COLLATE: Hash partition vs UPDATE

От

Tom Lane

Дата:

15 апреля 2019 г., 20:47:46

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> Attached updated patch.

LGTM, pushed.

            regards, tom lane

Re: COLLATE: Hash partition vs UPDATE

От

Amit Langote

Дата:

16 апреля 2019 г., 00:17:05

On 2019/04/16 5:47, Tom Lane wrote:
> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
>> Attached updated patch.
> 
> LGTM, pushed.

Thank you.

Regards,
Amit

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: COLLATE: Hash partition vs UPDATE

Вложения

Вложения

Вложения