Обсуждение: COLLATE: Hash partition vs UPDATE
Hi, The following case -- test.sql -- CREATE TABLE test (a text PRIMARY KEY, b text) PARTITION BY HASH (a); CREATE TABLE test_p0 PARTITION OF test FOR VALUES WITH (MODULUS 2, REMAINDER 0); CREATE TABLE test_p1 PARTITION OF test FOR VALUES WITH (MODULUS 2, REMAINDER 1); -- CREATE INDEX idx_test_b ON test USING HASH (b); INSERT INTO test VALUES ('aaaa', 'aaaa'); -- Regression UPDATE test SET b = 'bbbb' WHERE a = 'aaaa'; -- test.sql -- fails on master, which includes [1], with psql:test.sql:9: ERROR: could not determine which collation to use for string hashing HINT: Use the COLLATE clause to set the collation explicitly. It passes on 11.x. I'll add it to the open items list. [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5e1963fb764e9cc092e0f7b58b28985c311431d9 Best regards, Jesper
Hi Jesper, On 2019/04/09 1:33, Jesper Pedersen wrote: > Hi, > > The following case > > -- test.sql -- > CREATE TABLE test (a text PRIMARY KEY, b text) PARTITION BY HASH (a); > CREATE TABLE test_p0 PARTITION OF test FOR VALUES WITH (MODULUS 2, > REMAINDER 0); > CREATE TABLE test_p1 PARTITION OF test FOR VALUES WITH (MODULUS 2, > REMAINDER 1); > -- CREATE INDEX idx_test_b ON test USING HASH (b); > > INSERT INTO test VALUES ('aaaa', 'aaaa'); > > -- Regression > UPDATE test SET b = 'bbbb' WHERE a = 'aaaa'; > -- test.sql -- > > fails on master, which includes [1], with > > > psql:test.sql:9: ERROR: could not determine which collation to use for > string hashing > HINT: Use the COLLATE clause to set the collation explicitly. > > > It passes on 11.x. Thanks for the report. This seems to broken since the following commit (I see you already cc'd Peter): commit 5e1963fb764e9cc092e0f7b58b28985c311431d9 Author: Peter Eisentraut <peter@eisentraut.org> Date: Fri Mar 22 12:09:32 2019 +0100 Collations with nondeterministic comparison As of this commit, hashing functions hashtext() and hashtextextended() require a valid collation to be passed in. ISTM, satisfies_hash_partition() that's called by hash partition constraint checking should have been changed to use FunctionCall2Coll() interface to account for the requirements of the above commit. I see that it did that for compute_partition_hash_value(), which is used by hash partition tuple routing. That also seems to be covered by regression tests, but there are no tests that cover satisfies_hash_partition(). Attached patch is an attempt to fix this. I've also added Amul Sul who can maybe comment on the satisfies_hash_partition() changes. BTW, it seems we don't need to back-patch this to PG 11 which introduced hash partitioning, because text hashing functions don't need collation there, right? Thanks, Amit
Вложения
Hi Amit, On 4/8/19 11:18 PM, Amit Langote wrote: > As of this commit, hashing functions hashtext() and hashtextextended() > require a valid collation to be passed in. ISTM, > satisfies_hash_partition() that's called by hash partition constraint > checking should have been changed to use FunctionCall2Coll() interface to > account for the requirements of the above commit. I see that it did that > for compute_partition_hash_value(), which is used by hash partition tuple > routing. That also seems to be covered by regression tests, but there are > no tests that cover satisfies_hash_partition(). > > Attached patch is an attempt to fix this. I've also added Amul Sul who > can maybe comment on the satisfies_hash_partition() changes. > Yeah, that works here - apart from an issue with the test case; fixed in the attached. Best regards, Jesper
Вложения
On Tue, Apr 9, 2019 at 9:44 PM Jesper Pedersen <jesper.pedersen@redhat.com> wrote: > > Hi Amit, > > On 4/8/19 11:18 PM, Amit Langote wrote: > > As of this commit, hashing functions hashtext() and hashtextextended() > > require a valid collation to be passed in. ISTM, > > satisfies_hash_partition() that's called by hash partition constraint > > checking should have been changed to use FunctionCall2Coll() interface to > > account for the requirements of the above commit. I see that it did that > > for compute_partition_hash_value(), which is used by hash partition tuple > > routing. That also seems to be covered by regression tests, but there are > > no tests that cover satisfies_hash_partition(). > > > > Attached patch is an attempt to fix this. I've also added Amul Sul who > > can maybe comment on the satisfies_hash_partition() changes. > > > > Yeah, that works here - apart from an issue with the test case; fixed in > the attached. Ah, crap. Last minute changes are bad. Thanks for fixing. Thanks, Amit
Jesper Pedersen <jesper.pedersen@redhat.com> writes: > Yeah, that works here - apart from an issue with the test case; fixed in > the attached. Couple issues spotted in an eyeball review of that: * There is code that supposes that partsupfunc[] is the last field of ColumnsHashData, eg fcinfo->flinfo->fn_extra = MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt, offsetof(ColumnsHashData, partsupfunc) + sizeof(FmgrInfo) * nargs); I'm a bit surprised that this patch manages to run without crashing, because this would certainly not allocate space for partcollid[]. I think we would likely be well advised to do - FmgrInfo partsupfunc[PARTITION_MAX_KEYS]; + FmgrInfo partsupfunc[FLEXIBLE_ARRAY_MEMBER]; to make it more obvious that that has to be the last field. Or else drop the cuteness with variable-size allocations of ColumnsHashData. FmgrInfo is only 48 bytes, I'm not really sure that it's worth the risk of bugs to "optimize" this. * I see collation-less calls of the partsupfunc at both partbounds.c:2931 and partbounds.c:2970, but this patch touches only the first one. How can that be right? regards, tom lane
Thanks for the review. On 2019/04/15 5:50, Tom Lane wrote: > Jesper Pedersen <jesper.pedersen@redhat.com> writes: >> Yeah, that works here - apart from an issue with the test case; fixed in >> the attached. > > Couple issues spotted in an eyeball review of that: > > * There is code that supposes that partsupfunc[] is the last > field of ColumnsHashData, eg > > fcinfo->flinfo->fn_extra = > MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt, > offsetof(ColumnsHashData, partsupfunc) + > sizeof(FmgrInfo) * nargs); > > I'm a bit surprised that this patch manages to run without crashing, > because this would certainly not allocate space for partcollid[]. > > I think we would likely be well advised to do > > - FmgrInfo partsupfunc[PARTITION_MAX_KEYS]; > + FmgrInfo partsupfunc[FLEXIBLE_ARRAY_MEMBER]; I went with this: - FmgrInfo partsupfunc[PARTITION_MAX_KEYS]; Oid partcollid[PARTITION_MAX_KEYS]; + FmgrInfo partsupfunc[FLEXIBLE_ARRAY_MEMBER]; > to make it more obvious that that has to be the last field. Or else > drop the cuteness with variable-size allocations of ColumnsHashData. > FmgrInfo is only 48 bytes, I'm not really sure that it's worth the > risk of bugs to "optimize" this. I wonder if workloads on hash partitioned tables that require calling satisfies_hash_partition repeatedly may not be as common as thought when writing this code? The only case I see where it's being repeatedly called is bulk inserts into a hash-partitioned table, that too, only if BR triggers on partitions necessitate rechecking the partition constraint. > * I see collation-less calls of the partsupfunc at both partbounds.c:2931 > and partbounds.c:2970, but this patch touches only the first one. How > can that be right? Oops, that's wrong. Attached updated patch. Thanks, Amit
Вложения
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > Attached updated patch. LGTM, pushed. regards, tom lane
On 2019/04/16 5:47, Tom Lane wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> Attached updated patch. > > LGTM, pushed. Thank you. Regards, Amit