Re: [HACKERS] [POC] hash partitioning
| От | amul sul | 
|---|---|
| Тема | Re: [HACKERS] [POC] hash partitioning | 
| Дата | |
| Msg-id | CAAJ_b97dGjD3ns2DhVMyLFN2EZaEmb22mq9LbEb_3RbeYvA7NQ@mail.gmail.com обсуждение исходный текст | 
| Ответ на | Re: [HACKERS] [POC] hash partitioning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) | 
| Ответы | Re: [HACKERS] [POC] hash partitioning | 
| Список | pgsql-hackers | 
On Thu, Sep 28, 2017 at 11:24 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/09/27 22:41, Jesper Pedersen wrote: >> On 09/27/2017 03:05 AM, amul sul wrote: >>>>> Attached rebased patch, thanks. >>>>> >>>>> >>>> While reading through the patch I thought it would be better to keep >>>> MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to >>>> highlight that these are "keywords" for hash partition. >>>> >>>> Also updated some of the documentation. >>>> >>>> >>> Thanks a lot for the patch, included in the attached version. >>> >> >> Thank you. >> >> Based on [1] I have moved the patch to "Ready for Committer". > > Thanks a lot Amul for working on this. Like Jesper said, the patch looks > pretty good overall. I was looking at the latest version with intent to > study certain things about hash partitioning the way patch implements it, > during which I noticed some things. > Thanks Amit for looking at the patch. > + The modulus must be a positive integer, and the remainder must a > > must be a > Fixed in the attached version. > + suppose you have a hash-partitioned table with 8 children, each of > which > + has modulus 8, but find it necessary to increase the number of > partitions > + to 16. > Fixed in the attached version. > Might it be a good idea to say 8 "partitions" instead of "children" in the > first sentence? > > + each modulus-8 partition until none remain. While this may still > involve > + a large amount of data movement at each step, it is still better than > + having to create a whole new table and move all the data at once. > + </para> > + > Fixed in the attached version. > I read the paragraph that ends with the above text and started wondering > if the example to redistribute data in hash partitions by detaching and > attaching with new modulus/remainder could be illustrated with an example? > Maybe in the examples section of the ALTER TABLE page? > I think hint in the documentation is more than enough. There is N number of ways of data redistribution, the document is not meant to explain all of those. > + Since hash operator class provide only equality, not ordering, > collation > > Either "Since hash operator classes provide" or "Since hash operator class > provides" > Fixed in the attached version. > Other than the above points, patch looks good. > > > By the way, I noticed a couple of things about hash partition constraints: > > 1. In get_qual_for_hash(), using > get_fn_expr_rettype(&key->partsupfunc[i]), which returns InvalidOid for > the lack of fn_expr being set to non-NULL value, causes funcrettype of the > FuncExpr being generated for hashing partition key columns to be set to > InvalidOid, which I think is wrong. That is, the following if condition > in get_fn_expr_rettype() is always satisfied: > > if (!flinfo || !flinfo->fn_expr) > return InvalidOid; > > I think we could use get_func_rettype(&key->partsupfunc[i].fn_oid) > instead. Attached patch > hash-v21-set-funcexpr-funcrettype-correctly.patch, which applies on top > v21 of your patch. > Thanks for the patch, included in the attached version. > 2. It seems that the reason constraint exclusion doesn't work with hash > partitions as implemented by the patch is that predtest.c: > operator_predicate_proof() returns false even without looking into the > hash partition constraint, which is of the following form: > > satisfies_hash_partition(<mod>, <rem>, <key1-exthash>,..) > > beccause the above constraint expression doesn't translate into a a binary > opclause (an OpExpr), which operator_predicate_proof() knows how to work > with. So, false is returned at the beginning of that function by the > following code: > > if (!is_opclause(predicate)) > return false; > > For example, > > create table p (a int) partition by hash (a); > create table p0 partition of p for values with (modulus 4, remainder 0); > create table p1 partition of p for values with (modulus 4, remainder 1); > \d+ p0 > <...> > Partition constraint: satisfies_hash_partition(4, 0, hashint4extended(a, > '8816678312871386367'::bigint)) > > -- both p0 and p1 scanned > explain select * from p where satisfies_hash_partition(4, 0, > hashint4extended(a, '8816678312871386367'::bigint)); > QUERY PLAN > > ---------------------------------------------------------------------------------------------------- > Append (cost=0.00..96.50 rows=1700 width=4) > -> Seq Scan on p0 (cost=0.00..48.25 rows=850 width=4) > Filter: satisfies_hash_partition(4, 0, hashint4extended(a, > '8816678312871386367'::bigint)) > -> Seq Scan on p1 (cost=0.00..48.25 rows=850 width=4) > Filter: satisfies_hash_partition(4, 0, hashint4extended(a, > '8816678312871386367'::bigint)) > (5 rows) > > -- both p0 and p1 scanned > explain select * from p where satisfies_hash_partition(4, 1, > hashint4extended(a, '8816678312871386367'::bigint)); > QUERY PLAN > > ---------------------------------------------------------------------------------------------------- > Append (cost=0.00..96.50 rows=1700 width=4) > -> Seq Scan on p0 (cost=0.00..48.25 rows=850 width=4) > Filter: satisfies_hash_partition(4, 1, hashint4extended(a, > '8816678312871386367'::bigint)) > -> Seq Scan on p1 (cost=0.00..48.25 rows=850 width=4) > Filter: satisfies_hash_partition(4, 1, hashint4extended(a, > '8816678312871386367'::bigint)) > (5 rows) > > > I looked into how satisfies_hash_partition() works and came up with an > idea that I think will make constraint exclusion work. What if we emitted > the hash partition constraint in the following form instead: > > hash_partition_mod(hash_partition_hash(key1-exthash, key2-exthash), > <mod>) = <rem> > > With that form, constraint exclusion seems to work as illustrated below: > > \d+ p0 > <...> > Partition constraint: > (hash_partition_modulus(hash_partition_hash(hashint4extended(a, > '8816678312871386367'::bigint)), 4) = 0) > > -- note only p0 is scanned > explain select * from p where > hash_partition_modulus(hash_partition_hash(hashint4extended(a, > '8816678312871386367'::bigint)), 4) = 0; > QUERY PLAN > > -------------------------------------------------------------------------------------------------------------------------- > Append (cost=0.00..61.00 rows=13 width=4) > -> Seq Scan on p0 (cost=0.00..61.00 rows=13 width=4) > Filter: > (hash_partition_modulus(hash_partition_hash(hashint4extended(a, > '8816678312871386367'::bigint)), 4) = 0) > (3 rows) > > -- note only p1 is scanned > explain select * from p where > hash_partition_modulus(hash_partition_hash(hashint4extended(a, > '8816678312871386367'::bigint)), 4) = 1; > QUERY PLAN > > -------------------------------------------------------------------------------------------------------------------------- > Append (cost=0.00..61.00 rows=13 width=4) > -> Seq Scan on p1 (cost=0.00..61.00 rows=13 width=4) > Filter: > (hash_partition_modulus(hash_partition_hash(hashint4extended(a, > '8816678312871386367'::bigint)), 4) = 1) > (3 rows) > > I tried to implement that in the attached > hash-v21-hash-part-constraint.patch, which applies on top v21 of your > patch (actually on top of > hash-v21-set-funcexpr-funcrettype-correctly.patch, which I think should be > applied anyway as it fixes a bug of the original patch). > > What do you think? Eventually, the new partition-pruning method [1] will > make using constraint exclusion obsolete, but it might be a good idea to > have it working if we can. > It does not really do the partition pruning via constraint exclusion and I don't think anyone is going to use the remainder in the where condition to fetch data and hash partitioning is not meant for that. But I am sure that we could solve this problem using your and Beena's work toward faster partition pruning[1] and Runtime Partition Pruning[2]. Will think on this changes if it is required for the pruning feature. Regards, Amul 1] https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp 2] https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления: