Re: Speeding up INSERTs and UPDATEs to partitioned tables
От | Amit Langote |
---|---|
Тема | Re: Speeding up INSERTs and UPDATEs to partitioned tables |
Дата | |
Msg-id | 64b668ca-8b35-2355-f937-ac058ef85690@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: Speeding up INSERTs and UPDATEs to partitioned tables (David Rowley <david.rowley@2ndquadrant.com>) |
Ответы |
Re: Speeding up INSERTs and UPDATEs to partitioned tables
(David Rowley <david.rowley@2ndquadrant.com>)
|
Список | pgsql-hackers |
Hi David, Thanks for taking a look. On 2018/07/15 17:34, David Rowley wrote: > I've looked over the code and the ExecUseUpdateResultRelForRouting() > function is broken. Your while loop only skips partitions for the > current partitioned table, it does not skip ModifyTable subnodes that > belong to other partitioned tables. > > You can use the following. The code does not find the t1_a2 subnode. > > create table t1 (a int, b int) partition by list(a); > create table t1_a1 partition of t1 for values in(1) partition by list(b); > create table t1_a2 partition of t1 for values in(2); > create table t1_a1_b1 partition of t1_a1 for values in(1); > create table t1_a1_b2 partition of t1_a1 for values in(2); > insert into t1 values(2,2); > > update t1 set a = a; Hmm, it indeed is broken. > I think there might not be enough information to make this work > correctly, as if you change the loop to skip subnodes, then it won't > work in cases where the partition[0] was pruned. > > I've another patch sitting here, partly done, that changes > pg_class.relispartition into pg_class.relpartitionparent. If we had > that then we could code your loop to work correctly.> Alternatively, > I guess we could just ignore the UPDATE's ResultRelInfos and just > build new ones. Unsure if there's actually a reason we need to reuse > the existing ones, is there? We try to use the existing ones because we thought back when the patch was written (not by me though) that redoing all the work that InitResultRelInfo does for each partition, for which we could have instead used an existing one, would cumulatively end up being more expensive than figuring out which ones we could reuse by a linear scan of partition and result rels arrays in parallel. I don't remember seeing a benchmark to demonstrate the benefit of doing this though. Maybe it was posted, but I don't remember having looked at it closely. > I think you'd need to know the owning partition and skip subnodes that > don't belong to pd->reldesc. Alternatively, a hashtable could be built > with all the oids belonging to pd->reldesc, then we could loop over > the update_rris finding subnodes that can be found in the hashtable. > Likely this will be much slower than the sort of merge lookup that the > previous code did. I think one option is to simply give up on the idea of matching *all* UPDATE result rels that belong to a given partitioned table (pd->reldesc) in one call of ExecUseUpdateResultRelForRouting. Instead, pass the index of the partition (in pd->partdesc->oids) to find the ResultRelInfo for, loop over all UPDATE result rels looking for one, and return immediately on finding one after having stored its pointer in proute->partitions. In the worst case, we'll end up scanning UPDATE result rels array for every partition that gets touched, but maybe such an UPDATE query is less common or even if such a query occurs, tuple routing might be the last of its bottlenecks. I have implemented that approach in the updated patch. That means I also needed to change things so that ExecUseUpdateResultRelsForRouting is now only called by ExecFindPartition, because with the new arrangement, it's useless to call it from ExecSetupPartitionTupleRouting. Moreover, an UPDATE may not use tuple routing at all, even if the fact that partition key is being updated results in calling ExecSetupPartitionTupleRouting. > Another thing that I don't like is the PARTITION_ROUTING_MAXSIZE code. > The code seems to assume that there can only be at the most 65536 > partitions, but I don't think there's any code which restricts us to > that. There is code in the planner that will bork when trying to > create a RangeTblEntry up that high, but as far as I know that won't > be noticed on the INSERT path. I don't think this code has any > business knowing what the special varnos are set to either. It would > be better to just remove the limit and suffer the small wasted array > space. I understand you've probably coded it like this due to the > similar code that was in my patch, but with mine I knew the total > number of partitions. Your patch does not. OK, I changed it to UINT_MAX. > Other thoughts on the patch: > > I wonder if it's worth having syscache keep a count on the number of > sub-partitioned tables a partition has. If there are none in the root > partition then the partition_dispatch_info can be initialized with > just 1 element to store the root details. Although, maybe it's not > worth it to reduce the array size by 7 elements. Hmm yes. Allocating space for 8 pointers when we really need 1 is not too bad, if the alternative is to modify partcache.c. > Also, I'm a bit confused why you change the comments in > execPartition.h for PartitionTupleRouting to be inline again. I > brought those out of line as I thought the complexity of the code > warranted that. You're inlining them again goes against what all the > other structs do in that file. It was out-of-line to begin with but it started to become distracting when updating the comments. But I agree about being consistent and hence I have moved them back to where they were. I have significantly rewritten those comments though to be clearer. > Apart from that, I think the idea is promising. We'll just need to > find a way to make ExecUseUpdateResultRelForRouting work correctly. Let me know what you think of the code in the updated patch. Thanks, Amit
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: Kyotaro HORIGUCHIДата:
Сообщение: Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe perprocess