Re: partitioning performance tests after recent patches
От | David Rowley |
---|---|
Тема | Re: partitioning performance tests after recent patches |
Дата | |
Msg-id | CAKJS1f_bxoPtyvXiT2rP5HPb1MjSithoSQ4ph2BH-DO7NA+RsA@mail.gmail.com обсуждение исходный текст |
Ответ на | partitioning performance tests after recent patches (Floris Van Nee <florisvannee@Optiver.com>) |
Ответы |
Re: partitioning performance tests after recent patches
(Floris Van Nee <florisvannee@Optiver.com>)
|
Список | pgsql-hackers |
On Mon, 15 Apr 2019 at 07:19, Floris Van Nee <florisvannee@optiver.com> wrote: > 3) What could be causing the big performance difference between case 7 (simple SELECT) and 8 (simple SELECT with ORDERBY <index> LIMIT 1)? For 4096 partitions, TPS of 7) is around 5, while adding the ORDER BY <index> LIMIT 1 makes TPSdrop well below 1. In theory, run-time pruning of the right chunk should take exactly the same amount of time in bothcases, because both are pruning timestamp now() on the same number of partitions. The resulting plans are also identicalwith the exception of the top LIMIT-node (in PG11 they differ slightly as a MergeAppend is chosen for the ORDERBY instead of an Append, in HEAD with ordered append this is not necessary anymore). Am I missing something here? With the information provided, I don't really see any reason why the ORDER BY LIMIT would slow it down if the plan is the same apart from the LIMIT node. Please share the EXPLAIN ANALYZE output of each. > 4) A more general question about run-time pruning in nested loops, like the one for case 14. I believe I read in one ofthe previous threads that run-time pruning only reoccurs if it determines that the value that determines which partitionsmust be excluded has changed in between iterations. How is this defined? Eg. let's say partitions are 1-day wideand the first iteration of the loop filters on the partitioned table for timestamp between 14-04-2019 12:00 and 14-04-201920:00 (dynamically determined). Then the second iteration comes along and now filters on values between 14-04-201912:00 and 14-04-2019 19:00. The partition that should be scanned hasn't changed, because both timestamps fall intothe same partition. Is the full process of run-time pruning applied again, or is there some kind of shortcut that firstchecks if the previous pruning result is still valid even if the value has changed slightly? If not, would this be apossible optimization, as I think it's a case that occurs very often? I don't know the run-time pruning code very well though,so it may just be a crazy idea that can't be practically achieved. Currently, there's no shortcut. It knows which parameters partition pruning depends on and it reprunes whenever the value of ones of these changes. I'm not really sure how rechecking would work exactly. There are cases where it wouldn't be possible, say the condition was: partkey >= $1 and there was no partition for $1 since it was beyond the range of the defined range partitions. How could we tell if we can perform the shortcut if the next param value falls off the lower bound of the defined partitions? The first would include no partitions and the second includes all partitions, but the actual value of $1 belongs to no partition in either case so we can't check to see if it matches the same partition. Perhaps it could work for equality operators when just a single partition is matched in the first place, it might then be possible to do a shortcircuit recheck to see if the same partition matches the next set of values. The problem with that is that run-time pruning code in the executor does not care about which operators are used. It just passes those details off to the pruning code to deal with it. Perhaps something can be decided in the planner in analyze_partkey_exprs() to have it set a "do_recheck" flag to tell the executor to check before pruning again... Or maybe it's okay to just try a recheck when we match to just a single partition and just recheck the new values are allowed in that partition when re-pruning. However, that might be just too overly dumb since for inequality operators the original values may never even have falling inside the partition's bounds in the first place. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления: