Обсуждение: Issue for partitioning with extra check constriants
Simon, Greg, etc., Just barked my nose against a major performance issue with CE & partitioning, and was wondering if anyone had poked at it. The issue is this: when a partitioned table is evaluated by the planner for constraint exclusion, it evaluates ALL check constraints on each partition, regardless of whether or not they include a referenced column in the query (and whether or not they relate to partitioning). If some of those check constraints are expensive (like GIS functions) then this can add considerably (on the order of 2ms per partition) to planning time. If this is news to anyone, I have a nice test case. So ... how plausible is it to fix the planner so that it only evaluates check constraints on a partition if there is a match of referenced columns? Are we talking "moderate", "hard" or "nearly impossible"? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > The issue is this: when a partitioned table is evaluated by the planner > for constraint exclusion, it evaluates ALL check constraints on each > partition, regardless of whether or not they include a referenced column > in the query (and whether or not they relate to partitioning). [ shrug ... ] We do not promise that the current partitioning scheme scales to the number of partitions where this is likely to be an interesting concern. *After* we have a real partitioning scheme, it might be worth worrying about this sort of problem, if it's still a problem then. > Are we talking "moderate", "hard" or "nearly impossible"? We're talking "wasted effort on a dead-end situation". The time that would go into this would be much better spent on real partitioning. regards, tom lane
> [ shrug ... ] We do not promise that the current partitioning scheme > scales to the number of partitions where this is likely to be an > interesting concern. Actually, you can demonstrate pretty significant response time delays on only 50 partitions. > We're talking "wasted effort on a dead-end situation". The time that > would go into this would be much better spent on real partitioning. That only applies if someone is working on "real partitioning". Is anyone? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> [ shrug ... ] We do not promise that the current partitioning scheme >> scales to the number of partitions where this is likely to be an >> interesting concern. > Actually, you can demonstrate pretty significant response time delays on > only 50 partitions. And your point is? The design center for the current setup is maybe 5 or 10 partitions. We didn't intend it to be used for more partitions than you might have spindles to spread the data across. >> We're talking "wasted effort on a dead-end situation". The time that >> would go into this would be much better spent on real partitioning. > That only applies if someone is working on "real partitioning". Is anyone? There is discussion going on, and even if there weren't, the argument still applies. Time spent on this band-aid would be time taken away from a real solution. In case you haven't noticed, we have very finite amounts of manpower that's competent to do planner surgery. regards, tom lane
> And your point is? The design center for the current setup is maybe 5 > or 10 partitions. We didn't intend it to be used for more partitions > than you might have spindles to spread the data across. Where did that come from? It certainly wasn't anywhere when the feature was introduced. Simon intended for this version of partitioning to scale to 100-200 partitions (and it does, provided that you dump all other table constraints), and partitioning has nothing to do with spindles. I think you're getting it mixed up with tablespaces. The main reason for partitioning is ease of maintenance (VACUUM, dropping partitions, etc.) not any kind of I/O optimization. I'd like to add the following statement to our docs on partitioning, in section 5.9.4: ===== Constraint exclusion is tested for every CHECK constraint on the partitions, even CHECK constraints which have nothing to do with the partitioning scheme. This can add siginficant extra planner time, especially if your partitions have CHECK constraints which are costly to evaluate. For performance, it can be a good idea to eliminate all extra CHECK constraints on partitions or to re-implement them as triggers. ===== >In case you haven't noticed, we have very finite > amounts of manpower that's competent to do planner surgery. Point. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Mon, 2010-10-04 at 11:34 -0700, Josh Berkus wrote: > > And your point is? The design center for the current setup is maybe 5 > > or 10 partitions. We didn't intend it to be used for more partitions > > than you might have spindles to spread the data across. > > Where did that come from? Yeah that is a bit odd. I don't recall any discussion in regards to such a weird limitation. > It certainly wasn't anywhere when the feature > was introduced. Simon intended for this version of partitioning to > scale to 100-200 partitions (and it does, provided that you dump all > other table constraints), and partitioning has nothing to do with > spindles. I think you're getting it mixed up with tablespaces. Great! that would be an excellent addition. > > The main reason for partitioning is ease of maintenance (VACUUM, > dropping partitions, etc.) not any kind of I/O optimization. Well that is certainly "a" main reason but it is not "the" main reason. We have lots of customers using it to manage very large amounts of data using the constraint exclusion features (and gaining from the smaller index sizes). Jd -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Josh Berkus <josh@agliodbs.com> writes: >> And your point is? The design center for the current setup is maybe 5 >> or 10 partitions. We didn't intend it to be used for more partitions >> than you might have spindles to spread the data across. > Where did that come from? It certainly wasn't anywhere when the feature > was introduced. Simon intended for this version of partitioning to > scale to 100-200 partitions (and it does, provided that you dump all > other table constraints), and partitioning has nothing to do with > spindles. I think you're getting it mixed up with tablespaces. [ shrug... ] If Simon thought that, he obviously hadn't done any careful study of the planner's performance. You can maybe get that far as long as the partitions have just very simple constraints, but anything nontrivial won't scale. As you found out. regards, tom lane
On Mon, 2010-10-04 at 11:34 -0700, Josh Berkus wrote: > > And your point is? The design center for the current setup is maybe 5 > > or 10 partitions. We didn't intend it to be used for more partitions > > than you might have spindles to spread the data across. > > Where did that come from? Yeah that is a bit odd. I don't recall any discussion in regards to such a weird limitation. > It certainly wasn't anywhere when the feature > was introduced. Simon intended for this version of partitioning to > scale to 100-200 partitions (and it does, provided that you dump all > other table constraints), and partitioning has nothing to do with > spindles. I think you're getting it mixed up with tablespaces. Great! that would be an excellent addition. > > The main reason for partitioning is ease of maintenance (VACUUM, > dropping partitions, etc.) not any kind of I/O optimization. Well that is certainly "a" main reason but it is not "the" main reason. We have lots of customers using it to manage very large amounts of data using the constraint exclusion features (and gaining from the smaller index sizes). Jd -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt