Re: On Scalability
От | Greg Stark |
---|---|
Тема | Re: On Scalability |
Дата | |
Msg-id | AANLkTinYE9NXQE+1DVa6-DOAZbNXFCdvAZFjRXcP4xVn@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: On Scalability ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Список | pgsql-hackers |
Firstly I want to say I think this discussion is over-looking some benefits of the current system in other use cases. I don't think we should get rid of the current system even once we have "proper" partitioning. It solves use cases such as data warehouse queries that need to do a full table scan of some subset of the data which happens to be located in a single sub-table quite well. In that case being able to do a sequential scan instead of an index range scan is a big benefit and the overhead of the analysis is irrelevant for a data warehouse query. And the constraint may or may not have anything to do with the partitioning key. You cold have constraints like "customer_id in (...)" for last month's financial records so lookups for new customers don't need to check all the historical tables from before they became customers. In fact what I'm interested in doing is extending the support to use stats on children marked read-only. If we have a histogram for a table which has been marked read-only since the table was analyzed then we could trust the upper and lower bounds or the most-frequent-list to exclude partitions. That would really help for things like date-range lookups on tables where the partition key is "financial quarter" or "invoice_id" or some other nearly perfectly correlated column. None of this replaces having a good partitioning story for OLTP queries and management needs. But it extends the usefulness of that setup to data warehouse queries on other related columns that haven't been explicitly declared as the partitioning key. On Thu, Oct 7, 2010 at 8:35 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Vincenzo Romano <vincenzo.romano@notorand.it> wrote: >> 2010/10/7 Stephen Frost <sfrost@snowman.net>: > >>> Yes, that would be the problem. Proving something based on >>> expressions is alot more time consuming and complicated than >>> being explicitly told what goes where. >> >> Consuming computing resources at DDL-time should be OK if that >> will lead to big savings at DML-time (run-time), my opinion. It'd >> be just like compile time optimizations. > > I think something you haven't entirely grasped is how pluggable > PostgreSQL is -- you can not only define your own functions in a > wide variety of languages (including C), but your own data types, > operators, casts, index strategies, etc. I suspect it's likely that a partitioning system would only work with btree opclasses anyways. It might be interesting to think about what it would take to make the setups we've talked about in the past work with arbitrary operator classes as long as those operator classes support some concept of "mutually exclusive". But nothing we've talked about so far would be that flexible. Pre-analyzing the check constraints to construct a partitioning data structure might even be a plausible way to move forward -- I don't see any obvious show-stoppers. The system could look for a set of btree opclass based conditions that guarantee all the partitions are mutually exclusive ranges. My instincts tell me it would be less useful though because there's less the system would be able to do with that structure to help the user. That is, if it *can't* prove the constraints are mutually exclusive then the user is left with a bunch of check constraints and no useful feedback about what they've done wrong. And if it can prove it the user is happy but the next time he has to add a partition he has to look at the existing partitions and carefully construct his check constraint instead of having the system help him out by supplying one side of the bounds and providing a convenient syntax. It would also be hard to specify how to automatically add partitions which I expect is a feature people will want eventually. There are some plus sides as well -- allowing some optimizations for check constraints without requiring the user to promise to always use that as their partitioning key in the future. But I think on the whole it would be a disadvantage. -- greg
В списке pgsql-hackers по дате отправления: