Обсуждение: Partitioning and constraint exclusion
Howdy, I spent a majority of today playing around with pg_partman (awesome tool btw!). I am mainly using the time-static method with an interval of one month. I wanted to see what performance improvements I could get with some common queries that are used by our analytics team. A lot of these queries summarize data by day or by month. Our largest database gets 10+ million rows a day to several different tables (each). I played around with a subset of the data spread across about 6 months (about 10 million rows total). I noticed that I could get very nice partition elimination using constant values in the where clause. Ex: select * from <table> where <constraint_col> between '2015-01-01' and '2015-02-15' However, I could not get any partition elimination for queries that did not have constant values in the where clause. Ex: select * from <table> where <constraint_col> >= (select max(date) from <other_table>) Unfortunately all of our queries on the analytics team need to be dynamic like this and summarize data based around certain recorded events and dates from other tables. I saw the note in the docs about not being able to use current_timestamp in the where clause but I really need to be able to use a sub select or CTE in the where clause for the needed dates. I tried about 10 different ways (on both 9.1 and 9.4) to dynamically get the data (sub selects, cte, joins) for my constraint column but all of them resulted in a full scan of all partitions. I am kind of bummed out by this as dropping in partitioning in this method will just hurt performance and not improve it. The only good thing I see is the ability to delete (drop) older data from the table, but this is not a functionality we need right now. I am going to try a few other ways tomorrow, I am hoping I am doing something wrong, or is this just typical? Thanks, Sam
Samuel Smith wrote > I noticed that I could get very nice partition elimination using > constant values in the where clause. > > Ex: > select * from > where > <constraint_col> > between '2015-01-01' and > '2015-02-15' > > However, I could not get any partition elimination for queries that did > not have constant values in the where clause. > > Ex: > select * from > where > <constraint_col> > >= (select max(date) from > <other_table> > ) > > Unfortunately all of our queries on the analytics team need to be > dynamic like this and summarize data based around certain recorded > events and dates from other tables. I saw the note in the docs about not > being able to use current_timestamp in the where clause but I really > need to be able to use a sub select or CTE in the where clause for the > needed dates. > > I tried about 10 different ways (on both 9.1 and 9.4) to dynamically get > the data (sub selects, cte, joins) for my constraint column but all of > them resulted in a full scan of all partitions. > > I am going to try a few other ways tomorrow, I am hoping I am doing > something wrong, or is this just typical? In short - since the planner determines exclusion constraints and the executor, which strictly follows the planner in the query execution process, would be the one to determine what the value of your date is - there is no way for a single query to provide data that would then be used to determine exclusion constraints. Now, that said, I don't believe you should be actual full table scans during processing if you have proper indexes setup. An index scan should be usable and quickly determine which tables lack data to contribute to the query results. In terms of separating out the date query and partition query: PREPARE/EXECUTE in pure SQL (can, must?, be direct) EXECUTE/USING in pl/pgsql (via a function) You may have other reasonable options on the client side... You should consider providing EXPLAIN ANALYZE results and maybe a test case so others can give pointers. David J. -- View this message in context: http://postgresql.nabble.com/Partitioning-and-constraint-exclusion-tp5840353p5840356.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Samuel, * Samuel Smith (pgsql@net153.net) wrote: > I noticed that I could get very nice partition elimination using > constant values in the where clause. > > Ex: > select * from <table> where <constraint_col> between '2015-01-01' > and '2015-02-15' > > However, I could not get any partition elimination for queries that > did not have constant values in the where clause. > > Ex: > select * from <table> where <constraint_col> >= (select max(date) > from <other_table>) That's correct. > Unfortunately all of our queries on the analytics team need to be > dynamic like this and summarize data based around certain recorded > events and dates from other tables. I saw the note in the docs about > not being able to use current_timestamp in the where clause but I > really need to be able to use a sub select or CTE in the where > clause for the needed dates. Not sure if this will help, but the planner is smart enough to implement one-time filters for certain cases. Instead of using inheiritance-based partitioning, you can use a view like so: CREATE VIEW v AS SELECT * FROM table1 WHERE column1 = 5 UNION ALL SELECT * FROM table2 WHERE column1 = 6 ; Then for cases where we can prove that no results will be returned from the individual union-all branch, we'll skip it: SELECT * FROM v WHERE column1 = (select max(column1) from table3); Unfortunately, we don't appear to support that for an inequality as you show above. I'm not sure offhand why not but it didn't work in my testing. Another approach to dealing with this is to use plpgsql functions and 'return execute' which essentially compute the constant and then build a dyanmic SQL query using the constant and return the results. It's a bit awkward compared to just writing the query, but it does work. Thanks! Stephen
Вложения
On 03/04/2015 12:11 AM, David G Johnston wrote: > > In short - since the planner determines exclusion constraints and the > executor, which strictly follows the planner in the query execution process, > would be the one to determine what the value of your date is - there is no > way for a single query to provide data that would then be used to determine > exclusion constraints. > > Now, that said, I don't believe you should be actual full table scans during > processing if you have proper indexes setup. An index scan should be usable > and quickly determine which tables lack data to contribute to the query > results. > > In terms of separating out the date query and partition query: > > PREPARE/EXECUTE in pure SQL (can, must?, be direct) > EXECUTE/USING in pl/pgsql (via a function) > > You may have other reasonable options on the client side... > > You should consider providing EXPLAIN ANALYZE results and maybe a test case > so others can give pointers. > > David J. > Just to correct, the constraint column is also an indexed column. What I meant by "scanning all partitions" was that the index of all the partitions was being checked. Not the biggest issue, but still unwanted. I spent most of yesterday comparing how DB2 handles table partitions and running similar explain queries. It indeed can do partition elimination when the values of the constraint column are coming from a sub query or other none constant value. Does anyone know if there is a wishlist item for improving this in postgresql or is this as good as it gets for now? I can work around the issues, but it just kind of caught me off guard since there are so many nice benchmarks on the web with partitioning in postgresql (and now that I go back an look at them, they all have constants in the where clause). --Sam
Sam, * Samuel Smith (pgsql@net153.net) wrote: > Does anyone know if there is a wishlist item for improving this in > postgresql or is this as good as it gets for now? It's absolutely on the todo list for PG to support declarative partitioning and handle these cases better. There has been a good sized thread very recently discussing how to make that happen over on the hackers mailing list recently and while it almost certainly won't be in the next release (expected to be 9.5, released around September or October), it may very well be in the release the following year. Thanks! Stephen