Обсуждение: CTE Changes in PostgreSQL 12, can we have a GUC to get old behavior
The CTE change in PostgreSQL 12 broke several of PostGIS regression tests because many of our tests are negative tests that test to confirm we get warnings in certain cases. In the past, these would output 1 notice because the CTE was materialized, now they output 1 for each column. An example is as follows: WITH data AS ( SELECT '#2911' l, ST_Metadata(ST_Rescale( ST_AddBand( ST_MakeEmptyRaster(10, 10, 0, 0, 1, -1, 0, 0, 0), 1, '8BUI', 0, 0 ), 2.0, -2.0 )) m ) SELECT l, (m).* FROM data; In prior versions this raster test would return one notice: NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API Now it returns 10 notices because the call is being done 10 times (1 for each column) NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API NOTICE: Raster has default geotransform. Adjusting metadata for use of GDAL Warp API The regression errors are easy enough to fix with OFFSET or subquery. What I'm more concerned about is that I expect we'll have performance degradation. Historically PostGIS functions haven't been costed right and can't be because they rely on INLINING of sql functions which gets broken when too high of cost is put on functions. We have a ton of functions like these that return composite objects and this above function is particularly expensive so to have it call that 10 times is almost guaranteed to be a performance killer. I know there is a new MATERIALIZED keyword to get the old behavior, but people are not going to be able to change their apps to introduce new keywords, especially ones meant to be deployed by many versions of PostgreSQL. That said IS THERE or can there be a GUC like set cte_materialized = on; to get the old behavior? Thanks, Regina PostGIS PSC member
Hi, On 2019-02-22 15:33:08 -0500, Regina Obe wrote: > The CTE change in PostgreSQL 12 broke several of PostGIS regression tests > because many of our tests are negative tests that test to confirm we get > warnings in certain cases. In the past, these would output 1 notice because > the CTE was materialized, now they output 1 for each column. > > An example is as follows: > > WITH data AS ( SELECT '#2911' l, ST_Metadata(ST_Rescale( ST_AddBand( > ST_MakeEmptyRaster(10, 10, 0, 0, 1, -1, 0, 0, 0), 1, '8BUI', 0, 0 ), > 2.0, -2.0 )) m ) SELECT l, (m).* FROM data; > The regression errors are easy enough to fix with OFFSET or subquery. What > I'm more concerned about is that I expect we'll have performance > degradation. > > Historically PostGIS functions haven't been costed right and can't be > because they rely on INLINING of sql functions which gets broken when too > high of cost is put on functions. We have a ton of functions like these > that return composite objects and this above function is particularly > expensive so to have it call that 10 times is almost guaranteed to be a > performance killer. I think there's a fair argument that we shouldn't inline in a way that increases the number of function calls due to (foo).*. In fact, I'm mildly surprised that we do that? > That said IS THERE or can there be a GUC like > > set cte_materialized = on; > > to get the old behavior? -incredibly many. That'll just make it harder to understand what SQL means. Greetings, Andres Freund
On Fri, Feb 22, 2019 at 3:33 PM Regina Obe <lr@pcorp.us> wrote: > Historically PostGIS functions haven't been costed right and can't be > because they rely on INLINING of sql functions which gets broken when too > high of cost is put on functions. We have a ton of functions like these > that return composite objects and this above function is particularly > expensive so to have it call that 10 times is almost guaranteed to be a > performance killer. This is good evidence that using the cost to decide on inlining is a terrible idea and should be changed. > I know there is a new MATERIALIZED keyword to get the old behavior, but > people are not going to be able to change their apps to introduce new > keywords, especially ones meant to be deployed by many versions of > PostgreSQL. > > That said IS THERE or can there be a GUC like > > set cte_materialized = on; > > to get the old behavior? Behavior changing GUCs *really* suck. If we add such a GUC, it will affect not only PostGIS but everything run on the server -- and we made this change because we believe it's going to improve things overall. I'm really reluctant to believe that it's right to encourage people to go back in the opposite direction, especially because it means there will be no consistency from one PostgreSQL system to the next. I think there are probably other ways of fixing this query that won't have such dramatic effects; it doesn't really seem to need to use WITH, and I bet you could also tweak the WITH query to prevent inlining. I also think Andres's question about why this gets inlined in the first place is a good one; the (m).* seems like it ought to be counted as a multiple reference. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andres Freund <andres@anarazel.de> writes: > On 2019-02-22 15:33:08 -0500, Regina Obe wrote: >> The CTE change in PostgreSQL 12 broke several of PostGIS regression tests >> because many of our tests are negative tests that test to confirm we get >> warnings in certain cases. In the past, these would output 1 notice because >> the CTE was materialized, now they output 1 for each column. >> An example is as follows: >> WITH data AS ( SELECT '#2911' l, ST_Metadata(ST_Rescale( ST_AddBand( >> ST_MakeEmptyRaster(10, 10, 0, 0, 1, -1, 0, 0, 0), 1, '8BUI', 0, 0 ), >> 2.0, -2.0 )) m ) SELECT l, (m).* FROM data; > I think there's a fair argument that we shouldn't inline in a way that > increases the number of function calls due to (foo).*. In fact, I'm > mildly surprised that we do that? Well, is (foo).* all that much different from multiple written-out calls to the function? But yeah, I'd rather try to address this complaint by tweaking the inlining rules. It would certainly help if the function had a realistic (high) cost estimate, because if it pretends to be cheap, we really *should* be willing to inline it, one would think. >> That said IS THERE or can there be a GUC like >> set cte_materialized = on; >> to get the old behavior? > -incredibly many. That'll just make it harder to understand what SQL > means. Agreed. Now that we bit that bullet, I don't want to make things even messier with a GUC. The thing to do if you want to force backwards-compatible behavior without using new-in-v12 syntax is to insert good ol' OFFSET 0 in the CTE query. regards, tom lane
> I think there are probably other ways of fixing this query that won't have > such dramatic effects; it doesn't really seem to need to use WITH, and I bet > you could also tweak the WITH query to prevent inlining. Yes I know I can change THIS QUERY. I've changed other ones to work around this. Normally I just use a LATERAL for this. My point is lots of people use CTEs intentionally for this kind of thing because they know they are materialized. It's going to make a lot of people hesitant to upgrade if they think they need to revisit every CTE (that they intentionallywrote cause they thought it would be materialized) to throw in a MATERIALIZED keyword. > I also think > Andres's question about why this gets inlined in the first place is a good one; > the (m).* seems like it ought to be counted as a multiple reference. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > Company Well if we can at least prevent the multiple reference thing from inlining that might be good enough to solve most performanceregression issues that arise. Thanks, Regina
On Fri, Feb 22, 2019 at 4:27 PM Regina Obe <lr@pcorp.us> wrote: > It's going to make a lot of people hesitant to upgrade if they think they need to revisit every CTE (that they intentionallywrote cause they thought it would be materialized) to throw in a MATERIALIZED keyword. You might be right, but I hope you're wrong, because if you're right, then it means that the recent change was ill-considered and ought to be reverted. And I think there are a lot of people who are REALLY looking forward to that behavior change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2019-02-22 16:27:28 -0500, Regina Obe wrote: > > I think there are probably other ways of fixing this query that won't have > > such dramatic effects; it doesn't really seem to need to use WITH, and I bet > > you could also tweak the WITH query to prevent inlining. > > Yes I know I can change THIS QUERY. I've changed other ones to work around this. > Normally I just use a LATERAL for this. > > My point is lots of people use CTEs intentionally for this kind of thing because they know they are materialized. > > It's going to make a lot of people hesitant to upgrade if they think they need to revisit every CTE (that they intentionallywrote cause they thought it would be materialized) to throw in a MATERIALIZED keyword. This was extensively discussed, in several threads about inlining CTEs. But realistically, it doesn't actually solve the problem to offer a GUC, because we'd not be able to remove it anytime soon. I see benefit in discussing how we can address regressions like your example query here, but I don't feel there's any benefit in re-opening the whole discussion about GUCs, defaults, and whatnot. Greetings, Andres Freund