Обсуждение: SOLVED - RE: Poor performance using CTE
-----Original Message----- From: Andrew Dunstan [mailto:andrew@dunslane.net] Sent: Wednesday, November 14, 2012 11:08 AM To: David Greco Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Poor performance using CTE On 11/14/2012 10:56 AM, David Greco wrote: > You're right. I was translating an oracle query , but looks like PG will allow some syntax that is different. Trying tofind entries in fedexinvoices where smp_pkg.get_invoice_charges(id) returns a record containing charge_name in ('ADDRESSCORRECTION CHARGE','ADDRESS CORRECTION'). Should return the fedexinvoices row and the row from smp_pkg.get_invoice_chargesthat contains the address correction. > > > Something like this, though this is syntactically incorrect as smp_pkg.get_invoice_charges returns a set: > > > select fedexinvoices.*, (smp_pkg.get_invoice_charges(id)).* > from fedexinvoices > WHERE > trim(fedexinvoices.trackno)='799159791643' > and > (smp_pkg.get_invoice_charges(id)).charge_name IN ('ADDRESS CORRECTION > CHARGE','ADDRESS CORRECTION') First, please don't top-post when someone has replied underneath your post. It makes the thread totally unreadable. See <http://idallen.com/topposting.html> You could do something like this: WITH invoices as ( select * from fedexinvoices where trim(fedexinvoices.trackno)='799159791643' ), charges as ( SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) charge_info from fedexinvoices fi2 join invoices i on i.id = f12.id ) select invoices.* from invoices inner join charges on charges.id = invoices.id AND (charges.charge_info).charge_name IN ('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION') ; Or probably something way simpler but I just did this fairly quickly and mechanically cheers andrew Thanks, that did the trick. Though I'm still not clear as to why.
David Greco <David_Greco@harte-hanks.com> writes: > Thanks, that did the trick. Though I'm still not clear as to why. PG treats WITH as an optimization fence --- the WITH query will be executed pretty much as-is. It may be that Oracle flattens the query somehow; though if you're using black-box functions in both cases, it's not obvious where the optimizer could get any purchase that way. regards, tom lane
On 11/15/2012 12:29 AM, Tom Lane wrote: > David Greco <David_Greco@harte-hanks.com> writes: >> Thanks, that did the trick. Though I'm still not clear as to why. > PG treats WITH as an optimization fence --- the WITH query will be > executed pretty much as-is. It may be that Oracle flattens the query > somehow; though if you're using black-box functions in both cases, > it's not obvious where the optimizer could get any purchase that way. > I was looking through the latest spec drafts I have access to and couldn't find any reference to Pg's optimisation-fence-for-CTEs behaviour being required by the standard, though I've repeatedly seen it said that there is such a requirement. Do you know where it's specified? All I can see is that the optimised result must have the same effect as the original. That'd mean that wCTEs and CTE terms that use VOLATILE functions or functions with side-effects couldn't be optimised into other queries. Simple CTEs could be, though, and there are times I've really wished I could use a CTE but I've had to use a set-returning subquery to get reasonable plans. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndQuadrant.com> writes: > I was looking through the latest spec drafts I have access to and > couldn't find any reference to Pg's optimisation-fence-for-CTEs > behaviour being required by the standard, though I've repeatedly seen it > said that there is such a requirement. I don't believe it's required by the standard (it's hard to see how it could be, when query optimization is a topic outside the spec to start with). However, we allow INSERT/UPDATE/DELETE RETURNING inside WITH, and for those I think you really need to treat WITH as an optimization fence. It's a lot more debatable for SELECT; there are some advantages to providing a fence this way but there are definitely downsides too. I could see adjusting that definition in the future, as we get more experience with use of CTEs. regards, tom lane
On 11/14/2012 08:17 PM, Craig Ringer wrote: > On 11/15/2012 12:29 AM, Tom Lane wrote: >> David Greco <David_Greco@harte-hanks.com> writes: >>> Thanks, that did the trick. Though I'm still not clear as to why. >> PG treats WITH as an optimization fence --- the WITH query will be >> executed pretty much as-is. It may be that Oracle flattens the query >> somehow; though if you're using black-box functions in both cases, >> it's not obvious where the optimizer could get any purchase that way. >> > I was looking through the latest spec drafts I have access to and > couldn't find any reference to Pg's optimisation-fence-for-CTEs > behaviour being required by the standard, though I've repeatedly seen it > said that there is such a requirement. > > Do you know where it's specified? > > All I can see is that the optimised result must have the same effect as > the original. That'd mean that wCTEs and CTE terms that use VOLATILE > functions or functions with side-effects couldn't be optimised into > other queries. Simple CTEs could be, though, and there are times I've > really wished I could use a CTE but I've had to use a set-returning > subquery to get reasonable plans. It cuts both ways. I have used CTEs a LOT precisely because this behaviour lets me get better plans. Without that I'll be back to using the "offset 0" hack. cheers andrew
On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote: > It cuts both ways. I have used CTEs a LOT precisely because this behaviour > lets me get better plans. Without that I'll be back to using the "offset 0" > hack. Is the "OFFSET 0" hack really so bad? We've been telling people to do that for years, so it's already something that we've effectively committed to. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 15/11/12 15:03, Peter Geoghegan wrote: > On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote: >> It cuts both ways. I have used CTEs a LOT precisely because this behaviour >> lets me get better plans. Without that I'll be back to using the "offset 0" >> hack. > Is the "OFFSET 0" hack really so bad? We've been telling people to do > that for years, so it's already something that we've effectively > committed to. > How about adding the keywords FENCED and NOT FENCED to the SQL definition of CTE's - with FENCED being the default? Cheers, Gavin
On Wed, Nov 14, 2012 at 8:03 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote: >> It cuts both ways. I have used CTEs a LOT precisely because this behaviour >> lets me get better plans. Without that I'll be back to using the "offset 0" >> hack. > > Is the "OFFSET 0" hack really so bad? We've been telling people to do > that for years, so it's already something that we've effectively > committed to. IMSNHO, 'OFFSET 0' is completely unreadable black magic. I agree with Andrew: CTEs allow for manual composition of queries and can be the best tool when the planner is outsmarting itself. In the old days, we'd extract data to a temp table and join against that: CTE are essentially a formalization of that technique. I like things the way they are; if CTE are hurting your plan, that's an indication you're using them inappropriately. merlin
On Tue, Nov 20, 2012 at 4:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Nov 14, 2012 at 8:03 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >> On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote: >>> It cuts both ways. I have used CTEs a LOT precisely because this behaviour >>> lets me get better plans. Without that I'll be back to using the "offset 0" >>> hack. >> >> Is the "OFFSET 0" hack really so bad? We've been telling people to do >> that for years, so it's already something that we've effectively >> committed to. > > IMSNHO, 'OFFSET 0' is completely unreadable black magic. I agree with > Andrew: CTEs allow for manual composition of queries and can be the > best tool when the planner is outsmarting itself. In the old days, > we'd extract data to a temp table and join against that: CTE are > essentially a formalization of that technique. I like things the way > they are; if CTE are hurting your plan, that's an indication you're > using them inappropriately. I agree, **BUT**, I cannot imagine how pushing constraints to the CTE (under adequate conditions) could be anything but beneficial. It *could* just be a lack of imagination on my part. But if it were not, then it'd be nice for it to be done automatically (since this particular CTE behavior bites enough people already).
My perspective on this is that CTEs *should* be just like creating a temporary table and then joining to it, but without the materialization costs. In that respect, they seem like they should be like nifty VIEWs. If I wanted the behavior of materialization and then join, I'd do that explicitly with temporary tables, but using CTEs as an explicit optimization barrier feels like the explaining away surprising behavior. As can be seen by the current conversation, not everyone is convinced that CTEs ought to be an explicit optimization barrier, and setting that behavior as somehow desirable or explicit (rather than merely an implementation detail) feels shortsighted to me. I would be delighted to find that in some future version of PostgreSQL, but if that is not to be, at the very least, the verbiage surrounding CTEs might want to include (perhaps prominently) something along the lines of "CTEs are currently an optimization barrier, but this is an implementation detail and may change in future versions". Perhaps even including a small blurb about what an optimization barrier even means (my understanding is that it merely forces materialization of that part of the query). That's just my perspective, coming at the use of CTEs not as a PostgreSQL developer, but as somebody who learned about CTEs and started using them - only to discover surprising behavior. On Tue, Nov 20, 2012 at 1:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Nov 14, 2012 at 8:03 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >> On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote: >>> It cuts both ways. I have used CTEs a LOT precisely because this behaviour >>> lets me get better plans. Without that I'll be back to using the "offset 0" >>> hack. >> >> Is the "OFFSET 0" hack really so bad? We've been telling people to do >> that for years, so it's already something that we've effectively >> committed to. > > IMSNHO, 'OFFSET 0' is completely unreadable black magic. I agree with > Andrew: CTEs allow for manual composition of queries and can be the > best tool when the planner is outsmarting itself. In the old days, > we'd extract data to a temp table and join against that: CTE are > essentially a formalization of that technique. I like things the way > they are; if CTE are hurting your plan, that's an indication you're > using them inappropriately. > > merlin > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Jon
On Tue, Nov 20, 2012 at 1:53 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > As can be seen by the current conversation, not everyone is convinced that CTEs ought to be an explicit optimization barrier On Tue, Nov 20, 2012 at 1:26 PM, Claudio Freire <klaussfreire@gmail.com> wrote: > It *could* just be a lack of imagination on my part. But if it were > not, then it'd be nice for it to be done automatically (since this > particular CTE behavior bites enough people already). Sure. I just find it personally hard to find a good demarcation line between A: "queries where pushing quals through are universally beneficial and wanted" and B: "queries where we are inserting an explicit materialization step to avoid planner issues", particularly where there is substantial overlap with between A and C: "queries that are written with a CTE and arguably shouldn't be". Put another way, I find CTE to express: 'this then that' where joins express 'this with that'. So current behavior is not surprising at all. All that said, there could be a narrow class of low hanging cases (such as the OP's) that could be sniped...I'm just skeptical. merlin
On Tue, Nov 20, 2012 at 5:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Nov 20, 2012 at 1:53 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: >> As can be seen by the current conversation, not everyone is convinced > that CTEs ought to be an explicit optimization barrier > > On Tue, Nov 20, 2012 at 1:26 PM, Claudio Freire <klaussfreire@gmail.com> wrote: >> It *could* just be a lack of imagination on my part. But if it were >> not, then it'd be nice for it to be done automatically (since this >> particular CTE behavior bites enough people already). > > Sure. I just find it personally hard to find a good demarcation line > between A: "queries where pushing quals through are universally > beneficial and wanted" and B: "queries where we are inserting an > explicit materialization step to avoid planner issues", particularly > where there is substantial overlap with between A and C: "queries that > are written with a CTE and arguably shouldn't be". > > Put another way, I find CTE to express: 'this then that' where joins > express 'this with that'. So current behavior is not surprising at > all. All that said, there could be a narrow class of low hanging cases > (such as the OP's) that could be sniped...I'm just skeptical. It could work very well towards CTE-including views, where the quals cannot be added in the view but would be present when the view is expanded in final queries.
Jon Nelson <jnelson+pgsql@jamponi.net> writes: > ... Perhaps even including a > small blurb about what an optimization barrier even means (my > understanding is that it merely forces materialization of that part of > the query). FWIW, it has nothing to do with materialization; it means that we don't push conditions down into that subquery, nor pull subexpressions up out of it, nor rearrange join order across the subquery boundary. In short the subquery is planned separately from the outer query. But it could then be run by the executor in the usual tuple-at-a-time fashion, without materializing the whole subquery result. It is true that CTEScan nodes materialize the subquery output (ie copy it into a tuplestore), but that's to support multiple CTEScans reading the same CTE. One of the optimizations we *should* put in place sometime is skipping the tuplestore if there's only one CTEScan on the CTE. regards, tom lane
On 11/21/2012 03:53 AM, Jon Nelson wrote: > My perspective on this is that CTEs *should* be just like creating a > temporary table and then joining to it, but without the > materialization costs. In that respect, they seem like they should be > like nifty VIEWs. If I wanted the behavior of materialization and then > join, I'd do that explicitly with temporary tables, but using CTEs as > an explicit optimization barrier feels like the explaining away > surprising behavior. I agree, especially since that barrier isn't specified as standard, so we're using a standard feature with a subtle quirk as a database-specific optimisation trick. A hint, as it were, like OFFSET 0. *(Dons asbestos underwear an dives for cover)* My big problem with the status quo is that it breaks queries from other databases, like MS SQL server, where CTEs are optimised. I see this periodically on Stack Overflow, with people asking variants of "Why does PostgreSQL take 10,000 times longer to execute this query"? (not a literal quote). I really want to see this formalized and made explicit with `WITH tablename AS MATERIALIZE (SELECT)` or similar. Right now I often can't use CTEs to clean up hard-to-read queries because of the optimisation barrier, so I have to create a temporary view, temporary table, or use nested subqueries in FROM instead. Ugly. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
I'd also add ANALYZED/NOT ANALYZED. This should force it behave like 'create table, analyze, select' with statistics used in second query plan.
P.S. defaults can be configurable.
20 лист. 2012 02:22, "Gavin Flower" <GavinFlower@archidevsys.co.nz> напис.
On 15/11/12 15:03, Peter Geoghegan wrote:On 15 November 2012 01:46, Andrew Dunstan <andrew@dunslane.net> wrote:How about adding the keywords FENCED and NOT FENCED to the SQL definition of CTE's - with FENCED being the default?It cuts both ways. I have used CTEs a LOT precisely because this behaviourIs the "OFFSET 0" hack really so bad? We've been telling people to do
lets me get better plans. Without that I'll be back to using the "offset 0"
hack.
that for years, so it's already something that we've effectively
committed to.
Cheers,
Gavin
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
On Tue, Nov 20, 2012 at 02:24:01PM -0600, Merlin Moncure wrote: > On Tue, Nov 20, 2012 at 1:53 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > > As can be seen by the current conversation, not everyone is convinced > that CTEs ought to be an explicit optimization barrier > > On Tue, Nov 20, 2012 at 1:26 PM, Claudio Freire <klaussfreire@gmail.com> wrote: > > It *could* just be a lack of imagination on my part. But if it were > > not, then it'd be nice for it to be done automatically (since this > > particular CTE behavior bites enough people already). > > Sure. I just find it personally hard to find a good demarcation line > between A: "queries where pushing quals through are universally > beneficial and wanted" and B: "queries where we are inserting an > explicit materialization step to avoid planner issues", particularly > where there is substantial overlap with between A and C: "queries that > are written with a CTE and arguably shouldn't be". > > Put another way, I find CTE to express: 'this then that' where joins > express 'this with that'. So current behavior is not surprising at > all. All that said, there could be a narrow class of low hanging cases > (such as the OP's) that could be sniped...I'm just skeptical. Is thi -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +