Обсуждение: query - laziness of lateral join with function
Hi I'm using cross join lateral with a non-trivial function in an attempt to limit calculation of that function, and am wondering about some aspects of how lateral is currently implemented. NB these queries are generated by a certain ORM, and are usually embedded in much more complex queries... Case one: counting select count(alpha.id) from alpha cross join lateral some_function(alpha.id) as some_val where alpha.test Here the function is strict, and moreover its argument will never be null - hence there should always be a non-null value returned. I would expect that since the function doesn't impact on the number of rows (always one value returned for each row in alpha), then I'd hope the function is never called. EXPLAIN shows it being called for each row in the main table. Case two: pagination select alpha.*, some_val from alpha cross join lateral some_function(alpha.id) as some_val where alpha.test order by alpha.name asc limit 100 offset 100 Same setup as above, and I'd expect that the ordering and selection of rows can be done first and the function only called on the rows that get selected. Again, EXPLAIN shows otherwise. So: am I expecting too much for LATERAL, or have I missed a trick somewhere? Many thanks in advance! Paul -- View this message in context: http://postgresql.nabble.com/query-laziness-of-lateral-join-with-function-tp5837706.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
paulcc <paulcc.two@gmail.com> writes: > select count(alpha.id) > from alpha > cross join lateral some_function(alpha.id) as some_val > where alpha.test > Here the function is strict, and moreover its argument will never > be null - hence there should always be a non-null value returned. > I would expect that since the function doesn't impact on the > number of rows (always one value returned for each row in alpha), > then I'd hope the function is never called. EXPLAIN shows it being > called for each row in the main table. You're out of luck on that one at the moment, although testing it on HEAD suggests that commit 55d5b3c08279b487cfa44d4b6e6eea67a0af89e4 might have fixed it for you in future releases. > select alpha.*, some_val > from alpha > cross join lateral some_function(alpha.id) as some_val > where alpha.test > order by alpha.name asc > limit 100 offset 100 > Same setup as above, and I'd expect that the ordering and > selection of rows can be done first and the function only > called on the rows that get selected. The planner might produce such a result if there's an opportunity to perform the sorting via an index on "alpha" (ie, the ORDER BY matches some index). If it has to do an explicit sort it's gonna do the join first. (If you have such an index, and it's not going for the plan you want, you might need to crank up the COST property of some_function to persuade the planner that it should try to minimize the number of calls even if that means a slower scan choice.) In both cases though, I rather wonder why you're using LATERAL at all, as opposed to just calling the function in the main query when you want its result. The query planner can't be expected to make up for arbitrary amounts of stupidity in the formulation of the submitted query. regards, tom lane
Tom Lane-2 wrote > paulcc < > paulcc.two@ > > writes: >> select count(alpha.id) >> from alpha >> cross join lateral some_function(alpha.id) as some_val >> where alpha.test > >> Here the function is strict, and moreover its argument will never >> be null - hence there should always be a non-null value returned. > > In both cases though, I rather wonder why you're using LATERAL at all, as > opposed to just calling the function in the main query when you want its > result. The query planner can't be expected to make up for arbitrary > amounts of stupidity in the formulation of the submitted query. I'm trying to answer this with a bit more detail but cannot because the OP provided too little information which is then causing Tom to make assumptions. I'm not sure to what degree the ORM is being stupid here since I do not know why it thinks LATERAL is more appropriate than a select-list function call for a non-SRF function (which I have to presume this is, but it is not stated). With respect to "the function will never return NULL": this is not the issue. The issue is that the function could return nothing (i.e., zero records) in which case the CROSS JOIN would suppress the corresponding correlated row from the result. Non-SRF functions are more easily used within the select-list of the query instead of attached to a LATERAL clause; the only issue there is when the function returns a composite and you try to immediately explode it into its constituent parts - the function will be evaluated multiple times. I'm not sure if that is what Tom is saying above but the combination of that limitation and limited optimizations if the function is in LATERAL seems to be in conflict here. There has been a recent uptick in interest in making PostgreSQL more ORM friendly (i.e., more able to simply ignore stuff that is added to the query even though a particular call doesn't actually need it) but I haven't seen anyone looking into LATERAL. More detailed reports may at least bring exposure to what is being used in the wild and garner interest from other parties in improving things. Unfortunately this report is too limited to really make a dent; lacking even the name of the ORM that is being used and the entire queries that are being generated - and why. David J. -- View this message in context: http://postgresql.nabble.com/query-laziness-of-lateral-join-with-function-tp5837706p5837735.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On Feb 12, 2015 9:17 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> The planner might produce such a result if there's an opportunity
> to perform the sorting via an index on "alpha" (ie, the ORDER BY
> matches some index). If it has to do an explicit sort it's gonna
> do the join first.
>
> (If you have such an index, and it's not going for the plan you want,
> you might need to crank up the COST property of some_function to
> persuade the planner that it should try to minimize the number of calls
> even if that means a slower scan choice.)
>
> In both cases though, I rather wonder why you're using LATERAL at all, as
> opposed to just calling the function in the main query when you want its
> result. The query planner can't be expected to make up for arbitrary
> amounts of stupidity in the formulation of the submitted query.
Useful, many thanks. I'll try playing with cost changes and a more targeted index.
In my real code, the function actually returns a json hash from which several fields are extracted in the main select (why? Unpacking in the query saves some hassle in the app code...) So my plan was to use lateral to limit function calls to max once per row. Is there a better way, other than using a nested query?