Обсуждение: query_planner() API change
I've been looking at what it would take to do proper cost estimation for the recently-discussed patch to suppress calculation of unnecessary ORDER BY expressions. It turns out that knowledge of that would have to propagate into query_planner(), because the place where we do the cost comparison between unsorted and presorted paths is in there (planmain.c lines 390ff in HEAD). As it stands, query_planner() will actually refuse to return the presorted path to grouping_planner() at all if it thinks it's a loser cost-wise, meaning grouping_planner() would have no opportunity to override the decision. So there's no way to fix this without some API change for query_planner(). While we could complicate query_planner()'s API even more to add some understanding of unnecessary resjunk items, I think this is probably the straw that breaks the camel's back for the current approach here. There is already a comment like this in query_planner(): * This introduces some undesirable coupling between this code and * grouping_planner, but the alternatives seem evenuglier; we couldn't * pass back completed paths without making these decisions here. I think it's time to bite the bullet and *not* pass back completed paths. What's looking more attractive now is to just pass back the top-level RelOptInfo ("final_rel" in query_planner()). We could remove all three output parameters of query_planner(), and essentially just move lines 265-420 (pretty much everything after the make_one_rel() call) into planner.c. Since that code is almost all about grouping-related choices, this seems like it'll be a net improvement modularity-wise, even though it'll make grouping_planner() even bigger. We could probably ameliorate the latter problem by putting the calculation of num_groups and adjustment of tuple_fraction into a subroutine. Objections, better ideas? regards, tom lane
On Sun, Aug 4, 2013 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've been looking at what it would take to do proper cost estimation > for the recently-discussed patch to suppress calculation of unnecessary > ORDER BY expressions. It turns out that knowledge of that would have > to propagate into query_planner(), because the place where we do the cost > comparison between unsorted and presorted paths is in there (planmain.c > lines 390ff in HEAD). As it stands, query_planner() will actually refuse > to return the presorted path to grouping_planner() at all if it thinks > it's a loser cost-wise, meaning grouping_planner() would have no > opportunity to override the decision. So there's no way to fix this > without some API change for query_planner(). > > While we could complicate query_planner()'s API even more to add some > understanding of unnecessary resjunk items, I think this is probably > the straw that breaks the camel's back for the current approach here. > There is already a comment like this in query_planner(): > > * This introduces some undesirable coupling between this code and > * grouping_planner, but the alternatives seem even uglier; we couldn't > * pass back completed paths without making these decisions here. > > I think it's time to bite the bullet and *not* pass back completed paths. > What's looking more attractive now is to just pass back the top-level > RelOptInfo ("final_rel" in query_planner()). We could remove all three > output parameters of query_planner(), and essentially just move lines > 265-420 (pretty much everything after the make_one_rel() call) into > planner.c. Since that code is almost all about grouping-related choices, > this seems like it'll be a net improvement modularity-wise, even though > it'll make grouping_planner() even bigger. We could probably ameliorate > the latter problem by putting the calculation of num_groups and adjustment > of tuple_fraction into a subroutine. > > Objections, better ideas? I tend to think this is a pretty good plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Aug 4, 2013 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think it's time to bite the bullet and *not* pass back completed paths. >> What's looking more attractive now is to just pass back the top-level >> RelOptInfo ("final_rel" in query_planner()). > I tend to think this is a pretty good plan. I looked around a little more and noted that this would complicate the special-case handling of an empty join tree (viz, "SELECT 2+2"). Right now query_planner() just has to make the appropriate Result path and it's done. We'd have to create a dummy RelOptInfo representing an empty set of relations, which is a bit weird but probably not too unreasonable when all's said and done. regards, tom lane
> Robert Haas <robertmhaas@gmail.com> writes: > > On Sun, Aug 4, 2013 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I think it's time to bite the bullet and *not* pass back completed paths. > >> What's looking more attractive now is to just pass back the top-level > >> RelOptInfo ("final_rel" in query_planner()). > > > I tend to think this is a pretty good plan. > > I looked around a little more and noted that this would complicate the > special-case handling of an empty join tree (viz, "SELECT 2+2"). Right now > query_planner() just has to make the appropriate Result path and it's done. > We'd have to create a dummy RelOptInfo representing an empty set of relations, > which is a bit weird but probably not too unreasonable when all's said and done. I think this is reasonable. Thanks, Best regards, Etsuro Fujita
> While we could complicate query_planner()'s API even more to add some > understanding of unnecessary resjunk items, I think this is probably > the straw that breaks the camel's back for the current approach here. > There is already a comment like this in query_planner(): > > * This introduces some undesirable coupling between this code and > * grouping_planner, but the alternatives seem even uglier; we couldn't > * pass back completed paths without making these decisions here. I agree with the idea,but am trying to understand why adding understanding of resjunk columns is a bad idea. Just for understanding purpose, could you please elaborate a bit on it? Regards, Atri -- Regards, Atri l'apprenant
On Mon, Aug 5, 2013 at 3:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Can you please mention the subject of the thread? I tried to locate the thread based on this description, but couldn't locate it. Are you referring to the discussion related to aggregation with specified ordering?
A doubt at the end ...
I've been looking at what it would take to do proper cost estimation
for the recently-discussed patch to suppress calculation of unnecessary
ORDER BY expressions.
Can you please mention the subject of the thread? I tried to locate the thread based on this description, but couldn't locate it. Are you referring to the discussion related to aggregation with specified ordering?
A doubt at the end ...
It turns out that knowledge of that would have
to propagate into query_planner(), because the place where we do the cost
comparison between unsorted and presorted paths is in there (planmain.c
lines 390ff in HEAD). As it stands, query_planner() will actually refuse
to return the presorted path to grouping_planner() at all if it thinks
it's a loser cost-wise, meaning grouping_planner() would have no
opportunity to override the decision. So there's no way to fix this
without some API change for query_planner().
While we could complicate query_planner()'s API even more to add some
understanding of unnecessary resjunk items, I think this is probably
the straw that breaks the camel's back for the current approach here.
There is already a comment like this in query_planner():
* This introduces some undesirable coupling between this code and
* grouping_planner, but the alternatives seem even uglier; we couldn't
* pass back completed paths without making these decisions here.
I think it's time to bite the bullet and *not* pass back completed paths.
What's looking more attractive now is to just pass back the top-level
RelOptInfo ("final_rel" in query_planner()). We could remove all three
output parameters of query_planner(), and essentially just move lines
265-420 (pretty much everything after the make_one_rel() call) into
planner.c. Since that code is almost all about grouping-related choices,
this seems like it'll be a net improvement modularity-wise, even though
it'll make grouping_planner() even bigger. We could probably ameliorate
the latter problem by putting the calculation of num_groups and adjustment
of tuple_fraction into a subroutine.
Can we change the query_planner() to return both the paths (presorted and unsorted) irrespective of the cost of presorted path, and let grouping_planner() (or any caller of query_planner()) handle which of them to pick up?
Objections, better ideas?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company
> I agree with the idea,but am trying to understand why adding understanding of > resjunk columns is a bad idea. Just for understanding purpose, could you please > elaborate a bit on it? Although I may not have understood your question correctly, I think it is good to see http://www.postgresql.org/message-id/14993.1354552292@sss.pgh.pa.us Best regards, Etsuro Fujita
> On Mon, Aug 5, 2013 at 3:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've been looking at what it would take to do proper cost estimation >> for the recently-discussed patch to suppress calculation of >> unnecessary ORDER BY expressions. > Can you please mention the subject of the thread? I tried to locate the thread > based on this description, but couldn't locate it. Please see http://www.postgresql.org/message-id/6543.1375470829@sss.pgh.pa.us Best regards, Etsuro Fujita
Atri Sharma <atri.jiit@gmail.com> writes: >> While we could complicate query_planner()'s API even more to add some >> understanding of unnecessary resjunk items, I think this is probably >> the straw that breaks the camel's back for the current approach here. >> There is already a comment like this in query_planner(): >> >> * This introduces some undesirable coupling between this code and >> * grouping_planner, but the alternatives seem even uglier; we couldn't >> * pass back completed paths without making these decisions here. > I agree with the idea,but am trying to understand why adding > understanding of resjunk columns is a bad idea. Just for understanding > purpose, could you please elaborate a bit on it? It's just that doing it that way would require making both planner.c and planmain.c intimately involved in the decision about whether suppressing resjunk ORDER BY targets is a win. Really, anything to do with ordering/grouping implementation decisions is grouping_planner's business. So putting chunks of that logic in a completely different file doesn't seem like a great design, especially not if it requires weighing down query_planner()'s API even more. query_planner should only be concerned with scan/join planning. Basically, we'd be moving knowledge of how to dig the best paths out of a RelOptInfo from query_planner to grouping_planner --- which when you think about it seems like mostly a wash from a modularity standpoint, anyway. Having done that, we can get query_planner's fingers out of a number of issues that are really grouping_planner's business. Returning the RelOptInfo also eliminates the baked-into-the-API assumption that only one of the presorted path(s) could be of interest to grouping_planner, which is something I've long suspected would become a problem someday. On balance I'm feeling like this is a win even without considering the proposed changes for resjunk targets. regards, tom lane
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > Can we change the query_planner() to return both the paths (presorted and > unsorted) irrespective of the cost of presorted path, and let > grouping_planner() (or any caller of query_planner()) handle which of them > to pick up? That's exactly the result this change would have, since all the potential Paths are attached to the top-level RelOptInfo. regards, tom lane
On Mon, Aug 5, 2013 at 6:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Atri Sharma <atri.jiit@gmail.com> writes: >>> While we could complicate query_planner()'s API even more to add some >>> understanding of unnecessary resjunk items, I think this is probably >>> the straw that breaks the camel's back for the current approach here. >>> There is already a comment like this in query_planner(): >>> >>> * This introduces some undesirable coupling between this code and >>> * grouping_planner, but the alternatives seem even uglier; we couldn't >>> * pass back completed paths without making these decisions here. > >> I agree with the idea,but am trying to understand why adding >> understanding of resjunk columns is a bad idea. Just for understanding >> purpose, could you please elaborate a bit on it? > > It's just that doing it that way would require making both planner.c and > planmain.c intimately involved in the decision about whether suppressing > resjunk ORDER BY targets is a win. Really, anything to do with > ordering/grouping implementation decisions is grouping_planner's business. > So putting chunks of that logic in a completely different file doesn't > seem like a great design, especially not if it requires weighing down > query_planner()'s API even more. query_planner should only be concerned > with scan/join planning. > > Basically, we'd be moving knowledge of how to dig the best paths out of a > RelOptInfo from query_planner to grouping_planner --- which when you think > about it seems like mostly a wash from a modularity standpoint, anyway. > Having done that, we can get query_planner's fingers out of a number of > issues that are really grouping_planner's business. Returning the > RelOptInfo also eliminates the baked-into-the-API assumption that only one > of the presorted path(s) could be of interest to grouping_planner, which > is something I've long suspected would become a problem someday. > > On balance I'm feeling like this is a win even without considering the > proposed changes for resjunk targets. Thanks a ton for such a detailed explanation. So, query_planner() returns both,the unsorted and presorted paths and lets grouping_planner() decide between them, and grouping_planner() ignores unnecessary ORDER BY columns,right? Sorry if I am being naive here, I am just trying to assimilate the overall process for my understanding. Thanks, atri -- Regards, Atri l'apprenant