Обсуждение: Getting rid of cheap-startup-cost paths earlier
Currently, the planner keeps paths that appear to win on the grounds of either cheapest startup cost or cheapest total cost. It suddenly struck me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor fast-start preference, etc) we could know a-priori that cheapest startup cost is not going to be interesting, and hence immediately discard any path that doesn't win on total cost. This would require some additional logic to detect whether the case applies, as well as extra complexity in add_path. So it's possible that it wouldn't be worthwhile overall. Still, it seems like it might be a useful idea to investigate. Thoughts? regards, tom lane
On 22 May 2012 06:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Currently, the planner keeps paths that appear to win on the grounds of > either cheapest startup cost or cheapest total cost. It suddenly struck > me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor > fast-start preference, etc) we could know a-priori that cheapest startup > cost is not going to be interesting, and hence immediately discard any > path that doesn't win on total cost. My experience is that most people don't provide a LIMIT explicitly even when they know that's the desired behaviour. That's because either they simply don't understand that SQL can return lots of rows, or SQL knowledge isn't enough, or worse that people don't even know that specifying it would alter query plans. Regrettably the current planning of LIMIT clauses causes more problems so in many cases these have been explicitly removed from SQL by developers that know how many rows they wish to see. I would have proposed a default-LIMIT parameter before now, but for that last point. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On May 22, 2012, at 9:57 AM, Simon Riggs wrote: > On 22 May 2012 06:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Currently, the planner keeps paths that appear to win on the grounds of >> either cheapest startup cost or cheapest total cost. It suddenly struck >> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor >> fast-start preference, etc) we could know a-priori that cheapest startup >> cost is not going to be interesting, and hence immediately discard any >> path that doesn't win on total cost. > > My experience is that most people don't provide a LIMIT explicitly > even when they know that's the desired behaviour. That's because > either they simply don't understand that SQL can return lots of rows, > or SQL knowledge isn't enough, or worse that people don't even know > that specifying it would alter query plans. > > Regrettably the current planning of LIMIT clauses causes more problems > so in many cases these have been explicitly removed from SQL by > developers that know how many rows they wish to see. > > I would have proposed a default-LIMIT parameter before now, but for > that last point. this sounds like a total disaster to me ... why in the world should we have a default LIMIT parameter? i guess if somebody is not able to use LIMIT he should better not touch the DB. we clearly cannot fix incompetence by adding parameters. regards, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
On 22 May 2012 12:12, PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at> wrote: > > On May 22, 2012, at 9:57 AM, Simon Riggs wrote: > >> On 22 May 2012 06:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >>> Currently, the planner keeps paths that appear to win on the grounds of >>> either cheapest startup cost or cheapest total cost. It suddenly struck >>> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor >>> fast-start preference, etc) we could know a-priori that cheapest startup >>> cost is not going to be interesting, and hence immediately discard any >>> path that doesn't win on total cost. >> >> My experience is that most people don't provide a LIMIT explicitly >> even when they know that's the desired behaviour. That's because >> either they simply don't understand that SQL can return lots of rows, >> or SQL knowledge isn't enough, or worse that people don't even know >> that specifying it would alter query plans. >> >> Regrettably the current planning of LIMIT clauses causes more problems >> so in many cases these have been explicitly removed from SQL by >> developers that know how many rows they wish to see. >> >> I would have proposed a default-LIMIT parameter before now, but for >> that last point. > > > this sounds like a total disaster to me ... > why in the world should we have a default LIMIT parameter? Its common for people to write SQL they want to have a fast response to, yet to not specify that in their SQL code. > i guess if somebody is not able to use LIMIT he should better not touch the DB. > we clearly cannot fix incompetence by adding parameters. Not sure I understand that argument. Many things are set by default in parameters. Oracle allows the user to choose whether plans return rows quickly, or execute plans efficiently overall. That is a relevant user choice, not a hint. We simply don't know that if LIMIT is absent whether the user wishes fast or efficient plans. If they use LIMIT we know they want fast plan. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, May 22, 2012 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Currently, the planner keeps paths that appear to win on the grounds of > either cheapest startup cost or cheapest total cost. It suddenly struck > me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor > fast-start preference, etc) we could know a-priori that cheapest startup > cost is not going to be interesting, and hence immediately discard any > path that doesn't win on total cost. > > This would require some additional logic to detect whether the case > applies, as well as extra complexity in add_path. So it's possible > that it wouldn't be worthwhile overall. Still, it seems like it might > be a useful idea to investigate. > > Thoughts? Yeah, I think we should investigate that. Presumably you could easily have a situation where one part of the tree is under a LIMIT or EXISTS and therefore needs to preserve fast-start plans but the rest of the (potentially large) tree isn't, so we need something fairly fine-grained, I think. Maybe we could add a flag to each RelOptInfo indicating whether fast-start plans should be kept, or something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, May 22, 2012 at 08:29:48AM -0400, Robert Haas wrote: > On Tue, May 22, 2012 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Currently, the planner keeps paths that appear to win on the grounds of > > either cheapest startup cost or cheapest total cost. It suddenly struck > > me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor > > fast-start preference, etc) we could know a-priori that cheapest startup > > cost is not going to be interesting, and hence immediately discard any > > path that doesn't win on total cost. > > > > This would require some additional logic to detect whether the case > > applies, as well as extra complexity in add_path. So it's possible > > that it wouldn't be worthwhile overall. Still, it seems like it might > > be a useful idea to investigate. > > > > Thoughts? > > Yeah, I think we should investigate that. Presumably you could easily > have a situation where one part of the tree is under a LIMIT or EXISTS > and therefore needs to preserve fast-start plans but the rest of the > (potentially large) tree isn't, so we need something fairly > fine-grained, I think. Maybe we could add a flag to each RelOptInfo > indicating whether fast-start plans should be kept, or something like > that. Is this a TODO? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, May 22, 2012 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Currently, the planner keeps paths that appear to win on the grounds of >> either cheapest startup cost or cheapest total cost. It suddenly struck >> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor >> fast-start preference, etc) we could know a-priori that cheapest startup >> cost is not going to be interesting, and hence immediately discard any >> path that doesn't win on total cost. >> >> This would require some additional logic to detect whether the case >> applies, as well as extra complexity in add_path. So it's possible >> that it wouldn't be worthwhile overall. Still, it seems like it might >> be a useful idea to investigate. >> >> Thoughts? > Yeah, I think we should investigate that. Presumably you could easily > have a situation where one part of the tree is under a LIMIT or EXISTS > and therefore needs to preserve fast-start plans but the rest of the > (potentially large) tree isn't, so we need something fairly > fine-grained, I think. Maybe we could add a flag to each RelOptInfo > indicating whether fast-start plans should be kept, or something like > that. I got around to looking at this finally. It turns out to be a big win, at least for queries without any LIMIT or other reason to worry about fast-start plans. As things currently stand, there isn't any reason to control the decision at finer than per-subquery level. I did it the way you suggest above anyway, with a per-RelOptInfo flag, because add_path() is passed only a RelOptInfo and not the PlannerInfo root structure. We could have changed that of course, but it would have meant changing an API used by FDWs, which would be annoying. It seems possible that in future somebody might think of a way to determine this on a per-relation level, so I thought this design might be a bit more future-proof anyway. regards, tom lane
On Sat, Sep 1, 2012 at 06:23:59PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Tue, May 22, 2012 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Currently, the planner keeps paths that appear to win on the grounds of > >> either cheapest startup cost or cheapest total cost. It suddenly struck > >> me that in many simple cases (viz, those with no LIMIT, EXISTS, cursor > >> fast-start preference, etc) we could know a-priori that cheapest startup > >> cost is not going to be interesting, and hence immediately discard any > >> path that doesn't win on total cost. > >> > >> This would require some additional logic to detect whether the case > >> applies, as well as extra complexity in add_path. So it's possible > >> that it wouldn't be worthwhile overall. Still, it seems like it might > >> be a useful idea to investigate. > >> > >> Thoughts? > > > Yeah, I think we should investigate that. Presumably you could easily > > have a situation where one part of the tree is under a LIMIT or EXISTS > > and therefore needs to preserve fast-start plans but the rest of the > > (potentially large) tree isn't, so we need something fairly > > fine-grained, I think. Maybe we could add a flag to each RelOptInfo > > indicating whether fast-start plans should be kept, or something like > > that. > > I got around to looking at this finally. It turns out to be a big win, > at least for queries without any LIMIT or other reason to worry about > fast-start plans. Yes, I remember from the early days how quickly the number of considred paths can grow. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +