Обсуждение: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)
Date: Thursday, October 26, 2000 @ 17:35:49 Author: tgl Update of /home/projects/pgsql/cvsroot/pgsql/src/backend/nodes from hub.org:/home/projects/pgsql/tmp/cvs-serv71501/src/backend/nodes Modified Files: copyfuncs.c outfuncs.c print.c ----------------------------- Log Message ----------------------------- Re-implement LIMIT/OFFSET as a plan node type, instead of a hack in ExecutorRun. This allows LIMIT to work in a view. Also, LIMIT in a cursor declaration will behave in a reasonable fashion, whereas before it was overridden by the FETCH count.
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > >> Re-implement LIMIT/OFFSET as a plan node type, instead of a hack in > >> ExecutorRun. This allows LIMIT to work in a view. Also, LIMIT in a > >> cursor declaration will behave in a reasonable fashion, > > > Does "reasonable" mean that LIMIT is treated as optimizer's > > hint but doesn't restrict total FETCH counts ? > > No, it means that a LIMIT in a cursor means what it says: the cursor > will show that many rows and no more. FETCH lets you move around in > the cursor, but not override the limit. I decided that the other > behavior was just too darn weird... if you want to argue about that, > let's take it up on pghackers not committers. > > Yes, the optimizer does pay attention to the limit. > Hmm,I'm not sure about your point. Please correct me if I'm misunderstanding. We can specify rows count by FETCH command. It seems to me that LIMIT in declare cursor statement is only for optimizer's hint. Regards. Hiroshi Inoue
Tom Lane wrote: > Now that I look at it, the optimizer *already* prefers fast-start plans > for cursors. Is LIMIT ALL really necessary as an additional hint, > and if so how should it interact with the bias for cursors? > If LIMIT doesn't restrict the total count of rows which cursors could return,there's no problem. Otherwise LIMIT ALL would be needed. Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > How would be the behavior of the following command sequence ? > begin; > declare myc cursor for select * from t1 limit 1; > fetch in myc; > fetch in myc; > Could the last fetch return a row ? As the code now stands, the second fetch would return nothing. I think this is clearly what any reasonable person would expect given the LIMIT 1 clause. LIMIT ALL is a different story, because there's no semantic difference between writing LIMIT ALL and writing no limit clause at all. We have the option to create a distinction for planning purposes, however. Question is do we need one? regards, tom lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes: >> Re-implement LIMIT/OFFSET as a plan node type, instead of a hack in >> ExecutorRun. This allows LIMIT to work in a view. Also, LIMIT in a >> cursor declaration will behave in a reasonable fashion, > Does "reasonable" mean that LIMIT is treated as optimizer's > hint but doesn't restrict total FETCH counts ? No, it means that a LIMIT in a cursor means what it says: the cursor will show that many rows and no more. FETCH lets you move around in the cursor, but not override the limit. I decided that the other behavior was just too darn weird... if you want to argue about that, let's take it up on pghackers not committers. Yes, the optimizer does pay attention to the limit. regards, tom lane
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > For example,LIMIT ALL means LIMIT 1 for optimizer and means > > no LIMIT for executor. > > Comments ? > > I don't see the point. In the context of a regular SELECT, optimizing > that way would be wrong, because we are going to fetch all the data. > In the context of a DECLARE CURSOR, we already have a bias for fast- > start plans, so why do we need another? > Hmm,I missed somthing ? How would be the behavior of the following command sequence ? begin; declare myc cursor for select * from t1 limit 1; fetch in myc; fetch in myc; Could the last fetch return a row ? Regards, Hiroshi Inoue.
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > It seems to me that LIMIT in declare cursor statement is only > > for optimizer's hint. > > I think that would just confuse people. It could be. However what does LIMIT mean ? Rows per FETCH ? Probably no. FETCH forward,backward,forward,backward,.... and suddenly EOF ? > If we want to have a hint > that says "optimize for fast start", it ought to be done in another > way than saying that SELECT ... LIMIT means different things in > different contexts. > Yes I want to give optimizer a hint "return first rows fast". When Jan implemented LIMIT first,there was an option "LIMIT ALL" and it was exactly designed for the purpose. Regards. Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > For example,LIMIT ALL means LIMIT 1 for optimizer and means > no LIMIT for executor. > Comments ? I don't see the point. In the context of a regular SELECT, optimizing that way would be wrong, because we are going to fetch all the data. In the context of a DECLARE CURSOR, we already have a bias for fast- start plans, so why do we need another? regards, tom lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > It seems to me that LIMIT in declare cursor statement is only > for optimizer's hint. I think that would just confuse people. If we want to have a hint that says "optimize for fast start", it ought to be done in another way than saying that SELECT ... LIMIT means different things in different contexts. Possibly the optimizer should always assume that cursors ought to be optimized for fast start, LIMIT or no LIMIT --- does that seem like a good idea to you? regards, tom lane
<<< No Message Collected >>>
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > How would be the behavior of the following command sequence ? > > > begin; > > declare myc cursor for select * from t1 limit 1; > > fetch in myc; > > fetch in myc; > > > Could the last fetch return a row ? > > As the code now stands, the second fetch would return nothing. > I think this is clearly what any reasonable person would expect > given the LIMIT 1 clause. > Different from ordinary select statements we could gain the same result in case of cursors. begin; declare myc cursor for select * from t1; fetch in myc; For exaple, begin; declare myc cursor for select * from t1 limit all; fetch 20 in myc; (the first page) ...(interaction) fetch 20 in myc; (the next page) ..(interaction) fetch backward 20 in myc; (the previous page) ... What I expect here is to get rows of each page in an average response time not the total throughput of db operation. Regards, Hiroshi Inoue > > LIMIT ALL is a different story, because there's no semantic difference > between writing LIMIT ALL and writing no limit clause at all. We have > the option to create a distinction for planning purposes, however. > Question is do we need one? > > regards, tom lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > Tom Lane wrote: >> Now that I look at it, the optimizer *already* prefers fast-start plans >> for cursors. Is LIMIT ALL really necessary as an additional hint, >> and if so how should it interact with the bias for cursors? > If LIMIT doesn't restrict the total count of rows which cursors > could return,there's no problem. Otherwise LIMIT ALL would be > needed. But is there a reason to treat LIMIT ALL differently from no LIMIT clause at all? regards, tom lane
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > Tom Lane wrote: > >> Now that I look at it, the optimizer *already* prefers fast-start plans > >> for cursors. Is LIMIT ALL really necessary as an additional hint, > >> and if so how should it interact with the bias for cursors? > > > If LIMIT doesn't restrict the total count of rows which cursors > > could return,there's no problem. Otherwise LIMIT ALL would be > > needed. > > But is there a reason to treat LIMIT ALL differently from no LIMIT > clause at all? > For example,LIMIT ALL means LIMIT 1 for optimizer and means no LIMIT for executor. Comments ? Regards, Hiroshi Inoue.
Tom Lane wrote: > Date: Thursday, October 26, 2000 @ 17:35:49 > Author: tgl > > Update of /home/projects/pgsql/cvsroot/pgsql/src/backend/nodes > from hub.org:/home/projects/pgsql/tmp/cvs-serv71501/src/backend/nodes > > Modified Files: > copyfuncs.c outfuncs.c print.c > > ----------------------------- Log Message ----------------------------- > > Re-implement LIMIT/OFFSET as a plan node type, instead of a hack in > ExecutorRun. This allows LIMIT to work in a view. Also, LIMIT in a > cursor declaration will behave in a reasonable fashion, Does "reasonable" mean that LIMIT is treated as optimizer's hint but doesn't restrict total FETCH counts ? Regards. Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > Yes I want to give optimizer a hint "return first rows fast". > When Jan implemented LIMIT first,there was an option > "LIMIT ALL" and it was exactly designed for the purpose. Well, we could make that work that way again, I think. Need to look at the code, but I think the optimizer could tell the difference between a LIMIT ALL clause and no limit clause at all. regards, tom lane
At 20:17 26/10/00 -0400, Tom Lane wrote: > >Possibly the optimizer should always assume that cursors ought to >be optimized for fast start, LIMIT or no LIMIT --- does that seem >like a good idea to you? > No to me; I'd vote for an 'OPTIMIZE FOR FAST START' phrase ahead of always doing a fast start. It also seems natural that a LIMIT clause should (a) limit the results in all cases and (b) be used by the optimiser in such a way that it *considers* a fast start approach more carefully. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 12:11 27/10/00 +0900, Hiroshi Inoue wrote: > >For example,LIMIT ALL means LIMIT 1 for optimizer and means >no LIMIT for executor. >Comments ? > It seems there's two possibilities: (a) You know you will only use a limited number of rows, but you are not sure exactly how many. In this case, I'd vote for a 'OPTIMIZE FOR FAST START' clause. (b) You really want all rows, in which case you should let the optimizer do it's stuff. If it fails to work well, then use either 'OPTIMIZE FOR TOTAL COST' or 'OPTIMIZE FOR FAST START' to change the behaviour. ISTM that LIMIT ALL is just the syntax for the default limit clause - and should, if anything, be equivalent to 'OPTIMIZE FOR TOTAL COST'. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 20:59 26/10/00 -0400, Tom Lane wrote: >Hiroshi Inoue <Inoue@tpf.co.jp> writes: >> Yes I want to give optimizer a hint "return first rows fast". >> When Jan implemented LIMIT first,there was an option >> "LIMIT ALL" and it was exactly designed for the purpose. > >Well, we could make that work that way again, I think. I think that would be a *bad* idea. ISTM that the syntax is obtuse for the meaning it is being given. The (mild) confusion in this thread is evidence of that, at least. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > begin; > declare myc cursor for select * from t1 limit all; > fetch 20 in myc; (the first page) > ...(interaction) > fetch 20 in myc; (the next page) > ..(interaction) > fetch backward 20 in myc; (the previous page) > ... > What I expect here is to get rows of each page in > an average response time not the total throughput > of db operation. Yes, but why should the presence of "limit all" affect that? It's not apparent to me why the optimizer should treat this case differently from plain declare myc cursor for select * from t1; regards, tom lane
> have ML members seen this discussion ? The whole thread's only gone to pgsql-committers, which has pretty small readership AFAIK. If you want a wider discussion, let's bring it up in pghackers. regards, tom lane
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > Tom Lane wrote: > >> Yes, but why should the presence of "limit all" affect that? > >> It's not apparent to me why the optimizer should treat this > >> case differently from plain > >> declare myc cursor for select * from t1; > > > Am I misunderstanding ? > > Doesn't optimizer make the plan for the query > > "select * for t1" which would use SeqScan > > in most cases ? > > In a plain SELECT, yes. In a DECLARE CURSOR, it's currently set up > to prefer indexscans anyway, LIMIT or no LIMIT (see lines 853 ff in > src/backend/optimizer/plan/planner.c, current sources). Probably you mean if (parse->isPortal) tuple_fraction = 0.10; Seems 0.10 isn't sufficently small in pretty many cases. In addtion,SeqScan isn't used even when we want it e.g. in the case cursors are just used to avoid the exhaution of memory. > I think it > makes sense to have that preference for DECLARE, and what I'm wondering > is if we need an additional preference when the DECLARE contains a LIMIT > clause --- and if so, what should that be? > I don't think we can specify appropriate LIMIT for cursors. We could judge if an application needs an average response time of total throuput. Regards, Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > Tom Lane wrote: >> Yes, but why should the presence of "limit all" affect that? >> It's not apparent to me why the optimizer should treat this >> case differently from plain >> declare myc cursor for select * from t1; > Am I misunderstanding ? > Doesn't optimizer make the plan for the query > "select * for t1" which would use SeqScan > in most cases ? In a plain SELECT, yes. In a DECLARE CURSOR, it's currently set up to prefer indexscans anyway, LIMIT or no LIMIT (see lines 853 ff in src/backend/optimizer/plan/planner.c, current sources). I think it makes sense to have that preference for DECLARE, and what I'm wondering is if we need an additional preference when the DECLARE contains a LIMIT clause --- and if so, what should that be? regards, tom lane
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > Seems 0.10 isn't sufficently small in pretty many cases. > > Agreed, it's an arbitrary number. But where/how can we get a better > one? > I recommend the choise of "LIMIT 1" and "no LIMIT" by some option. Others would have different opinions but have ML members seen this discussion ? I've received no mails about this thread from ML. Regards, Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > Seems 0.10 isn't sufficently small in pretty many cases. Agreed, it's an arbitrary number. But where/how can we get a better one? regards, tom lane
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > begin; > > declare myc cursor for select * from t1 limit all; > > fetch 20 in myc; (the first page) > > ...(interaction) > > fetch 20 in myc; (the next page) > > ..(interaction) > > fetch backward 20 in myc; (the previous page) > > ... > > > What I expect here is to get rows of each page in > > an average response time not the total throughput > > of db operation. > > Yes, but why should the presence of "limit all" affect that? > It's not apparent to me why the optimizer should treat this > case differently from plain > declare myc cursor for select * from t1; > Am I misunderstanding ? Doesn't optimizer make the plan for the query "select * for t1" which would use SeqScan in most cases ? Regards, Hiroshi Inoue
At 21:03 28/10/00, wrote: > >In a plain SELECT, yes. In a DECLARE CURSOR, it's currently set up >to prefer indexscans anyway, LIMIT or no LIMIT (see lines 853 ff in >src/backend/optimizer/plan/planner.c, current sources). I think it >makes sense to have that preference for DECLARE, and what I'm wondering >is if we need an additional preference when the DECLARE contains a LIMIT >clause --- and if so, what should that be? Do you really think it's not such a good idea to have different optimizer behaviour for SELECT and DECLARE CURSOR? My expectation is that putting a SELECT statement inside a cursor should not change it's performance. I'd be interested to know the reasons for your choice. The simplest solution would be to use the same logic in both cases; if the correct behaviour is not obvious to you, then our users will be left second guessing the optimizer once you start introducing several special cases (LIMIT in SELECT, LIMIT ALL in SELECT, LIMIT in DECLARE etc etc). ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Do you really think it's not such a good idea to have different optimizer > behaviour for SELECT and DECLARE CURSOR? My expectation is that putting a > SELECT statement inside a cursor should not change it's performance. I'd be > interested to know the reasons for your choice. I think it's an excellent idea to have different behaviors, and the reason is that we know a stand-alone SELECT will deliver all its result rows, whereas for DECLARE it's quite possible that not all the possible result rows will be fetched. Moreover, the user is likely to fetch the cursor's results in bite-size chunks, so he will be interested in average response time as well as total time. In the proposal as written, LIMIT ALL and LIMIT n will in fact give rise to identical behavior in both contexts, and it's only the case without an explicit LIMIT that will behave differently. (If we add a SET-variable to control this, you could even make that behavior the same too, by setting the variable to 1.0.) regards, tom lane
Tom,thanks for your good summary. Seems this is the latest posting for this thread. > -----Original Message----- > From: Tom Lane > > Philip Warner <pjw@rhyme.com.au> writes: > > Do you really think it's not such a good idea to have different > optimizer > > behaviour for SELECT and DECLARE CURSOR? My expectation is that > putting a > > SELECT statement inside a cursor should not change it's > performance. I'd be > > interested to know the reasons for your choice. > > I think it's an excellent idea to have different behaviors, and the > reason is that we know a stand-alone SELECT will deliver all its result > rows, whereas for DECLARE it's quite possible that not all the possible > result rows will be fetched. Moreover, the user is likely to fetch the > cursor's results in bite-size chunks, so he will be interested in > average response time as well as total time. > Cursors have a different character from stand-alone SELECT. We don't have to FETCH results continuously from cursors. It's well known that an average response time is significant in some applications. For example,we could make interactive paging applications which require a next/prior page(small part of the result of a query) according to user's request. There may be more excellent ways to achive it but I don't know how to do it in PostgreSQL. Regards. Hiroshi Inoue