Обсуждение: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

Поиск
Список
Период
Сортировка

pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
  Date: Thursday, October 26, 2000 @ 17:35:49
Author: tgl

Update of /home/projects/pgsql/cvsroot/pgsql/src/backend/nodes
     from hub.org:/home/projects/pgsql/tmp/cvs-serv71501/src/backend/nodes

Modified Files:
    copyfuncs.c outfuncs.c print.c

-----------------------------  Log Message  -----------------------------

Re-implement LIMIT/OFFSET as a plan node type, instead of a hack in
ExecutorRun.  This allows LIMIT to work in a view.  Also, LIMIT in a
cursor declaration will behave in a reasonable fashion, whereas before
it was overridden by the FETCH count.

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> >> Re-implement LIMIT/OFFSET as a plan node type, instead of a hack in
> >> ExecutorRun.  This allows LIMIT to work in a view.  Also, LIMIT in a
> >> cursor declaration will behave in a reasonable fashion,
>
> > Does "reasonable" mean that LIMIT is treated as optimizer's
> > hint but doesn't restrict total FETCH counts ?
>
> No, it means that a LIMIT in a cursor means what it says: the cursor
> will show that many rows and no more.  FETCH lets you move around in
> the cursor, but not override the limit.  I decided that the other
> behavior was just too darn weird... if you want to argue about that,
> let's take it up on pghackers not committers.
>
> Yes, the optimizer does pay attention to the limit.
>

Hmm,I'm not sure about your point.
Please correct me if I'm misunderstanding.

We can specify rows count by FETCH command.
It seems to me that LIMIT in declare cursor statement is only
for optimizer's hint.

Regards.
Hiroshi Inoue



Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

> Now that I look at it, the optimizer *already* prefers fast-start plans
> for cursors.  Is LIMIT ALL really necessary as an additional hint,
> and if so how should it interact with the bias for cursors?
>

If LIMIT doesn't restrict the total count of rows which cursors
could return,there's no problem. Otherwise LIMIT ALL would be
needed.

Hiroshi Inoue



Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> How would be the behavior of the following command sequence ?

> begin;
> declare myc cursor for select * from t1 limit 1;
> fetch in myc;
> fetch in myc;

> Could the last fetch return a row ?

As the code now stands, the second fetch would return nothing.
I think this is clearly what any reasonable person would expect
given the LIMIT 1 clause.

LIMIT ALL is a different story, because there's no semantic difference
between writing LIMIT ALL and writing no limit clause at all.  We have
the option to create a distinction for planning purposes, however.
Question is do we need one?

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
>> Re-implement LIMIT/OFFSET as a plan node type, instead of a hack in
>> ExecutorRun.  This allows LIMIT to work in a view.  Also, LIMIT in a
>> cursor declaration will behave in a reasonable fashion,

> Does "reasonable" mean that LIMIT is treated as optimizer's
> hint but doesn't restrict total FETCH counts ?

No, it means that a LIMIT in a cursor means what it says: the cursor
will show that many rows and no more.  FETCH lets you move around in
the cursor, but not override the limit.  I decided that the other
behavior was just too darn weird... if you want to argue about that,
let's take it up on pghackers not committers.

Yes, the optimizer does pay attention to the limit.

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > For example,LIMIT ALL means LIMIT 1 for optimizer and means
> > no LIMIT for executor.
> > Comments ?
>
> I don't see the point.  In the context of a regular SELECT, optimizing
> that way would be wrong, because we are going to fetch all the data.
> In the context of a DECLARE CURSOR, we already have a bias for fast-
> start plans, so why do we need another?
>

Hmm,I missed somthing ?
How would be the behavior of the following command sequence ?

begin;
declare myc cursor for select * from t1 limit 1;
fetch in myc;
fetch in myc;

Could the last fetch return a row ?

Regards, Hiroshi Inoue.



Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > It seems to me that LIMIT in declare cursor statement is only
> > for optimizer's hint.
>
> I think that would just confuse people.

It could be. However what does LIMIT mean ?
Rows per FETCH ? Probably no.
FETCH forward,backward,forward,backward,.... and suddenly EOF ?

> If we want to have a hint
> that says "optimize for fast start", it ought to be done in another
> way than saying that SELECT ... LIMIT means different things in
> different contexts.
>

Yes I want to give optimizer a hint  "return first rows fast".
When Jan implemented LIMIT first,there was an option
"LIMIT ALL" and it was exactly designed for the purpose.

Regards.
Hiroshi Inoue


Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> For example,LIMIT ALL means LIMIT 1 for optimizer and means
> no LIMIT for executor.
> Comments ?

I don't see the point.  In the context of a regular SELECT, optimizing
that way would be wrong, because we are going to fetch all the data.
In the context of a DECLARE CURSOR, we already have a bias for fast-
start plans, so why do we need another?

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> It seems to me that LIMIT in declare cursor statement is only
> for optimizer's hint.

I think that would just confuse people.  If we want to have a hint
that says "optimize for fast start", it ought to be done in another
way than saying that SELECT ... LIMIT means different things in
different contexts.

Possibly the optimizer should always assume that cursors ought to
be optimized for fast start, LIMIT or no LIMIT --- does that seem
like a good idea to you?

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
<<< No Message Collected >>>

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > How would be the behavior of the following command sequence ?
>
> > begin;
> > declare myc cursor for select * from t1 limit 1;
> > fetch in myc;
> > fetch in myc;
>
> > Could the last fetch return a row ?
>
> As the code now stands, the second fetch would return nothing.
> I think this is clearly what any reasonable person would expect
> given the LIMIT 1 clause.
>

Different from ordinary select statements we could
gain the same result in case of cursors.

begin;
declare myc cursor for select * from t1;
fetch in myc;


For exaple,

begin;
declare myc cursor for select * from t1 limit all;
fetch 20 in myc; (the first page)
...(interaction)
fetch 20 in myc; (the next page)
..(interaction)
fetch backward 20 in myc; (the previous page)
...

What I expect here is to get rows of each page in
an average response time not the total throughput
of db operation.

Regards, Hiroshi Inoue




>
> LIMIT ALL is a different story, because there's no semantic difference
> between writing LIMIT ALL and writing no limit clause at all.  We have
> the option to create a distinction for planning purposes, however.
> Question is do we need one?
>
>                         regards, tom lane


Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> Now that I look at it, the optimizer *already* prefers fast-start plans
>> for cursors.  Is LIMIT ALL really necessary as an additional hint,
>> and if so how should it interact with the bias for cursors?

> If LIMIT doesn't restrict the total count of rows which cursors
> could return,there's no problem. Otherwise LIMIT ALL would be
> needed.

But is there a reason to treat LIMIT ALL differently from no LIMIT
clause at all?

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Tom Lane wrote:
> >> Now that I look at it, the optimizer *already* prefers fast-start plans
> >> for cursors.  Is LIMIT ALL really necessary as an additional hint,
> >> and if so how should it interact with the bias for cursors?
>
> > If LIMIT doesn't restrict the total count of rows which cursors
> > could return,there's no problem. Otherwise LIMIT ALL would be
> > needed.
>
> But is there a reason to treat LIMIT ALL differently from no LIMIT
> clause at all?
>

For example,LIMIT ALL means LIMIT 1 for optimizer and means
no LIMIT for executor.
Comments ?

Regards, Hiroshi Inoue.



Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

>   Date: Thursday, October 26, 2000 @ 17:35:49
> Author: tgl
>
> Update of /home/projects/pgsql/cvsroot/pgsql/src/backend/nodes
>      from hub.org:/home/projects/pgsql/tmp/cvs-serv71501/src/backend/nodes
>
> Modified Files:
>         copyfuncs.c outfuncs.c print.c
>
> -----------------------------  Log Message  -----------------------------
>
> Re-implement LIMIT/OFFSET as a plan node type, instead of a hack in
> ExecutorRun.  This allows LIMIT to work in a view.  Also, LIMIT in a
> cursor declaration will behave in a reasonable fashion,

Does "reasonable" mean that LIMIT is treated as optimizer's
hint but doesn't restrict total FETCH counts ?

Regards.
Hiroshi Inoue


Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Yes I want to give optimizer a hint  "return first rows fast".
> When Jan implemented LIMIT first,there was an option
> "LIMIT ALL" and it was exactly designed for the purpose.

Well, we could make that work that way again, I think.  Need to look
at the code, but I think the optimizer could tell the difference between
a LIMIT ALL clause and no limit clause at all.

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Philip Warner
Дата:
At 20:17 26/10/00 -0400, Tom Lane wrote:
>
>Possibly the optimizer should always assume that cursors ought to
>be optimized for fast start, LIMIT or no LIMIT --- does that seem
>like a good idea to you?
>

No to me; I'd vote for an 'OPTIMIZE FOR FAST START' phrase ahead of always
doing a fast start. It also seems natural that a LIMIT clause should (a)
limit the results in all cases and (b) be used by the optimiser in such a
way that it *considers* a fast start approach more carefully.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Philip Warner
Дата:
At 12:11 27/10/00 +0900, Hiroshi Inoue wrote:
>
>For example,LIMIT ALL means LIMIT 1 for optimizer and means
>no LIMIT for executor.
>Comments ?
>

It seems there's two possibilities:

(a) You know you will only use a limited number of rows, but you are not
sure exactly how many. In this case, I'd vote for a 'OPTIMIZE FOR FAST
START' clause.

(b) You really want all rows, in which case you should let the optimizer do
it's stuff. If it fails to work well, then use either 'OPTIMIZE FOR TOTAL
COST' or 'OPTIMIZE FOR FAST START' to change the behaviour.

ISTM that LIMIT ALL is just the syntax for the default limit clause - and
should, if anything, be equivalent to 'OPTIMIZE FOR TOTAL COST'.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Philip Warner
Дата:
At 20:59 26/10/00 -0400, Tom Lane wrote:
>Hiroshi Inoue <Inoue@tpf.co.jp> writes:
>> Yes I want to give optimizer a hint  "return first rows fast".
>> When Jan implemented LIMIT first,there was an option
>> "LIMIT ALL" and it was exactly designed for the purpose.
>
>Well, we could make that work that way again, I think.

I think that would be a *bad* idea. ISTM that the syntax is obtuse for the
meaning it is being given. The (mild) confusion in this thread is evidence
of that, at least.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> begin;
> declare myc cursor for select * from t1 limit all;
> fetch 20 in myc; (the first page)
> ...(interaction)
> fetch 20 in myc; (the next page)
> ..(interaction)
> fetch backward 20 in myc; (the previous page)
> ...

> What I expect here is to get rows of each page in
> an average response time not the total throughput
> of db operation.

Yes, but why should the presence of "limit all" affect that?
It's not apparent to me why the optimizer should treat this
case differently from plain
declare myc cursor for select * from t1;

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
> have ML members seen this discussion ?

The whole thread's only gone to pgsql-committers, which has pretty
small readership AFAIK.

If you want a wider discussion, let's bring it up in pghackers.

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Tom Lane wrote:
> >> Yes, but why should the presence of "limit all" affect that?
> >> It's not apparent to me why the optimizer should treat this
> >> case differently from plain
> >> declare myc cursor for select * from t1;
>
> > Am I misunderstanding ?
> > Doesn't optimizer make the plan for the query
> > "select * for t1" which would use SeqScan
> > in most cases ?
>
> In a plain SELECT, yes.  In a DECLARE CURSOR, it's currently set up
> to prefer indexscans anyway, LIMIT or no LIMIT (see lines 853 ff in
> src/backend/optimizer/plan/planner.c, current sources).

Probably  you mean
        if (parse->isPortal)
            tuple_fraction = 0.10;

Seems 0.10 isn't sufficently small in pretty many cases.
In addtion,SeqScan isn't used even when we want it e.g.
in the case cursors are just used to avoid the exhaution
of memory.


> I think it
> makes sense to have that preference for DECLARE, and what I'm wondering
> is if we need an additional preference when the DECLARE contains a LIMIT
> clause --- and if so, what should that be?
>

I don't think we can specify appropriate LIMIT for cursors.
We could judge if an application needs an average response
time of total throuput.

Regards, Hiroshi Inoue



Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> Yes, but why should the presence of "limit all" affect that?
>> It's not apparent to me why the optimizer should treat this
>> case differently from plain
>> declare myc cursor for select * from t1;

> Am I misunderstanding ?
> Doesn't optimizer make the plan for the query
> "select * for t1" which would use SeqScan
> in most cases ?

In a plain SELECT, yes.  In a DECLARE CURSOR, it's currently set up
to prefer indexscans anyway, LIMIT or no LIMIT (see lines 853 ff in
src/backend/optimizer/plan/planner.c, current sources).  I think it
makes sense to have that preference for DECLARE, and what I'm wondering
is if we need an additional preference when the DECLARE contains a LIMIT
clause --- and if so, what should that be?

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:

Tom Lane wrote:

> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Seems 0.10 isn't sufficently small in pretty many cases.
>
> Agreed, it's an arbitrary number.  But where/how can we get a better
> one?
>

I  recommend the choise of "LIMIT 1" and "no LIMIT" by
some option. Others would have different opinions but
have ML members seen this discussion ? I've received
no mails about this thread from ML.

Regards, Hiroshi Inoue



Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Seems 0.10 isn't sufficently small in pretty many cases.

Agreed, it's an arbitrary number.  But where/how can we get a better
one?

            regards, tom lane

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Hiroshi Inoue
Дата:
Tom Lane wrote:

> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > begin;
> > declare myc cursor for select * from t1 limit all;
> > fetch 20 in myc; (the first page)
> > ...(interaction)
> > fetch 20 in myc; (the next page)
> > ..(interaction)
> > fetch backward 20 in myc; (the previous page)
> > ...
>
> > What I expect here is to get rows of each page in
> > an average response time not the total throughput
> > of db operation.
>
> Yes, but why should the presence of "limit all" affect that?
> It's not apparent to me why the optimizer should treat this
> case differently from plain
> declare myc cursor for select * from t1;
>

Am I misunderstanding ?
Doesn't optimizer make the plan for the query
"select * for t1" which would use SeqScan
in most cases ?

Regards, Hiroshi Inoue



Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Philip Warner
Дата:
At 21:03 28/10/00,  wrote:
>
>In a plain SELECT, yes.  In a DECLARE CURSOR, it's currently set up
>to prefer indexscans anyway, LIMIT or no LIMIT (see lines 853 ff in
>src/backend/optimizer/plan/planner.c, current sources).  I think it
>makes sense to have that preference for DECLARE, and what I'm wondering
>is if we need an additional preference when the DECLARE contains a LIMIT
>clause --- and if so, what should that be?

Do you really think it's not such a good idea to have different optimizer
behaviour for SELECT and DECLARE CURSOR? My expectation is that putting a
SELECT statement inside a cursor should not change it's performance. I'd be
interested to know the reasons for your choice.

The simplest solution would be to use the same logic in both cases; if the
correct behaviour is not obvious to you, then our users will be left second
guessing the optimizer once you start introducing several special cases
(LIMIT in SELECT, LIMIT ALL in SELECT, LIMIT in DECLARE etc etc).


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
Tom Lane
Дата:
Philip Warner <pjw@rhyme.com.au> writes:
> Do you really think it's not such a good idea to have different optimizer
> behaviour for SELECT and DECLARE CURSOR? My expectation is that putting a
> SELECT statement inside a cursor should not change it's performance. I'd be
> interested to know the reasons for your choice.

I think it's an excellent idea to have different behaviors, and the
reason is that we know a stand-alone SELECT will deliver all its result
rows, whereas for DECLARE it's quite possible that not all the possible
result rows will be fetched.  Moreover, the user is likely to fetch the
cursor's results in bite-size chunks, so he will be interested in
average response time as well as total time.

In the proposal as written, LIMIT ALL and LIMIT n will in fact give rise
to identical behavior in both contexts, and it's only the case without
an explicit LIMIT that will behave differently.  (If we add a
SET-variable to control this, you could even make that behavior the same
too, by setting the variable to 1.0.)

            regards, tom lane

RE: pgsql/src/backend/nodes (copyfuncs.c outfuncs.c print.c)

От
"Hiroshi Inoue"
Дата:
Tom,thanks for your good summary.
Seems this is the latest posting for this thread.

> -----Original Message-----
> From: Tom Lane
>
> Philip Warner <pjw@rhyme.com.au> writes:
> > Do you really think it's not such a good idea to have different
> optimizer
> > behaviour for SELECT and DECLARE CURSOR? My expectation is that
> putting a
> > SELECT statement inside a cursor should not change it's
> performance. I'd be
> > interested to know the reasons for your choice.
>
> I think it's an excellent idea to have different behaviors, and the
> reason is that we know a stand-alone SELECT will deliver all its result
> rows, whereas for DECLARE it's quite possible that not all the possible
> result rows will be fetched.  Moreover, the user is likely to fetch the
> cursor's results in bite-size chunks, so he will be interested in
> average response time as well as total time.
>

Cursors have a different character from stand-alone SELECT.
We don't have to FETCH results continuously from cursors.
It's well known that an average response time is significant
in some applications. For example,we could make interactive
paging applications which require a next/prior page(small part
of the result of a query) according to user's request.

There may be more excellent ways to achive it but I don't
know how to do it in PostgreSQL.

Regards.
Hiroshi Inoue