Discussion: Re: WITHIN GROUP patch
2013/10/9 Pavel Stehule <pavel.stehule@gmail.com>
Hello

I checked conformance with ANSI SQL - and I didn't find any issue.

I found that the following error message is not too friendly (mainly because this functionality will be new):
postgres=# select dense_rank(3,3,2) within group (order by num desc, odd) from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
^
postgres=# select dense_rank(3,3,2) within group (order by num desc) from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc) fr...
^
postgres=# select dense_rank(3,3) within group (order by num desc) from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3) within group (order by num desc) from...
^
postgres=# select dense_rank(3,3) within group (order by num desc, num) from test4;
dense_rank
------------
3
(1 row)

Probably some hint should be there?

Regards
Pavel

2013/10/2 Vik Fearing <vik.fearing@dalibo.com>
On 09/30/2013 06:34 PM, Pavel Stehule wrote:
>
> I looked on this patch - it is one from long patches - so I propose to
> divide review to a few parts:
>
> a) a conformance with ANSI SQL
> b) check of new aggregates - semantic, implementation
> c) source code checking - usual patch review
>
> Now I would to work on @a

I had an unexpected emergency come up, sorry about that. I plan on
doing B and C starting on Thursday (October 3).

I am grateful to have Pavel's help, this is a big patch.
--
Vik
2013/10/11 Andrew Gierth <andrew@tao11.riddles.org.uk>
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:
>> I found so following error message is not too friendly (mainly because
>> this functionality will be new)
>>
>> postgres=# select dense_rank(3,3,2) within group (order by num desc, odd)
>> from test4;
>> ERROR: Incorrect number of arguments for hypothetical set function
>> LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
>> ^
>>
>> Probably some hint should be there?

Hm, yeah.
Anyone have ideas for better wording for the original error message,
or a DETAIL or HINT addition? The key point here is that the number of
_hypothetical_ arguments and the number of ordered columns must match,
but we don't currently exclude the possibility that a user-defined
hypothetical set function might have additional non-hypothetical args.
The first alternative that springs to mind is:

ERROR: Incorrect number of arguments for hypothetical set function
DETAIL: Number of hypothetical arguments (3) must equal number of ordered columns (2)

--
Andrew (irc:RhodiumToad)

+1

Pavel
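To make the arity rule concrete - the count of hypothetical arguments has to match the count of ordered columns - here is a reviewer-style sketch of the intended dense_rank() semantics. This is purely illustrative, not code from the patch, and it assumes a single ascending sort direction for brevity:

```python
def hypothetical_dense_rank(hypothetical, rows):
    """Model of dense_rank(a1, ..., ak) WITHIN GROUP (ORDER BY c1, ..., ck):
    rank the hypothetical row as if it were inserted into the sorted
    input.  Ascending order is assumed in this sketch."""
    h = tuple(hypothetical)
    if rows and len(h) != len(rows[0]):
        # the situation behind "Incorrect number of arguments for
        # hypothetical set function"
        raise ValueError("number of hypothetical arguments must equal "
                         "number of ordered columns")
    # dense rank = 1 + number of distinct sort keys strictly preceding
    # the hypothetical row
    return 1 + len({tuple(r) for r in rows if tuple(r) < h})
```

For example, with input keys 1, 2, 2, 4, a hypothetical value of 3 follows two distinct smaller keys, so its dense rank is 3.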
On Thu, Oct 10, 2013 at 10:35 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> The first alternative that springs to mind is:
>
> ERROR: Incorrect number of arguments for hypothetical set function
> DETAIL: Number of hypothetical arguments (3) must equal number of ordered columns (2)

I'd suggest trying to collapse that down into a single message; the DETAIL largely recapitulates the original error message. Also, I'd suggest identifying the name of the function, since people may not be able to identify that easily based just on the fact that it's a hypothetical set function.

Here's one idea, with two variants depending on the direction of the inequality:

ERROR: function "%s" has %d arguments but only %d ordering columns
ERROR: function "%s" has %d ordering columns but only %d arguments

Or else leave out the actual numbers and just state the principle, but identifying the exact function at fault, e.g.

ERROR: number of arguments to function "%s" does not match number of ordering columns

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10/09/2013 04:19 PM, Pavel Stehule wrote:
I checked conformance with ANSI SQL - and I didn't find any issue.

I found that the following error message is not too friendly (mainly because this functionality will be new). Probably some hint should be there?
postgres=# select dense_rank(3,3,2) within group (order by num desc, odd) from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
^
postgres=# select dense_rank(3,3,2) within group (order by num desc) from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3,2) within group (order by num desc) fr...
^
postgres=# select dense_rank(3,3) within group (order by num desc) from test4;
ERROR: Incorrect number of arguments for hypothetical set function
LINE 1: select dense_rank(3,3) within group (order by num desc) from...
^
postgres=# select dense_rank(3,3) within group (order by num desc, num) from test4;
dense_rank
------------
3
(1 row)
In addition to Pavel's review, I have finally finished reading the patch. Here are some notes, mainly on style:
First of all, it no longer compiles on HEAD because commit 4d212bac1752e1bad6f3aa6242061c393ae93a0a stole oid 3968. I modified that locally to be able to continue my review.
Some of the error messages do not comply with project style. That is, they begin with a capital letter.
Ordered set functions cannot have transition functions
Ordered set functions must have final functions
Invalid argument types for hypothetical set function
Invalid argument types for ordered set function
Incompatible change to aggregate definition
Too many arguments to ordered set function
Ordered set finalfns must not be strict
Cannot have multiple ORDER BY clauses with WITHIN GROUP
Cannot have DISTINCT and WITHIN GROUP together
Incorrect number of arguments for hypothetical set function
Incorrect number of direct arguments to ordered set function %s
And in pg_aggregate.c I found a comment with a similar problem that doesn't match its surrounding code:
Oid transsortop = InvalidOid; /* Can be omitted */
I didn't find any more examples like that, but I did see several block comments that weren't complete sentences whereas I think they should be. Also a lot of the code comments say "I" and I don't recall seeing that elsewhere. I may be wrong, but I would prefer if they were more neutral.
The documentation has a number of issues.
collateindex.pl complains of duplicated index entries for PERCENTILE CONTINUOUS and PERCENTILE DISCRETE. This is because the index markup is used for both overloaded versions. This is the same mistake Bruce made and then corrected in commit 5dcc48c2c76cf4b2b17c8e14fe3e588ae0c8eff3.
"if there are multiple equally good result" should have an s on the end in func.sgml.
Table 9-49 has an extra empty column. That should either be removed, or filled in with some kind of comment text like other similar tables.
Apart from that, it looks good. There are some mismatched coding styles in there, but the next run of pgindent should catch them, so it's no big deal.
I haven't yet exercised the actual functionality of the new functions, nor have I tried to create my own. Andrew alerted me to a division by zero bug in one of them, so I'll be looking forward to catching that.
So, more review to come.
-- Vik
On Tue, Oct 15, 2013 at 4:29 PM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> So, more review to come.

Hi All,

Please find attached our latest version of the patch. This version fixes the issues pointed out by the reviewers.

Regards,
Atri

--
Regards,
Atri
l'apprenant
Attachments
On 11/04/2013 08:43 AM, Atri Sharma wrote:
> Please find attached our latest version of the patch. This version
> fixes the issues pointed out by the reviewers.

No, it doesn't. The documentation still contains formatting and grammatical errors, and the code comments still do not match their surroundings. (The use of "I" in the code comments is a point I have conceded on IRC, but I stand by my other remarks.)

Don't bother submitting a new patch until I've posted my technical review, but please fix these issues on your local copy.

--
Vik
Hi all,

Please find the latest version of the patch. This version fixes the issues pointed out by the reviewer and the divide by zero bug in percent_rank function. This version also adds a regression test for the divide by zero case in percent_rank.

Regards,
Atri
Attachments
On Fri, 2013-11-15 at 00:05 +0530, Atri Sharma wrote:
> Please find the latest version of the patch. This version fixes the
> issues pointed out by the reviewer and the divide by zero bug in
> percent_rank function. This version also adds a regression test for
> the divide by zero case in percent_rank.

This patch doesn't apply.
On Mon, Nov 18, 2013 at 9:26 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Fri, 2013-11-15 at 00:05 +0530, Atri Sharma wrote:
>> Please find the latest version of the patch. This version fixes the
>> issues pointed out by the reviewer and the divide by zero bug in
>> percent_rank function. This version also adds a regression test for
>> the divide by zero case in percent_rank.
>
> This patch doesn't apply.

Hi all,

Please find attached the latest patch for WITHIN GROUP. This patch is after fixing the merge conflicts.

Regards,
Atri

--
Regards,
Atri
l'apprenant
Attachments
On 11/21/2013 11:04 AM, Atri Sharma wrote:
> Please find attached the latest patch for WITHIN GROUP. This patch is
> after fixing the merge conflicts.

I have spent quite some time on this and the previous versions. Here is my review, following the questions on the wiki.

This patch is in context diff format and applies cleanly to today's master. It contains several regression tests and, for the most part, good documentation. I would like to see at least one example of using each of the two types of function (hypothetical and inverse distribution) in section 4.2.8.

This patch implements what it says it does. We don't already have a way to get these results without this patch that I know of, and I think we do want it. I certainly want it. I do not have a copy of the SQL standard, but I have full faith in Andrew Gierth's claim that this is part of it. Even if not, implementation details were brought up during design and agreed upon by this list[1].

I don't see how anything here could be dangerous. The custom ordered set functions I made correctly passed a round-trip through dump/restore. The code compiles without warning.

All of the clean tests I did worked as expected, and all of the dirty tests failed elegantly. I did not find any corner cases, and I looked in as many corners as I could think of. I didn't manage to trigger any assertion failures and I didn't crash it. I found no noticeable issues with performance, either directly or as side effects.

I am not the most competent with code review, so I'll be relying on further review by another reviewer or the final committer. The patch fixed the project comments/messages style issues I raised in my previous review. I found the code comments lacking in some places (like inversedistribution.c:mode_final, for example) but I can't say if it really is too terse, or if it's just me. On the other hand, I thought the explanation blocks in the code comments were adequately descriptive.
There is some room for improvement in future versions. The query

select mode() within group (order by x) over (partition by y) from ...

should be valid but isn't. This was explained by Andrew on IRC as being non-trivial: "specifically, we implemented WITHIN GROUP by repurposing the infrastructure already present for agg(distinct ...) and agg(x order by y) which are also not yet supported for aggregate-as-window-function". I assume then that evolution on one side will benefit the other.

All in all, I believe this is ready for committer.

[1] http://www.postgresql.org/message-id/2b8b55b8ba82f83ef4e6070b95fb0cd0%40news-out.riddles.org.uk

--
Vik
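Since mode() figures in the review above, a rough behavioral model may help readers follow along. This sketch is a reviewer's illustration rather than the patch's implementation, and the tie-break rule (smallest value among equally frequent ones) is an assumption here, not documented behavior:

```python
from collections import Counter

def mode_within_group(values):
    """Model of mode() WITHIN GROUP (ORDER BY x): return the most
    frequent non-null input value.  Ties are broken here by taking
    the smallest value; the server's actual choice among equally
    frequent values is an implementation detail."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None  # no non-null inputs
    top = max(counts.values())
    return min(v for v, n in counts.items() if n == top)
```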
>>>>> "Vik" == Vik Fearing <vik.fearing@dalibo.com> writes: Vik> I certainly want it. I do not have a copy of the SQL standard,Vik> but I have full faith in the Andrew Gierth's claimthat this isVik> part of it. For reference, this is how I believe it matches up against the spec (I'm working from the 2008 final): 10.9 <aggregate function>: <hypothetical set function> is intended to be implemented in this patch exactly as per spec. <inverse distribution function>: the spec defines two of these, PERCENTILE_CONT and PERCENTILE_DISC: PERCENTILE_CONT is defined in the spec for numeric types, in which case it returns an approximate numeric result, and forinterval, in which case it returns interval. Our patch defines percentile_cont functions for float8 and interval inputtypes, relying on implicit casting to float8 to handle other numeric input types. As an extension to the spec, we define a percentile_cont function that returns an array of percentile values in one call,as suggested by some users. PERCENTILE_DISC is defined in the spec for the same types as _CONT. Our version on the other hand accepts any type whichcan be sorted, and returns the same type. This does mean that our version may return an exact numeric type rather thanan approximate one, so this is a possible slight deviation from the spec. i.e. our percentile_disc(float8) within group (order by bigint) returns a bigint, while the spec seems to imply it shouldreturn an approximate numeric type (i.e. float*). Again, we additionally provide an array version which is not in the spec. mode() is not in the spec, we just threw it in because it was easy. 6.10 <window function> The spec says that <hypothetical set function> is not allowed as a window function. The spec does not forbid other <ordered set function>s in a window function call, but we have NOT attempted to implementthis (largely for the same reasons that DISTINCT and ORDER BY are not implemented for aggregates as window functions). 
Conformance: all the relevant features are parts of T612, "Advanced OLAP Operations", which we already list in the docs on the unsupported list with the comment "some forms supported". Maybe that could be changed now to "most forms supported", but that's a subjective call (and one we didn't really consider doing in this patch).

--
Andrew (irc:RhodiumToad)
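As a companion to the spec comparison above, the two inverse distribution functions can be modeled roughly as below. This is a sketch only - Python floats stand in for float8, and it ignores nulls and the array-returning variants described above:

```python
import math

def percentile_cont(fraction, values):
    """Model of PERCENTILE_CONT: continuous percentile with linear
    interpolation between adjacent sorted values; result is numeric
    (a float when interpolation actually occurs)."""
    s = sorted(values)
    pos = fraction * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def percentile_disc(fraction, values):
    """Model of PERCENTILE_DISC: the first value in sort order whose
    cumulative position reaches the fraction; the result keeps the
    input type, which is the possible deviation from the spec noted
    above."""
    s = sorted(values)
    idx = max(math.ceil(fraction * len(s)) - 1, 0)
    return s[idx]
```

So percentile_cont(0.5, ...) over the inputs 1, 2, 3, 4 interpolates to 2.5, while percentile_disc(0.5, ...) returns the existing input value 2.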
Atri Sharma <atri.jiit@gmail.com> writes:
> Please find attached the latest patch for WITHIN GROUP. This patch is
> after fixing the merge conflicts.

I've started to look at this patch now. I have a couple of immediate reactions to the catalog changes:

1. I really hate the way you've overloaded the transvalue to do something that has approximately nothing to do with transition state (and haven't updated catalogs.sgml to explain that, either). Seems like it'd be cleaner to just hardwire a bool column that distinguishes regular and hypothetical input rows. And why do you need aggtranssortop for that? I fail to see the point of sorting on the flag column.

2. I also don't care for the definition of aggordnargs, which is the number of direct arguments to an ordered set function, except when it isn't. Rather than overloading it to be both a count and a couple of flags, I wonder whether we shouldn't expand aggisordsetfunc to be a three-way "aggregate kind" field (ordinary agg, ordered set, or hypothetical set function).

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> 1. I really hate the way you've overloaded the transvalue to do
Tom> something that has approximately nothing to do with transition
Tom> state (and haven't updated catalogs.sgml to explain that,
Tom> either). Seems like it'd be cleaner to just hardwire a bool
Tom> column that distinguishes regular and hypothetical input rows.

The intention here is that while the provided functions all fit the spec's idea of how inverse distribution or hypothetical set functions work, the actual implementation mechanisms are more generally applicable than that, and a user-supplied function could well find something else useful to do with them. Accordingly, hardcoding stuff seemed inappropriate.

Tom> And why do you need aggtranssortop for that? I fail to see the
Tom> point of sorting on the flag column.

It is convenient to the implementation to be able to rely on encountering the hypothetical row deterministically before (or in some cases after, as in cume_dist) its peers in the remainder of the sort order. Removing that sort would make the results of the functions incorrect. There should probably be some comments about that. oops.

Tom> 2. I also don't care for the definition of aggordnargs, which is
Tom> the number of direct arguments to an ordered set function,
Tom> except when it isn't. Rather than overloading it to be both a
Tom> count and a couple of flags, I wonder whether we shouldn't
Tom> expand aggisordsetfunc to be a three-way "aggregate kind" field
Tom> (ordinary agg, ordered set, or hypothetical set function).

It would still be overloaded in some sense because a non-hypothetical ordered set function could still take an arbitrary number of args (using variadic "any") - there aren't any provided, but there's no good reason to disallow user-defined functions doing that - so you'd still need a special value like -1 for aggordnargs to handle that.

--
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Tom> 1. I really hate the way you've overloaded the transvalue to do
> Tom> something that has approximately nothing to do with transition
> Tom> state (and haven't updated catalogs.sgml to explain that,
> Tom> either). Seems like it'd be cleaner to just hardwire a bool
> Tom> column that distinguishes regular and hypothetical input rows.

> The intention here is that while the provided functions all fit the
> spec's idea of how inverse distribution or hypothetical set functions
> work, the actual implementation mechanisms are more generally
> applicable than that and a user-supplied function could well find
> something else useful to do with them. Accordingly, hardcoding stuff
> seemed inappropriate.

> Tom> And why do you need aggtranssortop for that? I fail to see the
> Tom> point of sorting on the flag column.

> It is convenient to the implementation to be able to rely on
> encountering the hypothetical row deterministically before (or in some
> cases after, as in cume_dist) its peers in the remainder of the sort
> order. Removing that sort would make the results of the functions
> incorrect.

Well, okay, but you've not said anything that wouldn't be handled just as well by some logic that adds a fixed integer-constant-zero flag column to the rows going into the tuplesort. The function-specific code could then inject hypothetical row(s) with other values, eg 1 or -1, to get the results you describe. If there's any flexibility improvement from allowing a different constant value than zero, or a different datatype than integer, I don't see what it'd be.

On the other side of the coin, repurposing the transition-value catalog infrastructure to do this totally different thing is confusing. And what if someday somebody has a use for a regular transition value along with this stuff? What you've done is unorthogonal for no very good reason.
(Actually, I'm wondering whether it wouldn't be better if the tuplesort object *were* the transition value, and were managed primarily by the aggregate function's code; in particular expecting the agg's transition function to insert rows into the tuplesort object. We could provide helper functions to largely negate any duplication-of-code objections, I'd think. In this view the WITHIN GROUP columns aren't so different from regular aggregate arguments, but there'd need to be some way for the aggregate function to get hold of the sorting-semantics decoration on them.)

> Tom> 2. I also don't care for the definition of aggordnargs, which is
> Tom> the number of direct arguments to an ordered set function,
> Tom> except when it isn't. Rather than overloading it to be both a
> Tom> count and a couple of flags, I wonder whether we shouldn't
> Tom> expand aggisordsetfunc to be a three-way "aggregate kind" field
> Tom> (ordinary agg, ordered set, or hypothetical set function).

> It would still be overloaded in some sense because a non-hypothetical
> ordered set function could still take an arbitrary number of args
> (using variadic "any") - there aren't any provided, but there's no
> good reason to disallow user-defined functions doing that - so you'd
> still need a special value like -1 for aggordnargs to handle that.

Sure. But a -1 to indicate "not applicable" doesn't seem like it's too much of a stretch. It's the -2 business that's bothering me. Again, that seems unnecessarily non-orthogonal --- who's to say which functions would want to constrain the number of direct arguments and which wouldn't? (I wonder whether having this info in the catalogs isn't the wrong thing anyhow, as opposed to expecting the functions themselves to check the argument count at runtime.)

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Well, okay, but you've not said anything that wouldn't be
Tom> handled just as well by some logic that adds a fixed
Tom> integer-constant-zero flag column to the rows going into the
Tom> tuplesort.

Adding such a column unconditionally even for non-hypothetical functions would break the optimization for sorting a single column (which is a big deal, something like 3x speed difference, for by-value types).

Adding it only for hypothetical set functions is making a distinction in how functions are executed that I don't think is warranted - imagine for example a function that calculates some measure over a frequency distribution by adding a known set of boundary values to the sort; this would not be a hypothetical set function in terms of argument processing, but it would still benefit from the extra sort column. I did not want to unnecessarily restrict such possibilities.

>> It would still be overloaded in some sense because a non-hypothetical
>> ordered set function could still take an arbitrary number of args
>> (using variadic "any") - there aren't any provided, but there's no
>> good reason to disallow user-defined functions doing that - so you'd
>> still need a special value like -1 for aggordnargs to handle that.

Tom> Sure. But a -1 to indicate "not applicable" doesn't seem like it's
Tom> too much of a stretch. It's the -2 business that's bothering me.
Tom> Again, that seems unnecessarily non-orthogonal --- who's to say which
Tom> functions would want to constrain the number of direct arguments and
Tom> which wouldn't? (I wonder whether having this info in the catalogs
Tom> isn't the wrong thing anyhow, as opposed to expecting the functions
Tom> themselves to check the argument count at runtime.)

Not checking the number of arguments to a function until runtime seems a bit on the perverse side. Having a fixed number of direct args is the "normal" case (as seen from the fact that all the non-hypothetical ordered set functions in the spec and in our patch have fixed argument counts).

--
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Tom> Well, okay, but you've not said anything that wouldn't be
> Tom> handled just as well by some logic that adds a fixed
> Tom> integer-constant-zero flag column to the rows going into the
> Tom> tuplesort.

> Adding such a column unconditionally even for non-hypothetical
> functions would break the optimization for sorting a single column
> (which is a big deal, something like 3x speed difference, for by-value
> types).

Well, sure, but I was only suggesting adding it when the aggregate asks for it, probably via a new flag column in pg_aggregate. The question you're evading is what additional functionality could be had if the aggregate could demand a different datatype or constant value for the flag column.

> Adding it only for hypothetical set functions is making a distinction
> in how functions are executed that I don't think is warranted -

That seems like rather a curious argument from someone who's willing to give up the ability to specify a regular transition value concurrently with the flag column.

But anyway, what I'm thinking right now is that these questions would all go away if the aggregate transfunction were receiving the rows and sticking them into the tuplestore. It could add whatever columns it felt like.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Well, sure, but I was only suggesting adding it when the
Tom> aggregate asks for it, probably via a new flag column in
Tom> pg_aggregate.

Sure, I was only pointing out the necessity.

Tom> The question you're evading is what additional functionality
Tom> could be had if the aggregate could demand a different datatype
Tom> or constant value for the flag column.

I don't really see a question there to answer - I simply chose to provide a general mechanism rather than make assumptions about what future users of the code would desire. I have no specific application in mind that would require some other type.

>> Adding it only for hypothetical set functions is making a
>> distinction in how functions are executed that I don't think is
>> warranted -

Tom> That seems like rather a curious argument from someone who's
Tom> willing to give up the ability to specify a regular transition
Tom> value concurrently with the flag column.

In the current patch the idea of also specifying a regular transition value is meaningless since there is no transition function.

Tom> But anyway, what I'm thinking right now is that these questions
Tom> would all go away if the aggregate transfunction were receiving
Tom> the rows and sticking them into the tuplestore. It could add
Tom> whatever columns it felt like.

True, but this ends up duplicating the sorting functionality of nodeAgg that we are leveraging off in the first place. I think this will be somewhat more intrusive and likely slower.

--
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Tom> But anyway, what I'm thinking right now is that these questions
> Tom> would all go away if the aggregate transfunction were receiving
> Tom> the rows and sticking them into the tuplestore. It could add
> Tom> whatever columns it felt like.

> True, but this ends up duplicating the sorting functionality of
> nodeAgg that we are leveraging off in the first place. I think this
> will be somewhat more intrusive and likely slower.

Hm, it's just a refactoring of the same code we'd have to have anyway, so I'm not seeing a reason to assume it'd be slower. If anything, this approach would open more opportunities for function-specific optimizations, which in the long run could be faster. (I'm not claiming that any such optimizations would be in the first version.)

In hindsight I wonder if it wasn't a mistake to embed ordered-aggregate support in nodeAgg.c the way we did. We could have dumped that responsibility into some sort of wrapper around specific aggregates, with an option for some aggregates to skip the wrapper and handle it themselves. A trivial, and perhaps not very useful, example is that non-order-sensitive aggregates like MIN/MAX/COUNT could have been coded to simply ignore any ordering request. I can't immediately think of any examples that are compelling enough to justify such a refactoring now --- unless it turns out to make WITHIN GROUP easier.

Anyway, I'm going to go off and look at the WITHIN GROUP patch with these ideas in mind.

regards, tom lane
Further questions about WITHIN GROUP:

I believe that the spec requires that the "direct" arguments of an
inverse distribution or hypothetical-set aggregate must not contain any
Vars of the current query level. They don't manage to say that in plain
English, of course, but in the <hypothetical set function> case the
behavior is defined in terms of this sub-select:

	( SELECT 0, SK1, ..., SKK
	  FROM TNAME UNION ALL
	  VALUES (1, VE1, ..., VEK) )

where SKn are the sort key values, TNAME is an alias for the input
table, and VEn are the direct arguments. Any input-table Vars the
aggregate might refer to are thus in scope only for the SKn, not the
VEn. (This definition also makes it clear that the VEn are to be
evaluated once, not once per row.) In the <inverse distribution
function> case, they resort to claiming that the value of the <inverse
distribution function argument> can be replaced by a numeric literal,
which again makes it clear that it's supposed to be evaluated just
once.

So that's all fine, but the patch seems a bit confused about it. If
you try to cheat, you get an error message that's less than apropos:

regression=# select cume_dist(f1) within group(order by f1) from text_tbl;
ERROR:  column "text_tbl.f1" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select cume_dist(f1) within group(order by f1) from text_tbl...
                         ^

I'd have hoped for a message along the line of "fixed arguments of
cume_dist() must not contain variables", similar to the phrasing we use
when you try to put a variable in LIMIT.

Also, there are some comments and code changes in nodeAgg.c that seem
to be on the wrong page here:

+		/*
+		 * Use the representative input tuple for any references to
+		 * non-aggregated input columns in the qual and tlist.  (If we are not
+		 * grouping, and there are no input rows at all, we will come here
+		 * with an empty firstSlot ... but if not grouping, there can't be any
+		 * references to non-aggregated input columns, so no problem.)
+		 * We do this before finalizing because for ordered set functions,
+		 * finalize_aggregates can evaluate arguments referencing the tuple.
+		 */
+		econtext->ecxt_outertuple = firstSlot;
+

The last two lines of that comment are new; all the rest was moved from
someplace else. Those last two lines are wrong according to the above
reasoning, and if they were correct, the argument made in the
pre-existing part of the comment would be all wet, meaning the code
could in fact crash when trying to evaluate a Var reference in
finalize_aggregates. So I'm inclined to undo this part of the patch
(and anything else that's rearranging logic with an eye to Var
references in finalize_aggregates), and to try to fix parse_agg.c so it
gives a more specific error message for this case.

After looking at this I'm a bit less enthused about the notion of
hybrid aggregates than I was before. I now see that the spec intends
that when the WITHIN GROUP notation is used, *all* the arguments to the
left of WITHIN GROUP are meant to be fixed evaluate-once arguments.
While we could possibly define aggregates for which some of those
argument positions are treated as evaluate-once-per-row arguments, I'm
realizing that that'd likely be pretty darn confusing for users.

What I now think we should do about the added pg_aggregate columns is
to have:

	aggnfixedargs	int	number of fixed arguments, excluding any
				hypothetical ones (always 0 for normal
				aggregates)

	aggkind		char	'n' for normal aggregate,
				'o' for ordered set function,
				'h' for hypothetical-set function

with the parsing rules that given

	agg( n fixed arguments ) WITHIN GROUP ( ORDER BY k sort specifications )

1. WITHIN GROUP is disallowed for normal aggregates.

2. For an ordered set function, n must equal aggnfixedargs. We treat
all n fixed arguments as contributing to the aggregate's result
collation, but ignore the sort arguments.

3. For a hypothetical-set function, n must equal aggnfixedargs plus k,
and we match up type and collation info of the last k fixed arguments
with the corresponding sort arguments. The first n-k fixed arguments
contribute to the aggregate's result collation, the rest not.

Reading back over this email, I see I've gone back and forth between
using the terms "direct args" and "fixed args" for the evaluate-once
stuff to the left of WITHIN GROUP. I guess I'm not really sold on
either terminology, but we need something we can use consistently in
the code and docs. The spec is no help, it has no generic term at all
for these args. Does anybody else have a preference, or maybe another
suggestion entirely?

			regards, tom lane
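[To make rules 2 and 3 above concrete, a sketch of the resulting call
shapes; test4 and num are from the examples earlier in the thread,
while households, income, and label are made-up names for illustration:]

```sql
-- Rule 2: ordered set function with aggnfixedargs = 1, so exactly one
-- fixed argument is allowed regardless of how many sort columns appear:
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY income)
  FROM households;

-- Rule 3: hypothetical-set function with aggnfixedargs = 0, so the
-- number of fixed arguments must equal the number of sort
-- specifications (here n = k = 2):
SELECT rank(3, 'x') WITHIN GROUP (ORDER BY num DESC, label)
  FROM test4;
```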
Tom Lane-2 wrote
> Further questions about WITHIN GROUP:
>
> I believe that the spec requires that the "direct" arguments of an
> inverse or hypothetical-set aggregate must not contain any Vars of the
> current query level.
> [...]
> So that's all fine, but the patch seems a bit confused about it. If you
> try to cheat, you get an error message that's less than apropos:
>
> regression=# select cume_dist(f1) within group(order by f1) from text_tbl;
> ERROR:  column "text_tbl.f1" must appear in the GROUP BY clause or be
> used in an aggregate function
> [...]
> So I'm inclined to undo this part of the patch (and anything else that's
> rearranging logic with an eye to Var references in finalize_aggregates),
> and to try to fix parse_agg.c so it gives a more specific error message
> for this case.

The original design references the spec as allowing a column reference
if it is a group by key:

Select cume_dist(f1) within group ( order by f1 ) from text_tbl group by f1

No example was shown where this would be useful... but nonetheless the
comment and error both presume that such is true. I would presume the
implementation would only supply rows for sorting from the single group
in question, so for practical purposes the columns have a constant
value for the entirety of the evaluation; this makes sense and acts
just like PARTITION BY on a window aggregate.

Question: are there any tests that exercise this desired behavior?
That's assuming agreement can be had that the group by behavior is
indeed spec or something custom we want to support.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Re-WITHIN-GROUP-patch-tp5773851p5782041.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
David Johnston <polobo@yahoo.com> writes:
> The original design references the spec as allowing a column reference
> if it is a group by key:
> Select cume_dist(f1) within group ( order by f1 ) from text_tbl group by f1
> No example was shown where this would be useful...but nonetheless the
> comment and error both presume that such is true.

I can see no support for that position in the spec text, and as you
say, the usefulness is at best really debatable. Unless somebody can
present a credible use-case, or convince me that this is indeed what
the spec says, I'd be inclined to blow this off in favor of having a
more intelligible error message. (I think if you really did need such
functionality, you could get it with a sub-select.)

			regards, tom lane
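[For reference, the sub-select workaround alluded to above might look
something like the following; an untested sketch only, reusing text_tbl
and f1 from the earlier example. The point is that g.f1 is an outer
reference inside the subquery, so it is not a Var of the aggregate's
own query level:]

```sql
-- Evaluate the would-be direct argument once per group by pushing the
-- aggregate into a correlated scalar sub-select:
SELECT (SELECT cume_dist(g.f1) WITHIN GROUP (ORDER BY t.f1)
          FROM text_tbl t)
  FROM text_tbl g
 GROUP BY g.f1;
```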
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Further questions about WITHIN GROUP:

 Tom> I believe that the spec requires that the "direct" arguments of
 Tom> an inverse or hypothetical-set aggregate must not contain any
 Tom> Vars of the current query level.

False. The spec requires that the direct arguments must not contain
_ungrouped_ columns (see <set function specification>), but nowhere are
grouped columns forbidden.

 Tom> They don't manage to say that in plain English, of course, but
 Tom> in the <hypothetical set function> case the behavior is defined
 Tom> in terms of this sub-select:

 Tom> ( SELECT 0, SK1, ..., SKK
 Tom>   FROM TNAME UNION ALL
 Tom>   VALUES (1, VE1, ..., VEK) )

"TNAME" there is not the input table or an alias for it, but rather the
specific subset of rows to which the grouping operation is being
applied (after applying a FILTER if any). We're in the General Rules
here, not the Syntax Rules, so this is describing _how to compute the
result_ rather than a syntactic transformation of the input.

In any event, going by the docs on the web, Oracle does not forbid
grouped columns there (their wording is "This expr must be constant
within each aggregation group."). MSSQL seems to require a literal
constant, but that's obviously not per spec. IBM doesn't seem to have
it in db2 for linux, but some of their other products have it and
include examples of using grouped vars: see

http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_hypothetical_set_family_syntax.html

So I'm going to say that the majority opinion is on my side here.

 Tom> So that's all fine, but the patch seems a bit confused about it.

The patch treats vars in the direct args exactly as it would treat them
anywhere else where they must be ungrouped.

[snip a bunch of stuff based on false assumptions]

 Tom> What I now think we should do about the added pg_aggregate
 Tom> columns is to have:

 Tom> aggnfixedargs  int   number of fixed arguments, excluding any
 Tom>                      hypothetical ones (always 0 for normal aggregates)

 Tom> aggkind        char  'n' for normal aggregate,
 Tom>                      'o' for ordered set function,
 Tom>                      'h' for hypothetical-set function

I don't see any obvious problem with this.

 Tom> with the parsing rules that given

 Tom> agg( n fixed arguments ) WITHIN GROUP ( ORDER BY k sort specifications )

 Tom> 1. WITHIN GROUP is disallowed for normal aggregates.

(This is what the submitted patch does.)

 Tom> 2. For an ordered set function, n must equal aggnfixedargs. We
 Tom> treat all n fixed arguments as contributing to the aggregate's
 Tom> result collation, but ignore the sort arguments.

That doesn't work for getting a sensible collation out of
percentile_disc applied to a collatable type. (Which admittedly is an
extension to the spec, which allows only numeric and interval, but it
seems to me to be worth having.)

 Tom> 3. For a hypothetical-set function, n must equal aggnfixedargs
 Tom> plus k, and we match up type and collation info of the last k
 Tom> fixed arguments with the corresponding sort arguments. The
 Tom> first n-k fixed arguments contribute to the aggregate's result
 Tom> collation, the rest not.

The submitted patch does essentially this, but taking the number of
non-variadic args in place of the suggested aggnfixedargs. Presumably
in your version the latter would be derived from the former?

 Tom> Reading back over this email, I see I've gone back and forth
 Tom> between using the terms "direct args" and "fixed args" for the
 Tom> evaluate-once stuff to the left of WITHIN GROUP. I guess I'm
 Tom> not really sold on either terminology, but we need something we
 Tom> can use consistently in the code and docs. The spec is no help,
 Tom> it has no generic term at all for these args. Does anybody else
 Tom> have a preference, or maybe another suggestion entirely?

We (Atri and I) have been using "direct args", but personally I'm not
amazingly happy with it. Documentation for other dbs tends to just call
them "arguments", and refer to the WITHIN GROUP expressions as
"ordering expressions" or similar.

-- 
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> I believe that the spec requires that the "direct" arguments of
> Tom> an inverse or hypothetical-set aggregate must not contain any
> Tom> Vars of the current query level.

> In any event, going by the docs on the web, Oracle does not forbid
> grouped columns there (their wording is "This expr must be constant
> within each aggregation group.") MSSQL seems to require a literal
> constant, but that's obviously not per spec. IBM doesn't seem to
> have it in db2 for linux, but some of their other products have it
> and include examples of using grouped vars: see
> http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_hypothetical_set_family_syntax.html

OK, that's reasonably convincing. I think we'll need a HINT or
something to clarify the error message, because it sure looks like
those arguments are "used in an aggregate function".

> Tom> 2. For an ordered set function, n must equal aggnfixedargs. We
> Tom> treat all n fixed arguments as contributing to the aggregate's
> Tom> result collation, but ignore the sort arguments.

> That doesn't work for getting a sensible collation out of
> percentile_disc applied to a collatable type. (Which admittedly is an
> extension to the spec, which allows only numeric and interval, but it
> seems to me to be worth having.)

Meh. I don't think you can have that and also have the behavior that
multiple ORDER BY items aren't constrained to have the same collation;
at least not without some rule that amounts to a special case for
percentile_disc, which I'd resist.

> Tom> 3. For a hypothetical-set function, n must equal aggnfixedargs
> Tom> plus k, and we match up type and collation info of the last k
> Tom> fixed arguments with the corresponding sort arguments. The
> Tom> first n-k fixed arguments contribute to the aggregate's result
> Tom> collation, the rest not.

> The submitted patch does essentially this but taking the number of
> non-variadic args in place of the suggested aggnfixedargs. Presumably
> in your version the latter would be derived from the former?

I'm not on board with using variadic vs. non-variadic to determine
this. For example, imagine a hypothetical-set function that for some
reason supports only a single sort column; there would be no reason to
use VARIADIC in its definition, and indeed good reason not to. In any
case, I don't think this behavior should be tied to implementation
details of the representation of the function signature, and IMV
variadic is just that --- particularly for VARIADIC ANY, which is
nothing more than a short-cut for overloading the function name with
different numbers of ANY arguments. Once we've got a match that
involves N direct arguments and K ordering arguments, the behavior
should be determinate without respect to just how we got that match.

> Tom> Reading back over this email, I see I've gone back and forth
> Tom> between using the terms "direct args" and "fixed args" for the
> Tom> evaluate-once stuff to the left of WITHIN GROUP. I guess I'm
> Tom> not really sold on either terminology, but we need something we
> Tom> can use consistently in the code and docs. The spec is no help,
> Tom> it has no generic term at all for these args. Does anybody else
> Tom> have a preference, or maybe another suggestion entirely?

> We (Atri and I) have been using "direct args", but personally I'm not
> amazingly happy with it. Documentation for other dbs tends to just call
> them "arguments", and refer to the WITHIN GROUP expressions as "ordering
> expressions" or similar.

Well, given that I was mistaken to think there could be no Vars at all
in them, "fixed" may not be le mot juste. Unless somebody's got an
alternative to "direct", let's go with that.

			regards, tom lane
I don't especially like the syntax you invented for listing arguments
in CREATE AGGREGATE, especially not the WITHIN GROUP (*) special case.
If we stick with that we're going to need to touch a lot more places
than you have, notably regprocedure. I also fear that this syntax is
not appropriate for identifying aggregates reliably, ie, aggregate
argument lists that look different in this syntax could reduce to
identical pg_proc.proargs lists, and perhaps vice versa. I think we
should just have it list the arguments as they'd appear in pg_proc, and
rely on aggregate properties (to wit, aggkind and aggndirectargs) to
identify ordered-set and hypothetical aggregates.

A slightly different question is what \da ought to print. I can't say
I think that (VARIADIC "any") WITHIN GROUP (*) is going to prove very
helpful to users, but probably just (VARIADIC "any") isn't going to do
either, at least not unless we add an aggregate-kind column to the
printout, and maybe not even then. It might work to cheat by
duplicating the last item if it's variadic:

	(..., VARIADIC "any") WITHIN GROUP (VARIADIC "any")

while if it's not variadic, we'd have to work out which argument
positions correspond to the ordered-set arguments and put them in the
right places.

Regardless of that, though ... what is the reasoning for inventing
pg_get_aggregate_arguments() rather than just making
pg_get_function_arguments() do the right thing? The separate function
seems to accomplish little except complicating life for clients, eg in
psql's describe.c.

			regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Regardless of that, though ... what is the reasoning for
 Tom> inventing pg_get_aggregate_arguments() rather than just making
 Tom> pg_get_function_arguments() do the right thing?

pg_get_function_arguments()'s interface assumes that the caller is
providing the enclosing parens. Changing it would have meant returning
a result like 'float8) WITHIN GROUP (float8' which I reckoned would
have too much chance of breaking existing callers.

-- 
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> Regardless of that, though ... what is the reasoning for
> Tom> inventing pg_get_aggregate_arguments() rather than just making
> Tom> pg_get_function_arguments() do the right thing?

> pg_get_function_arguments()'s interface assumes that the caller is
> providing the enclosing parens. Changing it would have meant returning
> a result like 'float8) WITHIN GROUP (float8' which I reckoned would
> have too much chance of breaking existing callers.

Well, as it stands, existing callers are broken a fortiori; they can't
possibly produce the right output for an ordered-set aggregate, if we
define the "right output" as looking like that. Of course, if we
define the right output as being just the arguments according to
pg_proc, it's fine.

But your point about the parens is a good one anyway, because now that
I think about it, what \da has traditionally printed is

 pg_catalog | sum | bigint | integer | sum as bigint across all integer input values

and I see the patch makes it do this

 pg_catalog | sum | bigint | (integer) | sum as bigint across all integer input values

which I cannot find to be an improvement, especially since \df does not
look like that. (As patched, the output of "\df ran*" is positively
schizophrenic.)

One possibility is to forget all the parens and say that the display
looks like

	type1, type2 WITHIN GROUP type3, type4

Please don't object that that doesn't look exactly like the syntax for
calling the function, because it doesn't anyway --- remember you also
need ORDER BY in the call.

Or perhaps we could abbreviate that --- maybe just WITHIN?
Abbreviation would be a good thing, especially if we're going to say
'VARIADIC "any"' twice in common cases. OTOH I'm not sure we can make
that work as a declaration syntax without reserving WITHIN. I think
WITHIN GROUP would work, though I've not tried to see if bison likes
it.

Anyway, the long and the short of this is that the SQL committee's
bizarre choice of syntax for calling these functions should not be
followed too slavishly when declaring them.

			regards, tom lane
I wrote:
> One possibility is to forget all the parens and say that the display
> looks like
>	type1, type2 WITHIN GROUP type3, type4
> Please don't object that that doesn't look exactly like the syntax
> for calling the function, because it doesn't anyway --- remember you
> also need ORDER BY in the call.

Actually, now that I think of it, why not use this syntax for
declaration and display purposes:

	type1, type2 ORDER BY type3, type4

This has nearly as much relationship to the actual calling syntax as
the WITHIN GROUP proposal does, and it's hugely saner from a semantic
standpoint, because after all the ordering columns are ordering
columns, not grouping columns.

			regards, tom lane
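[Under this proposal a declaration might look something like the
following; an illustrative sketch only, since this syntax is merely
proposed here, and the support-function names are placeholders, not
anything from the patch:]

```sql
-- Proposed declaration syntax: direct argument types, then ORDER BY
-- and the sort argument types, with no extra parentheses:
CREATE AGGREGATE percentile_disc(float8 ORDER BY anyelement) (
    SFUNC     = ordered_set_transition,    -- placeholder support function
    STYPE     = internal,
    FINALFUNC = percentile_disc_final      -- placeholder support function
);

-- which \da would then display as:
--   float8 ORDER BY anyelement
```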
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> Please don't object that that doesn't look exactly like the syntax
 >> for calling the function, because it doesn't anyway --- remember
 >> you also need ORDER BY in the call.

 Tom> Actually, now that I think of it, why not use this syntax for
 Tom> declaration and display purposes:

 Tom> type1, type2 ORDER BY type3, type4

 Tom> This has nearly as much relationship to the actual calling
 Tom> syntax as the WITHIN GROUP proposal does,

But unfortunately it looks exactly like the calling sequence for a
normal aggregate with an order by clause - I really think that is
potentially too much confusion. (It's one thing not to look like the
calling syntax, it's another to look exactly like a valid calling
sequence for doing something _different_.)

-- 
Andrew (irc:RhodiumToad)
Andrew Gierth wrote
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> >> Please don't object that that doesn't look exactly like the syntax
> >> for calling the function, because it doesn't anyway --- remember
> >> you also need ORDER BY in the call.
>
> Tom> Actually, now that I think of it, why not use this syntax for
> Tom> declaration and display purposes:
>
> Tom> type1, type2 ORDER BY type3, type4
>
> Tom> This has nearly as much relationship to the actual calling
> Tom> syntax as the WITHIN GROUP proposal does,
>
> But unfortunately it looks exactly like the calling sequence for a
> normal aggregate with an order by clause - I really think that is
> potentially too much confusion. (It's one thing not to look like
> the calling syntax, it's another to look exactly like a valid
> calling sequence for doing something _different_.)
>
> --
> Andrew (irc:RhodiumToad)

How about:

	type1, type2 GROUP ORDER type3, type4

or:

	GROUP type1, type2 ORDER type3, type4

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Re-WITHIN-GROUP-patch-tp5773851p5782202.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> Actually, now that I think of it, why not use this syntax for
> Tom> declaration and display purposes:
> Tom> type1, type2 ORDER BY type3, type4

> But unfortunately it looks exactly like the calling sequence for a
> normal aggregate with an order by clause - I really think that is
> potentially too much confusion.

I thought about that too, but really that ship sailed long ago, and it
went to sea under the SQL committee's captaincy, so it's not our fault.
There are already at least four different standards-blessed ways you
can use ORDER BY in a query, some of them quite nearby (eg window
functions); so the potential for confusion is there no matter what we
do. In this case, if we describe ordered-set aggregates using WITHIN
GROUP rather than ORDER BY, we might avoid confusion with the
normal-aggregate case, but instead we will have confusion about what
the arguments even do. Is semantic confusion better than syntactic
confusion?

Another thing to think about here is to wonder why the committee chose
anything as verbose as "agg(...) WITHIN GROUP (ORDER BY ...)" in the
first place. The words ORDER BY certainly seem pretty unnecessary.
I'm suspicious that they might've been leaving the door open to put
other things into the second set of parens later --- GROUP BY, maybe?
So down the road, we might regret it if we key off WITHIN GROUP and
not ORDER BY.

Having said that, I'm not so dead set on it that I won't take WITHIN
GROUP if that's what more people want. But we gotta lose the extra
parens; they are just too strange for function
declaration/documentation purposes.

			regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> pg_get_function_arguments()'s interface assumes that the caller is
 >> providing the enclosing parens. Changing it would have meant
 >> returning a result like 'float8) WITHIN GROUP (float8' which I
 >> reckoned would have too much chance of breaking existing callers.

 Tom> Well, as it stands, existing callers are broken a fortiori;

Not in most cases, because using the output of
pg_get_function_arguments works in all contexts except for constructing
the CREATE AGGREGATE statement (for example, a function that uses the
pg_get_function_arguments output to construct GRANTs or ALTER OWNERs
would still work). And since constructing a correct CREATE AGGREGATE
statement for an ordered set function requires that the caller know
about the additional options to supply in the body, this does not seem
like a restriction.

 Tom> Of course, if we define the right output as being just the
 Tom> arguments according to pg_proc, it's fine.

The patch accepts the 'just the arguments according to pg_proc' form as
input everywhere except in CREATE AGGREGATE, in addition to the syntax
with explicit WITHIN GROUP. (With the exception of GRANT, as it
happens, where since there is no existing GRANT ON AGGREGATE variant
and the patch doesn't add one, only the pg_proc arguments form is
accepted.)

 Tom> But your point about the parens is a good one anyway, because
 Tom> now that I think about it, what \da has traditionally printed is

 Tom> pg_catalog | sum | bigint | integer | sum as bigint across
 Tom> all integer input values

 Tom> and I see the patch makes it do this

 Tom> pg_catalog | sum | bigint | (integer) | sum as bigint acro

 Tom> which I cannot find to be an improvement, especially since \df
 Tom> does not look like that. (As patched, the output of "\df ran*"
 Tom> is positively schizophrenic.)

Since I don't particularly trust my own judgement on aesthetics, I used
the feedback I got from others when deciding what to do. Frankly I
think this one needs wider input than just you and me arguing over it.

-- 
Andrew (irc:RhodiumToad)
On 12/06/2013 01:30 PM, Andrew Gierth wrote:
> Since I don't particularly trust my own judgement on aesthetics, I used
> the feedback I got from others when deciding what to do. Frankly I think
> this one needs wider input than just you and me arguing over it.

Can someone paste examples of the two syntax alternatives we're talking
about here? I've lost track.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Another thing to think about here is to wonder why the committee
 Tom> chose anything as verbose as "agg(...) WITHIN GROUP (ORDER BY
 Tom> ...)" in the first place. The words ORDER BY certainly seem
 Tom> pretty unnecessary.

All of the ordered-set functions that the spec defines are intimately
tied to ordering of values, and they allow things like DESC ordering to
affect the results, so it seems obvious to me that they used (ORDER BY
...) because what follows is both syntactically and semantically
similar to any other use of ORDER BY. (Similar logic seems to have been
used for "FILTER (WHERE ...)".) (The name "ordered set function" also
seems quite specific.)

So I don't think there's any greater chance of the spec people adding a
WITHIN GROUP (something_other_than_order_by) construct than of them
adding any other awkward piece of syntax - and if they did, it would
represent an entirely new class of aggregate functions, separate from
ordered set functions and no doubt with its own headaches.

(I realize that expecting sanity from the spec writers is perhaps
unwise.)

-- 
Andrew (irc:RhodiumToad)
Josh Berkus <josh@agliodbs.com> writes:
> Can someone paste examples of the two syntax alternatives we're talking
> about here? I've lost track.

I'll leave it to Andrew to describe/defend exactly what his patch is
doing. The alternative I'm thinking about is that in CREATE AGGREGATE
as well as \da output, the argument list of an ordered-set aggregate
would look like

	type1, type2, ... ORDER BY type3, type4, ...

if the aggregate only permits a fixed number of ordering columns (as
mode() does for example). If it permits a variable number of ordering
columns, you could have

	type1, type2, ... ORDER BY [ type3, type4, ... ] VARIADIC something

99% of the time the right-hand part would just be "VARIADIC ANY" but if
an aggregate had need to lock down the ordering column types, or the
leading ordering column types, it could do that. If it needs a
variable number of direct arguments as well (which in particular
hypothetical set functions do) then you would write

	[ type1, type2, ... ] VARIADIC something ORDER BY VARIADIC something

where we constrain the two "somethings" to be the same. (Again, these
would *usually* be ANY, but I can envision that an aggregate might want
to constrain the argument types more than that.) We have to constrain
this case because the underlying pg_proc representation and parser
function lookup code only allow the last argument to be declared
VARIADIC. So under the hood, this last case would be represented in
pg_proc with proargs looking like just "[ type1, type2, ... ] VARIADIC
something", whereas in the first two cases the proargs representation
would contain all the same entries as above.

We could substitute something else for ORDER BY without doing a lot of
violence to this, if people are really down on that aspect.

			regards, tom lane
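[To make the three shapes concrete, here is how some of the aggregates
discussed in this thread might display under this scheme; an
illustrative sketch only, with the type lists inferred from the thread
rather than taken from the patch:]

```sql
-- Fixed number of ordering columns:
--   mode():            ORDER BY anyelement
--   percentile_disc(): float8 ORDER BY anyelement

-- Variable number of ordering columns, fixed direct arguments
-- (some_agg is a made-up name):
--   some_agg():        integer ORDER BY VARIADIC "any"

-- Variable direct arguments matched to variable ordering columns
-- (hypothetical-set functions):
--   rank():            VARIADIC "any" ORDER BY VARIADIC "any"
```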
>>>>> "Josh" == Josh Berkus <josh@agliodbs.com> writes:

>> Since I don't particularly trust my own judgement on aesthetics, I
>> used the feedback I got from others when deciding what to do.
>> Frankly I think this one needs wider input than just you and me
>> arguing over it.

Josh> Can someone paste examples of the two syntax alternatives we're
Josh> talking about here? I've lost track.

OK. The starting point is that this is the calling syntax for ordered set funcs, set by the spec:

    SELECT func(foo) WITHIN GROUP (ORDER BY bar) FROM ...

where "foo" and "bar" might be fixed types, or polymorphic or variadic. So we need to define (with no help from the spec) at least these:

- what syntax is used in CREATE AGGREGATE to specify the number and types of parameters for a newly defined "func"
- what syntax is used to refer to the function in a GRANT ... ON FUNCTION ... statement, or ALTER ... OWNER TO ...
- what should ::regprocedure casts accept as input and produce as output
- what output is shown in \df and \da when listing the function's argument types

The patch as submitted answers those questions as follows:

    CREATE AGGREGATE func(integer) WITHIN GROUP (text) ...

    GRANT ... ON FUNCTION func(integer,text) ...
    (there is no GRANT ... ON AGGREGATE ... (yet))

    ALTER AGGREGATE func(integer) WITHIN GROUP (text) OWNER TO ...
    ALTER AGGREGATE func(integer,text) OWNER TO ...
    ALTER FUNCTION func(integer,text) OWNER TO ...
    (all three of those are equivalent)

regprocedure outputs 'func(integer,text)' and accepts only that as input.

postgres=# \df func
                          List of functions
 Schema | Name | Result data type |      Argument data types      | Type
--------+------+------------------+-------------------------------+------
 public | func | text             | (integer) WITHIN GROUP (text) | agg
(1 row)

If there's also a func() that isn't an ordered set function, you get output like this (which provoked Tom's "schizophrenic" comment):

postgres=# \df ran[a-z]{1,5}
                                   List of functions
   Schema   |   Name   | Result data type |        Argument data types        |  Type
------------+----------+------------------+-----------------------------------+--------
 pg_catalog | random   | double precision |                                   | normal
 pg_catalog | rangesel | double precision | internal, oid, internal, integer  | normal
 pg_catalog | rank     | bigint           |                                   | window
 pg_catalog | rank     | bigint           | (VARIADIC "any") WITHIN GROUP (*) | agg
(4 rows)

-- Andrew (irc:RhodiumToad)
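As background for the hypothetical-set functions shown in that \df output, the core semantics of a hypothetical dense_rank can be sketched in a few lines. This is illustrative Python for the single-sort-column, ascending case only, not the server's C implementation; the function name is ours:

```python
def hypothetical_dense_rank(hypothetical_value, values):
    """dense_rank of hypothetical_value if it were inserted into the
    sorted input: one plus the number of distinct values that sort
    strictly before it (peers share a rank, with no gaps)."""
    return 1 + sum(1 for v in sorted(set(values)) if v < hypothetical_value)
```

For example, a hypothetical value of 25 ranked against input values 10, 20, 30 lands in the third distinct position.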
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> The patch as submitted answers those questions as follows:
> CREATE AGGREGATE func(integer) WITHIN GROUP (text) ...

You've glossed over a significant amount of complexity, as shown by your example that prints WITHIN GROUP (*), a syntax that you've not defined here.

In the long run we might think it worthwhile to actually store two separate arglists for ordered-set aggregates; probably, pg_proc.proargs would just describe the direct arguments and there'd be a second oidvector in pg_aggregate that would describe the ORDER BY arguments. This'd let them be independently VARIADIC, or not. I'm not proposing we do that right now, because we don't have any use-cases that aren't sufficiently handled by the hack of letting a single VARIADIC ANY entry cover both sets of arguments. I'd like though that the external syntax not be something that prevents that from ever happening, and I'm afraid that this (*) business is cheating enough to be a problem.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> 2. For an ordered set function, n must equal aggnfixedargs. We
Tom> treat all n fixed arguments as contributing to the aggregate's
Tom> result collation, but ignore the sort arguments.

>> That doesn't work for getting a sensible collation out of
>> percentile_disc applied to a collatable type. (Which admittedly is
>> an extension to the spec, which allows only numeric and interval,
>> but it seems to me to be worth having.)

Tom> Meh. I don't think you can have that and also have the behavior
Tom> that multiple ORDER BY items aren't constrained to have the same
Tom> collation; at least not without some rule that amounts to a
Tom> special case for percentile_disc, which I'd resist.

What the submitted patch does (as discussed in the comment in parse_collate) is to treat the sort argument as contributing to the collation only if there is exactly one sort arg. Consider a construct like:

    select max(common_val)
      from (select mode() within group (order by textcol) as common_val
              from ... group by othercol) s;

(the same arguments for percentile_disc also apply to mode() and to any other ordered set function that returns a value chosen from its input sorted set)

Having this sort of thing not preserve the collation of textcol (or fail) would be, IMO, surprising and undesirable.

-- Andrew (irc:RhodiumToad)
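For reference, the "returns a value chosen from its input sorted set" behavior of mode() and percentile_disc() mentioned above can be sketched as follows. This is illustrative Python, not the server code; in particular the mode tie-break shown here (lowest-sorting value) is a deterministic stand-in, whereas the server makes an implementation-dependent choice among equally frequent values:

```python
import math
from collections import Counter

def mode(values):
    # most frequent input value; take the first-sorting one on ties
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)

def percentile_disc(fraction, values):
    # first value in sort order whose cumulative share of the input
    # reaches the requested fraction (a discrete percentile: the
    # result is always an actual input value)
    ordered = sorted(values)
    idx = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[idx]
```

Because the result is literally one of the input values, preserving the input's collation in the result is what makes the max(common_val) example above behave sensibly.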
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> Meh. I don't think you can have that and also have the behavior
> Tom> that multiple ORDER BY items aren't constrained to have the same
> Tom> collation; at least not without some rule that amounts to a
> Tom> special case for percentile_disc, which I'd resist.
> What the submitted patch does (as discussed in the comment in
> parse_collate) is to treat the sort argument as contributing to the
> collation only if there is exactly one sort arg.

Blech :-( Not wanting to consider the sort args when there's more than one doesn't square with forcing them to be considered when there's just one. It's the same aggregate after all, and in general the semantics ought to be the same whether you give one sort col or three.

We might be able to make this work sanely by considering only argument positions that were declared something other than ANY (anyelement being other than that, so this isn't leaving polymorphics in the cold entirely). This is a bit unlike the normal rules for collation assignment but it doesn't result in weird semantics changes depending on how many sort columns you supply.

> Consider a construct like:
> select max(common_val)
> from (select mode() within group (order by textcol) as common_val
> from ... group by othercol) s;

AFAICT none of the SQL-spec aggregates expose the kind of case I'm worried about, because none of the ones that can take multiple sort columns have a potentially collatable return type. I don't think that justifies not thinking ahead, though.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Not wanting to consider the sort args when there's more than one
Tom> doesn't square with forcing them to be considered when there's
Tom> just one. It's the same aggregate after all,

This logic is only applied in the patch to aggregates that _aren't_ hypothetical.

(thinking out loud:) It might be more consistent to also add the condition that the single sort column not be a variadic arg. And/or the condition that it be the same type as the result. Or have a flag in pg_aggregate to say "this agg returns one of its sorted input values, so preserve the collation".

>> Consider a construct like:
>> select max(common_val)
>> from (select mode() within group (order by textcol) as common_val
>> from ... group by othercol) s;

Tom> AFAICT none of the SQL-spec aggregates expose the kind of case
Tom> I'm worried about, because none of the ones that can take
Tom> multiple sort columns have a potentially collatable return type.

None of the spec's ordered-set functions expose any collation issue at all, because they _all_ have non-collatable return types, period. The problem only arises from the desire to make functions like percentile_disc and mode applicable to collatable types.

-- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> I believe that the spec requires that the "direct" arguments of
> Tom> an inverse or hypothetical-set aggregate must not contain any
> Tom> Vars of the current query level.
> False.

After examining this more closely, ISTM that the direct arguments are supposed to be processed as if they weren't inside an aggregate call at all. That being the case, isn't it flat out wrong for check_agg_arguments() to be examining the agg_ordset list? It should ignore those expressions whilst determining the aggregate's semantic level. As an example, an upper-level Var in those expressions isn't grounds for deciding that the aggregate isn't of the current query level.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> After examining this more closely, ISTM that the direct
Tom> arguments are supposed to be processed as if they weren't inside
Tom> an aggregate call at all. That being the case, isn't it flat
Tom> out wrong for check_agg_arguments() to be examining the
Tom> agg_ordset list? It should ignore those expressions whilst
Tom> determining the aggregate's semantic level. As an example, an
Tom> upper-level Var in those expressions isn't grounds for deciding
Tom> that the aggregate isn't of the current query level.

Hmm... yes, you're probably right; but we'd still have to check somewhere for improper nesting, no? since not even the direct args are allowed to contain nested aggregate calls.

-- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> After examining this more closely, ISTM that the direct
> Tom> arguments are supposed to be processed as if they weren't inside
> Tom> an aggregate call at all. That being the case, isn't it flat
> Tom> out wrong for check_agg_arguments() to be examining the
> Tom> agg_ordset list? It should ignore those expressions whilst
> Tom> determining the aggregate's semantic level. As an example, an
> Tom> upper-level Var in those expressions isn't grounds for deciding
> Tom> that the aggregate isn't of the current query level.
> Hmm... yes, you're probably right; but we'd still have to check somewhere
> for improper nesting, no? since not even the direct args are allowed to
> contain nested aggregate calls.

Yeah, I had come to that same conclusion while making the changes; actually, check_agg_arguments needs to look for aggs but not vars there.

In principle we could probably support aggs there if we really wanted to; but it would result in evaluation-order dependencies among the aggs of a single query level, which doesn't seem like something we want to take on for a feature that's not required by spec.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>> Hmm... yes, you're probably right; but we'd still have to check
>> somewhere for improper nesting, no? since not even the direct args
>> are allowed to contain nested aggregate calls.

Tom> Yeah, I had come to that same conclusion while making the
Tom> changes; actually, check_agg_arguments needs to look for aggs
Tom> but not vars there.

There's also the question of ungrouped vars, maybe. Consider these two queries:

    select array(select a+sum(x) from (values (0.3),(0.7)) v(a) group by a)
      from generate_series(1,5) g(x);

    select array(select percentile_disc(a) within group (order by x)
                   from (values (0.3),(0.7)) v(a) group by a)
      from generate_series(1,5) g(x);

In both cases the aggregation query is the outer one; but while the first can return a value, I think the second one has to fail (at least I can't see any reasonable way of executing it).

-- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> There's also the question of ungrouped vars, maybe. Consider these two
> queries:
> select array(select a+sum(x) from (values (0.3),(0.7)) v(a) group by a)
> from generate_series(1,5) g(x);
> select array(select percentile_disc(a) within group (order by x)
> from (values (0.3),(0.7)) v(a) group by a)
> from generate_series(1,5) g(x);
> In both cases the aggregation query is the outer one; but while the first
> can return a value, I think the second one has to fail (at least I can't
> see any reasonable way of executing it).

Hm, interesting. So having decided that the agg has level 1, we need to reject any level-0 vars in the direct parameters, grouped or not.

We could alternatively decide that the agg has level 0, but that doesn't seem terribly useful, and I think it's not per spec either. SQL:2008 section 6.9 <set function specification> seems pretty clear that only aggregated arguments should be considered when determining the semantic level of an aggregate. OTOH, I don't see any text there restricting what can be in the non-aggregated arguments, so maybe the committee thinks this case is sensible? Or they just missed it.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> We could alternatively decide that the agg has level 0, but that
Tom> doesn't seem terribly useful, and I think it's not per spec
Tom> either. SQL:2008 section 6.9 <set function specification> seems
Tom> pretty clear that only aggregated arguments should be considered
Tom> when determining the semantic level of an aggregate. OTOH, I
Tom> don't see any text there restricting what can be in the
Tom> non-aggregated arguments, so maybe the committee thinks this
Tom> case is sensible? Or they just missed it.

My bet is that they missed it, because there's another obvious oversight there; it doesn't define column references in the FILTER clause applied to an ordered set function as being aggregated column references, yet it's clear that this must be the case (since they filter the set of rows that the aggregated column references refer to).

-- Andrew (irc:RhodiumToad)
On 11/21/13, 5:04 AM, Atri Sharma wrote:
> Please find attached the latest patch for WITHIN GROUP. This patch is
> after fixing the merge conflicts.

I would like to see more explanations and examples in the documentation. You introduce this feature with "Ordered set functions compute a single result from an ordered set of input values." But string_agg, for example, does that as well, so it's not clear how this is different. Between ordered aggregates, window functions, and this new feature, it can get pretty confusing. Also, the "hypothetical" part should perhaps be explained in more detail.

The tutorial part of the documentation contains a nice introduction to window functions. I suggest you add something like that as well.

In func.sgml, please list the functions in alphabetical order. Also, don't write "should" when you mean "must".
[ still hacking away at this patch ]

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> Not wanting to consider the sort args when there's more than one
> Tom> doesn't square with forcing them to be considered when there's
> Tom> just one. It's the same aggregate after all,
> This logic is only applied in the patch to aggregates that _aren't_
> hypothetical.
> (thinking out loud:) It might be more consistent to also add the
> condition that the single sort column not be a variadic arg. And/or
> the condition that it be the same type as the result. Or have a flag
> in pg_aggregate to say "this agg returns one of its sorted input
> values, so preserve the collation".

I eventually decided that we were overthinking this problem. At least for regular ordered-set aggregates, we can just deem that the collation of the aggregate is indeterminate unless all the inputs (both direct and aggregated) agree on the collation to use. This gives us the right answer for all the standard aggregates, which have at most one collatable input, and it's very unclear that anything more complicated would be an improvement. I definitely didn't like the hack that was in there that'd force a sort column to absorb a possibly-unrelated collation.

For hypotheticals, I agree after reading the spec text that we're supposed to unify the collations of matching hypothetical and aggregated arguments to determine the collation to use for sorting that column.

I see that the patch just leaves these columns out of the determination of the aggregate's result collation. That's okay for the time being at least, since we have no hypotheticals with collatable output types, but I wonder if anyone wants to argue for some other rule (and if so, what)?

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> I eventually decided that we were overthinking this problem. At
Tom> least for regular ordered-set aggregates, we can just deem that
Tom> the collation of the aggregate is indeterminate unless all the
Tom> inputs (both direct and aggregated) agree on the collation to
Tom> use. This gives us the right answer for all the standard
Tom> aggregates, which have at most one collatable input, and it's
Tom> very unclear that anything more complicated would be an
Tom> improvement. I definitely didn't like the hack that was in
Tom> there that'd force a sort column to absorb a possibly-unrelated
Tom> collation.

Yeah, I can go along with that, but see below.

Tom> For hypotheticals, I agree after reading the spec text that
Tom> we're supposed to unify the collations of matching hypothetical
Tom> and aggregated arguments to determine the collation to use for
Tom> sorting that column.

Yeah, the spec seemed clear enough on that.

Tom> I see that the patch just leaves these columns out of the
Tom> determination of the aggregate's result collation. That's okay
Tom> for the time being at least, since we have no hypotheticals with
Tom> collatable output types, but I wonder if anyone wants to argue
Tom> for some other rule (and if so, what)?

Any alternative seems a bit ad-hoc to me. The examples I've thought of which would return collatable types are all ones that would be implemented as plain ordered set functions even if their logic was in some sense hypothetical. For example you could envisage a value_prec(x) within group (order by y) that returns the value of y which sorts immediately before x, but this would just be declared as value_prec(anyelement) within group (anyelement) rather than engaging the hypothetical argument stuff. (It's this sort of thing that suggested pushing down the collation into the sort column even for non-hypothetical ordered set functions.)

-- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> The examples I've thought of which would return collatable types are
> all ones that would be implemented as plain ordered set functions even
> if their logic was in some sense hypothetical. For example you could
> envisage a value_prec(x) within group (order by y) that returns the
> value of y which sorts immediately before x, but this would just be
> declared as value_prec(anyelement) within group (anyelement) rather
> than engaging the hypothetical argument stuff. (It's this sort of
> thing that suggested pushing down the collation into the sort column
> even for non-hypothetical ordered set functions.)

Meh. I see no very good reason that we shouldn't insist that the collation be set on the sort column rather than the other input. That is, if the alternatives are

    value_prec(x collate "foo") within group (order by y)
    value_prec(x) within group (order by y collate "foo")

which one makes it clearer that the ordering is to use collation foo? The first one is at best unnatural, and at worst action-at-a-distance. If you think of the sorting as an operation done in advance of applying value_prec(), which is something the syntax rather encourages you to think, there's no reason that it should absorb a collation from a collate clause attached to a higher-level operation.

So I think the spec authors arguably got it wrong for hypotheticals, and I'm uneager to extrapolate their choice of behavior into situations where we don't know for certain that the collations of two arguments must get unified.

More practically, I'm dubious about your assumption that an aggregate like this needn't be marked hypothetical. I see that use of anyelement would be enough to constrain the types to be the same, but it doesn't ordinarily constrain collations, and I don't think it should start doing so.
So my reaction to this example is not that we should hack the behavior for plain ordered-set aggregates, but that we ought to find a rule that allows result-collation determination for hypotheticals. We speculated upthread about "merge the collations normally, but ignore inputs declared ANY" and "merge the collations normally, but ignore variadic inputs". Either of those would get the job done in this example. I kinda think we should pick one of these rules and move on. regards, tom lane
I wrote: > ... So my reaction to this example is not > that we should hack the behavior for plain ordered-set aggregates, > but that we ought to find a rule that allows result-collation > determination for hypotheticals. We speculated upthread about > "merge the collations normally, but ignore inputs declared ANY" > and "merge the collations normally, but ignore variadic inputs". > Either of those would get the job done in this example. I kinda > think we should pick one of these rules and move on. Or, really, why don't we just do the same thing I'm advocating for the plain-ordered-set case? That is, if there's a single collation applying to all the collatable inputs, that's the collation to use for the aggregate; otherwise it has no determinate collation, and it'll throw an error at runtime if it needs one. We realized long ago that we can't throw most need-a-collation errors at parse time, because the parser lacks information about which functions need to know a collation to use. This seems to be in the same category. regards, tom lane
I wrote:
> Or, really, why don't we just do the same thing I'm advocating for
> the plain-ordered-set case? That is, if there's a single collation
> applying to all the collatable inputs, that's the collation to use
> for the aggregate; otherwise it has no determinate collation, and
> it'll throw an error at runtime if it needs one.

I went and tried to implement this, and realized that it would take some pretty significant rewriting of parse_collate.c, because of examples like this:

    rank(a,b) within group (order by c collate "foo", d collate "bar")

In the current parse_collate logic, it would throw error immediately upon being told to merge the two explicit-COLLATE results. We'd need a way to postpone that error and instead just decide that the rank aggregate's collation is indeterminate. While that's perhaps just a SMOP, it would mean that ordered-set aggregates don't resolve collation the same way as other functions, which pretty much destroys the argument for this approach. What's more, the same problem applies to non-hypothetical ordered-set aggregates, if they've got more than one sortable input column.

What I'm now thinking we want to do is:

1. Non-hypothetical direct args always contribute to determining the agg's collation.

2. Hypothetical and aggregated args contribute to the agg's collation only if the agg is designed so that there is always exactly one aggregated arg (ie, it's non-variadic with one aggregated arg). Otherwise we assign their collations per-sort-column and don't merge them into the aggregate's collation.

This specification ensures that a variadic aggregate doesn't change behavior depending on how many sort columns there happen to be.

regards, tom lane
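The two rules above can be condensed into a small decision function. This is an illustrative Python sketch, not parser code; the function name is ours, and `None` stands in for "indeterminate collation":

```python
def ordered_set_agg_collation(direct_collations, aggregated_collations,
                              variadic):
    """Rule 1: non-hypothetical direct args always contribute.
    Rule 2: aggregated args contribute only when the aggregate always
    has exactly one aggregated arg (i.e. it is non-variadic with a
    single aggregated arg).  A single common collation wins; any
    conflict, or no collatable input at all, leaves the aggregate's
    collation indeterminate (None)."""
    contributing = [c for c in direct_collations if c is not None]
    if not variadic and len(aggregated_collations) == 1:
        contributing += [c for c in aggregated_collations if c is not None]
    distinct = set(contributing)
    return distinct.pop() if len(distinct) == 1 else None
```

Note how a variadic aggregate's result collation depends only on its direct args, so adding or removing sort columns can't change the outcome.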
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> What I'm now thinking we want to do is:

Tom> 1. Non-hypothetical direct args always contribute to determining
Tom> the agg's collation.

Tom> 2. Hypothetical and aggregated args contribute to the agg's
Tom> collation only if the agg is designed so that there is always
Tom> exactly one aggregated arg (ie, it's non-variadic with one
Tom> aggregated arg). Otherwise we assign their collations
Tom> per-sort-column and don't merge them into the aggregate's
Tom> collation.

Tom> This specification ensures that a variadic aggregate doesn't
Tom> change behavior depending on how many sort columns there happen
Tom> to be.

Works for me.

-- Andrew (irc:RhodiumToad)
Atri Sharma <atri.jiit@gmail.com> writes:
> Please find attached the latest patch for WITHIN GROUP. This patch is
> after fixing the merge conflicts.

I've committed this after significant editorialization --- most notably, I pushed control of the sort step into the aggregate support functions. I didn't like the way nodeAgg.c had been hacked up to do it the other way. There's a couple hundred lines of code handling that in orderedsetaggs.c, which is more or less comparable to the amount of code that didn't get added to nodeAgg.c, so I think the argument that the original approach avoided code bloat is bogus.

The main reason I pushed all the new aggregates into a single file (orderedsetaggs.c) was so they could share a private typedef for the transition state value. It's possible that we should expose that struct so that third-party aggregate functions could leverage the existing transition-function infrastructure instead of having to copy-and-paste it. I wasn't sure where to put it though --- maybe a new include file would be needed. Anyway I left the point for future discussion.

regards, tom lane
Sent from my iPad > On 24-Dec-2013, at 2:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Atri Sharma <atri.jiit@gmail.com> writes: >> Please find attached the latest patch for WITHIN GROUP. This patch is >> after fixing the merge conflicts. > > I've committed this after significant editorialization --- most notably, > I pushed control of the sort step into the aggregate support functions. > I didn't like the way nodeAgg.c had been hacked up to do it the other way. > There's a couple hundred lines of code handling that in orderedsetaggs.c, > which is more or less comparable to the amount of code that didn't get > added to nodeAgg.c, so I think the argument that the original approach > avoided code bloat is bogus. > > The main reason I pushed all the new aggregates into a single file > (orderedsetaggs.c) was so they could share a private typedef for the > transition state value. It's possible that we should expose that > struct so that third-party aggregate functions could leverage the > existing transition-function infrastructure instead of having to > copy-and-paste it. I wasn't sure where to put it though --- maybe > a new include file would be needed. Anyway I left the point for > future discussion. > > regards, tom lane Thank you!
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> I've committed this after significant editorialization --- most
Tom> notably, I pushed control of the sort step into the aggregate
Tom> support functions.

Initial tests suggest that your version is ~40% slower than ours on some workloads.

On my system, this query takes ~950ms using our dev branch of the code, and ~1050ms on git master (using \timing in psql for timings, and taking the best of many consecutive runs):

    select count(*)
      from (select percentile_disc(0.5) within group (order by i)
              from generate_series(1,3) i, generate_series(1,100000) j
             group by j) s;

About ~700ms of that is overhead, as tested by running this query with enable_hashagg=false:

    select count(*)
      from (select j from generate_series(1,3) i, generate_series(1,100000) j
             group by j) s;

So your version is taking 350ms for the percentile calculations compared to 250ms for ours.

-- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> I've committed this after significant editorialization --- most
> Tom> notably, I pushed control of the sort step into the aggregate
> Tom> support functions.
> Initial tests suggest that your version is ~40% slower than ours on
> some workloads.

I poked at this a bit with perf and oprofile, and concluded that probably the difference comes from ordered_set_startup() repeating lookups for each group that could be done just once per query. I'm not sure I believe the 40% figure; on this particular test case, oprofile breaks down the costs of ordered_set_startup() like this:

      29   0.0756  postgres  advance_transition_function
   38307  99.9244  postgres  ordered_set_transition
    1445   1.0808  postgres  ordered_set_startup
   31418  79.4990  postgres  tuplesort_begin_datum
    4056  10.2632  postgres  get_typlenbyvalalign
    1445   3.6564  postgres  ordered_set_startup [self]
     734   1.8573  postgres  MemoryContextAllocZero
     510   1.2905  postgres  RegisterExprContextCallback
     283   0.7161  postgres  exprType
     247   0.6250  postgres  get_sortgroupclause_tle
     235   0.5946  postgres  exprCollation
      92   0.2328  postgres  ReleaseSysCache
      78   0.1974  postgres  SearchSysCache
      71   0.1797  postgres  AggGetAggref
      63   0.1594  postgres  AggCheckCallContext
      58   0.1468  postgres  AllocSetAlloc
      46   0.1164  postgres  PrepareSortSupportFromOrderingOp
      43   0.1088  postgres  AggGetPerAggEContext
      40   0.1012  postgres  get_typlenbyval
      39   0.0987  postgres  palloc0
      35   0.0886  postgres  MemoryContextAlloc
      17   0.0430  postgres  get_sortgroupref_tle
      10   0.0253  postgres  tuplesort_begin_common

The tuplesort_begin_datum calls are needed regardless --- your version was just doing them inside nodeAgg rather than externally. However, get_typlenbyvalalign and some of the other calls here are to fetch information that doesn't change across groups; probably we could arrange to cache that info instead of recomputing it each time.
Still, it doesn't look like that could save more than 20% of ordered_set_startup, which itself is still only a few percent of the total query time.

Looking at this example makes me wonder if it wouldn't be worthwhile to provide a way to reset and re-use a tuplesort object, instead of redoing all the lookup work involved. Or maybe just find a way to cache the catalog lookups that are happening inside tuplesort_begin_datum, which are about 50% of that function's cost it looks like. We're paying this same kind of price for repeated tuplesort setup in the existing nodeAgg code, if we have an aggregate with ORDER BY or DISTINCT in a grouped query with many groups.

regards, tom lane
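The reset-and-reuse idea can be sketched in miniature: pay the expensive setup once, and between groups drop only the accumulated rows. This is an illustrative Python sketch (the class name is ours; a real implementation would cache the sort-support and catalog lookups the way tuplesort_begin_datum would):

```python
class ReusableSorter:
    """Caches one-time setup (here, just the key function, standing in
    for cached sort-support info) so that per-group use only pays for
    the data --- mimicking a resettable tuplesort object."""

    def __init__(self, key=None):
        self.key = key      # expensive setup happens once, here
        self.rows = []

    def put(self, row):
        self.rows.append(row)

    def finish(self):
        return sorted(self.rows, key=self.key)

    def reset(self):
        # keep the setup, discard only the accumulated rows
        self.rows.clear()

# one sorter serves every group in turn
sorter = ReusableSorter()
results = []
for group in [[3, 1, 2], [9, 7, 8]]:
    for row in group:
        sorter.put(row)
    results.append(sorter.finish())
    sorter.reset()
```

The point of the design is that the constructor cost is amortized over all groups instead of being paid once per group.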
On Sun, Jan 5, 2014 at 12:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Looking at this example makes me wonder if it wouldn't be worthwhile to
> provide a way to reset and re-use a tuplesort object, instead of redoing
> all the lookup work involved. Or maybe just find a way to cache the
> catalog lookups that are happening inside tuplesort_begin_datum, which are
> about 50% of that function's cost it looks like. We're paying this same
> kind of price for repeated tuplesort setup in the existing nodeAgg code,
> if we have an aggregate with ORDER BY or DISTINCT in a grouped query with
> many groups.
>
> regards, tom lane

This sounds very similar to:

A reset function was added in the next patch which improved performance in my test case by about 5 times. Perhaps they can make use of the same function.

Regards
David Rowley
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>> Initial tests suggest that your version is ~40% slower than ours on
>> some workloads.

Tom> I poked at this a bit with perf and oprofile, and concluded that
Tom> probably the difference comes from ordered_set_startup()
Tom> repeating lookups for each group that could be done just once
Tom> per query.

Retesting with your changes shows that the gap is down to 15% but still present:

    work_mem=64MB
    enable_hashagg=off (for baseline test)

baseline query (333ms on both versions):

    select count(*)
      from (select j from generate_series(1,3) i, generate_series(1,100000) j
             group by j) s;

test query:

    select count(*)
      from (select percentile_disc(0.5) within group (order by i)
              from generate_series(1,3) i, generate_series(1,100000) j
             group by j) s;

On the original patch as supplied: 571ms - 333ms = 238ms
On current master: 607ms - 333ms = 274ms

Furthermore, I can't help noticing that the increased complexity has now pretty much negated your original arguments for moving so much of the work out of nodeAgg.c.

-- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Retesting with your changes shows that the gap is down to 15% but still
> present:

There's probably some overhead from traversing advance_transition_function for each row, which your version wouldn't have done. 15% sounds pretty high for that though, since advance_transition_function doesn't have much to do when the transition function is non strict and the state value is pass-by-value (which "internal" is). It's possible that we could do something to micro-optimize that case.

I also wondered if it was possible to omit rechecking AggCheckCallContext after the first time through in ordered_set_transition(). It seemed unsafe to perhaps not have that happen at all, since if the point is to detect misuse then the mere fact that argument 0 isn't null isn't much comfort. It strikes me though that now we could test for "first call" by looking at fcinfo->flinfo->fn_extra, and be pretty sure that we've already checked the context if that isn't null. Whether this would save anything noticeable isn't clear though; I didn't see AggCheckCallContext high in the profile.

> Furthermore, I can't help noticing that the increased complexity has
> now pretty much negated your original arguments for moving so much of
> the work out of nodeAgg.c.

The key reason for that was, and remains, not having the behavior hard-wired in nodeAgg; I believe that this design permits things to be accomplished in aggregate implementation functions that would not have been possible with the original patch. I'm willing to accept some code growth to have that flexibility.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> Furthermore, I can't help noticing that the increased complexity
 >> has now pretty much negated your original arguments for moving so
 >> much of the work out of nodeAgg.c.

 Tom> The key reason for that was, and remains, not having the
 Tom> behavior hard-wired in nodeAgg; I believe that this design
 Tom> permits things to be accomplished in aggregate implementation
 Tom> functions that would not have been possible with the original
 Tom> patch.  I'm willing to accept some code growth to have that
 Tom> flexibility.

Do you have an actual use case?

-- 
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> The key reason for that was, and remains, not having the
> Tom> behavior hard-wired in nodeAgg; I believe that this design
> Tom> permits things to be accomplished in aggregate implementation
> Tom> functions that would not have been possible with the original
> Tom> patch.  I'm willing to accept some code growth to have that
> Tom> flexibility.

> Do you have an actual use case?

Not a concrete example of an aggregate to build, no; but it seemed
plausible to me that a new aggregate might want more control over the
sorting operation than was possible with your patch.  Basically the
only flexibility that was there was to sort a hypothetical row before
or after its peers, right?  Now it's got essentially full control over
the sort parameters.

One idea that comes to mind is that an aggregate that's interested in
the "top N" rows might wish to flip the sort order, so that the rows it
wants come out of the tuplesort first rather than last --- and/or it
might want to tell the tuplesort about the row limitation, so that the
bounded-sort logic can be used.

			regards, tom lane
I wrote:
> There's probably some overhead from traversing advance_transition_function
> for each row, which your version wouldn't have done.  15% sounds pretty
> high for that though, since advance_transition_function doesn't have much
> to do when the transition function is non strict and the state value is
> pass-by-value (which "internal" is).  It's possible that we could do
> something to micro-optimize that case.

The most obvious micro-optimization that's possible there is to avoid
redoing InitFunctionCallInfoData for each row.  I tried this as in the
attached patch.  It seems to be good for about half a percent overall
savings on your example case.  That's not much, but then again this helps
for *all* aggregates, and many of them are cheaper than ordered
aggregates.  I see about 2% overall savings on

	select count(*) from generate_series(1,1000000);

which seems like a more interesting number.

As against that, the roughly-kilobyte-sized FunctionCallInfoData is no
longer just transient data on the stack, but persists for the lifespan of
the query.  We pay that already for plain FuncExpr and OpExpr nodes,
though.

On balance, I'm inclined to apply this; the performance benefit may be
marginal but it seems like it makes advance_transition_function's API a
tad cleaner by reducing the number of distinct structs it's hacking on.

Comments?

			regards, tom lane

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 827b009..0dafb51 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** typedef struct AggStatePerAggData
*** 235,240 ****
--- 235,248 ----
  	 */
  	Tuplesortstate *sortstate;	/* sort object, if DISTINCT or ORDER BY */
+ 
+ 	/*
+ 	 * This field is a pre-initialized FunctionCallInfo struct used for
+ 	 * calling this aggregate's transfn.  We save a few cycles per row by not
+ 	 * re-initializing the unchanging fields; which isn't much, but it seems
+ 	 * worth the extra space consumption.
+ 	 */
+ 	FunctionCallInfoData transfn_fcinfo;
  } AggStatePerAggData;
  
  /*
*************** static void initialize_aggregates(AggSta
*** 290,297 ****
  				AggStatePerGroup pergroup);
  static void advance_transition_function(AggState *aggstate,
  							AggStatePerAgg peraggstate,
! 							AggStatePerGroup pergroupstate,
! 							FunctionCallInfoData *fcinfo);
  static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
  static void process_ordered_aggregate_single(AggState *aggstate,
  								 AggStatePerAgg peraggstate,
--- 298,304 ----
  				AggStatePerGroup pergroup);
  static void advance_transition_function(AggState *aggstate,
  							AggStatePerAgg peraggstate,
! 							AggStatePerGroup pergroupstate);
  static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
  static void process_ordered_aggregate_single(AggState *aggstate,
  								 AggStatePerAgg peraggstate,
*************** initialize_aggregates(AggState *aggstate
*** 399,419 ****
   * Given new input value(s), advance the transition function of an aggregate.
   *
   * The new values (and null flags) have been preloaded into argument positions
!  * 1 and up in fcinfo, so that we needn't copy them again to pass to the
!  * transition function.  No other fields of fcinfo are assumed valid.
   *
   * It doesn't matter which memory context this is called in.
   */
  static void
  advance_transition_function(AggState *aggstate,
  							AggStatePerAgg peraggstate,
! 							AggStatePerGroup pergroupstate,
! 							FunctionCallInfoData *fcinfo)
  {
! 	int			numTransInputs = peraggstate->numTransInputs;
  	MemoryContext oldContext;
  	Datum		newVal;
- 	int			i;
  
  	if (peraggstate->transfn.fn_strict)
  	{
--- 406,425 ----
   * Given new input value(s), advance the transition function of an aggregate.
   *
   * The new values (and null flags) have been preloaded into argument positions
!  * 1 and up in peraggstate->transfn_fcinfo, so that we needn't copy them again
!  * to pass to the transition function.  We also expect that the static fields
!  * of the fcinfo are already initialized; that was done by ExecInitAgg().
   *
   * It doesn't matter which memory context this is called in.
   */
  static void
  advance_transition_function(AggState *aggstate,
  							AggStatePerAgg peraggstate,
! 							AggStatePerGroup pergroupstate)
  {
! 	FunctionCallInfoData *fcinfo = &peraggstate->transfn_fcinfo;
  	MemoryContext oldContext;
  	Datum		newVal;
  
  	if (peraggstate->transfn.fn_strict)
  	{
*************** advance_transition_function(AggState *ag
*** 421,426 ****
--- 427,435 ----
  		 * For a strict transfn, nothing happens when there's a NULL input; we
  		 * just keep the prior transValue.
  		 */
+ 		int			numTransInputs = peraggstate->numTransInputs;
+ 		int			i;
+ 
  		for (i = 1; i <= numTransInputs; i++)
  		{
  			if (fcinfo->argnull[i])
*************** advance_transition_function(AggState *ag
*** 467,478 ****
  	/*
  	 * OK to call the transition function
  	 */
- 	InitFunctionCallInfoData(*fcinfo, &(peraggstate->transfn),
- 							 numTransInputs + 1,
- 							 peraggstate->aggCollation,
- 							 (void *) aggstate, NULL);
  	fcinfo->arg[0] = pergroupstate->transValue;
  	fcinfo->argnull[0] = pergroupstate->transValueIsNull;
  
  	newVal = FunctionCallInvoke(fcinfo);
--- 476,484 ----
  	/*
  	 * OK to call the transition function
  	 */
  	fcinfo->arg[0] = pergroupstate->transValue;
  	fcinfo->argnull[0] = pergroupstate->transValueIsNull;
+ 	fcinfo->isnull = false; /* just in case transfn doesn't set it */
  
  	newVal = FunctionCallInvoke(fcinfo);
*************** advance_aggregates(AggState *aggstate, A
*** 574,592 ****
  		else
  		{
  			/* We can apply the transition function immediately */
! 			FunctionCallInfoData fcinfo;
  
  			/* Load values into fcinfo */
  			/* Start from 1, since the 0th arg will be the transition value */
  			Assert(slot->tts_nvalid >= numTransInputs);
  			for (i = 0; i < numTransInputs; i++)
  			{
! 				fcinfo.arg[i + 1] = slot->tts_values[i];
! 				fcinfo.argnull[i + 1] = slot->tts_isnull[i];
  			}
  
! 			advance_transition_function(aggstate, peraggstate, pergroupstate,
! 										&fcinfo);
  		}
  	}
  }
--- 580,597 ----
  		else
  		{
  			/* We can apply the transition function immediately */
! 			FunctionCallInfoData *fcinfo = &peraggstate->transfn_fcinfo;
  
  			/* Load values into fcinfo */
  			/* Start from 1, since the 0th arg will be the transition value */
  			Assert(slot->tts_nvalid >= numTransInputs);
  			for (i = 0; i < numTransInputs; i++)
  			{
! 				fcinfo->arg[i + 1] = slot->tts_values[i];
! 				fcinfo->argnull[i + 1] = slot->tts_isnull[i];
  			}
  
! 			advance_transition_function(aggstate, peraggstate, pergroupstate);
  		}
  	}
  }
*************** process_ordered_aggregate_single(AggStat
*** 622,638 ****
  	MemoryContext workcontext = aggstate->tmpcontext->ecxt_per_tuple_memory;
  	MemoryContext oldContext;
  	bool		isDistinct = (peraggstate->numDistinctCols > 0);
  	Datum	   *newVal;
  	bool	   *isNull;
- 	FunctionCallInfoData fcinfo;
  
  	Assert(peraggstate->numDistinctCols < 2);
  
  	tuplesort_performsort(peraggstate->sortstate);
  
  	/* Load the column into argument 1 (arg 0 will be transition value) */
! 	newVal = fcinfo.arg + 1;
! 	isNull = fcinfo.argnull + 1;
  
  	/*
  	 * Note: if input type is pass-by-ref, the datums returned by the sort are
--- 627,643 ----
  	MemoryContext workcontext = aggstate->tmpcontext->ecxt_per_tuple_memory;
  	MemoryContext oldContext;
  	bool		isDistinct = (peraggstate->numDistinctCols > 0);
+ 	FunctionCallInfoData *fcinfo = &peraggstate->transfn_fcinfo;
  	Datum	   *newVal;
  	bool	   *isNull;
  
  	Assert(peraggstate->numDistinctCols < 2);
  
  	tuplesort_performsort(peraggstate->sortstate);
  
  	/* Load the column into argument 1 (arg 0 will be transition value) */
! 	newVal = fcinfo->arg + 1;
! 	isNull = fcinfo->argnull + 1;
  
  	/*
  	 * Note: if input type is pass-by-ref, the datums returned by the sort are
*************** process_ordered_aggregate_single(AggStat
*** 668,675 ****
  		}
  		else
  		{
! 			advance_transition_function(aggstate, peraggstate, pergroupstate,
! 										&fcinfo);
  			/* forget the old value, if any */
  			if (!oldIsNull && !peraggstate->inputtypeByVal)
  				pfree(DatumGetPointer(oldVal));
--- 673,679 ----
  		}
  		else
  		{
! 			advance_transition_function(aggstate, peraggstate, pergroupstate);
  			/* forget the old value, if any */
  			if (!oldIsNull && !peraggstate->inputtypeByVal)
  				pfree(DatumGetPointer(oldVal));
*************** process_ordered_aggregate_multi(AggState
*** 704,710 ****
  	AggStatePerGroup pergroupstate)
  {
  	MemoryContext workcontext = aggstate->tmpcontext->ecxt_per_tuple_memory;
! 	FunctionCallInfoData fcinfo;
  	TupleTableSlot *slot1 = peraggstate->evalslot;
  	TupleTableSlot *slot2 = peraggstate->uniqslot;
  	int			numTransInputs = peraggstate->numTransInputs;
--- 708,714 ----
  	AggStatePerGroup pergroupstate)
  {
  	MemoryContext workcontext = aggstate->tmpcontext->ecxt_per_tuple_memory;
! 	FunctionCallInfoData *fcinfo = &peraggstate->transfn_fcinfo;
  	TupleTableSlot *slot1 = peraggstate->evalslot;
  	TupleTableSlot *slot2 = peraggstate->uniqslot;
  	int			numTransInputs = peraggstate->numTransInputs;
*************** process_ordered_aggregate_multi(AggState
*** 739,750 ****
  			/* Start from 1, since the 0th arg will be the transition value */
  			for (i = 0; i < numTransInputs; i++)
  			{
! 				fcinfo.arg[i + 1] = slot1->tts_values[i];
! 				fcinfo.argnull[i + 1] = slot1->tts_isnull[i];
  			}
  
! 			advance_transition_function(aggstate, peraggstate, pergroupstate,
! 										&fcinfo);
  
  			if (numDistinctCols > 0)
  			{
--- 743,753 ----
  			/* Start from 1, since the 0th arg will be the transition value */
  			for (i = 0; i < numTransInputs; i++)
  			{
! 				fcinfo->arg[i + 1] = slot1->tts_values[i];
! 				fcinfo->argnull[i + 1] = slot1->tts_isnull[i];
  			}
  
! 			advance_transition_function(aggstate, peraggstate, pergroupstate);
  
  			if (numDistinctCols > 0)
  			{
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1799,1804 ****
--- 1802,1808 ----
  								&transfnexpr,
  								&finalfnexpr);
  
+ 		/* set up infrastructure for calling the transfn and finalfn */
  		fmgr_info(transfn_oid, &peraggstate->transfn);
  		fmgr_info_set_expr((Node *) transfnexpr, &peraggstate->transfn);
  
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1810,1815 ****
--- 1814,1826 ----
  
  		peraggstate->aggCollation = aggref->inputcollid;
  
+ 		InitFunctionCallInfoData(peraggstate->transfn_fcinfo,
+ 								 &peraggstate->transfn,
+ 								 peraggstate->numTransInputs + 1,
+ 								 peraggstate->aggCollation,
+ 								 (void *) aggstate, NULL);
+ 
+ 		/* get info about relevant datatypes */
  		get_typlenbyval(aggref->aggtype,
  						&peraggstate->resulttypeLen,
  						&peraggstate->resulttypeByVal);