Обсуждение: Support for RANGE ... PRECEDING windows in OVER

Поиск
Список
Период
Сортировка

Support for RANGE ... PRECEDING windows in OVER

От
Craig Ringer
Дата:
Hi all

Since 8.4, PostgreSQL has had extremely useful window function support -
but support for "RANGE PRECEDING / FOLLOWING" windows was dropped late
in 8.4's development in order to get the rest of the feature in, per
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00540.php.

It looks like there was discussion of requiring a new opclass to be
declared for types or otherwise extending opclasses to provide the
information required for RANGE ... PRECEDING / FOLLOWING (
http://www.postgresql.org/message-id/20100211201444.GA28270@svana.org )
. I can't find any sign that it went anywhere beyond some broad
discussion:
http://www.postgresql.org/message-id/13993.1265920013@sss.pgh.pa.us at
the time.

I've missed this feature more than once, and am curious about whether
any more recent changes may have made it cleaner to tackle this, or
whether consensus can be formed on adding the new entries to btree's
opclass to avoid the undesirable explicit lookups of the '+' and '-'
oprators.

Some question seems to remain open about how ranges over
timestamps/intervals should work, but this wasn't elaborated on.

There's been interest in this, eg:

http://pgsql.hackers.free-usenet.eu/[HACKERS]-range-intervals-in-window-function-frames_T66085695_S1


http://grokbase.com/t/postgresql/pgsql-general/105a89gm2n/postgresql-9-0-support-for-range-value-preceding-window-functions



-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Support for RANGE ... PRECEDING windows in OVER

От
Ian Link
Дата:
I am currently looking into this feature. However, as I am quite new to Postgres, I think it might take me a while to get up to speed. Anyways, I would also appreciate another round of discussion on the future of the windowing functions.

Ian Link

Thursday, June 20, 2013 7:24 PM
Hi all

Since 8.4, PostgreSQL has had extremely useful window function support -
but support for "RANGE PRECEDING / FOLLOWING" windows was dropped late
in 8.4's development in order to get the rest of the feature in, per
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00540.php.

It looks like there was discussion of requiring a new opclass to be
declared for types or otherwise extending opclasses to provide the
information required for RANGE ... PRECEDING / FOLLOWING (
http://www.postgresql.org/message-id/20100211201444.GA28270@svana.org )
. I can't find any sign that it went anywhere beyond some broad
discussion:
http://www.postgresql.org/message-id/13993.1265920013@sss.pgh.pa.us at
the time.

I've missed this feature more than once, and am curious about whether
any more recent changes may have made it cleaner to tackle this, or
whether consensus can be formed on adding the new entries to btree's
opclass to avoid the undesirable explicit lookups of the '+' and '-'
oprators.

Some question seems to remain open about how ranges over
timestamps/intervals should work, but this wasn't elaborated on.

There's been interest in this, eg:

http://pgsql.hackers.free-usenet.eu/[HACKERS]-range-intervals-in-window-function-frames_T66085695_S1

http://grokbase.com/t/postgresql/pgsql-general/105a89gm2n/postgresql-9-0-support-for-range-value-preceding-window-functions



Вложения

Re: Support for RANGE ... PRECEDING windows in OVER

От
Craig Ringer
Дата:
On 06/21/2013 10:31 AM, Ian Link wrote:
> I am currently looking into this feature. However, as I am quite new to
> Postgres, I think it might take me a while to get up to speed. Anyways,
> I would also appreciate another round of discussion on the future of the
> windowing functions.

Good to know, and welcome.

I hope the links to the archived discussions on the matter were useful
to you.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Support for RANGE ... PRECEDING windows in OVER

От
Ian Link
Дата:
Thanks! The discussions have been useful, although I am currently just reviewing the code.
I think a good starting point will be to refactor/imrpove the WinGetFuncArgInPartition and WinGetFuncArgInFrame functions.
Tom Lane wrote this about them before comitting the patch:

I'm not terribly happy with the changes you made in WinGetFuncArgInPartition

and WinGetFuncArgInFrame to force the window function mark to not go

past frame start in some modes.  Not only is that pretty ugly, but I

think it can mask bugs in window functions: it's an error for a window

function to fetch a row before what it has set its mark to be, but in

some cases that wouldn't be detected because of this change.  I think

it would be better to revert those changes and find another method of

protecting fetches needed to determine the frame head.  One idea is

to create a separate read pointer that tracks the frame head whenever

actual fetches of the frame head might be needed by update_frameheadpos.

I committed it without changing that, but I think this should be

revisited before trying to add the RANGE value PRECEDING/FOLLOWING

options, because those will substantially expand the number of caseswhere that hack affects the behavior.

I am honestly not 100% certain why these functions have issues, but this seems a good place to start investigating.

Ian Link

Thursday, June 20, 2013 7:37 PM

Good to know, and welcome.

I hope the links to the archived discussions on the matter were useful
to you.

Thursday, June 20, 2013 7:24 PM
Hi all

Since 8.4, PostgreSQL has had extremely useful window function support -
but support for "RANGE PRECEDING / FOLLOWING" windows was dropped late
in 8.4's development in order to get the rest of the feature in, per
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00540.php.

It looks like there was discussion of requiring a new opclass to be
declared for types or otherwise extending opclasses to provide the
information required for RANGE ... PRECEDING / FOLLOWING (
http://www.postgresql.org/message-id/20100211201444.GA28270@svana.org )
. I can't find any sign that it went anywhere beyond some broad
discussion:
http://www.postgresql.org/message-id/13993.1265920013@sss.pgh.pa.us at
the time.

I've missed this feature more than once, and am curious about whether
any more recent changes may have made it cleaner to tackle this, or
whether consensus can be formed on adding the new entries to btree's
opclass to avoid the undesirable explicit lookups of the '+' and '-'
oprators.

Some question seems to remain open about how ranges over
timestamps/intervals should work, but this wasn't elaborated on.

There's been interest in this, eg:

http://pgsql.hackers.free-usenet.eu/[HACKERS]-range-intervals-in-window-function-frames_T66085695_S1

http://grokbase.com/t/postgresql/pgsql-general/105a89gm2n/postgresql-9-0-support-for-range-value-preceding-window-functions



Вложения

Re: Support for RANGE ... PRECEDING windows in OVER

От
Hitoshi Harada
Дата:



On Thu, Jun 20, 2013 at 7:24 PM, Craig Ringer <craig@2ndquadrant.com> wrote: 
I've missed this feature more than once, and am curious about whether
any more recent changes may have made it cleaner to tackle this, or
whether consensus can be formed on adding the new entries to btree's
opclass to avoid the undesirable explicit lookups of the '+' and '-'
oprators.

 

As far as I know the later development didn't add anything to help this conversation.  I initially thought range type or knn gist would add something, but they were something else far from this.  On the other hand, if this makes it, it'll also open doors to range PARTITION BY for CREATE TABLE command, so the impact will be bigger than you may think.

I also later found that we are missing not only notion of '+' or '-', but also notion of 'zero value' in our catalog.  Per spec, RANGE BETWEEN needs to detect ERROR if the offset value is negative, but it is not always easy if you think about interval, numeric types as opposed to int64 used in ROWS BETWEEN.

Thanks,
--
Hitoshi Harada

Re: Support for RANGE ... PRECEDING windows in OVER

От
Craig Ringer
Дата:
On 06/21/2013 05:32 PM, Hitoshi Harada wrote:

> I also later found that we are missing not only notion of '+' or '-',
> but also notion of 'zero value' in our catalog.  Per spec, RANGE BETWEEN
> needs to detect ERROR if the offset value is negative, but it is not
> always easy if you think about interval, numeric types as opposed to
> int64 used in ROWS BETWEEN.

Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That
should make sense for any type in which the concept of zero makes sense.

Thanks for the warning on that issue.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Support for RANGE ... PRECEDING windows in OVER

От
Hitoshi Harada
Дата:



On Fri, Jun 21, 2013 at 3:20 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 06/21/2013 05:32 PM, Hitoshi Harada wrote:

> I also later found that we are missing not only notion of '+' or '-',
> but also notion of 'zero value' in our catalog.  Per spec, RANGE BETWEEN
> needs to detect ERROR if the offset value is negative, but it is not
> always easy if you think about interval, numeric types as opposed to
> int64 used in ROWS BETWEEN.

Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That
should make sense for any type in which the concept of zero makes sense.


Yeah, I mean, it needs to know if offset is negative or not by testing with zero.  So we need "zero value" or "is_negative function" for each type.

Thanks,
--
Hitoshi Harada

Re: Support for RANGE ... PRECEDING windows in OVER

От
ian link
Дата:
Forgive my ignorance, but I don't entirely understand the problem. What does '+' and '-' refer to exactly?
Thanks!


On Fri, Jun 21, 2013 at 4:35 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:



On Fri, Jun 21, 2013 at 3:20 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 06/21/2013 05:32 PM, Hitoshi Harada wrote:

> I also later found that we are missing not only notion of '+' or '-',
> but also notion of 'zero value' in our catalog.  Per spec, RANGE BETWEEN
> needs to detect ERROR if the offset value is negative, but it is not
> always easy if you think about interval, numeric types as opposed to
> int64 used in ROWS BETWEEN.

Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That
should make sense for any type in which the concept of zero makes sense.


Yeah, I mean, it needs to know if offset is negative or not by testing with zero.  So we need "zero value" or "is_negative function" for each type.

Thanks,
--
Hitoshi Harada

Re: Support for RANGE ... PRECEDING windows in OVER

От
Craig Ringer
Дата:
On 06/22/2013 03:30 AM, ian link wrote:
> Forgive my ignorance, but I don't entirely understand the problem. What
> does '+' and '-' refer to exactly?

Consider "RANGE 4.5 PRECEDING'.

You need to be able to test whether, for the current row 'b', any given
row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
< vs <= boundaries, but that's irrelevant for the example.

To test that, you have to be able to do two things: you have to be able
to test whether one value is greater than another, and you have to be
able to add or subtract a constant from one of the values.

Right now, the b-tree access method provides information on the ordering
operators < <= = > >= <> , which provides half the answer. But these
don't give any concept of *distance* - you can test ordinality but not
cardinality.

To implement the "different by 4.5" part, you have to be able to add 4.5
to one value or subtract it from the other.

The obvious way to do that is to look up the function that implements
the '+' or '-' operator, and do:

((OPERATOR(+))(a, 4.5)) > b AND (a <= b)

or

((OPERATOR(-))(b, 4.5)) < a AND (a <= b);

The problem outlined by Tom in prior discussion about this is that
PostgreSQL tries really hard not to assume that particular operator
names mean particular things. Rather than "knowing" that "+" is always
"an operator that adds two values together; is transitive, symmetric and
reflexive", PostgreSQL requires that you define an *operator class* that
names the operator that has those properties.

Or at least, it does for less-than, less-than-or-equals, equals,
greater-than-or-equals, greater-than, and not-equals as part of the
b-tree operator class, which *usually* defines these operators as < <= =
>= > <>, but you could use any operator names you wanted if you really
liked.

Right now (as far as I know) there's no operator class that lets you
identify operators for addition and subtraction in a similar way. So
it's necessary to either add such an operator class (in which case
support has to be added for it for every type), extend the existing
b-tree operator class to provide the info, or blindly assume that "+"
and "-" are always addition and subtraction.

For an example of why such assumptions are a bad idea, consider matrix
multiplication. Normally, "a * b" = "b * a", but this isn't true for
multiplication of matrices. Similarly, if someone defined a "+" operator
as an alias for string concatenation (||), we'd be totally wrong to
assume we could use that for doing range-offset windowing.

So. Yeah. Operator classes required, unless we're going to change the
rules and make certain operator names "special" in PostgreSQL, so that
if you implement them they *must* have certain properties. This seems
like a pretty poor reason to add such a big change.

I hope this explanation (a) is actually correct and (b) is helpful.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Support for RANGE ... PRECEDING windows in OVER

От
ian link
Дата:

Thanks Craig! That definitely does help. I probably still have some questions but I think I will read through the rest of the code before asking. Thanks again!

Ian

> Craig Ringer
> Friday, June 21, 2013 8:41 PM
>
> On 06/22/2013 03:30 AM, ian link wrote:
>>
>> Forgive my ignorance, but I don't entirely understand the problem. What
>> does '+' and '-' refer to exactly?
>
> Consider "RANGE 4.5 PRECEDING'.
>
> You need to be able to test whether, for the current row 'b', any given
> row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
> < vs <= boundaries, but that's irrelevant for the example.
>
> To test that, you have to be able to do two things: you have to be able
> to test whether one value is greater than another, and you have to be
> able to add or subtract a constant from one of the values.
>
> Right now, the b-tree access method provides information on the ordering
> operators < <= = > >= <> , which provides half the answer. But these
> don't give any concept of *distance* - you can test ordinality but not
> cardinality.
>
> To implement the "different by 4.5" part, you have to be able to add 4.5
> to one value or subtract it from the other.
>
> The obvious way to do that is to look up the function that implements
> the '+' or '-' operator, and do:
>
> ((OPERATOR(+))(a, 4.5)) > b AND (a <= b)
>
> or
>
> ((OPERATOR(-))(b, 4.5)) < a AND (a <= b);
>
> The problem outlined by Tom in prior discussion about this is that
> PostgreSQL tries really hard not to assume that particular operator
> names mean particular things. Rather than "knowing" that "+" is always
> "an operator that adds two values together; is transitive, symmetric and
> reflexive", PostgreSQL requires that you define an *operator class* that
> names the operator that has those properties.
>
> Or at least, it does for less-than, less-than-or-equals, equals,
> greater-than-or-equals, greater-than, and not-equals as part of the
> b-tree operator class, which *usually* defines these operators as < <= =
>>
>> = > <>, but you could use any operator names you wanted if you really
>
> liked.
>
> Right now (as far as I know) there's no operator class that lets you
> identify operators for addition and subtraction in a similar way. So
> it's necessary to either add such an operator class (in which case
> support has to be added for it for every type), extend the existing
> b-tree operator class to provide the info, or blindly assume that "+"
> and "-" are always addition and subtraction.
>
> For an example of why such assumptions are a bad idea, consider matrix
> multiplication. Normally, "a * b" = "b * a", but this isn't true for
> multiplication of matrices. Similarly, if someone defined a "+" operator
> as an alias for string concatenation (||), we'd be totally wrong to
> assume we could use that for doing range-offset windowing.
>
> So. Yeah. Operator classes required, unless we're going to change the
> rules and make certain operator names "special" in PostgreSQL, so that
> if you implement them they *must* have certain properties. This seems
> like a pretty poor reason to add such a big change.
>
> I hope this explanation (a) is actually correct and (b) is helpful.
>
> ian link
> Friday, June 21, 2013 12:30 PM
> Forgive my ignorance, but I don't entirely understand the problem. What does '+' and '-' refer to exactly?
> Thanks!
>
>
>
> Hitoshi Harada
> Friday, June 21, 2013 4:35 AM
>
>
>

On 06/22/2013 03:30 AM, ian link wrote:
> Forgive my ignorance, but I don't entirely understand the problem. What
> does '+' and '-' refer to exactly?

Consider "RANGE 4.5 PRECEDING'.

You need to be able to test whether, for the current row 'b', any given
row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
< vs <= boundaries, but that's irrelevant for the example.

To test that, you have to be able to do two things: you have to be able
to test whether one value is greater than another, and you have to be
able to add or subtract a constant from one of the values.

Right now, the b-tree access method provides information on the ordering
operators < <= = > >= <> , which provides half the answer. But these
don't give any concept of *distance* - you can test ordinality but not
cardinality.

To implement the "different by 4.5" part, you have to be able to add 4.5
to one value or subtract it from the other.

The obvious way to do that is to look up the function that implements
the '+' or '-' operator, and do:

((OPERATOR(+))(a, 4.5)) > b AND (a <= b)

or

((OPERATOR(-))(b, 4.5)) < a AND (a <= b);

The problem outlined by Tom in prior discussion about this is that
PostgreSQL tries really hard not to assume that particular operator
names mean particular things. Rather than "knowing" that "+" is always
"an operator that adds two values together; is transitive, symmetric and
reflexive", PostgreSQL requires that you define an *operator class* that
names the operator that has those properties.

Or at least, it does for less-than, less-than-or-equals, equals,
greater-than-or-equals, greater-than, and not-equals as part of the
b-tree operator class, which *usually* defines these operators as < <= =
>= > <>, but you could use any operator names you wanted if you really
liked.

Right now (as far as I know) there's no operator class that lets you
identify operators for addition and subtraction in a similar way. So
it's necessary to either add such an operator class (in which case
support has to be added for it for every type), extend the existing
b-tree operator class to provide the info, or blindly assume that "+"
and "-" are always addition and subtraction.

For an example of why such assumptions are a bad idea, consider matrix
multiplication. Normally, "a * b" = "b * a", but this isn't true for
multiplication of matrices. Similarly, if someone defined a "+" operator
as an alias for string concatenation (||), we'd be totally wrong to
assume we could use that for doing range-offset windowing.

So. Yeah. Operator classes required, unless we're going to change the
rules and make certain operator names "special" in PostgreSQL, so that
if you implement them they *must* have certain properties. This seems
like a pretty poor reason to add such a big change.

I hope this explanation (a) is actually correct and (b) is helpful.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: Support for RANGE ... PRECEDING windows in OVER

От
ian link
Дата:
I found some time and I think I am up to speed now. I finally figured out how to add new operator strategies and made a little test operator for myself.

It seems pretty clear that assuming '+' and '-' are addition and subtraction is a bad idea. I don't think it would be too tricky to add support for new operator strategies. Andrew Gierth suggested calling these new strategies "offset -" and "offset +", which I think describes it pretty well. I assigned the operator itself to be "@+" and "@-" but that can obviously be changed. If this sounds like a good path to you guys, I will go ahead and implement the operators for the appropriate types. Please let me know if I am misunderstanding something - I am still figuring stuff out :)

Aside from the opclass stuff, there were some other important issues mentioned with the original RANGE support. I think I will address those after the opclass stuff is done.

Thanks!
Ian


On Sat, Jun 22, 2013 at 4:38 PM, ian link <ian@ilink.io> wrote:

Thanks Craig! That definitely does help. I probably still have some questions but I think I will read through the rest of the code before asking. Thanks again!

Ian

> Craig Ringer
> Friday, June 21, 2013 8:41 PM


>
> On 06/22/2013 03:30 AM, ian link wrote:
>>
>> Forgive my ignorance, but I don't entirely understand the problem. What
>> does '+' and '-' refer to exactly?
>
> Consider "RANGE 4.5 PRECEDING'.
>
> You need to be able to test whether, for the current row 'b', any given
> row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
> < vs <= boundaries, but that's irrelevant for the example.
>
> To test that, you have to be able to do two things: you have to be able
> to test whether one value is greater than another, and you have to be
> able to add or subtract a constant from one of the values.
>
> Right now, the b-tree access method provides information on the ordering
> operators < <= = > >= <> , which provides half the answer. But these
> don't give any concept of *distance* - you can test ordinality but not
> cardinality.
>
> To implement the "different by 4.5" part, you have to be able to add 4.5
> to one value or subtract it from the other.
>
> The obvious way to do that is to look up the function that implements
> the '+' or '-' operator, and do:
>
> ((OPERATOR(+))(a, 4.5)) > b AND (a <= b)
>
> or
>
> ((OPERATOR(-))(b, 4.5)) < a AND (a <= b);
>
> The problem outlined by Tom in prior discussion about this is that
> PostgreSQL tries really hard not to assume that particular operator
> names mean particular things. Rather than "knowing" that "+" is always
> "an operator that adds two values together; is transitive, symmetric and
> reflexive", PostgreSQL requires that you define an *operator class* that
> names the operator that has those properties.
>
> Or at least, it does for less-than, less-than-or-equals, equals,
> greater-than-or-equals, greater-than, and not-equals as part of the
> b-tree operator class, which *usually* defines these operators as < <= =
>>
>> = > <>, but you could use any operator names you wanted if you really
>
> liked.
>
> Right now (as far as I know) there's no operator class that lets you
> identify operators for addition and subtraction in a similar way. So
> it's necessary to either add such an operator class (in which case
> support has to be added for it for every type), extend the existing
> b-tree operator class to provide the info, or blindly assume that "+"
> and "-" are always addition and subtraction.
>
> For an example of why such assumptions are a bad idea, consider matrix
> multiplication. Normally, "a * b" = "b * a", but this isn't true for
> multiplication of matrices. Similarly, if someone defined a "+" operator
> as an alias for string concatenation (||), we'd be totally wrong to
> assume we could use that for doing range-offset windowing.
>
> So. Yeah. Operator classes required, unless we're going to change the
> rules and make certain operator names "special" in PostgreSQL, so that
> if you implement them they *must* have certain properties. This seems
> like a pretty poor reason to add such a big change.
>
> I hope this explanation (a) is actually correct and (b) is helpful.
>
> ian link
> Friday, June 21, 2013 12:30 PM

> Forgive my ignorance, but I don't entirely understand the problem. What does '+' and '-' refer to exactly?
> Thanks!
>
>
>
> Hitoshi Harada
> Friday, June 21, 2013 4:35 AM
>
>
>

On 06/22/2013 03:30 AM, ian link wrote:
> Forgive my ignorance, but I don't entirely understand the problem. What
> does '+' and '-' refer to exactly?

Consider "RANGE 4.5 PRECEDING'.

You need to be able to test whether, for the current row 'b', any given
row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
< vs <= boundaries, but that's irrelevant for the example.

To test that, you have to be able to do two things: you have to be able
to test whether one value is greater than another, and you have to be
able to add or subtract a constant from one of the values.

Right now, the b-tree access method provides information on the ordering
operators < <= = > >= <> , which provides half the answer. But these
don't give any concept of *distance* - you can test ordinality but not
cardinality.

To implement the "different by 4.5" part, you have to be able to add 4.5
to one value or subtract it from the other.

The obvious way to do that is to look up the function that implements
the '+' or '-' operator, and do:

((OPERATOR(+))(a, 4.5)) > b AND (a <= b)

or

((OPERATOR(-))(b, 4.5)) < a AND (a <= b);

The problem outlined by Tom in prior discussion about this is that
PostgreSQL tries really hard not to assume that particular operator
names mean particular things. Rather than "knowing" that "+" is always
"an operator that adds two values together; is transitive, symmetric and
reflexive", PostgreSQL requires that you define an *operator class* that
names the operator that has those properties.

Or at least, it does for less-than, less-than-or-equals, equals,
greater-than-or-equals, greater-than, and not-equals as part of the
b-tree operator class, which *usually* defines these operators as < <= =
>= > <>, but you could use any operator names you wanted if you really
liked.

Right now (as far as I know) there's no operator class that lets you
identify operators for addition and subtraction in a similar way. So
it's necessary to either add such an operator class (in which case
support has to be added for it for every type), extend the existing
b-tree operator class to provide the info, or blindly assume that "+"
and "-" are always addition and subtraction.

For an example of why such assumptions are a bad idea, consider matrix
multiplication. Normally, "a * b" = "b * a", but this isn't true for
multiplication of matrices. Similarly, if someone defined a "+" operator
as an alias for string concatenation (||), we'd be totally wrong to
assume we could use that for doing range-offset windowing.

So. Yeah. Operator classes required, unless we're going to change the
rules and make certain operator names "special" in PostgreSQL, so that
if you implement them they *must* have certain properties. This seems
like a pretty poor reason to add such a big change.

I hope this explanation (a) is actually correct and (b) is helpful.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Support for RANGE ... PRECEDING windows in OVER

От
Josh Berkus
Дата:
On 06/30/2013 08:54 PM, ian link wrote:
> I found some time and I think I am up to speed now. I finally figured out
> how to add new operator strategies and made a little test operator for
> myself.
> 
> It seems pretty clear that assuming '+' and '-' are addition and
> subtraction is a bad idea. I don't think it would be too tricky to add
> support for new operator strategies. Andrew Gierth suggested calling these
> new strategies "offset -" and "offset +", which I think describes it pretty
> well. I assigned the operator itself to be "@+" and "@-" but that can
> obviously be changed. If this sounds like a good path to you guys, I will
> go ahead and implement the operators for the appropriate types. Please let
> me know if I am misunderstanding something - I am still figuring stuff out
> :)
> 
> Aside from the opclass stuff, there were some other important issues
> mentioned with the original RANGE support. I think I will address those
> after the opclass stuff is done.

Are these things you plan to get done this week, or for next CommitFest?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Support for RANGE ... PRECEDING windows in OVER

От
ian link
Дата:
<div dir="ltr">Definitely not this week. Hopefully for next commit fest.</div><div class="gmail_extra"><br /><br /><div
class="gmail_quote">OnSun, Jun 30, 2013 at 9:56 PM, Josh Berkus <span dir="ltr"><<a href="mailto:josh@agliodbs.com"
target="_blank">josh@agliodbs.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><div class="im">On 06/30/2013 08:54 PM, ian link wrote:<br /> > I
foundsome time and I think I am up to speed now. I finally figured out<br /> > how to add new operator strategies
andmade a little test operator for<br /> > myself.<br /> ><br /> > It seems pretty clear that assuming '+' and
'-'are addition and<br /> > subtraction is a bad idea. I don't think it would be too tricky to add<br /> >
supportfor new operator strategies. Andrew Gierth suggested calling these<br /> > new strategies "offset -" and
"offset+", which I think describes it pretty<br /> > well. I assigned the operator itself to be "@+" and "@-" but
thatcan<br /> > obviously be changed. If this sounds like a good path to you guys, I will<br /> > go ahead and
implementthe operators for the appropriate types. Please let<br /> > me know if I am misunderstanding something - I
amstill figuring stuff out<br /> > :)<br /> ><br /> > Aside from the opclass stuff, there were some other
importantissues<br /> > mentioned with the original RANGE support. I think I will address those<br /> > after the
opclassstuff is done.<br /><br /></div>Are these things you plan to get done this week, or for next CommitFest?<br
/><spanclass="HOEnZb"><font color="#888888"><br /> --<br /> Josh Berkus<br /> PostgreSQL Experts Inc.<br /><a
href="http://pgexperts.com"target="_blank">http://pgexperts.com</a><br /></font></span></blockquote></div><br /></div> 

Re: Support for RANGE ... PRECEDING windows in OVER

От
Robert Haas
Дата:
On Sun, Jun 30, 2013 at 11:54 PM, ian link <ian@ilink.io> wrote:
> I found some time and I think I am up to speed now. I finally figured out
> how to add new operator strategies and made a little test operator for
> myself.
>
> It seems pretty clear that assuming '+' and '-' are addition and subtraction
> is a bad idea. I don't think it would be too tricky to add support for new
> operator strategies. Andrew Gierth suggested calling these new strategies
> "offset -" and "offset +", which I think describes it pretty well. I
> assigned the operator itself to be "@+" and "@-" but that can obviously be
> changed. If this sounds like a good path to you guys, I will go ahead and
> implement the operators for the appropriate types. Please let me know if I
> am misunderstanding something - I am still figuring stuff out :)

I don't think I understand the design you have in mind.  I'm actually
not clear that it would be all that bad to assume fixed operator
names, as we apparently do in a few places despite the existence of
operator classes.  But if that is bad, then I don't know how using @+
and @- instead helps anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Support for RANGE ... PRECEDING windows in OVER

От
Alvaro Herrera
Дата:
Robert Haas escribió:
> On Sun, Jun 30, 2013 at 11:54 PM, ian link <ian@ilink.io> wrote:

> > It seems pretty clear that assuming '+' and '-' are addition and subtraction
> > is a bad idea. I don't think it would be too tricky to add support for new
> > operator strategies. Andrew Gierth suggested calling these new strategies
> > "offset -" and "offset +", which I think describes it pretty well. I
> > assigned the operator itself to be "@+" and "@-" but that can obviously be
> > changed. If this sounds like a good path to you guys, I will go ahead and
> > implement the operators for the appropriate types. Please let me know if I
> > am misunderstanding something - I am still figuring stuff out :)
> 
> I don't think I understand the design you have in mind.  I'm actually
> not clear that it would be all that bad to assume fixed operator
> names, as we apparently do in a few places despite the existence of
> operator classes.  But if that is bad, then I don't know how using @+
> and @- instead helps anything.

Yeah.

Currently, all operator classes are tied to access methods.  Since
nobody seems to have any great idea about creating an access method that
requires addition and subtraction, would it make sense to have operator
classes that exist solely to support keeping track of such operators for
the various datatypes?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Support for RANGE ... PRECEDING windows in OVER

От
Robert Haas
Дата:
On Mon, Jul 1, 2013 at 3:28 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas escribió:
>> On Sun, Jun 30, 2013 at 11:54 PM, ian link <ian@ilink.io> wrote:
>
>> > It seems pretty clear that assuming '+' and '-' are addition and subtraction
>> > is a bad idea. I don't think it would be too tricky to add support for new
>> > operator strategies. Andrew Gierth suggested calling these new strategies
>> > "offset -" and "offset +", which I think describes it pretty well. I
>> > assigned the operator itself to be "@+" and "@-" but that can obviously be
>> > changed. If this sounds like a good path to you guys, I will go ahead and
>> > implement the operators for the appropriate types. Please let me know if I
>> > am misunderstanding something - I am still figuring stuff out :)
>>
>> I don't think I understand the design you have in mind.  I'm actually
>> not clear that it would be all that bad to assume fixed operator
>> names, as we apparently do in a few places despite the existence of
>> operator classes.  But if that is bad, then I don't know how using @+
>> and @- instead helps anything.
>
> Yeah.
>
> Currently, all operator classes are tied to access methods.  Since
> nobody seems to have any great idea about creating an access method that
> requires addition and subtraction, would it make sense to have operator
> classes that exist solely to support keeping track of such operators for
> the various datatypes?

I suppose if we really wanted to do this, it would make more sense to
have a new kind of object, maybe CREATE TYPE INTERFACE, rather than
shoehorning it into the operator class machinery.  It seems like a
fairly heavyweight solution, however.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Support for RANGE ... PRECEDING windows in OVER

От
Josh Berkus
Дата:
On 07/01/2013 12:05 AM, ian link wrote:
> Definitely not this week. Hopefully for next commit fest.
> 

OK, marked "Returned with Feedback".  It'll be up to you to add it to
the next commitfest if you think it's ready by then.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Support for RANGE ... PRECEDING windows in OVER

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jul 1, 2013 at 3:28 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Currently, all operator classes are tied to access methods.  Since
>> nobody seems to have any great idea about creating an access method that
>> requires addition and subtraction, would it make sense to have operator
>> classes that exist solely to support keeping track of such operators for
>> the various datatypes?

We certainly could envision adding "+" and "-" items to btree opfamilies,
with the proviso that they'd have to be optional so as not to break
existing extensions that create btree opfamilies.  If we went that way,
I'd suggest that what we actually add to the definition is *not*
operators, but functions --- that is, specify that function slots 2 and
3 can contain addition (resp subtraction) functions that are compatible
with the family's ordering behavior.  Otherwise you'd have the planner
trying to match WHERE clauses to the extra operators, which is somewhere
between wrong and dangerous.  But a function that isn't actually called
by the index AM is not going to pose a hazard of being misapplied.
Besides, it'd likely be easier not harder for the window-function
machinery to work with a function than an operator.

> I suppose if we really wanted to do this, it would make more sense to
> have a new kind of object, maybe CREATE TYPE INTERFACE, rather than
> shoehorning it into the operator class machinery.  It seems like a
> fairly heavyweight solution, however.

Yeah, there's something to be said for not wedging this into the index
opclass infrastructure.  I'd be happier about building such new
infrastructure if we could unify this requirement with Peter's
"transforms" feature, which also seems to need more datatype-related
knowledge than we have in the catalogs now.
        regards, tom lane



Re: Support for RANGE ... PRECEDING windows in OVER

От
Craig Ringer
Дата:
On 07/02/2013 02:39 AM, Robert Haas wrote:
> I'm actually
> not clear that it would be all that bad to assume fixed operator
> names, as we apparently do in a few places despite the existence of
> operator classes.  But if that is bad, then I don't know how using @+
> and @- instead helps anything.

Personally I'm not clear why it's bad to reserve certain fundamental
operators like '+' and '-', requiring that they have particular semantics.

Want to use "+" as an alias for || because your Java programmers are
used to writing + for string concatenation? Um, don't do that.

Existing code would be unaffected since RANGE couldn't ever be used in
existing code. At worst, weird user-defined implementations of "+" and
"-" would result in bizarre window function behaviour if the operators
were unsuitable. Exceeding available memory could certainly be an issue
in cases like "+" as concatenation.

The main advantage I see of adding opclass entries for this is that it
makes it explicit when the operators have semantics suitable for use in
range windows.

I don't have a strong opinion on whether we should just use "+" and "-"
or whether we really need an opclass.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Support for RANGE ... PRECEDING windows in OVER

От
Tom Lane
Дата:
Craig Ringer <craig@2ndquadrant.com> writes:
> On 07/02/2013 02:39 AM, Robert Haas wrote:
>> I'm actually
>> not clear that it would be all that bad to assume fixed operator
>> names, as we apparently do in a few places despite the existence of
>> operator classes.  But if that is bad, then I don't know how using @+
>> and @- instead helps anything.

> Personally I'm not clear why it's bad to reserve certain fundamental
> operators like '+' and '-', requiring that they have particular semantics.

It is bad.  It's against project policy, not least because we have
assorted *existing* datatypes for which "obvious" operator names like
"=" do not have all the properties you might expect.

If you need a more concrete example of why that sort of thinking is
bad, you might consider the difference between < and ~<~ for type text.
If we hard-wired knowledge about operator behavior to operator names,
it would be impossible for the system to understand that both of those
operators represent sorting-related behaviors.

Or to be even more concrete: if we allow RANGE to suppose that there's
only one possible definition of "+" for a datatype, we're effectively
supposing that there's only one possible sort ordering for that type.
Which is already a wrong assumption, and has been since Postgres was
still at Berkeley.  If you go this way, you won't be able to support
both WINDOW ... ORDER BY foo USING < RANGE ... and WINDOW ... ORDER BY
foo USING ~<~ RANGE ... because you won't know which addition operator
to apply.

(And yeah, I'm aware that the SQL standard only expects RANGE to support
sort keys that are of numeric, datetime, or interval type.  I would hope
that we have higher expectations than that.  Even if we don't, it's not
exactly hard to credit that people might have multiple ideas about how
to sort interval values.)

There are indeed still some places where we rely on operator names to
mean something, but we need to get away from that idea not add more.
Ideally, any property the system understands about an operator or
function should be explicitly declared through opclass membership or
some similar representation.  We've made substantial progress in that
direction in the last fifteen years.  I don't want to reverse that
progress in the name of minor expediencies, especially not ones that
fail to support flexibility that has been in the system for a couple
or three decades already.
        regards, tom lane



Re: Support for RANGE ... PRECEDING windows in OVER

От
ian link
Дата:
I'm fine with moving the operators over to functions. I just don't want to implement anything that is against best practice. If we are OK with that direction, I'll go ahead and start on the new patch.

Ian


On Mon, Jul 1, 2013 at 9:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndquadrant.com> writes:
> On 07/02/2013 02:39 AM, Robert Haas wrote:
>> I'm actually
>> not clear that it would be all that bad to assume fixed operator
>> names, as we apparently do in a few places despite the existence of
>> operator classes.  But if that is bad, then I don't know how using @+
>> and @- instead helps anything.

> Personally I'm not clear why it's bad to reserve certain fundamental
> operators like '+' and '-', requiring that they have particular semantics.

It is bad.  It's against project policy, not least because we have
assorted *existing* datatypes for which "obvious" operator names like
"=" do not have all the properties you might expect.

If you need a more concrete example of why that sort of thinking is
bad, you might consider the difference between < and ~<~ for type text.
If we hard-wired knowledge about operator behavior to operator names,
it would be impossible for the system to understand that both of those
operators represent sorting-related behaviors.

Or to be even more concrete: if we allow RANGE to suppose that there's
only one possible definition of "+" for a datatype, we're effectively
supposing that there's only one possible sort ordering for that type.
Which is already a wrong assumption, and has been since Postgres was
still at Berkeley.  If you go this way, you won't be able to support
both WINDOW ... ORDER BY foo USING < RANGE ... and WINDOW ... ORDER BY
foo USING ~<~ RANGE ... because you won't know which addition operator
to apply.

(And yeah, I'm aware that the SQL standard only expects RANGE to support
sort keys that are of numeric, datetime, or interval type.  I would hope
that we have higher expectations than that.  Even if we don't, it's not
exactly hard to credit that people might have multiple ideas about how
to sort interval values.)

There are indeed still some places where we rely on operator names to
mean something, but we need to get away from that idea not add more.
Ideally, any property the system understands about an operator or
function should be explicitly declared through opclass membership or
some similar representation.  We've made substantial progress in that
direction in the last fifteen years.  I don't want to reverse that
progress in the name of minor expediencies, especially not ones that
fail to support flexibility that has been in the system for a couple
or three decades already.

                        regards, tom lane