Re: Support for RANGE ... PRECEDING windows in OVER

Поиск
Список
Период
Сортировка
От ian link
Тема Re: Support for RANGE ... PRECEDING windows in OVER
Дата
Msg-id CAOOwM5KajDqb=M8EXoQaV2nmDAo8AvE8H26HnxC_obtQf2_DNw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Support for RANGE ... PRECEDING windows in OVER  (ian link <ian@ilink.io>)
Ответы Re: Support for RANGE ... PRECEDING windows in OVER  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
I found some time and I think I am up to speed now. I finally figured out how to add new operator strategies and made a little test operator for myself.

It seems pretty clear that assuming '+' and '-' are addition and subtraction is a bad idea. I don't think it would be too tricky to add support for new operator strategies. Andrew Gierth suggested calling these new strategies "offset -" and "offset +", which I think describes it pretty well. I assigned the operator itself to be "@+" and "@-" but that can obviously be changed. If this sounds like a good path to you guys, I will go ahead and implement the operators for the appropriate types. Please let me know if I am misunderstanding something - I am still figuring stuff out :)

Aside from the opclass stuff, there were some other important issues mentioned with the original RANGE support. I think I will address those after the opclass stuff is done.

Thanks!
Ian


On Sat, Jun 22, 2013 at 4:38 PM, ian link <ian@ilink.io> wrote:

Thanks Craig! That definitely does help. I probably still have some questions but I think I will read through the rest of the code before asking. Thanks again!

Ian

> Craig Ringer
> Friday, June 21, 2013 8:41 PM


>
> On 06/22/2013 03:30 AM, ian link wrote:
>>
>> Forgive my ignorance, but I don't entirely understand the problem. What
>> does '+' and '-' refer to exactly?
>
> Consider "RANGE 4.5 PRECEDING'.
>
> You need to be able to test whether, for the current row 'b', any given
> row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
> < vs <= boundaries, but that's irrelevant for the example.
>
> To test that, you have to be able to do two things: you have to be able
> to test whether one value is greater than another, and you have to be
> able to add or subtract a constant from one of the values.
>
> Right now, the b-tree access method provides information on the ordering
> operators < <= = > >= <> , which provides half the answer. But these
> don't give any concept of *distance* - you can test ordinality but not
> cardinality.
>
> To implement the "different by 4.5" part, you have to be able to add 4.5
> to one value or subtract it from the other.
>
> The obvious way to do that is to look up the function that implements
> the '+' or '-' operator, and do:
>
> ((OPERATOR(+))(a, 4.5)) > b AND (a <= b)
>
> or
>
> ((OPERATOR(-))(b, 4.5)) < a AND (a <= b);
>
> The problem outlined by Tom in prior discussion about this is that
> PostgreSQL tries really hard not to assume that particular operator
> names mean particular things. Rather than "knowing" that "+" is always
> "an operator that adds two values together; is transitive, symmetric and
> reflexive", PostgreSQL requires that you define an *operator class* that
> names the operator that has those properties.
>
> Or at least, it does for less-than, less-than-or-equals, equals,
> greater-than-or-equals, greater-than, and not-equals as part of the
> b-tree operator class, which *usually* defines these operators as < <= =
>>
>> = > <>, but you could use any operator names you wanted if you really
>
> liked.
>
> Right now (as far as I know) there's no operator class that lets you
> identify operators for addition and subtraction in a similar way. So
> it's necessary to either add such an operator class (in which case
> support has to be added for it for every type), extend the existing
> b-tree operator class to provide the info, or blindly assume that "+"
> and "-" are always addition and subtraction.
>
> For an example of why such assumptions are a bad idea, consider matrix
> multiplication. Normally, "a * b" = "b * a", but this isn't true for
> multiplication of matrices. Similarly, if someone defined a "+" operator
> as an alias for string concatenation (||), we'd be totally wrong to
> assume we could use that for doing range-offset windowing.
>
> So. Yeah. Operator classes required, unless we're going to change the
> rules and make certain operator names "special" in PostgreSQL, so that
> if you implement them they *must* have certain properties. This seems
> like a pretty poor reason to add such a big change.
>
> I hope this explanation (a) is actually correct and (b) is helpful.
>
> ian link
> Friday, June 21, 2013 12:30 PM

> Forgive my ignorance, but I don't entirely understand the problem. What does '+' and '-' refer to exactly?
> Thanks!
>
>
>
> Hitoshi Harada
> Friday, June 21, 2013 4:35 AM
>
>
>

On 06/22/2013 03:30 AM, ian link wrote:
> Forgive my ignorance, but I don't entirely understand the problem. What
> does '+' and '-' refer to exactly?

Consider "RANGE 4.5 PRECEDING'.

You need to be able to test whether, for the current row 'b', any given
row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
< vs <= boundaries, but that's irrelevant for the example.

To test that, you have to be able to do two things: you have to be able
to test whether one value is greater than another, and you have to be
able to add or subtract a constant from one of the values.

Right now, the b-tree access method provides information on the ordering
operators < <= = > >= <> , which provides half the answer. But these
don't give any concept of *distance* - you can test ordinality but not
cardinality.

To implement the "different by 4.5" part, you have to be able to add 4.5
to one value or subtract it from the other.

The obvious way to do that is to look up the function that implements
the '+' or '-' operator, and do:

((OPERATOR(+))(a, 4.5)) > b AND (a <= b)

or

((OPERATOR(-))(b, 4.5)) < a AND (a <= b);

The problem outlined by Tom in prior discussion about this is that
PostgreSQL tries really hard not to assume that particular operator
names mean particular things. Rather than "knowing" that "+" is always
"an operator that adds two values together; is transitive, symmetric and
reflexive", PostgreSQL requires that you define an *operator class* that
names the operator that has those properties.

Or at least, it does for less-than, less-than-or-equals, equals,
greater-than-or-equals, greater-than, and not-equals as part of the
b-tree operator class, which *usually* defines these operators as < <= =
>= > <>, but you could use any operator names you wanted if you really
liked.

Right now (as far as I know) there's no operator class that lets you
identify operators for addition and subtraction in a similar way. So
it's necessary to either add such an operator class (in which case
support has to be added for it for every type), extend the existing
b-tree operator class to provide the info, or blindly assume that "+"
and "-" are always addition and subtraction.

For an example of why such assumptions are a bad idea, consider matrix
multiplication. Normally, "a * b" = "b * a", but this isn't true for
multiplication of matrices. Similarly, if someone defined a "+" operator
as an alias for string concatenation (||), we'd be totally wrong to
assume we could use that for doing range-offset windowing.

So. Yeah. Operator classes required, unless we're going to change the
rules and make certain operator names "special" in PostgreSQL, so that
if you implement them they *must* have certain properties. This seems
like a pretty poor reason to add such a big change.

I hope this explanation (a) is actually correct and (b) is helpful.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: New regression test time
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Eliminating PD_ALL_VISIBLE, take 2