Discussion: Possible marginally-incompatible change to array subscripting
I'm reviewing Yury Zhuravlev's patch to allow array slice boundaries to be omitted, for example "a[4:]" means "the slice extending from element 4 to the last element of a". It strikes me that there's an improvement we could easily make for the case where a mixture of slice and non-slice syntax appears, that is something like "a[3:4][5]". Now, this has always meant a slice, and the way we've traditionally managed that is to treat simple subscripts as being the range upper bound with a lower bound of 1; that is, what this example means is exactly "a[3:4][1:5]".

ISTM that if we'd had Yury's code in there from the beginning, what we would define this as meaning is "a[3:4][:5]", ie the implied range runs from whatever the array lower bound is up to the specified subscript.

This would make no difference of course for the common case where the array lower bound is 1, but it seems a lot less arbitrary when it isn't. So I think we should strongly consider changing it to mean that, even though it would be non-backwards-compatible in such cases.

Comments?

regards, tom lane
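The three readings of a mixed subscript such as a[3:4][5] that come up in this thread (the existing 1:n rule, the proposed lower-bound rule, and the n:n reading) can be compared with a small model. This Python sketch is hypothetical throughout: `normalize` and the rule names are invented for illustration and are not PostgreSQL internals. It only computes the requested range for each dimension and ignores any later clamping to the array's actual extent.

```python
from typing import List, Optional, Sequence, Tuple, Union

# A subscript is a plain index (int) or a slice (lower, upper); None marks an
# omitted bound, as in "a[4:]" or "a[:5]".
Subscript = Union[int, Tuple[Optional[int], Optional[int]]]


def normalize(subs: Sequence[Subscript],
              lbounds: Sequence[int],
              ubounds: Sequence[int],
              rule: str = "legacy") -> List[Tuple[int, int]]:
    """Map a mixed subscript list to one (lower, upper) range per dimension.

    Hypothetical model, not PostgreSQL source. Once any slice appears, every
    plain index n is widened according to `rule`:
      "legacy"      -> 1:n   (existing behavior)
      "lower_bound" -> lb:n  (same as ":n", the proposal)
      "single"      -> n:n   (the alternative reading)
    Omitted slice bounds are filled in from the array's own bounds.
    """
    if not any(isinstance(s, tuple) for s in subs):
        raise ValueError("no slice present: this is a plain element fetch")
    ranges: List[Tuple[int, int]] = []
    for dim, s in enumerate(subs):
        lb, ub = lbounds[dim], ubounds[dim]
        if isinstance(s, tuple):
            lo, hi = s
            ranges.append((lb if lo is None else lo, ub if hi is None else hi))
        elif rule == "legacy":
            ranges.append((1, s))
        elif rule == "lower_bound":
            ranges.append((lb, s))
        elif rule == "single":
            ranges.append((s, s))
        else:
            raise ValueError(f"unknown rule: {rule!r}")
    return ranges


if __name__ == "__main__":
    # Array with bounds [1:10][0:8]; query a[3:4][5].
    for rule in ("legacy", "lower_bound", "single"):
        print(rule, normalize([(3, 4), 5], (1, 0), (10, 8), rule))
```

For an array with bounds [1:10][0:8], the second dimension comes out as (1, 5) under the existing rule, (0, 5) under the proposed rule, and (5, 5) under the single-element reading. When every lower bound is 1, the first two rules agree. That is the "common case" the message refers to.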
On Tue, Dec 22, 2015 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm reviewing Yury Zhuravlev's patch to allow array slice boundaries to be
> omitted, for example "a[4:]" means "the slice extending from element 4 to
> the last element of a". It strikes me that there's an improvement we
> could easily make for the case where a mixture of slice and non-slice
> syntax appears, that is something like "a[3:4][5]". Now, this has always
> meant a slice, and the way we've traditionally managed that is to treat
> simple subscripts as being the range upper bound with a lower bound of 1;
> that is, what this example means is exactly "a[3:4][1:5]".
>
> ISTM that if we'd had Yury's code in there from the beginning, what we
> would define this as meaning is "a[3:4][:5]", ie the implied range runs
> from whatever the array lower bound is up to the specified subscript.
>
> This would make no difference of course for the common case where the
> array lower bound is 1, but it seems a lot less arbitrary when it isn't.
> So I think we should strongly consider changing it to mean that, even
> though it would be non-backwards-compatible in such cases.
>
> Comments?

Gosh, our arrays are strange. I would have expected a[3:4][5] to mean a[3:4][5:5].

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> This would make no difference of course for the common case where the
> array lower bound is 1, but it seems a lot less arbitrary when it isn't.
> So I think we should strongly consider changing it to mean that, even
> though it would be non-backwards-compatible in such cases.
>
> Comments?

If you break backwards compatibility, could arrays be made to work like those in C/C++/Python/Ruby and other languages?
I'm sorry to bring up this thread again...

> ISTM that if we'd had Yury's code in there from the beginning, what we
> would define this as meaning is "a[3:4][:5]", ie the implied range runs
> from whatever the array lower bound is up to the specified subscript.

Writing a[3:4][:5] instead of a[3:4][5] is at least logical. But then what would a[3:4][5] return? A single element?

Thanks.

--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Dec 22, 2015 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ISTM that if we'd had Yury's code in there from the beginning, what we
>> would define this as meaning is "a[3:4][:5]", ie the implied range runs
>> from whatever the array lower bound is up to the specified subscript.

> Gosh, our arrays are strange. I would have expected a[3:4][5] to mean
> a[3:4][5:5].

Yeah, probably, now that you mention it ... but that seems like too much of a compatibility break. Or does anyone want to argue for just doing that and never mind the compatibility issues? This is a pretty weird corner case already; there can't be very many people relying on it.

Another point worth realizing is that the implicit insertion of "1:" happens in the parser, meaning that existing stored views/rules will dump out with that added and hence aren't going to change meaning no matter what we decide here.

(BTW, now that I've read the patch a bit further, it actually silently changed the semantics as I'm suggesting already. We could undo that without too much extra code, but I feel that we shouldn't. Robert's idea seems like a plausible alternative, but it would take a nontrivial amount of code to implement it unless we are willing to double-evaluate such a subscript.)

regards, tom lane
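A toy model shows why the n:n reading is awkward to implement cheaply. If a plain subscript is rewritten by reusing the same expression as both bounds, a volatile subscript (think of a call to nextval()) is evaluated twice and can produce two different values. This Python sketch illustrates that assumption only. The names `rewrite_legacy`, `rewrite_single_naive` and `rewrite_single_once` are hypothetical, and the sketch does not reproduce how the PostgreSQL parser or executor actually represents subscripts.

```python
import itertools
from typing import Callable, List, Tuple

# A subscript expression, modeled as a zero-argument callable.
Expr = Callable[[], int]
Bounds = Tuple[Expr, Expr]


def rewrite_legacy(e: Expr) -> Bounds:
    # "n" becomes "1:n": the lower bound is a constant, so e appears once.
    return (lambda: 1), e


def rewrite_single_naive(e: Expr) -> Bounds:
    # "n" becomes "n:n" by sharing one expression between both bounds,
    # so it is evaluated twice at run time.
    return e, e


def rewrite_single_once(e: Expr) -> Bounds:
    # "n" becomes "n:n", but the value is computed once and reused.
    # In this model, the cache is the extra machinery a correct version needs.
    cache: List[int] = []

    def once() -> int:
        if not cache:
            cache.append(e())
        return cache[0]

    return once, once


def evaluate(bounds: Bounds) -> Tuple[int, int]:
    lo, hi = bounds
    return lo(), hi()  # lower bound first, then upper bound


def make_volatile() -> Tuple[Expr, List[int]]:
    # Stand-in for a volatile subscript: returns 5, 6, 7, ... and records
    # every value it hands out.
    counter = itertools.count(5)
    calls: List[int] = []

    def expr() -> int:
        value = next(counter)
        calls.append(value)
        return value

    return expr, calls


if __name__ == "__main__":
    for rewrite in (rewrite_legacy, rewrite_single_naive, rewrite_single_once):
        expr, calls = make_volatile()
        print(rewrite.__name__, evaluate(rewrite(expr)), "calls:", calls)
```

In this model, the legacy rewrite yields (1, 5) with a single evaluation. The naive n:n rewrite yields (5, 6) with two evaluations, so the result is not a single element. The cached version yields (5, 5) with a single evaluation.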
Yury Zhuravlev <u.zhuravlev@postgrespro.ru> writes:
> If you break backwards compatibility, could arrays be made to work like
> those in C/C++/Python/Ruby and other languages?
> I'm sorry to bring up this thread again...

I am not sure just exactly how incompatible that would be, but surely it would break enormously more code than what we're discussing here. So no, I don't think any such proposal has a chance. There are degrees of incompatibility, and considering a small/narrow one does not mean that we'd also consider major breakage.

regards, tom lane
On Tue, Dec 22, 2015 at 12:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Dec 22, 2015 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ISTM that if we'd had Yury's code in there from the beginning, what we
>>> would define this as meaning is "a[3:4][:5]", ie the implied range runs
>>> from whatever the array lower bound is up to the specified subscript.
>
>> Gosh, our arrays are strange. I would have expected a[3:4][5] to mean
>> a[3:4][5:5].
>
> Yeah, probably, now that you mention it ... but that seems like too much
> of a compatibility break. Or does anyone want to argue for just doing
> that and never mind the compatibility issues? This is a pretty weird
> corner case already; there can't be very many people relying on it.

To be honest, I'd be inclined not to change the semantics at all. But that's just me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/22/2015 10:01 AM, Robert Haas wrote:
> On Tue, Dec 22, 2015 at 12:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Tue, Dec 22, 2015 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> ISTM that if we'd had Yury's code in there from the beginning, what we
>>>> would define this as meaning is "a[3:4][:5]", ie the implied range runs
>>>> from whatever the array lower bound is up to the specified subscript.
>>
>>> Gosh, our arrays are strange. I would have expected a[3:4][5] to mean
>>> a[3:4][5:5].
>>
>> Yeah, probably, now that you mention it ... but that seems like too much
>> of a compatibility break. Or does anyone want to argue for just doing
>> that and never mind the compatibility issues? This is a pretty weird
>> corner case already; there can't be very many people relying on it.
>
> To be honest, I'd be inclined not to change the semantics at all. But
> that's just me.

I think a sane approach is better than a safe approach.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
2015-12-22 18:34 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Dec 22, 2015 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm reviewing Yury Zhuravlev's patch to allow array slice boundaries to be
>> omitted, for example "a[4:]" means "the slice extending from element 4 to
>> the last element of a". It strikes me that there's an improvement we
>> could easily make for the case where a mixture of slice and non-slice
>> syntax appears, that is something like "a[3:4][5]". Now, this has always
>> meant a slice, and the way we've traditionally managed that is to treat
>> simple subscripts as being the range upper bound with a lower bound of 1;
>> that is, what this example means is exactly "a[3:4][1:5]".
>>
>> ISTM that if we'd had Yury's code in there from the beginning, what we
>> would define this as meaning is "a[3:4][:5]", ie the implied range runs
>> from whatever the array lower bound is up to the specified subscript.
>>
>> This would make no difference of course for the common case where the
>> array lower bound is 1, but it seems a lot less arbitrary when it isn't.
>> So I think we should strongly consider changing it to mean that, even
>> though it would be non-backwards-compatible in such cases.
>>
>> Comments?
> Gosh, our arrays are strange. I would have expected a[3:4][5] to mean
> a[3:4][5:5].
exactly,
Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2015-12-22 18:34 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
>> On Tue, Dec 22, 2015 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ISTM that if we'd had Yury's code in there from the beginning, what we
>>> would define this as meaning is "a[3:4][:5]", ie the implied range runs
>>> from whatever the array lower bound is up to the specified subscript.

>> Gosh, our arrays are strange. I would have expected a[3:4][5] to mean
>> a[3:4][5:5].

> exactly,

Since it's not clear that we've got consensus on doing anything differently, I've adjusted the current patch to preserve the existing behavior here (and added some regression tests showing that behavior). If we do decide to change it, it'd be more appropriate to make that change in a separate commit, anyway.

regards, tom lane
On 12/22/15 12:01 PM, Tom Lane wrote:
> Yury Zhuravlev <u.zhuravlev@postgrespro.ru> writes:
>> If you break backwards compatibility, could arrays be made to work like
>> those in C/C++/Python/Ruby and other languages?
>> I'm sorry to bring up this thread again...
>
> I am not sure just exactly how incompatible that would be, but surely it
> would break enormously more code than what we're discussing here.
> So no, I don't think any such proposal has a chance. There are degrees
> of incompatibility, and considering a small/narrow one does not mean that
> we'd also consider major breakage.

As I see it, the biggest problem with our arrays is that they can't decide if they're a simple array (which means >1 dimension is an array of arrays) or a matrix (all slices in a dimension must be the same size). They seem to be more like matrices than arrays, but then there's a bunch of places that completely ignore dimensionality.

It would be nice to standardize them one way or another, but it seems like the breakage from that would be horrific.

One could theoretically construct a custom "type" that followed more traditional semantics, but then you'd lose all the syntax... which I suspect would make any such "type" all but unusable. The other problem would be having it deal with any other data type, but at least there's ways you can work around that for the most part.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
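The difference between the two models can be made concrete. Under the "matrix" reading, every sub-array at a given depth must have the same length, so a ragged value like [[1, 2], [3]] is an error rather than an array of arrays. PostgreSQL rejects ARRAY[[1,2],[3]] for this reason. This Python sketch checks the shape rule on nested lists. The helper name `dims_of` is hypothetical, and the sketch models only the shape constraint, not per-dimension lower bounds.

```python
from typing import Any, Tuple


def dims_of(value: Any) -> Tuple[int, ...]:
    """Return the extent of each dimension of a nested list, matrix-style.

    Hypothetical helper: all sub-lists at the same depth must have the same
    shape, otherwise ValueError is raised. Plain Python lists, which are true
    arrays of arrays, impose no such rule.
    """
    if not isinstance(value, list):
        return ()  # a scalar element contributes no dimensions
    if not value:
        return (0,)
    shapes = [dims_of(item) for item in value]
    if any(shape != shapes[0] for shape in shapes[1:]):
        raise ValueError("sub-arrays must have matching dimensions")
    return (len(value),) + shapes[0]


if __name__ == "__main__":
    print(dims_of([[1, 2, 3], [4, 5, 6]]))
    try:
        dims_of([[1, 2], [3]])
    except ValueError as err:
        print("rejected:", err)
```

A 2x3 nested list reports its shape as (2, 3), while the ragged value is rejected. In Python itself, [[1, 2], [3]] remains a perfectly ordinary list of lists.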
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> One could theoretically construct a custom "type" that followed more
> traditional semantics, but then you'd lose all the syntax... which I
> suspect would make any such "type" all but unusable. The other problem
> would be having it deal with any other data type, but at least there's
> ways you can work around that for the most part.

Yeah. We've speculated a bit about allowing other datatypes to have access to the subscript syntax, which could be modeled as allowing 'a[b]' to be an overloadable operator. That seems possibly doable if someone wanted to put time into it. However, that still leaves a heck of a lot of functionality on the table, such as automatic creation of array types corresponding to new scalar types, not to mention the parser's understanding of "anyarray" vs "anyelement" polymorphism. I have no idea how we might make those things extensible.

regards, tom lane
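Python is one language where 'a[b]' is exactly such an overloadable operator, because any class can define `__getitem__`. This sketch uses that hook to give a hypothetical `BoundedArray` type subscripting with an arbitrary lower bound, inclusive slice ends, and omittable slice ends (as in a[4:]). It is an analogy only. The class, its clamping of out-of-range slices, and its renumbering of slice results to start at 1 are assumptions of the sketch. They do not describe PostgreSQL's implementation or any proposed API.

```python
from typing import List, Optional, Sequence, Union


class BoundedArray:
    """Hypothetical 1-D array with an arbitrary lower bound.

    Python routes a[b] through __getitem__, so this class decides what
    subscripting means for its own values.
    """

    def __init__(self, items: Sequence[int], lower: int = 1) -> None:
        self.items: List[int] = list(items)
        self.lower = lower

    @property
    def upper(self) -> int:
        return self.lower + len(self.items) - 1

    def __getitem__(self, key: Union[int, slice]) -> Union[Optional[int], "BoundedArray"]:
        if isinstance(key, slice):
            if key.step is not None:
                raise TypeError("steps are not supported")
            # Omitted ends default to the array's own bounds; both ends are
            # inclusive and clamped to the existing elements.
            lo = self.lower if key.start is None else max(key.start, self.lower)
            hi = self.upper if key.stop is None else min(key.stop, self.upper)
            if lo > hi:
                return BoundedArray([])
            # The slice result is renumbered to start at 1.
            return BoundedArray(self.items[lo - self.lower:hi - self.lower + 1])
        # An out-of-range element fetch yields None, standing in for SQL NULL.
        if self.lower <= key <= self.upper:
            return self.items[key - self.lower]
        return None

    def __repr__(self) -> str:
        body = "{" + ",".join(map(str, self.items)) + "}"
        return body if not self.items else f"[{self.lower}:{self.upper}]={body}"


if __name__ == "__main__":
    a = BoundedArray([10, 20, 30, 40, 50], lower=0)
    print(a[3:], a[:1], a[2], a[7])
```

With a lower bound of 0, a[3:] selects the last two elements and a[:1] selects the first two. a[2] fetches a single element, and a[7] falls outside the bounds and yields None. The catch Tom raises still applies here: overloading subscripting for one type gives no automatic array types and no polymorphism over element types.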