Обсуждение: [BUGS] BUG #14494: Regression - Null arrays are not queryable

Поиск
Список
Период
Сортировка

[BUGS] BUG #14494: Regression - Null arrays are not queryable

От
linasvepstas@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14494
Logged by:          Linas Vepstas
Email address:      linasvepstas@gmail.com
PostgreSQL version: 9.6.1
Operating system:   Debian unstable
Description:

Array handling appears to have regressed,
with lots of surprising results for empty
and null arrays.

CREATE TABLE foo (
  name    TEXT,
  outgoing BIGINT[] 
);

INSERT INTO FOO VALUES ('one', '{43, 67}');
INSERT INTO FOO VALUES ('two', '{}');
INSERT INTO FOO VALUES ('three', null);

test=> select * from foo;
 name  | outgoing 
-------+----------
 one   | {43,67}
 two   | {}
 three | 
(3 rows)

very good!

test=> select * from foo where outgoing='{}';
 name | outgoing 
------+----------
 two  | {}
(1 row)

very good!

test=> select * from foo where outgoing=null;
 name | outgoing 
------+----------
(0 rows)

unexpected: wanted one row here.

test=> select * from foo where outgoing != '{43,67}';
 name | outgoing 
------+----------
 two  | {}
(1 row)

unexpected: wanted two rows here. Where did the null row go?
test=> select * from foo where array_length(outgoing,1)=2 ;
 name | outgoing 
------+----------
 one  | {43,67}
(1 row)

very good

test=> select * from foo where array_length(outgoing,1)=0 ;
 name | outgoing 
------+----------
(0 rows)

Huh? there is at least one array whose length is zero, and another that
doesn't have a length...



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable

От
Andres Freund
Дата:
Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      14494
> Logged by:          Linas Vepstas
> Email address:      linasvepstas@gmail.com
> PostgreSQL version: 9.6.1
> Operating system:   Debian unstable
> Description:        
> 
> Array handling appears to have regressed,

Regressed from what?  Because these don't look like they'd have changed
recently.

> test=> select * from foo where outgoing=null;
>  name | outgoing 
> ------+----------
> (0 rows)
> 
> unexpected: wanted one row here.

Comparisons with NULL result in NULL, which evaluates to false in a
WHERE clause.  Try WHERE outgoing IS NULL;


> test=> select * from foo where outgoing != '{43,67}';
>  name | outgoing 
> ------+----------
>  two  | {}
> (1 row)
> 
> unexpected: wanted two rows here. Where did the null row go?

Same answer as above. WHERE outgoing IS DISTINCT FROM ...;

> 
> test=> select * from foo where array_length(outgoing,1)=0 ;
>  name | outgoing 
> ------+----------
> (0 rows)
> 
> Huh? there is at least one array whose length is zero, 

An empty array doesn't have a dimension 1 and thus results in NULL. Yes,
that's a bit confusing.

> and another that doesn't have a length...

You mean the NULL one? Why would you expect a 0 as a result then?

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable

От
"David G. Johnston"
Дата:
On Fri, Jan 13, 2017 at 11:38 AM, <linasvepstas@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14494
Logged by:          Linas Vepstas
Email address:      linasvepstas@gmail.com
PostgreSQL version: 9.6.1
Operating system:   Debian unstable
Description:

Array handling appears to have regressed,

​From what?
 
with lots of surprising results for empty
and null arrays.

 

test=> select * from foo where outgoing=null;
 name | outgoing
------+----------
(0 rows) 

unexpected: wanted one row here.


​​While you can configure the system to yield "true" for "null = null" by default it does not - I suspect your prior system has this configuration option turned on.  This is not specific to arrays.


test=> select * from foo where outgoing != '{43,67}';
 name | outgoing
------+----------
 two  | {}
(1 row) 

unexpected: wanted two rows here. Where did the null row go?

​Again, null != (non-null) yields null, not true​, and would work this way regardless of the option mentioned above...


test=> select * from foo where array_length(outgoing,1)=0 ;
 name | outgoing
------+----------
(0 rows)

Huh? there is at least one array whose length is zero, and another that
doesn't have a length...

Empty arrays and nulls, when provided to array_length, report NULL for all dimensions.​

​David J.​

Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable

От
Linas Vepstas
Дата:


On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14494
> Logged by:          Linas Vepstas
> Email address:      linasvepstas@gmail.com
> PostgreSQL version: 9.6.1
> Operating system:   Debian unstable
> Description:
>
> Array handling appears to have regressed,

Regressed from what?  Because these don't look like they'd have changed
recently.

I'd have to do forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings (in C/C++, or in python or in perl, etc): they have a length, and in C/C++, they are null-terminated. There is a concept of a null string: a string of length zero: its a valid string, that you can treat like any other. Its not exceptional, or different in any way.  Standard string ops work on them: they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero, but is still an array, and can be handled using the same kind of code that non-zero length arrays use.  One does not have to write special-case code for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard. Special-case code needs to be written to handle them, since they are exceptional. This doubles the amount of unit-testing required, and doubles the attack surface for bugs. It is an unintuitive and surprising situation, for those of us, like me, who have data that contains arrays, some of which are zero-legnth arrays.

I respect that you might want to design postgres the way way it is currently, but it is counter-intuitive and surprising, and opens the door to surprising behaviors and bugs in user-land.  It doesn't "smell" like a rational design choice; instead, it smells like something arbitrary or accidental.

--linas

 

> test=> select * from foo where outgoing=null;
>  name | outgoing
> ------+----------
> (0 rows)
>
> unexpected: wanted one row here.

Comparisons with NULL result in NULL, which evaluates to false in a
WHERE clause.  Try WHERE outgoing IS NULL;


> test=> select * from foo where outgoing != '{43,67}';
>  name | outgoing
> ------+----------
>  two  | {}
> (1 row)
>
> unexpected: wanted two rows here. Where did the null row go?

Same answer as above. WHERE outgoing IS DISTINCT FROM ...;

>
> test=> select * from foo where array_length(outgoing,1)=0 ;
>  name | outgoing
> ------+----------
> (0 rows)
>
> Huh? there is at least one array whose length is zero,

An empty array doesn't have a dimension 1 and thus results in NULL. Yes,
that's a bit confusing.

> and another that doesn't have a length...

You mean the NULL one? Why would you expect a 0 as a result then?

Greetings,

Andres Freund

Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable

От
"David G. Johnston"
Дата:
On Fri, Feb 17, 2017 at 5:17 PM, Linas Vepstas <linasvepstas@gmail.com> wrote:

On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14494
> Logged by:          Linas Vepstas
> Email address:      linasvepstas@gmail.com
> PostgreSQL version: 9.6.1
> Operating system:   Debian unstable
> Description:
>
> Array handling appears to have regressed,

Regressed from what?  Because these don't look like they'd have changed
recently.

I'd have to do
​​
forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings (in C/C++, or in python or in perl, etc): they have a length, and in C/C++, they are null-terminated. There is a concept of a null string: a string of length zero: its a valid string, that you can treat like any other. Its not exceptional, or different in any way.  Standard string ops work on them: they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero, but is still an array, and can be handled using the same kind of code that non-zero length arrays use.  One does not have to write special-case code for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard. Special-case code needs to be written to handle them, since they are exceptional. This doubles the amount of unit-testing required, and doubles the attack surface for bugs. It is an unintuitive and surprising situation, for those of us, like me, who have data that contains arrays, some of which are zero-legnth arrays.

I respect that you might want to design postgres the way way it is currently, but it is counter-intuitive and surprising, and opens the door to surprising behaviors and bugs in user-land.  It doesn't "smell" like a rational design choice; instead, it smells like something arbitrary or accidental.

--linas

 
It is a shared belief (in differing strengths) that our array behavior is less than ideal - but it is functional and the opinions of people already using our software and have working applications based around the present behavior are afforded considerable weight when deciding​ whether to fix something not really broken but also not ideal.  Usually the outcome is that we leave things unchanged.

Also, you speak of an array of having "a length" - but that isn't true in PostgreSQL.  Arrays are multi-dimensional.  So at least at a distance you are constructing a flawed mental model of an array based solely upon experience with one-dimensional arrays.

In any case that only addresses:

select * from foo where array_length(outgoing,1)=0 ;

IOW - you make good points but so does PostgreSQL, and given present information pragmatism trumps ideology.

That said, if you actually supply some real examples of what you are doing you might find people willing to point out potentially better ways to do the same thing in PostgreSQL without having to worry about corner cases.

SQL has a concept of null for which we need to retain as consistent handling as possible.  That covers your other items.  "NULL" is not the same as <empty> no matter what the underlying data type is (arrays, strings, etc...).  This covers the other two "unexpecteds".

I would advise you minimize how often NULL appears in your data.  You already seem comfortable with the concept of "empty" and in many situations an empty something is exactly what is meant to be conveyed by the use of NULL.

If you really want to debate/explore the topic the -general list is the proper choice.  This list should be used for true bug reporting which you have not yet done.  Unintended and undocumented regressions are bugs but so far that hasn't been shown to be the case.  The rest of the exposition is discussion, not bug reporting or exploring.

David J.

Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable

От
Linas Vepstas
Дата:
OK. One of these is a change in behavior from earlier versions; I won't be able to double-check to provide details for at least a few weeks. (or have any time for other discussion)

In my case, I am layering a graph DB (a hypergraph DB) on top of SQL. The array is a list of primary keys. Think of a "multi-tree" or a "forest": a bunch of trees (DAG's) but these may share nodes. Thus all rows are nodes, the array is all the children of the node, and leaf-nodes have no children: either an empty array or a null array or a zero-length array, I'm not picky on the representation.

--linas

On Fri, Feb 17, 2017 at 7:08 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Feb 17, 2017 at 5:17 PM, Linas Vepstas <linasvepstas@gmail.com> wrote:

On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14494
> Logged by:          Linas Vepstas
> Email address:      linasvepstas@gmail.com
> PostgreSQL version: 9.6.1
> Operating system:   Debian unstable
> Description:
>
> Array handling appears to have regressed,

Regressed from what?  Because these don't look like they'd have changed
recently.

I'd have to do
​​
forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings (in C/C++, or in python or in perl, etc): they have a length, and in C/C++, they are null-terminated. There is a concept of a null string: a string of length zero: its a valid string, that you can treat like any other. Its not exceptional, or different in any way.  Standard string ops work on them: they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero, but is still an array, and can be handled using the same kind of code that non-zero length arrays use.  One does not have to write special-case code for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard. Special-case code needs to be written to handle them, since they are exceptional. This doubles the amount of unit-testing required, and doubles the attack surface for bugs. It is an unintuitive and surprising situation, for those of us, like me, who have data that contains arrays, some of which are zero-legnth arrays.

I respect that you might want to design postgres the way way it is currently, but it is counter-intuitive and surprising, and opens the door to surprising behaviors and bugs in user-land.  It doesn't "smell" like a rational design choice; instead, it smells like something arbitrary or accidental.

--linas

 
It is a shared belief (in differing strengths) that our array behavior is less than ideal - but it is functional and the opinions of people already using our software and have working applications based around the present behavior are afforded considerable weight when deciding​ whether to fix something not really broken but also not ideal.  Usually the outcome is that we leave things unchanged.

Also, you speak of an array of having "a length" - but that isn't true in PostgreSQL.  Arrays are multi-dimensional.  So at least at a distance you are constructing a flawed mental model of an array based solely upon experience with one-dimensional arrays.

In any case that only addresses:

select * from foo where array_length(outgoing,1)=0 ;

IOW - you make good points but so does PostgreSQL, and given present information pragmatism trumps ideology.

That said, if you actually supply some real examples of what you are doing you might find people willing to point out potentially better ways to do the same thing in PostgreSQL without having to worry about corner cases.

SQL has a concept of null for which we need to retain as consistent handling as possible.  That covers your other items.  "NULL" is not the same as <empty> no matter what the underlying data type is (arrays, strings, etc...).  This covers the other two "unexpecteds".

I would advise you minimize how often NULL appears in your data.  You already seem comfortable with the concept of "empty" and in many situations an empty something is exactly what is meant to be conveyed by the use of NULL.

If you really want to debate/explore the topic the -general list is the proper choice.  This list should be used for true bug reporting which you have not yet done.  Unintended and undocumented regressions are bugs but so far that hasn't been shown to be the case.  The rest of the exposition is discussion, not bug reporting or exploring.

David J.