Обсуждение: COALESCE doccumentation

Поиск
Список
Период
Сортировка

COALESCE doccumentation

От
PG Doc comments form
Дата:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/11/functions-conditional.html
Description:

The documentation says:

"The COALESCE function returns the first of its arguments that is not null.
Null is returned only if all arguments are null. It is often used to
substitute a default value for null values when data is retrieved for
display, for example"

Which is NOT TRUE. In addition, it is obscure and misleading.

The COALESCE(<value expression>, ..., <value expression>) scans the list of
<value expression>s from left to right, determines the highest data type in
the list and returns the first non-NULL in the list, casting it to the
highest data type selected in the previous step, but if all the <value
expressions>s are NULL, the result is NULL. In case when any of the <value
expressions>s in the list cannot be evaluated to the highest data type then
exception is thrown.

Try, for example. 

SELECT COALESCE(NULL, '12', 3, 1, 1); -- OK!
SELECT COALESCE(NULL, '12.2', 3.2, '1', 1.1); -- OK!
SELECT COALESCE(NULL, '1d2', 3, 1, 1); -- error
SELECT COALESCE(NULL, '12', 3, '1d', 1); -- cast error

Thanks.

Re: COALESCE doccumentation

От
Bruce Momjian
Дата:
On Sat, Apr 27, 2019 at 08:26:57PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/11/functions-conditional.html
> Description:
> 
> The documentation says:
> 
> "The COALESCE function returns the first of its arguments that is not null.
> Null is returned only if all arguments are null. It is often used to
> substitute a default value for null values when data is retrieved for
> display, for example"
> 
> Which is NOT TRUE. In addition, it is obscure and misleading.
> 
> The COALESCE(<value expression>, ..., <value expression>) scans the list of
> <value expression>s from left to right, determines the highest data type in

What is a highest data type?  The first data type?

> the list and returns the first non-NULL in the list, casting it to the
> highest data type selected in the previous step, but if all the <value
> expressions>s are NULL, the result is NULL. In case when any of the <value
> expressions>s in the list cannot be evaluated to the highest data type then
> exception is thrown.
> 
> Try, for example. 
> 
> SELECT COALESCE(NULL, '12', 3, 1, 1); -- OK!
> SELECT COALESCE(NULL, '12.2', 3.2, '1', 1.1); -- OK!
> SELECT COALESCE(NULL, '1d2', 3, 1, 1); -- error
> SELECT COALESCE(NULL, '12', 3, '1d', 1); -- cast error

What is wrong about the existing description?  That it doesn't mention
mixing data types?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: COALESCE doccumentation

От
Daniil Treshchin
Дата:
Hi Bruce, David.

Comment: "Without further elaboration as to the shortcomings"
I provided a clear description on why, but for some reason it was missed in this thread.

Here it is:

The COALESCE(<value expression>, ..., <value expression>) scans the list of <value expression>s from left to right, determines the highest data type in the list and returns the first non-NULL in the list, casting it to the highest data type selected in the previous step, but if all the <value expressions>s are NULL, the result is NULL. In case when any of the <value expressions>s in the list cannot be evaluated to the highest data type then exception is thrown.

Question: What is a highest data type? The first data type?

This is not the first datatype but there is an order, for example:

numeric
integer
character varying
text

If you mix numeric, integers and text in any order, then it will be evaluated to numeric;
If you mix integers and text, then it will be evaluated to integer;

Again, from my example:

SELECT COALESCE(NULL, '12', 3, 1, 1); -- OK! evaluated to integer
SELECT COALESCE(NULL, '12.2', 3.2, '1', 1.1); -- OK! evaluated to numeric
SELECT COALESCE(NULL, '12', 3, 1.2, 1);  -- OK! evaluated to numeric
SELECT COALESCE(NULL, '1d2', 3, 1, 1); -- error
SELECT COALESCE(NULL, '12', 3, '1d', 1); -- cast error

The documentation says nothing about it and the description is obscure.
What the docs are not saying:

1) COALESCE scans the list of <value expression>s
2) COALESCE determines the highest data type in the list of <value expressions>s
3) COALESCE casting the first non-NULL to the highest data type
4) COALESCE evaluates every <value expression>s to the highest datatype selected in step (2)

In addition the doc says:
"Like a CASE expression, COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated."

This is so not true according to the doc and just breaks a simple test.

SELECT COALESCE(NULL, '12', '3', '1.2', (SELECT profile_code FROM profile LIMIT 1));  -- OK! evaluates to character varying
SELECT COALESCE(NULL, '12', 3, '1.2', (SELECT profile_code FROM profile LIMIT 1));  -- ERROR! failed to evaluate to the highest datatype - integer

Thank you,
On Monday, April 29, 2019, 4:46:39 PM PDT, David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Apr 29, 2019 at 4:13 PM Bruce Momjian <bruce@momjian.us> wrote:
On Sat, Apr 27, 2019 at 08:26:57PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/11/functions-conditional.html
> Description:
>
> The documentation says:
>
> "The COALESCE function returns the first of its arguments that is not null.
> Null is returned only if all arguments are null. It is often used to
> substitute a default value for null values when data is retrieved for
> display, for example"
>
> Which is NOT TRUE. In addition, it is obscure and misleading.
> Try, for example.
>
> SELECT COALESCE(NULL, '12', 3, 1, 1); -- OK!
> SELECT COALESCE(NULL, '12.2', 3.2, '1', 1.1); -- OK!
> SELECT COALESCE(NULL, '1d2', 3, 1, 1); -- error
> SELECT COALESCE(NULL, '12', 3, '1d', 1); -- cast error

What is wrong about the existing description?  That it doesn't mention
mixing data types?


The OP's example demonstrates a more basic confusion about types that needs to be learned independent of COALESCE.

The literals surrounded with single quotes have an unknown type as written.  Since other literals are plain numbers those are typed as integers.  Since all arguments to COALESCE need to be the same type (same applies to other functions or things like UNION) the untyped literal is coerced to the known typed arguments - which fails when the untyped literals are not pure numbers.

There is a concept of parent types - though I cannot find the rules for them at the moment - but basically most things devolve toward text; the issue here is there isn't any "devolving" (parent) involved - its a straight assignment of the known integer type to the untyped literals.

Without further elaboration as to the shortcomings of the current documentation its hard to add more material that isn't this section of the documentation's place to teach.

David J.

Re: COALESCE doccumentation

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Sat, Apr 27, 2019 at 08:26:57PM +0000, PG Doc comments form wrote:
>> Try, for example. 
>> 
>> SELECT COALESCE(NULL, '12', 3, 1, 1); -- OK!
>> SELECT COALESCE(NULL, '12.2', 3.2, '1', 1.1); -- OK!
>> SELECT COALESCE(NULL, '1d2', 3, 1, 1); -- error
>> SELECT COALESCE(NULL, '12', 3, '1d', 1); -- cast error

> What is wrong about the existing description?  That it doesn't mention
> mixing data types?

The place where that behavior is described is

https://www.postgresql.org/docs/current/typeconv-union-case.html

It doesn't mention COALESCE explicitly, though ... and I imagine
digging in the code for coerce_to_common_type callers might find
some other cases that aren't listed there.

            regards, tom lane



Re: COALESCE doccumentation

От
Daniil Treshchin
Дата:
Oh, I was just playing a little bit with it to make sure how it works with some nesting involved.

SELECT COALESCE(NULL, '12', '3', 1.2, (SELECT (CASE WHEN profile_code = 'abc' THEN '2d' ELSE '2d' END) FROM profile LIMIT 1));
-- fail
SELECT COALESCE(NULL, '12', '3', 1.2, (SELECT (CASE WHEN profile_code = 'abc' THEN '2d' ELSE '2d' END) FROM profile LIMIT 1) :: numeric);  -- OK!

As said in the doc, COALESCE only evaluates the arguments that are needed

Agreed. I take my words back about that statement about evaluation, but there is still a problem in here as you can see from this confusion. It evaluates the type of the <value expression>s.

Thank you,
On Monday, April 29, 2019, 6:31:16 PM PDT, Daniil Treshchin <t.dnil@yahoo.com> wrote:


Hi Bruce, David.

Comment: "Without further elaboration as to the shortcomings"
I provided a clear description on why, but for some reason it was missed in this thread.

Here it is:

The COALESCE(<value expression>, ..., <value expression>) scans the list of <value expression>s from left to right, determines the highest data type in the list and returns the first non-NULL in the list, casting it to the highest data type selected in the previous step, but if all the <value expressions>s are NULL, the result is NULL. In case when any of the <value expressions>s in the list cannot be evaluated to the highest data type then exception is thrown.

Question: What is a highest data type? The first data type?

This is not the first datatype but there is an order, for example:

numeric
integer
character varying
text

If you mix numeric, integers and text in any order, then it will be evaluated to numeric;
If you mix integers and text, then it will be evaluated to integer;

Again, from my example:

SELECT COALESCE(NULL, '12', 3, 1, 1); -- OK! evaluated to integer
SELECT COALESCE(NULL, '12.2', 3.2, '1', 1.1); -- OK! evaluated to numeric
SELECT COALESCE(NULL, '12', 3, 1.2, 1);  -- OK! evaluated to numeric
SELECT COALESCE(NULL, '1d2', 3, 1, 1); -- error
SELECT COALESCE(NULL, '12', 3, '1d', 1); -- cast error

The documentation says nothing about it and the description is obscure.
What the docs are not saying:

1) COALESCE scans the list of <value expression>s
2) COALESCE determines the highest data type in the list of <value expressions>s
3) COALESCE casting the first non-NULL to the highest data type
4) COALESCE evaluates every <value expression>s to the highest datatype selected in step (2)

In addition the doc says:
"Like a CASE expression, COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated."

This is so not true according to the doc and just breaks a simple test.

SELECT COALESCE(NULL, '12', '3', '1.2', (SELECT profile_code FROM profile LIMIT 1));  -- OK! evaluates to character varying
SELECT COALESCE(NULL, '12', 3, '1.2', (SELECT profile_code FROM profile LIMIT 1));  -- ERROR! failed to evaluate to the highest datatype - integer

Thank you,
On Monday, April 29, 2019, 4:46:39 PM PDT, David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Apr 29, 2019 at 4:13 PM Bruce Momjian <bruce@momjian.us> wrote:
On Sat, Apr 27, 2019 at 08:26:57PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/11/functions-conditional.html
> Description:
>
> The documentation says:
>
> "The COALESCE function returns the first of its arguments that is not null.
> Null is returned only if all arguments are null. It is often used to
> substitute a default value for null values when data is retrieved for
> display, for example"
>
> Which is NOT TRUE. In addition, it is obscure and misleading.
> Try, for example.
>
> SELECT COALESCE(NULL, '12', 3, 1, 1); -- OK!
> SELECT COALESCE(NULL, '12.2', 3.2, '1', 1.1); -- OK!
> SELECT COALESCE(NULL, '1d2', 3, 1, 1); -- error
> SELECT COALESCE(NULL, '12', 3, '1d', 1); -- cast error

What is wrong about the existing description?  That it doesn't mention
mixing data types?


The OP's example demonstrates a more basic confusion about types that needs to be learned independent of COALESCE.

The literals surrounded with single quotes have an unknown type as written.  Since other literals are plain numbers those are typed as integers.  Since all arguments to COALESCE need to be the same type (same applies to other functions or things like UNION) the untyped literal is coerced to the known typed arguments - which fails when the untyped literals are not pure numbers.

There is a concept of parent types - though I cannot find the rules for them at the moment - but basically most things devolve toward text; the issue here is there isn't any "devolving" (parent) involved - its a straight assignment of the known integer type to the untyped literals.

Without further elaboration as to the shortcomings of the current documentation its hard to add more material that isn't this section of the documentation's place to teach.

David J.

Re: COALESCE doccumentation

От
Daniil Treshchin
Дата:
"SQL is a strongly typed language"
 - nobody argues with that, that's a common fact everybody knows.

"so it is well within its right to require that a query be well-formed such that all inputs to functions (which COALESCE is in this context) are of the same type or raise a compile time error"

- they are not of the same type. That's the whole problem from the beginning, I present the values in different data types and it selects the "highest" based on some internal order and that's the type that is required for all the other value expressions. In this case the DB engine picks a datatype for me when I don't explicitly cast everything. Moreover, it throws an exception if it failed to evaluate the datatype from a value expression even through it's totally acceptable, it's just doesn't analyze the more complex stuff.

INTEGER is "higher" than CHARACTER VARYING, so the expressions will be validated to INTEGER and not to CHARACTER VARYING. Why, because the engine picked it this way for you.

Here is another example: SELECT COALESCE(NULL, '12', 2,  3 :: INTEGER, 1 :: NUMERIC, 1 :: INTEGER); Why is the result NUMERIC?

Anyway, the doc still doesn't give any info on these:

1) COALESCE scans the list of <value expression>s
2) COALESCE determines the highest data type in the list of <value expressions>s
3) COALESCE casting the first non-NULL to the highest data type
4) COALESCE evaluates every <value expression>s to the highest datatype selected in step (2). Correction from the last email I checked, the value expression is not evaluated itself but it is "analyzed" to the highest datatype or exception is thrown during various, not obvious conditions.

Thank you.
On Monday, April 29, 2019, 10:26:58 PM PDT, David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Apr 29, 2019 at 7:01 PM Daniil Treshchin <t.dnil@yahoo.com> wrote:
but there is still a problem in here as you can see from this confusion. It evaluates the type of the <value expression>s.

SQL is a strongly typed language so it is well within its right to require that a query be well-formed such that all inputs to functions (which COALESCE is in this context) are of the same type or raise a compile time error.

The expression is still not evaluated for each row during runtime which is all the section on short-circuiting promises.

David J.

Re: COALESCE doccumentation

От
Daniil Treshchin
Дата:
Hi David.

"You invented the “higher” concept"
- False. Not me. That's Joe Celko p. 227 ISBIN 978-0-12-800761-7. That's the ultimate authority still living on this earth.

"There is not a single character varying typed value in any of these examples"
- False. You clearly didn't even look at the examples that I put in these emails and you're quickly run to conclusions.

So you're at least wrong in these two points.

Anyway, I give up, you guys decide if there is a need for additional doc.

Thanks.
On Monday, April 29, 2019, 11:53:44 PM PDT, David G. Johnston <david.g.johnston@gmail.com> wrote:


On Monday, April 29, 2019, Daniil Treshchin <t.dnil@yahoo.com> wrote:

INTEGER is "higher" than CHARACTER VARYING, so the expressions will be validated to INTEGER and not to CHARACTER VARYING. Why, because the engine picked it this way for you.

You invented the “higher” concept; that the engine doesn’t work this way is the point I’m trying to make.  There is not a single character varying typed value in any of these examples - and since there are integers the outcome will be some kind of number.

Here is another example: SELECT COALESCE(NULL, '12', 2,  3 :: INTEGER, 1 :: NUMERIC, 1 :: INTEGER); Why is the result NUMERIC?

Null/unknown; unknown literal; integer; integer with redundant cast; integer cast explicitly to numeric; integer with redundant cast

Integers devolve to numeric implicitly (since an integer can be represented as a numeric but not the other way around); both unknowns can be coerced to numeric as well.  All is good at planning time and the query can be run.
 

Anyway, the doc still doesn't give any info on these:

Tom linked to where this is covered - in the more fundamental SQL concept of UNION between multiple queries.

PostgreSQL does require explicit casting in places where people would not expect or desire to do so.  The hurdle to be more do-what-I-mean is high.  The material is covered in appropriate chapters early on in the documentation.  Repeating or cross-referencing might be worthwhile but few questions make it to these lists and so the motivation is low.  Threads like this tend to be more effective, or at least easier.

David J.