Обсуждение: Counting bool flags in a complex query

Поиск
Список
Период
Сортировка

Counting bool flags in a complex query

От
Michael Richards
Дата:
Hi.

I think I've created a monster...

Working on an email system I have the following:
Table    = usermail
+----------------------------------+--------------------------+-------+
|              Field               |              Type        | Length|
+----------------------------------+--------------------------+-------+
| contentlength                    | int4                     |     4 |
| folder                           | int4                     |     4 |
| flagnew                          | bool                     |     1 |
etc...

And:
Table    = folders
+----------------------------------+--------------------------+-------+
|              Field               |              Type        | Length|
+----------------------------------+--------------------------+-------+
| loginid                          | varchar() not null       |    16 |
| folderid                         | int4 not null default (  |     4 |
| foldername                       | varchar()                |    25 |
etc...

So each email message has an entry in usermail, and each mail folder has
an entry in folders. I need to extract the following info:
foldername, number of messages in that folder, number of messages in that
folder with flagread set, total size of all the messages in each folder

Since postgres does not appear to support outer joins, I've come up with a
really icky query that almost does what I want:

SELECT folderid,foldername,count(*),sum(contentlength)  FROM usermail,folders  WHERE usermail.loginid='michael' AND
 folders.loginid=usermail.loginid AND        usermail.folder=folders.folderid GROUP BY folderid,foldername 
 
UNION SELECT folderid,foldername,null,null FROM folders  WHERE loginid='michael' AND        folderid NOT IN
(SELECTfolder FROM usermail WHERE loginid='michael');
 

WHEW!

folderid|foldername      |count|    sum
--------+----------------+-----+-------     -4|Deleted Messages|  110| 245627     -3|Saved Drafts    |     |
-2|SentMail       |    7|  10878     -1|New Mail Folder |   73|8831226      1|OOL             |    7|   8470
 
etc...

My final problem is to count all the messages with flagnew set to true.
The only way I can think to do this is to convert the bool value to a 1 or
0 (which I think should be a standard conversion anyway) and run a sum()
on them.

Unless anyone can come up with a better way to do this, What is the best
way to implement a conversion from bool to int?

-Michael



Re: [HACKERS] Counting bool flags in a complex query

От
Duane Currie
Дата:
> Hi.
> 
> I think I've created a monster...
> 
...
> 
> My final problem is to count all the messages with flagnew set to true.
> The only way I can think to do this is to convert the bool value to a 1 or
> 0 (which I think should be a standard conversion anyway) and run a sum()
> on them.
> 
> Unless anyone can come up with a better way to do this, What is the best
> way to implement a conversion from bool to int?
> 
> -Michael

Of course, you could always use count() and a 'WHERE flagnew' clause...

Duane


Re: [HACKERS] Counting bool flags in a complex query

От
Michael Richards
Дата:
On Wed, 14 Jul 1999, Duane Currie wrote:

> > My final problem is to count all the messages with flagnew set to true.
> > The only way I can think to do this is to convert the bool value to a 1 or
> > 0 (which I think should be a standard conversion anyway) and run a sum()
> > on them.
> > 
> > Unless anyone can come up with a better way to do this, What is the best
> > way to implement a conversion from bool to int?
> 
> Of course, you could always use count() and a 'WHERE flagnew' clause...

Problem with that of course is that by limiting the query with a "where",
I'd lose all the records in the original count, and therefore the total
number of messages (a count that ignores the status of flagnew) would be
wrong.

What I was sort of hoping for was a way to implement a native conversion
from bool to int, and have it included in the standard postgres system. I
think the conversion if a reasonable logical one where true==1 and
false==0. The problem is, I don't have a sweet clue how to do this. I
think it should be a trivial matter to insert something into a system
table...

-Michael



Re: [HACKERS] Counting bool flags in a complex query

От
Thomas Lockhart
Дата:
> Unless anyone can come up with a better way to do this, What is the best
> way to implement a conversion from bool to int?

Try
 select sum(case when bfield = TRUE then 1 else 0 end) from table;

It works for me...
            - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Counting bool flags in a complex query

От
Michael Richards
Дата:
On Wed, 14 Jul 1999, Thomas Lockhart wrote:

> > Unless anyone can come up with a better way to do this, What is the best
> > way to implement a conversion from bool to int?
> 
>   select sum(case when bfield = TRUE then 1 else 0 end) from table;

I'm not sure this is correct, but I think I see a bug of some sort...

SELECT folderid,foldername,count(*),sum(contentlength),sum(case when
flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) ;
ERROR:  _finalize_primnode: can't handle node 723

It seems to be the union that is confuzing it...

SELECT folderid,foldername,count(*),sum(contentlength),sum(case when
flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername;
                         
 
folderid|foldername      |count|    sum|sum
--------+----------------+-----+-------+---     -4|Deleted Messages|  110| 245627| 50     -2|Sent Mail       |    7|
10878| 2     -1|New Mail Folder |   73|8831226|  1      1|OOL             |    7|   8470|  0
 
etc

-Michael



Re: [HACKERS] Counting bool flags in a complex query

От
Tom Lane
Дата:
Michael Richards <miker@scifair.acadiau.ca> writes:
> ERROR:  _finalize_primnode: can't handle node 723

Grumble.  Still another routine that doesn't know as much as it should
about traversing parsetrees.  Looks like a job for <flourish of trumpets>
expression_tree_walker.

> It seems to be the union that is confuzing it...

CASE expression inside a UNION/INTERSECT/EXCEPT, to be specific.

Will fix this in time for 6.5.1.
        regards, tom lane


Re: [HACKERS] Counting bool flags in a complex query

От
Tom Lane
Дата:
Michael Richards <miker@scifair.acadiau.ca> writes:
> I'm not sure this is correct, but I think I see a bug of some sort...

> SELECT folderid,foldername,count(*),sum(contentlength),sum(case when
> flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE
> usermail.loginid='michael' and folders.loginid=usermail.loginid AND
> usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
> SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
> NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
> folder=folderid) ;
> ERROR:  _finalize_primnode: can't handle node 723

I committed a fix last night; it will be in 6.5.1.
        regards, tom lane


Re: [HACKERS] Counting bool flags in a complex query

От
Michael Richards
Дата:
On Thu, 15 Jul 1999, Tom Lane wrote:

> Michael Richards <miker@scifair.acadiau.ca> writes:
> > I'm not sure this is correct, but I think I see a bug of some sort...
> 
> I committed a fix last night; it will be in 6.5.1.

I've found what I believe is another set of bugs:
This is my monster query again...

My folder numbers are: negative numbers are system folders such as New
mail, trash, drafts and sentmail. I wanted to order the tuples so that the
folderids were sorted from -1 to -4, then 1 to x. This way the system
folders would always appear first in the list.

This may not be valid SQL, as none of my books mention it. Is it possible
to order by an expression?

Here are some examples which some some odd behaviour. My suspected bug
findings are at the end:

SELECT folderid,foldername,count(*) as "messgaes",sum(bool2int(flagnew))
as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) order by (folderid>0);
folderid|foldername      |messgaes|newmessages|   size
--------+----------------+--------+-----------+-------     -4|Deleted Messages|     110|         50| 245627     -2|Sent
Mail      |       7|          2|  10878     -1|New Mail Folder |      73|          1|8831226      1|OOL             |
   7|          0|   8470      2|suggestions     |      26|          0|  35433      3|Acadia          |       5|
0|  17703      4|advertising     |       4|          2|   5394      5|dealt with      |       3|          0|   2883
36|dauphne        |       9|          0|  66850     -3|Saved Drafts    |       0|          0|      0
 
(10 rows)

It looks like the order by is only being applied to the original select,
not the unioned select. Some authority should check on it, but by thought
it that a union does not necessarily maintain the order, so the entire
select should be applied to the order.

I'm not so good at interpreting the query plan, but here it is:
Unique  (cost=8.10 rows=0 width=0) ->  Sort  (cost=8.10 rows=0 width=0)   ->  Append  (cost=8.10 rows=0 width=0)     ->
Aggregate  (cost=6.05 rows=1 width=49)       ->  Group  (cost=6.05 rows=1 width=49)         ->  Sort  (cost=6.05 rows=1
width=49)          ->  Nested Loop  (cost=6.05 rows=1 width=49)             ->  Index Scan using usermail_pkey on
usermail (cost=2.05 rows=2 width=21)             ->  Index Scan using folders_pkey on folders  (cost=2.00 rows=8448
width=28)      -> Index Scan using folders_pkey on folders (cost=2.05 rows=2 width=16)            SubPlan
->Index Scan using usermail_pkey on usermail (cost=2.05 rows=1 width=4)
 

I would have expected the folderid -3 to appear as the 3rd one in this
case.

I'm probably going to change the numbering scheme of the system folders so
they will sort correctly without a kluge such as:
create function ordfolderid(int) returns int as 'select $1*-1 where $1<0
union select $1+1*10 where $1>=0' language 'sql';

Then running the order clause as: 
order by (folderid<0),ordfolderid(folderid)
My thought behind this kludge is that the table should first be ordered by
the t/f value of the fact folderid<0, then within each of the true and
false sortings, subsort those by the value of folderid.

Complicated enough for you?

Well, in my playing I notice what appears to be more of a bug...
SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) order by (folderid<0);
folderid|foldername      |messgaes|newmessages|   size
--------+----------------+--------+-----------+-------      1|OOL             |       7|          0|   8470
2|suggestions    |      26|          0|  35433      3|Acadia          |       5|          0|  17703      4|advertising
  |       4|          2|   5394      5|dealt with      |       3|          0|   2883     36|dauphne         |       9|
       0|  66850     -4|Deleted Messages|     110|         50| 245627     -2|Sent Mail       |       7|          2|
10878    -1|New Mail Folder |      73|          1|8831226     -3|Saved Drafts    |       0|          0|      0
 
(10 rows)

SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) order by (messages<10);
ERROR:  attribute 'messages' not found

Using a column name within an expression in the order by does not seem to
work...
Or a much simpler example to illustrate the bug:
fastmail=> select 1 as "test" order by (test<9);
ERROR:  attribute 'test' not found

fastmail=> select 1 as "test" order by test;
test
----  1
(1 row)


I was almost able to make it work properly aside from the sorting issue
with my kludged up routine... This is so nasty that I most definitely
don't want to put it into production:

SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
as "newmessages",sum(contentlength) as "size",(folderid>=0) FROM
usermail,folders WHERE usermail.loginid='michael' and
folders.loginid=usermail.loginid AND usermail.folder = folders.folderid
GROUP BY folderid,foldername UNION SELECT
folderid,foldername,0,0,0,(folderid>=0) FROM folders WHERE
loginid='michael' AND NOT EXISTS (SELECT folder FROM usermail WHERE
loginid='michael' AND folder=folderid) order by 6,ordfolderid(folderid);
folderid|foldername      |messages|newmessages|   size|?column?
--------+----------------+--------+-----------+-------+--------     -1|New Mail Folder |      73|          1|8831226|f
         -2|Sent Mail       |       7|          2|  10878|f            -4|Deleted Messages|     110|         50|
245627|f           -3|Saved Drafts    |       0|          0|      0|f             1|OOL             |       7|
0|  8470|t             2|suggestions     |      26|          0|  35433|t             3|Acadia          |       5|
  0|  17703|t             4|advertising     |       4|          2|   5394|t             5|dealt with      |       3|
     0|   2883|t            36|dauphne         |       9|          0|  66850|t       
 
(10 rows)

Do I need outer joins to make this work instead of the screwed up union
method I'm trying here, or is it just a series of bugs?

-Michael



Re: [HACKERS] Counting bool flags in a complex query

От
Tom Lane
Дата:
Michael Richards <miker@scifair.acadiau.ca> writes:
> I've found what I believe is another set of bugs:

I can shed some light on these.

> This may not be valid SQL, as none of my books mention it. Is it possible
> to order by an expression?

Postgres accepts expressions as ORDER BY clauses, although strict SQL92
only allows sorting by a column name or number.

> It looks like the order by is only being applied to the original select,
> not the unioned select. Some authority should check on it, but by thought
> it that a union does not necessarily maintain the order, so the entire
> select should be applied to the order.

That looks like a bug to me too --- I think the ORDER BY is supposed to
apply across the whole UNION result.  Will look into it.

> I'm probably going to change the numbering scheme of the system folders so
> they will sort correctly without a kluge such as:

Good plan.  Although you could sort by a user-defined function result,
it's likely to be horribly slow (because user-defined functions are
slow:-().

> Using a column name within an expression in the order by does not seem to
> work...
> Or a much simpler example to illustrate the bug:
> fastmail=> select 1 as "test" order by (test<9);
> ERROR:  attribute 'test' not found

This is not so much a bug as a definitional issue.  For SQL92
compatibility, we accept ORDER BY a column label so long as it's
a bare column label, but column labels are NOT part of the namespace
for full expression evaluation.  You can't do this either:

select 1 as "test" , test<9 ;
ERROR:  attribute 'test' not found

There are all sorts of squirrely questions about this feature IMHO.
For example,

create table z1 (f1 int4, f2 int4);
CREATE
select f1 as f2, f2 from z1 order by f2;
f2|f2
--+--
(0 rows)

Which column do you think it's ordering by?  Which column *should* it
order by?  I think this ought to draw an "ambiguous column label" error
... there is code in there that claims to be looking for such a thing,
in fact, so I am not quite sure why it doesn't trigger on this example.
        regards, tom lane


Re: [HACKERS] Counting bool flags in a complex query

От
Michael Richards
Дата:
On Fri, 16 Jul 1999, Tom Lane wrote:

> Good plan.  Although you could sort by a user-defined function result,
> it's likely to be horribly slow (because user-defined functions are
> slow:-().
Yes, but I did include my horrible design ideas so you could see why in
"god's name" I was trying to do what I was trying to do when I found what
looked to be a "bug"

> This is not so much a bug as a definitional issue.  For SQL92
> compatibility, we accept ORDER BY a column label so long as it's
> a bare column label, but column labels are NOT part of the namespace
> for full expression evaluation.  You can't do this either:
> 
> select 1 as "test" , test<9 ;
> ERROR:  attribute 'test' not found
> 
> There are all sorts of squirrely questions about this feature IMHO.
> For example,
> 
> create table z1 (f1 int4, f2 int4);
> CREATE
> select f1 as f2, f2 from z1 order by f2;
> f2|f2
> --+--
> (0 rows)
> 
> Which column do you think it's ordering by?  Which column *should* it
> order by?  I think this ought to draw an "ambiguous column label" error
> ... there is code in there that claims to be looking for such a thing,
> in fact, so I am not quite sure why it doesn't trigger on this example.

Good point. Is there anything in the SQL standard that defined how this
"is supposed" to work? I suppose with no expression support it isn't
really necessary. How about requiring quotes when we're to look at it was
"named" columns? If 
select f1 as f2, f2 from z1 order by "f2";

Of course I have no idea how this would conflicy with SQL-92. It's more of
an idea...

-Michael



Re: [SQL] Re: [HACKERS] Counting bool flags in a complex query

От
Tom Lane
Дата:
Michael Richards <miker@scifair.acadiau.ca> writes:
>> For example,
>> 
>> create table z1 (f1 int4, f2 int4);
>> CREATE
>> select f1 as f2, f2 from z1 order by f2;
>> f2|f2
>> --+--
>> (0 rows)
>> 
>> Which column do you think it's ordering by?  Which column *should* it
>> order by?  I think this ought to draw an "ambiguous column label" error

> Good point. Is there anything in the SQL standard that defined how this
> "is supposed" to work?

After looking at the SQL spec I think the above definitely ought to draw
an error.  We have the following verbiage concerning the column names
for the result of a SELECT:
           a) If the i-th <derived column> in the <select list> specifies             an <as clause> that contains a
<columnname> C, then the             <column name> of the i-th column of the result is C.
 
           b) If the i-th <derived column> in the <select list> does not             specify an <as clause> and the
<valueexpression> of that             <derived column> is a single <column reference>, then the             <column
name>of the i-th column of the result is C.
 
           c) Otherwise, the <column name> of the i-th column of the <query             specification> is
implementation-dependentand different             from the <column name> of any column, other than itself, of
 a table referenced by any <table reference> contained in the             SQL-statement.
 

which Postgres does indeed follow, and we see from (a) and (b) that "f2"
is the required column name for both columns of the SELECT result.
Now ORDER BY says
           a) If a <sort specification> contains a <column name>, then T             shall contain exactly one column
withthat <column name> and             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^             the <sort specification> identifies
thatcolumn.
 

which sure looks to me like it mandates an error for the example
statement.

However, since SQL doesn't consider the possibility of expressions as
ORDER BY entries, we are more or less on our own for those.  An
expression appearing in the target list of a SELECT is not allowed to
refer to columns by their "AS" names (and this does seem to be mandated
by SQL92).  So I think it makes sense to carry over the same restriction
to ORDER BY.
        regards, tom lane


Re: Counting bool flags in a complex query

От
Tom Lane
Дата:
Quite awhile ago, Michael Richards <miker@scifair.acadiau.ca> wrote:
> It looks like the order by is only being applied to the original select,
> not the unioned select. Some authority should check on it, but by thought
> it that a union does not necessarily maintain the order, so the entire
> select should be applied to the order.

Just FYI, I have committed code for 7.1 that allows ORDER BY to work
correctly for a UNION'd query.  A limitation is that you can only do
ordering on columns that are outputs of the UNION:

regression=# select q1 from int8_tbl union select q2 from int8_tbl order by 1;       q1
--------------------4567890123456789              123              456 4567890123456789
(4 rows)

regression=# select q1 from int8_tbl union select q2 from int8_tbl order by int8_tbl.q1+1;
ERROR:  ORDER BY on a UNION/INTERSECT/EXCEPT result must be on one of the result columns

In the general case of an arbitrary ORDER BY expression, it's not clear
how to transpose it into each UNION source select anyway.  It could
be made to work for expressions using only the output columns, but since
ORDER BY expressions are not standard SQL I'm not in a big hurry to make
that happen...
        regards, tom lane