Re: count(*) vs count(id)

Поиск
Список
Период
Сортировка
От Hellmuth Vargas
Тема Re: count(*) vs count(id)
Дата
Msg-id CAN3Qy4q1VXQHbUsVH7+yKumdfyfcxQAaHTGS5XKgjZ-5PMOpRA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: count(*) vs count(id)  (Laurenz Albe <laurenz.albe@cybertec.at>)
Ответы Re: count(*) vs count(id)  (Cherio <cherio@gmail.com>)
Список pgsql-general


Hello list

My English is not very good, so I pretend that through the examples you understand what I intend to expose

-- Recreate the query that is supposedly wrong

select calendar.entry, count(*)  
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1 day'::interval) as calendar(entry)
left join (values      (1,'2020-08-28 09:44:11'::timestamp),
      (2,'2020-08-28 10:44:11'::timestamp),
      (3,'2020-08-29 11:44:11'::timestamp),
      (4,'2020-09-01 02:44:11'::timestamp),
      (5,'2020-09-02 03:44:11'::timestamp),
      (6,'2020-09-02 04:44:11'::timestamp),
      (7,'2020-09-03 05:44:11'::timestamp),
      (8,'2020-09-04 06:44:11'::timestamp),
      (10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
group by calendar.entry;

-- wrong???

         entry          | count
------------------------+-------
 2020-08-20 00:00:00-05 |     1
 2020-08-21 00:00:00-05 |     1
 2020-08-22 00:00:00-05 |     1
 2020-08-23 00:00:00-05 |     1
 2020-08-24 00:00:00-05 |     1
 2020-08-25 00:00:00-05 |     1
 2020-08-26 00:00:00-05 |     1
 2020-08-27 00:00:00-05 |     1
 2020-08-28 00:00:00-05 |     2
 2020-08-29 00:00:00-05 |     1
 2020-08-30 00:00:00-05 |     1
 2020-08-31 00:00:00-05 |     1
 2020-09-01 00:00:00-05 |     1
 2020-09-02 00:00:00-05 |     2
 2020-09-03 00:00:00-05 |     1
 2020-09-04 00:00:00-05 |     2
 2020-09-05 00:00:00-05 |     1
 2020-09-06 00:00:00-05 |     1
 2020-09-07 00:00:00-05 |     1
 2020-09-08 00:00:00-05 |     1
 2020-09-09 00:00:00-05 |     1
 2020-09-10 00:00:00-05 |     1
 2020-09-11 00:00:00-05 |     1
 2020-09-12 00:00:00-05 |     1
 2020-09-13 00:00:00-05 |     1
 2020-09-14 00:00:00-05 |     1
 2020-09-15 00:00:00-05 |     1

-- In the count I will only consider the records of call_records

select calendar.entry, count(call_records.*)  
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1 day'::interval) as calendar(entry)
left join (values      (1,'2020-08-28 09:44:11'::timestamp),
      (2,'2020-08-28 10:44:11'::timestamp),
      (3,'2020-08-29 11:44:11'::timestamp),
      (4,'2020-09-01 02:44:11'::timestamp),
      (5,'2020-09-02 03:44:11'::timestamp),
      (6,'2020-09-02 04:44:11'::timestamp),
      (7,'2020-09-03 05:44:11'::timestamp),
      (8,'2020-09-04 06:44:11'::timestamp),
      (10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
group by calendar.entry;


--- perfect
         entry          | count
------------------------+-------
 2020-08-20 00:00:00-05 |     0
 2020-08-21 00:00:00-05 |     0
 2020-08-22 00:00:00-05 |     0
 2020-08-23 00:00:00-05 |     0
 2020-08-24 00:00:00-05 |     0
 2020-08-25 00:00:00-05 |     0
 2020-08-26 00:00:00-05 |     0
 2020-08-27 00:00:00-05 |     0
 2020-08-28 00:00:00-05 |     2
 2020-08-29 00:00:00-05 |     1
 2020-08-30 00:00:00-05 |     0
 2020-08-31 00:00:00-05 |     0
 2020-09-01 00:00:00-05 |     1
 2020-09-02 00:00:00-05 |     2
 2020-09-03 00:00:00-05 |     1
 2020-09-04 00:00:00-05 |     2
 2020-09-05 00:00:00-05 |     0
 2020-09-06 00:00:00-05 |     0
 2020-09-07 00:00:00-05 |     0
 2020-09-08 00:00:00-05 |     0
 2020-09-09 00:00:00-05 |     0
 2020-09-10 00:00:00-05 |     0
 2020-09-11 00:00:00-05 |     0
 2020-09-12 00:00:00-05 |     0
 2020-09-13 00:00:00-05 |     0
 2020-09-14 00:00:00-05 |     0
 2020-09-15 00:00:00-05 |     0

when placing * I want to bring all the join records between both tables and when counting them of course there will be a row for the dates   2020-08-30 ,  2020-08-31 so the call_records fields are null

select *  
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1 day'::interval) as calendar(entry)
left join (values      (1,'2020-08-28 09:44:11'::timestamp),
      (2,'2020-08-28 10:44:11'::timestamp),
      (3,'2020-08-29 11:44:11'::timestamp),
      (4,'2020-09-01 02:44:11'::timestamp),
      (5,'2020-09-02 03:44:11'::timestamp),
      (6,'2020-09-02 04:44:11'::timestamp),
      (7,'2020-09-03 05:44:11'::timestamp),
      (8,'2020-09-04 06:44:11'::timestamp),
      (10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date 


         entry          | id |      timestamp
------------------------+----+---------------------
 2020-08-20 00:00:00-05 |    |
 2020-08-21 00:00:00-05 |    |
 2020-08-22 00:00:00-05 |    |
 2020-08-23 00:00:00-05 |    |
 2020-08-24 00:00:00-05 |    |
 2020-08-25 00:00:00-05 |    |
 2020-08-26 00:00:00-05 |    |
 2020-08-27 00:00:00-05 |    |
 2020-08-28 00:00:00-05 |  1 | 2020-08-28 09:44:11
 2020-08-28 00:00:00-05 |  2 | 2020-08-28 10:44:11
 2020-08-29 00:00:00-05 |  3 | 2020-08-29 11:44:11
 2020-08-30 00:00:00-05 |    |
 2020-08-31 00:00:00-05 |    |
 2020-09-01 00:00:00-05 |  4 | 2020-09-01 02:44:11
 2020-09-02 00:00:00-05 |  5 | 2020-09-02 03:44:11
 2020-09-02 00:00:00-05 |  6 | 2020-09-02 04:44:11
 2020-09-03 00:00:00-05 |  7 | 2020-09-03 05:44:11
 2020-09-04 00:00:00-05 |  8 | 2020-09-04 06:44:11
 2020-09-04 00:00:00-05 | 10 | 2020-09-04 07:44:11
 2020-09-05 00:00:00-05 |    |
 2020-09-06 00:00:00-05 |    |
 2020-09-07 00:00:00-05 |    |
 2020-09-08 00:00:00-05 |    |
 2020-09-09 00:00:00-05 |    |
 2020-09-10 00:00:00-05 |    |
 2020-09-11 00:00:00-05 |    |
 2020-09-12 00:00:00-05 |    |
 2020-09-13 00:00:00-05 |    |
 2020-09-14 00:00:00-05 |    |
 2020-09-15 00:00:00-05 |    |

--- counting

select entry, count(*)
from (
select *  
from generate_series('2020-08-20'::date, '2020-09-15'::date, '1 day'::interval) as calendar(entry)
left join (values      (1,'2020-08-28 09:44:11'::timestamp),
      (2,'2020-08-28 10:44:11'::timestamp),
      (3,'2020-08-29 11:44:11'::timestamp),
      (4,'2020-09-01 02:44:11'::timestamp),
      (5,'2020-09-02 03:44:11'::timestamp),
      (6,'2020-09-02 04:44:11'::timestamp),
      (7,'2020-09-03 05:44:11'::timestamp),
      (8,'2020-09-04 06:44:11'::timestamp),
      (10,'2020-09-04 07:44:11'::timestamp)) as call_records(id,timestamp)
on calendar.entry = call_records.timestamp::date
) as u
group by entry

         entry          | count
------------------------+-------
 2020-08-20 00:00:00-05 |     1
 2020-08-21 00:00:00-05 |     1
 2020-08-22 00:00:00-05 |     1
 2020-08-23 00:00:00-05 |     1
 2020-08-24 00:00:00-05 |     1
 2020-08-25 00:00:00-05 |     1
 2020-08-26 00:00:00-05 |     1
 2020-08-27 00:00:00-05 |     1
 2020-08-28 00:00:00-05 |     2
 2020-08-29 00:00:00-05 |     1
 2020-08-30 00:00:00-05 |     1
 2020-08-31 00:00:00-05 |     1
 2020-09-01 00:00:00-05 |     1
 2020-09-02 00:00:00-05 |     2
 2020-09-03 00:00:00-05 |     1
 2020-09-04 00:00:00-05 |     2
 2020-09-05 00:00:00-05 |     1
 2020-09-06 00:00:00-05 |     1
 2020-09-07 00:00:00-05 |     1
 2020-09-08 00:00:00-05 |     1
 2020-09-09 00:00:00-05 |     1
 2020-09-10 00:00:00-05 |     1
 2020-09-11 00:00:00-05 |     1
 2020-09-12 00:00:00-05 |     1
 2020-09-13 00:00:00-05 |     1
 2020-09-14 00:00:00-05 |     1
 2020-09-15 00:00:00-05 |     1

  



El mar, 2 de feb. de 2021 a la(s) 03:31, Laurenz Albe (laurenz.albe@cybertec.at) escribió:
On Mon, 2021-02-01 at 19:14 -0600, Matt Zagrabelny wrote:
> > > What is count(*) counting then? I thought it was rows.
> >
> > Yeah, but count(id) only counts rows where id isn't null.
>
> I guess I'm still not understanding it...
>
> I don't have any rows where id is null:

Then the *result* of count(*) and count(id) will be the same.

The asterisk in count(*) is misleading.  Different from any other
programming language that I know, the SQL standard has decided that
you cannot have an aggregate function without arguments.  You have
to use the asterisk in that case.

So count(*) really is count(), that is, it counts one for every
row that it finds, no matter what the row contains.

But count(id) includes a check: if "id IS NULL", it is not counted.

If that condition is satisfied for all "id"s, you end up with
the same count.  But count(id) is more expensive, because it
will perform this unnecessary NULLness check for each row.

In short: use count(*) if you want to count rows, and use
count(x) if you want to count all rows where x IS NOT NULL.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com





--
Cordialmente,

Ing. Hellmuth I. Vargas S.

В списке pgsql-general по дате отправления:

Предыдущее
От: Tim Cross
Дата:
Сообщение: Re: ransomware
Следующее
От: Joao Miguel Ferreira
Дата:
Сообщение: permission denied to create and drop user