Обсуждение: select count...

Поиск
Список
Период
Сортировка

select count...

От
"P. Dwayne Miller"
Дата:
What's the fastest way to select the number of rows in a table?  If I
use count(*) with no whereclause, it uses a seq_scan and takes 4 secs
(122k rows).  With a where clause, it uses an index and returns in < 1
sec.  Selecting count(requestnumber), which is an indexed column, with
no where clause again takes 4 secs.  This latter version, I thought,
would use the index.  The values of requestnumber are very distributed.

Thanks



Re: select count...

От
Doug McNaught
Дата:
"P. Dwayne Miller" <dmiller@espgroup.net> writes:

> What's the fastest way to select the number of rows in a table?  If I
> use count(*) with no whereclause, it uses a seq_scan and takes 4 secs
> (122k rows).  With a where clause, it uses an index and returns in < 1
> sec.  Selecting count(requestnumber), which is an indexed column, with
> no where clause again takes 4 secs.  This latter version, I thought,
> would use the index.  The values of requestnumber are very distributed.

Exactly how would you expect to get a count of all the rows in the
table (no WHERE clause) without a sequential scan?  I don't see any
problem with the above results.

The only case in which COUNT(requestnumber) might use the index would
be if there were a significant number of NULLs in that column, but you 
don't give any information on that.

-Doug
-- 
The rain man gave me two cures; he said jump right in,
The first was Texas medicine--the second was just railroad gin,
And like a fool I mixed them, and it strangled up my mind,
Now people just get uglier, and I got no sense of time...          --Dylan


Re: select count...

От
"P. Dwayne Miller"
Дата:
I think 4 seconds is way too long to return the results.  And NULLs in a
column should not change the answer.  It seems logical that even a sequential
scan of an index would be much faster than a scan of the table (in this case
the record size is fairly large).

I'm trying to optimize queries that are being ported from another DBMS, where
the same query above returns in 10s of milliseconds.  4 secs is simply too
long.  So I'm looking for a way to do it faster.

MS SQL Server docs have optimization hints for such a query and using the
'count(requestnumber)' syntax, where requestnumber is an indexed field, was
suggested.

It's me and Postgres against another developer and MS SQL Server to see who
gets the port done fastest, with the best performance after the port.  I don't
want to lose!

D

Doug McNaught wrote:

> "P. Dwayne Miller" <dmiller@espgroup.net> writes:
>
> > What's the fastest way to select the number of rows in a table?  If I
> > use count(*) with no whereclause, it uses a seq_scan and takes 4 secs
> > (122k rows).  With a where clause, it uses an index and returns in < 1
> > sec.  Selecting count(requestnumber), which is an indexed column, with
> > no where clause again takes 4 secs.  This latter version, I thought,
> > would use the index.  The values of requestnumber are very distributed.
>
> Exactly how would you expect to get a count of all the rows in the
> table (no WHERE clause) without a sequential scan?  I don't see any
> problem with the above results.
>
> The only case in which COUNT(requestnumber) might use the index would
> be if there were a significant number of NULLs in that column, but you
> don't give any information on that.
>
> -Doug
> --
> The rain man gave me two cures; he said jump right in,
> The first was Texas medicine--the second was just railroad gin,
> And like a fool I mixed them, and it strangled up my mind,
> Now people just get uglier, and I got no sense of time...          --Dylan
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster



Re: Re: select count...

От
Hannu Krosing
Дата:
"P. Dwayne Miller" wrote:
> 
> I think 4 seconds is way too long to return the results.  And NULLs in a
> column should not change the answer.  It seems logical that even a sequential
> scan of an index would be much faster than a scan of the table (in this case
> the record size is fairly large).
> 
> I'm trying to optimize queries that are being ported from another DBMS, where
> the same query above returns in 10s of milliseconds.  4 secs is simply too
> long.  So I'm looking for a way to do it faster.
> 
> MS SQL Server docs have optimization hints for such a query and using the
> 'count(requestnumber)' syntax, where requestnumber is an indexed field, was
> suggested.

Could you possibly mean "select(distinct requestnumber)" ?

If the performance of count(xxx) is critical for your app, I suggest
keeping the 
counts in a separate table with a trigger. Postgres can not optimise to
use 
indexes _only_ , as indexes don't keep commit information - it must be
checked 
from data heap.

---------------
Hannu


Re: Re: select count...

От
Tom Lane
Дата:
"P. Dwayne Miller" <dmiller@espgroup.net> writes:
> I think 4 seconds is way too long to return the results.  And NULLs in a
> column should not change the answer.

If you're doing count(foo) then NULLs in column foo definitely *should*
change the answer.  count(foo) does not count nulls.

It seemed to me that your original question was comparing apples and
oranges.  count(*) with no where clause will count all the rows in
the table, sure enough, but if you add a where clause then it's not
counting all the rows anymore, so why shouldn't that take less time?

But possibly the answer you need is just that Postgres does not maintain
an accurate count of the rows in a table, so it has to scan the table
to compute count(*).  Some other DBMSes do maintain such a count and so
they can return count(*) essentially instantaneously.  But they pay for
that speed with a distributed slowdown in all updates of the table.  If
you have a database application that's designed around the assumption
that count(*) is free, you'll probably need to rethink that assumption
to get good performance with Postgres.
        regards, tom lane