Обсуждение: select count...
What's the fastest way to select the number of rows in a table? If I use count(*) with no whereclause, it uses a seq_scan and takes 4 secs (122k rows). With a where clause, it uses an index and returns in < 1 sec. Selecting count(requestnumber), which is an indexed column, with no where clause again takes 4 secs. This latter version, I thought, would use the index. The values of requestnumber are very distributed. Thanks
"P. Dwayne Miller" <dmiller@espgroup.net> writes: > What's the fastest way to select the number of rows in a table? If I > use count(*) with no whereclause, it uses a seq_scan and takes 4 secs > (122k rows). With a where clause, it uses an index and returns in < 1 > sec. Selecting count(requestnumber), which is an indexed column, with > no where clause again takes 4 secs. This latter version, I thought, > would use the index. The values of requestnumber are very distributed. Exactly how would you expect to get a count of all the rows in the table (no WHERE clause) without a sequential scan? I don't see any problem with the above results. The only case in which COUNT(requestnumber) might use the index would be if there were a significant number of NULLs in that column, but you don't give any information on that. -Doug -- The rain man gave me two cures; he said jump right in, The first was Texas medicine--the second was just railroad gin, And like a fool I mixed them, and it strangled up my mind, Now people just get uglier, and I got no sense of time... --Dylan
I think 4 seconds is way too long to return the results. And NULLs in a column should not change the answer. It seems logical that even a sequential scan of an index would be much faster than a scan of the table (in this case the record size is fairly large). I'm trying to optimize queries that are being ported from another DBMS, where the same query above returns in 10s of milliseconds. 4 secs is simply too long. So I'm looking for a way to do it faster. MS SQL Server docs have optimization hints for such a query and using the 'count(requestnumber)' syntax, where requestnumber is an indexed field, was suggested. It's me and Postgres against another developer and MS SQL Server to see who gets the port done fastest, with the best performance after the port. I don't want to lose! D Doug McNaught wrote: > "P. Dwayne Miller" <dmiller@espgroup.net> writes: > > > What's the fastest way to select the number of rows in a table? If I > > use count(*) with no whereclause, it uses a seq_scan and takes 4 secs > > (122k rows). With a where clause, it uses an index and returns in < 1 > > sec. Selecting count(requestnumber), which is an indexed column, with > > no where clause again takes 4 secs. This latter version, I thought, > > would use the index. The values of requestnumber are very distributed. > > Exactly how would you expect to get a count of all the rows in the > table (no WHERE clause) without a sequential scan? I don't see any > problem with the above results. > > The only case in which COUNT(requestnumber) might use the index would > be if there were a significant number of NULLs in that column, but you > don't give any information on that. > > -Doug > -- > The rain man gave me two cures; he said jump right in, > The first was Texas medicine--the second was just railroad gin, > And like a fool I mixed them, and it strangled up my mind, > Now people just get uglier, and I got no sense of time... --Dylan > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster
"P. Dwayne Miller" wrote: > > I think 4 seconds is way too long to return the results. And NULLs in a > column should not change the answer. It seems logical that even a sequential > scan of an index would be much faster than a scan of the table (in this case > the record size is fairly large). > > I'm trying to optimize queries that are being ported from another DBMS, where > the same query above returns in 10s of milliseconds. 4 secs is simply too > long. So I'm looking for a way to do it faster. > > MS SQL Server docs have optimization hints for such a query and using the > 'count(requestnumber)' syntax, where requestnumber is an indexed field, was > suggested. Could you possibly mean "select(distinct requestnumber)" ? If the performance of count(xxx) is critical for your app, I suggest keeping the counts in a separate table with a trigger. Postgres can not optimise to use indexes _only_ , as indexes don't keep commit information - it must be checked from data heap. --------------- Hannu
"P. Dwayne Miller" <dmiller@espgroup.net> writes: > I think 4 seconds is way too long to return the results. And NULLs in a > column should not change the answer. If you're doing count(foo) then NULLs in column foo definitely *should* change the answer. count(foo) does not count nulls. It seemed to me that your original question was comparing apples and oranges. count(*) with no where clause will count all the rows in the table, sure enough, but if you add a where clause then it's not counting all the rows anymore, so why shouldn't that take less time? But possibly the answer you need is just that Postgres does not maintain an accurate count of the rows in a table, so it has to scan the table to compute count(*). Some other DBMSes do maintain such a count and so they can return count(*) essentially instantaneously. But they pay for that speed with a distributed slowdown in all updates of the table. If you have a database application that's designed around the assumption that count(*) is free, you'll probably need to rethink that assumption to get good performance with Postgres. regards, tom lane